03:30HdkR: Nice, I love hardware tests that test behaviour of instructions
08:38marysaka[d]: Amazing!
10:19f_: dwfreed: actually, if the og message is multiline, their regex that is supposed to strip the fallback doesn't match, so it sends the original message twice before the reply
10:20f_: I fixed it in my MR to m-a-i, along with some other problems, over a month ago, but they never reviewed it 🙃
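
A minimal sketch of the failure mode f_ describes, assuming the Matrix plain-text reply fallback (each quoted line prefixed with `> `, then a blank line, then the reply). Both patterns and the `strip_fallback` helper are hypothetical illustrations, not the bridge's actual code or the fix from the MR:

```python
import re

# Hypothetical pattern that assumes the quoted original is exactly one line.
SINGLE_LINE_FALLBACK = re.compile(r"\A> <[^>]+> .*\n\n")

# Hypothetical variant that accepts any number of "> "-prefixed lines.
MULTI_LINE_FALLBACK = re.compile(r"\A(?:> .*\n)+\n")


def strip_fallback(body: str, pattern: re.Pattern) -> str:
    """Remove the leading reply fallback if the pattern matches it."""
    return pattern.sub("", body, count=1)


single = "> <@alice:example.org> hello\n\nhi there"
multi = "> <@alice:example.org> line one\n> line two\n\nhi there"

# The single-line pattern handles a one-line original...
print(strip_fallback(single, SINGLE_LINE_FALLBACK))
# ...but leaves the whole fallback in place for a multi-line original.
print(strip_fallback(multi, SINGLE_LINE_FALLBACK) == multi)
# The multi-line variant strips both.
print(strip_fallback(multi, MULTI_LINE_FALLBACK))
```

With the single-line pattern, `.` never crosses the newline between the quoted lines, so the match fails and the quoted original stays in the relayed body.
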
10:21f_: karolherbst: And by the way, IRCv3 does have a (draft) specification about reply-to.
15:12gfxstrand[d]: HdkR: Yeah, there are some ops with very subtle behaviors that aren't at all obvious and aren't something you can figure out from reading the PTX docs.
15:13gfxstrand[d]: My first crack at that was like a year ago when I was trying to figure out the semantics of `iadd3`. The way the overflow bits work is not at all obvious.
15:14gfxstrand[d]: This second attempt (which I'm going to land once we sort out the CI build) was motivated by not having a clue how `shf` works. It kinda does what it says on the tin but also very much not.
15:18gfxstrand[d]: And the way I built it is such that we can also get constant folding out of it if we want to. IDK how much I care about constant-folding in a back-end but it's there if we want it.
15:19HdkR: Interesting, I don't remember shf having any quirks, but maybe it does and I didn't hit it. Good to have the infrastructure regardless :D
15:25gfxstrand[d]: It's less that it has quirks and more that the implications of the specified type aren't clear.
15:26gfxstrand[d]: It always does a 64-bit shift. It's just that the type specifier affects whether or not right-shift sign-extends and affects how the shift value is clamped/masked.
15:26gfxstrand[d]: Which certainly makes sense. It's just not obvious.
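
A rough Python model of the behaviour described above, as a sketch under stated assumptions rather than the NAK implementation or a hardware reference. The function name `shf_model`, its parameters, and the choice to return the full 64-bit result (instead of the 32-bit half the instruction would write) are all assumptions made for illustration:

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shf_model(low, high, shift, *, bits=32, signed=False, right=False, wrap=False):
    """Hypothetical model: always a 64-bit shift of high:low.

    `bits` (32 or 64) and `signed` stand in for the type specifier.
    The type only changes how the shift amount is limited and whether
    a right shift sign-extends.
    """
    # Assumption: wrap masks the shift to the type width, otherwise it clamps.
    shift = shift & (bits - 1) if wrap else min(shift, bits)

    # The shifted value is always the 64-bit concatenation high:low.
    x = ((high & MASK32) << 32) | (low & MASK32)

    if right:
        if signed and (x >> 63):
            x -= 1 << 64  # reinterpret as a negative 64-bit integer
        x >>= shift       # Python's >> is arithmetic for negative ints
    else:
        x <<= shift
    return x & MASK64


# Signed vs unsigned right shift of 0x8000_0000_0000_0000 by 4.
print(hex(shf_model(0, 0x80000000, 4, bits=64, signed=True, right=True)))
print(hex(shf_model(0, 0x80000000, 4, bits=64, signed=False, right=True)))
# Same shift value, limited differently depending on type width and wrap.
print(hex(shf_model(1, 0, 40, bits=32, wrap=False)))
print(hex(shf_model(1, 0, 40, bits=32, wrap=True)))
print(hex(shf_model(1, 0, 40, bits=64, wrap=True)))
```

The model leaves out which half of the result ends up in the destination and any per-generation quirks, since the conversation does not spell those out.
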
16:37HdkR: Ah interesting, I don't think I used the signed variant, probably why I didn't notice it :D
19:42gfxstrand[d]: Also, `shf` is well and thoroughly cursed on Maxwell:
19:42gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30275/diffs?commit_id=0e32f242bba429bbc4aa3d40ef08f8d5aadc3305
19:46gfxstrand[d]: Piles of cases where bits are just ignored, one where the hardware just chokes. Good fun all around. 😅
19:47gfxstrand[d]: I should run the tests on Pascal just to sanity check that nothing randomly changed.
21:33gfxstrand[d]: Yup. Pascal and Maxwell match.
21:33gfxstrand[d]: This is why I love having good tests
23:02redsheep[d]: gfxstrand[d]: Hopefully that will make the Blackwell work easier when the time comes
23:08redsheep[d]: The public docs don't make the architecture sound terribly different aside from datacenter and AI stuff, but naturally they wouldn't talk about much else yet
23:08gfxstrand[d]: Generally they seem to be making the hardware nicer as time goes on
23:09gfxstrand[d]: But yes, if we want to RE a specific thing, this gives us the tools to do it.