06:21demonkingofsalvation[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1340206433999913032/image.png?ex=67b18401&is=67b03281&hm=b9d7aba195a7874df9f1d93a53f6488307a58fe0136baa28fa31660a56fed1bd&
06:21demonkingofsalvation[d]: anyone ever seen this?
06:21demonkingofsalvation[d]: I'm running Arch Linux with an RTX 3050 Ti (mobile)
15:10snowycoder[d]: What's missing in NVK to enable shaderSharedInt64Atomics?
15:10snowycoder[d]: I tried force-enabling it and it seems to pass all the u64 shared-atomic CTS tests. There doesn't seem to be any obvious missing code in NAK, so what am I missing?
15:11mohamexiety[d]: snowycoder[d]: https://discord.com/channels/1033216351990456371/1034184951790305330/1319683267322183740 last time it was mentioned here this was brought up
15:12mohamexiety[d]: (no clue if this still holds though)
15:19snowycoder[d]: mohamexiety[d]: Thanks, but how can it still pass all related CTS?
15:19snowycoder[d]: Is it emitting normal load/stores and CTS can't compare the differences?
15:20mohamexiety[d]: not sure. it could have been implemented already since then, or the CTS didn't catch it :thonk:
16:53snowycoder[d]: Sorry to bother you all on a weekend, but is there a small- or medium-difficulty task I could help with?
19:27redsheep[d]: snowycoder[d]: Based on looking at the gitlab issues it looks like you already found the difficulty tags, but there are a couple untouched ones on medium that maybe you haven't seen:
19:27redsheep[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/?sort=created_date&state=opened&label_name%5B%5D=NVK&label_name%5B%5D=difficulty%3A%20medium&first_page_size=20
19:34mhenning[d]: snowycoder[d]: Oh, that's interesting. There was some talk that we only had comp-exchange on 64-bit shared atomics, which meant that other atomics would need to be lowered to comp-exchange loops
19:35mhenning[d]: Maybe the hardware gained support though on some generation? https://kuterdinel.com/nv_isa_sm89/ATOMS.html shows us having all the relevant bits
19:35mhenning[d]: What generation are you testing on?
19:36mhenning[d]: If this was added in a specific gen, you could try to reverse engineer which generation added it, and then turn it on starting at that gen
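For illustration, the "lowered to comp-exchange loops" idea mentioned above can be sketched on the CPU with C++ `std::atomic`. This is only an analogy under stated assumptions: the helper name `atomic_rmw_via_cmpxchg` is hypothetical and this is not NAK or NVK code. It just shows the general shape of emulating an arbitrary 64-bit read-modify-write when compare-exchange is the only atomic primitive available.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not from Mesa): emulate an arbitrary 64-bit atomic
// read-modify-write using only compare-exchange.
// Returns the value that was stored before the update, like a fetch-op.
template <typename Op>
std::uint64_t atomic_rmw_via_cmpxchg(std::atomic<std::uint64_t>& target,
                                     std::uint64_t operand, Op op) {
    // Start from a plain load; it may already be stale by the time we swap.
    std::uint64_t observed = target.load(std::memory_order_relaxed);

    // Try to swap in op(observed, operand). If another thread changed the
    // value in between, compare_exchange_weak fails, refreshes `observed`
    // with the current value, and the new value is recomputed on retry.
    while (!target.compare_exchange_weak(observed, op(observed, operand),
                                         std::memory_order_relaxed)) {
    }
    return observed;
}
```

With an operation such as max or add passed as `op`, this behaves like the corresponding fetch-style atomic, at the cost of retries under contention.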
19:56snowycoder[d]: redsheep[d]: They seem quite hard, I can try one (I may ask questions from time to time)
19:57snowycoder[d]: mhenning[d]: I'm testing on Turing (GTX 1660)
19:58redsheep[d]: snowycoder[d]: I'm not really one of the developers so I can't speak on their behalf, but even faith asks questions here so I think you're fine doing that 🙂
20:00snowycoder[d]: redsheep[d]: Thanks! I just don't want to waste you experts' time 😅
20:45mhenning[d]: snowycoder[d]: Hmm, that's interesting. Want to make an MR for turning it on then?
20:46mhenning[d]: The commit message for https://gitlab.freedesktop.org/mesa/mesa/-/commit/9b60a1c00e938bfeb4e3e2419960fa1c9e00c77a just says that it doesn't work and doesn't elaborate. Maybe something fixed it in the meantime
20:49mhenning[d]: snowycoder[d]: Also, questions are welcome! Don't worry too much about wasting our time
21:07snowycoder[d]: mhenning[d]: Huh, that's interesting, are there any harder tests that I can check to be safer? (they might work probabilistically)
21:12mhenning[d]: CTS is the main thing we test with
21:13mhenning[d]: As for other tests, gpuharbor comes to mind, although that might focus more on other parts of the memory model than atomics https://gpuharbor.ucsc.edu/webgpu-mem-testing/
21:16mhenning[d]: gfxstrand[d]: Do you remember why you wrote that shared int 64 atomics were broken in https://gitlab.freedesktop.org/mesa/mesa/-/commit/9b60a1c00e938bfeb4e3e2419960fa1c9e00c77a ? Was it just cts fails, or was something else broken?
21:17mhenning[d]: It's also worth testing all of cts, beyond just the tests that are obviously affected
22:46snowycoder[d]: mhenning[d]: Is there any similar spec for nv70? Or have you reverse engineered that from a PTX compiler/nvdisasm?
22:49redsheep[d]: snowycoder[d]: IIUC that was generated using this: https://github.com/kuterd/nv_isa_solver
22:51mhenning[d]: Yeah, no-one has run that on earlier generations than ada and hopper yet, but a similar page could likely be generated
22:53mhenning[d]: snowycoder[d]: You could also reverse engineer it by writing a cuda program that uses shared atomics and then compiling/disassembling it for different generations
22:55snowycoder[d]: redsheep[d]: It would be useful to have a complete spec like that page, but I don't have a PC that powerful, so it would likely run for a long time
22:55snowycoder[d]: Tomorrow I'll try
23:52gfxstrand[d]: snowycoder[d]: We need lowering to cmpxchg for all the ALU stuff. I think the issue talks about that.
23:55gfxstrand[d]: mhenning[d]: I was just parroting what Karol said. But also I could have sworn we failed tests. Maybe you're just not finding them?
23:56mhenning[d]: gfxstrand[d]: https://kuterdinel.com/nv_isa_sm89/ATOMS.html shows us as having all of that stuff
23:56mhenning[d]: And yeah, it's worth doing a full cts run for it