IRC Logs of #nouveau on irc.freenode.net for 2025-02-16

00:13 gfxstrand[d]: Yeah. I'm pretty sure the tests are hidden away
00:23 gfxstrand[d]: I really should have made a note of what tests fail when I made the issue. I'm really bad at that.
00:40 snowycoder[d]: mhenning[d]: I'm also running the script to generate the ISA for sm70 so that we can find what atomic instructions we can encode
00:41 snowycoder[d]: (I hope i did it right, there's not much documentation, guess I'll find out in a few hours)
01:17 gfxstrand[d]: mhenning[d]: Yes, atoms has all the ops but I don't think they work for 64-bit. I'm not sure that the assembler tells us that, though.
01:18 gfxstrand[d]: I can kick off a CTS run in an hour or so. No promises on reporting the results yet tonight, though.
01:42 gfxstrand[d]: zmike[d]: I have most of a piglit test now. I'll finish it up on Monday and either fix Zink or give you something very concrete to make work.
02:00 gfxstrand[d]: gfxstrand[d]: It's running now. We'll see if I'm still awake in half an hour to look at the results.
02:06 gfxstrand[d]: I'm seeing a bunch of shader exceptions in dmesg. I don't have test names yet but I will in half an hour or so. I'll dump the list on the bug when I have it.
02:31 gfxstrand[d]: snowycoder[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10330
02:31 gfxstrand[d]: List of failing tests is there now
03:29 gfxstrand[d]: zmike[d]: https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/992
03:30 gfxstrand[d]: ANV+Iris fails it, too. :frog_upside_down:
03:46 gfxstrand[d]: I suspect that's because ANV isn't properly resetting semaphores and isn't doing WAIT_PENDING, either.
03:47 gfxstrand[d]: Though nothing is ever threaded so I'm not sure that matters
03:48 gfxstrand[d]: Oh, well. I'll look on Monday.
11:08 snowycoder[d]: gfxstrand[d]: Thanks! I filtered my tests and I missed those.
11:08 snowycoder[d]: Reading the nvcc/nvdisasm output, it seems like nvidia compiles 64-bit atomics with compare-and-swap, guess their hardware doesn't support them completely
11:10 karolherbst[d]: ohh yeah.. there are restrictions
11:10 karolherbst[d]: 64 bit besides global is all cas
11:11 karolherbst[d]: shared has a special CAS operation
11:19 snowycoder[d]: Makes sense. We're emitting `ATOMS.ADD.64` right now, that's valid assembly but the hardware crashes when we send it.
11:19 snowycoder[d]: Now i "just" need to replace that with a CAS loop, wish me luck😂
11:20 karolherbst[d]: 🥲
11:21 karolherbst[d]: be happy you don't work on intel hardware which doesn't even have 64 bit CAS for shared
11:22 snowycoder[d]: ahahahah, how do they implement atomics then?
11:22 karolherbst[d]: locks
11:22 snowycoder[d]: oh god
11:22 karolherbst[d]: yeah..
11:22 karolherbst[d]: have to reserve some shared storage to lock and then do whatever you need to do to emulate 64 bit atomics
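(Editor's sketch of the lock-based emulation described above. This is a CPU-side illustration in Rust, not Intel's or Mesa's actual lowering. The type `LockedU64` and its layout are hypothetical; it assumes only 32-bit atomics are available, so one 32-bit word is reserved as the lock and the 64-bit value lives in two 32-bit halves.)

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical 64-bit counter built only from 32-bit atomics.
pub struct LockedU64 {
    lock: AtomicU32, // reserved lock word: 0 = free, 1 = held
    lo: AtomicU32,   // low 32 bits of the value
    hi: AtomicU32,   // high 32 bits of the value
}

impl LockedU64 {
    pub fn new(value: u64) -> Self {
        Self {
            lock: AtomicU32::new(0),
            lo: AtomicU32::new(value as u32),
            hi: AtomicU32::new((value >> 32) as u32),
        }
    }

    /// Emulated 64-bit fetch-add: take the lock, update both halves, release.
    /// Returns the value held before the add.
    pub fn fetch_add(&self, operand: u64) -> u64 {
        // Spin until the lock word flips from 0 to 1 under our control.
        while self
            .lock
            .compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }

        // Inside the critical section the halves can be read and written plainly.
        let old = (u64::from(self.hi.load(Ordering::Relaxed)) << 32)
            | u64::from(self.lo.load(Ordering::Relaxed));
        let new = old.wrapping_add(operand);
        self.lo.store(new as u32, Ordering::Relaxed);
        self.hi.store((new >> 32) as u32, Ordering::Relaxed);

        // Release the lock so other threads can enter.
        self.lock.store(0, Ordering::Release);
        old
    }
}
```

(Any other 64-bit operation can replace `wrapping_add` inside the critical section, which is why this approach works as a general fallback. A real shader version would also have to cope with how the GPU schedules threads, which this sketch does not model.)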
11:53 gfxstrand[d]: Yeah, it's pretty dire
11:54 gfxstrand[d]: But yeah, replacing atomics with CAS loops is a pretty good "my first NIR pass" project.
11:57 gfxstrand[d]: We also need a CAS loop if we want to do `f16vec2` atomics on shared.
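(Editor's sketch of what "replace the atomic with a CAS loop" means, written as CPU-side Rust rather than as a NIR pass. The function name `cas_loop_add` is hypothetical; the point is only the shape of the loop: read the current value, compute the result, and retry the compare-and-swap until nobody else changed the value in between.)

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Emulates a 64-bit atomic add using only compare-and-swap.
/// Returns the value held before the add, like a native fetch-add.
pub fn cas_loop_add(cell: &AtomicU64, operand: u64) -> u64 {
    // Initial guess at the current value.
    let mut expected = cell.load(Ordering::Relaxed);
    loop {
        let desired = expected.wrapping_add(operand);
        match cell.compare_exchange_weak(
            expected,
            desired,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            // The swap landed: `previous` is the value we replaced.
            Ok(previous) => return previous,
            // Someone else got there first (or the weak CAS failed spuriously):
            // retry with the value actually observed.
            Err(actual) => expected = actual,
        }
    }
}
```

(The same loop should cover other operations by changing how `desired` is computed; for the `f16vec2` case that would presumably mean unpacking two halves of a 32-bit word, adding, and repacking.)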
12:51 karolherbst[d]: yeah
13:05 HdkR: Didn't some hardware implement some atomic operation natively which is why Unreal Engine's Nanite abuses it?
13:08 HdkR: For some reason I'm thinking float minimum but I could be conjuring random answers
13:09 HdkR: Or maybe float add
13:09 karolherbst[d]: for fp16?
13:10 karolherbst[d]: some fp16 atomic ops exist natively for global memory accesses on nvidia
13:10 HdkR: Pretty sure fp32
13:11 karolherbst[d]: well fp32 atomics exist for quite some while, but yeah nvidia has it for add e.g.
13:12 HdkR: ah
16:00 gfxstrand[d]: HdkR: Image int64 atomic min/max
16:01 HdkR: ah
16:02 HdkR: Definitely a spicy one since it's an image operation as well
16:03 gfxstrand[d]: They use it for parallel depth testing. Depth goes in the top 32 bits and a primitive ID goes in the bottom 32 bits. The primitive ID both ensures determinism and lets them know who won.
16:04 gfxstrand[d]: It's pretty clever
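(Editor's sketch of the packing trick described above, in CPU-side Rust. This is not Nanite's code. The helper names are hypothetical, and it assumes smaller depth means closer, so it uses an atomic min; a reverse-Z setup would use max instead. It also assumes depth is a non-negative, non-NaN `f32`, whose bit pattern orders the same way as the float itself when compared as an unsigned integer.)

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Depth bits in the top 32 bits, primitive ID in the bottom 32 bits.
pub fn pack(depth: f32, primitive_id: u32) -> u64 {
    (u64::from(depth.to_bits()) << 32) | u64::from(primitive_id)
}

/// Splits a packed texel back into (depth, primitive ID).
pub fn unpack(texel: u64) -> (f32, u32) {
    (f32::from_bits((texel >> 32) as u32), texel as u32)
}

/// One 64-bit atomic min does the depth test and records the winner.
/// On equal depth the lower primitive ID wins, so the result does not
/// depend on the order in which threads arrive.
pub fn depth_test(texel: &AtomicU64, depth: f32, primitive_id: u32) {
    texel.fetch_min(pack(depth, primitive_id), Ordering::Relaxed);
}
```

(Clearing the texel to `u64::MAX` before rendering makes the first write always win. After all writes, `unpack` gives the surviving depth and the primitive that produced it.)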
16:05 gfxstrand[d]: But yeah, Intel has to implement it by doing address calculations in the shader. 🤡
16:06 HdkR: Sounds like they need to hurry and get that feature added to hardware :D
16:06 mohamexiety[d]: yeah was about to say that I wonder if that was a contributing factor into why Alchemist struggles a bit with UE5, given it doesn't have int64 support in the first place
16:06 mohamexiety[d]: battlemage I think has int64, but not sure about atomics
16:07 gfxstrand[d]: I think battlemage should but I'm so disconnected from all that that I don't really know anymore.
16:07 kayliemoony[d]: imo the real problem is just nanite /hj
16:08 kayliemoony[d]: they've been doing good so far with iterative hw improvement though so i have hope
16:13 karolherbst[d]: I highly doubt that int64 is where you lose any significant amount of perf
16:14 karolherbst[d]: it's not like nvidia has much int64 support either
16:14 gfxstrand[d]: It is a serious hot path but a few address calculation instructions aren't going to kill you.
16:14 karolherbst[d]: yeah..
16:15 karolherbst[d]: and you can probably use the die space for more important things instead
16:15 karolherbst[d]: and you can do most address calculations in 32 bit if you really care
16:16 gfxstrand[d]: But I think it's more that "OMG! All the geometry!" sounds great and looks good in a tech demo but isn't as good an idea as it sounds for an actual dynamic game. Maybe one day...
16:17 karolherbst[d]: I wonder if it makes sense to pimp `vm_utils` a bit so that allocations never cause overflows as long as you access within the allocation
16:18 karolherbst[d]: address calculations within that allocation I mean
16:18 karolherbst[d]: but then again.. does it even matter all that much
16:19 gfxstrand[d]: 🤷🏻‍♀️
16:20 gfxstrand[d]: Oh, I remembered something that might be slowing down D3D12: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12622
16:20 gfxstrand[d]: I'm gonna implement on Monday. I'm not sure if they still use it now that they have EDB, though. 🤷🏻‍♀️
17:10 juri_: karolherbst[d]: I just read your patch. Thank you.
17:18 juri_: I used to be into kernel development a long time ago, under a different name, when alan cox was holding his own making an inclusive linux community. I really miss it.
17:19 juri_: Now, I am living in an increasingly-less-foreign country, having to avoid my country's embassy, to make sure they don't intentionally screw up my passport.
17:21 karolherbst: thanks for your words, it's... been a week and I'm quite exhausted
17:22 juri_: we all are. rest.
17:52 djdeath3483[d]: gfxstrand[d]: It does not
17:53 gfxstrand[d]: 😭
17:54 djdeath3483[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32676 it doesn't look too bad in emulation
17:55 djdeath3483[d]: for now we stick to one tiling which prevents sparse support
18:00 snowycoder[d]: gfxstrand[d]: There's already a pass for that in nir that we could use, but should I implement it in nir_cg or nak? Because it doesn't seem like nak is calling any of the nir passes
18:06 mhenning[d]: snowycoder[d]: Focus on nak, not codegen - codegen is deprecated
18:07 snowycoder[d]: So I should write a new pass? I'd love that!
18:07 mhenning[d]: nak uses plenty of nir passes - take a look eg. at nak_postprocess_nir in src/nouveau/compiler/nak_nir.c
18:08 mhenning[d]: If there's an existing nir pass that implements what you need, then it's best to make use of that. If not, then go ahead and write a new one
18:08 snowycoder[d]: Oh sorry, I was looking at the rust side and I missed it
18:09 mhenning[d]: No need to be sorry - it can be tricky to navigate codebases that you're not familiar with
19:30 gfxstrand[d]: Yeah, it should be in NIR. NAK is a bit too low level for complex lowering.
19:35 gfxstrand[d]: There are NIR passes for atomics but I'm not aware of one that does exactly what we need. But also there are NIR passes I don't remember existing. 🙃
19:36 karolherbst[d]: from a quick look `nir_lower_atomics` kinda sounds like what's needed here, no?
19:36 gfxstrand[d]: I guess there is.
19:36 karolherbst[d]: it only handles ssbos and global, but that could be changed
19:36 gfxstrand[d]: We might need to add a couple bits but it does look about right.
19:36 karolherbst[d]: might need a callback added so drivers can select what's lowered
19:36 karolherbst[d]: yeah
19:37 gfxstrand[d]: Yup. Looks like we just need to add shared support.
19:37 karolherbst[d]: ohh it already has a callback, nice
19:39 djdeath3483[d]: the callback is implemented in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32676
19:39 djdeath3483[d]: since intel also just needs to lower 64bit
19:40 djdeath3483[d]: I guess I should land it since Alyssa reviewed it
19:55 snowycoder[d]: gfxstrand[d]: Yes, that's exactly what I did
19:56 snowycoder[d]: I'm just running the full CTS to be sure that I didn't break anything else
20:21 airlied[d]: gfxstrand[d]: does https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33521 need anything else?
20:22 airlied[d]: I have started transcribing excel spreadsheets to rust, but I think that is definitely a larger MR 😛
20:59 gfxstrand[d]: airlied[d]: I don't know. I haven't looked at it yet. I was at Vulkanised last week.
21:02 gfxstrand[d]: I also need to rework the dep tracker a bit before we'll be ready for the full Excel spreadsheet.
21:03 gfxstrand[d]: Planning to hack on that next week
21:03 karolherbst[d]: it's going to be fun for sure
21:08 airlied[d]: gfxstrand[d]: did you get access to the spreadsheets? because I've started transcribing it, and I can probably fix some of the dep tracker on the way
21:24 airlied[d]: https://gitlab.freedesktop.org/airlied/mesa/-/commit/69d1c4209362d1e8bb5b28988cc2af3ae928fb20 is where I started typing in Turing, it's got raw and waw for registers done so far
21:24 airlied[d]: (not hooked up, just typed in)
21:26 gfxstrand[d]: airlied[d]: I should have it soon.
21:27 airlied[d]: I'm going to keep typing turing at least, I think I need it to have any real chance of making coop matrix work
21:28 gfxstrand[d]: Ok
22:37 snowycoder[d]: How do you run deqp-runner without going Out-of-memory? do you have 4GiB per processor?
22:38 snowycoder[d]: Also, is it weird that I can't hide deqp windows even with visibility=hidden and surface-type-pbuffer?
22:39 gfxstrand[d]: snowycoder[d]: I have about 2GB/thread
22:40 gfxstrand[d]: snowycoder[d]: Configure your Mesa with `-Dplatforms=` and the WSI tests won't run.
22:41 gfxstrand[d]: Also, for Vulkan, you should be using `deqp-vk` and it doesn't pop up windows for anything but WSI tests.
22:45 snowycoder[d]: Oh, that explains a lot, thank you
23:07 anholt: snowycoder[d]: if you end up doing any zink testing, you'll want to take a look at the .tomls in the Mesa tree for how we test without popping up (many) windows (we do that because windows are significant overhead for simple tests).