01:54 fdobridge: <g​fxstrand> Once again debating whether I want uniform ops to be different opcodes or re-use good ol' iadd, etc.
01:57 fdobridge: <z​mike.> don't overthink it
02:01 fdobridge: <r​edsheep> Just found this in the docs, isn't this the answer?
02:01 fdobridge: <r​edsheep>
02:01 fdobridge: <r​edsheep> ```On Turing, the compiler has the option to push these integer operations
02:01 fdobridge: <r​edsheep> onto the separate uniform datapath, out of the way of the main datapath. To
02:01 fdobridge: <r​edsheep> do so, the compiler must emit uniform datapath instructions.```
02:01 fdobridge: <g​fxstrand> Not really
02:02 fdobridge: <g​fxstrand> That's a description of maybe how NVIDIA's compiler works on the inside, not a mandate for what we need to do.
02:02 fdobridge: <r​edsheep> So you can still get full utilization of parallel int and fp without that on turing?
02:03 fdobridge: <g​fxstrand> It's not really about parallelism AFAIK
02:03 fdobridge: <g​fxstrand> It just uses a different register file
02:03 fdobridge: <r​edsheep> I am reading 3.5.2 here: https://arxiv.org/pdf/1903.07486
02:04 fdobridge: <r​edsheep> The extra path turing added to get integer instructions out of the way of the fp pipes needs the separate instructions, unless I am completely misunderstanding
02:04 fdobridge: <r​edsheep> Otherwise you'd get ~pascal perf with mixed int and fp workloads
02:05 fdobridge: <r​edsheep> Nvidia had said at the time it improved IPC 1.36x
02:06 fdobridge: <g​fxstrand> I mean, it can increase float throughput in the sense that it lets you reduce register pressure and increase occupancy.
02:11 fdobridge: <r​edsheep> It would be interesting to see what instructions nvidia spits out for RT shaders, because IIRC this stuff was added with the aim towards RT related stuff
02:14 fdobridge: <r​edsheep> Just reread, I see what you mean. Using the dedicated int path doesn't require them. So you're just saying that increased occupancy is a result of less register pressure, and not from actually depending on using these instructions
02:17 fdobridge: <g​fxstrand> Yup
02:19 fdobridge: <g​fxstrand> I don't know that it was added for RT. Maybe? I think it's more targeted towards bindless UBOs and other address calculations
02:19 fdobridge: <r​edsheep> IIUC it was address calculations for RT stuff because that part of the workload got much heavier than traditional raster?
02:20 fdobridge: <r​edsheep> I dunno exactly how it works, I do remember address calculations being mentioned somewhere though
02:21 fdobridge: <r​edsheep> I still don't understand what half this stuff is, so take all this with a grain of salt of course. Trying to understand gpu architecture from scratch is really hard...
02:29 fdobridge: <r​edsheep> What have you been using to figure out what works with your NAK improvements? Do you have some way to test or profile things?
02:33 fdobridge: <g​fxstrand> Not really. Mostly just my own mental model of the hardware.
02:48 fdobridge: <M​isyl with Max-Q Design> Worth making something like the fossil metrics on aco?
06:14 fdobridge: <m​arysaka> that would be a great idea
10:45 fdobridge: <a​huillet> Please share complete instructions on how to repro. You can DM me, as this probably doesn't belong here
11:08 fdobridge: <m​agic_rb.> will do
22:49 fdobridge: <r​edsheep> So is the plan to merge the MR defaulting zink now that CI for NVK+Zink exists?
22:49 fdobridge: <r​edsheep>
22:49 fdobridge: <r​edsheep> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29232
22:52 fdobridge: <r​edsheep> I'm kind of split on what I think of this at the moment. Obviously there are issues, given I've opened 4 myself, but merging it would prompt quite a bit of testing on more varied setups before 24.2
22:57 fdobridge: <a​irlied> it still relies on a kernel patch that hasn't landed anywhere useful yet, so I'm not going to do it until after that
22:58 fdobridge: <r​edsheep> Ok. Hopefully it's not too much of a flood of issues getting opened, hopefully people testing mesa git know how to disable it if things break for them
23:00 fdobridge: <a​irlied> I'll enable it in rawhide at some point, I think I've seen some triangle tearing on gdm
23:01 fdobridge: <r​edsheep> Right, I'm pretty sure there are sync issues somewhere
23:07 fdobridge: <r​edsheep> Hmm. I wonder if the sync bug causing flicker for me is actually the same as the chromium/discord corruption? Multi process rendering means the application has to sync internally, right?
23:17 fdobridge: <r​edsheep> Come to think of it, it seems plausible to me that all 4 of my issues could be the same bug since my stack trace for session crashes mentions DRI
23:33 fdobridge: <r​edsheep> I have never used gdb before, that is the appropriate tool to try to understand what went wrong with my session crash coredumps, right?
23:39 fdobridge: <r​edsheep> Ah right, as I was running into issues trying to use GDB (because I had rebuilt mesa since the last crash), the debug gods smiled upon me and crashed my session 🤣
23:46 fdobridge: <m​henning> yes, gdb is the right tool to get a backtrace from a coredump
23:50 fdobridge: <r​edsheep> Hmm yeah, this appears to confirm this is a sync issue: it is segfaulting at vulkan/runtime/vk_synchronization.c:399
23:54 fdobridge: <r​edsheep> Am I interpreting that right?
23:54 fdobridge: <r​edsheep>
23:54 fdobridge: <r​edsheep> ```Core was generated by `/usr/bin/plasmashell --no-respawn'.
23:54 fdobridge: <r​edsheep> Program terminated with signal SIGSEGV, Segmentation fault.
23:54 fdobridge: <r​edsheep> #0 0x00007f46a1e9db73 in vk_common_QueueSubmit (_queue=<optimized out>, submitCount=1, pSubmits=<optimized out>, fence=0x7f4680001030) at ../mesa/src/vulkan/runtime/vk_synchronization.c:399
23:54 fdobridge: <r​edsheep> 399 .semaphore = pSubmits[s].pWaitSemaphores[i],
23:54 fdobridge: <r​edsheep> [Current thread is 1 (Thread 0x7f4687e006c0 (LWP 86065))]```