IRC Logs of #nouveau on irc.freenode.net for 2026-03-07

00:00 _lyude[d]: gotcha
10:38 joobei[d]: phomes_[d]: what is this tool?
10:52 phomes_[d]: It is perfetto. We can trace to a file and view it in perfetto web ui
11:07 karolherbst[d]: okay.. let me figure out this constant table mess... 😄
11:09 karolherbst[d]: phomes_[d]: what tool did you use to pipe the spv through nvk in https://gitlab.freedesktop.org/mesa/mesa/-/issues/14993 ?
11:17 phomes_[d]: I grabbed the spv from renderdoc and recompiled it with nvdump
11:24 joobei[d]: So, I'm totally new to linux driver dev and would like a simple task to gain some experience. Can anyone recommend something to do? 😄
12:03 joobei[d]: something relevant with shader compilation would be nice
12:40 karolherbst[d]: phomes_[d]: nvdump can compile with nvk?
12:55 karolherbst[d]: ohh I can probably use `fossilize-synth` here..
13:06 loryruta[d]: I was reading about cuda graphs, which afaiu are a way to record multiple kernels and barriers and submit the workload once to the gpu.
13:06 loryruta[d]: I was wondering how this maps to the Vulkan API at a driver level. I believe it’s the same as recording one large command buffer and submitting it (?)
13:08 marysaka[d]: karolherbst[d]: I mean you use it and then have NAK_PRINT/NIR_PRINT I guess
13:08 marysaka[d]: but yeah fossilize-synth is great
13:13 karolherbst[d]: mhh something in nir_opt_large_constants doesn't work correctly 🥲
13:16 karolherbst[d]: ohh wait..
13:16 karolherbst[d]: nir_opt_large_constants is the wrong thing..
13:17 karolherbst[d]: mhhh
13:28 karolherbst[d]: okay.. it doesn't like the array of vectors...
15:47 karolherbst[d]: okay... fixed
16:00 mhenning[d]: loryruta[d]: haven't looked at cuda graphs, but there's some info on emulating dx12 workgraphs on vulkan here: https://github.com/HansKristian-Work/vkd3d-proton/blob/master/docs/workgraphs.md
16:00 karolherbst[d]: just need to figure out how I can reorder passes so that things don't blow up :blobcatnotlikethis:
16:16 karolherbst[d]: fixing that gives me: `SLM Size: 2595464 -> 652548 (-74.86%)` (still collecting stats)
16:16 karolherbst[d]: it's really silly 😄
16:19 karolherbst[d]: phomes_[d]: here, I've fixed your issue: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40282
16:20 karolherbst[d]: it _should_ help performance in games, because it appears that quite a lot of them actually run into this problem..
16:21 karolherbst[d]: though mostly dx11 and dx12 games it seems
16:21 karolherbst[d]: also a few native ones
16:23 karolherbst[d]: hopefully we'll find a couple more of those kinds of perf issues 🙃
16:25 karolherbst[d]: I don't expect the CTS to run into any issues here, but looks good so far
16:45 glehmann: fyi, I just merged nir_opt_fp_math_ctrl, it might be useful for nvk in dxvk games too.
16:45 glehmann: although probably not as much as for radv
16:46 glehmann: amd hardware is a bit special about signed zeros
16:50 karolherbst[d]: yeah not sure we really have much there
16:50 karolherbst[d]: or anything at all...
16:51 karolherbst[d]: glehmann: what's special on AMD hardware there?
16:52 glehmann: even if you don't use the information in the backend, nir_opt_algebraic still benefits
16:52 glehmann: amd is special because we have a free output modifier for fmul with 0.5, 2.0 or 4.0
16:53 glehmann: but that still uses a dx9 multiplication so -0.0 turns into +0.0
16:53 karolherbst[d]: ohh I see
16:53 karolherbst[d]: mhhhh
16:54 karolherbst[d]: we also have that on nvidia, but it's an input multiplier 🙃
16:54 karolherbst[d]: but it might have the same restriction
16:54 karolherbst[d]: I don't think it's wired up in NAK tho
16:55 karolherbst[d]: yeah, it's not
16:55 karolherbst[d]: can do 0.125, 0.25, 0.5, 2, 4 or 8
16:56 karolherbst[d]: we have a .FMZ modifier that forces 0.0 to be +0.0, but no idea how that interacts with the input multiplier
17:09 hatfielde: hey i was just looking at that. yeah exponents [-3,3]
17:13 karolherbst[d]: I think this will require figuring out what the hardware is doing there precisely, because it doesn't appear to be IEEE compliant
17:13 karolherbst[d]: whatever that means
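
A minimal Rust sketch of the two points above: the scale factors 0.125 to 8 are the powers of two with exponents -3 to 3, and an IEEE multiply by such a factor keeps the sign of zero. The names `scale_factor` and `legacy_mul` are hypothetical, and `legacy_mul` is only a simplified model of a "zero operand gives +0.0" multiply, assumed for illustration. It does not describe what AMD or NVIDIA hardware actually does.

```rust
/// Build 2^exp exactly from its IEEE-754 bit pattern (hypothetical helper).
/// Only exponents in -3..=3 are accepted, matching the range quoted above.
fn scale_factor(exp: i32) -> Option<f32> {
    if !(-3..=3).contains(&exp) {
        return None;
    }
    // The biased exponent lives in bits 23..=30; sign and mantissa stay zero.
    Some(f32::from_bits(((127 + exp) as u32) << 23))
}

/// Simplified model of a "legacy" multiply where a zero operand always
/// yields +0.0. This is an assumption for illustration, not a hardware spec.
fn legacy_mul(a: f32, b: f32) -> f32 {
    if a == 0.0 || b == 0.0 {
        0.0
    } else {
        a * b
    }
}

fn main() {
    for exp in -3..=3 {
        println!("2^{} = {}", exp, scale_factor(exp).unwrap());
    }
    // IEEE multiplication preserves the sign of zero.
    let ieee = -0.0f32 * 0.5;
    // The modelled legacy multiply drops it.
    let legacy = legacy_mul(-0.0, 0.5);
    println!("IEEE result is negative zero: {}", ieee.is_sign_negative());
    println!("legacy result is negative zero: {}", legacy.is_sign_negative());
}
```

If the real modifier behaves like the legacy model, folding `fmul x, 0.5` into it would only be safe when signed zeros do not need to be preserved.
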
18:42 hatfielde: Implementing the input multiplier for fmul on nvk -- do we think this should be another field on OpFMul or a new enum variant (OpFMulPDiv?)
18:50 karolherbst[d]: hatfielde: I'd add it as another field, because afaik that one existed since forever
18:51 karolherbst[d]: seems like since Fermi
18:51 karolherbst[d]: which means every GPU we care about 🙃
18:53 karolherbst[d]: though one thing to consider is to maybe add a nir_op_fmul_nv so we can do those optimizations in nir
19:05 hatfielde: as a replacement for the existing fmul? i have an implementation "working" but with adding a special fmul_pdiv_nv (3 operands) that is used only in this situation, otherwise we use normal fmul. this needs to be more rigorously verified against the intricacies of fmul semantics. 2src_commutative is questionable for this 3 operand instruction, as is the correct associativity. also the constant generation under rtz, unclear if we rtz in the intermediate calculation or only at the end.
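
A small Rust sketch of why the associativity question matters, assuming plain IEEE-754 f32 arithmetic with round-to-nearest-even and denormals enabled. Scaling by a power of two is exact in the normal range, but it can lose a bit once the intermediate value becomes denormal, so the order of the scale and the multiply becomes visible. The function names are hypothetical and say nothing about which order the hardware uses.

```rust
/// Apply the scale to the first operand, then multiply (hypothetical order).
fn scale_first(x: f32, scale: f32, y: f32) -> f32 {
    (x * scale) * y
}

/// Multiply first, then apply the scale to the product (hypothetical order).
fn scale_last(x: f32, scale: f32, y: f32) -> f32 {
    (x * y) * scale
}

fn main() {
    // Smallest normal f32 plus one ulp: halving it lands in the denormal
    // range, where the lowest mantissa bit has to be rounded away.
    let x = f32::from_bits(0x0080_0001);
    let a = scale_first(x, 0.5, 2.0);
    let b = scale_last(x, 0.5, 2.0);
    println!("scale first: {:#010x}", a.to_bits());
    println!("scale last:  {:#010x}", b.to_bits());
    println!("orders agree: {}", a.to_bits() == b.to_bits());
}
```

For inputs that stay in the normal range the two orders agree, which is why any difference would only show up in tests that target denormals and overflow.
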
19:06 hatfielde: karolherbst[d]: also, not sure if we want to call this "PDIV" (which I just got from a seemingly related TODO) or something like "scale"
19:08 karolherbst[d]: hatfielde: yeah I guess either way is fine
19:08 karolherbst[d]: all I know is that the constant factor is applied to the first argument
19:09 karolherbst[d]: but somebody needs to figure out the exact precision there
19:09 karolherbst[d]: and in which sense it doesn't match IEEE
19:37 mhenning[d]: hatfielde: only using fmul_pdiv_nv where relevant (instead of replacing all fmuls) sounds reasonable to me
19:38 mhenning[d]: and yeah, we'll probably want to add tests to hw_tests.rs to check what "not IEEE" means in this case (does it handle denorms? inf? nan?)
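
A rough Rust sketch of the kind of edge-case table such a test could compare hardware output against. It computes an IEEE-754 reference for `(x * 2^exp) * y` over denormal, infinity, NaN and signed-zero inputs. Every name here is hypothetical and none of it is the actual hw_tests.rs API. It only assumes standard f32 semantics on the host.

```rust
/// IEEE reference for a scaled multiply: (x * 2^exp) * y.
/// `exp` is assumed to be in -3..=3 (hypothetical helper).
fn ieee_reference(x: f32, exp: i32, y: f32) -> f32 {
    debug_assert!((-3..=3).contains(&exp));
    let scale = f32::from_bits(((127 + exp) as u32) << 23);
    (x * scale) * y
}

/// Compare two results bit-for-bit, but treat any two NaNs as equal,
/// since NaN payloads are usually not worth pinning down.
fn same_result(a: f32, b: f32) -> bool {
    (a.is_nan() && b.is_nan()) || a.to_bits() == b.to_bits()
}

/// Inputs that tend to expose non-IEEE behaviour.
fn edge_cases() -> [f32; 8] {
    [
        0.0,
        -0.0,
        f32::from_bits(1), // smallest denormal
        f32::MIN_POSITIVE, // smallest normal
        f32::MAX,
        f32::INFINITY,
        f32::NEG_INFINITY,
        f32::NAN,
    ]
}

fn main() {
    for x in edge_cases() {
        for exp in [-3, 3] {
            let r = ieee_reference(x, exp, 1.0);
            println!("x={:#010x} exp={:+} -> {:#010x}", x.to_bits(), exp, r.to_bits());
        }
    }
}
```

A hardware result that fails `same_result` against this reference for a given row would show which category (denorms, infinities, NaNs or signed zeros) the instruction treats differently.
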
19:41 phomes_[d]: karolherbst[d]: thank you Karol. It fixed it and improved a few other games as well
19:42 hatfielde: mhenning[d]: is hw_tests the place to do this? seems maybe more convenient than using piglit or writing my own vulkan program
19:45 mhenning[d]: yes, it's a test for specific instruction semantics so hw_tests is the right place. a vulkan program would have some trouble ensuring that the optimization triggers, which makes testing this kind of thing at a vulkan level annoying
19:49 hatfielde: HAHA you're telling me
19:50 karolherbst[d]: phomes_[d]: not by much tho 🙃
19:50 karolherbst[d]: but I might have a few games that should be impacted more
19:51 karolherbst[d]: what games do you have access to anyway?
19:52 karolherbst[d]: aztec-ruins might benefit a lot...
19:52 karolherbst[d]: maybe I should get gfxbench and test myself there
20:13 phomes_[d]: karolherbst[d]: I will send you the list of games in my steam collection in a dm
20:14 karolherbst[d]: if it's public you can also give me your steam id 😄
20:15 karolherbst[d]: which makes it easier, because then I'll just fetch it via the API
20:55 karolherbst[d]: mhenning[d]: wanna review the newest version or should I just merge it? (I applied Faith's suggestion)
21:54 jannau: karolherbst[d]: no gfxbench binaries but a github repo: https://github.com/Kishonti-Opensource/gfxbench
21:55 karolherbst[d]: yeah.. tried to build it
21:55 karolherbst[d]: turns out, it needs openssl which is ancient
21:55 karolherbst[d]: should probably set up a container or something
21:58 jannau: https://github.com/Kishonti-Opensource/gfxbench/issues/4
22:03 karolherbst[d]: ohhh let me try that
22:06 karolherbst[d]: there are other issues uhh.. and they aren't described in the issue
22:09 karolherbst[d]: well maybe I just delete code until it compiles heh
22:26 karolherbst[d]: this benchmark is such a pain... even once built it doesn't work
22:35 esdrastarsis[d]: phomes_[d]: Add to the spreadsheet :happy_gears:
22:38 mhenning[d]: karolherbst[d]: reviewed
22:44 jannau: karolherbst[d]: works for me on asahi after adding a few unistd.h includes in zlib and disabling Net in poco. full diff https://paste.centos.org/view/a54e0087
22:45 jannau: with that at least the sample commands from the issue work
22:46 karolherbst[d]: ohh yeah. maybe I should use the command..
22:47 karolherbst[d]: mhh getting `File not found: /home/kherbst/git/gfxbench/out/install/linux/config/gl_trex.json`
22:47 karolherbst[d]: but maybe I need to use the gl config
22:49 karolherbst[d]: mhh
22:49 karolherbst[d]: ohh wait.. I used the wrong binary
22:50 karolherbst[d]: it crashes inside strlen 🥲
22:57 karolherbst[d]: okay.. got it
23:12 karolherbst[d]: okay.. I think I got the vulkan version to run
23:13 karolherbst[d]: not that it runs well tho
23:13 karolherbst[d]: 😄
23:14 karolherbst[d]: okay... anv: 10 fps, nvk: 310 fps
23:20 karolherbst[d]: yeah well.. I don't really see any perf improvement, but I also haven't verified if the shaders run into the issue or not..
23:20 karolherbst[d]: oh well
23:20 karolherbst[d]: nvm
23:21 karolherbst[d]: I wouldn't be surprised if local memory is aggressively L1 cached