00:00_lyude[d]: gotcha
10:38joobei[d]: phomes_[d]: what is this tool?
10:52phomes_[d]: It is Perfetto. We can trace to a file and view it in the Perfetto web UI
11:07karolherbst[d]: okay.. let me figure out this constant table mess... 😄
11:09karolherbst[d]: phomes_[d]: what tool did you use to pipe the spv through nvk in https://gitlab.freedesktop.org/mesa/mesa/-/issues/14993 ?
11:17phomes_[d]: I grabbed the spv from renderdoc and recompiled it with nvdump
11:24joobei[d]: So, I'm totally new to linux driver dev and would like a simple task to gain some experience. Can anyone recommend something to do? 😄
12:03joobei[d]: something relevant with shader compilation would be nice
12:40karolherbst[d]: phomes_[d]: nvdump can compile with nvk?
12:55karolherbst[d]: ohh I can probably use `fossilize-synth` here..
13:06loryruta[d]: I was reading about cuda graphs, which afaiu are a way to record multiple kernels and barriers and submit the workload once to the gpu.
13:06loryruta[d]: I was wondering how this maps to the Vulkan API at a driver level. I believe it’s the same as recording one large command buffer and submitting it (?)
13:08marysaka[d]: karolherbst[d]: I mean you use it and then have NAK_PRINT/NIR_PRINT I guess
13:08marysaka[d]: but yeah fossilize-synth is great
13:13karolherbst[d]: mhh something in nir_opt_large_constants doesn't work correctly 🥲
13:16karolherbst[d]: ohh wait..
13:16karolherbst[d]: nir_opt_large_constants is the wrong thing..
13:17karolherbst[d]: mhhh
13:28karolherbst[d]: okay.. it doesn't like the array of vectors...
15:47karolherbst[d]: okay... fixed
16:00mhenning[d]: loryruta[d]: haven't looked at cuda graphs, but there's some info on emulating dx12 workgraphs on vulkan here: https://github.com/HansKristian-Work/vkd3d-proton/blob/master/docs/workgraphs.md
16:00karolherbst[d]: just need to figure out how I can reorder passes so that things don't blow up :blobcatnotlikethis:
16:16karolherbst[d]: fixing that gives me: `SLM Size: 2595464 -> 652548 (-74.86%)` (still collecting stats)
16:16karolherbst[d]: it's really silly 😄
16:19karolherbst[d]: phomes_[d]: here, I've fixed your issue: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40282
16:20karolherbst[d]: it _should_ help performance in games, because it appears that quite a lot of them actually run into this problem..
16:21karolherbst[d]: though mostly dx11 and dx12 games it seems
16:21karolherbst[d]: also a few native ones
16:23karolherbst[d]: hopefully we'll find a couple more of those kinds of perf issues 🙃
16:25karolherbst[d]: I don't expect the CTS to run into any issues here, but looks good so far
16:45glehmann: fyi, I just merged nir_opt_fp_math_ctrl, it might be useful for nvk in dxvk games too.
16:45glehmann: although probably not as much as for radv
16:46glehmann: amd hardware is a bit special about signed zeros
16:50karolherbst[d]: yeah not sure we really have much there
16:50karolherbst[d]: or anything at all...
16:51karolherbst[d]: glehmann: what's special on AMD hardware there?
16:52glehmann: even if you don't use the information in the backend, nir_opt_algebraic still benefits
16:52glehmann: amd is special because we have a free output modifier for fmul with 0.5, 2.0 or 4.0
16:53glehmann: but that still uses a dx9 multiplication so -0.0 turns into +0.0
16:53karolherbst[d]: ohh I see
16:53karolherbst[d]: mhhhh
16:54karolherbst[d]: we also have that on nvidia, but it's an input multiplier 🙃
16:54karolherbst[d]: but it might have the same restriction
16:54karolherbst[d]: I don't think it's wired up in NAK tho
16:55karolherbst[d]: yeah, it's not
16:55karolherbst[d]: can do 0.125, 0.25, 0.5, 2, 4 or 8
16:56karolherbst[d]: we have a .FMZ modifier that forces 0.0 to be +0.0, but no idea how that interacts with the input multiplier
17:09hatfielde: hey i was just looking at that. yeah exponents [-3,3]
17:13karolherbst[d]: I think this will require figuring out what the hardware is doing there precisely, because it doesn't appear to be IEEE compliant
17:13karolherbst[d]: whatever that means
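For reference, a minimal Python sketch of the two points above: the factors karolherbst lists are the powers of two with exponents in [-3, 3] (plus the trivial 1.0), and a plain IEEE 754 multiply keeps the sign of a zero. `zero_sign_dropping_mul` is a hypothetical model of a multiply that turns -0.0 into +0.0; it is an illustration only, not a description of what AMD or NVIDIA hardware actually does.

```python
import math

# Powers of two with exponents in [-3, 3]; 1.0 is the "no scaling" case.
SCALE_FACTORS = [2.0 ** e for e in range(-3, 4)]


def is_negative_zero(x: float) -> bool:
    """True only for -0.0 (copysign exposes the sign bit of a zero)."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def ieee_mul(a: float, b: float) -> float:
    """Plain IEEE 754 multiply: the sign of a zero result is preserved."""
    return a * b


def zero_sign_dropping_mul(a: float, b: float) -> float:
    """Hypothetical model (assumption): any zero result comes back as +0.0."""
    r = a * b
    return 0.0 if r == 0.0 else r


if __name__ == "__main__":
    print(SCALE_FACTORS)
    print(is_negative_zero(ieee_mul(-0.0, 2.0)))
    print(is_negative_zero(zero_sign_dropping_mul(-0.0, 2.0)))
```

Under a model like the second one, folding `x * 2.0` into a modifier is only safe when the sign of zero is known not to matter.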
18:42hatfielde: Implementing the input multiplier for fmul on nvk -- do we think this should be another field on OpFMul or a new enum variant (OpFMulPDiv?)
18:50karolherbst[d]: hatfielde: I'd add it as another field, because afaik that one existed since forever
18:51karolherbst[d]: seems like since Fermi
18:51karolherbst[d]: which means every GPU we care about 🙃
18:53karolherbst[d]: though one thing to consider is to maybe add a nir_op_fmul_nv so we can do those optimizations in nir
19:05hatfielde: as a replacement for the existing fmul? i have an implementation "working" but with adding a special fmul_pdiv_nv (3 operands) that is used only in this situation, otherwise we use normal fmul. this needs to be more rigorously verified against the intricacies of fmul semantics. 2src_commutative is questionable for this 3 operand instruction, as is the correct associativity. also the constant generation under rtz, unclear if we rtz
19:05hatfielde: in the intermediate calculation or only at the end.
19:06hatfielde: karolherbst[d]: also, not sure if we want to call this "PDIV" (which I just got from a seemingly related TODO) or something like "scale"
19:08karolherbst[d]: hatfielde: yeah I guess either way is fine
19:08karolherbst[d]: all I know is that the constant factor is applied to the first argument
19:09karolherbst[d]: but somebody needs to figure out the exact precision there
19:09karolherbst[d]: and in which sense it doesn't match IEEE
19:37mhenning[d]: hatfielde: only using fmul_pdiv_nv where relevant (instead of replacing all fmuls) sounds reasonable to me
19:38mhenning[d]: and yeah, we'll probably want to add tests to hw_tests.rs to check what "not IEEE" means in this case (does it handle denorms? inf? nan?)
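One way a scaled multiply could differ from a plain fmul is in where rounding happens, which is the intermediate-versus-final question raised earlier. Below is a minimal Python sketch (not hw_tests.rs code) that emulates f32 with `struct` under round-to-nearest-even, not rtz. The function names are made up for illustration, and it says nothing about what the hardware actually does. It covers only the denorm case; inf and nan would need their own checks.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bits(x: float) -> int:
    """Raw f32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def from_bits(b: int) -> float:
    """f32 value with the given bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def mul_round_each_step(a: float, scale: float, b: float) -> float:
    """Round to f32 after applying the scale, and again after the multiply."""
    return f32(f32(a * scale) * b)


def mul_round_once(a: float, scale: float, b: float) -> float:
    """Compute a * scale * b exactly in f64, then round to f32 once.

    Exact for f32 inputs and a power-of-two scale: the product needs at
    most 48 significand bits, which fits in f64.
    """
    return f32(a * scale * b)


if __name__ == "__main__":
    # Smallest normal f32 plus one ulp: scaling by 0.125 lands in the
    # denormal range, where the low bit cannot be represented.
    a = from_bits(0x00800001)
    print(hex(bits(mul_round_each_step(a, 0.125, 8.0))))
    print(hex(bits(mul_round_once(a, 0.125, 8.0))))
```

The two results differ in the last bit for this input, so a hardware test that feeds values near the denormal boundary can tell the two behaviours apart.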
19:41phomes_[d]: karolherbst[d]: thank you Karol. It fixed it and improved a few other games as well
19:42hatfielde: mhenning[d]: is hw_tests the place to do this? seems maybe more convenient than using piglit or writing my own vulkan program
19:45mhenning[d]: yes, it's a test for specific instruction semantics so hw_tests is the right place. a vulkan program would have some trouble ensuring that the optimization triggers, which makes testing this kind of thing at a vulkan level annoying
19:49hatfielde: HAHA you're telling me
19:50karolherbst[d]: phomes_[d]: not by much tho 🙃
19:50karolherbst[d]: but I might have a few games that should be impacted more
19:51karolherbst[d]: what games do you have access to anyway?
19:52karolherbst[d]: aztec-ruins might benefit a lot...
19:52karolherbst[d]: maybe I should get gfxbench and test myself there
20:13phomes_[d]: karolherbst[d]: I will send you the list of games in my steam collection in a dm
20:14karolherbst[d]: if it's public you can also give me your steam id 😄
20:15karolherbst[d]: which makes it easier, because then I'll just fetch it via the API
20:55karolherbst[d]: mhenning[d]: wanna review the newest version or should I just merge it? (I applied Faith's suggestion)
21:54jannau: karolherbst[d]: no gfxbench binaries but a github repo: https://github.com/Kishonti-Opensource/gfxbench
21:55karolherbst[d]: yeah.. tried to build it
21:55karolherbst[d]: turns out, it needs openssl which is ancient
21:55karolherbst[d]: should probably set up a container or something
21:58jannau: https://github.com/Kishonti-Opensource/gfxbench/issues/4
22:03karolherbst[d]: ohhh let me try that
22:06karolherbst[d]: there are other issues uhh.. and they aren't described in the issue
22:09karolherbst[d]: well maybe I just delete code until it compiles heh
22:26karolherbst[d]: this benchmark is such a pain... even once built it doesn't work
22:35esdrastarsis[d]: phomes_[d]: Add to the spreadsheet :happy_gears:
22:38mhenning[d]: karolherbst[d]: reviewed
22:44jannau: karolherbst[d]: works for me on asahi after adding a few unistd.h includes in zlib and disabling Net in poco. full diff https://paste.centos.org/view/a54e0087
22:45jannau: with that at least the sample commands from the issue work
22:46karolherbst[d]: ohh yeah. maybe I should use the command..
22:47karolherbst[d]: mhh getting `File not found: /home/kherbst/git/gfxbench/out/install/linux/config/gl_trex.json`
22:47karolherbst[d]: but maybe I need to use the gl config
22:49karolherbst[d]: mhh
22:49karolherbst[d]: ohh wait.. I used the wrong binary
22:50karolherbst[d]: it crashes inside strlen 🥲
22:57karolherbst[d]: okay.. got it
23:12karolherbst[d]: okay.. I think I got the vulkan version to run
23:13karolherbst[d]: not that it runs well tho
23:13karolherbst[d]: 😄
23:14karolherbst[d]: okay... anv: 10 fps, nvk: 310 fps
23:20karolherbst[d]: yeah well.. I don't really see any perf improvement, but I also haven't verified if the shaders run into the issue or not..
23:20karolherbst[d]: oh well
23:20karolherbst[d]: nvm
23:21karolherbst[d]: I wouldn't be surprised if local memory is aggressively L1 cached