02:21 fdobridge: <a​irlied> Guess we better file that one as well
02:22 fdobridge: <a​irlied> Guess I have to do a round of every Turing card, though I assume Ben did one at some point. There were some early cards with nvdec issues, but I think they solved that
02:24 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> It's too slow right now
16:30 fdobridge: <g​fxstrand> vkcube perf will get fixed. It'll be okay.
16:33 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I was even able to load Overwatch 2 (so it's not just vkcube having issues)
16:34 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> The non-existent performance literally prevented me from getting into the tutorial in that game
16:34 fdobridge: <g​fxstrand> Sure. We'll get there.
16:35 fdobridge: <g​fxstrand> Correct first. Fast will come.
17:03 fdobridge: <g​fxstrand> I wonder if I just need to back off on the cache coherency for `load_global`... 🤔
17:14 fdobridge: <m​henning> If you want to just tinker with something that's faster, my branch at https://gitlab.freedesktop.org/gfxstrand/mesa/-/merge_requests/53 helps
17:14 fdobridge: <m​henning> although that's using a feature that codegen doesn't have, so it doesn't explain the perf gap and also I still need to re-run cts on that now that we have memory model stuff
17:18 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> zmike-level improvements are beginning 🐸
18:33 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> GSP doesn't really bring a major performance improvement (5-6->7 FPS in PPSSPP at 5x resolution)
18:37 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> This is with codegen BTW (NAK is probably going to be slower)
18:40 fdobridge: <k​arolherbst🐧🦀> checked what the CPU load looks like?
18:41 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> FPS does scale down the higher the resolution I pick (I haven't really looked at the CPU usage)
18:42 fdobridge: <k​arolherbst🐧🦀> we should also wire up GPU engine counters to know what the load on the GPU looks like...
18:43 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> This is what I've been waiting for a long time (so we could have better performance monitoring)
18:44 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Here's how MangoHUD stats look without GSP
18:44 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1172970043055816744/Screenshot_20231111_204248.png?ex=6562407a&is=654fcb7a&hm=3e44589c790804670940272020025b32b3b3116160c6c59984b2d833f4154229&
18:44 fdobridge: <k​arolherbst🐧🦀> I doubt it's hard to figure out with GSP...
18:44 fdobridge: <k​arolherbst🐧🦀> mhhh
18:45 fdobridge: <k​arolherbst🐧🦀> maybe something is up with GSP?
18:46 fdobridge: <k​arolherbst🐧🦀> maybe need to do something special.. dunno
18:56 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> And with GSP
18:56 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1172973161348857887/Screenshot_20231111_205610.png?ex=65624362&is=654fce62&hm=151b048c44aa8b25c45d6ecd16c168e72b40ac2f1b9ad4aa664bd278033fb981&
19:00 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> And with `NV50_PROG_OPTIMIZE=4` on top (definitely very stable™️)
19:00 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1172974139049529354/Screenshot_20231111_205940.png?ex=6562444b&is=654fcf4b&hm=05fc78d446f666570eecaad2489f99cb7fc4711a108695351e6ce1adf380ed19&
19:16 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> The pipeline caching MR (!25550) makes no real difference in framerate with codegen at least
19:47 fdobridge: <a​irlied> I think just looking at the shaders from something like Sascha gears or triangle would probably spot the issue but not sure, since it might be just global load caching
19:50 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> vkcube definitely doesn't run at 144 FPS with NAK so it should be an easy test case
20:11 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Removing the WFI in CmdPipelineBarrier() doesn't improve the performance either (so that's not the issue here)
20:15 fdobridge: <g​fxstrand> Nice! I think we may want to bring back load_global_constant instead of trusting CAN_REORDER but, yeah, we should do that.
20:21 fdobridge: <m​henning> Yeah, I was debating whether we should represent it with load_global_constant or load_global+CAN_REORDER at the nir level. CUDA always infers it for `restrict` pointers that are never written, which is why I chose to do it this way and add a call to nir_opt_access
20:21 fdobridge: <m​henning> to be honest though I don't understand why nir has both load_global_constant and load_global+CAN_REORDER, so I might be missing something subtle about the semantics.
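A toy illustration of the inference mhenning describes (loads from a `restrict` buffer that the shader never writes get marked as reorderable). This is only a sketch under stated assumptions: `Buffer`, `Access` and `infer_can_reorder` are hypothetical names invented here, and this is not the implementation of `nir_opt_access`. It also only inspects a single shader, so it says nothing about writes coming from other stages.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    restrict: bool = False  # hypothetical stand-in for a `restrict` qualifier


@dataclass
class Access:
    kind: str               # "load" or "store"
    buffer: Buffer
    can_reorder: bool = False


def infer_can_reorder(accesses):
    """Mark loads as reorderable when their buffer is `restrict`
    and this one shader never stores to it."""
    written = {a.buffer.name for a in accesses if a.kind == "store"}
    for a in accesses:
        if (a.kind == "load" and a.buffer.restrict
                and a.buffer.name not in written):
            a.can_reorder = True
    return accesses


vertices = Buffer("vertices", restrict=True)  # read-only in this shader
scratch = Buffer("scratch", restrict=True)    # read and written
aliased = Buffer("aliased")                   # no restrict, may alias

shader = [
    Access("load", vertices),
    Access("load", scratch),
    Access("store", scratch),
    Access("load", aliased),
]
infer_can_reorder(shader)
print([a.can_reorder for a in shader])  # [True, False, False, False]
```

Only the `vertices` load qualifies: `scratch` is written by the same shader, and `aliased` lacks the `restrict` guarantee.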
20:22 fdobridge: <k​arolherbst🐧🦀> mhhhhhhhhhhhhhh... kinda feels like maybe the memory speed isn't at the max...
20:22 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> So maybe GSP reclocking isn't working properly?
20:22 fdobridge: <k​arolherbst🐧🦀> that would be my assumption
20:22 fdobridge: <k​arolherbst🐧🦀> it would definitely help to wire up reporting the set clocks and stuff
20:23 fdobridge: <k​arolherbst🐧🦀> @asdqueerfromeu could you check GSP with pixmark_piano?
20:23 fdobridge: <k​arolherbst🐧🦀> that's a very very very very ALU heavy benchmark
20:23 fdobridge: <k​arolherbst🐧🦀> and memory speed doesn't really impact the fps there
20:24 fdobridge: <k​arolherbst🐧🦀> anyway, thanks for testing GSP performance
20:25 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Is it a part of GpuTest?
20:25 fdobridge: <k​arolherbst🐧🦀> correct
20:25 fdobridge: <k​arolherbst🐧🦀> it's GL, but shouldn't matter for GSP perf testing
20:25 fdobridge: <g​fxstrand> It's more an artifact of history than anything.
20:25 fdobridge: <k​arolherbst🐧🦀> might also want to check furmark
20:26 fdobridge: <k​arolherbst🐧🦀> that's heavy on everything
20:26 fdobridge: <g​fxstrand> I'm just worried that if it has caching implications we might not want to turn it on for arbitrary SSBO access
20:26 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I still have the modifier issue which prevents me from testing nouveau OpenGL
20:27 fdobridge: <k​arolherbst🐧🦀> pain...
20:27 fdobridge: <k​arolherbst🐧🦀> even when just doing the `DRI_PRIME=1` thing?
20:27 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> glxgears works but more complex stuff fails
20:28 fdobridge: <k​arolherbst🐧🦀> x or wayland?
20:28 fdobridge: <k​arolherbst🐧🦀> I know it works on my laptop with gnome+wayland
20:28 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Wayland
20:29 fdobridge: <k​arolherbst🐧🦀> mhhh
20:29 fdobridge: <k​arolherbst🐧🦀> maybe try a different compositor
20:29 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I hit the DXVK GSP bug with SuperTuxKart 💥
20:29 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1172996570023731220/message.txt?ex=6562592f&is=654fe42f&hm=aa2d899b2d3a68505a0bad7c0c5ec076a795f8a9ab3a4852521b5944e3c219c9&
20:30 fdobridge: <k​arolherbst🐧🦀> @asdqueerfromeu do you have a normal intel + nvidia setup or something else?
20:30 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> AMD + NVIDIA laptop
20:30 fdobridge: <k​arolherbst🐧🦀> mhhhhhh
20:30 fdobridge: <k​arolherbst🐧🦀> yeah.. I haven't tried that
20:30 fdobridge: <k​arolherbst🐧🦀> could be broken
20:31 fdobridge: <k​arolherbst🐧🦀> if you have a display you could also disable the amdgpu driver
20:31 fdobridge: <k​arolherbst🐧🦀> well
20:31 fdobridge: <k​arolherbst🐧🦀> if you have connectors wired to the nvidia GPU
20:31 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> nvidia-open recently got fixed
20:31 fdobridge: <k​arolherbst🐧🦀> mhhh
20:32 fdobridge: <k​arolherbst🐧🦀> so something we might have to fix as well
20:32 fdobridge: <k​arolherbst🐧🦀> do you know what fixed it?
20:32 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> A driver update (previously the nvidia-open driver would error out)
20:32 fdobridge: <k​arolherbst🐧🦀> mmhhhh
20:32 fdobridge: <k​arolherbst🐧🦀> not much to go on then
20:34 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://github.com/NVIDIA/open-gpu-kernel-modules/issues/282#issuecomment-1536484012
20:34 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I'm not sure if I still have the modifier issue (because the GSP issue overshadowed it)
20:35 fdobridge: <k​arolherbst🐧🦀> ohh, I see
20:35 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I'll reboot to a non-GSP kernel and see what happens there
20:37 fdobridge: <k​arolherbst🐧🦀> but yeah, if it's a GSP-related issue it might not be super hard to figure out what's wrong
20:39 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> That GSP error also happens on DXVK after commit https://github.com/doitsujin/dxvk/commit/0709c5f5c7e4f5f25e92e7cef263bc2edf9128b4 and I don't know why that random commit causes the issue
20:44 fdobridge: <m​henning> @gfxstrand For clarity, is the concern about correctness or performance?
20:46 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Anyway the modifier bugs are still there after disabling GSP (so I have to recompile Mesa with some reverts)
21:03 fdobridge: <g​fxstrand> Correctness
21:03 fdobridge: <g​fxstrand> Well, and perf if we have to do crazy things to get correctness
21:39 fdobridge: <m​henning> Yeah, that's fair.
21:55 fdobridge: <g​fxstrand> Like, non-writeable from a shader PoV might still mean a VS is writing and an FS is reading and that needs to work.
21:56 fdobridge: <g​fxstrand> Well, maybe... I'm actually not sure if that's supposed to work or not.
21:57 fdobridge: <g​fxstrand> But it affects barriers and stuff if SSBOs can come in through the constant cache.
21:57 fdobridge: <g​fxstrand> But certainly any UBOs we promote to global should.
22:02 fdobridge: <b​enjaminl> thinking about how to do carry flag deps in nak... I have ~zero compilers experience, and am guessing there are standard approaches for this that I just don't know about
22:03 fdobridge: <b​enjaminl> but the two ways I can think of would be to handle them as special case in all the passes that need to know instruction deps, or to make the carry flag an SSA register file and then assign uses to the 1 available carry register in regalloc
22:06 fdobridge: <k​arolherbst🐧🦀> how many UBOs does vulkan require in shaders?
22:07 fdobridge: <g​fxstrand> Bindless
22:07 fdobridge: <k​arolherbst🐧🦀> and what's the sane default everybody should advertise?
22:07 fdobridge: <k​arolherbst🐧🦀> mhhh
22:07 fdobridge: <k​arolherbst🐧🦀> so infinite
22:09 fdobridge: <g​fxstrand> I've been thinking about this... Really, the important thing isn't so much allocating it (there's only one) as describing the dependency so DCE doesn't kill the producer off. We also need to ensure that the scheduler we don't have yet won't interleave things that use carry in invalid ways.
22:09 fdobridge: <k​arolherbst🐧🦀> I think LD(G).CONSTANT just uses the normal data caches tho
22:09 fdobridge: <k​arolherbst🐧🦀> but dunno
22:10 fdobridge: <g​fxstrand> That's not a bad plan, honestly. Just add a new "carry" file which has one register. Then RA will fail if the invariants are ever violated.
22:10 fdobridge: <g​fxstrand> I think it does, it just throws coherency to the wind which, honestly, is probably fine.
22:11 fdobridge: <k​arolherbst🐧🦀> yeah
22:11 HdkR: ldg.constant changed behaviour in a recent generation
22:11 HdkR: So it's good to use for constant behaviour :)
22:11 fdobridge: <k​arolherbst🐧🦀> well since Volta it claims that writes have unpredictable results
22:12 fdobridge: <k​arolherbst🐧🦀> so we really should only use it for read only data
22:12 fdobridge: <k​arolherbst🐧🦀> what I'm wondering tho is if a ssbo read_only flag is good enough
22:12 HdkR: Yea, don't use it for stores, that sounds nuts
22:12 fdobridge: <k​arolherbst🐧🦀> like.. if it's per stage or not
22:12 fdobridge: <k​arolherbst🐧🦀> nah.. the bigger concern is, what if a vertex shader writes to it, but a fragment reads
22:12 fdobridge: <k​arolherbst🐧🦀> like.. it's read_only only in the fp stage
22:13 HdkR: ah, spooky
22:13 HdkR: Should be per-stage though, so cross-stage will be fine
22:13 fdobridge: <k​arolherbst🐧🦀> yeah.. probably
22:13 fdobridge: <k​arolherbst🐧🦀> otherwise that flag would be tooooo limiting
22:13 fdobridge: <k​arolherbst🐧🦀> and pointless
22:13 HdkR: Indeed
22:15 HdkR: Be warned though, if the access offset is not wave-uniform, you might end up being subject to the same terrible performance of a non-wave-uniform UBO load
22:18 fdobridge: <b​enjaminl> cool, that makes sense
22:18 fdobridge: <b​enjaminl> I remember seeing some instructions a while ago to move carry to/from GPR, so theoretically we could spill that way
22:19 fdobridge: <b​enjaminl> I don't think you'd ever want to actually do that though
22:19 fdobridge: <g​fxstrand> Yeah, I don't think so
22:19 fdobridge: <g​fxstrand> But, yeah, let's make a new file for it for now.
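A minimal sketch of the "carry file with one register" idea discussed above: model the carry flag as its own register file of size 1, so a plain allocator fails as soon as two carry values are live at the same time. Everything here is hypothetical and written for illustration (the `FILE_SIZES` table, the `(defs, uses)` instruction encoding, the `allocate` helper); it is not NAK code, and it covers only straight-line code with no spilling.

```python
class AllocationError(Exception):
    pass


# Hypothetical register files: name -> number of physical registers.
FILE_SIZES = {"gpr": 4, "carry": 1}


def live_ranges(instrs):
    """Map each SSA value to (def index, last use index)."""
    ranges = {}
    for i, (defs, uses) in enumerate(instrs):
        for v in uses:
            ranges[v] = (ranges[v][0], i)
        for v in defs:
            ranges[v] = (i, i)
    return ranges


def allocate(instrs):
    """Linear scan over straight-line SSA code; values are (file, name)."""
    ranges = live_ranges(instrs)
    holders = {f: {} for f in FILE_SIZES}  # file -> {reg: range of holder}
    assignment = {}
    for value, (start, end) in sorted(ranges.items(),
                                      key=lambda kv: kv[1][0]):
        file, name = value
        for reg in range(FILE_SIZES[file]):
            held = holders[file].get(reg)
            # A register is free if unused, or if its previous value was
            # defined earlier and has its last use at or before this def.
            if held is None or (held[1] <= start and held[0] < start):
                holders[file][reg] = (start, end)
                assignment[value] = reg
                break
        else:
            raise AllocationError(
                f"no free {file} register for {name} at instruction {start}")
    return assignment


C0, C1 = ("carry", "c0"), ("carry", "c1")
LO, HI = ("gpr", "lo"), ("gpr", "hi")

# Valid: the carry is produced, then consumed before another is produced.
ok = [([LO, C0], []), ([HI], [C0])]

# Invalid: a second carry is produced while the first is still live.
bad = [([LO, C0], []), ([HI, C1], []), ([], [C0]), ([], [C1])]
```

Calling `allocate(ok)` succeeds and puts `c0` in carry register 0, while `allocate(bad)` raises `AllocationError`, which is the "RA will fail if the invariants are ever violated" behaviour. Because the carry values are ordinary defs and uses, a dependency-based DCE would also see the producer as used.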
23:16 fdobridge: <t​heevilskeleton> https://cdn.discordapp.com/attachments/1166831468350292060/1173030758097621053/2023-11-11_17-45-04.mp4
23:16 fdobridge: <t​heevilskeleton> One step at a time