00:18_lyude[d]: karolherbst[d]: yeah I'm not totally sure, but at this point at least I can say for sure that somehow these two things are getting swapped. now I just need to figure out why 🙂
00:19karolherbst[d]: _lyude[d]: blame nvidia, because who calls linear "pitch" and tiled "blocklinear"
05:27airlied[d]: _lyude[d]: some sort of handle reuse problem?
08:21gfxstrand[d]: karolherbst[d]: NIR doesn’t have multiple destinations
08:21gfxstrand[d]: asdqueerfromeu[d]: Yeah, Kepler has some unsolved compiler bugs
08:35GB206_User: hi
08:36GB206_User: just a follow-up observation to the 5060 TI GB206 that would not load the current GSP firmware: That card seemed a recent production batch and had the https://nvidia.custhelp.com/app/answers/detail/a_id/5665/~/nvidia-gpu-uefi-firmware-update-tool-for-rtx-5060-series applied from factory
08:41GB206_User: while this may provide a hint at why older cards of the same type produced before this patch have no problem loading the current GSP firmware, it's mostly about a word of caution: since there is no downgrade path offered publicly, only try applying the NVIDIA GPU UEFI Firmware Update Tool for RTX 5060 Series if you are willing to lose use of the card
09:38airlied[d]: I had a weird one recently where neither NVIDIA nor nouveau would load on a laptop 5080
09:39airlied[d]: It looks related to large BAR, but the fw was giving EINVAL
10:37gfxstrand[d]: :frog_clown:
12:25karolherbst[d]: gfxstrand[d]: could do a vec2
12:25karolherbst[d]: but yeah, not perfect
12:27karolherbst[d]: Anyway, I have a prototype in nak that seems to work out
12:29gfxstrand[d]: Yeah, we could vec2
12:29gfxstrand[d]: But ugh…
13:00karolherbst[d]: yeah...
13:00karolherbst[d]: I was doing vec2 to vectorize f2f@16 🙃 even though it's two sources in F2FP
13:02karolherbst[d]: ~~and isn't the plop3 two dest problems just vectorization with .xx swizzles 🙃~~
15:57karolherbst[d]: you'll hate the code 🙃
16:03karolherbst[d]: gfxstrand[d]: I hope you hate this: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41041 🙃
16:06karolherbst[d]: mhhhh
16:06karolherbst[d]: there seem to be a couple of issues with the current code in regards to dead values and copy proped ones...
21:39karolherbst[d]: okay.. so I think there is some problem with lower_cf...
21:55karolherbst[d]: mhenning[d]: soo.. I was looking more into that shader and I wonder if we need to improve lower_cf to deal with this so we don't end up with those rather useless? phis on the nak side: https://gist.githubusercontent.com/karolherbst/063d28134c2dbff04142f7dd40c6dbbe/raw/e70f4dcb0cac7279cbd70caafbb33534a2e5fb0b/gistfile1.txt
21:55karolherbst[d]: but yeah.. lower_cf seems to turn those single use phis into phis with undefs
21:56karolherbst[d]: or I wonder if that's actually `nir_lower_phis_to_regs_block`'s doing..
21:59karolherbst[d]: yeah... that seems to be the case...
22:03karolherbst[d]: yeah.. so `nir_lower_phis_to_regs_block` turns those single source phis into regs with a single write and `nir_lower_reg_intrinsics_to_ssa_impl` turns those into undef/value phi things
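The round trip described here can be illustrated with a toy model. This is a minimal sketch and not Mesa's code: `UNDEF`, `rebuild_phi`, and `simplify_phi` are hypothetical names, the reg-to-SSA step is simplified to look only at writes in the immediate predecessors, and dominance is ignored when folding the phi back to its single defined source.

```python
# Toy model of the phi -> reg -> phi round trip (hypothetical, not Mesa code).

UNDEF = object()  # stand-in for an undefined SSA value


def rebuild_phi(reg_writes, preds):
    """Rebuild a phi for a register when going back to SSA.

    reg_writes: dict mapping block name -> value written to the reg there.
    preds: predecessor blocks of the block that reads the reg.
    Any predecessor without a write contributes UNDEF (simplified model).
    """
    return {pred: reg_writes.get(pred, UNDEF) for pred in preds}


def simplify_phi(sources):
    """Return the only defined source of a phi, or None if it is not trivial.

    A phi whose sources are one real value plus undefs carries no real
    merge; this toy version ignores the dominance check a real pass needs.
    """
    defined = {v for v in sources.values() if v is not UNDEF}
    if len(defined) == 1:
        return next(iter(defined))
    return None


# A reg with a single write in "b1", read in a block with two predecessors.
phi = rebuild_phi({"b1": "ssa_5"}, ["b1", "b2"])
folded = simplify_phi(phi)
```

In this model the single write becomes a two-source phi with one undef, which `simplify_phi` folds back to `ssa_5`.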
22:55karolherbst[d]: yoooo...
22:55karolherbst[d]: okay..
22:56karolherbst[d]: that dropped GPR usage by a lot
22:58karolherbst[d]: +30% fps
22:58karolherbst[d]: nice nice
23:02karolherbst[d]: Instruction count: 2806 -> 2641
23:02karolherbst[d]: Static cycle count: 164711 -> 159110
23:02karolherbst[d]: Max warps/SM: 8 -> 16
23:02karolherbst[d]: Spills to mem: 0
23:02karolherbst[d]: Spills to reg: 42 -> 3
23:02karolherbst[d]: Fills from mem: 0
23:02karolherbst[d]: Fills from reg: 39 -> 3
23:02karolherbst[d]: Num GPRs: 184 -> 104
23:02karolherbst[d]: SLM size: 0
23:03karolherbst[d]: Score: 1689 -> 2391 points (FPS: 84 -> 119)
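The jump in Max warps/SM follows from the GPR count. A rough sketch of the arithmetic, under assumptions that are not stated in the log: 65536 32-bit registers per SM, 32 threads per warp, per-thread GPR allocation rounded up to a multiple of 8, and the warp count rounded down to a multiple of 4. The `hw_max_warps` cap is a hypothetical placeholder and varies by architecture.

```python
def max_warps_per_sm(num_gprs, regs_per_sm=65536, warp_size=32,
                     gpr_granularity=8, warp_granularity=4,
                     hw_max_warps=64):
    """Estimate resident warps per SM from per-thread GPR usage.

    All defaults are assumptions for illustration, not values taken
    from the log or guaranteed for any specific GPU.
    """
    gprs = max(num_gprs, 1)
    # Round the per-thread allocation up to the allocation granularity.
    gprs = -(-gprs // gpr_granularity) * gpr_granularity
    # How many whole warps fit in the register file.
    warps = regs_per_sm // (gprs * warp_size)
    # Round down to the warp allocation granularity.
    warps -= warps % warp_granularity
    return min(warps, hw_max_warps)


before = max_warps_per_sm(184)
after = max_warps_per_sm(104)
```

Under these assumptions, 184 GPRs gives 8 warps and 104 GPRs gives 16, matching the reported numbers.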
23:03karolherbst[d]: okay.. let me fossil that 🙃
23:05karolherbst[d]: phomes_[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41042
23:09karolherbst[d]: it's funny, because with that I match nvidia blob performance there
23:54karolherbst[d]: yeah.... that's gonna make a difference across games..