00:27 karolherbst[d]: right.. I wanted to at least improve the occupancy calculation and shader stats in nvk to see how many games are impacted here, but I also think it could help the instruction scheduler if it knows that it can use more registers without hurting occupancy
00:30 karolherbst[d]: but I already saw a bunch of shaders where it would help quite a bit, just not quite sure how much in total
09:59 phomes_[d]: I bisected the recent performance drop in Serious Sam to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39679
10:02 phomes_[d]: before that it was running at 92% of the proprietary driver. It seems to hit whatever is slow less than other games do
10:56 marysaka[d]: phomes_[d]: this is very weird, which Serious Sam is this?
10:56 phomes_[d]: Serious Sam Fusion 2017
10:57 marysaka[d]: we should have vk_x11_ignore_suboptimal set for Serious Sam Fusion
10:57 phomes_[d]: it is set via driconf. Removing that also gets perf back
10:57 marysaka[d]: ah wait, it changed the suboptimal behavior
12:03 karolherbst[d]: mhenning[d]: any plans to continue with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38319 ? If you want I could pick it up and work in all the changes I've been working on to improve address calcs based on uub
14:11 mhenning[d]: karolherbst[d]: You can pick it up if you want. I probably won't get back to it very soon.
14:11 karolherbst[d]: okay
14:13 karolherbst[d]: the benefit seems to be way bigger with the fossils I have
14:13 karolherbst[d]: `SLM Size: 1203000 -> 1200276 (-0.23%); split: -0.23%, +0.00%` across all
14:13 karolherbst[d]: but that might be also because others improved uub quite a bit recently
14:54 gfxstrand[d]: phomes_[d]: What path are we hitting with that? Linear? If so, we might get direct scanout but we're eating a blit to do it.
23:20 loryruta[d]: guys i've hit an _issue_ which might be interesting; you might have already encountered it
23:20 loryruta[d]: so i was rewriting a backend of a piece of software from opencl to vulkan with slang (=> spv)
23:21 loryruta[d]: i'm testing it with an algorithm that fires 10000 sequential computations and prints out the result as well as intermediate values
23:21 loryruta[d]: i have the same algorithm written for cpu
23:22 loryruta[d]: i tested it on Desktop (NV RTX 3090) and on Mobile (Adreno 830)
23:22 loryruta[d]: so what happens is:
23:22 loryruta[d]: - cpu (desktop) = cl (desktop) = cpu (mobile) = cl (mobile) = vk/spv (mobile)
23:22 loryruta[d]: - vk/spv (desktop) disagrees with all of the above
23:23 loryruta[d]: on vk desktop, there's a small error which accumulates and diverges the algorithm at some point
23:24 loryruta[d]: so my question is: what's the nv driver doing? 😅 I've tried adding NoContraction to every line of the spv, FPFastMathDefault None, RoundingModeRTE 32 and SignedZeroInfNanPreserve 32
23:24 loryruta[d]: none of them helped (they didn't even change the values...)
23:31 karolherbst[d]: nvidia is pretty aggressive with optimizations and depending on the instruction you might just rely on higher precision guarantees
23:37 loryruta[d]: 🙁
23:38 loryruta[d]: but i don't quite understand... those optimizations eventually cause my program's computations to diverge or to converge to a different result
23:39 loryruta[d]: i'm afraid that's another sign that nvidia wants people to do gpu computing on cuda rather than anything else....
23:40 loryruta[d]: small errors might be fine for visualization/graphics but not for e.g. gradients and optimization :/
23:55 sonicadvance1[d]: Ah the joys of learning about floating point precision and error tolerance
23:57 loryruta[d]: sonicadvance1[d]: here it's more about learning what's in the minds of nvidia developers
23:57 sonicadvance1[d]: As long as what they do is within the tolerances specified by spir-v, it's all good.
23:58 loryruta[d]: so i'm afraid spirv is not the right tool for gpgpu computing?
23:58 loryruta[d]: in the sense that is not a replacement for opencl
23:58 sonicadvance1[d]: It can be fine as long as you've reviewed all the ALU operations being used to fit within your error margins
23:59 sonicadvance1[d]: Because OpenCL has the same problem
23:59 sonicadvance1[d]: Float is a hot mess even if you're just going to compare CPU results