00:27 karolherbst[d]: Totals from 50621 (4.17% of 1212873) affected shaders:
00:27 karolherbst[d]: CodeSize: 1605273744 -> 1621029728 (+0.98%); split: -0.34%, +1.32%
00:27 karolherbst[d]: Number of GPRs: 4673586 -> 4067935 (-12.96%); split: -12.97%, +0.01%
00:27 karolherbst[d]: SLM Size: 263428 -> 258176 (-1.99%)
00:27 karolherbst[d]: Static cycle count: 2599838439 -> 2586392435 (-0.52%); split: -1.11%, +0.59%
00:27 karolherbst[d]: Spills to memory: 23512 -> 15527 (-33.96%)
00:27 karolherbst[d]: Fills from memory: 23512 -> 15527 (-33.96%)
00:27 karolherbst[d]: Spills to reg: 64590 -> 57328 (-11.24%); split: -13.83%, +2.58%
00:27 karolherbst[d]: Fills from reg: 55559 -> 44319 (-20.23%); split: -22.66%, +2.42%
00:27 karolherbst[d]: Max warps/SM: 1189396 -> 1347600 (+13.30%)
00:27 karolherbst[d]: yeah....
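[editor's note] The percentages in the totals above can be re-derived from the raw before/after counts. A minimal sketch follows; the helper name `pct_change` is made up for illustration and is not part of any shader-db tooling, and only three of the rows are copied in.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the totals quoted above.
totals = {
    "Number of GPRs": (4673586, 4067935),
    "Spills to memory": (23512, 15527),
    "Max warps/SM": (1189396, 1347600),
}

for name, (before, after) in totals.items():
    print(f"{name}: {pct_change(before, after):+.2f}%")

# Share of shaders affected: 50621 out of 1212873.
affected_share = 50621 / 1212873 * 100.0
print(f"Affected shaders: {affected_share:.2f}%")
```

The printed values should agree with the figures reported in the log to two decimal places.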
01:44 esdrastarsis[d]: karolherbst[d]: In which game?
01:46 karolherbst[d]: benchmark, pixmark_piano
01:46 karolherbst[d]: but that's like a single shader really
08:12 gfxstrand[d]: Wow. Good find! Now we just have to figure out if it’s correct and/or how to make it correct.
10:05 mohamexiety[d]: Why does it make such a big difference?
10:15 gfxstrand[d]: We do too much control flow, probably.
11:27 karolherbst[d]: undefs
11:28 karolherbst[d]: most of the GPRs removed were undefs getting a GPR allocated for some reason and RA does a poor job aliasing with the phi
11:28 karolherbst[d]: or rather.. most of the shaders that were helped massively also have tons of undefs
11:29 karolherbst[d]: but getting rid of the phis also helps copy propagation, which was the original thing I was looking at 🙃
11:31 karolherbst[d]: gfxstrand[d]: what kind of example are you looking for there? Just a shader where it helps that isn't too huge?
11:33 gfxstrand[d]: Yup
11:33 gfxstrand[d]: So we can see what case it helps
11:33 karolherbst[d]: ahh yeah, it's just undef stuff really
11:33 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1495386917972217886/IMG_2244.jpg?ex=69e60f18&is=69e4bd98&hm=b84c03f921dc86564ffb194c5bf8838c41d1c46c0c9dc5052f412ab41c773b00&
11:33 gfxstrand[d]: But also, don’t expect too much from me this week.
11:34 karolherbst[d]: it's kinda simple.. single source phi -> converted to reg -> lowered to SSA, now you have phi val, undef and NAK doesn't like it too much
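[editor's note] A toy model of why a register written on only one incoming path turns into `phi(val, undef)` once it is lowered back to SSA. This is a sketch under simplifying assumptions, not NIR's or NAK's actual lowering: the CFG, the block names, and the `phi_operands` helper are all hypothetical, and it only looks at writes made directly in each predecessor.

```python
UNDEF = "undef"


def phi_operands(preds, writes, reg):
    """Pick the incoming value of `reg` from each predecessor of a merge block.

    A predecessor that never wrote `reg` contributes UNDEF.
    """
    return [writes.get(pred, {}).get(reg, UNDEF) for pred in preds]


# Hypothetical CFG: the merge block has two predecessors, but only the
# "then" block wrote register r0.
writes = {"then": {"r0": "val"}}
ops = phi_operands(["then", "else"], writes, "r0")
print(f"r0 = phi({', '.join(ops)})")
```

In this model the `undef` operand is still a value the backend has to account for, which is the kind of phi the discussion says NAK handles poorly.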
11:36 karolherbst[d]: anyway, will comment on the MR with an example
11:38 karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41042#note_3432143
11:41 karolherbst[d]: maybe the better solution would be to improve NAK with regard to those undefs, but from what I've seen those were pretty annoying in the sense that they usually go multiple levels deep
11:43 karolherbst[d]: ehh maybe I should have picked an example where it actually helps GPRs significantly and not just a bit 🙃
11:45 karolherbst[d]: but those aren't simple 🙃
11:50 karolherbst[d]: ohh I found a good one
11:58 karolherbst[d]: I should test borderlands 3 with this, because I have the suspicion that it ran into a similar perf problem
12:15 karolherbst[d]: well not helping much that one...
13:13 karolherbst[d]: gfxstrand[d]: worst case I adjust the change to only do it on phis with a divergent source...
13:13 gfxstrand[d]: Yeah.
13:14 karolherbst[d]: or we just make `nak_nir_mark_lcssa_invariants` smarter if there are gaps
13:14 karolherbst[d]: but I honestly haven't found any...
13:14 gfxstrand[d]: The case where it breaks is when you have a uniform destination but the source comes from a non-uniform block. Then we need a copy for the R2U
13:15 karolherbst[d]: yeah but `nak_nir_mark_lcssa_invariants` already takes care of that, no?
13:15 gfxstrand[d]: Probably. I’d need to read it back into cache
13:16 karolherbst[d]: well it inserts those `as_uniform` intrinsics
13:16 karolherbst[d]: but I don't know if NAK could be overeager there...
13:17 karolherbst[d]: anyway, it makes NVK match nvidia's performance in a shader heavy benchmark, therefore I'm super sure it's correct 😛
14:33 jStefan: Hello, I can't get nouveau working on my system (privring errors). Is there hope this can be solved by tweaking settings, or is it just an incompatibility?
14:42 jStefan: and by incompatibility i don't mean the card itself (fermi), i mean the system as a whole, maybe from the motherboard/chipset
14:43 karolherbst: jStefan: is the GPU the only one in the system?
14:43 karolherbst: because if it's a fermi you might be better off not using it anyway..
14:44 karolherbst: but it kinda depends on the error you are facing there specifically..
14:45 jStefan: yes, only one gpu. if it had an igp i would have preferred that
14:46 jStefan: the system is old, i use it as a headless server, i tested ubuntu server 26.04 and reverted to 22.04 for testing as well.
14:49 jStefan: https://pastebin.com/VZtmNtZu
14:51 karolherbst: jStefan: mhh.. feels like a VBIOS parsing bug maybe? It seems to be stuck initializing display outputs
14:53 jStefan: What could I do about that?
14:55 karolherbst: file a bug with your full dmesg attached running with nouveau.debug=disp=trace and your "/sys/kernel/debug/dri/0/vbios.rom" file attached
14:56 jStefan: will do, ty