10:59snowycoder[d]: mhenning[d]: How many cores and how much memory does your system have? Running my patch with `--num-threads 1` shows a 3% compile-time slowdown, and on my 32-thread, 64 GiB workstation running with `--num-threads 32` gives a geomean slowdown of 6%.
10:59snowycoder[d]: My only guess is that the pass is memory-bound and really dislikes being run in parallel within a memory-constrained system?
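For readers unfamiliar with the "geomean slowdown" figure quoted above: a minimal sketch of how such a number is typically derived from per-shader compile-time ratios (patched time divided by baseline time). The function names are hypothetical and this is not the script used in the discussion.

```rust
/// Geometric mean of per-shader ratios (patched time / baseline time).
/// Returns None for an empty slice or if any ratio is not positive.
fn geomean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|&r| r <= 0.0) {
        return None;
    }
    // Average the logarithms, then exponentiate; this avoids overflow
    // from multiplying many ratios together directly.
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    Some((log_sum / ratios.len() as f64).exp())
}

/// Convert a geomean ratio into a slowdown percentage (1.06 becomes 6.0).
fn slowdown_percent(geomean_ratio: f64) -> f64 {
    (geomean_ratio - 1.0) * 100.0
}
```

Unlike an arithmetic mean, the geometric mean treats a shader that got 2x slower and one that got 2x faster as cancelling out, which is why it is the usual choice for ratio data.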
11:46karolherbst[d]: Soo.. if I need to convert a Src from 32 to 64 bit inside `legalize`, what's the proper way of doing that?
12:59karolherbst[d]: Somebody want to do some game perf testing with this MR? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36113
13:01karolherbst[d]: phomes_[d]: up for it? Not sure it's like.. not causing issues, but it should give some good perf gains across the board
13:03phomes_[d]: I'm up for it. I could use a break from this shader compile crash in Enshrouded I am debugging
13:03karolherbst[d]: I should do fossil db stats actually 🙃
13:03karolherbst[d]: heh
13:03karolherbst[d]: I hope this MR doesn't cause more crashes lol
13:05karolherbst[d]: also haven't tested it on anything besides my ampere card
13:07phomes_[d]: fossil db stats is probably a good idea too. The tests I do are more intended to track and understand the perf gap to the prop driver. So if the win does not result in more than a few fps then I might not see it
13:08karolherbst[d]: well.. gives me 5% in some testing
13:08karolherbst[d]: I'm just curious how much games are impacted
13:10karolherbst[d]: I have another opt I want to do later on top of it, but that one has to wait 😄
13:20phomes_[d]: `thread '<unnamed>' panicked at ../src/nouveau/compiler/nak/sm70_encode.rs:3056:39:
13:20phomes_[d]: called `Option::unwrap()` on a `None` value`
13:23karolherbst[d]: what game?
13:24phomes_[d]: X4 Foundations
13:24phomes_[d]: on Ada
13:24karolherbst[d]: okay.. hopefully I hit it on the fossils as well
13:27phomes_[d]: it is not specific to that game. I get it on everything
13:27karolherbst[d]: mhhhh
13:28phomes_[d]: wait. I rebased on main to have a baseline to compare to. I will try without a rebase
13:28karolherbst[d]: I rebased it on something from yesterday
13:28karolherbst[d]: so not sure that's gonna change anything
13:32phomes_[d]: I get the same error using your MR directly. I need to run soon but I can run the tests later tonight
13:35karolherbst[d]: yeah, I'm gonna find out what's wrong there
13:43karolherbst[d]: mhhhh
13:43karolherbst[d]: maybe it is something ada specific but that's like super weird?
13:45karolherbst[d]: still comparing, but for me it shows some impact: https://gist.githubusercontent.com/karolherbst/f250a8d3aca9e4743d25b780a0800c42/raw/f47ccfa9f4e2f95ba23a25b2a276b7d001436fa2/gistfile1.txt and doesn't crash
13:47karolherbst[d]: phomes_[d]: I pushed a new version tho
13:47karolherbst[d]: maybe that helps?
14:08karolherbst[d]: mhh yeah no crashes on my side
14:08karolherbst[d]: annoying
14:09karolherbst[d]: ehh wait a second...
14:10karolherbst[d]: there are apparently more shaders in the baseline 🙃
14:10karolherbst[d]: but I don't see any crashes
14:11karolherbst[d]: anyway, looks promising so far: https://gist.githubusercontent.com/karolherbst/070f62ca47263d56d2b13b0511edb256/raw/aaeb10f282a3d97c53320e59786cd16be2da473b/gistfile1.txt
14:39karolherbst[d]: ohh.. I forgot to enable the panic thing
14:39karolherbst[d]: oh well.. debugging is for later
16:30mhenning[d]: snowycoder[d]: Oh, I'm also running additional pipelines, it's possible that's making the difference
16:31mhenning[d]: but I have 12 physical cores (24 with hyperthreading, running with 12 threads) and 32 gb ram. I don't think it's swapping
17:27karolherbst[d]: phomes_[d]: fixed another crash and now I'm hitting yours 🙂
17:34karolherbst[d]: aaaaand fixed and pushed
17:53karolherbst[d]: If I use OpPin on a ugpr value, will it stay a ugpr no matter what, even while spilling is happening?
17:53karolherbst[d]: I wonder if I want to use it on the ugpr sources of IO ops, because spilling those is most likely more expensive than spilling other values
17:54karolherbst[d]: especially if you have a 32 bit gpr + 64 bit ugpr and now need to spill it to a 64 bit add
17:54karolherbst[d]: congrats, you need 3 registers 🙂
17:55karolherbst[d]: (and a predicate)
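The register count mentioned above can be illustrated with a small sketch, assuming the 64-bit add is built from 32-bit halves: the 32-bit offset occupies one register, the 64-bit base occupies two, and the carry between the halves is what needs the predicate. The function name is hypothetical and this is not NAK's actual lowering.

```rust
/// Add a zero-extended 32-bit offset to a 64-bit base held as two
/// 32-bit halves. Inputs: one register for the offset, two for the base.
fn add_offset_split(offset: u32, base_lo: u32, base_hi: u32) -> (u32, u32) {
    // Low half: a plain 32-bit add. The carry-out is the value that
    // would live in a predicate.
    let (lo, carry) = base_lo.overflowing_add(offset);
    // High half: the upper base word plus the carry from the low half.
    let hi = base_hi.wrapping_add(carry as u32);
    (lo, hi)
}
```

The result matches a native 64-bit addition of `base + offset as u64`, with wrap-around on overflow.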
17:56mhenning[d]: karolherbst[d]: yes, but it does that by making spilling illegal, so again beware that you can get into situations where spilling is impossible
17:57karolherbst[d]: yeah...
17:57karolherbst[d]: I'll play around with the idea and see if it's worth it
17:58karolherbst[d]: my gut feeling says yes, because the 32 + 64 combination happens a lot
17:58karolherbst[d]: and it saves so much
17:58karolherbst[d]: alternatively could add a flag to pin or something to make it "well but only if you really must"
17:59karolherbst[d]: or use a new op to declare hints to the spiller or something or just hard code it in the spiller 🙃
17:59mhenning[d]: Yeah, it might be better to just try and improve the spill heuristic
18:01karolherbst[d]: nice
18:01karolherbst[d]: no more crashes
18:01karolherbst[d]: https://gist.githubusercontent.com/karolherbst/070f62ca47263d56d2b13b0511edb256/raw/aec9fbb6fe7edfe62073050eb13df2700ba502d8/gistfile1.txt
18:02karolherbst[d]: I still have using ULDC for load_global_constant on my todo list... I wonder if I want to get that in first..
18:03karolherbst[d]: phomes_[d]: should be good to test now
18:04karolherbst[d]: some shaders see a `-60.00%` reduction in GPRs
18:04karolherbst[d]: and like the 24 -> 16 type of change
18:04karolherbst[d]: 160 -> 64
18:06karolherbst[d]: final fantasy XV spilling: -89.60%
18:06karolherbst[d]: -44.68% SLM size 😄
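As a quick check on the figures above, a minimal sketch of the relative-change arithmetic (the helper name is hypothetical, not part of any fossil stats tool): 160 -> 64 GPRs works out to exactly -60%, and 24 -> 16 to roughly -33.3%.

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean a reduction.
fn percent_change(before: f64, after: f64) -> f64 {
    // Multiply before dividing so that round cases such as
    // 160 -> 64 come out exact in floating point.
    (after - before) * 100.0 / before
}
```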
18:07snowycoder[d]: mhenning[d]: You mean `nvk-fossils-foss`? I tested that too, that's the strange part 0_o.
18:08mhenning[d]: no, I have some shaders from closed source games here too
18:08karolherbst[d]: did you get access to the big db?
18:09snowycoder[d]: Ok I surely don't have those
18:09karolherbst[d]: doesn't seem like either of you have
18:11karolherbst[d]: might want to ask rhys or emma
18:12karolherbst[d]: my fossil run output files are like `222MiB` big with it 🥲
18:12snowycoder[d]: karolherbst[d]: How can I contact them?
18:12karolherbst[d]: IRC
18:12karolherbst[d]: I have no idea what the policies are around that repo tho
18:13karolherbst[d]: 1.1 million shaders are in it, it seems
18:13HdkR: I'm surprised the output is only 222MB
18:14karolherbst[d]: I mean it's just lines with shader stats
18:20HdkR: I'm still surprised it's only that big, must be fairly carefully pruned :D
18:20karolherbst[d]: 😄
18:20karolherbst[d]: yeah probably
18:28mhenning[d]: snowycoder[d]: Looking at the data, there are some foss shaders that are hard hit
18:28mhenning[d]: eg. both of the following more than doubled in compile time:
18:28mhenning[d]: pipeline hash file
18:28mhenning[d]: a27fb263ebd92591 fossils/parallel-rdp/uber_subgroup.foz
18:28mhenning[d]: 767c22f87dd208c6 ../nvk-fossils-foss/fossils/skia/skia-viewer.693e50b986f938d7.1.foz
18:29mhenning[d]: although I haven't double checked if that reproduces here yet
20:52phomes_[d]: karolherbst[d]: Lego Builders Journey improved from 28 fps to 30 fps
20:52phomes_[d]: but three games now have artifacts. Things not rendered. Here is an example from Beyond a steel sky:
20:54phomes_[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1418339434364928101/image.png?ex=68cdc308&is=68cc7188&hm=6a313359876b7d310222f8a618849834e3158411ecd13b9b26cf1ffc0b56059f&
20:55phomes_[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1418339626648731709/image.png?ex=68cdc336&is=68cc71b6&hm=1ebca2d0d059207ba0bc38129af209ecdef1214c49889fc8b6d0d34805f2a184&
20:56karolherbst[d]: the price you need to pay for more perf
20:56phomes_[d]: Beyond a steel sky, Atomic Heart, and Deep rock galactic are all affected
20:56karolherbst[d]: mhhhh
20:57phomes_[d]: like, all three games have stuff that is not being rendered. Only Lego builders journey improved in a measurable amount
20:57karolherbst[d]: hopefully a CTS run catches it
20:58phomes_[d]: is the MR bisectable? Would it help?
20:58karolherbst[d]: maybe
20:58karolherbst[d]: I suspect `nak: address calc optimizations` breaks it
20:59karolherbst[d]: like all the other commits should be safe
20:59phomes_[d]: heh. Do I need the "HACK" patch perhaps?
20:59karolherbst[d]: but who knows
20:59karolherbst[d]: mhhhh
20:59karolherbst[d]: no
20:59karolherbst[d]: the HACK just prevents some infinite compilation loop
20:59karolherbst[d]: however
20:59karolherbst[d]: maybe it compilers forever? 🙃
20:59karolherbst[d]: *compiles
21:00karolherbst[d]: Anyway, I'm sure `nak: address calc optimizations` is the commit breaking stuff
21:00karolherbst[d]: though
21:00karolherbst[d]: who knows
21:00karolherbst[d]: feel free to bisect
21:04phomes_[d]: karolherbst[d]: it was indeed that one
21:06gfxstrand[d]: karolherbst[d]: Rendering less things is usually faster...
21:08mhenning[d]: gfxstrand[d]: Game Developers Hate This! Make Your Driver Faster With One Weird Trick
21:09karolherbst[d]: Right.. because I'm not in the mood for debugging games (or using this new fancy shader bisect tool), I hope the CTS gives me failures 😄
21:09karolherbst[d]: but thanks for testing!
21:10ristovski[d]: theres a new fancy shader bisect tool?
21:13gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37468
22:19karolherbst[d]: mhhh, found a CTS test failing, but not really seeing why my changes are causing it?...
22:22karolherbst[d]: ohhh mhhh
22:22karolherbst[d]: I think it's not fucked up addresses, but fucked up value written with stores
22:27gfxstrand[d]: Hey, all! I need topics for my nouveau/NVK update talk. If you have anything you'd like called out, please speak up.
22:27karolherbst[d]: compute perf stuff 😛
22:36HdkR: Tegra support :D
22:56karolherbst[d]: mhhhhhhhhh
22:56karolherbst[d]: sooo
22:56karolherbst[d]: something is up with the .32 form
22:57karolherbst[d]: `STG.E.STRONG.GPU [R8.64+UR0], R1 ;` works, `STG.E.STRONG.GPU [R5.U32+UR0], R1 ;` doesn't
22:57karolherbst[d]: I _wonder_ if the offset address still needs to be 64 bit aligned...
22:58karolherbst[d]: `PRMT R6, RZ, RZ, RZ ;`?!?
22:58karolherbst[d]: sometimes NAK does funky things
22:59karolherbst[d]: but it's still weird
22:59karolherbst[d]: it feels like something else is going wrong
23:02karolherbst[d]: oh no...
23:02karolherbst[d]: oh no oh no oh no
23:02karolherbst[d]: please don't let it be as silly as I think it is
23:04karolherbst[d]: I should do less hand-waving when prototyping stuff like this 😄
23:05karolherbst[d]: no actually.. I thought it's some weird swizzle thing but that's not it either...
23:08karolherbst[d]: huh.. so all stores end up with the same address, but _how_
23:09snowycoder[d]: It's a feature, Improved cache efficiency!
23:09karolherbst[d]: okay it's something else 😄
23:10karolherbst[d]: okay.. it's `nir_opt_offsets`
23:11karolherbst[d]: ahhhhhhhh
23:11karolherbst[d]: `@store_global_nv (%7, %12, %15) (base=12, access=writeonly, align_mul=16, align_offset=12, wrmask=x)`
23:11karolherbst[d]: I ignored the `base` 🙃
23:12karolherbst[d]: I swear...
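To make the bug concrete: a hedged sketch of how a constant `base` could feed into the final store address, assuming a 64-bit uniform base, a zero-extended 32-bit per-thread offset, and a signed constant. The function name, parameter roles, and the signedness of `base` are assumptions for illustration, not NAK's or NIR's actual definitions.

```rust
/// Hypothetical model of the effective address of a global store:
/// 64-bit uniform base + zero-extended 32-bit offset + constant `base`.
fn effective_address(uniform_base: u64, gpr_offset: u32, base: i32) -> u64 {
    uniform_base
        .wrapping_add(gpr_offset as u64)
        // Sign-extend the constant so negative values subtract.
        .wrapping_add(base as i64 as u64)
}
```

Under this model, dropping `base` makes stores that differ only in their constant (for example `base=0`, `4`, `8`, `12`) land on the same address, which would be consistent with the "all stores end up with the same address" symptom seen earlier.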
23:14karolherbst[d]: phomes_[d]: I've pushed a new version that shouldn't be broken anymore 😄
23:17karolherbst[d]: mhenning[d]: yeah... I need to find a solution for this `if !block_uniform {` path, because it does prevent the GPR+UGPR form sometimes 😢
23:18karolherbst[d]: peak STG: `st.global.a64.strong.gpu.b32 [r5+ur0..2+0xc] r4 // delay=2 rd:0`
23:19karolherbst[d]: the CTS test I was debugging: https://gist.github.com/karolherbst/a1efc364e8d56147fc1f20fb4fc1e647
23:19karolherbst[d]: that's with the `if !block_uniform {` nuked
23:19karolherbst[d]: otherwise the ugpr gets turned into gpr and it sucks
23:20phomes_[d]: beyond a steel sky is fixed. Deep rock galactic now crashes with:
23:20phomes_[d]: `thread '<unnamed>' panicked at src/compiler/rust/libcompiler.rlib.p/structured/nir.rs:274:9:
23:20phomes_[d]: assertion failed: idx > 0`
23:20karolherbst[d]: ....
23:20karolherbst[d]: *sigh*
23:21karolherbst[d]: mhhh
23:21karolherbst[d]: really?
23:21karolherbst[d]: huhu
23:22karolherbst[d]: phomes_[d]: mind sharing a full stacktrace?
23:23karolherbst[d]: did I mess up my patch? mhh
23:23karolherbst[d]: ohhhhhhhh
23:23karolherbst[d]: wait a sec
23:23karolherbst[d]: I know what's up
23:23karolherbst[d]: silly me
23:25karolherbst[d]: phomes_[d]: pushed a fix
23:27karolherbst[d]: CTS also looks clean now
23:28karolherbst[d]: I know, bold statement after 1000 tests
23:32karolherbst[d]: there are a few shader stat regressions I still want to look at, maybe I'll manage to figure some of them out tomorrow
23:34phomes_[d]: all three games that broke are now working. Lego builders journey still wins performance but less than the first version
23:34karolherbst[d]: mhhh, yeah...
23:34karolherbst[d]: FFXV should see big wins
23:39karolherbst[d]: why do I get divergent `load_const`s 🙃
23:40karolherbst[d]: ...
23:40karolherbst[d]: they were dead
23:42karolherbst[d]: phomes_[d]: though games seeing reduced performance would also be interesting to know about, but if it's just a single game for now, maybe I need to clean up all the regressions first.. though it should still see a bit of perf improvements everywhere still..
23:44phomes_[d]: no reduced performance. Just that the lego game gained 2 fps with the first version of your MR and now only gains 1. But I will retest them all and also update the baseline as that is a few days older
23:49karolherbst[d]: okay, nice