00:24karolherbst[d]: oof.. I have a shader that goes from 100 blocks down to 4 🙃
00:53karolherbst[d]: we should run more opts after `nak_nir_lower_non_uniform_ldcx`...
00:57mhenning[d]: A lot of opts are illegal after nak_nir_lower_non_uniform_ldcx, since you can't change block uniformity
00:57karolherbst[d]: yeah.. cse helps
00:57karolherbst[d]: even cse?
00:57mhenning[d]: cse is probably fine
00:57karolherbst[d]: I found a shader where a lot of htose blocks had the same source buffer
00:57mhenning[d]: mhh actually I need to think about that more
00:57karolherbst[d]: and the address calc got duplicated all over again
00:58karolherbst[d]: remove like 200 instructions from a 578 instruction shader
00:58karolherbst[d]: 250 actually
00:58karolherbst[d]: so would be really good to at least be able to run CSE
01:00karolherbst[d]: that shader was originally 870 instructions, but the input pred on ldg helped bring it down by 300 as well 🙃
01:01karolherbst[d]: mostly due to the bssy+bsync pairs
01:02karolherbst[d]: anyway.. if we can't run CSE, we might want to cache addresses or something.. dunno
01:15karolherbst[d]: anyway.. tomorrow I kind of plan to look into lowering ssbos to bounded variants.. that's gonna be fun, especially because I only want that for loads...
01:15karolherbst[d]: but I guess I could lower bounded stores ...
01:17karolherbst[d]: but not sure what the additional source of bounded even is.. I hope it's a boolean
01:18karolherbst[d]: ohh there is `has_load_global_bounded` mhh
01:18karolherbst[d]: nice
01:19karolherbst[d]: ohh it only exists for loads anyway.. that will make things a lot easier then
01:34mhenning[d]: karolherbst[d]: no, the additional source is the buffer size
01:34karolherbst[d]: pain..
09:28karolherbst[d]: ohh but I get the base + offset + size.. yeah that should be good enough
09:46karolherbst[d]: okay, that was simpler than expected
18:40karolherbst[d]: Totals:
18:40karolherbst[d]: CodeSize: 9419632496 -> 8681676992 (-7.83%); split: -7.84%, +0.00%
18:40karolherbst[d]: Number of GPRs: 47360657 -> 47568342 (+0.44%); split: -0.15%, +0.58%
18:40karolherbst[d]: SLM Size: 5409320 -> 5409220 (-0.00%); split: -0.02%, +0.02%
18:40karolherbst[d]: Static cycle count: 5994911052 -> 4724641933 (-21.19%); split: -21.19%, +0.01%
18:40karolherbst[d]: Spills to memory: 44482 -> 44903 (+0.95%); split: -1.74%, +2.69%
18:40karolherbst[d]: Fills from memory: 44482 -> 44903 (+0.95%); split: -1.74%, +2.69%
18:40karolherbst[d]: Spills to reg: 184873 -> 146402 (-20.81%); split: -22.95%, +2.14%
18:40karolherbst[d]: Fills from reg: 223943 -> 168316 (-24.84%); split: -26.55%, +1.71%
18:40karolherbst[d]: Max warps/SM: 50638772 -> 50561816 (-0.15%); split: +0.03%, -0.19%
18:40karolherbst[d]: 🙃 there is a bug in it isn't there
18:55karolherbst[d]: phomes_[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40272 the hack but in proper
19:07karolherbst[d]: ehh it's broken
19:07karolherbst[d]: nvm
19:24karolherbst[d]: okay, fixed now 😄
19:24karolherbst[d]: I forgot a `/ 8`
19:26mhenning[d]: karolherbst[d]: classic
19:27karolherbst[d]: `Pass: 47180, Crash: 6, Warn: 1, Skip: 49813, Duration: 3:26, Remaining: 1:39:30`
19:27karolherbst[d]: I just saw your comment
19:27karolherbst[d]: so you are telling me, there is even better stats to get from this 🙃
19:27karolherbst[d]: *are
19:27mhenning[d]: potentially yes
19:28karolherbst[d]: somebody should figure out instruction predication next... but I'm impressed how much it matters for loads alone
19:29mhenning[d]: Faith and I both have old braches with attempts at that. eg. there's https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33683
19:34karolherbst[d]: yeah... no idea what's a good approach there.. codegen simply folded if/else trees into predication. I guess that MR does something similar?
19:36mhenning[d]: Yeah, that MR folds `if`s after RA. Faith favors doing it before RA
19:36karolherbst[d]: mhhhhhhhhh
19:36karolherbst[d]: I'm leaning towards that it has to be done post RA
19:37karolherbst[d]: like once you nest things enough, and also do crazy things like loop merging, running out of predicates happens often enough that not knowing if you have available predicates or not will hurt
19:37karolherbst[d]: probably
19:37karolherbst[d]: like I've seen shaders by nvidia that used 3-4 predicates just for control flow stuff
19:37karolherbst[d]: and spilling that mess is going to hurt
19:37mhenning[d]: well, we're not doing anything that crazy yet
19:38karolherbst[d]: on the other hand.. doing it pre RA could mean "I reserve those 3 predicates for control flow"
19:38karolherbst[d]: and then you only spill preds from alu ops
19:38karolherbst[d]: uhm.. for alu ops
19:38mhenning[d]: plus SSA RA means that you can tell if you're going to spill before you get to RA
19:39karolherbst[d]: yeah..
19:39karolherbst[d]: I think predicated control flow is gonna be a bigger benefit than keeping alu pred values as pred...
19:39karolherbst[d]: probably
19:39karolherbst[d]: like the benefit isn't just ditching branches, but also barrier synchronization + scheduling
20:04karolherbst[d]: Maybe I should add a `has_load_global_constant_bounded` flag so we can get rid of the code in nvk_shader.c...
20:04karolherbst[d]: yeah.. I think I like that
20:05karolherbst[d]: then we'd only need to lower `nir_intrinsic_load_global_constant_offset` but that's trivial
20:29phomes_[d]: karolherbst[d]: nice. I can give it a test now if it is ready
20:29karolherbst[d]: yeah should be
20:29karolherbst[d]: CTS run looks quite good
20:48karolherbst[d]: phomes_[d]: the crashes that you are seeing, are those like real game crashes or just asserts being hit somewhere?
20:49karolherbst[d]: ohh.. I probably have to remove `nir_metadata_loop_analysis` ...
20:50phomes_[d]: deep rock galactic just crashes without any assert or message in dmesg
20:51karolherbst[d]: mhh
20:51phomes_[d]: builders journey had: src/vulkan/runtime/vk_pipeline_cache.c:680: vk_pipeline_cache_destroy: Assertion `cache->object_cache->entries == 0' failed.
20:51phomes_[d]: let me just clear the cache and try again
20:52karolherbst[d]: mhhh
20:52karolherbst[d]: phomes_[d]: I've pushed a fix to the branch, that might help
20:52karolherbst[d]: but not sure
20:52karolherbst[d]: apparently I have a handful of test crashing to due an assert
20:54karolherbst[d]: ohh also maybe remove the WIP commit I pushed, because I didn't want to push that one 🥲
20:54karolherbst[d]: but that one should be harmless
20:54karolherbst[d]: but who knows
20:56karolherbst[d]: I've removed that one from the MR anyway
20:56karolherbst[d]: ahh lego builder doesn't crash anymore? nice 😄
20:57phomes_[d]: I am rerunnig things now
20:57karolherbst[d]: there are a few games that show a 20% reduction in cycles, but doesn't seem like any of them are the ones you test
20:58karolherbst[d]: well in 1.5 hours I'll know if there are any other CTS issues, but it looks fairly clean..
21:04karolherbst[d]: phomes_[d]: mhhhhh... mind running that one with a debug build later?
21:04karolherbst[d]: but +10% in lego builder is nice 🙂
21:05phomes_[d]: I will run the full test and then bisect the commit in debug build
21:09karolherbst[d]: wondering what's up with x4...
21:09karolherbst[d]: like generally I mean
21:09karolherbst[d]: seems kinda slow
21:10karolherbst[d]: xcom2 as well.. mhh
21:11karolherbst[d]: phomes_[d]: well bisect doesn't really help as much and more why it crashes 😄
21:11karolherbst[d]: like it's either `6e7128a9d60e2229710c8a5c92d28fa277a84eac` or `475ecd6425db5272ab954f080aa5b57ca6bc18bc` and both kinda look correct, so I'm kinda curious what's up there
21:12karolherbst[d]: anyway
21:13phomes_[d]: problem is that I don't get any assert warning or anything. So the bisect is the best hint I can provide right now
21:14karolherbst[d]: even with a debug build?
21:15karolherbst[d]: mhh.. maybe a gdb stacktrace would also help
21:18phomes_[d]: sorry the crash seems unrelated to your MR. It crashes even before it. Could be a game or VKD3D-proton update. I will look into that, but its not in your commits
21:19karolherbst[d]: okay
21:19karolherbst[d]: thanks for verifying!
21:22phomes_[d]: karolherbst[d]: for x4 I did try to dive it to it earlier: https://discord.com/channels/1033216351990456371/1034184951790305330/1474519331592474646
21:23karolherbst[d]: ohh so something with depth only stuff
21:23karolherbst[d]: interesting
21:25phomes_[d]: they were the worst offenders but several other things were slow as well. But this is trying to use renderdoc as a profiler and it is not built for that
21:26karolherbst[d]: right..
21:30phomes_[d]: btw thank you for the help the other day with https://gitlab.freedesktop.org/mesa/mesa/-/issues/14993. I was looking more into it and noticed that I had messed up your patch (I applied it by hand). So I had ended up doing `NIR_PASS(_, nir, nir_lower_variable_initializers, 32);` instead of what your patch had. It fixed the perf issue but caused all kinds of other problems.
21:31karolherbst[d]: oof
21:31karolherbst[d]: so with the proper thing it's better?
21:33karolherbst[d]: mhenning[d]: ohh.. yeah, I don't know if you saw, but we have a situation with constant initializers on private memory and it causes massive perf issues 🙃
21:34karolherbst[d]: not sure I'll have time to look into it
21:34karolherbst[d]: but the tldr is that the shader constants aren't extracted and we end up with tons of local memory accesses
21:34phomes_[d]: the proper thing did not change anything perf wise
21:35karolherbst[d]: phomes_[d]: generally or for that single shader?
21:35karolherbst[d]: though I'm not sure if that patch is even correct 🥲
21:35phomes_[d]: I only checked that specific shader
21:35karolherbst[d]: okay..
21:35karolherbst[d]: yeah then the fix needs to be a bit more complicated
21:38karolherbst[d]: this is very early in the compilation pipeline so I'm rather confused why that stuff happens in the first place
21:39phomes_[d]: btw what I did was to run `NAK_DEBUG=print ./nvdump fs.spv` and check if it still had all the mov's. Not sure there is a better way.
21:39karolherbst[d]: that's ifne
21:40karolherbst[d]: maybe I get bored tomorrow and I'll take a look, because that's totally not great what's happening there 😄
21:47_lyude[d]: (cc: airlied[d] )
21:47_lyude[d]: So I've been looking more at the flickering screen issue on my desktop back at home now that I've got somewhat of a lead regarding what might be happening finally. It looks like that occasionally we are running into situations where we try to assign a framebuffer to a display (I'm not totally sure if it's a cursor framebuffer or just primary scanout buffer), but then we end up failing in
21:47_lyude[d]: `nouveau_user_framebuffer_create` with error code `-ENOENT`, which seems most likely to be coming from this line:
21:47_lyude[d]: gem = drm_gem_object_lookup(file_priv, mode_cmd->handles[0]);
21:47_lyude[d]: if (!gem)
21:47_lyude[d]: return ERR_PTR(-ENOENT);
21:47_lyude[d]: I'm thinking this means one of either two things:
21:47_lyude[d]: * Somehow we're missing a `drm_gem_object_get()` (or whatever the function is called) somewhere
21:47_lyude[d]: * This is the one I'm trying to get feedback on to see if this makes sense or not: is it possible we're firing off a fence too early, which results in userspace releasing the gem object before its actually done with it?
21:49_lyude[d]: The reason the second explanation popped in my head is because I filed a bug with nvk a while back regarding PrusaSlicer not displaying properly, and I think the behavior there is actually quite similar. Some of the time it works perfectly fine, but every now and then I'll find a tab in the application or a window will pop up where things seem to display properly, but the surface being displayed
21:49_lyude[d]: never actually updates. So e.g. I might get a dialog that pops up asking me what to name a job I'm sending to my printer, but pressing buttons on the dialog or typing something into it doesn't actually update the dialog on screen
21:49_lyude[d]: which makes me wonder if both of these bugs are actually comign from the same issue
21:53_lyude[d]: I feel like if the first one was true we'd probably be seeing this bug a lot more often then I currently am
22:31airlied[d]: Could also be a framebuffer reference problem not just a gem object, would be good to dump out some info on what size etc when we get that enoebt