00:05 esdrastarsis[d]: mhenning[d]: Maybe something like `NVK_PERFTEST=dlss`
00:31 mohamexiety[d]: wouldnt it be better for this to be handled at the proton level? :thonk:
00:32 mohamexiety[d]: but i guess it depends on how flaky/inconsistent it is currently
00:38 karolherbst[d]: phomes_[d]: are there CTS tests for `NVX_binary_import`? 😄
00:38 karolherbst[d]: (why am I even asking when I already know the answer)
00:38 zmike[d]: cts for vendor extensions haha
00:39 zmike[d]: karol you crazy
00:39 karolherbst[d]: look, sometimes the idealist and optimist shines through
00:40 karolherbst[d]: mhenning[d]: wondering if it would be helpful for shader constants
00:41 karolherbst[d]: or any ubo that's below 1k
00:46 esdrastarsis[d]: karolherbst[d]: misyl implemented a sample
00:47 mohamexiety[d]: yeah misyl wrote tests for it
00:47 esdrastarsis[d]: https://github.com/misyltoad/VK_NVX_binary_import-example
00:50 phomes_[d]: the sample is working fine. I also tested on bunch of games. Many work just fine but I found 3 games where specific presets are broken. Updating to newer DLSS versions fixed that. But then an even newer DLSS came out and broke a bunch of games again
00:52 phomes_[d]: so I don't think it is ready to release to all users yet, but behind an env var it should be easier to test and not break due to conflicts as much
00:55 mohamexiety[d]: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/184 there's these too
00:55 mohamexiety[d]: but yeah i wonder why it keeps breaking
01:21 mhenning[d]: karolherbst[d]: you only get 256 bytes per bank x 8 banks, plus you can't load directly from memory, so it's a little niche for more general stuff
01:21 karolherbst[d]: ohh right..
01:21 karolherbst[d]: 256 ain't much indeed
01:21 karolherbst[d]: well 256 * 8
04:53 gfxstrand[d]: We probably only need 512 or so for root constants.
12:23 karolherbst[d]: phomes_[d]: I have a fun thing to performance test, but it's probably also entirely insignificant. Might checking the MR with and without that suggestion: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40293#note_3380547
12:30 phomes_[d]: I will give that a test in a short while
14:07 karolherbst[d]: should we remove the no_ugpr debug thing?
15:00 phomes_[d]: karolherbst[d]: I do see a small improvement. I want to repeat some of the tests to be sure, as there was a proton update in the middle
15:01 karolherbst[d]: mhhhh
15:02 karolherbst[d]: in theory accessing more memory _shouldn't_ be an issue because it's all cache lines anyway, but I'm also not entirely sure
15:03 karolherbst[d]: but also.. if it's small might just be random stuff or just within the normal error ranges
15:59 mhenning[d]: karolherbst[d]: why remove it?
16:00 karolherbst[d]: mhenning[d]: it's making the code I'm working on more annoying 😄 And I totally forgot it existence with the UGPR stuff
16:09 karolherbst[d]: okay.. I think `nak_nir_lower_non_uniform_ldcx` is too aggressive 🙃
16:10 karolherbst[d]: like we can pull from `cx[UGPR][GPR + immS16]`
16:10 mhenning[d]: I don't know what you mean
16:11 karolherbst[d]: like we can do `LDC cx[UGPR][GPR + immS16]`
16:11 karolherbst[d]: and instead of that it lowers to LDG
16:12 mhenning[d]: oh, right. yeah, that could make a difference
16:12 karolherbst[d]: like ULDC requires `cx[UGPR][UGPR + immS16]`, so for that it makes sense
16:12 karolherbst[d]: but then it's uniform anyway
16:12 karolherbst[d]: ehh
16:13 karolherbst[d]: `ULDC cx[UGPR][immS16]`
16:13 karolherbst[d]: U in that case
16:13 karolherbst[d]: but that goes into its own MR
16:25 karolherbst[d]: mhhhhhh
16:26 karolherbst[d]: so I wanted to make use of UGPR + GPR for non uniform ldcx lowering, however it turns out that the uniform address gets pulled from a ubo that got lowered to load_global_constantand then it just falls apart
16:28 karolherbst[d]: can we reliably assume that `load_global_constant*` ops _always_ come from such an ubo? In that case I should clean up my ULDC MR and then optimize on top of that and then a loot of things can stay in UGPRs it seems
16:28 karolherbst[d]: that pattern also prevents using `LDC cx[UGPR][GPR + immS16]`
16:30 mhenning[d]: no, load_global_constant is a general intrinsic, we can't treat them like they're always from ubos
16:37 karolherbst[d]: okay
16:37 karolherbst[d]: we I guess I'd have to add a new intrinsic or something for `nvk_nir_lower_descriptors`
16:38 karolherbst[d]: it uses `nir_load_global_constant_offset` atm for that
16:51 karolherbst[d]: `%r37 = ldc.b32 cx[{%ur134, %ur135}][%r36+0x2c]` well that wasn't too hard
16:52 karolherbst[d]: `r5 = ldc.b32 cx[ur2..4][r4+0x2c] // delay=1 rd:0 wr:1`
16:52 karolherbst[d]: yeah...
16:52 karolherbst[d]: maybe it's actually really easy to support this 😄
16:54 karolherbst[d]: instructions `168 -> 97` in this one shader
16:59 karolherbst[d]: Totals from 33513 (20.11% of 166664) affected shaders:
16:59 karolherbst[d]: CodeSize: 520102896 -> 503172656 (-3.26%); split: -3.42%, +0.17%
16:59 karolherbst[d]: Number of GPRs: 1821680 -> 1806204 (-0.85%); split: -0.90%, +0.05%
16:59 karolherbst[d]: SLM Size: 230948 -> 230876 (-0.03%); split: -0.05%, +0.02%
16:59 karolherbst[d]: Static cycle count: 597000669 -> 524699524 (-12.11%); split: -12.22%, +0.11%
16:59 karolherbst[d]: Spills to memory: 7010 -> 6845 (-2.35%); split: -2.67%, +0.31%
16:59 karolherbst[d]: Fills from memory: 7010 -> 6845 (-2.35%); split: -2.67%, +0.31%
16:59 karolherbst[d]: Spills to reg: 19136 -> 18984 (-0.79%); split: -2.42%, +1.63%
16:59 karolherbst[d]: Fills from reg: 16844 -> 16567 (-1.64%); split: -2.21%, +0.56%
16:59 karolherbst[d]: Max warps/SM: 1296100 -> 1305068 (+0.69%); split: +0.70%, -0.01%
16:59 karolherbst[d]: oof
17:01 karolherbst[d]: phomes_[d]: in case you feel like wanting to give it a go: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/nvk/opt/ubo_ldcx?ref_type=heads
17:04 karolherbst[d]: uhm..
17:04 karolherbst[d]: I think it's broken 😄
17:04 karolherbst[d]: maybe
17:04 karolherbst[d]: dunno actually
17:05 karolherbst[d]: mhh
17:05 karolherbst[d]: I need to pull the size somehow...
17:07 karolherbst[d]: _should_ maybe just put it there in nvk if we are using ldcx instead of load_global_constant
17:11 karolherbst[d]: where is nvk filling in the address of the descriptor set? Or can applications mess those up?
17:13 karolherbst[d]: anyway.. I _think_ this is potentially a good idea, but needs more work
17:20 mohamexiety[d]: phomes_[d]: if you have time could you give the linear modifier mr another quick try for any of the artifacting games?
17:38 phomes_[d]: mohamexiety[d]: still artifacts. Starting with 5d2b969d4688a49b0481cac1d617819fdf3ae448
18:09 mohamexiety[d]: phomes_[d]: https://gitlab.freedesktop.org/mohamexiety/mesa/-/commits/nvk-gstream-fixes-debug could you try this? it shouldnt artifact but should crash
18:12 mohamexiety[d]: (just pushed one more commit)
18:23 karolherbst[d]: okay.. we have other palces where we add the size on the shader side to the desc set address thing..
18:23 karolherbst[d]: I think this has potential
18:25 karolherbst[d]: but also I'm not _that_ familiar with the binding model and everything 🥲
20:05 phomes_[d]: mohamexiety[d]: I get no artifacts or crash. There are two new commits on top of the one that caused artifacts. The first one ("debug stuff") gets rid of the artifacts
20:12 mohamexiety[d]: :concern:
20:21 phomes_[d]: karolherbst[d]: I gave it a test. Some games work fine, others are completely black screen, and then there is Dirt rally 2.0 benchmark where the car is no invisible, but the driver just floats in the air
20:21 karolherbst[d]: 😄
20:21 karolherbst[d]: yeah....
20:21 mohamexiety[d]: phomes_[d]: sorry about that, but could you try now? tried using the stashed offsets and also checking for zcull offset
20:22 karolherbst[d]: I need to figure out how to deal with the descriptors, but there might be a simpler solution to get what I want
20:22 karolherbst[d]: I need to see _why_ that change helps, because it might just be a side effect of something else
20:27 phomes_[d]: mohamexiety[d]: still no artifacts but I get this in the log:
20:27 phomes_[d]: `MESA: error: before zcull bind, old offset_B: 4096 new offset_B: 0
20:27 phomes_[d]: MESA: error: after zcull bind, old offset_B: 14976 new offset_B: 0
20:27 phomes_[d]: MESA: debug: ../src/vulkan/wsi/wsi_common_x11.c:1956: Swapchain status changed to VK_SUBOPTIMAL_KHR
20:27 phomes_[d]: MESA: error: before zcull bind, old offset_B: 4096 new offset_B: 0
20:27 phomes_[d]: MESA: error: after zcull bind, old offset_B: 14976 new offset_B: 0
20:27 phomes_[d]: MESA: error: before zcull bind, old offset_B: 35389440 new offset_B: 0
20:27 phomes_[d]: MESA: error: after zcull bind, old offset_B: 36041856 new offset_B: 0 `
20:28 mohamexiety[d]: ok yeah that confirms it. thanks so much!