08:45 gfxstrand[d]: karolherbst[d]: Did you ever file a CTS issue?
09:30 airlied[d]: so I've got a GNOME app that creates an 800x409 image, and I'm getting a page fault directly after that image during a DMA copy operation. The problem is that there's often something valid in the VMA right after it, so it doesn't always die straight away
09:31 airlied[d]: any ideas on what might be going wrong with NIL here? turning off compression seemed to fix it, but it might be a red herring since it's a messy race. I probably need to add some guard pages to the VMA to test it
09:44 airlied[d]: okay, with a guard page hack it seems to trigger more often
09:45 airlied[d]: hmm, maybe my guard hacks need a bit more thinking though, it crashes in the VMA code
10:47 karolherbst[d]: gfxstrand[d]: nope
10:56 gfxstrand[d]: airlied[d]: Compression seems like a likely culprit.
10:56 airlied[d]: We have reports going back a few Fedora versions, so I'm not sure compression lines up
10:57 airlied[d]: But I'll turn it off in Mesa tomorrow and see. It's definitely the copy engine reading off the end
11:00 gfxstrand[d]: It’s possible it’s something else. Maybe something is linear and that’s hitting edge cases?
15:02 asdqueerfromeu[d]: I can confirm that the Minecraft Vulkan renderer works on NVK (because it's the default option in 26.2 and Vulkan generally prefers dedicated GPUs by default)
16:07 karolherbst[d]: maybe I should look into the prepass scheduling issue... it sometimes picks wrongly and ends up with fewer warps than before, which isn't great 🥲 but I also forgot which shaders I was hitting this in..
16:17 karolherbst[d]: getting some reviews on this MR would be nice; apparently it speeds up The Surge 2 by 10%: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40904. I'd also like to know whether it causes perf regressions in games...
16:19 karolherbst[d]: looks like Spider-Man Remastered could be hit by this negatively...
16:20 karolherbst[d]: maybe I can figure out how to mitigate it there...
16:36 karolherbst[d]: ohh, that one shader could benefit from more vectorization if we were smarter with constants..
16:40 karolherbst[d]: https://gist.githubusercontent.com/karolherbst/e4a00bd3ae19523c3775e6709cce0f87/raw/bf43d792c56bdd8fbd2af542c87a57c94278f5d8/gistfile1.txt
16:40 karolherbst[d]: and the other loads have the same pattern
16:41 karolherbst[d]: the `u2u64` in the middle is kinda annoying...
16:41 karolherbst[d]: and the `iadd` can obviously overflow...
16:42 karolherbst[d]: but solving that would defo bring down register pressure quite a bit
16:44 karolherbst[d]: but also loading the base into a UGPR would help get rid of the iadd 🙃
16:45 karolherbst[d]: folding in the constant is one thing, _but_ maybe this can also be vectorized? though I guess that relies on the same proof...
16:45 karolherbst[d]: and the alignment doesn't check out...
16:46 mhenning[d]: Yeah, align_mul=4 might be the issue
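A rough sketch of why `align_mul=4` blocks vectorization, assuming NIR-style alignment info (the address is known to be `align_mul * k + align_offset`) and assuming the hardware wants a combined vector load to be naturally aligned. `known_alignment` and `can_vectorize` are hypothetical helpers, not Mesa functions:

```python
def known_alignment(align_mul: int, align_offset: int) -> int:
    """Largest power of two the address is guaranteed to be a multiple of,
    given address = align_mul * k + align_offset, align_mul a power of two."""
    assert align_mul > 0 and align_mul & (align_mul - 1) == 0
    assert 0 <= align_offset < align_mul
    if align_offset == 0:
        return align_mul
    # Otherwise only the lowest set bit of the offset is guaranteed.
    return align_offset & -align_offset


def can_vectorize(align_mul: int, align_offset: int, total_bytes: int) -> bool:
    # Assumption: a combined load of total_bytes must be naturally aligned.
    return known_alignment(align_mul, align_offset) >= total_bytes


# Merge four adjacent 32-bit loads into one 16-byte load?
print(can_vectorize(4, 0, 16))   # False: only 4-byte alignment is known
print(can_vectorize(16, 0, 16))  # True
```

So proving a larger `align_mul` for the base (or folding the constant so the offsets line up) is what would unlock the wider load.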
16:46 karolherbst[d]: me screaming: "this can be so much better code"
16:46 karolherbst[d]: well at least folding in the constant add to base would already help sooo much
16:46 karolherbst[d]: then it's all a + base
16:46 karolherbst[d]: and we drop tons of values
16:46 karolherbst[d]: and adds
16:46 karolherbst[d]: but that relies on the `iadd` to not overflow
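A small model of why folding the constant into the 64-bit base needs the 32-bit `iadd` not to wrap: the original computes `base + u2u64(x + c)` with a 32-bit add, while the fold rewrites it to `(base + c) + u2u64(x)`. The function names are made up for illustration:

```python
MASK32 = 0xFFFF_FFFF


def addr_original(base: int, x: int, c: int) -> int:
    # 32-bit iadd (wraps mod 2^32), then u2u64 zero-extends, then a 64-bit add.
    return base + ((x + c) & MASK32)


def addr_folded(base: int, x: int, c: int) -> int:
    # Constant moved to the 64-bit side: (base + c) + u2u64(x).
    return (base + c) + (x & MASK32)


base = 0x1_0000_0000
print(addr_original(base, 0x10, 0x20) == addr_folded(base, 0x10, 0x20))              # True
print(addr_original(base, 0xFFFF_FFF0, 0x20) == addr_folded(base, 0xFFFF_FFF0, 0x20))  # False
```

The two forms only agree when `x + c` stays below 2^32, which is exactly the no-unsigned-wrap property.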
16:48 mhenning[d]: yeah you might need to prove no_unsigned_wrap
16:48 karolherbst[d]: but that really should just be `%80` in a UGPR, `%84` in a GPR and constant offset 🙃
16:48 karolherbst[d]: mhenning[d]: hard if the offset comes from a load
16:49 karolherbst[d]: maybe there is more information on the deref side we are losing here...
16:49 karolherbst[d]: or something explicit_io could be doing better...
16:49 karolherbst[d]: like I'm sure it's just a deref chain initially
16:50 karolherbst[d]: the shift is a bit odd tho...
16:50 karolherbst[d]: wondering where that one comes from
16:51 karolherbst[d]: anyway, something to look into later.. I saw similar patterns in compute shaders, which was also the reason I started to look into UGPR + GPR encodings, because it really helps with that kind of code
16:51 karolherbst[d]: but it would also reduce the hit with sinking loads, because then it's just the same value as the base addresses...
16:51 karolherbst[d]: ~~I hate how everything is connected~~
16:58 pendingchaos[d]: if you're talking about the %90, %93 and %101 iadds, those can be proven as no_unsigned_wrap
16:58 pendingchaos[d]: %86 is at most 0xffffff80 because of the shift, and 0xffffffff - 0xffffff80 is 127
16:58 pendingchaos[d]: so iadd(%86, #b) is nuw if b <= 127
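The bound can be checked mechanically. This sketch assumes `%86` is a 32-bit value whose low 7 bits are cleared by the shift, which is what a 0xffffff80 maximum implies; the gist itself isn't reproduced here, and the helper names are hypothetical:

```python
U32_MAX = 0xFFFF_FFFF


def max_with_low_bits_clear(bits: int) -> int:
    # Largest 32-bit value whose low `bits` bits are zero, e.g. after `ishl x, bits`.
    return (U32_MAX << bits) & U32_MAX


def iadd_is_nuw(max_a: int, b: int) -> bool:
    # a + b cannot wrap for any 0 <= a <= max_a iff max_a + b still fits in 32 bits.
    return max_a + b <= U32_MAX


max_86 = max_with_low_bits_clear(7)
print(hex(max_86))               # 0xffffff80
print(U32_MAX - max_86)          # 127
print(iadd_is_nuw(max_86, 127))  # True
print(iadd_is_nuw(max_86, 128))  # False
```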
16:58 karolherbst[d]: ohhhhhhh
16:58 karolherbst[d]: right
16:58 karolherbst[d]: I had patches for this somewhere...
16:58 karolherbst[d]: yeah I totally missed that part 🙃
16:59 karolherbst[d]: yeah....
16:59 karolherbst[d]: I do think I have all the patches needed to pull it off, just needs people to review those 🙃
19:26 karolherbst[d]: CTSing https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40897 again; I think it would be good to land before the branch point (in two days?)
19:28 karolherbst[d]: I added a commit to enable copy prop across F16 and F16v2 as well
20:53 karolherbst[d]: I know we are doing `undef_to_zero`, but wouldn't `undef_to_nan` be way more fun? 🙃
20:54 karolherbst[d]: though I wonder.. I think I've seen that happening somewhere
21:41 airlied[d]: gfxstrand[d]: indeed it's a linear texture for some reason
21:50 airlied[d]: aligning the height to 4 fixes the problem, of course; I probably need to experiment with the blob
22:37 gfxstrand[d]: 😩
22:38 gfxstrand[d]: Is it a fault on read or write?
22:39 gfxstrand[d]: And do you know what unit is faulting? Render to linear is sketchy as hell on NVIDIA.
22:45 airlied[d]: It's the copy engine
22:47 airlied[d]: And looks like a write fault
22:48 airlied: gfxstrand[d]: https://paste.centos.org/view/raw/3a58208e
22:48 airlied: the fault is just after the OFFSET_OUT allocation
22:49 airlied: alloc 3ff41e5000 1310720
22:49 airlied: alloc va [0x3ff41e5000, 0x3ff4325000)
22:49 airlied: bind vma mem<0x20>[0x0, 0x140000) to [0x3ff41e5000, 0x3ff4325000)
22:49 airlied: the fault happens at 0x3ff4325000
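Checking the numbers from the log: the bound range is exactly 0x140000 bytes, and the faulting address is the first byte past it, so the write runs off the end of the OFFSET_OUT allocation rather than landing somewhere random:

```python
# Values copied from the log above.
alloc_start = 0x3FF41E5000
alloc_size = 1310720
fault_addr = 0x3FF4325000

alloc_end = alloc_start + alloc_size  # exclusive end of [start, end)

print(hex(alloc_size))         # 0x140000
print(hex(alloc_end))          # 0x3ff4325000
print(fault_addr - alloc_end)  # 0 -> first byte past the mapping
```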
22:53 gfxstrand[d]: Weird. Maybe we’re aligning in the wrong place somewhere? I can look at it but it’ll be a bit. I’m in .eu and should be sleeping
22:57 gfxstrand[d]: Oh, I wonder if the image size is calculated without the extra 3 pixels or whatever that we need in the bottom-right corner to make up the alignment and we’re copying height * stride?
22:58 gfxstrand[d]: That feels wrong, though. Alignments shouldn’t do that. They’re powers of two
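One way the numbers could line up, purely as an illustration: assume a 4-byte-per-pixel format, an unpadded row pitch of 800 * 4 = 3200 bytes, and a 64 KiB allocation granularity (all three are assumptions, not read out of NIL). Then 409 rows fit in the 1310720-byte allocation with less than one row of slack, while anything that touches the height rounded up to 4 runs past the end:

```python
def align_up(value: int, alignment: int) -> int:
    # Round value up to a multiple of a power-of-two alignment.
    return (value + alignment - 1) & ~(alignment - 1)


WIDTH, HEIGHT = 800, 409
BPP = 4                        # assumption: e.g. an RGBA8 format
PITCH = WIDTH * BPP            # assumption: no extra row padding
ALLOC_GRANULARITY = 64 * 1024  # assumption

image_bytes = HEIGHT * PITCH
alloc_bytes = align_up(image_bytes, ALLOC_GRANULARITY)
padded_bytes = align_up(HEIGHT, 4) * PITCH

print(image_bytes)                 # 1308800
print(alloc_bytes)                 # 1310720, same size as the logged allocation
print(alloc_bytes - image_bytes)   # 1920, less than one 3200-byte row
print(padded_bytes > alloc_bytes)  # True: 412 rows do not fit
```

That the computed size matches the logged allocation is consistent with these assumptions, not proof of them.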
23:13 mhenning[d]: Maybe we need to be clearing something in the linear case that we're not? eg. looking at the code we only set DST_ORIGIN_X in the non-linear case, maybe we need to reset it to 0?
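A toy model of the state-leak idea: if the tiled path programs a destination origin and the linear path never resets it, a linear copy inherits the stale value from whatever ran before. The class, function names, and offset formula are hypothetical and not NVK code; whether the copy engine applies DST_ORIGIN_X to a pitch-linear destination at all is exactly what would need checking:

```python
from dataclasses import dataclass


@dataclass
class CopyEngineState:
    # Persistent method state: a value stays live until something overwrites it.
    dst_origin_x: int = 0


def emit_tiled_copy(state: CopyEngineState, origin_x: int) -> None:
    state.dst_origin_x = origin_x


def emit_linear_copy(state: CopyEngineState, reset_origin: bool) -> None:
    if reset_origin:
        state.dst_origin_x = 0  # the suggested fix
    # Without the reset, whatever the last tiled copy set is still live.


def linear_write_end(state: CopyEngineState, dst_base: int,
                     pitch: int, height: int, bpp: int) -> int:
    # Hypothetical: origin_x shifts every row by origin_x * bpp bytes;
    # end of the last row, assuming each row is exactly `pitch` bytes wide.
    return dst_base + state.dst_origin_x * bpp + pitch * height


state = CopyEngineState()
emit_tiled_copy(state, origin_x=16)
emit_linear_copy(state, reset_origin=False)
print(linear_write_end(state, 0, 3200, 409, 4))  # 1308864: 64 bytes past the image data
emit_linear_copy(state, reset_origin=True)
print(linear_write_end(state, 0, 3200, 409, 4))  # 1308800
```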
23:39 gfxstrand[d]: That’s very plausible
23:45 mohamexiety[d]: is this normal linear rendering, or is this one of the cases where we use the tiled shadow?
23:59 airlied: not rendering at all, it's just a texture upload; for some reason GTK chooses linear here