01:37 airlied: has anyone ever seen or heard of a shared memory allocator that works along the lines of a register allocator?
01:37 airlied: I've got a bunch of temporary internal shared memory usage, and I want to optimise its usage to the lifetimes of the users
05:12 karolherbst: airlied: on the spirv side it's all derefs, right? I wonder if it would be enough to track in which blocks a shared memory variable is alive and make lower_explicit_io a bit smarter there
05:16 airlied: karolherbst: this is all internal to the compiler, spir-v doesn't see it at all
05:16 karolherbst: ohhh, so it's for internally created variables? Well could do the same for those
05:17 airlied: yes but I'd like to have the variable reuse shared memory space if they don't overlap
05:17 karolherbst: yeah, but I don't think we have anything that tracks live ranges for variables
05:18 airlied: (of course I have to get it working without that first, but that would be the next step)
05:19 karolherbst: anyway, I don't think this is doable in an RA fashion, because you kinda have to do this before IO lowering the derefs
08:43 pendingchaos: airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33914
11:00 airlied: pendingchaos: indeed that is what I'm looking for, I'll take a closer look tomorrow
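[editor's note] The idea discussed above, letting shared memory variables reuse space when their lifetimes don't overlap, can be sketched as a first-fit interval allocator. This is only an illustration under assumptions: the `SharedVar` type, the block-index live ranges, and `assign_offsets` are hypothetical names, and this is not claimed to be what the linked merge request or any Mesa pass actually does.

```python
from dataclasses import dataclass


@dataclass
class SharedVar:
    # Hypothetical description of one internal shared-memory variable.
    name: str
    size: int   # size in bytes
    start: int  # first block index where the variable is live
    end: int    # last block index where the variable is live (inclusive)


def overlaps(a: SharedVar, b: SharedVar) -> bool:
    """True if the two live ranges share at least one block."""
    return a.start <= b.end and b.start <= a.end


def assign_offsets(variables, align=4):
    """First-fit offsets: variables with disjoint lifetimes may share bytes."""
    offsets = {}
    placed = []
    for var in sorted(variables, key=lambda v: (v.start, -v.size)):
        # Byte ranges already taken by variables live at the same time.
        busy = sorted(
            (offsets[o.name], offsets[o.name] + o.size)
            for o in placed
            if overlaps(var, o)
        )
        offset = 0
        for lo, hi in busy:
            if offset + var.size <= lo:
                break  # the gap before this busy range is large enough
            # Otherwise skip past the busy range, rounding up to `align`.
            offset = max(offset, -(-hi // align) * align)
        offsets[var.name] = offset
        placed.append(var)
    total = max((offsets[v.name] + v.size for v in variables), default=0)
    return offsets, total


if __name__ == "__main__":
    demo = [
        SharedVar("a", 64, 0, 2),
        SharedVar("b", 32, 1, 3),
        SharedVar("c", 64, 3, 5),
    ]
    print(assign_offsets(demo))
```

In the demo, `c` is only live after `a` is dead, so it lands on the offset `a` used and the total stays below the sum of all three sizes. A real implementation would need actual liveness information for variables, which, as noted above, is the missing piece.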
12:43 FireBurn: robclark: The A840 issue with KGSL - showing a checkerboard, I'm guessing it's some sort of tiling issue. I've tweaked a few places that I think might have been causing it, to no avail (microtile options). Any suggestions on where else tiling might be done? I'll tweak those next
13:21 karolherbst: pendingchaos: oh yeah, that looks like what I had in mind. I'd be interested to see something like that for NVK/NAK as well, but not sure how many shaders actually use enough shared memory to hurt occupancy besides a few very specific ones..
13:45 mareko: snowy_coder2: mediump IO is not fully enabled in radeonsi due to lots of CTS failures when it's enabled
13:46 mareko: are uregs in NAK the same as SGPRs?
13:48 mareko: snowy_coder2: the mediump IO implementation in both the GLSL compiler, NIR, and radeonsi may be missing things
14:05 snowy_coder: mareko: It seems (somewhat) working when trying radeonsi locally, it lowers outputs of FS
14:09 glehmann: mareko: uregs are poor man's sgprs because they can't be written in divergent control flow as far as I understand
14:10 glehmann: unlike sgprs, they are not used for booleans though, nv has separate predicate registers for those
14:18 mareko: snowy_coder: mediump VS inputs and FS outputs are supported; the thing that doesn't 100% work is everything between them
14:18 mareko: actually only FS outputs enable it
14:19 mareko: that's the least useful case of mediump
14:19 mareko: we also enable it for ALU and intrinsics
14:20 mareko: not shader IO though
14:21 snowy_coder: mareko: That figures, I tried to enable it and code-motion just pulled f2f32 back in the VS
14:22 snowy_coder: There's no other driver that does mediump varyings?
14:22 karolherbst: not sure if it's actually true that you can't write to UGPRs in non uniform control flow
14:22 karolherbst: but it does make writing a compiler harder if you assume you can
14:27 glehmann: I guess you can write them, but only in one path until the next convergence point or something like that?
14:30 robclark: FireBurn: you have something along the lines of https://gitlab.freedesktop.org/robclark/mesa/-/commit/2da45c609845854491c09bd26d9d8245834ea4af ?
14:38 mareko: snowy_coder: only radeonsi sets the lower_mediump_io callback, so no other driver does it in the GLSL linker, though some drivers might do it after linking
14:38 mareko: or some drivers might do it only for Vulkan
14:39 snowy_coder: mareko: Alright, thank you!
14:43 mareko: code motion pulling f2f32 back into the VS is OK as long as the total size of IO doesn't increase
14:48 mareko: one improvement would be to add another subpass into nir_opt_varyings that runs after code motion and reduces bit sizes of IO by looking for upconversions before output stores and bits used of input loads
15:55 snowy_coder: mareko: code-motion pulling back only f2f32 is a bit wasteful (even with the later pass, it would pull back f2f32, restore 32-bits for the IO, then re-lower the IO, recreate a f2f32 in the FS and optimize the f2f16-f2f32 VS pair away)
15:56 mareko: how is it wasteful?
15:56 snowy_coder: My general idea was for code-motion to "ignore" float conversions (unless it moves other instructions too)
15:57 snowy_coder: "how is it wasteful" Not wasteful, but a bit of redundant work? We can skip it if we're trying to move just a f2f32 (or i2i32)
16:02 mareko: we could change it to move only the src of upconversions
16:05 snowy_coder: what do you mean?
16:22 mareko: the code doesn't just move instructions, it moves a def-use graph identified by the deepest movable def-use post-dominator; if the post-dominator is an upconversion, it can instead say that the post-dominator is the src; then if the src is an input load, do nothing since there is nothing to move
16:25 FireBurn: robclark: Yes
16:28 snowy_coder: mareko: Yes, that's what I meant by "only f2f32", sorry.
16:28 mareko: basically if the post-dominator candidate for code motion is a unary upconversion, select the src of that to be the new candidate, and repeat until the candidate is not a unary upconversion since there could be a sequence of those
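[editor's note] The candidate selection described above can be sketched on a toy IR. This is an illustration only, assuming a hypothetical `Instr` class and `pick_motion_candidate` helper; it is not the code in nir_opt_varyings. A conversion counts as an upconversion here when it has one source and that source has a smaller bit size than the result.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Opcode prefixes treated as plain type-preserving conversions in this toy IR.
CONVERSION_PREFIXES = ("f2f", "i2i", "u2u")


@dataclass
class Instr:
    # Hypothetical toy instruction: opcode, result bit size, source defs.
    op: str
    bit_size: int
    srcs: List["Instr"] = field(default_factory=list)


def is_unary_upconversion(instr: Instr) -> bool:
    """One-source conversion whose result is wider than its source."""
    return (
        instr.op[:3] in CONVERSION_PREFIXES
        and len(instr.srcs) == 1
        and instr.srcs[0].bit_size < instr.bit_size
    )


def pick_motion_candidate(post_dom: Instr) -> Optional[Instr]:
    """Walk through a chain of unary upconversions to find what to move.

    Returns None when the chain ends at an input load, since there is
    nothing left to move in that case.
    """
    cand = post_dom
    while is_unary_upconversion(cand):
        cand = cand.srcs[0]  # repeat: there may be a sequence of upconversions
    if cand.op == "load_input":
        return None
    return cand


if __name__ == "__main__":
    load = Instr("load_input", 16)
    mul = Instr("fmul", 16, [load, load])
    print(pick_motion_candidate(Instr("f2f32", 32, [load])))      # nothing to move
    print(pick_motion_candidate(Instr("f2f32", 32, [mul])).op)    # moves the fmul
```

With this selection, a lone f2f32 on top of a 16-bit input load is left where it is, so the IO stays 16-bit and there is no need to re-lower it afterwards.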
16:29 wens: is anyone working on supporting a dedicated DMA device for GEM DMA allocations? i.e. a follow up to https://lore.kernel.org/dri-devel/20250307080836.42848-1-tzimmermann@suse.de/
16:30 snowy_coder: If you think that is the right direction for mediump support I could help with that, it doesn't seem too hard (I hope)
16:33 mareko: that change would benefit and be verifiable on all VK drivers with 16-bit IO that use nir_opt_varyings
16:38 robclark: FireBurn: hmm.. maybe upstream vs android kernel has some ubwc mismatch? Perhaps you could try rendering to and dumping an offscreen img to see if the corruption shows up there, or if it is just gpu vs dpu?
16:46 FireBurn: Do you mean like a screenshot?
16:46 robclark: yeah
16:47 FireBurn: I showed you one before
16:50 robclark: maybe there is a reduced config (2 working slices) a830? That would change some of the config (but also should change the chip_id)
17:14 dcbaker: hakzsam: I have two of your patches "radv/meta: fix partial depth/stencil resolves with compute" and "ac,radv,radeonsi: use correct swizzle/pitch for depth-only images with SDMA" queued for 25.3, but they don't apply cleanly and the diff is pretty significant. What would you like me to do with those?
17:15 hakzsam: dcbaker: you can denominate, they aren't super important to be backported as they don't fix real game issues
17:15 dcbaker: hakzsam: sounds good, thanks