07:31 mareko: if there is a way to get the IB pointer, I don't know about it
07:32 mareko: you can track IB execution by adding memory writes into the IB, indicating progress
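Editorial sketch of the progress-marker idea mentioned above. This is a CPU-side simulation only: the "packets" are Python tuples, and `write_marker`, `build_ib_with_markers` and `run_until_hang` are hypothetical names, not real PM4 packets or driver functions. It only illustrates how a marker written before each command lets you see how far execution got, and how much the packet count grows.

```python
# CPU-side simulation only: commands are strings, not real PM4 packets.
def build_ib_with_markers(commands):
    """Interleave a progress-marker write before every command."""
    ib = []
    for index, command in enumerate(commands):
        # Hypothetical packet: store `index` to memory the CPU can read later.
        ib.append(("write_marker", index))
        ib.append(("execute", command))
    return ib


def run_until_hang(ib, hang_on):
    """Walk the IB and return the last marker written before the walk stopped."""
    marker_memory = None  # stands in for the CPU-visible marker buffer
    for kind, payload in ib:
        if kind == "write_marker":
            marker_memory = payload
        elif payload == hang_on:
            break  # the simulated GPU stops making progress here
    return marker_memory


commands = ["set_state", "draw_0", "dispatch", "draw_1"]
ib = build_ib_with_markers(commands)
last_marker = run_until_hang(ib, hang_on="dispatch")

print(commands[last_marker])    # command that was in flight at the hang
print(len(ib) / len(commands))  # packet-count growth from the markers
```

In this toy model every command gains one extra packet, which is the kind of cost the next message asks about.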
11:46 Venemo: mareko: but that adds runtime overhead, doesn't it?
12:17 Venemo: I think the intention here would be to make it possible to more easily debug GPU hangs without too much overhead
12:17 Venemo: for that, we need some way to tell what the GPU is doing when the hang happens
12:19 Venemo: ideally, the kernel should be able to tell which IB was being executed and give us a pointer where the hang happened
12:20 Venemo: it would then give us this info before resetting the GPU
12:26 Venemo: this'd enable us to debug gpu hangs without disabling reset, and without adding overhead in the userspace driver
12:31 Venemo: if it's impossible to say what the GPU is doing in an IB, that is a bummer
14:14 tonyk: mareko: but in the HW something needs to be set as a program counter of the IB, right? or is this something behind the firmware?
14:20 agd5f: tonyk, I think all of these registers are really just "named scratch registers"; Just agreed upon places for the CP firmware to store certain things during operation. I can ask the CP FW team how they handle the IB walk.
14:21 tonyk: agd5f: I see, thanks!
14:24 mareko: there is CP_HQD_IB_RPTR, which should contain the IB offset being executed
14:25 mareko: in dwords
14:26 tonyk: mareko: thank you!
14:26 Venemo: mareko: does that work for IB2 too?
14:27 Venemo: agd5f: please do, if you can. this would make hang debugging so so much easier
14:33 mareko: Venemo: supposedly yes, though it's only a 20-bit dword offset
14:33 Venemo: we don't allow IBs that are bigger than that
14:34 Venemo: the reason I'm so enthused about this, is that essentially we could do away with the hack that is RADV_DEBUG=hang and we could ask users for logs without having them disable GPU resets or slowing down their game performance
14:38 mareko: there are also state registers CP_IB1_BASE_LO/HI, CP_IB2_BASE_LO/HI containing IB base pointers
14:40 tonyk: and will CP_HQD_IB_RPTR also move when the engine processes a shader (even if it's not exactly an IB)? or will it stop at the DRAW command?
14:41 mareko: shaders have nothing to do with CP
14:43 mareko: you can use CP_HQD_IB_CONTROL.PROCESSING_IB to tell whether the ring or IB is being executed, but I don't see a way to distinguish between IB1 and IB2 execution; if CP_IB2_BUFSZ == 0, IB2 might not be executing
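Editorial sketch of how the registers named above might be combined once their values have been captured. Several things here are assumptions, not confirmed by the discussion: that the values are already available in a dictionary (how they are read is out of scope), that BASE_LO/HI hold the plain low and high 32 bits of the address, and that the offset is relative to the IB base. The function name `locate_hang` and the dictionary keys are hypothetical.

```python
IB_RPTR_MASK = (1 << 20) - 1  # 20-bit dword offset, as stated in the discussion
BYTES_PER_DWORD = 4


def ib_base(lo, hi):
    """Combine LO/HI halves into one address (assumes plain 32-bit halves)."""
    return (hi << 32) | lo


def locate_hang(regs):
    """Return a list of (name, address) candidates for the hang location."""
    if not regs["processing_ib"]:
        return [("ring", None)]  # the ring, not an IB, is being executed

    byte_offset = (regs["ib_rptr"] & IB_RPTR_MASK) * BYTES_PER_DWORD
    ib1 = ("IB1", ib_base(regs["ib1_base_lo"], regs["ib1_base_hi"]) + byte_offset)

    # A zero IB2 size only suggests that IB2 is not executing.
    if regs["ib2_bufsz"] == 0:
        return [ib1]

    # Otherwise IB1 and IB2 cannot be told apart, so report both candidates.
    ib2 = ("IB2", ib_base(regs["ib2_base_lo"], regs["ib2_base_hi"]) + byte_offset)
    return [ib1, ib2]


# Made-up register values, for illustration only.
regs = {
    "processing_ib": True,
    "ib_rptr": 0x40,
    "ib1_base_lo": 0x00100000, "ib1_base_hi": 0x80,
    "ib2_base_lo": 0x00200000, "ib2_base_hi": 0x80,
    "ib2_bufsz": 0,
}
for name, address in locate_hang(regs):
    print(name, hex(address))
```

Returning a list of candidates rather than a single address reflects the point that IB1 and IB2 execution cannot be reliably distinguished from these registers alone.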
14:47 tonyk: mareko: right, what should I look for to know more about the current shader being executed?
14:50 mareko: umr can print active waves
14:51 mareko: see " -wa" in mesa
14:52 mareko: the hw is capable of executing (32 * numCUs) shaders at the same time
14:52 mareko: that's a lot of shaders
14:59 tonyk: and what's a wave?
15:20 emersion: "R32G32B32 is a weird format and the driver currently only supports the barely minimum."
15:20 emersion: how hard would it be to support 3D images?
15:20 emersion: in radv?
16:18 emersion: (FWIW, radeonsi supports GL_RGB32F)
16:18 emersion: my usage is applying a 3D LUT for color management purposes
16:21 glehmann: is there a reason why you couldn't use RGBA32 instead?
16:21 pendingchaos: I don't think the hardware supports that format, so not easily
16:25 emersion: glehmann: my LUT only has 3 entries
16:25 emersion: RGB, no A
16:26 emersion: i suppose i could use R32 instead?
16:26 emersion: and manually do the addressing?
16:27 pendingchaos: if you're manually doing linear addressing, radv apparently supports linear sampling of r32g32b32 images
16:27 MrCooper: then you can't use texture filtering for interpolation though
16:27 pendingchaos: right, I forgot vulkan has a separate bit for that
16:28 pendingchaos: except radv still has that? FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT
16:29 MrCooper: crossing wires :) I was thinking of using 3 R32 texels next to each other for each component, but I guess you could use separate images per component
16:29 pendingchaos: if you want optimal tiling and can't use rgba for some reason, maybe you can do two separate r32g32 and r32 images
16:30 emersion: hm right interpolation
16:30 emersion: i really do want interpolation
16:31 emersion: so… i'd need to split off each channel
16:44 glehmann: emersion: having one unused channel doesn't hurt tho, does it?
16:44 emersion: yeah, maybe i'll do just that
16:44 glehmann: that's probably also what radeonsi does internally
22:46 mareko: tonyk: a wave is a subgroup in Vulkan
22:46 mareko: or SPIR-V
22:47 mareko: emersion: radeonsi will give you RGBX32F, 2 separate textures (RG32F and R32F) would be better because RGBX32F wastes 25% memory bandwidth
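Editorial sketch of the arithmetic behind the 25% figure, assuming 4 bytes per 32-bit float channel and ignoring tiling, padding and caching. The 33x33x33 LUT size is a hypothetical example, not something stated in the discussion.

```python
BYTES_PER_CHANNEL = 4  # one 32-bit float


def bytes_per_texel(channels):
    """Raw size of one texel with the given number of 32-bit channels."""
    return channels * BYTES_PER_CHANNEL


rgbx = bytes_per_texel(4)                        # single RGBX32F texture
split = bytes_per_texel(2) + bytes_per_texel(1)  # RG32F texture + R32F texture
wasted_fraction = (rgbx - split) / rgbx          # share spent on the unused X

lut_texels = 33 ** 3  # hypothetical 33x33x33 LUT
print(rgbx, split, wasted_fraction)
print(lut_texels * rgbx, lut_texels * split)  # total bytes for each layout
```

The trade-off is that the split layout needs two texture fetches per lookup instead of one.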