07:31mareko: if there is a way to get the IB pointer, I don't know about it
07:32mareko: you can track IB execution by adding memory writes into the IB, indicating progress
11:46Venemo: mareko: but that adds runtime overhead, doesn't it?
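The marker approach mentioned above, and the overhead it costs, can be illustrated with a toy model. This is a minimal sketch and not real PM4: the `toy_*` names, the command layout, and the one-marker-per-draw policy are all hypothetical, chosen only to show how a memory write after each draw lets you read back how far execution got.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy command stream, NOT real PM4: every name here is hypothetical. */
enum toy_op { TOY_DRAW, TOY_WRITE_MARKER };

struct toy_cmd {
    enum toy_op op;
    uint32_t value; /* marker id for TOY_WRITE_MARKER, unused for TOY_DRAW */
};

/* Build a stream of num_draws draws, each followed by a marker write.
 * Returns the number of commands emitted: two per draw, which is the
 * extra work the marker scheme adds. */
static size_t toy_build_ib(struct toy_cmd *out, uint32_t num_draws)
{
    size_t n = 0;
    for (uint32_t i = 0; i < num_draws; i++) {
        out[n++] = (struct toy_cmd){ TOY_DRAW, 0 };
        out[n++] = (struct toy_cmd){ TOY_WRITE_MARKER, i + 1 };
    }
    return n;
}

/* Simulate execution in which the command at index hang_at never
 * completes. Only the marker memory survives for inspection afterwards. */
static void toy_execute(const struct toy_cmd *ib, size_t count,
                        size_t hang_at, volatile uint32_t *marker_mem)
{
    for (size_t i = 0; i < count && i < hang_at; i++) {
        if (ib[i].op == TOY_WRITE_MARKER)
            *marker_mem = ib[i].value;
    }
}
```

With three draws and a hang at command index 4 (the third draw), the marker memory ends up holding 2, so the last draw known to have finished is the second one.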
12:17Venemo: I think the intention here would be to make it possible to more easily debug GPU hangs without too much overhead
12:17Venemo: for that, we need some way to tell what the GPU is doing when the hang happens
12:19Venemo: ideally, the kernel should be able to tell which IB was being executed and give us a pointer where the hang happened
12:20Venemo: it would then give us this info before resetting the GPU
12:26Venemo: this'd enable us to debug gpu hangs without disabling reset, and without adding overhead in the userspace driver
12:31Venemo: if it's impossible to say what the GPU is doing in an IB, that is a bummer
14:14tonyk: mareko: but in the HW something needs to be set as a program counter of the IB, right? or is this something behind the firmware?
14:20agd5f: tonyk, I think all of these registers are really just "named scratch registers"; Just agreed upon places for the CP firmware to store certain things during operation. I can ask the CP FW team how they handle the IB walk.
14:21tonyk: agd5f: I see, thanks!
14:24mareko: there is CP_HQD_IB_RPTR, which should contain the IB offset being executed
14:25mareko: in dwords
14:26tonyk: mareko: thank you!
14:26Venemo: mareko: does that work for IB2 too?
14:27Venemo: agd5f: please do, if you can. this would make hang debugging so so much easier
14:33mareko: Venemo: supposedly yes, though it's only a 20-bit dword offset
14:33Venemo: we don't allow IBs that are bigger than that
14:34Venemo: the reason I'm so enthused about this, is that essentially we could do away with the hack that is RADV_DEBUG=hang and we could ask users for logs without having them disable GPU resets or slowing down their game performance
14:38mareko: there are also state registers CP_IB1_BASE_LO/HI, CP_IB2_BASE_LO/HI containing IB base pointers
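Combining the registers named above into a single address could look roughly like this. It is a sketch under stated assumptions: that the LO/HI pair holds a plain 64-bit base with no reserved bits, that CP_HQD_IB_RPTR is relative to that base, and that only its low 20 bits are meaningful. None of this is confirmed in the discussion beyond "20-bit dword offset", and the function names are made up for illustration.

```c
#include <stdint.h>

/* 20-bit dword offset, as described in the discussion. */
#define IB_RPTR_DW_MASK 0xFFFFFu

/* Assumption: LO/HI are the low and high 32 bits of the IB base address. */
static uint64_t ib_base_address(uint32_t base_lo, uint32_t base_hi)
{
    return ((uint64_t)base_hi << 32) | base_lo;
}

/* Assumption: the read pointer is a dword offset from the IB base,
 * so it is multiplied by 4 to get a byte offset. */
static uint64_t ib_current_address(uint32_t base_lo, uint32_t base_hi,
                                   uint32_t ib_rptr)
{
    uint64_t dw = ib_rptr & IB_RPTR_DW_MASK;
    return ib_base_address(base_lo, base_hi) + dw * 4u;
}
```

For example, a base of 0x1_0000_1000 with a read pointer of 0x10 dwords gives 0x1_0000_1040 under these assumptions.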
14:40tonyk: and the CP_HQD_IB_RPTR will also move when the engine processes a shader (even if it's not exactly an IB)? or will it stop at the DRAW command?
14:41mareko: shaders have nothing to do with CP
14:43mareko: you can use CP_HQD_IB_CONTROL.PROCESSING_IB to tell whether the ring or IB is being executed, but I don't see a way to distinguish between IB1 and IB2 execution; if CP_IB2_BUFSZ == 0, IB2 might not be executing
14:47tonyk: mareko: right, what should I look for to know more about the current shader being executed?
14:50mareko: umr can print active waves
14:51mareko: see " -wa" in mesa
14:52mareko: the hw is capable of executing (32 * numCUs) shaders at the same time
14:52mareko: that's a lot of shaders
14:59tonyk: and what's a wave?
15:20emersion: "R32G32B32 is a weird format and the driver currently only supports the barely minimum."
15:20emersion: how hard would it be to support 3D images?
15:20emersion: in radv?
16:18emersion: (FWIW, radeonsi supports GL_RGB32F)
16:18emersion: my usage is applying a 3D LUT for color management purposes
16:21glehmann: is there a reason why you couldn't use RGBA32 instead?
16:21pendingchaos: I don't think the hardware supports that format, so not easily
16:25emersion: glehmann: my LUT only has 3 entries
16:25emersion: RGB, no A
16:26emersion: i suppose i could use R32 instead?
16:26emersion: and manually do the addressing?
16:27pendingchaos: if you're manually doing linear addressing, radv apparently supports linear sampling of r32g32b32 images
16:27MrCooper: then you can't use texture filtering for interpolation though
16:27pendingchaos: right, I forgot vulkan has a separate bit for that
16:28pendingchaos: except radv still has that? FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT
16:29MrCooper: crossing wires :) I was thinking of using 3 R32 texels next to each other for each component, but I guess you could use separate images per component
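The "3 R32 texels next to each other" layout implies manual addressing along these lines. This is only a sketch: the x-fastest ordering, the function name, and the cube size `n` are assumptions, not anything radv or the application prescribes.

```c
#include <stddef.h>

/* Flat index of channel c (0 = R, 1 = G, 2 = B) of LUT entry (x, y, z)
 * in a buffer of R32 values, with n entries per axis, x varying fastest
 * and the three channels of one entry stored next to each other. */
static size_t lut_r32_index(size_t n, size_t x, size_t y, size_t z, size_t c)
{
    return ((z * n + y) * n + x) * 3 + c;
}
```

Because neighbouring texels in such a buffer belong to different channels, hardware filtering cannot interpolate it; a trilinear lookup would have to fetch the 8 surrounding entries per channel and blend them in the shader.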
16:29pendingchaos: if you want optimal tiling and can't use rgba for some reason, maybe you can do two separate r32g32 and r32 images
16:30emersion: hm right interpolation
16:30emersion: i really do want interpolation
16:31emersion: so… i'd need to split off each channel
16:44glehmann: emersion: having one unused channel doesn't hurt tho, does it?
16:44emersion: yeah, maybe i'll do just that
16:44glehmann: that's probably also what radeonsi does internally
22:46mareko: tonyk: a wave is a subgroup in Vulkan
22:46mareko: or SPIR-V
22:47mareko: emersion: radeonsi will give you RGBX32F, 2 separate textures (RG32F and R32F) would be better because RGBX32F wastes 25% memory bandwidth
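The 25% figure follows from the texel sizes: RGBX32F is 16 bytes per texel with 4 of them unused, while RG32F plus R32F is 8 + 4 = 12 bytes. A small sketch of the arithmetic, where the helper names and the 33-entry cube used in the example are illustrative assumptions:

```c
#include <stdint.h>

/* Bytes per texel for the 32-bit float formats discussed above. */
enum { RGBX32F_BYTES = 16, RG32F_BYTES = 8, R32F_BYTES = 4 };

/* Size in bytes of an n x n x n 3D texture with the given texel size. */
static uint64_t lut_bytes(uint64_t n, uint64_t bytes_per_texel)
{
    return n * n * n * bytes_per_texel;
}

/* One RGBX32F texture, with the X channel unused. */
static uint64_t lut_bytes_rgbx(uint64_t n)
{
    return lut_bytes(n, RGBX32F_BYTES);
}

/* Two textures: RG32F for red/green plus R32F for blue. */
static uint64_t lut_bytes_split(uint64_t n)
{
    return lut_bytes(n, RG32F_BYTES) + lut_bytes(n, R32F_BYTES);
}
```

For a 33x33x33 LUT this gives 574992 bytes as RGBX32F versus 431244 bytes split, a difference of exactly one quarter of the RGBX32F size. The actual bandwidth cost also depends on caching and tiling, which this does not model.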