14:10karolherbst: DMA_DATA with SRC_ADDR and DST_ADDR being 0 will probably be a reason the GPU ran into memory faults, right? Or is it a "it depends" situation?
14:12hakzsam: it should indeed hang, unless the src/dst is IMM I think
14:21karolherbst: I mean it does, just wondering if that's the reason or if it could be something else
14:22hakzsam: this is likely the reason, including the page fault
14:22karolherbst: yeah.. I have 4 faults and the pattern is 4 times within the ib, so that matches. Anyway, thanks, guess I'll do more debugging then
16:37karolherbst: apparently src and dst being 0 is fine if 0 bytes are copied and I missed that part
16:41hakzsam: yeah, the packet is probably skipped, but there is no point in emitting that :)
16:44karolherbst: apparently radeonsi uses it to force a wait for idle
16:45hakzsam: for CP DMA presumably
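
[editor's note] The rule worked out above (a zero address only matters when bytes are actually copied, and not when the operand is an immediate) can be written as a small predicate. This is a rough sketch only: `struct dma_copy`, its fields, and `dma_copy_may_fault` are hypothetical names made up for illustration, and they do not reflect the real DMA_DATA packet layout or any Mesa function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one copy; NOT the real PM4 packet layout. */
struct dma_copy {
    uint64_t src_addr;
    uint64_t dst_addr;
    uint32_t byte_count;
    bool src_is_imm; /* assumption: operand is immediate data, not memory */
    bool dst_is_imm; /* assumption: mirrors the "src/dst is IMM" remark */
};

/* Returns true if the copy would dereference address 0. */
static bool dma_copy_may_fault(const struct dma_copy *c)
{
    /* Zero bytes copied: per the discussion, the packet is probably skipped. */
    if (c->byte_count == 0)
        return false;
    if (!c->src_is_imm && c->src_addr == 0)
        return true;
    if (!c->dst_is_imm && c->dst_addr == 0)
        return true;
    return false;
}
```

With this model, a packet with both addresses 0 and a byte count of 0 is harmless, while the same packet with a nonzero byte count is flagged.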
17:33Venemo: karolherbst: see radv_cp_dma_wait_for_idle. but what are you trying to do exactly?
17:34karolherbst: debugging my SVM MR
17:34Venemo: SVM = ?
17:34karolherbst: shared virtual memory: GPU buffers have the same address as an allocation on the host side
17:34Venemo: ah, I assume that is an OpenCL feature?
17:34karolherbst: yeah
17:34Venemo: got it
17:35karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32942 in case you are interested
17:35karolherbst: tldr: frontend needs to assign addresses to pipe_resources soo.. that's a lot of fun
17:35Venemo: so you're saying that there is a test case for SVM which causes radeonsi to hang?
17:35karolherbst: yeah, but it's most likely my fault
17:35karolherbst: just need to figure out what's wrong
17:36Venemo: I don't think I'm competent enough in gallium to judge that code :(
17:36karolherbst: that's the last IB submitted: https://gist.githubusercontent.com/karolherbst/7e7f2b82210bfa7fbb8cbf9a74c74a3b/raw/51db943f0f09a833bb140e92e61c43b9515cc108/gistfile1.txt
17:37karolherbst: anyway.. probably something weird going on
17:39karolherbst: happens with si_copy_buffer or maybe the previous si_clear_buffer call is messed up or so.. I'm sure there is a resource having a NULL address for whatever reason.. oh well
17:39Venemo: you could try turning on tracing in the IBs, it seems like it's hooked up to PIPE_CONTEXT_DEBUG - then it will be able to tell you until which point in the IB it executed your commands
17:40karolherbst: one of the issues I'm facing is that drivers might throw away the bo under certain circumstances, and that could end up messing up the address. I suspect something like that is going on here
17:41Venemo: that tracing thing would help pinpoint where the problem is
17:42Venemo: is the kernel able to soft-recover from this hang, or does it bring down your system?
17:42Venemo: also, do you see any page faults or something like that in your dmesg logs?
17:44karolherbst: soft-recover
17:44karolherbst: and it faults on a NULL pointer
17:46karolherbst: if I flush more often it works perfectly fine btw
17:49karolherbst: "in page starting at address 0x0000000000000000 from client 0x1b (UTCL2)" what's UTCL2?
17:51karolherbst: ohh yeah.. one of the ssbos in si_compute_clear_copy_buffer has a gpu_address of 0
17:51karolherbst: so I guess I need to figure out where that's coming from
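
[editor's note] One way to catch this kind of bug earlier is to scan the buffer bindings for a zero address before the dispatch is emitted. The sketch below is an assumption-laden illustration: `struct buf_binding` and `first_null_binding` are hypothetical and are not part of radeonsi or gallium.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bound buffer; not a real gallium type. */
struct buf_binding {
    uint64_t gpu_address;
    uint32_t size;
};

/* Returns the index of the first non-empty binding whose GPU address is 0,
 * or -1 if every non-empty binding has a nonzero address. */
static int first_null_binding(const struct buf_binding *b, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Empty bindings are ignored: nothing would be read or written. */
        if (b[i].size != 0 && b[i].gpu_address == 0)
            return (int)i;
    }
    return -1;
}
```

Calling such a check at the point where the buffers are bound would report which slot carries the bad address, instead of leaving it to the page fault in dmesg.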