16:37 zmike: mareko: I'm looking at deleting buffer refcounting, but this is increasingly difficult as I get into multi-context binds
16:38 zmike: specifically the non-invalidate scenario for bufferobj_data where the buffer is bound on multiple contexts
16:39 zmike: every solution I come up with for this seems varying degrees of gross
19:42 MandiTwo: Hi! For the issue I asked about a couple of days ago, should I open a GitLab issue on the mesa repo?
19:57 ity: Does a DRM kernel driver need to do something special to support DRM Leasing?
20:04 mareko: is it normal that VK CTS takes several hours on a good GPU?
20:05 mareko: zmike: the multi-context stuff is tricky as multiple contexts can reference the same resource
20:06 airlied: mareko: yes
20:06 airlied: mareko: are you running it with deqp-runner?
20:07 airlied: though even with deqp-runner and a really good CPU it can take 45mins to 8 hrs
20:08 zmike: mareko: yeah that's what I mean
20:09 zmike: and yeah for vkcts you definitely want as many cores/threads as you can because almost no tests are gpu-bound
20:09 zmike: it's just millions of tests
20:12 mareko: deqp-runner
20:17 pendingchaos: mareko: if you're using too many threads, I think there's a kernel bug after 6.6.14 that can make it really slow
20:18 pendingchaos: IIRC I can run it in under an hour with 16 threads, but it was estimated to take hours with 32 threads
20:20 pendingchaos: found it: https://gitlab.freedesktop.org/drm/amd/-/issues/4260
20:23 daniels: MandiTwo: yeah, filing issues is much better than IRC
21:25 mareko: pendingchaos: I've replied on the issue with a possible solution
21:40 airlied: vk cts also suffers from a readback-from-VRAM problem; it's been fixed for some tests, but multisample resolve tests still hit it pretty hard
22:34 karolherbst: running compute shaders in the kernel? nice
22:36 mareko: it already runs compute shaders for some special cases
22:38 mareko: I have a hunch that any new compute shaders in the amdgpu module will have to be written and enabled by the community
22:44 karolherbst: mhh, I wonder if the kernel can be smarter about clearing pages there...
22:44 karolherbst: like is that for not leaking VRAM to other processes, or does this have another use case?
22:44 karolherbst: it's not 100% clear to me what this is used for
22:46 airlied: not leaking VRAM is the main one
22:47 karolherbst: I see..
22:48 karolherbst: haven't really dug into the code, but if it's not caching uncleared allocations for reuse in the same process, that might be a way to speed things up
22:48 mareko: CTS likely spawns a new process for every test group
22:49 airlied: deqp-runner runs a bunch of tests, but I think they all open a new kernel fd
22:50 mareko: libdrm_amdgpu could keep device FDs open forever and reuse
22:51 mareko: but the kernel shouldn't clear with SDMA in the first place
22:51 airlied: sdma is such a trap :-P
22:52 mareko: it's just a design limit
22:53 airlied: the kernel has compute shaders to handle LDS cleaning already
22:53 mareko: SDMA can do about 60-100 GB/s, enough for PCIe, but the big cache and thus compute can do a few TB/s
22:54 airlied: just needs a bit of overclock :-P
22:54 mareko: and without the big cache, it's about 500-900 GB/s on big GPUs
22:56 mareko: the compute clears could be set to bypass all caches, so that parallel workloads are minimally affected
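To put the throughput figures quoted above in perspective, here is a rough back-of-the-envelope sketch of how long a clear would take at each rate. The 1 GiB buffer size and the helper name `clear_time_ms` are assumptions made for illustration; the rates are the approximate ranges mentioned in the discussion, not measurements.

```python
GIB = 1024 ** 3

# Approximate throughput figures quoted in the discussion, in GB/s (decimal).
# These are rough ranges from the chat, not benchmark results.
RATES_GB_PER_S = {
    "SDMA (low end)": 60,
    "SDMA (high end)": 100,
    "compute without the big cache (low end)": 500,
    "compute without the big cache (high end)": 900,
}


def clear_time_ms(size_bytes, rate_gb_per_s):
    """Return the idealized time in milliseconds to clear size_bytes
    at a sustained rate of rate_gb_per_s (1 GB = 1e9 bytes)."""
    return size_bytes / (rate_gb_per_s * 1e9) * 1e3


def main():
    size = 1 * GIB  # hypothetical allocation size, chosen for illustration
    for name, rate in RATES_GB_PER_S.items():
        print(f"{name}: {clear_time_ms(size, rate):.2f} ms per GiB")


if __name__ == "__main__":
    main()
```

This ignores launch overhead, contention with other workloads, and cache effects, so it only shows the order-of-magnitude gap between the two paths.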
23:14 karolherbst: airlied: I wonder if we need strict isolation between fds or if it's fine only between processes, but I can see why that's a bit ugly to ensure on the kernel side
23:16 airlied: yes doing anything per-process is just a nightmare, and then you have web browsers so it all goes out the window
23:18 karolherbst: they already have a workaround in place for it anyway
23:18 karolherbst: but yeah...
23:49 mareko: also qemu is 1 process / many FDs
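The idea floated above (skip the clear when a freed page is reused by the same owner) and the question of whether the "owner" should be a process or an fd can be illustrated with a toy model. This is only a sketch: `VramPool`, its methods, and the owner keys are hypothetical names invented here, and nothing below reflects how the amdgpu kernel driver actually manages VRAM.

```python
class VramPool:
    """Toy page pool that clears a page only when it changes owner.

    The 'owner' is an opaque key. Using a process ID models per-process
    isolation; using a (process ID, fd) pair models per-fd isolation.
    """

    def __init__(self, num_pages):
        # Each free entry is (page_number, last_owner). None means the
        # page is already clean and can go to anyone without a clear.
        self.free_pages = [(page, None) for page in range(num_pages)]
        self.clears = 0

    def alloc(self, owner):
        # Prefer a page this owner used last: no clear is needed because
        # the owner can only see its own stale data.
        for i, (page, last_owner) in enumerate(self.free_pages):
            if last_owner == owner:
                del self.free_pages[i]
                return page
        # Otherwise take any free page and clear it if someone else
        # left data behind.
        page, last_owner = self.free_pages.pop()
        if last_owner is not None:
            self.clears += 1
        return page

    def free(self, page, owner):
        # Defer the clear: remember who dirtied the page.
        self.free_pages.append((page, owner))


def count_clears(owner_keys):
    """Allocate and free one page per owner key, in order, from a
    one-page pool, and return how many clears were required."""
    pool = VramPool(1)
    for key in owner_keys:
        page = pool.alloc(key)
        pool.free(page, key)
    return pool.clears


if __name__ == "__main__":
    # One process (pid 100) opening three fds, as in the qemu case.
    per_process = count_clears([100, 100, 100])
    per_fd = count_clears([(100, 3), (100, 4), (100, 5)])
    print("clears with per-process keys:", per_process)
    print("clears with per-fd keys:", per_fd)
```

In this model, keying on the process avoids every clear for a one-process, many-fd workload, but it also means data can pass between fds that may belong to different tenants inside that process, which is the concern raised for web browsers and qemu.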