16:37 zmike: mareko: I'm looking at deleting buffer refcounting, but this is increasingly difficult as I get into multi-context binds
16:38 zmike: specifically the non-invalidate scenario for bufferobj_data where the buffer is bound on multiple contexts
16:39 zmike: every solution I come up with for this seems varying degrees of gross
19:42 MandiTwo: Hi! For the issue I asked about a couple of days ago, should I open a GitLab issue on the Mesa repo?
19:57 ity: Does a DRM kernel driver need to do something special to support DRM Leasing?
20:04 mareko: is it normal that VK CTS takes several hours on a good GPU?
20:05 mareko: zmike: the multi-context stuff is tricky as multiple contexts can reference a single resource
20:05 mareko: *single=the game
20:05 mareko: *same
20:06 airlied: mareko: yes
20:06 airlied: mareko: are you running it with deqp-runner?
20:07 airlied: though even with deqp-runner and a really good CPU it can take 45 mins to 8 hrs
20:08 zmike: mareko: yeah that's what I mean
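For readers less familiar with why buffer lifetime gets hard once several contexts are involved, here is a minimal C sketch of reference counting a buffer that more than one context binds at the same time. Everything in it (`struct resource`, `struct context`, `resource_ref`, `resource_unref`, `context_bind`) is hypothetical and is not Mesa's actual implementation; it assumes C11 atomics and a single binding slot per context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical buffer storage that several contexts may bind at once. */
struct resource {
    atomic_int refcount;
    size_t size;
};

static struct resource *resource_create(size_t size)
{
    struct resource *res = malloc(sizeof(*res));
    if (!res)
        return NULL;
    atomic_init(&res->refcount, 1); /* the creator's reference */
    res->size = size;
    return res;
}

static void resource_ref(struct resource *res)
{
    atomic_fetch_add_explicit(&res->refcount, 1, memory_order_relaxed);
}

/* Drops one reference; returns 1 if it was the last one and the storage was freed. */
static int resource_unref(struct resource *res)
{
    int old = atomic_fetch_sub_explicit(&res->refcount, 1, memory_order_acq_rel);
    assert(old > 0);
    if (old == 1) {
        free(res);
        return 1;
    }
    return 0;
}

/* Hypothetical per-context binding point (e.g. one vertex-buffer slot). */
struct context {
    struct resource *bound;
};

/* Takes a reference on the new buffer before dropping the old one,
 * so rebinding the same buffer never frees it by accident. */
static void context_bind(struct context *ctx, struct resource *res)
{
    if (res)
        resource_ref(res);
    if (ctx->bound)
        resource_unref(ctx->bound);
    ctx->bound = res;
}
```

The sketch only covers lifetime: old storage stays alive while any context still has it bound. It does nothing to make the other contexts pick up new storage when a buffer's contents are replaced without invalidation, which is the harder multi-context case discussed above.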
20:09 zmike: and yeah for vkcts you definitely want as many cores/threads as you can because almost no tests are GPU-bound
20:09 zmike: it's just millions of tests
20:12 mareko: deqp-runner
20:17 pendingchaos: mareko: if you're using too many threads, I think there's a kernel bug after 6.6.14 that can make it really slow
20:18 pendingchaos: IIRC I can run it in under an hour with 16 threads, but it was estimated to take hours with 32 threads
20:20 pendingchaos: found it: https://gitlab.freedesktop.org/drm/amd/-/issues/4260
20:23 daniels: MandiTwo: yeah, filing issues is much better than IRC
21:25 mareko: pendingchaos: I've replied on the issue with a possible solution
21:40 airlied: VK CTS also suffers from a readback-from-VRAM problem; it's been fixed for some tests, but multisample resolve tests still hit it pretty hard
22:34 karolherbst: running compute shaders in the kernel? nice
22:36 mareko: it already runs compute shaders for some special cases
22:38 mareko: I have a hunch that any new compute shaders in the amdgpu module will have to be written and enabled by the community
22:44 karolherbst: mhh, I wonder if the kernel can be smarter about clearing pages there...
22:44 karolherbst: like, is that for not leaking VRAM to other processes, or does it have another use case?
22:44 karolherbst: it's not 100% clear to me what this is used for
22:46 airlied: not leaking VRAM is the main one
22:47 karolherbst: I see..
22:48 karolherbst: haven't really dug into the code, but if it's not caching uncleared allocations for reuse in the same process, that might be a way to speed things up
22:48 mareko: CTS likely spawns a new process for every test group
22:49 airlied: deqp-runner runs a bunch of tests, but I think they all open a new kernel fd
22:50 mareko: libdrm_amdgpu could keep device FDs open forever and reuse them
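karolherbst's idea of reusing uncleared allocations for the same owner can be sketched as a small freed-block cache in C. This is a hypothetical user-space model, not amdgpu kernel code: `owner` stands in for whatever isolation domain is required (a process ID, or a DRM file descriptor for stricter per-FD isolation), blocks are matched by exact size for simplicity, and plain `memset`/`calloc` stand in for a GPU clear.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical freed-block cache; "owner" is the isolation domain
 * (e.g. a process ID or a DRM file descriptor). */
#define CACHE_SLOTS 8

struct block {
    void *mem;
    size_t size;
    int owner;
};

struct block_cache {
    struct block slots[CACHE_SLOTS];
    size_t count;
    size_t clears; /* allocations that had to hand out zeroed memory */
};

/* A block freed by the same owner is returned as-is (no clear); a block from
 * another owner is zeroed first so no data leaks across owners. */
static void *cache_alloc(struct block_cache *c, size_t size, int owner)
{
    assert(size > 0);
    for (size_t i = 0; i < c->count; i++) {
        if (c->slots[i].owner == owner && c->slots[i].size == size) {
            void *mem = c->slots[i].mem;
            c->slots[i] = c->slots[--c->count]; /* remove by swapping in the last slot */
            return mem;
        }
    }
    for (size_t i = 0; i < c->count; i++) {
        if (c->slots[i].size == size) {
            void *mem = c->slots[i].mem;
            c->slots[i] = c->slots[--c->count];
            memset(mem, 0, size); /* different owner: must clear */
            c->clears++;
            return mem;
        }
    }
    c->clears++;
    return calloc(1, size); /* fresh memory is handed out zeroed as well */
}

/* Keeps the block (remembering its owner) if there is room, else frees it. */
static void cache_free(struct block_cache *c, void *mem, size_t size, int owner)
{
    if (c->count == CACHE_SLOTS) {
        free(mem);
        return;
    }
    c->slots[c->count++] = (struct block){ mem, size, owner };
}
```

The saving only materializes when the same owner frees and then re-allocates, so it helps little if every test group starts a new process or opens a new FD, which is why keeping device FDs open comes up here.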
22:51 mareko: but the kernel shouldn't clear with SDMA in the first place
22:51 airlied: sdma is such a trap :-P
22:52 mareko: it's just a design limit
22:53 airlied: the kernel has compute shaders to handle LDS cleaning already
22:53 mareko: SDMA can do about 60-100 GB/s, enough for PCIe, but the big cache and thus compute can do a few TB/s
22:54 airlied: just needs a bit of overclock :-P
22:54 mareko: and without the big cache, it's about 500-900 GB/s on big GPUs
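To put the quoted bandwidths in perspective, here is a back-of-the-envelope C sketch of idealized clear times. It uses the 60-100 GB/s SDMA and 500-900 GB/s compute figures from the discussion, assumes 2 TB/s as an illustrative stand-in for "a few TB/s", and ignores submission overhead and whether the buffer actually fits in the cache. `clear_time_ms` and `print_clear_times` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Idealized time to clear `bytes` at a sustained rate of `gb_per_s`
 * (1 GB = 1e9 bytes), ignoring submission and setup overhead. */
static double clear_time_ms(double bytes, double gb_per_s)
{
    assert(gb_per_s > 0.0);
    return bytes / (gb_per_s * 1e9) * 1e3;
}

static void print_clear_times(double bytes)
{
    static const struct { const char *engine; double gb_per_s; } rates[] = {
        { "SDMA (low)",             60.0 },
        { "SDMA (high)",           100.0 },
        { "compute, no big cache", 500.0 },
        { "compute, no big cache", 900.0 },
        { "compute, big cache",   2000.0 }, /* assumed value for "a few TB/s" */
    };
    for (size_t i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        printf("%-24s %6.0f GB/s -> %6.2f ms\n", rates[i].engine,
               rates[i].gb_per_s, clear_time_ms(bytes, rates[i].gb_per_s));
}
```

For a 1 GB clear this works out to about 10 to 17 ms on SDMA, versus roughly 1.1 to 2 ms for compute at 500-900 GB/s and 0.5 ms at the assumed 2 TB/s.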
22:56 mareko: the compute clears could be set to bypass all caches, so that parallel workloads are minimally affected
23:14 karolherbst: airlied: I wonder if we need strict isolation between fds or if isolation only between processes is fine, but I can see why that's a bit ugly to ensure on the kernel side
23:16 airlied: yes doing anything per-process is just a nightmare, and then you have web browsers so it all goes out the window
23:18 karolherbst: they already have a workaround in place for it anyway
23:18 karolherbst: but yeah...
23:49 mareko: also qemu is 1 process / many FDs