03:48kode54: cool
03:48kode54: vkoverhead still locks up my GPU if I use the Xe KMD
03:48kode54: right when it gets to the descriptor portion of the test
04:58karolherbst: Kayden: sooner or later I'll probably need an API to force a certain SIMD mode, because Intel has some subgroup extensions for that kinda stuff
06:20karolherbst: OpTypeInt 24 0 🙃 I know what you are all thinking, and I agree: this is invalid SPIR-V
14:52doras: I'm trying to figure out why calls to glGetQueryObjecti64v with GL_QUERY_RESULT take ~9 milliseconds to return. They are called using objects created with glQueryCounter and GL_TIMESTAMP.
14:53doras: I couldn't quite figure out where this is implemented in Mesa. I found the Gallium interface, but it wasn't clear how I can get to the actual implementation from there.
15:10zmike: it goes through the query interface as PIPE_QUERY_TIMESTAMP
15:14Andrew-R: karolherbst, oh, cool, llvmpipe/rusticl now report more realistic numbers in clpeak! :)
15:37karolherbst: Andrew-R: huh... I didn't fix anything there :D
15:59doras: Thanks zmike. Tracing this issue with eBPF suggests that the entire time is spent inside an ioctl. Now I need to figure out which and what it does...
15:59doras: This is with radeonsi, by the way.
16:16doras: Seems to be amdgpu_cs_wait_ioctl. Hmmm...
17:17doras: For context, this is called from Mutter's page flip event callback in a direct scanout scenario (client buffer was imported). The object for which the timestamp is queried is the client's EGLImage which was just flipped successfully. Any waiting for fences on this buffer was supposed to be done before flipping as far as I know. I'd expect the buffer to be idle at this point, so there shouldn't be any waiting required.
19:58doras: I guess glGetQueryObjecti64v records all calls to GL functions, not just those done in the context of the EGLImage. Though Mutter is not supposed to call any GL functions after the atomic commit when the query object is created and until getting a page flip event. The client likely does draw on a different buffer, so maybe we somehow end up waiting on its own work? It still sounds like an issue in amdgpu to me.
22:57doras: Well, it doesn't look like Mutter and the client are accessing the same fences (at least directly), and it does look like radeonsi is doing a command submission on behalf of Mutter inside glGetQueryObjecti64v, which creates a fence and immediately waits on it before returning.
23:19Andrew-R: ...reading karol's Mastodon leaves me with the feeling that the world might be a much better place if developers stopped chasing the tails of proprietary software and hardware. Like, just quit en masse and enjoy things you really enjoy ... I can live with xvesa.
23:50karolherbst: Kayden: seems like linux 6.3 has a _major_ i915 perf regression. Or it's the fedora38 builds.. or something.. in any case, my CPUs spend 80% of their time inside i915_active_acquire_preallocate_barrier
23:50karolherbst: on 6.3
23:50karolherbst: and I see the GPU being starved
23:52karolherbst: still checking things out, so not 100% sure about this, but it's kinda looking like this
23:56karolherbst: maybe I also messed something up.. dunno.. it's kinda weird