17:22Venemo: karolherbst: some nice results, congrats! https://www.phoronix.com/review/rocm-rusticl-strix-halo
17:23karolherbst: Yeah.. I kinda expected worse tbh :D
17:43HdkR: oooo
17:44HdkR: Some room for improvement but pretty dang good.
18:20karolherbst: the kernel launch overhead is probably because of usermode command submission, I think ROCm uses it, but mesa doesn't?
18:29agd5f: correct
19:00Venemo: karolherbst, agd5f I assume by usermode command submission you mean user queues. shouldn't those actually have less overhead?
19:00karolherbst: yeah, that's why ROCm has a lower kernel launch overhead, no?
19:01karolherbst: I think there was somebody writing an MR for it, but not sure that was ever targeting mesa
19:02agd5f: yeah, rusticl could target either ROCr or the KFD IOCTL interface if it wanted to use the same user queues as ROCm.
19:05karolherbst: I suppose those interfaces aren't really suitable for 3D, so it's only viable on a compute-only screen.. I think the issue in mesa was that it's not that simple (tm): radeonsi shares the pipe_screen across APIs within the same process, so I'm not sure it's easily doable without reworking quite a bit
19:06karolherbst: I have other places in this area that are causing significant overhead, so maybe I'll work on those first and see how close I get
19:47airlied: agd5f: could just expose user compute queues via amdgpu :-)
19:58agd5f: airlied, sure, but they'd also need mesa changes too.
19:59airlied: indeed, but they would be more compatible changes
22:59karolherbst: anyway.. I looked into why the latency is so different, and the main reason is that timestamp queries aren't great in mesa and cause a lot of pointless overhead :')
23:04HdkR: I do appreciate some pointless overhead.