00:01bluetail: actually, that appears to work
00:02bluetail: so something about d3d12 appears to be problematic in there... ?
00:28llyyr: you're missing the headers required to build d3d12
00:28llyyr: not sure how Mesa's meson.build allowed you to enable d3d12 in the first place, but build system bugs in mesa are quite common
00:28llyyr: the devs can't really test every possible combination of options
00:31llyyr: And if you really want to build d3d12 for whatever reason, mingw64-headers should get you what's required
01:00bluetail: well, llyyr it appears to be in AUR... blame the maintainer?
01:00bluetail: Or if mingw64-headers fixes it, ask for it to be added as a dependency
01:02llyyr: what the AUR does is irrelevant
01:02bluetail: I see.
01:04bluetail: I will try to build again just to see if that >was< the issue.
01:05llyyr: DirectX-Headers is probably the right package
01:07bluetail: I do have directx-headers-git already. Installing the mingw headers did not fix it. :: directx-headers-1.614.0-1 and directx-headers-git-r222.48a7629-1 are in conflict. Remove directx-headers-git? [y/N]
01:08bluetail: maybe I need "4 aur/mingw-w64-directx-headers 1.614.0-1 [+0 ~0.00]"
01:11llyyr: arch moment
01:15bluetail: nah, neither of them fixed it. Last time I had to remove some dri thing, now it was d3d12
01:15bluetail: It's weird when I need to guess
02:46jenatali: Weird. But yeah on native Linux you don't need that driver, it's only useful for WSL
03:01alyssa: jenatali: but what if you want zink on hard mode
03:01alyssa: and instead use the d3d12 gallium driver on vkd3d-proton
03:01alyssa: :p
03:06HdkR: I build d3d12 for FEX. For anyone that wants that :P
03:08HdkR: Only a handful of drivers missing in the build configuration
08:20MrCooper: K900 llyyr: there's the mesa-maintainers mailing list for this purpose
08:22MrCooper: zf: with swap interval != 0, the swap can't be performed before vertical blank starts, in order to avoid tearing; the timestamps correspond to the end of vertical blank, though on Linux that's probably more of a convention than something defined by the spec
11:11lina: Soo we're running into some interesting problems with virtgpu (native context) and implicit sync. TL;DR is for sync to work as intended across the VM boundary we need to be able to virtualize host syncobjs, but virtgpu right now has its own syncobjs in the guest that have nothing to do with the host's (and as far as I can tell would perform worse
11:11lina: even for use cases within the guest). I made it work for GL by essentially making the virt proto support implicit sync the old-school way, but it gets more complicated for Vulkan. Details here: https://gitlab.freedesktop.org/asahi/mesa/-/issues/43
11:12lina: airlied, robclark: I've been told you might have thoughts/ideas ^^ (also cc alyssa, gfxstrand, sima)
11:18emersion: lina, FTR, the nvidia proprietary driver has chosen to not implement implicit sync and just require explicit sync
11:20lina: I know, but actually implementing explicit sync across the VM barrier would either require true syncobj sharing which is the hardest solution in my list (#5) and requires changes across the board (including the guest kernel, virglrenderer, sommelier/the x11 thing from chaos_princess, etc.), or defining a whole new sync type in wayland/x11 for
11:20lina: virtualized sync which is probably even harder.
11:21emersion: why is syncobj sharing hard?
11:21lina: Because the code doesn't exist anywhere, syncobjs in the guest kernel right now are the guest's own concept and have nothing to do with the host's
11:22emersion: syncobj emulation could be done entirely in userspace
11:22lina: The syncobj/explicit sync APIs are defined in terms of Linux sync objects, you can't emulate that in userspace since they are ioctls.
11:23emersion: they are defined in terms of drm_syncobj
11:23emersion: which is different from sync_file in that they may not have materialized yet
11:23lina: Which is a kernel object...
11:23emersion: you can create a drm_syncobj out of thin air, and then signal it when you want to
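(A minimal sketch of what emersion describes: a drm_syncobj created in userspace with no fence attached, then signalled later, using libdrm. The render node path is an assumption and error handling is omitted.)

```c
#include <fcntl.h>
#include <stdint.h>
#include <xf86drm.h>

int main(void)
{
    /* Assumed render node; any DRM device that supports syncobjs will do. */
    int fd = open("/dev/dri/renderD128", O_RDWR);

    uint32_t handle;
    drmSyncobjCreate(fd, 0, &handle);   /* syncobj with no fence attached yet */

    /* ... share the handle with another process, e.g. via
     * drmSyncobjHandleToFD() and a Wayland protocol ... */

    drmSyncobjSignal(fd, &handle, 1);   /* signal it from userspace later */

    drmSyncobjDestroy(fd, handle);
    return 0;
}
```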
11:24lina: You cannot use userspace-signaled sync objects with GPU APIs, that would violate the fence protocol since there is no guarantee they will ever be signaled.
11:25emersion: i'm talking purely about the WSI bits
11:26lina: This *is* the WSI bits; as far as I know, the point of all this is that the fences are used directly in GPU submissions
11:26lina: At the end of the day the goal is that a guest submission with an out fence should be able to chain to a host submission from the compositor with an in fence without round-tripping through a bunch of userspace code. This works today for implicit sync in GL with the MRs I linked.
11:26lina: (Or indeed without roundtripping through the VM kernel either)
11:26emersion: what do you mean exactly by "WSI" then?
11:26emersion: i mean Wayland/X11
11:27emersion: WSI bits support userspace-signaled drm_syncobjs
11:27lina: Yes, this works with Wayland/X11 implicit sync and I assumed explicit sync was the same, just with explicit sync objects. If explicit sync requires roundtripping through userspace for sync then that's a big regression, I sure hope that's not the case.
11:28emersion: explicit is different in a good way: it supports more stuff because it uses drm_syncobj instead of sync_file
11:28emersion: for WSI
11:28emersion: IOW: you don't need a sync_file to make WSI happy with explicit sync
11:28emersion: any drm_syncobj will do
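(For context, a hedged sketch of the sync_file/drm_syncobj relationship being discussed: a fence held as a sync_file can be wrapped in a drm_syncobj via libdrm, so a syncobj-based WSI path doesn't require the driver to hand out sync_files. Names are placeholders and error handling is omitted.)

```c
#include <stdint.h>
#include <xf86drm.h>

uint32_t syncobj_from_sync_file(int drm_fd, int sync_file_fd)
{
    uint32_t handle;
    drmSyncobjCreate(drm_fd, 0, &handle);
    /* Attach the sync_file's fence to the (binary) syncobj. */
    drmSyncobjImportSyncFile(drm_fd, handle, sync_file_fd);
    return handle;
}
```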
11:31lina: Yes but userspace-signaled syncobjs can only be waited on from userspace (I just checked the code)
11:31lina: If you pass an unsignaled userspace syncobj to a GPU submission it will fail.
11:31emersion: yes, the compositor will do that
11:32emersion: it will asynchronously wait for the drm_syncobj
11:32lina: But it doesn't do that if the syncobj has a real unsignaled fence that will signal in hardware I hope
11:32lina: It should just let the GPU driver wait on it then
11:32emersion: many do
11:32emersion: no, because it needs to schedule the KMS flip
11:33emersion: if it submits a buffer to KMS still waiting on a fence, it may miss a frame
11:33emersion: especially if it tries to submit the frame as late as possible for reduced latency
11:33lina: Sure but that depends on how the compositor is designed
11:34emersion: yeah, just saying you won't gain a lot by using sync_file
11:34emersion: it's not much different from userspace-signalled drm_syncobj in practice
11:35lina: So then sure, we could support explicit sync by manually forwarding sync objects all in userspace I guess... but we still need some API for that, and some kind of worker thread dealing with the fence signaling, and it's still slower with more roundtripping through the VM/etc, and it still leaves implicit sync broken.
11:37lina: Considering the use case is gaming here, I'm wary of any solutions that involve adding more round trips through the whole VM stack...
11:39lina: Then there's the other problem that virtgpu right now implements in_syncs on the guest side by just blocking the ioctl, which is also less than ideal... if we could virtualize host syncobjs properly then that would go away.
12:11zamundaaa[m]: lina: all compositors I know of wait for the acquire point to be signaled on the CPU
12:12zamundaaa[m]: In the other direction though many do move a fence into the release point, where making the GPU wait on it instead of userspace could have benefits
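(A hedged sketch of the CPU-side acquire-point wait zamundaaa and emersion describe: the compositor registers an eventfd on the timeline point and only schedules the KMS flip once it fires, instead of handing the wait to a GPU submission. drm_fd, acquire_handle and acquire_point are placeholders; error handling and event-loop integration are omitted.)

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <xf86drm.h>

int watch_acquire_point(int drm_fd, uint32_t acquire_handle, uint64_t acquire_point)
{
    int ev_fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);

    /* Ask the kernel to write to ev_fd once the timeline point has signalled;
     * the compositor polls ev_fd from its event loop and only then submits
     * the buffer to KMS, so it never flips a buffer that is still pending. */
    drmSyncobjEventfd(drm_fd, acquire_handle, acquire_point, ev_fd, 0);

    return ev_fd; /* caller adds this to its poll()/epoll loop */
}
```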
12:42lina: It would still be a lot higher latency to have a GPU->virgl->virtio->guest kernel->virtgpu fence->syncobj->sommelier or whatever->virtgpu channel->guest kernel->virtio->virgl->host syncobj->compositor path (which you'd need to do it all in userspace) vs. GPU->compositor (what happens now with implicit sync and my changes, and what would happen with
12:42lina: proper virtualized host syncobjs and explicit sync too)
12:42lina: I really, really don't think that signaling mess is the answer we want to present frames from games.
12:45lina: There's already enough complexity and layering in this virtgpu stuff, in many cases more than is really needed... I don't want to make it worse, we should be moving towards cutting out complexity and latency, not adding more.
14:19DemiMarie: lina: Compositors generally wait for client buffers to finish rendering before doing their own rendering, whether explicit or implicit sync is in use.
14:20DemiMarie: What about having guest Mesa issue operations to the host instead of using in-guest ioctls?
14:22DemiMarie: robclark: what does ChromeOS do? What will it do if it ever has a pure explicit sync driver?
14:30bluetail: lina does that include moving from C to Rust?
14:36robclark: lina: tu supports syncobj in guest.. the secret is it is all fences at guest KMD and below... iirc drm/virtio needs some implicit sync support, but even that is all fences under the hood
14:37robclark: lina: I'll look at the gitlab issue but all roads lead to dma_fence so it shouldn't be all that complicated
14:38robclark: DemiMarie: other than Linux guest vm's (borealis) CrOS is pure explicit sync (ignoring some very old platforms)
14:38DemiMarie: robclark: I think guest VMs are lina's case
14:38DemiMarie: what do you do there?
14:39robclark: below guest userspace it is all explicit fences
14:52robclark: lina: the fence forwarding is to avoid blocking on guest side ioctl.. the only thing is the guest would need to wait until the execbuf that creates the fence is sent to the host, similarly to the way any (guest or otherwise) can't send a syncobj back to kernel to wait on if it isn't backed by a fence yet
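(A small sketch of the constraint robclark refers to: the kernel refuses to wait on a syncobj that has no fence attached yet unless the caller opts into waiting for the fence to be submitted first. Placeholder names, no error handling.)

```c
#include <stdint.h>
#include <xf86drm.h>

int wait_for_future_fence(int drm_fd, uint32_t handle, int64_t timeout_ns)
{
    uint32_t first_signaled;
    /* Without DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT this fails if the syncobj
     * is still empty; with it, the kernel first waits for a fence to be
     * attached and then for that fence to signal. */
    return drmSyncobjWait(drm_fd, &handle, 1, timeout_ns,
                          DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
                          &first_signaled);
}
```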
14:53robclark: anyways, I'll read up on what the issue is with asahi/hk when I get home.. I doubt it needs anything more at the virtgpu proto level
15:35nano-: Trying out the zink stuff on macOS with MoltenVK, and while I'm well versed in other areas of development, I'm a bit in the dark when it comes to this. I have built libzink.a after fixing the missing spirv_info_h dependency in the build system, but now that I'm trying to link it to the application without -framework OpenGL I lack all the OpenGL symbols. Where are they supposed to come from in this case?
16:00lina: robclark: tu passes all BOs to the driver and seems to do implicit sync at the kernel level (there's even a flag for it)...
16:01lina: how does CrOS do explicit sync across the VM boundary?
16:22JEEB: ?33
16:22JEEB: whoops
16:42nano-: I've compiled the glapi now, but for some reason none of its functions are prefixed by 'gl', otherwise they all seem to be there.
17:13nano-: Is Mike Blumenkrantz here perhaps?
17:19alyssa: zmike: ^
17:21DemiMarie: lina: Generally submitting commands from a VM to the host is cheap so long as one doesn't need a reply.
17:22DemiMarie: Also, commands are guaranteed to be executed in-order.
17:22DemiMarie: This means that you can submit commands that perform syncobj ioctls and know that the ioctls will happen before the GPU commands that depend on them.
17:23DemiMarie: Does that help?
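(Purely illustrative sketch of the ordering argument above; queue_send() and the command names are hypothetical, not real virtio-gpu protocol. The point is only that a single in-order guest-to-host queue lets a syncobj operation and the GPU submit that depends on it be issued back to back, with no reply round trip.)

```c
#include <stddef.h>
#include <stdint.h>

enum host_cmd { HOST_SYNCOBJ_SIGNAL, HOST_GPU_SUBMIT };

/* Hypothetical transport: pushes a command onto the in-order guest->host queue
 * and returns without waiting for a reply. */
void queue_send(enum host_cmd cmd, const void *payload, size_t len);

void submit_depending_on_syncobj(uint64_t guest_chosen_syncobj_id,
                                 const void *cmdbuf, size_t cmdbuf_len)
{
    /* The host handles queue entries strictly in order, so the syncobj
     * operation is guaranteed to be processed before the GPU submit that
     * depends on it; the guest never needs to wait for a reply. */
    queue_send(HOST_SYNCOBJ_SIGNAL, &guest_chosen_syncobj_id,
               sizeof(guest_chosen_syncobj_id));
    queue_send(HOST_GPU_SUBMIT, cmdbuf, cmdbuf_len);
}
```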
17:51lina: DemiMarie: I can probably work with that to avoid blocking on replies if we fully virtualize syncobjs (would have to generate all unique IDs on the guest side so we never need to get anything back) but it doesn't really help for the other solutions (doing it in userspace etc).
17:55DemiMarie: lina: I think option 6 (wrap host syncobjs in guest ones) is the best way to go w.r.t. compatibility with applications.
17:55DemiMarie: I admit that it looks like the hardest one though
17:58lina: That would mean a bunch of guest kernel changes to implement the concept of a host-backed fence and communicating fence state and inserting them into syncobjs, yeah...
17:59DemiMarie: I'm especially concerned about Wine here.
18:00lina: It seems virtgpu supports host fences right now, but only a single global fence timeline, which isn't really ideal since the GPU can have multiple queues and process fences out of order... might be good enough if that codepath is only used for CPU waits from the guest though, but we'd still need to map fence IDs to syncobjs in the host (userspace)
18:00lina: and then do something so syncobjs in the guest can be mapped back to host ones...
18:02DemiMarie: People (like me) do run nested Wayland compositors, so putting guest CPU-side waits on an ultra-slow path would not be great.
18:06DemiMarie: lina: If there is a protocol limitation, I'd rather fix it in the protocol.
18:07DemiMarie: The virtio-GPU protocol is easy to extend and it's okay to have features that are native-context-specific.
18:08DemiMarie: My biggest concern with the simpler solutions is that they might later need to be ripped out in favor of one of the more complete ones.
18:09DemiMarie: But it is ultimately your call and I trust you to get it right.
18:11lina: I'm still trying to wrap my head around it, but it looks like there are a number of possible rings in virtgpu and each ring has its own fence context / sequence? I'm not sure how those map to application usage, it looks like apps can just pick a ring when submitting?
18:12lina: I guess that was designed with hw scheduled GPUs in mind that have a fixed number of hw GPU rings that always process things in order... but that doesn't map well to drm/asahi, since we have an arbitrary number of firmware queues that can execute in any order.
18:14DemiMarie: I'd just ignore virtio-GPU's concept of rings.
18:15lina: Right, but since each ring has its own fence context, fences are assumed to complete in-order within a ring.
18:15DemiMarie: You can pass your own fences.
18:16lina: How? virtio_gpu_fence_event_process goes through the list of driver fences and signals all with a smaller seqno than the event, with the same context (ring)...
18:16DemiMarie: Not use virtio-GPU fences
18:17lina: OK, but then we need a whole separate extension for wrapping discrete fences
18:17DemiMarie: Just pass your own numbers to the ioctls
18:17DemiMarie: That does not seem too hard to add, at least on the virglrenderer side.
18:18lina: Doing it in userspace is option #5 in my list, which will work within an app but does not work for explicit sync / sharing syncobjs since it would be a different kind of object...
18:18DemiMarie: Fair
18:18lina: We would need guest kernel support for a different fence concept if we want syncobjs to work (option #6)
18:21DemiMarie: Does the Xe driver have the same problem you do?
18:21lina: I don't think Xe has native context?
18:21DemiMarie: Not yet but it will
18:22DemiMarie: And Qubes OS needs it to, otherwise once it gets GPU acceleration Lunar Lake hardware won’t work.
18:22DemiMarie: I also suspect ChromeOS needs it so they can move away from virgl.
18:24DemiMarie: I believe that Xe native context support is explicitly planned.
18:24lina: I think I have an idea, but it's too long to spam IRC... let me write it up as an issue comment
18:39lina: Heh, I just realized in_syncs are also broken right now, since virtgpu really assumes commands complete in order... so it just ignores in_syncs from its own queue, which doesn't work if commands actually execute in separate queues in the underlying driver...
18:49lina: DemiMarie: Posted the comment, let me know if it makes sense ^^
19:37JoshuaAshton: Any Vulkan Video aficionados around? https://gitlab.freedesktop.org/mesa/mesa/-/issues/11925 to help figure out if me or RADV is wrong here...
19:43Lynne: that's a dump
19:45Lynne: cts doesn't test everything, so before me and airlied fixed this over the past few weeks, it wasn't possible to encode anything
20:28nano-: So are libzink and libglapi supposed to be combined with glvnd on macOS? Or with EGL? The information seems scattered, with lots of mentions of X11, but my target application is native macOS with SDL2.