04:12Lynne: any way of updating descriptors for buffers via a VkDeviceAddress instead of VkBuffer+offset when using peasant-tier descriptor pools?
06:32glehmann: Lynne: no, that only works with descriptor buffers. You can always use BDA instead of buffer descriptors in your shaders though, if you don't need robustness (bounds checking)
06:33Lynne: you don't get robustness or any sort of validation with descriptor buffer storage buffers
12:38DottorLeo: hi karolherbst, I've seen your MR about enabling some drivers by default for rusticl. I'm curious which drivers could be enabled by default, are there any candidates for 24.1 :) ?
12:38karolherbst: it's mostly for future versions and it's mostly up to the driver maintainers
12:40DottorLeo: thanks. I was curious because i have an RDNA2 card and i'd like to toy a bit with rusticl on some opencl applications
12:42karolherbst: you can already do that
12:42DottorLeo: I also see that you have a radeon card for testing, mind if I ask what GPU you have?
12:42karolherbst: this MR is not about enabling rusticl for certain drivers, just for what devices it's enabled by default at runtime
12:43karolherbst: so with RUSTICL_ENABLE=radeonsi you can already test it on yours
12:43karolherbst: and my AMD GPU is RDNA2 as well
12:43DottorLeo: nice :D
12:43karolherbst: though you won't have a lot of fun with 24.1 I think
12:43karolherbst: not sure the critical fixes arrived in time for that release
12:43DottorLeo: would you be interested in bug reports about rusticl on GFX9 and maybe older?
12:44DottorLeo: i'll try to use that variable, thanks!
12:44karolherbst: sure, just make sure you check with a more recent mesa release
12:44karolherbst: 24.1 is already eol
12:44DottorLeo: sorry 25.1 ^^"
12:44DottorLeo: i mean future mesa release :D
12:45karolherbst: 24.2 would be good enough already
12:45K900: There should be a 24.3 before 25.1
12:45DottorLeo: yeah, waiting for Fedora to update to that
12:45K900: It's not out yet
13:07alyssa: anyone seen dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp32.generated_args.denorm_nmax_nan_flush_to_zero_vert fail on x86_64?
13:07alyssa: looks like the constant folding doesn't agree with CTS
13:08alyssa: but radv&nvk advertise float controls..
13:17alyssa: well, here's a nir bug. no clue how this worked before
13:18alyssa: unless this is a FEX bug :clown:
13:26chripyVagrancy: I don't know if that works for what you do, but you could test with box64 instead of FEX and see if that yields the same result.
13:38alyssa: I have an x86 build box :)
14:18robclark: lina: tu passes all BOs for residency (we don't yet have vm_bind), not for sync.. implicit and explicit sync works btwn processes and across vm boundary with fences.. it's all dma_fence under the hood for any kind of sync
14:19robclark: even syncobj is dma_fence
14:20alyssa: FEX bug!
14:20alyssa: *writes patch*
14:20HdkR: \o/
14:21chripyVagrancy: cool stuff
14:22chripyVagrancy: (was talking about https://box86.org/ in case that wasn't clear)
14:36HdkR: I would love to see CTS runs on FEX versus Box64, but that's also a lot of effort :P
14:58lina: robclark: If MSM_SUBMIT_BO_NO_IMPLICIT is not set on a BO, the kernel driver will handle implicit sync at the kernel level as long as the global MSM_SUBMIT_NO_IMPLICIT is not set (see msm_gem_submit.c). tu never sets the former and only sets the latter if all BOs are internal, which means that the kernel driver is handling implicit sync for tu, at least for in syncs.
14:59lina: Ah, and submit_attach_object_fences handles out sync unconditionally.
15:00robclark: lina: tu could probably be more clever and figure out which BOs need implicit sync.. or if it wanted to require a newer kernel could attach dma fence fd to shared dmabufs.. but it needs to do _some_ sync related to residency (so we don't unpin in-use BOs)
15:00robclark: if you want to do implicit sync in userspace use the dmabuf ioctl to attach fence fd
15:00lina: Which doesn't work across the VM barrier.
15:00lina: Which is exactly my problem.
15:00robclark: ofc it does
15:01lina: How would it? the dmabuf ioctls only apply in the guest, they don't touch fences in the host
15:01robclark: if it is a host fence, it is pushed back down to the host (if the ctx-type supports it)
15:01lina: How?
15:01lina: Am I missing some mechanism here for dma-buf fence sync across virtgpu?
15:02robclark: virtgpu should fish out the fence from the dma_resv (at least assuming someone implemented implicit sync support in virtgpu)
15:02robclark: it's all just dma_fence in the end
15:02robclark: and if it is a host fence the wait can be pushed down to the host instead of waiting in the guest
15:02lina: How would that even work? The dma-buf ioctl code is generic and knows nothing about virtgpu. If you put a fence in a buffer in the guest then there is no codepath to somehow sync that back to the host.
15:03robclark: the fence is tracked in the dma_resv
15:03lina: Yes, in the guest... not the host
15:03robclark: right, but to pass it to the host goes thru virtgpu
15:03lina: There is no codepath to do that as far as I can tell.
15:04robclark: for shared buffers it should be attached to the virtgpu EXECBUF ioctl
15:04robclark: (for implicit)
15:05robclark: or you can use the dmabuf ioctl to pull the dma_fence fd back out
15:07lina: Where?
15:08lina: The virtgpu code on the guest kernel side adds the guest-side virtgpu fence into objects passed into the exbuf, nothing more (which is useless for me since that's the thing mesa already does manually for implicit sync on the guest side)
15:09robclark: virtgpu guest kmd extracts all the fences, if they are guest fences it does guest side wait, if they are host fences they are passed back to the host
15:10DemiMarie: robclark: host fences are waited on the CPU which is too slow
15:10lina: *All* fences are waited on the CPU except those from the same queue.
15:11lina: And again this does nothing to pass fences down to BOs on the host side
15:11robclark: host fences are passed back to host
15:11robclark: it turns into explicit sync on the host side
15:12lina: I spent the past two days reading this code and have not found any code that does that...
15:12DemiMarie: Could syncobjs be made first class citizens in the protocol, just like BOs are?
15:12robclark: no
15:13DemiMarie: Why?
15:13robclark: syncobjs are only UMD syntax sugar
15:13robclark: it is all dma_fence under the hood
15:13robclark: lina, I'll walk you thru it next week
15:13DemiMarie: I meant host syncobjs in the form of file descriptors.
15:13lina: There are zero hits for DMA_BUF_IOCTL_IMPORT_SYNC_FILE in virglrenderer, how could it possibly be doing implicit sync properly...
15:15colinmarc: I submitted a random tiny vulkan video patch a while ago without the Fixes: tag, is it still possible to backport to 24.1? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28719
15:15robclark: DemiMarie: fd's don't exist across vm boundary.. and drm_syncobj isn't something that a driver can have a custom impl of.. but it is unneeded, it's _all just dma_fence_
15:15robclark: lina: it isn't needed
15:15robclark: lina: I'll walk you thru it next week
15:16DemiMarie: robclark: doesn't the guest need to be able to perform operations on host syncobjs to make the Wayland syncobj protocol work?
15:16robclark: don't use wl syncobj proto
15:17lina: I'm literally looking at virtgpu_fence.c and virtgpu_submit.c and the only code here is to handle virtgpu guest fences (which are always signaled by the host), nothing at all about host fences and nothing to somehow pass fences down to the host in any way.
15:17DemiMarie: There is literally no other non-deprecated option for Wayland explicit sync
15:17robclark: well, turn it back into syncobj on the host side
15:18kisak: colinmarc: the last scheduled point release for 24.1 has passed.
15:18robclark: wl syncobj was probably a mistake but it should be possible to work around
15:18lina: robclark: If you don't use the syncobj proto (so you use implicit sync) then you *need* to insert fences into buffers for it to work (implicit sync) and for that on the guest side you need those dmabuf ioctls and they literally are never mentioned in the virglrenderer code. This simply cannot work without host-side implicit sync, which is the case for tu/msm right now as I just confirmed.
15:18DemiMarie: robclark: how is one supposed to do that?
15:19robclark: lina: probably ryan would be better positioned to explain the wl part since he was working on wl cross-domain
15:19DemiMarie: Asahi does not support implicit sync at the kernel level at all.
15:19robclark: then asahi is holding it wrong
15:19robclark: it is literally impossible to support implicit sync across drivers entirely in userspace
15:20lina: ???
15:20DemiMarie: For Vulkan one should not be using kernel implicit sync even if the driver supports it.
15:20lina: Of course it is, you just take the syncobjs and import/export them from buffers, it's what our gallium driver does to emulate implicit sync and it's what the vulkan common WSI code does.
15:20lina: DMA_BUF_IOCTL_IMPORT_SYNC_FILE as I mentioned
15:21lina: If you read the MRs I linked, my stopgap fix for now was to literally do the same thing in virglrenderer to add implicit sync support to the virtio interface (only, not the native one)
15:21lina: But that only works well for GL
15:21DemiMarie: This is a problem with the virtio-GPU protocol. It needs to support cross-VM explicit sync with syncobjs.
15:21lina: (Unless you do what tu/msm does which is apparently to implicit sync every single buffer... which can't possibly be fast)
15:21robclark: DemiMarie: it is not about the UMD api
15:22DemiMarie: robclark: Asahi simply does not need implicit sync across drivers.
15:22lina: Implicit sync *works* across drivers and we *do* need it
15:22lina: It works perfectly fine between our GPU driver and our display controller driver
15:22DemiMarie: robclark: disregard what I said
15:22robclark: DemiMarie: no, hw only supports fences, not syncobjs.. (in this case virtual hw).. syncobj is just fancy syntax on top of fences
15:23lina: The syncobj is just a transport vehicle for a fence
15:23robclark: right
15:23lina: The explicit sync UAPI puts the fence in your syncobj, then you use the dma-buf API to transfer the fence to the relevant BOs, and the end result is the same as if the kernel had done it all in one go (implicit sync UAPI)
15:23robclark: right
15:24lina: And that works in the host, and it works in the guest within the guest, but it does not synchronize at all between host/guest.
15:24colinmarc: kisak: thanks for the answer. how does that work with stable distros like ubuntu, will noble upgrade to 24.2?
15:24lina: You put the fence in the BO in the guest and the host knows nothing about that fence.
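[Editor's sketch, not from the log] The userspace hand-off lina describes at 15:23 (take the fence out of the syncobj, then attach it to the shared BO through the dma-buf API) could look roughly like the following on a single kernel. This is a minimal sketch, assuming the fence has already been exported from the syncobj as a sync_file fd (for example with libdrm's drmSyncobjExportSyncFile, not shown here). The helper names dmabuf_import_fence and dmabuf_export_fence are hypothetical. The ioctls are the ones named in the discussion; the fallback definitions are only there for kernel headers that predate them.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/dma-buf.h>

/* Fallback for older kernel headers that lack the sync_file ioctls. */
#ifndef DMA_BUF_IOCTL_IMPORT_SYNC_FILE
struct dma_buf_export_sync_file { __u32 flags; __s32 fd; };
struct dma_buf_import_sync_file { __u32 flags; __s32 fd; };
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE \
    _IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE \
    _IOW(DMA_BUF_BASE, 3, struct dma_buf_import_sync_file)
#endif

/* Hypothetical helper: attach the fence carried by sync_file_fd to the
 * dma-buf as a write fence, so implicit-sync consumers will wait on it.
 * Returns 0 on success or a negative errno value. */
static int dmabuf_import_fence(int dmabuf_fd, int sync_file_fd)
{
    struct dma_buf_import_sync_file arg = {
        .flags = DMA_BUF_SYNC_WRITE,
        .fd = sync_file_fd,
    };
    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &arg) < 0)
        return -errno;
    return 0;
}

/* Hypothetical helper: export the fences a reader of the dma-buf would
 * have to wait on, as a new sync_file fd for explicit-sync use.
 * Returns the fd on success or a negative errno value. */
static int dmabuf_export_fence(int dmabuf_fd)
{
    struct dma_buf_export_sync_file arg = {
        .flags = DMA_BUF_SYNC_READ,
        .fd = -1,
    };
    if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg) < 0)
        return -errno;
    return arg.fd;
}
```

Both calls act on the dma_resv of the kernel they run in, which is the point lina is making: done in the guest, they do not by themselves put a fence on the host-side BO.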
15:24colinmarc: my manjaro box doesn't have 24.2 either, but I guess it will soon
15:25robclark: at one point in time virtgpu did not support implicit sync.. I believe someone was working on that.. defn you'd need virtgpu support for implicit sync, but that doesn't require anything new in terms of protocol or on the host side
15:25robclark: guest side implicit sync becomes explicit sync on the host side
15:25lina: Sigh... that simply does not work...
15:26lina: If an implicit sync protocol puts a fence in a BO in the guest then you can't magically force the receiver side in the host to use explicit sync instead
15:26lina: The host needs the fence in the BO too
15:26robclark: I'll walk you thru it next week, we aren't getting anywhere here
15:26DemiMarie: What is happening next week?
15:27lina: I'll look forward to finding out about all this code that I can't find and which does not work with drm/asahi and which is clearly not needed for drm/msm since as I just confirmed it does implicit sync all in the kmd in the host...
15:27DemiMarie: I think this needs to be handled in a conference call
15:27robclark: I'll be back from vacation next week and in the office
15:28kisak: Routine updates are at the discretion of the package maintainers for individual distros. Specific to Ubuntu, I don't agree with their policies and maintain a PPA to treat mesa devs as first class citizens, but that's a whole can of drama I don't feel like opening today.
15:28kisak: colinmarc: ^
15:28DemiMarie: robclark: next week as in September 23 or September 30?
15:30colinmarc: kisak: apologies, I didn't know I was asking about drama :) just asking practically, it's unlikely for ubuntu noble to get 24.2, then?
15:30robclark: well, this week if the week starts on Sun
15:30lina: OK ^^
15:31colinmarc: kisak: only asking because I need to provide advice for users of my app on ubuntu and I'm a bit confused about how mesa bugfixes filter down
15:32colinmarc: if that advice is "install mesa from a PPA", that's fine
15:32DemiMarie: colinmarc: Can you ship your app as a Flatpak or Snap with bundled Mesa?
15:33colinmarc: I don't currently, but maybe
15:33DemiMarie: robclark: thank you.
15:33DemiMarie: lina: did you see my response on GitLab?
15:34kisak: For Ubuntu LTS, the Mesa stack from the non-LTS release eventually gets backported, and separately there's an end-of-series bump for that backport. That makes 2 updates every 6 months.
15:35colinmarc: Demi: can you relink the thread? I'm curious about this topic but it got buried above
15:35DemiMarie: Colin Marc: Which topic?
15:35DemiMarie: Mesa?
15:36colinmarc: implicit/explicit sync over the virtio-gpu boundary
15:36DemiMarie: It is all the recent conversation not part of your discussion.
15:37colinmarc: Yeah, I know it's unrelated, I was just curious if there was a gitlab issue where I could read some background
15:37DemiMarie: I don't want to use quote replies or Matrix threads because those do not translate to the IRC side well at all.
15:38DemiMarie: https://gitlab.freedesktop.org/asahi/mesa/-/issues/43
15:39DemiMarie: Colin Marc: Can you work around the Mesa issue in your app?
15:39colinmarc: Yeah, I'm going to do some gross runtime stuff and disable hierarchical coding based on the mesa version
15:40colinmarc: It'll be fine :)
15:42colinmarc: I think I can even fetch the driver version via vulkan itself with some extension or another
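[Editor's sketch, not from the log] One way the runtime version check colinmarc mentions might be done: read VkPhysicalDeviceProperties::driverVersion (and, to confirm the driver is a Mesa one, the driverID from VK_KHR_driver_properties, core in Vulkan 1.2), then decode and compare it. The sketch below covers only the decode and compare step and assumes the driver packs its version the way VK_MAKE_VERSION does, which is an assumption to verify against the driver in question. The names mesa_version, decode_driver_version and version_at_least are hypothetical, and the threshold passed by the caller is a placeholder, since the log does not say which release carries the fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for a decoded driver version. */
struct mesa_version {
    unsigned major, minor, patch;
};

/* Decode a 32-bit driverVersion value, assuming VK_MAKE_VERSION packing:
 * 10 bits major, 10 bits minor, 12 bits patch. */
static struct mesa_version decode_driver_version(uint32_t v)
{
    struct mesa_version out = {
        .major = v >> 22,
        .minor = (v >> 12) & 0x3ffu,
        .patch = v & 0xfffu,
    };
    return out;
}

/* True if v is at least major.minor.patch. */
static bool version_at_least(struct mesa_version v,
                             unsigned major, unsigned minor, unsigned patch)
{
    if (v.major != major)
        return v.major > major;
    if (v.minor != minor)
        return v.minor > minor;
    return v.patch >= patch;
}
```

An application could then gate the affected encoder feature on version_at_least(decode_driver_version(props.driverVersion), ...), where props is the queried VkPhysicalDeviceProperties.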
15:43colinmarc: The particularly mean bug that I would really like to backport limits GOP size to 32, though. Which is really rough for bitrate
15:44DemiMarie: Can you backport the fix to Ubuntu’s version and send a patch there?
15:45colinmarc: I don't think ubuntu maintains a separate version, they just... don't update
15:46DemiMarie: I meant as a patch to be applied during Ubuntu package build.
15:47colinmarc: I have a hard time imagining they'd accept that, given it's not a security issue or a crash and no one really cares about vulkan video yet :)
16:23lina: DemiMarie: Replied ^^
16:24DemiMarie: lina: thanks!