06:19 dj-death: jnoorman: landing !34344 today? :)
06:55 jnoorman: dj-death: sorry about the delay. There are still some questions around how we want to handle SSBO offsets in ir3. So what I'll do is pull-out the reviewed NIR commits from !34344 and merge them separately to unblock you :)
06:59 dj-death: jnoorman: thanks!
07:02 karolherbst: mareko: I noticed that radeonsi advertises support for sRGB image stores. Sadly I don't really know what's the expected behavior in OpenGL, but it doesn't match what OpenCL expects, and that is treating the values within the shader as RGB and doing implicit conversion into sRGB space on stores and the opposite for loads. Atm it seems like that
07:02 karolherbst: radeonsi advertises image support for everything it can texture from, so I wonder if that needs to change or not.
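For reference, the behaviour karolherbst describes for OpenCL (linear values in the shader, encode on store, decode on load) can be sketched with the standard sRGB transfer function. This is only an illustration: the helper names are made up, and leaving alpha unconverted is an assumption based on how sRGB formats are usually defined.

```python
def linear_to_srgb(c: float) -> float:
    """Encode a linear value in [0, 1] with the sRGB transfer function."""
    c = min(max(c, 0.0), 1.0)
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def srgb_to_linear(s: float) -> float:
    """Decode an sRGB-encoded value in [0, 1] back to linear."""
    s = min(max(s, 0.0), 1.0)
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4


def store_texel(linear_rgba):
    """Hypothetical image store: encode RGB, pass alpha through (assumption)."""
    r, g, b, a = linear_rgba
    return (linear_to_srgb(r), linear_to_srgb(g), linear_to_srgb(b), a)


def load_texel(stored_rgba):
    """Hypothetical image load: decode RGB, pass alpha through (assumption)."""
    r, g, b, a = stored_rgba
    return (srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b), a)
```

A store followed by a load should return roughly the original linear values, which is the round trip the shader is expected to observe.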
07:15 jnoorman: dj-death: !35542
07:22 dj-death: jnoorman: thanks a bunch
07:22 dj-death: jnoorman: I'll probably have to add another callback for accept offsets, I guess drivers will choose what's more convenient for them
09:00 ccr: tomba, you were spotted at Vanha ;)
09:01 tomba: ccr: Oh? Fellow batmudder? =)
09:11 ccr: tomba, wizard for last 13+ years. I saw you chatting with Zin at the terrace :)
09:20 tomba: ccr: small world =)
09:21 ccr: sometimes it is
12:09 jannau: should there be a DEFINE_LOADER_DRM_ENTRYPOINT for each kmsro display driver? "apple" and "vkms" are missing compared to dril_drivers in src/gallium/targets/dril/meson.build
12:13 jannau: not even sure if this is related to the issue the alpine edge user sees with mesa 25.3.1. I looked into this since the user gets "apple supports no extensions (Symbol not found: __driDriverExtensions)"
13:10 mairacanal: lynxeye, could you take a look at https://lore.kernel.org/dri-devel/20250602132240.93314-1-mcanal@igalia.com/? i'd like to push it to drm-misc-fixes, but it would be nice to have your r-b
15:01 hch12907: probably a silly question, but is the minimum required python version for building Mesa still 3.7?
15:03 hch12907: would be nice to bump it to 3.9, so that functions like str.removeprefix and str.removesuffix can be used
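For context, `str.removeprefix` and `str.removesuffix` were added in Python 3.9. A minimal sketch of equivalents that also run on older interpreters is below; the helper names `remove_prefix` and `remove_suffix` are hypothetical and not taken from Mesa's build scripts.

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Behaves like str.removeprefix (Python 3.9+) on older interpreters."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text: str, suffix: str) -> str:
    """Behaves like str.removesuffix (Python 3.9+) on older interpreters."""
    # The emptiness check matters: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text
```

With a 3.9 minimum the helpers collapse to `text.removeprefix(prefix)` and `text.removesuffix(suffix)`.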
16:35 mareko: karolherbst: I don't think our image stores support sRGB, and I don't think GL expects it to be supported
16:35 mareko: same for Vulkan
16:35 karolherbst: radv doesn't have that problem
16:35 mareko: what problem?
16:35 karolherbst: advertising support for sRGB storage images
16:36 karolherbst: radeonsi always sets "PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SHADER_IMAGE" together
16:37 karolherbst: or rather, both depend on the result of si_is_sampler_format_supported
16:37 karolherbst: so if I call into "is_format_supported" with sRGB and PIPE_BIND_SHADER_IMAGE it gets marked as supported
16:38 karolherbst: dunno if that's also an issue with OpenGL tbh, not entirely sure how it's checked there
16:39 karolherbst: anyway, the end result is, that radeonsi claims to support it for PIPE_BIND_SHADER_IMAGE
16:43 mareko: if there is a way to set the format to srgb in the image store instruction, radeonsi could insert the conversion code there
16:46 mareko: GL rejects SRGB as a shader image format
17:06 alyssa: mareko: we don't generally know the image format when storing though and it'd be a silly thing to emulate unconditionally
17:08 mareko: that's true, but I don't know about OpenCL
17:13 mareko: karolherbst: you can return false from si_is_format_supported if the format is sRGB and bind is a shader image
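The check mareko suggests can be modelled roughly as follows. This is a Python stand-in for the C driver hook, not radeonsi code: the bit values, the format strings, and the `SAMPLEABLE_FORMATS` set are placeholders chosen for illustration.

```python
# Placeholder bit values; the real PIPE_BIND_* constants live in Mesa's C headers.
PIPE_BIND_SAMPLER_VIEW = 1 << 0
PIPE_BIND_SHADER_IMAGE = 1 << 1

# Hypothetical set of formats the hardware can sample from.
SAMPLEABLE_FORMATS = {"R8G8B8A8_UNORM", "R8G8B8A8_SRGB", "R32_FLOAT"}


def is_srgb(fmt: str) -> bool:
    """Stand-in for a format query; here sRGB is detected by name only."""
    return fmt.endswith("_SRGB")


def is_format_supported(fmt: str, bind: int) -> bool:
    # Reject sRGB as a shader image even though it can be sampled.
    if is_srgb(fmt) and (bind & PIPE_BIND_SHADER_IMAGE):
        return False
    # Otherwise both bind flags follow the sampler-format answer,
    # as described in the discussion above.
    if bind & (PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SHADER_IMAGE):
        return fmt in SAMPLEABLE_FORMATS
    return False
```

The early return keeps sRGB texturing working while no longer advertising sRGB storage images.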
17:55 karolherbst: mareko: in OpenCL you only know the data type, but not the format
17:55 karolherbst: and yeah.. returning false is what I had in mind
17:57 karolherbst: no idea what needs srgb write support, but it took me like 10 minutes to write the code and it works on the arm based drivers. Not sure about the reason they've added it, but...
18:35 alyssa: karolherbst: because it's more work to disable it than not and if there are no CTS fails nobody notices
18:39 karolherbst: yeah.. the CTS not testing this was my assumption, didn't know it was invalid from a GL perspective
18:40 zmike: GL has a very specific list of formats allowed for storage images
18:41 zmike: see also the table in https://registry.khronos.org/OpenGL/extensions/ARB/ARB_shader_image_load_store.txt
18:48 alyssa: the mobile hw im familiar with can do sRGB image stores, I just don't know why you would
18:56 karolherbst: apparently it was important enough that it was made mandatory for OpenCL 2.0 support :')
18:57 karolherbst: so if I ever want to default to OpenCL C 2.0 or 3.0 instead of 1.2, I have to support it 🙃 which isn't a great reason
18:58 karolherbst: but that also means there are probably applications out there not checking for the ext string, but just for the format and that's a pain to find
19:01 DemiMarie: I think I figured out a way to make cross-VM syncobj work.
19:02 DemiMarie: robclark: in the guest you import something that references the host fences and insert them into the guest syncobj
19:02 DemiMarie: when submitting to the host, the guest extracts all of the fences from it
19:03 DemiMarie: then the host userspace searches for a host fence that corresponds to each of the guest fences it received
19:03 DemiMarie: if the guest userspace is well behaved, this will always exist. If not then the guest gets VK_ERROR_DEVICE_LOST.
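The host-side lookup described in the last few messages might look something like the sketch below. Everything here is hypothetical: the fence identifiers, the `guest_to_host` mapping, and the `DeviceLost` exception standing in for VK_ERROR_DEVICE_LOST are illustration only, not an existing virtgpu interface.

```python
class DeviceLost(Exception):
    """Stand-in for reporting VK_ERROR_DEVICE_LOST to the guest."""


def resolve_host_fences(guest_fence_ids, guest_to_host):
    """Map every guest fence to its host fence, or fail the submission.

    guest_to_host is a hypothetical table the host userspace maintains
    for fences it previously handed to the guest.
    """
    host_fences = []
    for guest_id in guest_fence_ids:
        if guest_id not in guest_to_host:
            # A well-behaved guest never gets here.
            raise DeviceLost(f"no host fence for guest fence {guest_id}")
        host_fences.append(guest_to_host[guest_id])
    return host_fences
```

The sketch fails the whole submission on the first unknown fence, which matches the "well behaved or device lost" split described above.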
19:04 DemiMarie: an alternative is to use "fake" (PV) syncobjs
19:04 DemiMarie: guest userspace thinks it has a real syncobj backed by guest fences, but it really has a fake syncobj backed by host fences
19:05 DemiMarie: if you try to use the syncobj for anything that needs real (guest) fences, you get an error back
19:05 DemiMarie: but if you don't, and if you conform to the OpenGL/Vulkan specs you won't, you never know that the syncobj you got is not a real one
19:06 DemiMarie: the guest trusts that a host syncobj will signal promptly because it trusts the host, so it's okay for the guest to import a host syncobj
19:07 DemiMarie: s/syncobj/dma-fence/g obviously
19:09 DemiMarie: do you want to have a voice call about this at some point? maybe with alyssa?
19:10 robclark: I'm not entirely sure what you are trying to do.. but when a syncobj is submitted to the guest kernel, the guest fence should already exist (even if its corresponding host fence doesn't).. that guest fence can be pushed to the host, by the time the host sees it the corresponding host fence already exists. That can be turned into a host syncobj if desired.
19:10 DemiMarie: you can submit a syncobj with no fences
19:10 robclark: drm does not allow that
19:11 DemiMarie: I meant submit via Wayland, not to the GPU
19:11 DemiMarie: this is for cross-VM Wayland
19:12 robclark: sure, but that should use a virtgpu cross_domain context, and all the same rules apply
19:12 DemiMarie: you can send a syncobj via SCM_RIGHTS without having any sync files in it
19:13 jenatali: karolherbst: There are so many things that are mandatory for 2.0 that were not good ideas
19:13 DemiMarie: virtgpu is just a transport here
19:13 karolherbst: jenatali: I know.. that's why I'm curious if anything even uses that one... but it's also trivial to support...
19:14 DemiMarie: robclark: a cross_domain context isn't a real GPU
19:15 robclark: re: SCM_RIGHTS, sure.. but that is only within the guest, and not where it gets passed to host
19:15 DemiMarie: you need something in the cross-domain API that isn't a dma_fence
19:15 DemiMarie: and has no analog in normal DRM uAPIs
19:15 robclark: what should happen is the wl proxy in guest calls virtgpu EXECBUF ioctl with the syncobj as arg
19:16 DemiMarie: and then the host needs to send a syncobj to the compositor
19:17 robclark: hmm, although I suppose the issue is you want to pass to host before the dma_fence exists.. I think what I'd suggest is the proxy should do drm ioctl to wait until syncobj is backed by dma_fence, similar to what vk drivers do
19:17 DemiMarie: robclark: that's a Wayland protocol violation I think
19:18 DemiMarie: the protocol explicitly makes that the responsibility of the host compositor
19:20 DemiMarie: what you could do is have something in virtgpu that is _not_ a dma_fence, but rather a collection of them
19:20 DemiMarie: and that collection is allowed to be empty
19:21 DemiMarie: yeah, what you are proposing is a protocol violation
19:21 DemiMarie: I believe guest userspace is allowed to send a syncobj that will never have a fence added to it, and Wayland requires that requests are processed in-order
19:22 DemiMarie: So what you are proposing would deadlock
19:24 robclark: I mean you might be able to do something with a fake timeline on the host side, but I don't think you know the ordering of the fence that will eventually correspond to the syncobj.. so I think my idea is still better. Either that or teach wl to use dma-fence again
19:26 robclark: DemiMarie: maybe another way to look at it would be to consider the guest wl proxy _as_ the wl compositor.. where the wait happens.. but it is just a kind of nested compositor that, after the wait, fwd's things to the host compositor
19:26 DemiMarie: you would find out the ordering by hooking ioctls on the syncobj
19:27 DemiMarie: robclark: my experience with Qubes is that you don't want to do that kind of stuff
19:27 robclark: but you need the fence to know the fence ctx
19:27 DemiMarie: fence ctx?
19:27 DemiMarie: can't you get that from the fence ioctls in the guest?
19:27 DemiMarie: or what about the fake syncobj idea?
19:27 robclark: the fence timeline/ordering
19:27 DemiMarie: the guest kernel knows every ioctl that happened to that syncobj
19:28 DemiMarie: and can forward it to the host
19:28 robclark: not without changes in drm core and to drm_syncobj.. currently fence is a driver thing, but syncobj is not
19:29 robclark: and guest kernel can't really wait for a host syncobj to be populated with a fence
19:29 DemiMarie: can that population happen for any reason other than the guest doing something?
19:30 DemiMarie: I guess what I am saying is that changes to core and drm_syncobj are needed
19:31 DemiMarie: alternatively, guest userspace can use a different driver for its syncobjs
19:31 DemiMarie: a PV implementation
19:31 DemiMarie: that accepts the same ioctls but forwards them to the host
19:31 DemiMarie: teaching Wayland to use fences won't happen
19:32 robclark: I'd try my idea first.. wl protocol didn't _need_ to be using syncobj instead of fences (and I still think it was a mistake).. we had an experimental proto w/ wl using fences (and android uses fences) with no problems
19:33 DemiMarie: you don't need to support nvidia proprietary
19:33 DemiMarie: that's the thing that led to syncobj
19:33 DemiMarie: desktop does need to support that driver, as much as it would be nice not to
19:33 DemiMarie: and nvidia proprietary requires syncobj
19:34 DemiMarie: the reason for that requirement is that without it you have no CUDA and CUDA has no good open-source alternatives in many ecosystems
19:35 robclark: how are you going to use cuda in a guest?
19:35 DemiMarie: the broader Wayland ecosystem needs to support nvidia proprietary
19:35 DemiMarie: and in the non-virtualized case, syncobj is strictly better
19:35 DemiMarie: so you would need to convince compositors to support a protocol that is only needed for virt
19:36 DemiMarie: also, there is an nvidia ioctl proxy that might be used for cuda
19:37 airlied: karolherbst: didn't CL3.0 back it out of being mandatory, nobody should be advertising CL 2.0
19:37 DemiMarie: telling people to go from syncobj to sync file means telling them to drop support for Nvidia proprietary and that won't fly
19:37 robclark: strictly better has been asserted many times, but in handwavey ways.. anyways, do the wait in the guest wl proxy to get something working. That way if we actually do need something else there is some evidence to back up the claim and reason to spread this mess into guest drm_syncobj
19:38 robclark: protocol could support either
19:38 DemiMarie: I think nvidia proprietary never creates an unsignaled fence
19:38 robclark: anyways, if you do the wait for the guest syncobj to be populated in the guest wl proxy, then the fence can be turned back into a syncobj on the host and no wl compositors are hurt in the process
19:39 DemiMarie: the problem is that you are reordering Wayland requests or potentially deadlocking
19:41 robclark: thread per client to do the waits wouldn't deadlock.. and would be equiv to doing the wait on the app side
19:41 robclark: bbiab
20:16 DemiMarie: robclark:
20:16 DemiMarie: it isn't that the wait deadlocks
20:16 DemiMarie: the client is allowed to send another request, wait for the reply, and only then add a fence
20:17 DemiMarie: so if you don't send the subsequent request until the syncobj had a fence, the client is blocked forever
20:17 DemiMarie: the only way to avoid deadlocks would be to reorder requests, but Wayland doesn't allow that at all
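The scenario DemiMarie describes can be reduced to a toy simulation. This is a sketch under stated assumptions, not a model of any real proxy: the request names and the `simulate` function are invented, and the proxy is assumed to forward strictly in order.

```python
def simulate(wait_in_proxy: bool, max_steps: int = 10) -> bool:
    """Return True if every request is forwarded and the fence gets attached.

    The client sends a commit that carries a syncobj with no fence yet,
    then a second request, and only attaches the fence once it has seen
    the reply to that second request.
    """
    queue = ["commit_with_syncobj", "sync_request"]
    fence_attached = False
    replied = False
    for _ in range(max_steps):
        progressed = False
        if queue:
            head = queue[0]
            blocked = (head == "commit_with_syncobj"
                       and wait_in_proxy and not fence_attached)
            if not blocked:
                # In-order forwarding: only the head of the queue may go.
                queue.pop(0)
                if head == "sync_request":
                    replied = True
                progressed = True
        if replied and not fence_attached:
            # The client attaches the fence only after the reply arrives.
            fence_attached = True
            progressed = True
        if not progressed:
            break
    return not queue and fence_attached
```

With `wait_in_proxy=True` the first request never leaves the queue, so the reply that would trigger the fence never arrives; with `wait_in_proxy=False` both requests go through and the fence is attached afterwards.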
20:34 robclark: but the wl proxy in the guest is the compositor.. it's just a rootless nested compositor
21:04 glehmann: I'm too dumb to replace a nir def properly, does anyone have suggestions? https://gitlab.freedesktop.org/mesa/mesa/-/issues/13364#note_2961000
21:06 glehmann: one solution would be to only do this opt with a single use, then the second loop is unnecessary. I think I originally did that in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24557 before faith wanted me to change it
21:24 alyssa: glehmann: oh, ouch.
21:26 glehmann: I guess iterating in reverse could help, then the new uses get added before the current iteration
21:26 alyssa: glehmann: I think the pattern I've used here is to record an array of uses while walking the linked list in the first loop
21:26 alyssa: and then iterate the array instead of the list for the replace
21:26 alyssa: u_dynarray is kinda clunky but this is probably faster than walking the list twice anyway
21:27 alyssa: util_dynarray_init_from_stack should avoid the silly dynamic allocations
21:27 alyssa: (or, use a fixed size array and bail from the opt if you accumulate too many uses)
21:27 alyssa: (actually do that lol)
21:28 alyssa: then there's never any allocation and it's potentially faster than what we do now with minimal loss of generality
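The pattern alyssa outlines (record the uses first, then rewrite from the recorded array, bailing if there are too many) can be sketched in Python as a stand-in for the NIR C code. The `Value` class, `MAX_USES`, and `rewrite_uses_snapshot` are hypothetical names for illustration and do not correspond to NIR's API.

```python
MAX_USES = 8  # hypothetical cap; bail out of the optimization above this


class Value:
    """Tiny SSA-like value that tracks which values read it."""

    def __init__(self, name, srcs=()):
        self.name = name
        self.srcs = list(srcs)
        self.uses = []
        for src in self.srcs:
            src.uses.append(self)


def rewrite_uses_snapshot(old, make_replacement, max_uses=MAX_USES):
    """Point the existing users of `old` at a replacement built from `old`."""
    if len(old.uses) > max_uses:
        return False                  # too many uses: skip the optimization
    snapshot = list(old.uses)         # first loop: record the current uses
    new = make_replacement(old)       # may add fresh uses of `old`
    for user in snapshot:             # second loop walks the snapshot only
        user.srcs = [new if s is old else s for s in user.srcs]
        old.uses.remove(user)
        new.uses.append(user)
    return True
```

Because the second loop never looks at the live use list, the use that the replacement itself adds to `old` is left alone.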
21:33 jenatali: Or just continue if the def is already the one you were going to replace it with?
21:34 alyssa: ....or that, yeah, lol
21:34 alyssa: nice.
21:35 glehmann: I mean, what would the continue condition look like? there is no way to identify if a use was a previous or a new use in the worst case
21:36 glehmann: think of something like (subgroupExclusiveAdd(a) + a) + a
21:43 jenatali: Oh. Now that I actually pull up the change I see what's going on
21:44 jenatali: I'd probably write this by adding a new inclusive scan op and then you can replace all users of the old one with the new one?
21:44 jenatali: Then you're not adding any new users to the old intrinsic
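The identity behind this suggestion is that an inclusive scan equals the exclusive scan plus the invocation's own value. A small Python model follows, with a list standing in for the values held by the invocations of one subgroup; the function names are illustrative and are not NIR intrinsics.

```python
def exclusive_add(values):
    """Exclusive prefix sum: element i is the sum of values[0..i-1]."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out


def inclusive_add(values):
    """Inclusive prefix sum: element i is the sum of values[0..i]."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out


def inclusive_from_exclusive(values):
    """The pattern being matched: exclusive_add(a) + a per invocation."""
    return [e + v for e, v in zip(exclusive_add(values), values)]
```

Emitting a separate inclusive op means users of `exclusive_add(a) + a` are redirected to it, while the old exclusive op gains no new users.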
21:49 glehmann: that sounds quite reasonable, I will probably implement that tomorrow
21:50 jenatali: Cool, sounds good
21:50 jenatali: I do wonder why D3D only ended up with exclusive scans...
22:32 anholt: I'd sure love to see GLSL source passed through zink and into the VK driver.