12:52 mwalle: robertfoss: thanks!
15:04 MrCooper: zamundaaa[m]: seeing a similar performance hit with explicit sync on Intel ADL-P iGPU, so it's not RADV / amdgpu specific
19:26 feaneron: i'm trying to debug a problem with gtk4, vulkan, and webkit, and i'm hitting a wall now, perhaps people around here know something that can help
19:28 feaneron: to start, webkit renders web pages in a subprocess, and sends the rendered buffer using dmabuf with the parent process. it does so by using gbm on the subprocess, pretty stand stuff, and the buffer is imported using e.g. egl.
19:28 feaneron: standard* stuff
19:30 feaneron: gtk4 is able to render dmabufs both on vulkan and opengl. on opengl, it uses the egl dmabuf import routines. on vulkan it does so using VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO & family
19:31 feaneron: if available, webkit uses the gtk4 api to import dmabufs
19:31 mattst88: Company: ^
19:32 feaneron: Company is aware of that :)
19:32 mattst88: ah, great
19:32 feaneron: recently gtk4 switched to vulkan by default, and that has caused a problem in webkit: the imported dmabufs are entirely black!
19:33 emersion: feaneron: Vulkan doesn't do implicit sync, maybe that's the issue?
19:33 feaneron: i've debugged webkit and it picks the exact same format, modifiers, number of planes, etc, regardless of gtk4 using vulkan or opengl
19:34 feaneron: on opengl, the dmabuf importing works; on vulkan, it's empty
19:34 feaneron: emersion: hold on, there's a plot twins coming
19:34 feaneron: these failures are true for radv and intel
19:35 feaneron: but, it didn't always fail; assuming e.g. XR24, some modifiers would work, others wouldn't
19:36 emersion: sync issues are a bit random
19:36 emersion: if you add a fat big sleep(), does it work?
19:36 feaneron: after lots of testing, we noticed something: the format+modifiers pairs that failed had dcc modifiers on them (afaics intal calls that ccs)
19:37 feaneron: so much so that importing works if webkit is run with RADV_DEBUG=nodcc / INTEL_DEBUG=noccs
19:38 feaneron: emersion: i've tried that! it doesn't seem to make a difference. i also tried fences, glFinish(), etc
19:38 feaneron: so somehow, this seems to be related to dcc / css (what generic terminology do people use for them?)
19:38 jenatali: Sounds like a missing resolve
19:39 feaneron: missing resolve?
19:39 jenatali: Resolving the compression data into the actual surface data
19:39 emersion: maybe the planes with index>0 aren't imported properly?
19:40 feaneron: sorry, that's not something i'm familiar with. where can i read more about it?
19:44 emersion: jenatali: maybe missing layout transition?
19:44 jenatali: Yeah, that's plausible
19:44 emersion: feaneron: tried vulkan volidation layers?
19:44 emersion: valdation*
19:44 emersion: validation*
19:44 jenatali: Not sure if the dmabuf export extensions let you specify a layout of some kind
19:44 feaneron: yes, no complaints from validation layers
19:45 emersion: there are some transitions that you need to perform when using foregn VkImages
19:46 emersion: you need to VkImageMemoryBarrier from VK_IMAGE_LAYOUT_UNDEFINED to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
19:46 emersion: and then back into GENERAL
19:46 jenatali: Doesn't a layout from undefined discard the contents?
19:47 jenatali: Er, transition from undefined
19:51 zmike: mareko: is there any reason not to use the samplers_as_derefs pipe cap?
19:51 feaneron: emersion: do you know of any existing code that does these transitions?
19:52 emersion: feaneron: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/render/vulkan/pass.c?ref_type=heads#L156
19:53 emersion: wlroots would also be a good source of inspiration for sync stuff and VK_EXT_image_drm_format_modifier
19:55 dj-death: there is some discussion right now with gamescope devs
19:56 emersion: where is that?
19:56 mattst88: feaneron: regarding "resolves", dcc/ccs works by having a secondary surface that holds information like "this whole block in the real surface is the clear color"
19:56 dj-death: using UNDEFINED as oldLayout with modifiers using compression is breaking for use
19:56 zamundaaa[m]: emersion: I've wondered before... why do you need or want image layout transitions with VkImages that just refer to a dmabuf with a modifier?
19:56 mattst88: feaneron: which makes things faster by avoiding reading or writing the real surface
19:56 dj-death: emersion: https://github.com/ValveSoftware/gamescope/issues/356
19:56 dj-death: emersion: also khronos gitlab
19:57 mattst88: feaneron: a "resolve" means to actually write out the data in the secondary surface to the primary surface, so that the primary surface can be used directly without the secondary surface
19:57 dj-death: it looks like various drivers/apps are relying on unspecified behavior
19:57 zamundaaa[m]: Like, I see why it could make sense when you render to a dmabuf, but when reading from it I'd think you wouldn't want to touch anything about it
19:57 dj-death: which could generate hangs on Intel HW
19:57 feaneron: mattst88: i see, i did notice that all faulty formats were multiplanar, this explain it
19:58 emersion: ouch, long discussion is long
19:59 feaneron: Company: ^^^ if this interests you
19:59 mareko: zmike: do you mean lowering sampler derefs? we haven't done that yet, but if we do, it would probably be better to transition radeonsi and radv at the same time
20:02 zmike: mareko: I mean currently I'm using integer sampler indexing, but using derefs would be more optimal and if there aren't plans to delete that path then I might get to it soon
20:02 mattst88:remembers hearing people talk about "HiZ resolves" years ago at Intel and having zero clue what they meant
20:02 zmike: it's what radeonsi uses currently
20:02 dj-death: mattst88: it's the compressed depth buffer resolve
20:03 mattst88: dj-death: yeah, I understand what it means now. just not in 2013 :)
20:03 dj-death: ah sorry :)
20:04 mattst88: still at least 1000 things about graphics drivers I still feel this way about today though :P
20:04 Company: dj-death, emersion: so what VkImageLayout should I assume for a freshly imported dmabuf?
20:05 dj-death: Company: that's subject to debate atm
20:05 Company: I think this just runs the generic codepath in GTK which uses UNDEFINED
20:05 dj-death: Company: you kind of need to know in what was used last to generate the data
20:06 Company: oof
20:06 dj-death: but literally anything but UNDEFINED for use with modifiers
20:06 dj-death: because that will whack the data on some HW
20:07 dj-death: because not initializing compression data can lead to hangs on some generations
20:07 dj-death: so we have no choice but avoid it
20:07 jenatali: FWIW D3D generally expects our "COMMON" layout to be used for any cross-API stuff, which I think would map to GENERAL for Vk
20:08 dj-death: yeah GENERAL would be a good idea
20:08 dj-death: Company: also be careful to image creation parameters
20:09 dj-death: Company: they need to match between importer/exporter
20:09 dj-death: including usage flags
20:09 Company: dj-death: the exporter is GL
20:09 Company: so...
20:09 dj-death: ah okay, with modifiers that should be fine-ish
20:10 Company: feaneron: gsvulkanimage.c:828 - try setting that to VK_IMAGE_LAYOUT_GENERAL and see if that helps
20:11 zamundaaa[m]: This unspecified mess sounds a lot like implicit modifiers all over again
20:11 Company: but yeah, I never considered anything about this
20:12 Company: zamundaaa[m]: to me that sounds like the whole dmabuf import/export thing wasn't thought entirely through - because it was only spec'd out as far as necessary, but not as far as people were inevitably gonna drive it
20:14 feaneron: Company: it works
20:14 feaneron: proof → https://i.imgur.com/mhvL3vL.png
20:15 feaneron: Company: i can send a gtk patch if you want
20:16 Company: feaneron: yes - make sure the commit message references https://github.com/ValveSoftware/gamescope/issues/356
20:16 Company: "for a longer discussion, see"
20:16 feaneron: okay, my commit message will probably suck but i'll do it anyway :P
20:16 Company: feaneron: also, get mcatanzaro to test on intel
20:17 feaneron: mcatanzaro doesn't use intel, i'll try and get someone else
20:17 Company: feaneron: commit message is something like "UNDEFINED means that the data can be discarded and we don't want to discard it. Because the spec is unclear (see $link for a discussion), err on the side of caution and use GENERAL"
20:18 Company: "Fixes import failures with webkit, see $webkit-bug-if-vailable"
20:18 feaneron: okay i was writing something along these lines. glad my understanding is not too wildly off :)
20:19 Company: feaneron: so, on our previous 50/50 discussion about it being a GTK issue or Mesa issue, what do we decide now? 50/50 GTK and Mesa? :D
20:20 feaneron: i'd put it on the infinitesimally small middle point between both, schrodinger style
20:20 emersion: i'd blame Khronos :P
20:20 emersion: thanks for the links
20:20 jenatali: Yeah, I'd blame Khronos too
20:21 feaneron: thanks everyone for the directions
20:21 jenatali: Layout should be part of API or spec for import/export
20:22 Company: the khorons dmabuf stuff is almost entirely done by people in here, or?
20:22 Company: or are Google and/or Nvidia involved in there, too?
20:26 colinmarc: so the issue is transitioning the dmabuf to undefined when releasing it? just trying to figure out if I have this bug. it looks like I'm transitioning to GENERAL after reading
20:26 Company: this was about import
20:27 Company: GTK was setting UNDEFINED for dmabufs on import
20:27 colinmarc: for the dst layout?
20:27 Company: src layout
20:28 colinmarc: sorry if I'm being thick, why does the src layout matter? (I'm doing that too)
20:28 dj-death: I think it's an oversight in the spec
20:28 colinmarc: https://github.com/colinmarc/magic-mirror/blob/main/mm-server/src/compositor/video.rs#L301-L302
20:28 Company: yeah
20:28 dj-death: there is no way to ignore the layout in a queue transfer
20:28 emersion: JoshuaAshton is also saying that the layout should be ignored by the driver
20:28 dj-death: people have been relying on "yeah just don't do anything... juuuust for modifiers"
20:28 dj-death: that's not really in the spec
20:29 dj-death: there is no way to deal with an import with actual undefined content
20:29 dj-death: we just need to add a way to do that
20:29 Company: colinmarc: yeah, that line should probably say GENERAL because UNDEFINED is taken as "can be ignored" and not as "unknown"
20:29 dj-death: and tell everybody...
20:30 colinmarc: huh. I took undefined to mean "whatever it currently is".
20:30 Company: yeah, same here
20:30 colinmarc: I think there's even a validation layer check that it's undefined
20:30 dj-death: yeah on first use
20:34 Company: so, if I want to export the vkImage, what do I transition it to? GENERAL for the layout, but what access and pipeline stage?
20:35 colinmarc: should be NONE for dst on both, right? (also asking because I want to make sure I'm doing it right)
20:36 Company: I was wondering if it needs to be BOTTOM_OF_PIPE or TOP_OF_PIPE
20:36 mareko: zmike: there are potentially plans to remove all derefs paths
20:36 Company: and access I don't have the slightest idea
20:36 dj-death: Company: stage/access for source or destination?
20:37 mareko: zmike: how are derefs more optimal? shouldn't they be the same but clunkier?
20:37 Company: dj-death: for destination right before calling GetMemoryFdKHR() and handing that fd off to external code
20:37 dj-death: Company: for destination stage=VK_PIPELINE_STAGE_ALL_COMMANDS_BIT access=VK_ACCESS_MEMORY_READ_BIT should cover you
20:38 dj-death: unless you also want the thing to be CPU readable, also throw in VK_ACCESS_HOST_READ_BIT
20:38 Company: which might end up being libva or eglImportTex or mmap() or whatever
20:38 dj-death: and VK_PIPELINE_STAGE_HOST_BIT then
20:39 Company: I don't know what the user of the dmabuf is gonna do with it
20:39 colinmarc: is undefined only a problem for dmabuf transfers? or should I not be transitioning from undefined for like color attachments and stuff either?
20:40 dj-death: colinmarc: it's only a problem for queue transfers
20:40 dj-death: colinmarc: I mean...
20:41 dj-death: colinmarc: it'll be a problem too if you have some data on a non imported buffer and you essentially discard the content with UNDEFINED
20:41 colinmarc: I've been using it for cases where I don't know the layout (or I just want the code to be a bit more general)
20:41 colinmarc: ie as "unknown"
20:41 Company: UNDEFINED means DONT_CARE
20:41 colinmarc: right, that's how I understood it
20:42 Company: about the data
20:42 Company: not about the layout
20:42 colinmarc: I guess I don't get how transitioning from UNDEFINED would discard anything
20:43 colinmarc: but if that's the case I need to change a few things :)
20:43 dj-death: colinmarc: in most case it doesn´t
20:43 Company: the driver is allowed to "optimize" things
20:43 dj-death: colinmarc: on Intel there might be a hidden compressed buffer associated with an image
20:43 dj-death: colinmarc: that's what gets reset on UNDEFINED
20:44 Company: ultimately it means you make the driver take the same codepaths as it would for a new buffer
20:44 colinmarc: ahhh that makes sense
20:44 colinmarc: like you're signaling initialization
20:45 Company: yeah
20:45 Company: at least that's how I interpret it
20:46 colinmarc: man I hope this fixes that weird nv bug I've been having
20:53 DemiMarie: dj-death: what happens if the modifiers do not match? Is it a security problem or will it just result in junk being rendered?
20:55 DemiMarie: I’m asking because the child process is sandboxed.
21:03 dj-death: DemiMarie: yeah undefined
21:03 dj-death: DemiMarie: garbage if you're lucky
21:03 dj-death: DemiMarie: more likely failure to import or something because of potential size differences
21:04 dj-death: DemiMarie: and then if you have matching sizes but compressed/uncompressed mismatch, possibly hangs
21:05 zmike: mareko: well right now I have some really awful code to turn non-derefs back into derefs anyway, and being able to delete that and just have derefs the whole way through would be more optimal in every way
21:17 zmike: eric_engestrom: nice work!
21:17 zmike: 🚢 🚢 🚢
21:27 DemiMarie: dj-death: can it cause memory corruption (on CPU or GPU)? The concern here is a child process passing a malicious DMA-BUF to compromise the parent.
21:31 eric_engestrom: zmike: thanks! 🤗
21:46 jenatali: zmike: I've been meaning to add a pipe cap to stop the GL frontend from lowering shared to explicit I/O and keep it in derefs as well. Not sure if SPIR-V wants derefs for that but I think it does?
21:46 jenatali: Right now we're un-lowering those as well
21:49 zmike: uhhhhh
21:50 zmike: yeah...it does...but also it kinda doesn't matter since I think I'm just jamming in a dword array for it
21:50 zmike: so it's less annoying there by far
22:28 jenatali: Oh I guess it's GL so no 16-bit shared
22:28 jenatali: Or 64-bit
22:29 zmike: there is, but I just alias it
22:29 zmike: or maybe that's with CL, I don't remember
22:29 jenatali: Yeah CL can alias, I don't think GL can
22:30 jenatali: Forgot that SPIR-V added aliasing. DXIL doesn't allow that :(
22:31 zmike: I meant the 16/64 bit types
22:42 jenatali: I see