01:20zzoon[m]: airlied: It'll be great for you to review this, when you're available! https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22202
01:23airlied: zzoon[m]: yup thanks for reminder! just crawling out of 3 weeks backlog, but will put near the front of the list :-)
01:42airlied: zzoon[m]: great work, so the intel devices also have the huc firmware, and I think there is some support for having the firmware do the slice decoding
01:43airlied: I do wonder if we should try and add slice decode support to vulkan to avoid having to parse slice headers in the driver
01:53zzoon[m]: Yeah that's what I tried to find but hcp is by default and I couldn't manage to see that it works by huc.
01:54zzoon[m]: airlied: I mean, media-driver and vaapi-driver works with HCP by default and I tried to modify it to use huc but failed.
01:54zzoon[m]: for hevc decoding.
01:55airlied: zzoon[m]: do you know if you ever got huc to work at all? it at least needs a kernel parameter I think on gen9
01:56zzoon[m]: Ah..didn't know about the kernel parameter..
01:56airlied: I'm fine with using hevc and just doing the slice header, it would be nice to have the vulkan api support slices though
02:03airlied: zzoon[m]: I think it's enable_guc=2
02:05zzoon[m]: that's good to know...I should've noticed about that.. slice parsing was a pain.
02:06airlied: I wrote h264 slice parsing before, it wasn't fun, then I discovered the short header path
07:16MrCooper: karolherbst: nice
10:15Lynne: what are pipeline image barriers considered in vulkan? compute ops? graphic ops?
10:16Lynne: I ask, because assuming that you set up a simple command buffer with an image barrier and a compute dispatch, does a VkSemaphoreSubmitInfo with a compute + compute wait/signal stages also wrap the image barrier?
10:23pixelcluster: it would cover the image barrier even without the submit because of submission order
10:25Lynne: isn't the whole point of VkSemaphoreSubmitInfo that it allows you to wait on semaphores during the stage you need to wait on them, rather than before execution can begin?
10:27pixelcluster: well that functionality was already in the 1.0 VkSubmitInfo with pWaitDstStageMask
10:28pixelcluster: but that is a bit beside the point of barriers
10:29pixelcluster: if you only need to synchronize within a queue, you only need barriers and no semaphores, even across different submits to that queue
10:30ishitatsuyuki: right
10:30ishitatsuyuki: the granular sync stuff is more like events
10:31ishitatsuyuki: semaphores are for cross-queue sync
10:35Lynne: semaphores are part of the library API, so I can't change them
10:36Lynne: users can use the images in whatever queue they want, and I have to be able to deal with it
10:38Lynne: so is a pipeline barrier included with compute+compute, or do I have to use all_commands_bit in the wait VkSemaphoreSubmitInfo and compute in the signal VkSemaphoreSubmitInfo?
10:44pixelcluster: semaphore signal with compute and semaphore wait with compute should work similar to a compute-compute pipeline barrier (except cross-queue and stuff) afaik
10:44pixelcluster: (except cross queue = except it also works cross-queue)
10:47Lynne: right, that makes sense, but still doesn't answer my question in case it is cross-queue - could the pipeline barrier get executed before the previous submission on a queue finishes
10:53pixelcluster: I would say that depends on the dst stages of the barrier (or the commands after it), i.e. if the pipeline barrier's dst stage has compute in it, the barrier would be guaranteed to finish before the semaphore is signaled
10:53pixelcluster: based on https://registry.khronos.org/vulkan/specs/1.3-extensions/html/chap7.html#synchronization-semaphores-signaling - " In the case of vkQueueSubmit2, the first synchronization scope is limited to the pipeline stage specified by VkSemaphoreSubmitInfo::stageMask. Semaphore signal operations that are defined by vkQueueSubmit or vkQueueSubmit2 additionally include all commands that occur earlier in submission order."
10:55pixelcluster: actually the "additionally include all commands that occur earlier in submission order" is kinda weird, but I think if you put compute in the dstStageMask you're safe either way
10:57pixelcluster: it is weird because you could read it as if the "all commands that occur earlier in submission order" really means all commands, not just limited by the stageMask, but that doesn't make sense because then the stageMask would be meaningless so I don't think that is actually meant here
11:00pixelcluster: hmm I guess it makes sense if you read it as "all commands that occur earlier [than the submitted batch] in submission order"
11:08Lynne: yeah, that sounds a bit in the gray area of the spec, so I'll pick all_commands for the wait stage, thanks
11:59rsalvaterra: Hi, everyone!
11:59rsalvaterra: Alright, who broke my NVAC this time? xD https://paste.debian.net/1277686/
12:00rsalvaterra: karolherbst: Any ideas? :)
13:49jfalempe: tzimmermann, I have working poc to use DMA to copy the framebuffer to VRAM for mgag200. it's not measurably faster, but it will free up some CPU.
13:49jfalempe: I'm currently using a small 32k buffer with dma_map_single() for DMA transfer, but it would be better to use directly the drm_shadow_plane_state->data for DMA.
13:49jfalempe: But I don't find how to do that without copying the data to a dma-capable buffer.
13:55tzimmermann: jfalempe, gem shmem is not made for this, i think
13:56tzimmermann: TBH i'd prefer to avoid such optimizations
13:57tzimmermann: it's a lot of complexity for little gain
14:00tzimmermann: i know that it's the fun stuff to work on. but in terms of maintenance it's pure overhead
14:00jfalempe: it's not that complex, but I was a bit disappointed it's not faster ;)
14:01tzimmermann: i can see two things that might benefit mgag200: irqs and cursors
14:02jfalempe: irq is next on the list, but you need to use DMA to have the softrap irq.
14:02tzimmermann: i once made a patchset for irq-driven pageflips. it's somewhere on dri-devel. pick it up, if you want to.
14:03jfalempe: ah, thanks, I will look into this.
14:03tzimmermann: there's no vblank irq, but the patchset used the vsync instead. other drivers do that as well
14:05tzimmermann: the other thing is the HW cursor: matrox only supports 16-color cursors. but that might be enough for most compositors. the key here is compositor support, which is missing
14:05jfalempe: you can use bitblit to have 32bit cursor
14:06jfalempe: it is supposed to handle transparency, but I tested only with opaque color.
14:07tzimmermann: that's better done in userspace. the driver shouldn't do compositing.
14:07tzimmermann: but with the 16-color hardware plane, it would make a difference
14:08tzimmermann: IDK if that is really practical to support in userspace
14:12jfalempe: I mean bitblt is a g200 drawing instruction, so it can be used to draw the cursor, the drawback is you have to save what is under the cursor to restore it after it moves.
14:15tzimmermann: jfalempe, I think this is the vblank patch: https://lore.kernel.org/dri-devel/20191205160142.3588-4-tzimmermann@suse.de/
14:16tzimmermann: jfalempe, exactly. that's better left to userspace.
14:16tzimmermann: rule-of-thumb for DRM is: if you can't do it in hardware, you do it in userspace
14:18tzimmermann: and rumor has it, that bitblit isn't that much better, compared to optimized userspace with advanced CPU instructions (SSE, etc)
14:18tzimmermann: i never measured this, though
14:19jfalempe: but bitblit avoids copying data from cpu to vram, which is the slowest thing.
14:20jfalempe: if your cursor image is in VRAM it's much faster than copying the damaged region.
14:24tzimmermann: i can't really argue against that, but i tend to believe daniel's comments on 2d acceleration: https://blog.ffwll.ch/2018/08/no-2d-in-drm.html and using blitting adds asynchronous operations, which requires additional complexity; plus you'd have to track updates in the driver
14:25tzimmermann: it's not worth the effort. that's what i meant with 'it's a maintenance overhead'
14:26tzimmermann: for other cool HW hacking: mga hardware supports zooming. IDK if that's possible, but it might be useable by userspace. Gnome has accessibility features that zoom the display
14:35tzimmermann: and IIRC mga HW supports overlay planes for video output
14:40jfalempe: yes, but I'm not sure what we can do with that.
14:45tzimmermann: there's also irq support on ast HW, but it's not well documented. maybe that's interesting
17:23jenatali: gfxstrand: For that WSI caching change, it seems like the right thing to do is to add a wsi_instance object to contain per-winsys caches. Do you agree?
17:40gfxstrand: jenatali: Ugh... yeah, probably.
17:41gfxstrand: jenatali: Or we could just make it global call_once
17:41gfxstrand: But that's been so fraught
17:41jenatali: If I could add an array of winsys instances to the vk_instance that are on-demand initialized, that wouldn't be too bad
17:42jenatali: But cleanup still means touching every driver, so nevermind
17:43jenatali: Guess I'll start adding it
19:17jenatali: gfxstrand: It looks like vulkan/wsi wasn't supposed to take a dependency on vulkan/runtime, based on the fact that everything is Vk* types instead of vk_* types. However that abstraction seems to have leaked. Is it worth trying to maintain/restore it or should I abandon it?