00:58airlied: is nvock in here?
00:59airlied: alyssa, glehmann: just noticed the NIR indirect calls were added > a year ago and I don't see any users in the tree
01:00airlied: they also don't seem to have serialization support
01:04airlied: pixelcluster: found you :-) ^^
02:05olivial: zink-anv-cml-asan CI job broken for anyone else?
02:05olivial: I'm *pretty* sure it's not touched by my changes
07:49linkmauve: > 22:39:29 K900> I wonder if it makes sense to specify video-only Vulkan devices and then have something bridge that to V4L2
07:49linkmauve: I have exactly that in the works, I can already decode the first frame of some H.264 streams using either ffmpeg or gstreamer’s Vulkan video decoders, tested on the Rockchip rk3588 and on the AllWinner A64 atm.
07:50K900: Very cool
07:50linkmauve: > 22:46:10 K900> Is there literally any relevant hardware with stateful V4L2 codecs though
07:50linkmauve: Besides Qualcomm, Apple also relies on a coprocessor with its own stateful firmware AFAIK.
07:50K900: 3588 is actually the hardware I'm personally most interested in :)
07:51K900: (given that's what's running on my NAS)
07:51linkmauve: K900, it isn’t public yet, once I have a working prototype I intend to find funding to continue the maintenance, but I have almost no experience finding companies interested in that kind of work.
07:52linkmauve: I guess Rockchip could be interested in that, maybe some of the board vendors as well.
07:52K900: Yeah unfortunately not really something I can help with, but maybe Collabora folk would be interested?
07:52K900: Or Radxa
07:53linkmauve: Yeah them too, they surely have customers who could be interested in Vulkan video.
07:54K900: Collabora is working with Radxa on 3588 bringup
07:54K900: Well, "is contracted by Radxa" is probably more accurate
07:55linkmauve: But first things first, decoding more than just a keyframe and dealing properly with DPB slots so that I can decode a full video. :)
07:55K900: We have a Radxa guy in NixOS spaces, I can get you in touch if you're ever interested
07:55linkmauve: I indeed am!
07:56HdkR: I'm a customer interested in Qualcomm Vulkan video :P
07:57linkmauve: HdkR, Vulkan video is stateless (the application submits the parameter sets and DPB state with every decode operation), so you would have to remap it onto stateful hardware.
07:57K900: Uh oh
07:58K900: I just realized something
07:58K900: They're on matrix.org which is very dead right now
07:58HdkR: linkmauve: Definitely, but it doesn't stop me from being interested in it
07:58linkmauve: Thankfully I’m on XMPP. :)
07:58linkmauve: HdkR, haha. :D
07:59linkmauve: HdkR, I’ve heard Wine is starting to use Vulkan video for decoding, this will be useful in FEX-emu hopefully. ^^
07:59HdkR: It would be nice, yeah. V4L2 and x86 applications don't play well together
08:24tzimmermann: vsyrjala, hi. about the deadlock in the vblank timer: i've been exploring various ideas to fix it. the easiest solution schedules a worker that runs drm_crtc_handle_vblank() outside the hrtimer code. is it allowed to run drm_crtc_handle_vblank() outside of the vblank IRQ?
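[For reference, a minimal sketch of the worker approach being proposed; the struct and function names are hypothetical, while drm_crtc_handle_vblank(), the hrtimer API and the workqueue API are the real kernel interfaces:]

    /* Hypothetical driver state; only the timer/worker split matters here. */
    struct vblank_emu {
            struct drm_crtc crtc;
            struct hrtimer timer;
            struct work_struct vblank_work;
            ktime_t period;
    };

    static void vblank_work_fn(struct work_struct *work)
    {
            struct vblank_emu *v = container_of(work, struct vblank_emu, vblank_work);

            /* Runs in process context, so locks taken by the vblank code
             * no longer nest inside the hrtimer (hardirq) context. */
            drm_crtc_handle_vblank(&v->crtc);
    }

    static enum hrtimer_restart vblank_timer_fn(struct hrtimer *timer)
    {
            struct vblank_emu *v = container_of(timer, struct vblank_emu, timer);

            queue_work(system_highpri_wq, &v->vblank_work);
            hrtimer_forward_now(timer, v->period);
            return HRTIMER_RESTART;
    }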
08:44karolherbst: Vulkan question.. if I dispatch a compute shader with VkPipelineShaderStageRequiredSubgroupSizeCreateInfo::requiredSubgroupSize == 8, does the SubgroupSize SPIR-V builtin also have to return 8? Because I see anv returning 32 and I'm wondering if it's my fault or not
09:06glehmann: > If the pipeline was created with a chained VkPipelineShaderStageRequiredSubgroupSizeCreateInfo structure, or the shader object was created with a chained VkShaderRequiredSubgroupSizeCreateInfoEXT structure, the SubgroupSize decorated variable will match requiredSubgroupSize.
09:06glehmann: driver bug
09:07glehmann: as long as 8 is a supported subgroup size, and the stage is in requiredSubgroupSizeStages ofc
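[For reference, a minimal sketch of the chaining the spec text describes; the shader module handle is hypothetical:]

    VkPipelineShaderStageRequiredSubgroupSizeCreateInfo required = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO,
        .requiredSubgroupSize = 8, /* must be within minSubgroupSize/maxSubgroupSize */
    };
    VkPipelineShaderStageCreateInfo stage = {
        .sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
        .pNext  = &required, /* with this chained, SubgroupSize must read back 8 */
        .stage  = VK_SHADER_STAGE_COMPUTE_BIT,
        .module = shader_module, /* hypothetical VkShaderModule */
        .pName  = "main",
    };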
09:15dj-death: karolherbst: that should be tested by CTS and anv would fail that test...
09:19dj-death: const nir_lower_subgroups_options subgroups_options = {
09:19dj-death: .subgroup_size = get_subgroup_size(&nir->info, max_subgroup_size),
09:19dj-death: using the NIR value which should be what you pass in requiredSubgroupSize
09:19dj-death: maybe you're putting that pNext in the wrong location
09:19karolherbst: well.. yeah not sure
09:20karolherbst: I saw vk_pipeline.c to pick it up
09:20karolherbst: I suspect it picks the simd32 shader at dispatch time, but not sure...
09:20karolherbst: maybe I should debug more
09:20karolherbst: I chained it on the VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO object
09:20dj-death: from what I remember if you give a fixed size at pipeline creation, the shader is compiled only once at the requested size
09:21dj-death: src/vulkan/registry/vk.xml: <type category="struct" name="VkPipelineShaderStageRequiredSubgroupSizeCreateInfo" returnedonly="true" structextends="VkPipelineShaderStageCreateInfo,VkShaderCreateInfoEXT">
09:21dj-death: looks like the right place
09:23dj-death: we should get that when Anv calls vk_pipeline_shader_stage_to_nir()
09:42karolherbst: okay.. yeah, I'll debug later, just wanted to know if my assumptions are correct or not
09:42karolherbst: thanks for the info!
09:46dj-death: np
09:58karolherbst: uhh.. it was zink calling "nir_lower_subgroups" with a subgroup_size set :'(
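[For context, a paraphrased sketch (not the actual zink code) of the kind of call site that causes this: a nonzero .subgroup_size makes nir_lower_subgroups() fold the SubgroupSize builtin to that constant, overriding whatever the driver was asked for via requiredSubgroupSize:]

    const nir_lower_subgroups_options opts = {
        .subgroup_size = 32,    /* folds load_subgroup_size to a constant 32 */
        .ballot_bit_size = 32,
        .ballot_components = 1,
    };
    nir_lower_subgroups(nir, &opts);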
10:00karolherbst: dj-death: what are the perf characteristics with regard to subgroup sizes on anv, btw? Is it always beneficial to use 8 over 16 over 32 if it fits, or are there situations where SIMD16 might be faster than SIMD8?
10:00dj-death: it's complicated :)
10:00karolherbst: I know that iris defaults to 16, just wondering
10:01dj-death: as you go up in size, the register space gets halved
10:01dj-death: so there is that...
10:01dj-death: more spilling because you run out of space
10:01karolherbst: right... sadly with vulkan I can't let the driver decide, and for CL subgroup support I have to know the subgroup size the shader runs at, so I always have to pick a size
10:01dj-death: it's more difficult to allocate because messages to load/store data require bigger chunks of contiguous registers
10:02dj-death: SIMD16 is usually faster than SIMD8
10:02dj-death: but for SIMD32 it's not always the case
10:02karolherbst: mhhh
10:02dj-death: assuming no spilling
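[To make the register-pressure point concrete, rough numbers assuming the usual pre-Xe2 Intel register file of 128 GRFs x 32 bytes per EU thread:]

    a 32-bit value needs:   SIMD8 -> 1 GRF,  SIMD16 -> 2 GRFs,  SIMD32 -> 4 GRFs
    live 32-bit values that fit: ~128 / ~64 / ~32 before spilling kicks in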
10:03karolherbst: I wish vulkan could give me a preferred subgroup size or something
10:03dj-death: even we can't until we compile the shader
10:04karolherbst: so the "best" generic plan would be to create them all and compare metrics the vulkan runtime gives me on those pipelines?
10:04karolherbst: does vulkan even give enough information there
10:04dj-death: yeah
10:04dj-death: it's what we use for shader-db
10:05dj-death: check the spilling first
10:05karolherbst: is it driver agnostic?
10:05dj-death: then cycle count
10:05dj-death: karolherbst: of course not :)
10:05karolherbst: :')
10:06karolherbst: not sure I like the plan to make it per driver in zink
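[The query mechanism itself is the cross-vendor VK_KHR_pipeline_executable_properties; what is driver-specific is the set of statistics it reports. A rough sketch, assuming the pipeline was created with VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR; the device/pipeline handles are hypothetical:]

    VkPipelineExecutableInfoKHR exec_info = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_INFO_KHR,
        .pipeline = pipeline,
        .executableIndex = 0,
    };
    uint32_t count = 0;
    vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &count, NULL);

    VkPipelineExecutableStatisticKHR stats[16] = {0};
    count = count < 16 ? count : 16;
    for (uint32_t i = 0; i < count; i++)
        stats[i].sType = VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_STATISTIC_KHR;
    vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &count, stats);
    /* stats[i].name/.format/.value: e.g. anv exposes spill/fill counts and a
     * cycle estimate here, but the names and meanings vary per driver. */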
10:07dj-death: maybe you can make an extension to request all shader variants
10:07karolherbst: I have a benchmark here where SIMD8 seems to be faster than SIMD16
10:07dj-death: and have a dispatch parameter to tell which one to use
10:07karolherbst: mhhh
10:07dj-death: yeah it's possible
10:07dj-death: but compiling all variants will suck, compile-time wise
10:07karolherbst: maybe I go with lowest first for compute
10:07karolherbst: intel's CL runtime always uses SIMD8 afaik
10:08dj-death: really?
10:08karolherbst: (or SIMD16 on newer hardware?)
10:08dj-death: interesting...
10:08dj-death: yeah Xe2+ only does SIMD16/32
10:08karolherbst: they have a cl_intel_required_subgroup_size extension tho
10:08dj-death: does that do anything different from the vulkan extension?
10:09karolherbst: not really
10:09karolherbst: lets you declare a subgroup size in the kernel
10:09karolherbst: core CL is "give me the subgroup size for this workgroup size"
10:09karolherbst: so you can query how the runtime would behave
10:10karolherbst: but it doesn't let you pick
10:10karolherbst: sadly the CTS is using those queries and tests that what the runtime returns matches what the shader returns
10:10karolherbst: and the only way to model that in vulkan is to set the subgroup size and let zink pick
10:11karolherbst: but picking the optimal size is hard :)
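[The core-CL query mentioned above, sketched with clGetKernelSubGroupInfo; the kernel/device handles are hypothetical. The runtime reports what it would do for a given work-group size, but the app cannot pick:]

    size_t local_size[3] = {64, 1, 1};
    size_t subgroup_size = 0;
    clGetKernelSubGroupInfo(kernel, device,
                            CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE,
                            sizeof(local_size), local_size,
                            sizeof(subgroup_size), &subgroup_size, NULL);
    /* The CTS then checks that get_sub_group_size() observed inside the
     * kernel matches what this query returned. */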
10:14karolherbst: ohh looks like on Xe intel's CL runtime stopped always using 8
10:16dj-death: yeah, always 8 sounds surprising to me
10:16dj-death: SIMD16 performs better in most cases
10:16karolherbst: even for compute?
10:17dj-death: everywhere
10:17dj-death: pre Xe2, you can only do SIMD16 on FS/CS
10:17dj-death: everything else is SIMD8
10:17karolherbst: kernels in CL tend to have a lot more branching, so I can see why a smaller subgroup size might have fewer perf issues with regard to divergence
10:18karolherbst: I see
10:20karolherbst: yeah okay in another benchmark SIMD16 is a bit faster
10:21karolherbst: _anyway_ it only matters on hardware with multiple subgroup sizes, so maybe I can add vendor specific selection algos...
10:21karolherbst: so only intel and AMD so far
10:21karolherbst: correctness first anyway
12:46daniels: K900: ‘contracted by Radxa’ definitely isn’t accurate
13:06pendingchaos: can someone add me to gitlab.freedesktop.org/gfx-ci/private-access/shader-db-private for use with compiler work ?
13:06pendingchaos: not sure if I'll use it regularly since we have radv_fossils,
13:06pendingchaos: but maybe I'll run it with radv sometimes to make sure they all compile, or to repro any issues other people have
13:13karolherbst: it's a lot bigger
15:56anholt: pendingchaos: added you to shader-db-private
16:00pendingchaos: thanks
16:21pixelcluster: airlied: the indirect calls users are taking an unexpectedly long time to get merged :)))))
16:23pixelcluster: I'm still on it - the main MR is https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29580 which itself hasn't seen much recent progress, but to get review going I've split off various chunks into other MRs (current one being https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34531) and am trying to get those merged
17:11glehmann: does the zink-lavapipe job emulate fp64?
17:12zmike: no, llvmpipe supports native int64 and fp64
17:20glehmann: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164 I was wondering why this causes crashes
17:20zmike: VVL errors probably
17:20glehmann: and I now figured out why: I mistakenly enable it as soon as any lower_doubles_options is set
17:21glehmann: but it still shouldn't crash with the unwanted lowering...
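[A hypothetical sketch of the bug pattern described, not the actual MR code; nir_lower_fp64_full_software is a real nir_lower_doubles_options flag:]

    /* buggy: requests softfp64 whenever *any* double lowering is set */
    bool needs_softfp64 = nir_options->lower_doubles_options != 0;

    /* intended: only when full software emulation is actually wanted */
    bool needs_softfp64_fixed =
        (nir_options->lower_doubles_options & nir_lower_fp64_full_software) != 0;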
17:21glehmann: I guess I need to figure out how to run this locally
17:22glehmann: "let's clean up some nir subgroup stuff", I should have known it wouldn't be so easy
17:53mndrx: Hi, I'm new here and had a question about the meaning of the output when running with LIBGL_DEBUG=verbose. Should I ask here or somewhere else?
17:53K900: Here is probably fine
17:54mndrx: ok thx
17:55K900: Also, as a general rule, https://dontasktoask.com/
17:56mndrx: 👍
17:59mndrx: when running `LIBGL_DEBUG=verbose glxgears` it first says: "using driver i915 for 4" (running on Intel HD Graphics 4600 btw) and then: "pci id for fd 4: 8086:0416, driver crocus"
18:00mndrx: does the first refer to the kernel driver and the second to the Mesa one, or both? That kinda confuses me.
18:05mndrx: I think it is using the crocus driver, because it is a Haswell iGPU, but then it also says i915, which makes me think it is using the i915g driver, so I just wanted to know how to interpret the output.
18:07pendingchaos: "i915" is likely printed by loader_get_kernel_driver_name(), which gets that name from libdrm
18:07pendingchaos: which would make it the kernel driver
18:08K900: i915 is the kernel driver
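[A small sketch of the libdrm call behind that message, assuming fd is the already-open DRM device fd:]

    #include <xf86drm.h>
    #include <stdio.h>

    /* Roughly what the loader does: ask the kernel which driver owns the fd. */
    drmVersionPtr ver = drmGetVersion(fd);
    if (ver) {
        printf("kernel driver: %s\n", ver->name); /* "i915" on Haswell */
        drmFreeVersion(ver);
    }
    /* The userspace (Mesa) driver, crocus here, is then picked separately
     * based on the PCI id (8086:0416). */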
18:10mndrx: Aha, so my initial thought was right. Anyway, many thanks for answering.
19:18bluetail: Is testing W7500 amdgpu ttm power management (kernel) issues via QEMU passthrough realistic, or does it miss too much hardware state? Rebooting the full machine each time to try yet another kernel param for my W7500 GPU is tedious.
20:37mlankhorst: airlied: I haven't tried the new version on xe yet, but the old patch still applies. If you do another pass can you add the xe patch too?