IRC Logs of #dri-devel on irc.freenode.net for 2023-03-31

00:21 alyssa: why is PIPE_CAP_SHADER_CAN_READ_OUTPUTS a thing
00:21 alyssa: set by radeonsi and v3d but I'm skeptical it's load bearing
00:22 alyssa: used only to call io_to_temporaries in... tessellation shaders? I guess?
00:27 alyssa: zmike: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17043
00:27 alyssa: is CAP_DITHERING actually doing things?
00:27 alyssa: on real workloads I mean
00:28 alyssa: (i also don't bother to dither)
00:29 alyssa: PIPE_CAP_DEST_SURFACE_SRGB_CONTROL seems like it's just begging to be deleted
00:30 alyssa: seems to exist only so virgl can do EXT_sRGB without EXT_framebuffer_sRGB on particular host drivers
00:30 alyssa: which really seems like a niche within a niche
00:37 cphealy: I've always been curious how dithering works with the GL API. Seems pretty unclear. Would be nice to have available for cases where one wants to use an RGB565 render buffer (to reduce RAM usage) but want to minimize banding.
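To illustrate the RGB565 banding case cphealy mentions, here is a minimal sketch of ordered (Bayer) dithering while quantizing 8-bit RGB down to RGB565. As far as I know, GL leaves the exact dithering algorithm to the implementation, so this shows one common technique and not what any particular driver does. The function names and the choice of a 4x4 matrix are illustrative assumptions.

```python
# Standard 4x4 Bayer ordered-dither matrix (entries 0..15).
BAYER4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def quantize_channel(value, bits, threshold):
    """Map an 8-bit value to `bits` bits; threshold in (0, 1) biases the rounding."""
    levels = (1 << bits) - 1
    return min(levels, int(value / 255 * levels + threshold))


def to_rgb565_dithered(r, g, b, x, y):
    """Quantize one 8-bit RGB pixel at (x, y) to a packed RGB565 word."""
    # Per-pixel threshold from the Bayer matrix, centered in each 1/16 step.
    t = (BAYER4[y % 4][x % 4] + 0.5) / 16
    r5 = quantize_channel(r, 5, t)
    g6 = quantize_channel(g, 6, t)
    b5 = quantize_channel(b, 5, t)
    return (r5 << 11) | (g6 << 5) | b5
```

Over each 4x4 tile the quantized values average out close to the original intensity, which trades visible bands for a fine, fixed pattern.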
01:20 robclark: alyssa: type more native-context drivers and then we don't have to care about virgl (except blob drivers, grmbl)
01:21 robclark: but I think currently PIPE_CAP_DEST_SURFACE_SRGB_CONTROL is probably not a thing that can be removed
01:26 alyssa: robclark: probably not
01:27 alyssa: but it is begging to be deleted nonetheless ;)
01:29 zmike: alyssa: the point of PIPE_CAP_DITHERING is to improve cache hits for drivers that won't dither
01:31 alyssa: zmike: yeah, I got that
01:31 alyssa: is it actually load bearing and not just another cap though
01:31 zmike: shrug
01:32 alyssa: zmike told me at xdc to delete caps like that one
01:32 zmike: no I said delete the ones I'm not using
01:33 alyssa: oh
01:33 alyssa: robclark: sorry PIPE_CAP_DEST_SURFACE_SRGB_CONTROL has gotta go, zmike's not using it
01:33 zmike: ab
01:35 robclark: kinda don't think it works that way ;-)
01:35 zmike: oh do I have to start using it?
01:36 zmike: ok
01:36 alyssa: zmike 2024
02:47 i509vcb: I assume there is a way to generate the docs locally for mesa?
02:56 Lynne: 99% sure the validation layer is faulty when using descriptor buffers
02:56 Lynne: such a waste of time
02:57 Lynne: on less crashy GPUs it reports descriptor indices are null when it's impossible for them to be null, I know, I've checked
02:58 Lynne: on more crashy drivers it just, you know, crashes the GPU
03:02 zmike: descriptor buffer is great as long as you don't have any bugs
03:02 jenatali: That's just Vulkan in general isn't it?
03:03 zmike: oh right
03:08 Lynne: descriptor buffers really are great
03:11 Lynne: it was such a mess before with regular descriptors if you wanted to fully parallelize your pipelines
03:12 Lynne: you had to allocate multiple descriptor sets from the pool, one for each command buffer, which sounds simple, unless you had multiple descriptor sets used in shaders
03:13 jenatali: Lynne: Doesn't descriptor_indexing also just solve that?
03:16 Lynne: I was not aware of it at the time, and I would've hated to go back and mess with the code again to fit that in
03:16 jenatali: Fair
03:18 Lynne: besides, it's ugly and an afterthought, descriptor buffers should've been the standard, nothing's more reliable than raw addresses
03:23 Lynne: sure, it wouldn't have been able to support bindless hardware, but I think some people behind the spec lack hardware standards when it comes to writing standards
03:24 Lynne: *fixed binding hardware
03:26 jenatali: The descriptor set model looks a lot like what D3D12 ended up with, where our binding tiers kind of match, tier 1 is like Vulkan's stock model, tier 3 is like descriptor indexing enabled
03:26 jenatali: Some different caveats though
04:18 Lynne: d3d12 has a much better encoding API than vulkan, I'm jealous
10:25 columbarius: Do gpus have a preferred internal buffer format? The usecase is a filter chain, where each filter would apply sth. to the buffer and pass it on to the next.
10:28 emersion: i don't think there's a generic concept like this, it depends on the use-case
10:32 columbarius: hmm. The goal is to find a usable DSP format for video streams in pipewire, so that the stream only has to be converted once at the start and once at the end of a filter chain, and filters can preferably write their transformations with this format in mind.
10:33 columbarius: and it should also be efficient on the used gpu, but I guess the answer is: it's complicated and not generalizable ^^`
10:38 emersion: do filters really need to be format-specific?
10:39 emersion: the format that everybody uses everywhere is ARGB8888
10:39 emersion: but for HDR you want 10-bit or fp16
10:42 columbarius: I don't know. The question is whether it is beneficial to propose a common/preferred/canonical format, which might also work with HDR and is efficiently supported by compute units on the hardware, or whether the whole idea is pointless and each filter will format-convert anyway
10:43 columbarius: the current guess would be float16/32 RGBA
10:49 emersion: float32 is overkill
10:50 emersion: float16 should be good enough, but may not be supported by all hw
10:50 emersion: though you target vulkan so maybe it is
10:53 columbarius: thanks
10:53 emersion: fp16 might use more power/resources than ARGB888
10:54 emersion: 8
10:55 hays: is there a 1:1 substitution for GL/gl.h and GL/glx.h using GLES/EGL?
10:56 hays: im struggling to understand how these headers interrelate
10:56 columbarius: so gpus are actually optimized for argb8888?
10:59 zamundaaa[m]: columbarius: fp16 simply has twice the amount of data per channel vs argb8888
11:00 HdkR: How efficient output is usually just comes down to how many bits are being output these days. Unless you hit some really ugly edge case in some hardware :P
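The "it comes down to how many bits are output" point is easy to put in numbers. A rough sketch, assuming uncompressed frames and that every filter stage writes one full frame; real GPUs often apply framebuffer compression, so actual traffic can be lower. The format labels are illustrative.

```python
# Bytes per pixel for a few 4-channel formats (labels are illustrative).
BYTES_PER_PIXEL = {
    "ARGB8888": 4,   # 4 channels x 8-bit UNORM
    "RGBA16F": 8,    # 4 channels x 16-bit half float
    "RGBA32F": 16,   # 4 channels x 32-bit float
}


def frame_bytes(fmt, width, height):
    """Size of one uncompressed frame in bytes."""
    return BYTES_PER_PIXEL[fmt] * width * height


def chain_write_bandwidth(fmt, width, height, fps, stages):
    """Bytes per second written if each filter stage writes one full frame."""
    return frame_bytes(fmt, width, height) * fps * stages
```

For a 1080p60 stream through three stages that is roughly 1.5 GB/s of writes with ARGB8888 and about 3 GB/s with fp16, before counting reads.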
11:01 columbarius: and there is no intermediate format, I guess, so the GPU works on argb8888 or fp16 directly. Precision errors are just accepted/not significant?
11:03 emersion: if you want precision, you need to use more power/resources and pick fp16
11:04 emersion: it's a tradeoff
11:07 columbarius: ok, so the format in use is defined by the shader, without the GPU or graphics API converting it to some intermediate thing (I don't want to write "representation", since that word is already taken).
11:10 HdkR: columbarius: The shader output is still a vec4 per colour attachment, but the ROPs give you "free" reinterpretation of that data.
11:11 columbarius: ROP?
11:11 HdkR: hardware blend units on the GPU
11:11 HdkR: (Or software on some GPUs, let's not fall into the weeds)
11:12 columbarius: thx
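A rough CPU-side model of the "free reinterpretation" HdkR describes: the shader always produces a float vec4, and the blend/output path converts it to whatever the attachment format is. This sketch assumes the usual UNORM rule (clamp to [0, 1], scale by 255, round to nearest) and an RGBA byte order with R in the low byte; real hardware rounding details may differ. It also shows why HDR values survive fp16 but get clamped in 8-bit UNORM.

```python
import struct


def clamp01(v):
    return max(0.0, min(1.0, v))


def pack_unorm8(rgba):
    """Pack a float vec4 into a 32-bit RGBA8 UNORM word (R in the low byte)."""
    word = 0
    for i, c in enumerate(rgba):
        word |= int(clamp01(c) * 255 + 0.5) << (8 * i)
    return word


def unpack_unorm8(word):
    return tuple(((word >> (8 * i)) & 0xFF) / 255 for i in range(4))


def pack_half4(rgba):
    """Pack a float vec4 as four IEEE 754 binary16 values (8 bytes)."""
    return struct.pack("<4e", *rgba)


def unpack_half4(data):
    return struct.unpack("<4e", data)
```

The 8-bit path has a worst-case error of 1/510 per channel in [0, 1], while fp16 is finer in that range and can also represent values above 1.0.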
11:13 HdkR: I would still love an ivec4 for integer colour outputs but that won't ever be a thing
15:57 gfxstrand: Ugh... create_passthrough_gs has been through a lot....
15:59 gfxstrand: It really should be zink_create_passthrough_gs at this point.
16:09 zmike: I had the same thought when I saw the ci runtimes for the changes
16:09 zmike: if you want to move you have my ab
16:11 gfxstrand: I'm more annoyed that it used to be useful for NVK and now it's not. :P
16:12 zmike: you had ample time to review and opted not to :P
16:25 gfxstrand:madly types a GS builder for vulkan/meta
16:52 MrCooper: anyone happen to know the oldest Intel GPU gen which supports any preemption?
16:55 gfxstrand: Broadwell, I think.
16:55 gfxstrand: Maybe SKL
16:55 gfxstrand: It came in tandem with execlists.
16:56 gfxstrand: Whether or not it works, on the other hand....
16:59 MrCooper: thanks
18:21 Venemo: gfxstrand, cmarcelo, mslusarz do you guys think it's OK to have this field in the common shader info struct? or should we just keep it in radv code? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22222/diffs?commit_id=8a8bd3b8dffc76625d3216745c9598e76e5f84b7
18:39 cmarcelo: Venemo: seems fine, but you likely want to say "dispatch is known to be linear at compile time" or something like, right?
18:42 Venemo: cmarcelo: I can extend the comment to include that
18:52 Lynne: shader_object looks pretty nice
18:53 Lynne: I like how there's a layer to convert objects to pipelines too
18:54 jenatali: Will be interesting to see which drivers want to support it
18:54 Venemo: who doesn't wanna support it?
18:55 jenatali: I don't have any data one way or another, it just seems like it wouldn't be a win in all cases
18:55 jenatali: Dozen won't support it until/unless D3D does something similar FWIW
18:57 Venemo: well, it basically trades off some GPU bound performance (of having all info in a PSO) for some CPU bound performance (thanks to not needing to recompile the same shader between different pipelines)
18:57 jenatali: I guess. Seems like there could've been other solutions for not recompiling, like GPL for example
18:58 Lynne: they say in the blog post that there should be no measurable difference between objects and pipelines on any platform, so I'm on board
18:58 gfxstrand: jenatali: GPL is horrible
18:58 Lynne: "Dispatch calls using compute shader objects must not be measurably slower than dispatch calls using compute pipelines"
18:58 Venemo: GPL still keeps the pre-rasterization stages together so you still had to recompile those + it still had the state in the pipeline object, which shader objects don't need anymore
18:58 gfxstrand: Lynne: Well, "no measurable difference" is probably a stretch.
18:59 gfxstrand: The more linking we're able to do the faster you'll go
18:59 jenatali: Yeah, a compute pipeline is a compute shader so it makes sense there that there's not going to be a difference
18:59 karolherbst: well, unless I missed anything, at least on Nvidia hardware it should be all the same in the end, just that pipeline objects might use more memory?
18:59 gfxstrand: karolherbst: Yes, this is ideal for NVIDIA
18:59 gfxstrand: It's serious work for just about everyone else.
18:59 karolherbst: are we even deduplicating shaders in nvk? :D
19:00 gfxstrand: Most of the engineering work for Intel on the compiler side is done. I had to do the horrible MSAA madness for GPL.
19:00 gfxstrand: karolherbst: No, we don't pipeline cache at all.
19:00 gfxstrand: karolherbst: My plan is to rework all that stuff once NAK is ready
19:00 karolherbst: might be something nvk could do, so that reusing the same shader in multiple pipelines is also for free basically
19:00 karolherbst: ahh
19:00 karolherbst: fair
19:00 Lynne: NAK?
19:01 gfxstrand: But, yeah, we don't even pipeline cache right now.
19:01 gfxstrand: Lynne: New nouveau compiler
19:01 karolherbst: :)
19:01 jenatali: FWIW D3D does the shader de-duplication inside pipelines for drivers already. We just still require a pipeline to link them
19:01 gfxstrand: Nvidia Awesome Kompiler
19:01 qyliss: GPL and NAK apparently referring to software makes this a very confusing conversation to follow for casual idlers such as myself :D
19:01 karolherbst: gfxstrand: though I meant if you have the same fp shader, but everything else is different across pipelines, would the pipeline cache stuff figure that out? Or is that too nvidia specific?
19:01 gfxstrand: hehe
19:01 zmike: xdc memes are timeless
19:01 Venemo: NAK = Not Actually a Kompiler
19:01 gfxstrand: karolherbst: Depends on how you set up your keys
19:01 karolherbst: okay
19:01 gfxstrand: karolherbst: If you make the keys orthogonal, you get deduplication.
19:01 karolherbst: nice
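gfxstrand's point that orthogonal keys give deduplication can be sketched as a per-stage cache: if each stage's key hashes only its SPIR-V plus the state that actually affects that stage's code, an identical fragment shader used in two different pipelines hits the same entry and is compiled (and uploaded) once. This is a hypothetical Python illustration; `ShaderCache`, `stage_key`, `create_pipeline`, and the string stand-ins for binaries are made up and are not the API of Mesa's `vk_pipeline_cache`.

```python
import hashlib


def stage_key(stage, spirv, stage_state):
    """Hash only the inputs that affect this one stage's compiled code."""
    h = hashlib.sha256()
    h.update(stage.encode())
    h.update(spirv)
    # Sort so the key does not depend on dict insertion order.
    for name in sorted(stage_state):
        h.update(f"{name}={stage_state[name]};".encode())
    return h.hexdigest()


class ShaderCache:
    """Hypothetical per-stage cache: one compile (and upload) per unique key."""

    def __init__(self):
        self.binaries = {}
        self.compiles = 0

    def get_or_compile(self, stage, spirv, stage_state):
        key = stage_key(stage, spirv, stage_state)
        if key not in self.binaries:
            self.compiles += 1
            # Stand-in for a real compile + GPU upload.
            self.binaries[key] = f"{stage}-binary-{self.compiles}"
        return self.binaries[key]


def create_pipeline(cache, stages):
    """Build a 'pipeline' as a tuple of per-stage binaries from (stage, spirv, state)."""
    return tuple(cache.get_or_compile(*s) for s in stages)
```

Anything from other stages that leaks into a stage's key (for example, state only known when the whole pipeline is linked) makes the keys non-orthogonal and defeats the sharing.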
19:02 Venemo: I hope you guys manage to find a better name for it before it gets merged
19:02 Lynne: what does nvk currently use then?
19:02 karolherbst: but you also only upload the shader stage once, right? or would that be per pipeline?
19:02 gfxstrand: The big thing with shader object is that there is a 1:1 (almost) mapping from SPIR-V to client-accessible binaries.
19:02 gfxstrand: karolherbst: Once.
19:02 gfxstrand: karolherbst: vk_pipeline_cache
19:02 karolherbst: ahh
19:02 karolherbst: nvm then, then it's all the same for us
19:02 gfxstrand: Yeah, the long-term plan for NVK is that it will be 100% ESO internally.
19:03 karolherbst: kinda lucky we have this super dynamic inter stage configuration thing
19:03 gfxstrand: Maybe do some cross-stage linking stuff if it's useful later.
19:03 Venemo: easy to do this now
19:03 gfxstrand: karolherbst: Yeah, NVIDIA really is the only hardware where all this is easy. Even there, though, MSAA is apparently a bit of a pain.
19:03 karolherbst: MSAA is always pain, so that's expected
19:04 gfxstrand: I don't think it's as much of a pain as the NVIDIA engineers have been claiming it is on the Khronos calls but it's not zero like everything else.
19:04 karolherbst: yeah.. dunno
19:04 jenatali: Seems odd to me that an extension would be designed that's difficult for other hardware to handle, but 🤷
19:05 gfxstrand: jenatali: It's going to be interesting to see how it plays out. We've done the brain work to be pretty sure it's implementable on AMD, Intel, and NVIDIA.
19:05 karolherbst: some applications are really doing a lot of ad-hoc linking
19:05 gfxstrand: It's easy on NVIDIA
19:05 karolherbst: so pipelines were never a good fit
19:05 Venemo: it's difficult for drivers because they were designed to work with pipelines not shaders. it's just a matter of refactoring
19:05 gfxstrand: The biggest pain point on Intel will be the fact that patch control points is now dynamic.
19:06 gfxstrand: Oh, and mesh inputs to FS are different. I don't think anyone has a plan for that yet.
19:06 karolherbst: ah yeah...
19:06 jenatali: Yeah, I'll be following along with great interest
19:06 Venemo: gfxstrand: how are mesh->fs different on intel?
19:08 gfxstrand: Venemo: I don't remember the details, I just know the FS input interface is different.
19:08 Venemo: ah, that must be painful
19:08 Venemo: it is also different on AMD but the difference can be accounted for in register programming
19:09 gfxstrand: Venemo: Worst case, we compile them twice. How bad could that be? It's not like we compile every FS 3x already or anything horrible like that. ;-)
19:10 gfxstrand: Between ESO and descriptor buffer, we've now managed to make Vulkan hell for everybody. :)
19:11 Venemo: oh wow
19:11 zmike: ideally nobody will notice the common contributor to all of these great specs
19:11 Venemo: :D
19:13 gfxstrand: hehe
19:13 Venemo: even if it's hell for driver devs though, these things addressed real feedback from the audience, so I think that's a good thing
19:19 karolherbst: I wonder if applications like dolphin-emu will also make use of this
21:09 Lynne: ping on vkGetMemoryHostPointerPropertiesEXT actually properly checking whether memory is importable
21:10 Lynne: currently it says sure for mapped device memory and then buffer creation fails when trying to import it
22:10 zmike: Lynne: not sure if helpful, but I just revived my old lavapipe descriptor buffer implementation for some testing
22:10 zmike: I can make a more official branch with it if you're interested
22:11 Lynne: sure, I can give it a spin
22:12 zmike: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22244
22:24 Lynne: it works! multiplane images too! it's pretty neat
23:19 zmike: cool, hopefully it's useful for you for debugging issues you might come across
23:34 jenatali: Time to go add an implicit GS for polygon point fill mode :(
23:37 alyssa: jenatali: Can you add that to DX12? or does qualcomm hw not do it?
23:37 jenatali: No idea if hardware does it natively, it probably does since it's part of the wireframe feature bit
23:38 jenatali: But it's the most useless feature... I just don't want to introduce CTS fails from flipping on support for wireframe
23:38 alyssa: given you only have 4 IHVs to worry about if they all do it it seems like an easy thing to bolt on to dx12
23:38 jenatali: Easy isn't the only criterion. It also has to be hard to emulate, and useful for apps
23:38 alyssa: Yeah, that's fair.
23:39 jenatali: And this is neither of those
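As a rough illustration of what an implicit GS for polygon point fill has to do, here is a CPU-side sketch of polygon mode POINT for a triangle list: facing and culling are still decided per triangle, and only then does each surviving triangle emit its three vertices as points. It assumes 2D screen-space positions with y up and counter-clockwise front faces, and ignores edge flags and clipping; `point_fill` is a hypothetical name, not anything from Dozen or D3D12.

```python
def signed_area(p0, p1, p2):
    """Twice the signed area of a screen-space triangle; > 0 means counter-clockwise."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])


def point_fill(positions, indices, cull_back=True, front_ccw=True):
    """Emit a point per vertex of each non-culled triangle, like a GS would."""
    points = []
    # Ignore a trailing partial triangle, as a triangle-list draw would.
    for i in range(0, len(indices) - len(indices) % 3, 3):
        tri = indices[i:i + 3]
        area = signed_area(*(positions[j] for j in tri))
        is_front = (area > 0) == front_ccw
        if cull_back and not is_front:
            continue
        # Shared vertices are emitted once per triangle, not deduplicated.
        points.extend(tri)
    return points
```

Simply switching the topology to a point list would skip the per-triangle facing test, which is one reason this ends up needing a geometry stage.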