10:15 tiredchiku[d]: hm
10:15 tiredchiku[d]: we don't implement VK_EXT_debug_utils yet, do we
10:16 tiredchiku[d]: oh we do
10:18 tiredchiku[d]: it's just mentioned in nvk_instance.c and not physical_device.c
10:18 tiredchiku[d]: okay
10:32 tiredchiku[d]: what's our equivalent of radv's `radv_bo_create`?
10:32 tiredchiku[d]: is it just `nvk_CreateBuffer`
10:34 pixelcluster[d]: radv's `bo`s don't really correspond to `VkBuffer`s, they correspond to `VkDeviceMemory`
10:36 pixelcluster[d]: I suppose it's roughly `nvkmd_dev_alloc[_tiled]_mem` (it's just that `radv_bo_create` does additional logging for memory tracing utilities)
10:36 tiredchiku[d]: basically I'm poking around trying to implement `VK_EXT_device_address_binding_report` in nvk, using radv as a reference for it, to slowly try and understand how nvk works
10:43 pixelcluster[d]: hm first thing I'd try would be putting the relevant logging into `nvkmd_dev_*_mem` functions (you can call `vk_address_binding_report` from there just like radv calls it from `radv_bo_create`)
10:48 tiredchiku[d]: thank you <3
11:01 tiredchiku[d]: unrelated, but what's going on with the MRs opened here? https://gitlab.freedesktop.org/drm/nouveau/-/merge_requests
11:02 tiredchiku[d]: only 25's seeing activity
11:03 karolherbst[d]: tiredchiku[d]: tldr: allowing MRs at all was a failed attempt on my part
11:03 karolherbst[d]: it's a pain to accept and submit anything through that
11:04 karolherbst[d]: the patch submission process is an entire mess atm, and we really should just move drm-misc (where all the patches go through) to gitlab and ask people to submit them there once that's all figured out
11:05 tiredchiku[d]: MR 24 seems useful tbh
11:06 karolherbst[d]: I think something like that went upstream, no? let me check actually
11:06 tiredchiku[d]: was also reverted
11:06 tiredchiku[d]: as of linux 6.10.2, the sg_debug issue still exists
11:06 tiredchiku[d]: I was able to replicate it on my 1660Ti laptop too
11:07 karolherbst[d]: ahh
11:08 karolherbst[d]: Lyude: maybe makes sense to look into this MR if it's correct and apply it to drm-misc?
11:09 tiredchiku[d]: https://github.com/torvalds/linux/commit/a222a6470d7eea91193946e8162066fa88da64c2
11:09 tiredchiku[d]: the revert ^
11:09 karolherbst[d]: oh yeah.. maybe airlied[d] as well ^^
11:09 tiredchiku[d]: if you want me to, I can build a kernel with that patch and sg_debug enabled
11:09 tiredchiku[d]: and test it myself
11:09 karolherbst[d]: yeah, that would be helpful
11:09 karolherbst[d]: thanks
11:11 tiredchiku[d]: oke, will report in a few hours
11:56 marex: so regarding https://gitlab.freedesktop.org/drm/nouveau/-/issues/381 ... setting GSK_RENDERER=gl ... seems to help
11:56 marex: I have no idea whatsoever why it helps
11:57 marex: maybe GTK is trying to do something only newer hardware reliably supports and that crashes the ancient 880M ?
17:29 gfxstrand[d]: Let's see if I've successfully moved minSampleShading into the MME. That one was "fun"
17:31 gfxstrand[d]: Naively, it requires floating-point math. 😅
17:31 gfxstrand[d]: I think I figured out how to do it with just an integer add, though.
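The log does not say what the integer-add trick actually is. As a hedged illustration only: if the sample count is a power of two, one well-known way to replace a float multiply with an integer add is to add to the exponent field of the IEEE 754 bit pattern. The helper name `scale_by_pow2` is hypothetical and this is not necessarily what nvk does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Multiply a normal, finite 32-bit float by 2^k using only an integer add
 * on its bit pattern. Hypothetical helper for illustration; it does not
 * handle zero, denormals, infinities, or NaN.
 */
static float scale_by_pow2(float x, uint32_t k)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));

    /* The biased exponent lives in bits 23..30. */
    uint32_t exp = (bits >> 23) & 0xff;

    /* Only valid for normal numbers whose scaled exponent stays in range. */
    assert(exp != 0 && exp + k < 0xff);

    /* Adding k to the exponent field multiplies the value by 2^k. */
    bits += k << 23;

    memcpy(&x, &bits, sizeof(x));
    return x;
}
```

For example, `scale_by_pow2(minSampleShading, 2)` would scale a non-zero factor by 4 samples without a floating-point unit; the zero case would need separate handling.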
17:32 redsheep[d]: The MME is the command processor, right? So you're reimplementing a bunch of stuff for DGC using the really tiny number of registers and lots of restrictions and such?
17:33 redsheep[d]: Is that just for a path that only gets used with DGC, or is that the new way those things are just going to work for nvk?
17:36 gfxstrand[d]: It's the way it'll always work.
17:37 gfxstrand[d]: The big thing I'm trying to do is use the MME to separate states that need to be combined from multiple places.
17:39 redsheep[d]: That sounds like something that could make perf go 📈
17:40 gfxstrand[d]: `SET_HYBRID_ANTI_ALIAS_CONTROL` as an example needs both `rasterizationSamples`, which comes from dynamic state (or render targets, depending on stuff) and `minSampleShading` which always comes in through the shader compile. The current implementation requires a bunch of state change tracking and then I combine it by pulling from both. With the MME, I can do the combining in a macro and just call
17:40 gfxstrand[d]: that macro when I bind a new shader and call it differently when `rasterizationSamples` changes.
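A minimal CPU-side sketch of the pattern described above, assuming the macro keeps both inputs in scratch state and recombines them whenever either one changes. All names here (`aa_scratch`, `aa_on_shader_bind`, `aa_on_samples_changed`) are hypothetical, and the combined value is the minimum shaded sample count from the Vulkan spec, max(ceil(minSampleShading × rasterizationSamples), 1), not the real `SET_HYBRID_ANTI_ALIAS_CONTROL` encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scratch state a macro could keep between calls. */
struct aa_scratch {
    uint32_t raster_samples;     /* from dynamic state or render targets */
    float    min_sample_shading; /* side-band state from the bound shader */
};

/* Combine both inputs: max(ceil(minSampleShading * samples), 1). */
static uint32_t aa_combine(const struct aa_scratch *s)
{
    float f = (float)s->raster_samples * s->min_sample_shading;
    uint32_t n = (uint32_t)f;
    if ((float)n < f)
        n++; /* round up */
    return n < 1 ? 1 : n;
}

/* Called when a new shader is bound: only the shader-side input changes. */
static uint32_t aa_on_shader_bind(struct aa_scratch *s, float min_sample_shading)
{
    assert(min_sample_shading >= 0.0f && min_sample_shading <= 1.0f);
    s->min_sample_shading = min_sample_shading;
    return aa_combine(s);
}

/* Called when rasterizationSamples changes: only the dynamic input changes. */
static uint32_t aa_on_samples_changed(struct aa_scratch *s, uint32_t samples)
{
    assert(samples >= 1);
    s->raster_samples = samples;
    return aa_combine(s);
}
```

The point of the split is that neither caller needs to know the other's current value, so the driver no longer has to track and merge both pieces of state on the CPU.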
17:42 gfxstrand[d]: gfxstrand[d]: Looking good so far...
17:42 redsheep[d]: I am not sure I follow... Does that mean nvk wouldn't need different compiled shader variants for different levels of msaa?
17:42 gfxstrand[d]: No.
17:43 gfxstrand[d]: It's part of the shader but it's not really compiled in.
17:43 gfxstrand[d]: It's sort of side-band shader state
17:44 gfxstrand[d]: But the place where we get it is `vkCreateGraphicsPipelines()` or `vkCreateShadersEXT()`
17:46 redsheep[d]: Ok so this is more about lowering cpu overhead then
17:46 gfxstrand[d]: There's a couple others like that. Tessellation shaders have some parameters which literally come from the shader source but which we have to program into the hardware separately. They don't actually affect the compiled binary, just the way we bind it. Annoyingly, one of those is whether the triangles go clockwise or counter-clockwise and there's a dynamic state which flips it. So the new MME
17:46 gfxstrand[d]: looks like the old state register but has another bit for "actually, flip it" and I can set most of the tess parameters when the tessellation shader is bound and set the "flip it" bit whenever that dynamic state changes and the macro will sort out the rest.
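The "flip it" bit can be sketched the same way. This is a simulation under stated assumptions: the bit positions `TESS_CW_BIT` and `TESS_FLIP_BIT` and the function names are made up for illustration and do not reflect the real register layout.

```c
#include <assert.h>
#include <stdint.h>

#define TESS_CW_BIT   (1u << 0)  /* hypothetical: winding from the shader */
#define TESS_FLIP_BIT (1u << 31) /* hypothetical: extra "actually, flip it" bit */

/* Turn the scratch value into what the hardware register would receive. */
static uint32_t tess_resolve(uint32_t scratch)
{
    uint32_t hw = scratch & ~TESS_FLIP_BIT;
    if (scratch & TESS_FLIP_BIT)
        hw ^= TESS_CW_BIT; /* invert the winding chosen by the shader */
    return hw;
}

/* Tessellation shader bound: replace the parameters, keep the flip bit. */
static uint32_t tess_on_shader_bind(uint32_t *scratch, uint32_t shader_params)
{
    assert((shader_params & TESS_FLIP_BIT) == 0);
    *scratch = (*scratch & TESS_FLIP_BIT) | shader_params;
    return tess_resolve(*scratch);
}

/* Dynamic state changed: update only the flip bit, keep the parameters. */
static uint32_t tess_on_flip_changed(uint32_t *scratch, int flip)
{
    *scratch = (*scratch & ~TESS_FLIP_BIT) | (flip ? TESS_FLIP_BIT : 0u);
    return tess_resolve(*scratch);
}
```

Binding a shader and toggling the dynamic state can then happen in either order and still produce the same final winding.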
17:47 gfxstrand[d]: redsheep[d]: Yes and no. I think it will lower CPU overhead. But it also separates concerns a bit which will make device-generated commands way easier. And I want device-generated commands and the CPU commands to take the same path through the hardware whenever possible.
17:48 gfxstrand[d]: The solution on hardware which doesn't have an awesome macro unit is to use OpenCL to compile the same driver code for both the CPU and GPU.
17:51 redsheep[d]: Sounds good. Happy to see more extensions coming online for nvk, gotta keep ahead of turnip 😛
17:54 gfxstrand[d]: This one is the last major piece for D3D12 support. My goal is to show off Starfield at XDC
17:58 redsheep[d]: Wait, what? That isn't on the vkd3d tracker at all... I thought the next limiter was fragment shader interlock for 12_1. Is DGC needed for better 12_0 or something?
18:00 redsheep[d]: Hmm yeah I see that in the optional profile, along with descriptor buffer. Getting starfield working would be cool
18:01 gfxstrand[d]: Oh, interlock... Ugh
18:02 gfxstrand[d]: DGC is required for D3D12 indirect dispatch which starfield uses heavily.
18:02 redsheep[d]: So without that it takes a slow path, I assume?
18:03 redsheep[d]: I haven't tested starfield at all, kind of forgot that game existed
18:05 gfxstrand[d]: I don't think there is a slow path for indirect dispatch without DGC
18:05 asdqueerfromeu[d]: redsheep[d]: I wonder if we're ever going to beat Faith's OG Vulkan driver (ANV) though
18:05 redsheep[d]: Oh ok. Weird that it shows as optional for vkd3d-proton then
18:06 gfxstrand[d]: I think the D3D12 feature might be optional and they just don't advertise it if they don't have DGC.
18:06 gfxstrand[d]: asdqueerfromeu[d]: How far behind are we? There were a few intel-only extensions but not that many
18:06 redsheep[d]: Ah, that makes sense. Hmm I wonder how many of the remaining dx12 titles are ones not able to use a feature they want then
18:07 gfxstrand[d]: 🤷
18:07 redsheep[d]: gfxstrand[d]: 30 more extensions would put nvk ahead of anv
18:07 asdqueerfromeu[d]: gfxstrand[d]: About 30 extensions away according to mesamatrix
18:07 redsheep[d]: And still would be 25 extensions behind radv lol
18:09 redsheep[d]: There are a lot of intel and amd ones that probably don't make sense or can't really work, though there are also quite a few nvidia extensions that nvk could maybe have that amd and intel won't implement
18:10 gfxstrand[d]: asdqueerfromeu[d]: You have to remember that I had a team helping me and 8 years to build ANV.
18:10 gfxstrand[d]: I'd say we're doing pretty darn good for mostly just me for 2.5 years.
18:11 gfxstrand[d]: Also, I didn't have to write a compiler for ANV. (Well, okay, I did build NIR in parallel with ANV...)
18:13 redsheep[d]: Yeah I don't think anybody reasonable is going to claim you're not going fast enough. Being "behind" by only a couple dozen is mighty impressive if you ask me.
18:18 gfxstrand[d]: Looking at interlock just for grins and 🤯
18:22 redsheep[d]: If I am not misremembering, once interlock and DGC are in, the rest of the heavy feature work is mesh shaders and the laundry list of crazy RT related stuff
18:27 redsheep[d]: Oh fragment shading rate is also probably pretty important. Wonder how hard that is
18:29 gfxstrand[d]: Fragment shading rate really shouldn't be bad. It's all R/E but the HW is probably pretty close to the API.
18:32 HdkR: Interlock is going to be very funny :)
18:33 phomes_[d][d]: if there are still features that are suitable for my level then I am happy to work on them
18:33 HdkR: Pretty sure the Switch emulator devs just detected the 100-ish assembly instruction idioms and replaced them
18:36 gfxstrand[d]: Yeah, on NV, it looks like it has this whole side-band per-pixel (or per-sample) matrix of locks that you take. Then there's this `SV_ORDERING_TICKET` thing you use to ensure ordering. I've not figured out all the details yet but I'm starting to get the gist.
18:36 HdkR: Nice
18:36 gfxstrand[d]: I'm guessing SV_ORDERING_TICKET is just a mangled version of `gl_PrimitiveID` or similar.
18:36 HdkR: Understanding how the ordering ticket works is pretty important
18:37 gfxstrand[d]: Not that I expect it to be `gl_PrimitiveID`. But it's some sort of ID coming in from the rasterizer to help keep things in-order.
18:37 gfxstrand[d]: It's probably the thing that's implicitly passed to the blend unit.
18:37 HdkR: As evident by the name ye
18:38 gfxstrand[d]: On Intel, this is all like 2 instructions. 😅
18:39 HdkR: Yea, PSI isn't cheap on either NVIDIA or AMD. A couple hundred instructions going in to the mutex and leaving isn't cheap :P
18:41 gfxstrand[d]: I wonder how many games are actually using it
18:41 HdkR: Probably only emulators
18:41 redsheep[d]: It doesn't seem like a super commonly used feature. I tried to dig for a game that is known to use it and I came up empty.
18:41 gfxstrand[d]: HdkR: IDK that it's that cheap on Intel, either. It's just that everything is cheap if your GPU struggles to parallelize anyway.
18:41 HdkR: haha
18:42 HdkR: I think some D3D12 game uses UAVs which are emulated using PSI?
18:42 redsheep[d]: Way to murder your old hardware with words lol
18:42 HdkR: A Plague Tale maybe?
18:44 HdkR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22250 Nice, has three PC games
18:46 HdkR: It's missing Redream, the Dreamcast emulator. It uses it for OIT emulation :)
18:46 HdkR: Silly PowerVR features
18:48 redsheep[d]: I thought Just Cause 3 was at least somewhat working, strange if that wants interlock
18:48 HdkR: Could be that it's optional
18:49 HdkR: If the foliage self-intersects improperly, it's probably fine for most people :P
18:49 redsheep[d]: I mean worst case it leads to flickering foliage, right?
18:50 HdkR: Could be, never looked at it
18:57 Lyude: karolherbst[d]: sorry, which MR are you talking about?
18:57 karolherbst: Lyude: https://gitlab.freedesktop.org/drm/nouveau/-/merge_requests/24
18:58 Lyude: karolherbst: gotcha, I can take a look at that one!
18:59 karolherbst: yeah.. would be interesting to know if it actually fixes the issue I think you tried to fix here?
19:00 Lyude: karolherbst: Do you mean the runpm issue w/r/t low memory situations?
19:00 karolherbst: uhh.. maybe?
19:00 Lyude: that one already got fixed, so I think this would be a separate issue
19:01 karolherbst: the one this reverted https://github.com/torvalds/linux/commit/a222a6470d7eea91193946e8162066fa88da64c2
19:01 Lyude: Oh!
19:01 Lyude: that one
19:01 Lyude: yes that might actually be the proper fix for it then
19:55 airlied[d]: I do have another fix for the SG_DEBUG but I'm not sure it's the right fix
19:55 airlied[d]: I need to send out the v2 and maybe we should merge it and cross fingers
19:56 airlied[d]: the proper fix involves ripping up a bunch of nouveau internals to not pass sg tables when it should be passing dma addr
20:41 airlied[d]: https://patchwork.freedesktop.org/patch/608381/