10:54gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12596
10:55gfxstrand[d]: There. Now airlied[d] has an issue to look at with sadness instead of complaining here regularly. 😝
10:58airlied[d]: Yay 🙂 my perhaps-incorrect assessment is that newer hw might be better prepared for context saving to VRAM, but that's just from browsing the RM headers
11:52gfxstrand[d]: Yeah. We'll see. From my assessment last year, a bunch of that stuff still applied on Turing.
11:53gfxstrand[d]: But also, I'm not too worried about that. ZCULL surviving a render pass is pretty rare.
11:53gfxstrand[d]: As long as you're not running 5 games side by side, I doubt the context switching stuff matters that much.
11:57snowycoder[d]: Hello, I'm trying to learn the code, how did you reverse engineer the commands in the command buffer? Is there a way to intercept and dump the commands generated by the Nvidia driver?
11:59snowycoder[d]: (I'm trying, step by step, to implement VK_EXT_discard_rectangles, but I haven't found anything in the generated NVIDIA headers similar to what RADV uses)
11:59snowycoder[d]: If you instead need help on other issues I'm happy to lend a hand
13:30gfxstrand[d]: I'm pretty sure it's window clip: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/headers/nvidia/classes/clc597.h?ref_type=heads#L3740
13:31gfxstrand[d]: For RE, envyhooks
13:31gfxstrand[d]: https://gitlab.freedesktop.org/nouveau/envyhooks
16:20snowycoder[d]: gfxstrand[d]: I was looking for "discard" or "rectangle" and I missed it, thanks!
20:15airlied[d]: mhenning[d]: does zcull require uapi changes or just GSP calls in the kernel to set things up?
20:16gfxstrand[d]: OGK has interfaces for it
20:17gfxstrand[d]: It's done through the software channel, though.
20:17gfxstrand[d]: I haven't dug into the code deep enough to know how they're implemented.
20:17airlied[d]: like we might need to pull the zcull mask out maybe
20:18gfxstrand[d]: But I'm pretty sure that stuff is optional
20:18airlied[d]: but the ZCULL mode/bind seems like in-kernel only
20:18gfxstrand[d]: Like if you don't provide the data, it just flushes and clears around context switches
20:22mhenning[d]: airlied[d]: I believe it requires UAPI to return the result of NV2080_CTRL_CMD_GR_GET_ZCULL_INFO to userspace
20:23mhenning[d]: which probably isn't a big deal, but it does need to be wired up
20:24mhenning[d]: the other half of it is setting up a context switching buffer, which can be done entirely kernel-side, but we probably don't want to use zcull on kernels that aren't zcull aware
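To make the shape of that concrete, here is a rough sketch of what such a UAPI could look like: a query that forwards the RM zcull info plus a flag saying whether the kernel has set up zcull context switching. Every struct and field name below is invented for illustration; the real nouveau interface would differ.
```c
/* Hypothetical UAPI sketch (all names invented): expose the data behind
 * NV2080_CTRL_CMD_GR_GET_ZCULL_INFO to userspace, plus a flag telling
 * userspace whether zcull is actually safe to use on this kernel. */
#include <stdint.h>

struct drm_nouveau_zcull_info {
   uint32_t width_align_px;          /* region alignment requirements */
   uint32_t height_align_px;
   uint32_t aliquot_total;           /* total on-chip zcull storage */
   uint32_t region_byte_multiplier;  /* sizing for the ctxsw save buffer */
   uint32_t region_header_size;
   uint32_t subregion_count;
   uint32_t flags;
#define DRM_NOUVEAU_ZCULL_CTXSW_SAFE (1u << 0) /* kernel set up per-channel ctxsw */
};
```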
20:25gfxstrand[d]: Ugh... I forgot about the zcull info struct. 😫
20:25gfxstrand[d]: Yeah, there's a lot of stuff in there
20:27mhenning[d]: I think (I'm not sure) that the hardware defaults to NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_GLOBAL, which means context switches are broken by default, so we also need to signal to userspace whether zcull is safe to use
20:28gfxstrand[d]: I kinda wonder if some of this stuff wouldn't be easier to develop against the blob kernel. 🤔
20:28gfxstrand[d]: That way we know userspace works when we're testing kernel stuff.
20:28mhenning[d]: The kernel side isn't that complicated, I'm not really worried about that part
20:29mohamexiety[d]: I did get this idea when poking around for the virtual memory stuff but not sure how annoying it'll be to wire up openrm support
20:29gfxstrand[d]: Yeah, I don't know. I haven't looked at it for more than a few hours.
20:29gfxstrand[d]: (openrm, that is)
20:31gfxstrand[d]: mhenning[d]: Yeah but IDK if there's translation required for the ZCULL info. I suppose we can copy+paste from openrm and be pretty sure we get it right, though.
20:32karolherbst[d]: mhenning[d]: I think that only really matters for security reasons though
20:32karolherbst[d]: like...
20:32karolherbst[d]: for vGPU use cases
20:33karolherbst[d]: well.. maybe one could exploit it via webgl or whatever
20:33mhenning[d]: gfxstrand[d]: I don't believe there's translation required, and anyway since it's static per-card, I'm planning to hard-code userspace against the values the blob gets back while I'm doing initial bringup
20:33gfxstrand[d]: That's fair
20:33gfxstrand[d]: That's what I'd do
20:34mhenning[d]: karolherbst[d]: I don't think that's true? I think you can get incorrect rendering if you context switch between two apps with z buffers
20:34karolherbst[d]: I highly doubt that
20:34karolherbst[d]: zcull is way more complex than that
20:35karolherbst[d]: there are internal lookup things going on, based on the bound depth buffer and stuff
20:35karolherbst[d]: but might be good to check if nvidia flips it on a normal desktop
20:36mhenning[d]: Oh, I hadn't seen any indication that the zcull hardware tracks the bound depth buffer
20:36karolherbst[d]: it even factors in the size and everything
20:36karolherbst[d]: but
20:36karolherbst[d]: it's a bit more complicated
20:37mhenning[d]: but anyway the blob can't get into that state since either there's no depth buffer or there's zcull context switching set up
20:37mhenning[d]: karolherbst[d]: what is this based on? Reverse engineering? Nvidia documentation?
20:38karolherbst[d]: me knowing too much
20:43gfxstrand[d]: Typically depth test acceleration has some sort of structure that stores polygon information so you can skip reading the depth buffer in the common case. You always have to be able to handle the per-pixel case and pre-populated depth info, so there's always an "I don't know, look at the depth buffer" state. It's always safe for the hardware to flush everything out to the depth buffer and put the acceleration info in that state.
20:44gfxstrand[d]: So there's *something* you can do on context switch which is guaranteed safe.
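A conceptual sketch of that invariant, assuming nothing about the real hardware layout: coarse depth data is only ever a conservative summary, so there is always a legal "unknown" state that defers to the per-pixel depth buffer, and flushing then resetting to it is always safe on a context switch.
```c
/* Conceptual model of coarse-depth (zcull/hi-z style) culling; names and
 * structure are illustrative only. */
#include <stdbool.h>

enum region_state { REGION_KNOWN, REGION_UNKNOWN };

struct depth_region {
   enum region_state state;
   float max_depth;   /* conservative bound: no stored sample is farther */
};

/* Can the whole region be rejected without reading the depth buffer?
 * Assumes a LESS depth test; REGION_UNKNOWN always falls back to per-pixel. */
static bool region_rejects(const struct depth_region *r, float prim_min_z)
{
   return r->state == REGION_KNOWN && prim_min_z >= r->max_depth;
}

/* The always-safe context-switch behavior: flush pending depth writes,
 * then forget everything the accelerator knew. */
static void reset_coarse_depth(struct depth_region *regions, int n)
{
   for (int i = 0; i < n; i++)
      regions[i].state = REGION_UNKNOWN;
}
```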
20:44karolherbst[d]: yeah, the hardware doesn't really have that much dedicated on-chip storage anyway; there's some compression going on at various levels to make it even fit, and even then it's not always enough
20:46karolherbst[d]: I think the kernel level thing is kinda separate as well
20:46karolherbst[d]: but not sure
20:47mhenning[d]: gfxstrand[d]: Right, the kernel can ask for NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_NO_CTXSW which just discards some info on context switch. I don't think that's the default though
20:47gfxstrand[d]: IIRC, there's an address we can hand the kernel or GSP to tell it where to spill the acceleration info instead of flushing and resetting.
20:47karolherbst[d]: yeah
20:48karolherbst[d]: sounds about right
20:48mhenning[d]: gfxstrand[d]: Yes, that's what I meant by "setting up a context switching buffer"
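A rough sketch of the kernel-side flow being described, with every helper name invented and the RM control paraphrased rather than quoted: allocate a per-channel buffer, bind it via GSP, and pick a mode that spills zcull state there instead of dropping it.
```c
/* Illustrative kernel-side flow only; helper names are invented and the
 * exact RM control for the bind should be taken from the OpenRM headers. */
#include <stdint.h>

/* Assumed helpers, not real nouveau functions. */
int  zcull_buffer_alloc(uint32_t chan_handle, uint64_t *gpu_va_out);
int  gsp_zcull_bind(uint32_t chan_handle, uint64_t gpu_va, uint32_t mode);
void chan_mark_zcull_safe(uint32_t chan_handle);

int channel_setup_zcull(uint32_t chan_handle, uint32_t per_channel_mode)
{
   uint64_t va;
   int ret;

   /* Buffer size would come from the GET_ZCULL_INFO query discussed above. */
   ret = zcull_buffer_alloc(chan_handle, &va);
   if (ret)
      return ret;

   /* Spill-to-buffer mode instead of the GLOBAL default or lossy NO_CTXSW. */
   ret = gsp_zcull_bind(chan_handle, va, per_channel_mode);
   if (ret)
      return ret;

   /* Let userspace know zcull is safe to enable on this channel. */
   chan_mark_zcull_safe(chan_handle);
   return 0;
}
```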
20:48gfxstrand[d]: mhenning[d]: Yeah, it's not a great mode. You'll find your perf on the floor if you get preempted.
20:49gfxstrand[d]: I'm less worried with defaults, though. We can set it to whatever we want.
20:49gfxstrand[d]: Well, I guess if it requires the kernel to set it then we can't.
20:49gfxstrand[d]: Bugger
20:49mhenning[d]: Right, the only reason I bring up the default is that it's what we get right now with an unmodified kernel
20:50karolherbst[d]: didn't anybody here set up interception for those GSP calls?
20:50karolherbst[d]: well.. would just mean patching the open source module
20:51karolherbst[d]: but I don't see them touching it at all, mhh
20:51karolherbst[d]: there might be a way to do RPC calls from userspace...
20:53karolherbst[d]: I know there are a couple of red herrings involved here and you shouldn't overthink it
20:59gfxstrand[d]: Ideally as much as possible is in GSP and we just FALCON it.
21:00karolherbst[d]: it's part of context switching, and it's about context switching the context's state. There are a few bits and pieces (well.. it's in the class headers), but there are also a couple of context-switched GPU registers, and that's the part GSP will touch
21:01karolherbst[d]: looks like in the context header the zcull mode is at offset 0x1c
21:01karolherbst[d]: `NV_CTXSW_MAIN_IMAGE_ZCULL`
21:01karolherbst[d]: `manuals/volta/gv100/dev_ctxsw.ref.txt`
21:01karolherbst[d]: ohh...
21:01karolherbst[d]: I've found something nice
21:01karolherbst[d]: `NV_PGRAPH_PRI_FE_SEMAPHORE_STATE_D_REPORT_ZCULL_STATS0`
21:02karolherbst[d]: `NV_PGRAPH_PRI_FE_SEMAPHORE_STATE_A 0x0040413C`
21:02karolherbst[d]: wasn't aware that stuff exists... interesting
21:02gfxstrand[d]: That sounds potentially interesting
21:02karolherbst[d]: wondering if you can do it via mme
21:02karolherbst[d]: well.. touch it
21:02karolherbst[d]: but you'll probably need to read out via mme as well
21:02karolherbst[d]: `manuals/volta/gv100/pri_fe.ref.txt`
21:03karolherbst[d]: looks like you bind a VA address and let it collect stuff
21:03karolherbst[d]: ohh there is also 3d stuff for it
21:04karolherbst[d]: `NVCB97_SET_REPORT_SEMAPHORE_D_REPORT_ZCULL_STATS0`
21:04karolherbst[d]: maybe it's the same 😄
21:04karolherbst[d]: so yeah.. guess you could check via report semaphores if it's doing something
21:04karolherbst[d]: ohh actually... I think I asked what those counters mean...
21:04karolherbst[d]: you have 4 values, and obviously the headers don't tell you what's what
21:05gfxstrand[d]: Yeah, same as pipeline stats. IDK what all stats we'll get, though.
21:05karolherbst[d]: I know what's what, but I have that under NDA
21:05karolherbst[d]: soo...
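For anyone wanting to poke at this, a hedged sketch of requesting a ZCULL_STATS report-semaphore write using the SET_REPORT_SEMAPHORE_A..D methods mentioned above. Only those method names come from the headers; the `emit()` helper, the offsets, and the word-D packing are assumptions.
```c
/* Illustrative only: ask for a REPORT_ZCULL_STATS0 report-semaphore write to
 * a GPU VA, then read the stats words back from that buffer.  Offsets and
 * the word-D encoding are placeholders, not the real class layout. */
#include <stdint.h>
#include <stdio.h>

static void emit(uint32_t mthd, uint32_t data)  /* stand-in push helper */
{
   printf("mthd %04x <- %08x\n", mthd, data);
}

#define SET_REPORT_SEMAPHORE_A 0x1b00 /* address high   (placeholder offset) */
#define SET_REPORT_SEMAPHORE_B 0x1b04 /* address low    (placeholder offset) */
#define SET_REPORT_SEMAPHORE_C 0x1b08 /* payload        (placeholder offset) */
#define SET_REPORT_SEMAPHORE_D 0x1b0c /* report control (placeholder offset) */

static void report_zcull_stats(uint64_t gpu_va, uint32_t zcull_stats0_selector)
{
   emit(SET_REPORT_SEMAPHORE_A, (uint32_t)(gpu_va >> 32));
   emit(SET_REPORT_SEMAPHORE_B, (uint32_t)gpu_va);
   emit(SET_REPORT_SEMAPHORE_C, 0);
   /* The selector would be the REPORT_ZCULL_STATS0 enum from the class
    * header, packed into word D's report field. */
   emit(SET_REPORT_SEMAPHORE_D, zcull_stats0_selector);
}
```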
21:14gfxstrand[d]: gfxstrand[d]: I did see a bunch of FALCON stuff that looked ZCULL related when I was messing about in September or so.
21:14gfxstrand[d]: But there might still be some mode thing we need to tell the kernel about. 🤷🏻♀️
21:14gfxstrand[d]: And we need the info.
21:52airlied[d]: it would be good to get a trace of the cmds the blob vulkan submits to their kernel driver to see what might be needed
22:26snowycoder[d]: `Passed: 62/288 (21.5%)`
22:26snowycoder[d]: Hey, I did something with VK_EXT_discard_rectangles!
22:27snowycoder[d]: Just need to figure out why 80% of tests fail 😭
23:57airlied[d]: marysaka[d]: just wondering how far coop matrix is?