01:56 gfxstrand[d]: The existence of `CTXSW_ZCULL_BIND` scares me a bit
01:57 gfxstrand[d]: I hope I don't need to invoke FALCON for this
02:02 gfxstrand[d]: I'm gonna hope that's ancient nonsense that I don't need to care about...
02:09 airlied[d]: might need to capture some openrm printfs to see if it gets used
04:32 tiredchiku[d]: marysaka[d]: that mesh shader MR :3
04:32 tiredchiku[d]: gonna be testing it either today or tomorrow
05:53 ahuillet[d]: gfxstrand[d]: of course you need to care
05:54 ahuillet[d]: if `NV2080_CTRL_CTXSW_ZCULL_MODE_NO_CTXSW`
05:56 ahuillet[d]: oh, I see it's actually documented in the public headers.
08:41 lighterowl: hi #nouveau, I'm running a Latitude 7680 which has an RTX A1000 (GA107GLM) inside, and I've been getting errors like these from time to time lately - https://paste.debian.net/plain/1327925 . this happens even if the nvidia gpu remains unused to the best of my knowledge, i.e. HDMI output is off. does this look familiar to anyone?
08:49 airlied[d]: it's waiting for a fence on some buffer when it goes into powering off the gpu, but the fence never signals
08:50 airlied[d]: can't say I've ever seen it or have any idea what might cause it
09:40 lighterowl: thanks airlied[d]
10:41 karolherbst[d]: gfxstrand[d]: the tldr is that zcull state is context switched and.... we kinda have to manage the buffers, because the firmware won't
10:44 karolherbst[d]: but yeah.. it needs the kernel involved and stuff
10:44 karolherbst[d]: it's also part of nvidia's UAPI afaik
12:01 Ermine: I'm trying to build xf86-video-nouveau and get error messages about non-existent members of struct _PixmapDirtyUpdate in nv_driver.c. Where does that structure come from?
12:09 karolherbst: Ermine: I'd advise you to not build it at all, unless you really really really have to (no GL driver)
12:11 Ermine: Well, I just want to upgrade this package in a distro
12:11 karolherbst: delete the package instead
12:11 karolherbst: nobody works on it anymore
12:12 karolherbst: and e.g. fedora defaults to modeset for nv50 and newer GPUs (17 years old)
12:13 karolherbst: and it also works on older GPUs anyway (unless it's so old it has no GL at all, in which case you can't really use it today anymore anyway)
12:28 Ermine: okay, I've told them this
12:28 Ermine: So I should consider the driver unmaintained?
12:29 karolherbst: more or less, yes. Maybe somebody would be willing to fix compilation bugs if they get reported, but new features won't be added, and existing bugs will probably never be fixed either
12:29 karolherbst: so moving users towards the modesetting DDX and fixing bugs in the GL driver is time better spent
12:34 Ermine: okay, thank you
13:20 gfxstrand[d]: ahuillet[d]: Someone will need to implement that in nouveau first. 🙃
13:25 gfxstrand[d]: Also, ugh...
13:36 gfxstrand[d]: At least we have SWCTX so we can do this through the command buffer instead of having to bang on ioctls.
13:38 gfxstrand[d]: This might be easier if I brought up NVK on OGK or Windows first. 🤔 Then we'd only be stabbing in the dark from one end at a time.
13:38 gfxstrand[d]: But also, maybe it's not that hard to do the kernel bits.
13:38 gfxstrand[d]: 🤷🏻‍♀️
13:50 marysaka[d]: I can at least say that it's trivial to parse the NVRM uapi in Rust as I do that for envyhooks 😄
13:56 karolherbst[d]: gfxstrand[d]: you don't want to do that through SWCTX
13:57 karolherbst[d]: ohh right.. I wanted to check something
14:00 karolherbst[d]: gfxstrand[d]: zcull size depends on the currently bound depth buffer in case you didn't know
14:01 karolherbst[d]: and it operates on 16x16 pixels (which is kind of an inherent rasterizer thing; variable rate shading also operates on 16x16, for example)
14:02 karolherbst[d]: there is on-chip memory, which depends on the GPU (not arch, the actual GPU)
14:02 gfxstrand[d]: 😫
14:02 gfxstrand[d]: I was afraid of that
14:02 karolherbst[d]: I have the rough bounds, but not sure how you figure out the actual limit
14:02 gfxstrand[d]: At least there are queries, though, right?
14:03 karolherbst[d]: mhhh I wonder if there is a GSP RPC for that
14:03 karolherbst[d]: the bound is per GPC anyway
14:04 karolherbst[d]: btw, zcull only operates on one depth buffer at the same time
14:04 karolherbst[d]: if you switch the depth buffer, the on-chip memory is spilled into the zcull buffer
14:05 karolherbst[d]: ohh wait...
14:05 karolherbst[d]: I was wrong
14:05 karolherbst[d]: the on-chip size is in pixels; however, it's not storing pixel data
14:05 karolherbst[d]: there is some compression going on
14:06 karolherbst[d]: and I think you can configure that one
14:06 karolherbst[d]: ehh.. might be done internally actually
14:07 karolherbst[d]: zcull also depends on earlyz, and will only be enabled in certain circumstances if earlyz is disabled
14:08 karolherbst[d]: anyway.. I think that's the tldr
14:09 gfxstrand[d]: I can't believe Nvidia is making me miss Intel's HiZ hardware. 😭🤣🤦🏻‍♀️
14:09 karolherbst[d]: 😄
14:09 karolherbst[d]: isn't that more like EarlyZ though?
14:10 gfxstrand[d]: No, it's a form of depth compression that massively reduces depth buffer bandwidth.
14:10 karolherbst[d]: ahh
14:13 gfxstrand[d]: Instead of storing actual pixel values, it stores either a min/max per-4x4 pixels or a plane per-4x8 pixels. That's enough to pass or fail the depth test without looking at the actual depth buffer most of the time.
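A minimal sketch of the min/max trick described here (illustrative only; the struct, block size, and compare op are assumptions, not Intel's actual HiZ layout):
```c
#include <stdbool.h>

/* One min/max pair per 4x4 pixel block, as described above. */
struct hiz_block { float zmin, zmax; };

enum coarse_result { BLOCK_PASS, BLOCK_FAIL, BLOCK_AMBIGUOUS };

/* Coarse depth test for VK_COMPARE_OP_LESS: a fragment passes if
 * frag_z < stored_z.  frag_zmin/frag_zmax bound the incoming fragment
 * depths over the block. */
enum coarse_result coarse_depth_test(const struct hiz_block *b,
                                     float frag_zmin, float frag_zmax)
{
    if (frag_zmax < b->zmin)
        return BLOCK_PASS;       /* every fragment beats every stored value */
    if (frag_zmin >= b->zmax)
        return BLOCK_FAIL;       /* no fragment can pass; cull the whole block */
    return BLOCK_AMBIGUOUS;      /* fall back to the per-pixel depth test */
}
```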
14:13 karolherbst[d]: sounds like zcull then
14:13 gfxstrand[d]: Yeah but there's no kernel involvement
14:14 gfxstrand[d]: It's just attached to the depth buffer
14:14 karolherbst[d]: right...
14:14 karolherbst[d]: for nvidia it's mostly because of context switching and knowing where to store the content or something
14:14 gfxstrand[d]: The SWCTX stuff I was looking at yesterday gave me big Imagination vibes.
14:15 gfxstrand[d]: I don't like IMG vibes. 😝
14:15 karolherbst[d]: yeah.. don't use SWCTX if you can avoid it
14:15 karolherbst[d]: SWCTX is usually only useful if you really have to poke at per context MMIO registers
14:15 karolherbst[d]: but that stalls everything else
14:16 gfxstrand[d]: Doing it with SWCTX makes sense if I need to be able to swap it out or reprogram mid command buffer.
14:16 karolherbst[d]: you basically force a specific context to be active, the kernel handles the method, and then unblocks other contexts
14:16 gfxstrand[d]: Otherwise, I'm basically implementing that in NVK with stalls in my submit thread and ioctls.
14:16 karolherbst[d]: right..
14:16 karolherbst[d]: I think it's easier to let gsp handle that. But I don't think you really have to swap the buffer all that often
14:17 karolherbst[d]: (on the kernel side I mean)
14:17 gfxstrand[d]: If it's just a buffer I allocate and set up once at device or queue init and reuse, ioctls are fine.
14:18 karolherbst[d]: you can also have a global buffer for all contexts 😄
14:18 karolherbst[d]: `NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_GLOBAL`
14:19 gfxstrand[d]: That sounds a little sketch
14:19 karolherbst[d]: there is also `NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_SEPARATE_BUFFER`
14:19 karolherbst[d]: and
14:19 karolherbst[d]: you can explicitly allow channels to share the state
14:19 gfxstrand[d]: Yeah...
14:20 karolherbst[d]: or you use `NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_NO_CTXSW` as suggested; it just means the zcull buffer is lost on each context switch
14:20 karolherbst[d]: mhh
14:20 karolherbst[d]: `NV2080_CTRL_CMD_GR_GET_ZCULL_INFO` looks interesting
14:21 karolherbst[d]: `The callee should compute the size of a zcull region as follows. (numBytes = aliquots * zcullRegionByteMultiplier + zcullRegionHeaderSize + zcullSubregionHeaderSize)`
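That formula as a small helper, with field names as quoted from the control's documentation (how `aliquots` is obtained for a given region is left to the caller):
```c
#include <stdint.h>

struct zcull_info {
    uint32_t zcullRegionByteMultiplier;
    uint32_t zcullRegionHeaderSize;
    uint32_t zcullSubregionHeaderSize;
};

/* numBytes = aliquots * zcullRegionByteMultiplier
 *          + zcullRegionHeaderSize + zcullSubregionHeaderSize */
static uint64_t zcull_region_bytes(const struct zcull_info *info,
                                   uint64_t aliquots)
{
    return aliquots * info->zcullRegionByteMultiplier +
           info->zcullRegionHeaderSize +
           info->zcullSubregionHeaderSize;
}
```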
14:22 karolherbst[d]: gfxstrand[d]: btw... you give GSP just the virtual address of the zcull buffer
14:22 karolherbst[d]: soooo
14:22 karolherbst[d]: you could also just VMBIND it...
14:22 karolherbst[d]: but anyway
14:22 karolherbst[d]: you set a buffer per context
14:22 karolherbst[d]: and I think you might have to resize it at some point maybe
14:24 karolherbst[d]: I can't really figure out what that buffer actually contains
14:25 karolherbst[d]: but I suspect it's storing data per depth buffer
14:51 gfxstrand[d]: karolherbst[d]: Yes but someone would have to implement that
14:51 karolherbst[d]: it's the default
14:51 gfxstrand[d]: Okay
14:52 gfxstrand[d]: So we can just not save/restore for now
14:52 gfxstrand[d]: That's probably okay
14:53 karolherbst[d]: or maybe let me rephrase, I _think_ it's the default
14:53 karolherbst[d]: one might want to verify with NV2080_CTRL_CMD_GR_GET_ZCULL_INFO
14:54 gfxstrand[d]: Is that something I can do from userspace?
14:54 karolherbst[d]: I don't think so
14:54 gfxstrand[d]: IMO, making that work is the first step because I need the strides and stuff anyway
14:54 karolherbst[d]: yeah...
14:55 karolherbst[d]: well _maybe_
14:55 karolherbst[d]: the buffer you tell the kernel about is not the buffer you set via the command buffer. However I think you'd need the info for the command buffer one
14:55 gfxstrand[d]: Yeah
14:56 karolherbst[d]: `NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND` + `vMemPtr = NULL, zcullMode = NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_NO_CTXSW` is probably how you enforce the "no switching" mode
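A sketch of that "no ctxsw" bind; the params layout follows `NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS` from the public headers, while `rm_control()` and the numeric values are placeholders to check against ctrl2080gr.h:
```c
#include <stdint.h>

typedef uint32_t NvHandle;

/* Layout per the public ctrl2080gr.h (vMemPtr is 8-byte aligned there). */
typedef struct {
    NvHandle hClient;
    NvHandle hChannel;
    uint64_t vMemPtr;    /* GPU VA of the ctxsw buffer, or 0 for none */
    uint32_t zcullMode;
} NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS;

/* Placeholders; the real values live in ctrl2080gr.h. */
#define NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND   0u /* FIXME: real cmd id */
#define NV2080_CTRL_CTXSW_ZCULL_MODE_NO_CTXSW 1u /* FIXME: verify value */

/* Stand-in for however NV2080 controls get issued to GSP. */
int rm_control(NvHandle client, uint32_t cmd, void *params, uint32_t size);

void disable_zcull_ctxsw(NvHandle client, NvHandle channel)
{
    NV2080_CTRL_GR_CTXSW_ZCULL_BIND_PARAMS p = {
        .hClient   = client,
        .hChannel  = channel,
        .vMemPtr   = 0,  /* no backing buffer: zcull state is lost on ctxsw */
        .zcullMode = NV2080_CTRL_CTXSW_ZCULL_MODE_NO_CTXSW,
    };
    rm_control(client, NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND, &p, sizeof(p));
}
```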
14:56 gfxstrand[d]: So the one you tell GSP about is just for spilling the on-chip zcull info?
14:57 karolherbst[d]: it's for context switching
14:58 gfxstrand[d]: So cursed
14:58 karolherbst[d]: yeah...
14:59 karolherbst[d]: the command buffer one is afaik per depth buffer, some of the switch RE people were also looking into it
15:00 karolherbst[d]: and I suspect the size of that one is depth buffer split into 16x16 tiles * some factor + some alignment
15:00 karolherbst[d]: there is also some stuff going on if you exceed the zcull on-chip storage
15:06 karolherbst[d]: also what's `LOAD_ZCULL` and `STORE_ZCULL` :ferrisUpsideDown:
15:06 karolherbst[d]: mhhhhh
15:07 karolherbst[d]: gfxstrand[d]: `SET_ZCULL_STATS` I _think_ that might be useful
15:07 karolherbst[d]: and `CLEAR_REPORT_VALUE_TYPE_ZCULL_STATS`
15:08 karolherbst[d]: `SET_REPORT_SEMAPHORE_D_PIPELINE_LOCATION_ZCULL`?
15:08 karolherbst[d]: `SET_REPORT_SEMAPHORE_D_REPORT_ZCULL_STATS0` up to 3
15:08 karolherbst[d]: just the question is what those values will mean
15:08 gfxstrand[d]: karolherbst[d]: I think `LOAD/STORE_ZCULL` rectify the zcull data with the actual depth buffer. There's also a separate `CLEAR`. The macros that the blob calls right after binding the ZCULL buffer either load or clear based on bits passed to them.
15:08 karolherbst[d]: ahh
15:09 gfxstrand[d]: If it's anything like Intel/AMD, the depth hardware doesn't actually write to the depth buffer unless it has to. Instead, it all gets written at the end with `STORE_ZCULL`.
15:09 karolherbst[d]: I see
15:09 karolherbst[d]: kinda makes sense
15:09 gfxstrand[d]: `LOAD_ZCULL` is going to either initialize to some sort of dummy value and then load on first hit or it's going to actually scan the depth buffer and populate the zcull data.
15:10 gfxstrand[d]: And `CLEAR` populates the zcull with the assumed clear value.
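Putting those semantics together, a hypothetical depth-buffer-switch sequence; the `emit()` helper, method tokens, and argument packing are stand-ins, not real class-header values:
```c
#include <stdbool.h>
#include <stdint.h>

enum zcull_mthd { STORE_ZCULL, SET_ZCULL_REGION_SIZE, LOAD_ZCULL, CLEAR_ZCULL };

void emit(enum zcull_mthd mthd, uint64_t arg);  /* driver pushbuf helper */

struct zs_buf { uint32_t width, height; uint64_t zcull_va; };

void switch_depth_buffer(const struct zs_buf *old_zs,
                         const struct zs_buf *new_zs,
                         bool starts_cleared, uint64_t clear_bits)
{
    if (old_zs)
        emit(STORE_ZCULL, old_zs->zcull_va);   /* spill on-chip zcull state */

    emit(SET_ZCULL_REGION_SIZE,
         (uint64_t)new_zs->height << 32 | new_zs->width);

    if (starts_cleared)
        emit(CLEAR_ZCULL, clear_bits);         /* seed with the clear value */
    else
        emit(LOAD_ZCULL, new_zs->zcull_va);    /* refill from the spilled copy */
}
```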
15:10 karolherbst[d]: soo.. probably it loads from the bound buffer on each depth buffer swap to fill the on-chip zcull buffer I guess
15:11 gfxstrand[d]: I suspect the on-chip zcull is just a cache of the off-chip zcull buffer. Sort of like how stacks work.
15:11 gfxstrand[d]: But IDK why it would need a separate save/restore then. 🤔
15:11 karolherbst[d]: so `_SET_ZCULL_REGION_SIZE` + `LOAD_ZCULL`?
15:11 karolherbst[d]: is that what the blob is doing?
15:11 karolherbst[d]: (and `STORE_ZCULL` before that probably)
15:12 karolherbst[d]: mhhhhhh
15:12 karolherbst[d]: I wonder....
15:12 karolherbst[d]: is the blob doing `LOAD_ZCULL` after binding a new depth buffer?
15:12 gfxstrand[d]: Yes
15:12 karolherbst[d]: soo here is my guess
15:12 gfxstrand[d]: Either LOAD or CLEAR depending on $stuff
15:12 karolherbst[d]: `SET_ZCULL_REGION` is entirely optional
15:13 karolherbst[d]: and on binding a new depth buffer, the on-chip is simply cleared
15:13 karolherbst[d]: as an optimization you can fill it with a previously stored state
15:13 gfxstrand[d]: I very much don't think `SET_ZCULL_REGION` is optional.
15:13 gfxstrand[d]: It defines the render area that zcull applies to
15:14 karolherbst[d]: sure, but the on-chip one doesn't care
15:14 gfxstrand[d]: plus `0xf0` in the width for unknown reasons. 🙃
15:14 karolherbst[d]: the on-chip one is fixed in size
15:14 karolherbst[d]: the compression chosen (the hw does that) just determines how many pixels it can hold.
15:15 karolherbst[d]: I don't see how some buffer in VRAM is relevant here
15:16 karolherbst[d]: ohh yeah.. the `LOAD` + `STORE` just skips the need of recompressing when you swap the depth buffer
15:17 karolherbst[d]: + state
15:17 karolherbst[d]: so yeah.. it entirely sounds optional
15:19 karolherbst[d]: gfxstrand[d]: maybe `NV2080_CTRL_CMD_GR_GET_ZCULL_INFO.zcullRegionHeaderSize`?
15:19 karolherbst[d]: or just width alignment
15:20 karolherbst[d]: `#TPC's * 16` for the width alignment
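(For what it's worth: a GPU with, say, 15 TPCs would give 15 * 16 = 240 = 0xf0, which would line up with the unexplained `0xf0` width padding mentioned above.)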
15:21 karolherbst[d]: but anyway.. I guess you want the zcull stat reporting to work to get feedback
15:22 karolherbst[d]: mhh there is also `SET_REPORT_SEMAPHORE_D_REPORT_ZPASS_PIXEL_CNT64`
15:22 karolherbst[d]: might also help
16:20 gfxstrand[d]: Hrm... The blob doesn't offset its zcull regions with `renderArea`
16:30 karolherbst[d]: I don't think the render area is relevant here... the buffer corresponds to a depth buffer 1:1 and really just contains the content from the on-chip buffer 1:1
16:30 karolherbst[d]: (after saving it)
16:31 karolherbst[d]: and zcull only operates on the on-chip buffer anyway
16:32 karolherbst[d]: (and if the on-chip buffer isn't big enough to cover the entire depth buffer, it only operates on parts of it)
16:36 gfxstrand[d]: They also always set `REGION_SIZE_C.DEPTH = 1`, even for layered render targets
16:37 karolherbst[d]: mhh oh wait.. there is `SET_ZCULL_STORAGE` and there is `SET_ZCULL_REGION`
16:38 karolherbst[d]: but that offset you mentioned above seems to be just for the depth
16:38 karolherbst[d]: `SET_ZCULL_REGION_PIXEL_OFFSET_C_DEPTH`
16:38 gfxstrand[d]: Yes, the region specifies the region of the depth buffer that zcull applies to, more or less.
16:38 karolherbst[d]: mhhh
16:39 karolherbst[d]: I guess that could make sense if the depth buffer is too big and the driver splits it apart
16:39 gfxstrand[d]: The blob sets it to the render target size, aligned, and with some weird padding in X
16:39 karolherbst[d]: ohhh
16:39 karolherbst[d]: yeah...
16:39 karolherbst[d]: maybe use a _huge_ depth buffer and see what happens
16:39 karolherbst[d]: mhh
16:39 gfxstrand[d]: I have tried
16:39 karolherbst[d]: storage is huge
16:39 karolherbst[d]: like 10 MP
16:39 karolherbst[d]: on maxwell it's 4MP
16:40 karolherbst[d]: but the hardware can also deal with having the depth buffer too big
16:40 karolherbst[d]: internal heuristics to get the biggest benefit next draw etc...
16:41 gfxstrand[d]: Oh, I know what happens when you go too big
16:41 gfxstrand[d]: I allocated a 4096x8192 depth buffer
16:41 karolherbst[d]: enable depth stencil zcull
16:41 karolherbst[d]: that reduces the number of pixels zcull can cover
16:41 karolherbst[d]: (by 50%)
16:41 gfxstrand[d]: In that case, it sets the subregion mode to 16x16x2 (IDK what that does) and leaves a couple of the subregions without zcull.
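(For scale: 4096 * 8192 = 33,554,432 pixels, about 33.5 MP, several times the ~10 MP on-chip figure mentioned above, so some subregions necessarily go uncovered.)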
16:42 karolherbst[d]: mhhh
16:42 karolherbst[d]: what method?
16:43 karolherbst[d]: there is something where the covered region is split into 4 regions, but not sure if that's a driver or hardware thing
16:44 karolherbst[d]: ehh wait
16:44 karolherbst[d]: it's divided into 16 things
16:50 gfxstrand[d]: Yeah, the region is split into 16 subregions
16:50 asdqueerfromeu[d]: karolherbst[d]: Is that megapixels?
16:51 karolherbst[d]: yes
16:52 gfxstrand[d]: Looks like they use a little memory at the end of the zcull storage to store some metadata that the MME reads to decide when it actually needs to load/store
16:53 gfxstrand[d]: They also really like DREAD/DWRITE. I should probably figure out how those work better
16:54 karolherbst[d]: mhhhh, random thought: if only part of the depth buffer is used in a draw, setting the zcull thing to a specific region probably makes it load/store just that specific region, and also allows for finer coverage (because the internal resolution depends on how many pixels are needed, meaning if the region is smaller, the buffer holds fewer pixels, but at a higher resolution)
16:55 gfxstrand[d]: Yeah, but that can be really tricky to keep in-sync in Vulkan.
16:55 gfxstrand[d]: We had a lot of issues with that on ANV
16:55 gfxstrand[d]: That's probably why they're just always doing the whole first array slice and leaving it at that.
16:56 karolherbst[d]: it would be interesting to see what nvidia does if you use the same depth buffer while steadily using less and less of it
16:57 gfxstrand[d]: They probably keep using the whole first slice
16:57 karolherbst[d]: maybe?
16:57 gfxstrand[d]: If you change the area that zcull covers, you have to throw away all your zcull data.
16:57 karolherbst[d]: or just use a new buffer
16:57 karolherbst[d]: and keep the old
16:57 gfxstrand[d]: Because it's no longer in-line with the depth buffer content
16:57 karolherbst[d]: ohh wait...
16:57 karolherbst[d]: the depth buffer might change...
16:57 karolherbst[d]: yeah...
16:57 gfxstrand[d]: If you make a new zcull and keep the old, now you have 3 things potentially out-of-sync
16:57 karolherbst[d]: yeah, fair
17:05 gfxstrand[d]: I'm also a little confused by the addresses being used for zcull data. They're totally different from the actual depth buffer. So either a) They're allocating them driver-internal somehow (which has all sorts of issues) or b) They're just not sparse binding those.
17:06 karolherbst[d]: the zcull buffer bound is its own thing
17:07 gfxstrand[d]: Well, yes, but the memory has to come from somewhere. Does it get allocated with the `VkImage`? I kinda think it does.
17:07 karolherbst[d]: mhhh, probably yes
17:07 karolherbst[d]: I mean.. sharing data with a depth buffer is all kind of weird because of PTE kind and stuff
17:08 gfxstrand[d]: They're wildly different addresses but I think that's just because they're sparse binding the ZS data and just using the memory object address range for the zcull
17:08 karolherbst[d]: so they probably do either dedicated allocations or have a slab allocator
17:08 karolherbst[d]: mhhh, maybe?
17:09 karolherbst[d]: doing one allocation for both and just sub-allocating doesn't sound like a bad idea in general
17:09 gfxstrand[d]: You kinda have to with Vulkan
17:09 gfxstrand[d]: Allocating things side-band and getting the aliasing right is REALLY hard
17:10 karolherbst[d]: fair I guess
17:10 gfxstrand[d]: We went round and round with that at Intel trying to figure out how to do CCS-compressed images without a CCS modifier. It was a mess.
17:11 gfxstrand[d]: I think we ended up just special-casing WSI where we know there's a 1:1 mapping from images to memory objects and nothing else gets the optimization.
17:11 karolherbst[d]: heh
17:11 karolherbst[d]: *got reminded to look into MSAA export/import for cl stuff again*
17:11 gfxstrand[d]: Otherwise, it's a disaster of lookup tables and locks and reference counting to try and make aliasing look like it works even though part of your image is driver-allocated.
17:12 gfxstrand[d]: I really should get SWCTX hooked up in our push dumper
17:13 gfxstrand[d]: It's just class 2080, right?
17:13 gfxstrand[d]: Or is it 902d?
17:14 karolherbst[d]: it is its own subchannel
17:14 karolherbst[d]: is nvidia actually using it?
17:15 karolherbst[d]: wait...
17:15 karolherbst[d]: they map an entire subchannel to do GSP operations directly from the push buffer?
17:15 karolherbst[d]: mhhhhhhhhhhhh
17:15 karolherbst[d]: that's actually not a terrible idea...
17:15 karolherbst[d]: just needs entirely new code in nouveau probably
17:16 gfxstrand[d]: I see subch 5 used for a few things
17:16 gfxstrand[d]: and subch 7
17:16 karolherbst[d]: nouveau uses 7 afaik
17:16 karolherbst[d]: but mhhhhh
17:16 karolherbst[d]: I think mapping to GSP stuff ain't a terrible idea
17:16 karolherbst[d]: because then you don't have to force a switch to a specific channel
17:16 gfxstrand[d]: Yeah, subch 7 looks likely
17:16 karolherbst[d]: you just enqueue a GSP command
17:17 karolherbst[d]: and take the channel id from whatever the interrupt tells you
17:17 karolherbst[d]: or so
17:17 karolherbst[d]: just fetch the data and ack the interrupt and then you unblock other stuff pretty quickly
17:17 gfxstrand[d]: `subch 7 mthd 3fc0` pops up a lot
17:17 karolherbst[d]: the old style SWCTX is poking at context switched MMIO registers directly, soo...
17:17 karolherbst[d]: gfxstrand[d]: yeah.. thing is.. it's all driver defined
17:18 gfxstrand[d]: Oh, sure.
17:18 gfxstrand[d]: I'm just trying to figure out if I can decode the SWCTX commands
17:18 karolherbst[d]: it should be somewhere in the source code
17:18 karolherbst[d]: unless
17:18 karolherbst[d]: GSP handles it
17:19 karolherbst[d]: which I wouldn't be surprised if it does
17:20 karolherbst[d]: might make sense to see how nvidia even binds the subchannel
17:21 karolherbst[d]: mhhh
17:21 karolherbst[d]: `3fc0 / 4` == `ff0` 🙃
17:21 gfxstrand[d]: Yeah, I think something is failing to decode
17:21 karolherbst[d]: nah
17:21 karolherbst[d]: it's probably correct
17:23 karolherbst[d]: since Mary improved the decoder, I think it's covering the entire thing properly
17:24 karolherbst[d]: it's just not unusual to count the methods backwards
17:24 karolherbst[d]: the subchannel swctx id is probably also a negative number
17:24 karolherbst[d]: or something
17:25 karolherbst[d]: there is a bit of odd stuff going on, because it can share the start of the method ids
17:25 karolherbst[d]: e.g. the bind method
17:25 karolherbst[d]: so going backwards is just easier once the part in the front changes
17:26 karolherbst[d]: anyway.. I can probably help with wiring up SWCTX in nouveau (as I've done that at least twice already 🙃 )
17:26 karolherbst[d]: though not sure how much of that changes with GSP
17:42 gfxstrand[d]: Yeah but the whole thing is `HDR 1fff0 subch 7 mthd 3fc0` so the top bit of the header is 0 and I'm not sure what that means.
17:42 gfxstrand[d]: It's not one of NINC/0INC/1INC/IMMD
17:42 gfxstrand[d]: And then we're failing to decode a bunch of other stuff. 😩
17:52 gfxstrand[d]: Okay, updated decoder helps some
17:52 karolherbst[d]: ehh..
17:52 karolherbst[d]: yeah that kinda looks weird
17:53 gfxstrand[d]: It's the weird tert stuff
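For reference, that `HDR 1fff0` word decodes cleanly as a Kepler+ tertiary op (bit positions per the envytools dma-pusher docs; worth double-checking) rather than a normal method header:
```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t hdr  = 0x0001fff0;         /* "HDR 1fff0" from the dump */
    uint32_t op   = (hdr >> 29) & 0x7;  /* 1=NINC, 3=0INC, 4=IMMD, 5=1INC */
    uint32_t tert = (hdr >> 16) & 0x3;  /* for op 0: 1 = SET_SUBDEVICE_MASK */
    uint32_t mask = (hdr >> 4) & 0xfff; /* subdevice mask, here all-ones */

    /* Prints: op=0 tert=1 mask=0xfff.  A decoder unaware of tert ops reads
     * the same bits as subch=(hdr>>13)&7 = 7 and mthd 0x3fc0, which are the
     * bogus values shown in the dump above. */
    printf("op=%u tert=%u mask=0x%03x\n", op, tert, mask);
    return 0;
}
```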
17:53 karolherbst[d]: I thought Mary added support for that?
17:53 karolherbst[d]: was that merged?
17:53 gfxstrand[d]: Yeah. I just hadn't pushed a newer decoder to my blob box
17:53 karolherbst[d]: ahhh
17:54 karolherbst[d]: but yeah.. I don't think the tert stuff is tested all that much so random bugs might be expected 🙃
17:54 marysaka[d]: I thought I forgot one form again :sweating:
17:55 gfxstrand[d]: I do see `5c97:0f60` in the stream which isn't documented but I don't think I care about that just yet
17:55 karolherbst[d]: mhhh
17:56 gfxstrand[d]: And some stray 0s
17:56 karolherbst[d]: yeah.. looks undocumented...
17:56 karolherbst[d]: maybe more VPC stuff, whatever VPC is
17:57 karolherbst[d]: given there is `SET_VPC_PERF_KNOB` at `0x0f14`
18:17 gfxstrand[d]: Could be
18:18 gfxstrand[d]: In any case, I think I have enough information now that I can start tinkering on Monday. I should probably do Khronos catch-up next week, though.
18:21 marysaka[d]: I think the nsight docs have some reference to VPC being related to clip/cull
18:22 marysaka[d]: <https://docs.nvidia.com/nsight-graphics/AdvancedLearning/index.html#overview-of-the-gpu>
18:22 gfxstrand[d]: It probably stands for "viewport clip" or something like that
18:25 marysaka[d]: oh I hadn't thought about that