00:02 airlied[d]: this is going to run into the problem I solved in e9ba37d9f9a6872b069dd893bd86a7d77ba8c153
00:05 mohamexiety[d]: airlied[d]: what's up?
00:06 airlied[d]: I think you need to revert that to get the BOs to have larger sizes
00:11 mohamexiety[d]: oh damn I am dumb
00:12 mohamexiety[d]: yeah think that would explain things. I completely missed that the _bo.c code always goes for the smallest size
00:24 mohamexiety[d]: So good news
00:25 mohamexiety[d]: That did indeed improve things. The horrific ghosting/trails are gone now and I can actually launch apps
00:26 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1344103184691499130/IMG_0228.jpg?ex=67bfb123&is=67be5fa3&hm=2b8174c90492497624a69f10f37f06cb70bdfee5f32d4d5acf7f72b2a13f007a&
00:26 mohamexiety[d]: The bad news is they don’t really launch properly. This is the Files app
00:26 mohamexiety[d]: And vkcube is just a black window
00:26 mohamexiety[d]: (With both of them faulting)
00:26 mohamexiety[d]: Will reboot and try console only with debug
00:30 gfxstrand[d]: That's... exciting.
00:31 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1344104408803180555/IMG_0229.jpg?ex=67bfb247&is=67be60c7&hm=910f632b2162eedf8930f21980e368292575b44bd9b00e766be97095ca6b5c22&
00:31 mohamexiety[d]: Well that’s a bit disappointing. The same deqp test fails with the same issue — mismatched page sizes
00:31 mohamexiety[d]: So I guess it’s not enough in some cases :Thonk:
00:32 mohamexiety[d]: (I don’t think I can try vkcube like this so I just went with the CTS test)
00:37 mohamexiety[d]: gfxstrand[d]: airlied[d] I pushed the change if you want to tinker further. It doesn’t touch uvmm.c, so it should apply cleanly on top of your own branches. Too tired atm and have to head off, but will follow up tomorrow
00:38 mohamexiety[d]: Same branch here https://gitlab.freedesktop.org/mohamexiety/nouveau/-/commits/vm-bind-test-4?ref_type=heads
00:47 gfxstrand[d]: I have no idea what's going on with this fault. The only thing I can get to fault with `HUB/DFALCON` is QMDs and those are reads, not writes. The address that's faulting isn't used anywhere near the code that's blowing up.
00:48 gfxstrand[d]: It's got to be something with either a cache or some sort of state smashing between 3D and compute.
00:53 gfxstrand[d]: Actually... Maybe I found something
00:53 gfxstrand[d]: I found the fault address being referenced by some QMDs
00:54 gfxstrand[d]: But why is it writing to QMDs?!?
01:22 airlied[d]: mohamexiety[d]: so all the allocations appear to have GART in them, which forces 4k
01:25 skeggsb9778[d]: ah, yes, that'll be a problem
01:25 skeggsb9778[d]: hmm... i think it'd be ok (for >=fermi, anyway) to relax that a bit...
01:25 skeggsb9778[d]: (only if not compressed)
01:26 skeggsb9778[d]: prior to fermi it's impossible, the virtual address would have to change to swap between 4k/128k pages
01:27 skeggsb9778[d]: i *think* even for compressed, we can deal with it on >=turing easily enough
01:27 skeggsb9778[d]: prior to that, the drm probably needs to know full surface info to be able to migrate etc correctly
01:41 airlied[d]: gfxstrand[d]: I think the concept of LOCAL might be a problem
01:42 gfxstrand[d]: Oh?
01:43 airlied[d]: I'm not sure the kernel's local and nvkmd's local mean the same thing
01:43 airlied[d]: but even without that, the nvkmd local will force gart fallbacks which is 4k
01:43 gfxstrand[d]: Oh, I don't think they do
01:43 gfxstrand[d]: If the kernel is smart, GART fallbacks are okay
01:43 gfxstrand[d]: There are PTE kinds specifically to allow evicting memory
01:44 airlied[d]: did you validate the kernel was smart?
01:44 airlied[d]: like we will always evict memory, the flags more denote whether we will force it back
01:44 gfxstrand[d]: Oh, absolutely not. It has an attempt at implementing some of those smarts but I don't think it is that smart.
01:44 gfxstrand[d]: What do you mean by "always evict"?
01:45 airlied[d]: when we run out of VRAM the kernel can always kick stuff out
01:45 skeggsb9778[d]: the kernel flags are where they can be *used*, everything will still get evicted to system memory under pressure
01:46 skeggsb9778[d]: if the flags say "only vram", they'll be forced back when VM_BIND is called
01:46 gfxstrand[d]: Yeah, we set GART | LOCAL on purpose. We want it to go in VRAM but it's okay if it gets evicted.
01:46 gfxstrand[d]: Unless we can't do that for reasons
01:47 airlied[d]: okay the kernel doesn't treat GART like that
01:47 airlied[d]: GART means make sure I can use this from GART as well as VRAM
01:48 airlied[d]: even if that means bad page sizes
01:49 gfxstrand[d]: There's no reason why we need to force 4k pages in that case.
01:49 gfxstrand[d]: We use 4k pages if it's living in system ram and we can use whatever size pages we want when it's living in VRAM
01:50 gfxstrand[d]: We smash PTE kinds when it's in system RAM to disable compression and we use the compressible PTE kinds when it's in VRAM.
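A toy C model of the policy gfxstrand describes above (placement decides page size and compression, not the BO itself). All names here are illustrative, not actual nouveau/NVK identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum placement { PLACE_VRAM, PLACE_SYSMEM };

struct mapping {
    uint32_t page_size;  /* bytes */
    bool     compressed;
};

/* Hypothetical helper: mapping parameters follow the current placement. */
static struct mapping map_for_placement(enum placement p, bool wants_compression)
{
    struct mapping m;
    if (p == PLACE_SYSMEM) {
        /* System RAM: 4K pages only, compression PTE kinds smashed. */
        m.page_size = 4u << 10;
        m.compressed = false;
    } else {
        /* VRAM: big pages and compressible PTE kinds are fine. */
        m.page_size = 64u << 10;
        m.compressed = wants_compression;
    }
    return m;
}
```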
01:50 airlied[d]: that might be how we wished the kernel works, it doesn't currently
01:50 skeggsb9778[d]: yeah, as i said above, i'm pretty sure we can relax that on the kernel side for newer GPUs
01:50 gfxstrand[d]: As long as we also copy the compression tag information when we evict, everything's roses.
01:51 skeggsb9778[d]: fortunately from turing onwards, comptag info is directly tied to vram address
01:51 gfxstrand[d]: Yes
01:51 gfxstrand[d]: I don't care about any of this working nicely pre-Turing
01:52 gfxstrand[d]: If we have to pin things to VRAM, that means we'll probably not be able to compress things unless they're dedicated allocations. 😕
01:53 airlied[d]: why not?
01:53 gfxstrand[d]: Because otherwise we're pinning everything to VRAM and we don't get eviction
01:54 gfxstrand[d]: Or maybe we can create a type that gets used for color targets and hope the app targets things correctly
01:57 skeggsb9778[d]: i haven't looked anytime recently, but doesn't the proprietary vk driver have a bunch of special memory types that're used for things like that?
01:57 skeggsb9778[d]: perhaps that was just needed prior to turing to deal with the weirdness there
01:57 gfxstrand[d]: Yeah, it has a mess of types. I'm not sure what they're all used for.
01:57 airlied[d]: I think they fixed a bunch of those in later drivers post-turing
01:58 gfxstrand[d]: I don't have the blob on any of my boxes right now so I can't check easily
02:00 gfxstrand[d]: But yeah, we can play memory type tricks if we need to.
02:00 airlied[d]: we also need to fix nvk to align the vma base address
02:00 airlied[d]: 0000003fffe64000 0000000000040000
02:03 gfxstrand[d]: That's easy enough to fix
02:04 gfxstrand[d]: Here's the annoying bit of text:
02:04 gfxstrand[d]: > For images created with a color format, the `memoryTypeBits` member is identical for all `VkImage` objects created with the same combination of values for the tiling member, the `VK_IMAGE_CREATE_SPARSE_BINDING_BIT` bit and `VK_IMAGE_CREATE_PROTECTED_BIT` bit of the flags member, the `VK_IMAGE_CREATE_SPLIT_INSTANCE_BIND_REGIONS_BIT` bit of the flags member, the `VK_IMAGE_USAGE_HOST_TRANSFER_BIT` bit of the usage member if the `VkPhysicalDeviceHostImageCopyProperties::identicalMemoryTypeRequirements` property is `VK_FALSE`, `handleTypes` member of `VkExternalMemoryImageCreateInfo`, and the `VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT` of the usage member in the `VkImageCreateInfo` structure passed to `vkCreateImage`.
02:05 gfxstrand[d]: So we can have a special memory type for all color targets or storage images. But we can't look at other details to determine compressibility.
02:06 gfxstrand[d]: airlied[d]: We already should be. We just need to make NIL use 64K alignments for compressed images.
02:06 skeggsb9778[d]: probably even if not compressed
02:06 airlied[d]: don't we want large pages anyways?
02:06 skeggsb9778[d]: for the tlb benefits
02:06 gfxstrand[d]: But we can also do something where we use 2M alignment for anything bigger than 2M and 64K for anything that's not tiny.
02:08 skeggsb9778[d]: what's the earliest GPU nvk supports, however remotely?
02:08 gfxstrand[d]: Kepler
02:08 gfxstrand[d]: But don't tell anyone
02:09 gfxstrand[d]: What's the earliest GPU where I remotely care about perf? Turing.
02:09 skeggsb9778[d]: Well, this is just a simple thing to keep in mind
02:09 skeggsb9778[d]: Kepler, you want 128KiB alignment instead
02:10 skeggsb9778[d]: Anything before Pascal, really
02:10 skeggsb9778[d]: Big pages are bigger there 😛
02:10 gfxstrand[d]: Sure
02:10 gfxstrand[d]: But also I don't care
02:10 gfxstrand[d]: 😛
02:11 gfxstrand[d]: But sure
02:11 gfxstrand[d]: But yeah, if we're going to have alignment code, it should know the correct size of a big page
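A sketch of what such alignment code might look like, given the big-page sizes mentioned above (128KiB before Pascal, 64KiB from Pascal on). This is a hypothetical helper, not actual NVK code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick a VMA alignment from the BO size; the big-page size depends on
 * the GPU generation (128KiB pre-Pascal, 64KiB from Pascal on). */
static uint64_t bo_vma_alignment(uint64_t size, bool pre_pascal)
{
    const uint64_t big_page = pre_pascal ? (128ull << 10) : (64ull << 10);

    if (size >= (2ull << 20))
        return 2ull << 20;   /* 2M alignment for anything 2M or larger */
    if (size >= big_page)
        return big_page;     /* big-page alignment for medium BOs */
    return 4ull << 10;       /* tiny BOs stay on 4K */
}
```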
02:12 airlied[d]: okay I've hacked nvk and the kernel so that smoke passes and uses at least one 64k page
02:13 skeggsb9778[d]: now watch cts melt your gpu
02:15 gfxstrand[d]: I really should stop trying to figure out why the Maxwell A SKED cache is busted
02:17 airlied[d]: mohamexiety[d]: drop that last patch, add checks against nvbo->page in the page size finders, then hack userspace https://paste.centos.org/view/raw/92db0571
02:19 airlied[d]: that at least gets me smoke passing with the 64k page, I'm not game enough to throw CTS at it 😛
04:24 gfxstrand[d]: I really should remember to just not test things on Maxwell A....
04:24 gfxstrand[d]: IDK what's up with Maxwell A compute but it's borked.
04:28 gfxstrand[d]: But without clocking up, Maxwell A isn't fast enough to pass all the tests, so... 🤷🏻‍♀️
04:29 gfxstrand[d]: (Which is to say that there are tests which fail because the GPU is too slow.)
05:41 gfxstrand[d]: Maybe one day I'll care enough about Maxwell to figure out how to deal with the SKED cache on MaxwellA/Kepler but it's not today. I gave up and plugged in a Maxwell B and it's running the CTS now.
07:48 mohamexiety[d]: airlied[d]: Understood, thanks so much!
07:48 mohamexiety[d]: I am confused by a few things though. GART is host RAM, right? And the issue is that if nvk says GART the kernel understands that it must live in both VRAM and sysmem. But what’s wrong with _not_ setting GART unless we explicitly need it? The behavior we want is that things live in VRAM and then get paged out if we run out, does missing GART mean this won’t happen?
07:48 mohamexiety[d]: The other thing is, wouldn’t dropping that last patch still lead to all things getting 4k sizing? Since it goes to the smallest supported size
08:02 airlied[d]: yeah, gfxstrand[d] seems to think setting GART nearly always is fine and we should fix the kernel. I'm not sure we'll get to fixing the kernel in a useful timeframe here, but maybe you can figure it out as part of this
08:03 airlied[d]: I ran the code with the patch dropped, it seemed to work, it should break out of the loop before it gets to page sizes smaller than the bo
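The loop behavior airlied describes (break out before reaching page sizes smaller than the BO) can be modeled roughly like this. A toy version; the real nouveau page-size selection code differs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk supported page shifts from largest to smallest and take the
 * first one the BO size is a multiple of. */
static const unsigned page_shifts[] = { 21, 16, 12 }; /* 2M, 64K, 4K */

static unsigned pick_page_shift(uint64_t bo_size)
{
    for (size_t i = 0; i < sizeof(page_shifts) / sizeof(page_shifts[0]); i++) {
        if ((bo_size & ((1ull << page_shifts[i]) - 1)) == 0)
            return page_shifts[i]; /* largest shift that divides the BO */
    }
    return 12; /* 4K is always a safe fallback */
}
```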
08:20 mohamexiety[d]: Yeah I am not sure what’s wrong with not setting LOCAL if the client/app didn’t actually specify it
08:20 mohamexiety[d]: But at least we have something running, yay!
09:02 asdqueerfromeu[d]: mohamexiety[d]: Finally the real nouveau experience™️
12:41 OftenTimeConsuming: I got this crashdump - I wonder if it contains enough information to be useful; https://termbin.com/xmiv
14:07 gfxstrand[d]: If we need to force a few things to live in VRAM, we probably can. I don't know that I love that but it might be okay as a temporary solution if we think we can fix this in Nova.
14:11 gfxstrand[d]: But I would like to try and go as far as we can with nouveau so we can prove it out and find the corners. That way, when we build it for Nova, we can build it right.
14:16 mohamexiety[d]: gfxstrand[d]: I still don't understand the issue, sorry. what is wrong with forcing things to live in VRAM if that's where they would have lived anyway :thonk:
14:17 mohamexiety[d]: from what I understand as well, if you run out of VRAM, they'll get evicted to sysmem as expected, and when needed they'll get back in VRAM also as expected
14:17 mohamexiety[d]: so it all sounds fine, no?
14:20 gfxstrand[d]: If something is allowed to live in both then, in a high pressure scenario, it gets evicted and we just use it from system RAM. It's slow because of PCIe but it's usable. If something VRAM-only gets evicted, our context cannot run until it's put back.
14:22 gfxstrand[d]: If multiple things in the system are using lots of VRAM-only BOs, we end up thrashing, DMAing things back and forth trying to run all the apps.
14:33 gfxstrand[d]: It's okay to have some things VRAM-only. We just don't want it to be more than, say, 20% of an app's resources.
14:37 gfxstrand[d]: And Vulkan's restrictions on color images and memory type bits make it harder than you might think to push only a handful of images to VRAM.
15:33 mohamexiety[d]: gfxstrand[d]: I guess the context can't run because you can no longer guarantee VRAM-only properties like non-4K pages or such? but this is a bit odd. maybe I am a bit too used to the windows model of things but I was under the impression that when VRAM pressure is too high, things get evicted to sysmem and the app keeps running (albeit much much slower). how do other drivers like Intel/amdgpu handle this?
15:34 gfxstrand[d]: Exactly! And that's why we set the GART flag. Otherwise the kernel won't do that.
15:35 mohamexiety[d]: but the GART flag seems to just enforce host memory restrictions (like 4K pages) even on VRAM, so that sounds... off
15:35 gfxstrand[d]: You have to use 4K pages for system RAM.
15:36 gfxstrand[d]: But those restrictions drop when it's in VRAM.
15:36 mohamexiety[d]: wait isn't the issue we're seeing because the restrictions are still applied even when things are in VRAM (even though the restrictions don't have to be applied)?
15:40 gfxstrand[d]: Yes
15:40 gfxstrand[d]: And that's a kernel issue
15:41 gfxstrand[d]: The fact that the kernel looks at a BO, sees GART, and goes "This BO will always use 4K pages" is a kernel problem.
15:41 mohamexiety[d]: yeah that's the impression I got
15:42 HdkR: 64k page size arm64 kernels and NVIDIA GPUs would be happy at least
15:42 mohamexiety[d]: haha
15:45 mohamexiety[d]: is this a kernel issue in general or just nouveau? :thonk:
15:45 mohamexiety[d]: if it's a kernel issue I'd look at i915/Xe or amdgpu I guess, given those GPUs also have variable page sizes and should™️ handle this properly
15:46 mohamexiety[d]: I think
15:46 gfxstrand[d]: I think it's a nouveau issue
15:47 mohamexiety[d]: in that case I guess the next question is.. how bad would it be to fix this, then? it _sounds_ like a simple thing; we should apply restrictions based on usage/location
15:48 gfxstrand[d]: Not sure
16:51 gfxstrand[d]: With this and the bit I merged yesterday, Maxwell should be back to not being total trash: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33771
16:51 gfxstrand[d]: (Draw calls were busted at the beginning of the week. :blobcatnotlikethis: )
18:38 airlied[d]: I think nouveau's concept of picking an nvbo page size is wrong, since afaik a fixed physical page size isn't a hw thing; it's just decided when you map it
18:39 airlied[d]: So I'd start with experiment from Turing on trying to drop that
18:43 mohamexiety[d]: airlied[d]: hm.. what even uses the nvbo page size?
18:58 airlied[d]: there is some places it's used to validate things
18:58 airlied[d]: it's shorthand for I already checked this bo is aligned and suitable to use at this page size
18:58 airlied[d]: it might be possible to turn it into a bitfield
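A minimal sketch of what the bitfield airlied suggests might look like. Hypothetical types only, not the actual nvbo struct:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Track a mask of page shifts a BO has been validated for, rather than
 * a single fixed page size per BO. */
struct toy_bo {
    uint32_t valid_page_shifts; /* bit n set => 2^n-byte pages are OK */
};

static void bo_validate_shift(struct toy_bo *bo, unsigned shift)
{
    bo->valid_page_shifts |= 1u << shift;
}

static bool bo_can_map_at(const struct toy_bo *bo, unsigned shift)
{
    return (bo->valid_page_shifts & (1u << shift)) != 0;
}
```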
20:53 airlied[d]: gfxstrand[d]: can you click the marge on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33521
20:54 gfxstrand[d]: Should I read any of the code first?
20:56 airlied[d]: I'd save that energy for when I get the actual timings MR through CTS, damn you PRMT the docs say you aren't special, but you keep acting special
20:58 gfxstrand[d]: Looks like the right amount of hacky for a first pass
21:15 gfxstrand[d]: Compilers are hard. 😭
21:17 karolherbst[d]: smh, should have gone with LLVM instead
21:18 gfxstrand[d]: 😛
21:20 gfxstrand[d]: Maybe we should just re-materialize all `load_const` in all the blocks that use them?
21:20 karolherbst[d]: mhhhhh you are so lucky that turing has 32-bit immediates supported in all the alu instructions
21:21 gfxstrand[d]: Even when you don't, burning a register for a long time probably isn't worth it
21:21 gfxstrand[d]: It's kinda okay if it's a ugpr
21:21 karolherbst[d]: I was considering writing a pass that spilled them into the driver ubo, because there you can fetch 32 bit immediates even if the instruction only supports like 20 bits 🙃
21:22 gfxstrand[d]: Yeah...
21:22 karolherbst[d]: it did help a lot
21:22 gfxstrand[d]: But I don't care about Maxwell perf. 😛
21:22 karolherbst[d]: I know
21:22 karolherbst[d]: that's why I said you are lucky 😄
21:22 gfxstrand[d]: hehe
21:22 karolherbst[d]: though
21:23 karolherbst[d]: I think it might matter for fp64
21:23 karolherbst[d]: it does pull 64 bit...
21:23 karolherbst[d]: sometimes
21:23 karolherbst[d]: if the offset is 4 aligned, it only pulls 32 bit
21:24 karolherbst[d]: if it's 8 aligned, it will pull 64 bits
21:24 karolherbst[d]: in case you weren't aware
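As a tiny model of the alignment behavior karolherbst describes (hypothetical helper, just restating the rule above):

```c
#include <assert.h>
#include <stdint.h>

/* An 8-byte-aligned cbuf offset can pull a full 64-bit immediate;
 * a merely 4-byte-aligned one pulls only 32 bits. */
static unsigned cbuf_pull_bits(uint32_t offset)
{
    return (offset % 8 == 0) ? 64 : 32;
}
```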
21:25 gfxstrand[d]: Yeah, I know
21:26 gfxstrand[d]: The unaligned behavior isn't useful
21:26 karolherbst[d]: yeah...
21:27 gfxstrand[d]: Right now I'm fighting the fact that we have things in the nak::from_nir which assume certain things are constants but the LCSSA pass is breaking them because it sticks a phi in the way. :blobcatnotlikethis:
21:27 gfxstrand[d]: IDK why I'm only seeing the fails on Maxwell. It affects everything.
21:30 redsheep[d]: https://tenor.com/view/magic-gif-26166638
21:31 gfxstrand[d]: Oh, because pre-Volta, we lower out of structured and that process gets rid of those phis.
21:31 gfxstrand[d]: Probably?
21:34 redsheep[d]: I'm probably misunderstanding but didn't you just say it fails only on Maxwell, not the other way around? If pre-volta means a pass that gets rid of those wouldn't only Maxwell be working?
21:34 gfxstrand[d]: I meant Volta+
21:40 mhenning[d]: gfxstrand[d]: oh, that might be a regression from the mr that added nak_nir_mark_lcssa_invariants
21:42 mhenning[d]: gfxstrand[d]: maybe try switching back to `nir_convert_to_lcssa(nir, true, true);` pre-volta
21:46 gfxstrand[d]: But can't we still have this problem on Volta+?
21:47 gfxstrand[d]: I don't think anything we're doing right now guarantees the problem doesn't happen there. We're just not hitting it
21:47 mhenning[d]: TBH I'm not sure what specifically you're referring to
21:48 gfxstrand[d]: lcssa inserts phis between a constant and the use of that constant
21:48 gfxstrand[d]: Then `mark_lcssa_invariants` would stick in an `r2ur` as well
21:48 mhenning[d]: I know that, I don't know where we rely on that
21:48 gfxstrand[d]: It only makes the problem worse
21:48 gfxstrand[d]: We rely on it for lea and extract opcodes
21:50 mhenning[d]: Oh, right we don't have lcssa phis on volta by the time we hit nak
21:50 mhenning[d]: since we remove them and regenerate phis
21:51 mhenning[d]: although we could insert an r2ur
21:52 mhenning[d]: I guess we should just chase phis/r2ur in from_nir.rs?
21:52 gfxstrand[d]: We potentially could
21:52 gfxstrand[d]: Or we can just remat constants
21:52 gfxstrand[d]: I've been wanting to do that for a while
21:55 mhenning[d]: Yeah, although I tend to think of that kind of thing as more of an optimization than a required lowering
21:56 gfxstrand[d]: <a:shrug_anim:1096500513106841673>
21:56 gfxstrand[d]: Chasing through phis doesn't seem great, either
21:57 gfxstrand[d]: We could maybe use nir_scalar for it
21:57 gfxstrand[d]: But ugh
21:58 mhenning[d]: yeah, that's true
22:01 gfxstrand[d]: It's an annoying amount of chasing
22:01 gfxstrand[d]: And with all the other control-flow manipulations we have going on, I don't think I trust load_const to survive across blocks
22:02 gfxstrand[d]: I've got the pass written now
22:04 gfxstrand[d]: Ugh... Cell phone tethering is slow.....
22:20 gfxstrand[d]: mhenning[d]: nak/remat-const if you want to shader-db it
22:25 redsheep[d]: gfxstrand[d]: It boggles my mind how many people buy a 5g modem and think they've bought good Internet and get all indignant when something doesn't work and you suggest that's why
22:29 mhenning[d]: gfxstrand[d]: The statistics are:
22:29 mhenning[d]: fossilize-replay: ../src/compiler/nir/nir.c:1132: nir_instr_insert: Assertion `last == NULL || last->type != nir_instr_type_jump' failed.
22:31 gfxstrand[d]: :blobcatnotlikethis:
22:33 gfxstrand[d]: Pushed again
22:39 mhenning[d]: gfxstrand[d]: Totals:
22:39 mhenning[d]: CodeSize: 29665072 -> 29437344 (-0.77%); split: -0.92%, +0.16%
22:39 mhenning[d]: Number of GPRs: 157124 -> 156082 (-0.66%)
22:39 mhenning[d]: SLM Size: 148900 -> 146436 (-1.65%)
22:39 mhenning[d]: Static cycle count: 6840286 -> 6805711 (-0.51%); split: -0.98%, +0.47%
22:39 mhenning[d]: Spills to memory: 177779 -> 173337 (-2.50%)
22:39 mhenning[d]: Fills from memory: 177779 -> 173337 (-2.50%)
22:39 mhenning[d]: Spills to reg: 17692 -> 16731 (-5.43%)
22:39 mhenning[d]: Fills from reg: 12013 -> 11897 (-0.97%)
22:39 mhenning[d]: Max warps/SM: 309128 -> 309456 (+0.11%)
22:39 mhenning[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1344438845503967263/image.png?ex=67c0e9bf&is=67bf983f&hm=dce01179845915ceddf4554b00ab812decd646086398680a3d19be6c1cfab3fe&
22:39 mhenning[d]: Maybe it's easier to read with the colors
22:41 gfxstrand[d]: Looks good to me
22:42 mhenning[d]: yeah
22:42 mhenning[d]: The static cycle count might show fewer regressions once we merge scheduling
22:43 mhenning[d]: Also, you too could have shaderdb statistics at home if you want to review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33773#note_2800293 and https://gitlab.freedesktop.org/mesa/shader-db/-/merge_requests/107
22:58 gfxstrand[d]: Yeah.
22:58 gfxstrand[d]: I'm on review duty tomorrow, I think
23:00 mhenning[d]: hmm. zcull context switching buffer is about 720 kB per context. Is that big enough that we want to avoid allocating that for contexts that never allocate a depth buffer?
23:01 mhenning[d]: it complicates our uapi if we don't want to allocate one for every context
23:02 airlied: does it have to be kernel allocated?
23:04 mhenning[d]: no
23:04 mhenning[d]: Proprietary does a dance where they query kernel for buffer size, allocate in userspace, then do another kernel call to set the pointer
23:10 airlied[d]: seems excessive to allocate it for every graphics context just in case, I'd prefer userspace remain in charge of allocations though
23:12 mhenning[d]: okay, I can probably allocate it in userspace then
23:19 snowycoder[d]: Is zcull the only thing that needs to be context switched?
23:29 mhenning[d]: no, there are other buffers for context switching
23:29 mhenning[d]: they're mostly handled by the kernel
23:30 mhenning[d]: but eg. an app that only does 2d rendering doesn't need zcull and therefore the context switching buffer doesn't have a benefit there
23:47 gfxstrand[d]: airlied[d]: Yeah. I'm not thrilled by having an ioctl to set a pointer but I'd rather manage it in userspace, too.
23:53 airlied[d]: mhenning[d]: is there a GSP RPC call that the kernel makes to the firmware?
23:53 skeggsb9778[d]: it's just PROMOTE_CTX
23:54 skeggsb9778[d]: OpenRM just doesn't promote zcull ctxsw buffers by default
23:54 airlied[d]: oh so do we need to tell the kernel at all then?
23:54 airlied[d]: or do they promote it in userspace?
23:54 airlied[d]: that'll give us a bit of a chicken and egg problem won't it
23:55 skeggsb9778[d]: i *think* (you might want to double check) it intercepts a PROMOTE_CTX rmctrl from the userspace, does its thing, then passes it onto GSP-RM
23:55 skeggsb9778[d]: you don't have to promote all ctxbufs at once
23:56 skeggsb9778[d]: openrm also does things like not allocating the graphics-specific ctxbufs if you only ever allocate the *_COMPUTE class
23:56 airlied[d]: I don't see a ZCULL in the promote ctx list of buffers
23:56 airlied[d]: NV2080_CTRL_CMD_GR_CTXSW_ZCULL_BIND
23:56 airlied[d]: is separate
23:57 skeggsb9778[d]: ah, yeah, i just found that too
23:57 skeggsb9778[d]: i wonder why...
23:58 airlied[d]: I assume we are only talking about supporting NV2080_CTRL_GR_SET_CTXSW_ZCULL_MODE_SEPARATE_BUFFER