00:06HdkR: It's kind of funny since mobiles did depth compression for a long time then it vanished, now it's back.
00:06mangodev[d]: skeggsb9778[d]: oh, so it's built-in to the hardware?
00:06mangodev[d]: neat
00:07HdkR: Just like framebuffer colour compression, but with your depth and stencil, woo
00:11mangodev[d]: HdkR: so does this mean that DCC would also be a similar effort?
00:11mangodev[d]: i thought it'd require more manual work on the way texture processing is done in order to work
00:23HdkR: Do they not already have framebuffer compression wired up? :)
00:24mangodev[d]: it's unchecked on an issue checklist somewhere
00:24mangodev[d]: i think it's zmike's "when can i run my games" tracker iirc
00:25HdkR: hah
00:25HdkR: Not necessarily required for playing games, but definitely good to have
02:00gfxstrand[d]: skeggsb9778[d]: How big is the pile of hacks? Because that sounds a lot like some of our missing perf.
02:09gfxstrand[d]: mangodev[d]: Probably some sort of run-length encoding per-gob with a tag that lives somewhere in memory saying how each GOB is compressed. (Or even just whether or not it's compressed.)
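A toy illustration of the guess above, not a description of what the hardware actually does: each block is run-length encoded, and a separate tag array records whether that block is stored compressed. The 512-byte block size, the (count, value) encoding, and all function names here are assumptions made up for the sketch.

```python
BLOCK_SIZE = 512  # illustrative block size, chosen only for this sketch


def rle_encode(data: bytes) -> bytes:
    """Encode data as (run_length, value) byte pairs, runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)


def compress_blocks(blocks):
    """Return (tags, stored): tags[i] is True when block i is kept encoded."""
    tags, stored = [], []
    for block in blocks:
        encoded = rle_encode(block)
        if len(encoded) < len(block):
            tags.append(True)
            stored.append(encoded)
        else:
            # Encoding did not help, so keep the raw bytes and clear the tag.
            tags.append(False)
            stored.append(block)
    return tags, stored


def read_block(tags, stored, index):
    """Read one block back, consulting its tag first."""
    return rle_decode(stored[index]) if tags[index] else stored[index]
```

A freshly cleared block collapses to a handful of bytes, while a noisy block stays raw with its tag cleared, which is why the tag has to live somewhere outside the block data.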
02:10gfxstrand[d]: The really good Z compression is zcull, which I still need to review.
02:11mangodev[d]: gfxstrand[d]: i thought that MR was lackluster? or was it just the way it was implemented (like some of the recent NAK MRs)?
02:11gfxstrand[d]: Unclear
02:12gfxstrand[d]: In theory, zcull is way better than simple compression because it's aware of the triangles themselves. In practice, IDK how much it helps if the caches are doing their jobs. It helps a LOT on Intel but bandwidth is shit there.
02:12airlied[d]: it might be for zcull to really shine you need zs compression
02:13gfxstrand[d]: That too
02:20skeggsb9778[d]: gfxstrand[d]: you don't want to use any of this for real, i was just trying to make sure the kernel bits at least sorta work
02:20skeggsb9778[d]: https://gitlab.freedesktop.org/bskeggs/mesa/-/commits/01.00-page-comp
02:21skeggsb9778[d]: vkcube and xonotic will work from a bare X session, but will look *super* funky if you start gnome too
02:22gfxstrand[d]: That's fine. If you can get the kernel bits working I can sort out Mesa.
02:23skeggsb9778[d]: the kernel bits are in the tree i posted before and ok'ish, but can clean them up better now i know what hw needs what
02:27skeggsb9778[d]: this is basically what i see from each step:
02:27skeggsb9778[d]: ```tu106 (xonotic,4k) min/avg/max
02:27skeggsb9778[d]: - base: 570.30 256 583 750
02:27skeggsb9778[d]: - +page: 574.80 252 588 755
02:27skeggsb9778[d]: - +2MiB: 699.14 290 723 993
02:27skeggsb9778[d]: - +zcomp: 761.98 314 791 1150
02:27skeggsb9778[d]: - +ccomp: 792.36 332 825 1210
02:27skeggsb9778[d]: ga104 (xonotic,4k) min/avg/max
02:27skeggsb9778[d]: - base: 633.24 288 644 794
02:27skeggsb9778[d]: - +page: 642.91 290 654 813
02:27skeggsb9778[d]: - +2MiB: 818.81 330 845 1142
02:27skeggsb9778[d]: - +zcomp: 888.66 369 921 1304
02:27skeggsb9778[d]: - +ccomp: 910.17 368 947 1370
02:32HdkR: That's pretty good
02:32gfxstrand[d]: Yeah. I'll happily take +50%
02:32HdkR: Alignment putting in some work :D
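The relative gains behind the table above can be worked out as follows. This is a minimal sketch that assumes the first number on each row is the headline fps figure (the log does not label that column) and that each step stacks on top of the previous ones; the names `RESULTS` and `gain_over_base` are made up for the example.

```python
# First-column figures copied from the benchmark table in the log.
RESULTS = {
    "tu106": {
        "base": 570.30, "+page": 574.80, "+2MiB": 699.14,
        "+zcomp": 761.98, "+ccomp": 792.36,
    },
    "ga104": {
        "base": 633.24, "+page": 642.91, "+2MiB": 818.81,
        "+zcomp": 888.66, "+ccomp": 910.17,
    },
}


def gain_over_base(steps):
    """Percentage gain of every step relative to the 'base' row."""
    base = steps["base"]
    return {
        name: (value / base - 1.0) * 100.0
        for name, value in steps.items()
        if name != "base"
    }


if __name__ == "__main__":
    for gpu, steps in RESULTS.items():
        for name, pct in gain_over_base(steps).items():
            print(f"{gpu} {name}: {pct:+.1f}%")
```

On that first column the final step comes out at roughly +39% on tu106 and +44% on ga104 over base, with the 2MiB page step contributing the largest single jump on both GPUs.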
03:49mhenning[d]: gfxstrand[d]: There are a few ways to incrementally improve on the zcull in that MR - there are optimizations we can do to prevent it loading/saving the zcull buffers in some cases and the heuristic we use for the subregions doesn't quite match what the blob uses
03:50mhenning[d]: but also I literally only tried one game with that MR. it's possible some stuff will be harder hit than horizon zero dawn is
04:12gfxstrand[d]: Yeah, it's hugely app-dependent. I suspect stuff with a Z pre-pass will be the most affected but I don't have a great example off hand.
04:59skeggsb9778[d]: mhenning[d]: where are the current versions of those patchsets again btw?
04:59skeggsb9778[d]: (mesa and kernel)
05:02mhenning[d]: skeggsb9778[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33861 and https://gitlab.freedesktop.org/mhenning/linux/-/commits/zcull2?ref_type=heads
05:03mhenning[d]: both need to be rebased and might have conflicts
05:12skeggsb9778[d]: awesome, thank you. i'll take another look at them soon
05:39chikuwad[d]: hype
06:09mohamexiety[d]: HdkR: Nope. I was working on it but hit some kernel issues a few months ago I couldn’t deal with then moved on to something else. Back at it now tho and things are moving
06:13mohamexiety[d]: gfxstrand[d]: Tbh I think the big thing we will have to deal with is this https://gitlab.freedesktop.org/bskeggs/mesa/-/commit/a2cfc2ecefb6c291f3e40dafaa2e6fe8b8b0467b
06:13mohamexiety[d]: the other stuff isn’t too bad to clean up/sort out but the GART | VRAM vs VRAM only thing is the big thing i don’t really have a good idea for
06:14mohamexiety[d]: And all of this (big pages, compression) will only work with those maps iiuc
06:17x512[m]: GART | VRAM is needed for CPU-mappable memory when large PCI BAR is not available. Without it a lot of software fails.
06:17HdkR: mohamexiety[d]: Ah, I see.
06:41marysaka[d]: We also need to stop uploading stuff with a memcpy and a GART-only mapping when possible (Shadow of the Tomb Raider seems to be hurt a lot by that on compute)
06:42marysaka[d]: x512[m]: Yes we need to just handle rebar better
06:45marysaka[d]: marysaka[d]: (QMD on VRAM and uploads via LOAD_INLINE_DATA would be a good start)
07:35skeggsb9778[d]: soo, i just copy+pasted this from OpenRM
07:35skeggsb9778[d]: https://gitlab.freedesktop.org/bskeggs/nouveau/-/commit/ded76d7046ba291402a7ffcbde41037bcfc8e2c6
07:36skeggsb9778[d]: the tu106 seems to need a vbios update or something to support it
07:36skeggsb9778[d]: more surprisingly, the ad104 and gb203 already seem to boot with BAR1 at the same size as VRAM
07:37chikuwad[d]: turing doesn't support ReBAR
07:37chikuwad[d]: or, well, rebar on turing is locked out by the vbios and there's no enabling it
07:37skeggsb9778[d]: https://nvidia.custhelp.com/app/answers/detail/a_id/5165/~/nvidia-resizable-bar-firmware-update-tool
07:37skeggsb9778[d]: that seems to say otherwise?
07:37chikuwad[d]: yeah, that's for ampere and above
07:38chikuwad[d]: not turing
07:38skeggsb9778[d]: ah, doh!
07:38chikuwad[d]: > This update is only available for RTX 30 Series GPUs.
07:38skeggsb9778[d]: but, anyway, that patch should in theory resize the BARs for GPUs that don't boot with them already bigger
07:39skeggsb9778[d]: and nvkm already handles the bigger BAR by default
08:05gfxstrand[d]: mohamexiety[d]: Oh, that's easy enough to sort out
08:05gfxstrand[d]: We just have to get creative with heaps/types
08:07gfxstrand[d]: Basically, we add a new type that is non-evictable and require that type for all compressed images. We only enable compression on images that are flagged as color or Z/S targets. Then not too much stuff should get directed at that type and we should be fine.
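The heap/type idea above could look roughly like this. It is a sketch only: the `Usage` flags, the two `MemoryType` entries, and the rule that uncompressed images stay off the pinned type are all assumptions for illustration, not NVK's actual memory type list.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Usage(Flag):
    SAMPLED = auto()
    COLOR_TARGET = auto()
    DEPTH_STENCIL_TARGET = auto()


@dataclass(frozen=True)
class MemoryType:
    name: str
    evictable: bool  # may the kernel move the allocation out of VRAM?


# Hypothetical memory types for the sketch.
VRAM_OR_GART = MemoryType("vram|gart", evictable=True)
VRAM_PINNED = MemoryType("vram-only", evictable=False)

RENDER_TARGET = Usage.COLOR_TARGET | Usage.DEPTH_STENCIL_TARGET


def wants_compression(usage: Usage) -> bool:
    """Only images flagged as color or Z/S targets get compression."""
    return bool(usage & RENDER_TARGET)


def allowed_memory_types(usage: Usage) -> list:
    """Compressed images require the non-evictable type; the rest avoid it."""
    if wants_compression(usage):
        return [VRAM_PINNED]
    return [VRAM_OR_GART]
```

Keeping everything else off the pinned type is what limits how much memory ends up non-evictable.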
08:10mohamexiety[d]: gfxstrand[d]: Hmm fair :thonk:
08:11mohamexiety[d]: And this wouldn’t diminish the perf gains by e.g. leaving most things uncompressed, right?
08:13gfxstrand[d]: I don't think compression really gains you much (or even really works) for block-compressed images which is 95% of your textures
08:13mohamexiety[d]: I see yeah
08:14marysaka[d]: tbh we need to do testing and see how it goes but yeah :aki_thonk:
08:15marysaka[d]: it might be cool to enable PLC at some point too but uum let's stay simple for now :maxpoeSweat:
08:15mohamexiety[d]: Yeah PLC is next on the list. I’ll type up cleaner patches for this first tho
08:15marysaka[d]: (no need for post-L2 compression for now and it seems to depend on the address range you are in)
16:27karolherbst[d]: I noticed that there is probably a gap in the coop matrix testing...
16:27karolherbst[d]: apparently the stride on load/stores is optional and vtn inserts a 0 if that operand doesn't exist...
16:31karolherbst[d]: though I think coming from GLSL it has to be set to something non 0
19:41mohamexiety[d]: skeggsb9778[d]: say if I force compression on for everything and we have small pages and big pages, the small pages would mmu fault, right?
19:42mohamexiety[d]: could the kernel be made to ignore the compressible PTE kind and treat it as uncompressed if the object is VRAM | GART or is small pages?
19:55skeggsb9778[d]: the kernel does something like that already in the non-VM_BIND path, but on the per-bo level
19:55skeggsb9778[d]: https://gitlab.freedesktop.org/bskeggs/nouveau/-/blob/07.00-page/drivers/gpu/drm/nouveau/nouveau_bo.c?ref_type=heads#L285
19:57skeggsb9778[d]: and a bit lower, see under the comment "Disable compression if ..."
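The fallback being discussed can be summarised in a few lines. This is a hedged sketch of the policy only, not the code in nouveau_bo.c: the `Mapping` fields and the string kinds are stand-ins invented for the example, and real PTE kinds are numeric values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    vram_only: bool    # False means placement is VRAM | GART
    small_pages: bool  # True when the mapping is backed by small pages
    kind: str          # "compressed" or "uncompressed" (stand-in values)


def effective_kind(mapping: Mapping) -> str:
    """Quietly drop compression when the mapping cannot support it."""
    if mapping.kind != "compressed":
        return mapping.kind
    if not mapping.vram_only or mapping.small_pages:
        # Fall back instead of faulting: treat the request as uncompressed.
        return "uncompressed"
    return "compressed"
```

With a rule like this userspace could request a compressible kind unconditionally and let the kernel downgrade it where needed.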
19:58mohamexiety[d]: I see, thanks! so could adopt a similar strategy for the VM_BIND path then
19:59mohamexiety[d]: that could make the Vk side a bit simpler in some regards
22:43skeggsb9778[d]: There's pretty much finished versions of the compression patches in the same kernel branch now btw. They shouldn't change anything, but are cleaner, if you want to pick them up instead of the older versions
22:44skeggsb9778[d]: Still need to look into making RM preserve the backing store across suspend/resume, but I think that's the only thing
22:45mohamexiety[d]: nicee! thanks
22:46mohamexiety[d]: yeah I am just working on cleaning up the vk side since it's a bit messier than I thought it would be but I got a good plan now
22:46skeggsb9778[d]: awesome 🙂 looking forward to testing it!
22:48karolherbst[d]: okay.. got ldsm working on 8 bit integer matrices 🙂
23:45karolherbst[d]: mhh works great for A row major, but A col major throws misaligned address errors...
23:47karolherbst[d]: mhh the input stride is already not properly aligned.. what a bother
23:49karolherbst[d]: ohh right.. that's uhm.. 32 bytes, not 64... per group