IRC Logs of #nouveau on irc.freenode.net for 2025-07-24

09:13 jja2000[d]: I may unfortunately be the owner of a AGX Xavier in less than 12 hours
13:53 gfxstrand[d]: I've got one on my desk somewhere
14:37 jja2000[d]: I bid on one on eBay before I noticed it was the 16GB RAM model
14:37 jja2000[d]: At least it's not __too__ expensive
15:30 HdkR: Be prepared for Carmel :D
15:34 jja2000[d]: What's that supposed to mean? :p
15:36 jja2000[d]: No I know, they get kinda hot right? I'm more concerned about Nouveau prolly not working :p
16:09 HdkR: Quite slow CPU.
16:47 HdkR: At least the speed of the CPU shouldn't affect testing the interconnect that much :P
17:28 jja2000[d]: HdkR: It's still based on the weird translation cores right? I thought I heard they weren't that bad
17:29 jja2000[d]: I guess that's why they switched to the A78 cores on Orin
17:29 HdkR: Yea, Carmel. They were interesting, technically better than Cortex-A57 usually.
17:29 HdkR: usually being key.
17:32 jja2000[d]: It never got better than Denver and Denver2? shame
17:33 jja2000[d]: Those were bundled with the A57 before, Carmel was the upgrade
17:35 HdkR: I personally would love to see a deep dive from Chips & Cheese on Carmel versus Denver2 versus A57 :D
17:51 jja2000[d]: I will gladly donate my sticky delaminated nexus 9 for Denver 1
17:51 jja2000[d]: What a failure of a tablet holy crap
17:52 HdkR: Mine had a spicy pillow so I had to take it behind the shed.
17:56 HdkR: My SO was using it and was sad to see it go, had to get a replacement tablet to fill the void.
18:09 cubanismo[d]: They're certainly interesting CPUs.
18:13 mohamexiety[d]: airlied[d]: skeggsb9778 alright so after some poking around kinda hit a deadend. with help from marysaka[d] the big page stuff is now working...ish. while everything mostly works and apps work etc, I am getting random mmu faults -- completely random and very rare -- and it's a bit painful to debug. the closest thing to a repro I have is running the entire binding_model Vk CTS tests which is ~ half an hour. the very last test consistently crashes with this:
18:13 mohamexiety[d]: [ 2449.250588] nouveau 0000:07:00.0: gsp: mmu fault queued
18:13 mohamexiety[d]: [ 2449.420673] nouveau 0000:07:00.0: gsp: rc engn:00000001 chid:12 gfid:0 level:2 type:31 scope:1 part:233 fault_addr:0000003ffcf50000 fault_type:00000002
18:13 mohamexiety[d]: [ 2449.420690] nouveau 0000:07:00.0: fifo:c00000:000c:000c:[deqp-vk[14736]] errored - disabling channel
18:13 mohamexiety[d]: [ 2449.420701] nouveau 0000:07:00.0: deqp-vk[14736]: channel 12 killed!
18:13 mohamexiety[d]: [ 2449.478513] deqp-vk[14736]: segfault at 7fc80db47e80 ip 00007fc80db47e80 sp 00007ffca9d39588 error 14 likely on CPU 9 (core 16, socket 0)
18:13 mohamexiety[d]: [ 2449.478521] Code: Unable to access opcode bytes at 0x7fc80db47e56.
18:13 mohamexiety[d]: the fault_type maps to missing PTE but I am not sure what this implies. the kernel patches are here if maybe there's something we missed: https://gitlab.freedesktop.org/marysaka/linux/-/commit/93b1de75228f29937a6cf89b486965419af985dc and https://gitlab.freedesktop.org/marysaka/linux/-/commit/0d44d0a5d1022ebbb12b6b1aa268f0509dafa12e and for nvk:
18:13 mohamexiety[d]: https://gitlab.freedesktop.org/mohamexiety/mesa/-/commit/a2cfc2ecefb6c291f3e40dafaa2e6fe8b8b0467b and https://gitlab.freedesktop.org/mohamexiety/mesa/-/commit/13ef1e4376d5aa8a1eba7eec17d1f03bdfd16bbe (note that I also had an mmu fault with stock nvk/4k pages nvk)
18:16 mohamexiety[d]: there's also something kinda weird. we suspected invalidation errors at first and when we checked the code for that in nouveau, we found this: https://elixir.bootlin.com/linux/v6.16-rc1/source/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmtu102.c#L39 however the corresponding code in openrm is this: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/d89031330084ad34dbfe15997454a4bb1674e0b1/src/nvidia/src/kernel/gpu/mmu/arch/turing/kern_gmmu_tu102.c#L140. the confusing part is that `NV_VIRTUAL_FUNCTION_PRIV_MMU_INVALIDATE_PDB_ADDR_ALIGNMENT` maps to `0xc` rather than the `0x8` that nouveau does. when I did try changing it to match openrm everything was MMU faulting (with fault_type 2) so there's something I misunderstood here
18:47 skeggsb9778[d]: ```#define NV_VIRTUAL_FUNCTION_PRIV_MMU_INVALIDATE_PDB_ADDR 31:4```
18:47 skeggsb9778[d]: the addr field is defined like this, so it becomes ((addr >> 12) << 4)
18:48 skeggsb9778[d]: nouveau just assumes it's aligned and does >> 8
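[editor's note] The shift arithmetic described above can be checked with a small sketch. This is only an illustration of the two computations as they are described in the chat, not the actual nouveau or openrm code: the function names, constants, and sample addresses are hypothetical, and only the low 32-bit register value is modelled.

```python
# Hypothetical illustration of the PDB_ADDR arithmetic discussed above.
# Assumptions (from the chat): the address is expressed in 4 KiB units
# (ALIGNMENT = 0xc) and the PDB_ADDR field occupies bits 31:4.

PAGE_SHIFT = 12  # drop the low 12 bits of the byte address
FIELD_LSB = 4    # the field starts at bit 4 of the register


def pdb_addr_explicit(addr: int) -> int:
    """Two-step form: shift out the alignment bits, then move into the field."""
    return (addr >> PAGE_SHIFT) << FIELD_LSB


def pdb_addr_shortcut(addr: int) -> int:
    """Single shift by 12 - 4 = 8; matches the two-step form only when
    addr is 4 KiB aligned, because bits 11:8 are not cleared."""
    return addr >> (PAGE_SHIFT - FIELD_LSB)


if __name__ == "__main__":
    aligned = 0x12345000    # made-up, 4 KiB aligned
    unaligned = 0x12345800  # made-up, bit 11 set

    for addr in (aligned, unaligned):
        print(hex(addr), hex(pdb_addr_explicit(addr)), hex(pdb_addr_shortcut(addr)))
```

For a 4 KiB aligned address both forms give the same register value, which is why a plain `>> 8` is equivalent in that case; the two only diverge when bits 11:8 of the address are non-zero.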
18:49 skeggsb9778[d]: i'm not sure about the other issues, but i'll have a play around with your patches today
18:51 mohamexiety[d]: skeggsb9778[d]: ohh ok that makes sense. sorry lol
18:54 skeggsb9778[d]: also - i'll point you at a patch later this morning that should, hopefully, fix filling in PTE comptags for >=tu102
18:54 skeggsb9778[d]: in case you want to play around with compression stuff too
18:55 mohamexiety[d]: yeah I was thinking about that but wasnt sure if it'd be a good idea to add another variable just yet. marysaka[d] tested these patches in a game already and got a nice perf uplift (not too big but measurable) so that made me curious about adding in compression
18:55 mohamexiety[d]: thanks!
18:56 skeggsb9778[d]: yeah, i have no clue what that'll need on the vulkan side. but the kernel side is pretty simple on turing, fortunately
18:56 mohamexiety[d]: in theory it should just be using the compressible pte_kinds. in practice we'll see :KEKW:
19:00 marysaka[d]: and ignore PLC at first as it seems weird :maxpoeSweat:
21:01 jja2000[d]: nvm got outbid at the last second by $2.50 lmao
21:03 HdkR: Someone hungry for that AI.
21:04 jja2000[d]: Probably, oh well
21:07 HdkR: Doesn't even have the new "AI" CPU extensions from the new NVIDIA platforms that are shipping soon :D
21:07 mohamexiety[d]: it's going to be EOL with the next CUDA release
21:07 mohamexiety[d]: kinda oof purchase if it's for AI haha
21:14 airlied[d]: https://airlied.blogspot.com/2025/07/ramalamamesa-benchmarks-on-my-hardware.html might be interesting to people in here
22:30 skeggsb9778[d]: mohamexiety[d]: marysaka[d] https://gitlab.freedesktop.org/bskeggs/nouveau/-/commits/07.00-page?ref_type=heads
22:31 skeggsb9778[d]: some wip stuff there, but likely need a bit more (iirc: at least a regkey to make RM save the backing store on suspend/resume, and also taking that into account for the suspend/resume buffer size calcs)
22:32 mohamexiety[d]: Oooh awesome, thanks a lot! Will look at it and try it tomorrow
22:35 skeggsb9778[d]: i did hack up nvk to force compression for z/s, and nothing blew up, but i'm also not sure it's doing anything 😛 printks show the PTEs look filled in correctly at least though
22:36 mohamexiety[d]: Interesting, hopefully it is doing stuff at least :KEKW:
22:56 skeggsb9778[d]: yeah, appears it actually is. i'm not testing anything stressful (vkcube/xonotic) atm, but, bumping xonotic to 4k shows a noticeable (~570 -> ~630 fps) improvement with just zeta compression
22:57 skeggsb9778[d]: on tu106
23:11 esdrastarsis[d]: skeggsb9778[d]: RTX 2060?
23:12 skeggsb9778[d]: 2070
23:16 skeggsb9778[d]: forcing alignment etc to use 2MiB pages in a few more places gets to 765 😄
23:52 mangodev[d]: skeggsb9778[d]: what kind of compression is this? i'm curious
23:52 mangodev[d]: wait
23:53 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1398090737354150028/image.png?ex=688418f7&is=6882c777&hm=a323aad8baf3c1c318e3009e2e86f744468b279d8d99639d8f9be6d5556f1f0c&
23:53 mangodev[d]: i don't think this is what i was looking for
23:55 skeggsb9778[d]: ah, it's what nvidia calls the depth/stencil buffer
23:56 mangodev[d]: ahhhh okay that makes more sense
23:57 mangodev[d]: how would you even compress the depth buffer effectively? wouldn't it be quite varied for a larger scene?
23:57 mangodev[d]: and even a flat ground plane would be a gradient
23:57 skeggsb9778[d]: i've nfi how the hw actually does it 😛