00:43 redsheep[d]: mohamexiety[d]: I wonder if this is what they meant with the whole neural shaders bit about tensor cores being used in rendering. That sounds like a change to make doing that more efficient, no?
01:01 mohamexiety[d]: Yeah
01:02 mohamexiety[d]: Neural rendering/shaders is afaict marketing speak for “concurrent graphics and tensor”, and this definitely helps with that
10:15 Pheoxy[AWSTUTC8][m]: Proton Log:... (full message at <https://matrix.org/oftc/media/v1/media/download/ATw3EJ5SWagI-EifFtCAyvV4BP1ZBh7y8ZIWTq1JLk-zYresooWDP3Tad4vaLnRBsEtci9ZVDfP6_O5EGw2UnDJCeVItr-2gAG1hdHJpeC5vcmcvaVVDeXh2SU5YZ1VWWEdhYVdTblhYeGNz>)
10:16 Pheoxy[AWSTUTC8][m]: Bloody uplay...
10:18 Pheoxy[AWSTUTC8][m]: I have also noticed that using NVK with Steam usually means disabling shader pre-caching entirely on my hybrid setup, since it always compiles for the wrong Vulkan backend, defaulting to Intel ANV?
10:20 Pheoxy[AWSTUTC8][m]: Also, right-clicking Steam games and going to Properties isn't working half the time? It's driving me nuts when changing launch options. Anyone else get that?
10:23 phomes_[d]: gfxstrand[d]: I tested it with a few games. No performance improvements but also no regression
10:24 phomes_[d]: I did get the ' error fencing pushbuf: -19' crash again though
10:25 phomes_[d]: I will do more testing soon. I got sick after fosdem so I am not doing very much atm
10:27 Pheoxy[AWSTUTC8][m]: Off the splitlocks... (full message at <https://matrix.org/oftc/media/v1/media/download/AVlvnUOs9sHWMekQxLF2ROrOMRwNEexp0OXaJAstC07wcQ0hHviKoveo45PWVDOquRxHeGVp9AMsweJjFhsvdghCeVIuXjLQAG1hdHJpeC5vcmcvWXBQdEtMdFhCWWVXRkNYTHdSVUxTbGdG>)
10:27 Pheoxy[AWSTUTC8][m]: * The splitlocks though...... (full message at <https://matrix.org/oftc/media/v1/media/download/AYUOAuj7oGSqxABxj5LgTfkDfVohHyktgd9OYOd2YAjoCPS_cxqeovqU5D4V0rEg0hYVyVbbbnI1dCpRb3NznZ1CeVIuZIrgAG1hdHJpeC5vcmcvY0lnWnRGVGxRbmdIR3NwckVWUlNkd0pT>)
13:25 gfxstrand[d]: mhenning[d]: RE: The current bits are wrong. I think Unchanged should be 4, not 3.
13:25 gfxstrand[d]: It's not that the bits are in the wrong spot.
13:26 gfxstrand[d]: But also it's not clear what the difference between EU and LU is
13:28 karolherbst[d]: .LU basically means "this data won't be used ever again"
13:28 gfxstrand[d]: What does LU stand for?
13:28 karolherbst[d]: last use
13:28 gfxstrand[d]: Ah
13:29 gfxstrand[d]: Yeah so we want EU
13:29 gfxstrand[d]: Because LU would mean it basically gets evicted now
13:29 karolherbst[d]: LU kinda allows invalidation in certain cases, not really sure what exactly
13:30 karolherbst[d]: I think there is more to it
13:30 karolherbst[d]: but yeah
13:31 karolherbst[d]: I think .EU simply means the policy won't be changed, and by default (hopefully) .EN (default) is used
13:32 karolherbst[d]: or well.. whatever was set the last time
13:34 gfxstrand[d]: Yeah, EU is "just leave it where it was in the eviction queue"
13:35 karolherbst[d]: there is a caveat though, apparently those aren't strong hints and the hardware might do whatever anyway
13:42 gfxstrand[d]: sure
14:08 gfxstrand[d]: I think this is all cleaned up by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33427
14:22 gfxstrand[d]: mhenning[d]: Hey, look! I reviewed code!
16:46 gfxstrand[d]: I hate kernel and GSP oopses
16:52 gfxstrand[d]: IDK by how much, but setting `.ef` on `tld` has to have been hurting us.
16:52 gfxstrand[d]: That or the HW just ignores them.
17:08 karolherbst[d]: welcome to the club of doing random things and nothing changes in performance
17:09 gfxstrand[d]: I hate that club. I've been a member for a long time and they won't let me leave.
17:09 gfxstrand[d]: It's like the Hotel California over here.
17:12 mhenning[d]: gfxstrand[d]: These settings are well documented in ptx. See .lu in https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cache-operators and the whole section https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cache-eviction-priority-hints
17:13 gfxstrand[d]: Yeah, but that's missing one. :frog_upside_down:
17:14 gfxstrand[d]: Oh, they put LU in the cache operators even though it's not really. That was nice of them.
17:14 mhenning[d]: Yeah
17:15 karolherbst[d]: ahh
17:15 karolherbst[d]: "when restoring spilled registers and popping function stack frames to avoid needless write-backs of lines that will not be used again."
17:15 karolherbst[d]: yeah
17:15 mhenning[d]: I wonder what happens if you try to mix .lu with an eviction priority
17:15 karolherbst[d]: that's the thing I meant with "there is more to lu"
17:15 karolherbst[d]: that means it does change behavior
17:15 karolherbst[d]: probably
17:16 mhenning[d]: mhenning[d]: `ptxas memaccess.ptx, line 30; error : Modifier '.evict_unchanged' cannot be combined with modifier '.lu'`
17:16 mhenning[d]: But yeah, I guess .lu is different in that it's the only one that can break your program if you use it incorrectly
17:17 mhenning[d]: The others are just performance hints
17:21 gfxstrand[d]: But why aren't any of them performance gains?!? 😭
17:22 karolherbst[d]: the hardware just outsmarts you
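A toy model of the eviction hints as described in this thread: EN (normal) refreshes a line to the most-recently-used end, EU ("unchanged") leaves it where it was in the eviction queue, and LU ("last use") drops the line without a write-back, matching the PTX docs' note about avoiding "needless write-backs of lines that will not be used again". The class and hint names are hypothetical, and as karolherbst points out the real hints are not strong guarantees, so this is a sketch of the idea, not of the hardware.

```python
from collections import OrderedDict


class ToyEvictionQueue:
    """Toy LRU cache illustrating the EN / EU / LU hints as discussed.

    Hypothetical model only: real hardware treats these as weak hints.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # addr -> dirty flag, ordered oldest first (first key = next victim)
        self.lines = OrderedDict()
        # Addresses of dirty lines written back on eviction
        self.writebacks = []

    def access(self, addr, hint="normal", write=False):
        if hint == "last_use":
            # LU: data is never needed again, so drop any cached copy
            # without writing it back, even if it is dirty.
            self.lines.pop(addr, None)
            return
        if addr in self.lines:
            self.lines[addr] = self.lines[addr] or write
            if hint == "normal":
                # EN: refresh the line to the most-recently-used end.
                self.lines.move_to_end(addr)
            # EU ("unchanged"): leave the line where it was in the queue.
            return
        # Miss: evict the oldest line if full, writing it back if dirty.
        if len(self.lines) >= self.capacity:
            victim, dirty = self.lines.popitem(last=False)
            if dirty:
                self.writebacks.append(victim)
        # In this toy, new lines go to the most-recently-used end for any hint.
        self.lines[addr] = write
```

In this model a hit with `hint="unchanged"` does not protect the line, so it can still be the next victim, while `"last_use"` is the only hint that changes results (a dirty line's data is discarded), which is why it is the one that can break a program if misused.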
17:22 gfxstrand[d]: Where are Ben's kernel branches now? skeggsb on GitLab doesn't exist anymore.
17:23 mohamexiety[d]: bskeggs
17:23 mohamexiety[d]: https://gitlab.freedesktop.org/bskeggs/nouveau/-/tree/01.03-r565?ref_type=heads
17:23 mohamexiety[d]: https://gitlab.freedesktop.org/bskeggs/linux-firmware/
17:23 tiredchiku[d]: nvidia name format
17:24 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1337111691585916968/image.png?ex=67a641cd&is=67a4f04d&hm=70a2b8b53dbe997da5f626f83058162f279210e66f7e05c2a281b44dc3799644&
17:24 mohamexiety[d]: this is exciting :thonk:
17:26 mhenning[d]: gfxstrand[d]: I feel this way too. I have a feeling we're going to find out that we're bottlenecked on something silly, but I'm running out of ideas for what that is
17:27 karolherbst[d]: maybe you flush caches too aggressively
17:34 gfxstrand[d]: mohamexiety[d]: Let's try it, shall we?
17:35 skeggsb9778[d]: If you pulled it already, I'd suggest doing that again
17:35 gfxstrand[d]: I pulled a couple minutes ago
17:35 skeggsb9778[d]: I just (as in, seconds ago, "git push" hasn't even finished yet) pushed fixes from rebasing it on drm-misc-next
17:35 gfxstrand[d]: Says nothing to fetch
17:36 skeggsb9778[d]: gitlab is being slow 😛
17:36 skeggsb9778[d]: i'm still waiting here too...
17:36 gfxstrand[d]: lol
17:36 skeggsb9778[d]: ok, done apparently
17:36 tiredchiku[d]: :BlobhajShock:
17:36 karolherbst[d]: nice nice
17:36 gfxstrand[d]: Okay, pulling
17:36 tiredchiku[d]: I'm guessing there's also a linux-firmware commit somewhere
17:37 mohamexiety[d]: niiiice it's ready!
17:37 skeggsb9778[d]: yeah, there's a linux-firmware tree on the same gitlab
17:37 karolherbst[d]: hype
17:37 mohamexiety[d]: dont have the hardware yet but hopefully soon <a:vibrate:1066802555981672650>
17:37 karolherbst[d]: linux distros: wait what? another 100MB?!? 😭
17:37 gfxstrand[d]: I'm just hoping to get fault addresses out of it. :frog_upside_down:
17:37 skeggsb9778[d]: mohamexiety[d]: i haven't pushed the gb20x bits yet anyway
17:38 mohamexiety[d]: karolherbst[d]: realistically though can it be overwritten? as in with updates, distros just replace old gsp with new one, since new major versions come with new kernels anyway
17:38 karolherbst[d]: no
17:38 karolherbst[d]: well...
17:38 mhenning[d]: gfxstrand[d]: The r565 branch already had mmu fault addresses
17:39 karolherbst[d]: I guess distros only have to ship one version if they guarantee that all the modules they install can use the new one
17:39 karolherbst[d]: but... normally distros just yolo it
17:39 karolherbst[d]: well... at least until today
17:39 karolherbst[d]: it's not like we mentioned 2-3 years ago that this was gonna be a problem
17:39 gfxstrand[d]: mhenning[d]: Yeah, but why use 565 when you can use 570?
17:40 gfxstrand[d]: Yeah, we've been telling people about this problem for a long time.
17:40 karolherbst[d]: distros would get very angry if every version gets pushed 😄
17:40 gfxstrand[d]: Ironically, I think Fedora is the worst here. Most others have gotten off initramfs. :frog_upside_down:
17:40 karolherbst[d]: heh
17:40 karolherbst[d]: how are you doing full disk encryption stuff then?
17:40 karolherbst[d]: mhh btrfs I guess...
17:40 karolherbst[d]: though having displays show up would be grand
17:40 karolherbst[d]: 😄
17:41 gfxstrand[d]: If only there were someone inside Red Hat who could have raised the alarm a few years ago so they had time to prepare for this.
17:41 karolherbst[d]: well.. I did
17:41 gfxstrand[d]: I know
17:41 karolherbst[d]: I think there was a concept of a plan as a result
17:41 gfxstrand[d]: And I told the folks at Intel that 2MB pages wouldn't work for sparse. :frog_upside_down:
17:41 gfxstrand[d]: karolherbst[d]: Ah, yes! Decisive action was taken!
17:42 karolherbst[d]: I think one of the ideas was to pull firmwares out of the initramfs and sign them
17:42 karolherbst[d]: or at least out of the per kernel initramfs
17:42 karolherbst[d]: so you only install it once, not 3 times
17:42 karolherbst[d]: but it doesn't really matter, because with btrfs subvolumes this wouldn't be an issue
17:42 karolherbst[d]: and there is this "linux as a bootloader" project as well
17:43 karolherbst[d]: a lot of long-term projects, guess we'll just have to wait
17:43 gfxstrand[d]: Or they could just make /boot 2GB, But they didn't.
17:43 karolherbst[d]: well.. have 6 gsps or something, 3 kernels, that's already a lot, but yeah
17:43 karolherbst[d]: thing is...
17:44 karolherbst[d]: you also have old installs 😄
17:44 gfxstrand[d]: Oh, I know
17:44 gfxstrand[d]: All my machines now have at least 5 GiB of /boot
17:44 karolherbst[d]: I think a btrfs subvolume grub could just mount is the best path here
17:44 karolherbst[d]: then it's a non issue
17:44 karolherbst[d]: like entirely
17:44 karolherbst[d]: heck.. just copy the firmware files to /boot and let btrfs dedup them...
17:45 karolherbst[d]: or hardlink or whatever
17:51 gfxstrand[d]: gfxstrand[d]: Built! GPU go brrr?
17:55 karolherbst[d]: 50% faster
17:57 tiredchiku[d]: I wonder what other improvements simply a GSP upgrade would bring along
18:01 notthatclippy[d]: I don't know about "simply", but the 535 version was a beta for desktop use, and it's pretty fascinating that suspend/resume works at all on nouveau with it. I expect all that stuff and a bunch of power management bits to be a lot more stable. But it might need kernel changes to make use of it.
18:02 tiredchiku[d]: well, yeah, nothing is truly simple when it comes to kernel stuff 😅
18:11 gfxstrand[d]: Okay, more correct hacky firmware install. :frog_upside_down:
18:16 gfxstrand[d]: I think I'm just going to "make install" and hope nothing blows up
18:24 gfxstrand[d]: Yeah, "make install" is the new plan
18:27 gfxstrand[d]: The kernel REALLY needs to print what GSP version was loaded to dmesg.
18:28 gfxstrand[d]: I know they should be the same. But they aren't and it's impossible to debug "Oops! Loaded the wrong version."
18:33 skeggsb9778[d]: gfxstrand[d]: Earlier versions of the branch *did*
18:34 gfxstrand[d]: 😢
18:34 skeggsb9778[d]: But then I keep seeing Greg KH say things like "When drivers work properly, they should be quiet" (example from recently, on the nova-core series)
18:34 skeggsb9778[d]: I don't necessarily agree, but I'd rather not have that fight
18:35 skeggsb9778[d]: You *can* boot with nouveau.debug=gsp=debug to see what FW loads, but then you get the rest of the verbosity that comes with that too
18:35 gfxstrand[d]: We already print 4-5 lines of "We found a GPU!"
18:35 skeggsb9778[d]: Yeah I know
18:35 tiredchiku[d]: nouveau.debug=version when :P
18:35 gfxstrand[d]: Maybe we can just add (gsp=XXXX) to the line where we print out what GPU we found?
18:36 skeggsb9778[d]: *possibly*
18:36 skeggsb9778[d]: airlied[d]: ?
18:36 gfxstrand[d]: I don't really want to get into a fight with gregkh but I'd almost be willing to for this. 🙃
18:37 gfxstrand[d]: Like, the whole point of dmesg is to log important system information that might be needed to figure out a bug, right? This 1000% qualifies.
18:37 gfxstrand[d]: Hell, we print the bios version of the GPU. How is that useful?!?
18:38 gfxstrand[d]: In any case, if you can tell me where to put the printk, that'd get me going for now.
18:40 skeggsb9778[d]: https://gitlab.freedesktop.org/bskeggs/nouveau/-/commit/979fc2852ea0536fdc1cf018eb1c1333e486ae7e
18:40 skeggsb9778[d]: That's the patch from the older version. Not sure it'll directly apply, but, the same fn is in the same file on the current tree
18:41 orowith2os[d]: skeggsb9778[d]: If they're quiet, they could also not be working at all :)
18:42 orowith2os[d]: It's reassuring to see "everything working properly" vs nothing or "something's broken"
18:43 mhenning[d]: maybe you can check which version is loaded by doing an mmu fault and seeing if you get a crash address. easy, right?
18:44 tiredchiku[d]: skeggsb9778[d]: ~~by that logic the dmesg should be mostly empty~~
18:44 gfxstrand[d]: mhenning[d]: I'm not getting a crash address
18:45 gfxstrand[d]: Also, I don't really want to build a "Let's torture the GPU in very particular ways that break the GSP to see what version it is" tool.
18:45 gfxstrand[d]: As much fun as that might be. 😅
18:46 mhenning[d]: A more serious suggestion, if the dmesg print isn't popular, is that we could probably expose a file in /sys/kernel/debug/dri/ or whatever to query gsp version
18:46 gfxstrand[d]: Okay, I am indeed getting 570 but I'm also getting ubsan errors. 😬
18:47 gfxstrand[d]: Oh, wait. I am. 😄
18:54 gfxstrand[d]: Okay, I have fault addresses. They just make no sense
18:54 skeggsb9778[d]: How so?
18:56 gfxstrand[d]: Oh, I'm turning on image compression and seeing what happens. I'm sure it's me that's confused, not the kernel. :frog_upside_down:
18:57 mohamexiety[d]: image compression needs large page sizes, iirc the GPU kinda just freaks out if you go in with small page sizes
18:57 mohamexiety[d]: (which is what nouveau currently does)
18:58 gfxstrand[d]: Yeah, I suspect that's what this fault really is
18:58 gfxstrand[d]: Because there is definitely memory there.
18:58 skeggsb9778[d]: what's the fault say?
18:59 gfxstrand[d]: `[ 676.188169] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:120 gfid:0 level:2 type:31 scope:1 part:233 fault_addr:0000003ffff7c000 fault_type:00000002`
18:59 mohamexiety[d]: the other missing part of the puzzle is compression tag calculation, but that's easy compared to large pages.
19:01 skeggsb9778[d]: yeah, that's saying the PTE is missing (i'd have not necessarily expected that, but perhaps HW only checks the large page table when compression is used)
19:02 skeggsb9778[d]: fault_type maps to NV_PFAULT_FAULT_TYPE (dev_fault.ref.txt in open-gpu-doc) btw
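A small helper for reading `gsp: rc` fault lines like the one gfxstrand posted, splitting them into integer fields so `fault_addr` and `fault_type` can be checked quickly. This is a sketch that assumes the `key:value` layout of that single line; which fields are hex versus decimal is a guess from the zero padding, and only `fault_type` 2 is decoded, as skeggsb did above (the full `NV_PFAULT_FAULT_TYPE` table is in `dev_fault.ref.txt` in open-gpu-doc). The function name `parse_gsp_rc` is hypothetical.

```python
import re

# Only the value skeggsb decoded above. The full NV_PFAULT_FAULT_TYPE
# table lives in dev_fault.ref.txt in open-gpu-doc.
KNOWN_FAULT_TYPES = {0x2: "PTE missing"}

# Assumption: zero-padded fields are hex, the rest are decimal.
HEX_FIELDS = {"engn", "fault_addr", "fault_type"}

# Matches "key:value" pairs such as "chid:120" or "fault_addr:0000003ffff7c000".
FIELD_RE = re.compile(r"(\w+):([0-9a-fA-F]+)\b")


def parse_gsp_rc(line):
    """Split a nouveau 'gsp: rc' dmesg line into a dict of integer fields."""
    # Skip the timestamp and PCI address, which also contain colons.
    _, sep, rest = line.partition("gsp: rc ")
    if not sep:
        raise ValueError("not a nouveau 'gsp: rc' line")
    fields = {}
    for key, value in FIELD_RE.findall(rest):
        fields[key] = int(value, 16) if key in HEX_FIELDS else int(value)
    return fields
```

Usage with the line from above:

```python
line = ("[ 676.188169] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:120 gfid:0 "
        "level:2 type:31 scope:1 part:233 fault_addr:0000003ffff7c000 fault_type:00000002")
f = parse_gsp_rc(line)
print(hex(f["fault_addr"]), KNOWN_FAULT_TYPES.get(f["fault_type"], "unknown"))
# prints: 0x3ffff7c000 PTE missing
```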
19:06 skeggsb9778[d]: mohamexiety[d]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/src/kernel/gpu/mem_mgr/arch/turing/mem_mgr_tu102.c#L504
19:06 skeggsb9778[d]: That's easy on Turing
19:06 mohamexiety[d]: yup
19:07 skeggsb9778[d]: you could probably implement that for GL easily enough actually - that code handles large pages fine
19:07 mohamexiety[d]: I may give the large page stuff another attempt soon but for now fixing a little something up in userspace/nvk quickly
19:08 mhenning[d]: gfxstrand[d]: not sure if you want to peek at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33420 before I merge
19:10 airlied[d]: mhenning[d]: Don't make me say it 🙂
19:11 mohamexiety[d]: is zcull really _that_ impactful though? :thonk:
19:11 mhenning[d]: airlied[d]: I started work on zcull, will probably do more this week
19:16 airlied[d]: In my experience on AMD and Intel, getting proper HIZ support was a 20-30% gain; maybe it isn't as useful on NVIDIA
19:16 skeggsb9778[d]: Yeah, I expect it to be a good boost
19:16 airlied[d]: But the two biggest uplifts I ever did were HIZ and a weird texture cache alignment thing on AMD
19:17 skeggsb9778[d]: Though I even more expect large pages + compression (which also enables faster clears etc) to have a bigger effect
19:17 mohamexiety[d]: yeah those are solid gains. it's just given we seem to be hard stuck on 7 FPS (at least in Veilguard) it feels like it's something even more insidious than zcull
19:17 airlied[d]: Fast clears I think felt like 5-10% zone back then
19:19 mohamexiety[d]: I wonder if nowadays it's an even bigger gain since with Ada and onwards, VRAM memory bandwidth actually took a big hit (outside of the flagship SKUs). depends I guess if the bigger caches are enough
19:19 mohamexiety[d]: (for compression/fast clears/etc)
19:23 airlied[d]: It's nearly always memory and it's never ALU 🙂
19:42 airlied[d]: Figuring out https://lists.freedesktop.org/archives/mesa-dev/2017-July/162212.html on radv was probably the biggest uplift I personally found
21:02 gfxstrand[d]: skeggsb9778[d]: Same. Especially with MSAA. Intel and AMD have MSAA-specific compression schemes. Nvidia's plan is "Sure, it sucks bandwidth but we've got awesome compression so it's fine."
21:03 gfxstrand[d]: But yes, zcull will be huge. Depth testing sucks when you don't have it.
21:13 airlied[d]: skeggsb9778[d]: I think adding the gsp to the same line we print the GPU family might be a good compromise, we don't add lines then 🙂
21:14 karolherbst[d]: I think if you remove two other lines, you might get away with getting one (just nuke one of the PCI ones)
21:17 airlied[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1337170398332190801/PXL_20250206_211535965.MP.jpg?ex=67a6787a&is=67a526fa&hm=974f794d8ffd26c5631a338f4cec36084ca87e121a7c8cb82a14f2b540fc9774&
21:17 airlied[d]: Status update on hacks
21:17 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1337170443475619901/9jc1pn.png?ex=67a67885&is=67a52705&hm=6a3e4573eb891e70530f0a479c07ec3d45d197e44a43f34e7ec0e486160b2b17&
21:17 gfxstrand[d]: karolherbst[d]:
21:18 gfxstrand[d]: airlied[d]: Sweet!
21:19 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1337170767477084221/9jc1vr.png?ex=67a678d2&is=67a52752&hm=d468143fc9ab2a8cd7428ad5e23d138bfba56e9d4d8a8d243783027d35164656&
21:19 gfxstrand[d]: There, fixed it.
21:19 djdeath3483[d]: airlied[d]: what kind of hack is that?
21:26 airlied[d]: proof of concept kernel stuff
21:26 HdkR: Is there anything special that the kernel driver needs to do for NVLINK support? Or does it basically behave like coherent PCIe as far as the kernel is concerned?
21:26 airlied[d]: the concept being I can run gears and vkcube 🙂
21:26 HdkR: NVLink between CPU and GPU I mean
21:27 airlied[d]: with that caveat I don't think there is a lot
21:42 mhenning[d]: When an MME macro does `$r1 = ADD $load0 $zero` that's loading data from the `NVC697_CALL_MME_DATA` macros, right?
21:42 mhenning[d]: How does `$load0` differ from `$load1`?
21:43 karolherbst[d]: mme instructions have a funky dual slot system
21:43 karolherbst[d]: so you can operate on different slots
21:43 karolherbst[d]: it's all in pairs of two and instructions can reference either
21:45 karolherbst[d]: load0/load1 just load into the respective slot
21:47 karolherbst[d]: mme is cursed VLIW
21:52 karolherbst[d]: there is the mme simulator in mme_tu104_sim.c
21:52 djdeath3483[d]: airlied[d]: Close to being done then 😉
21:52 mhenning[d]: yeah, I'm looking at the simulator
21:54 karolherbst[d]: it's just very weird and cursed, lol
21:56 airlied[d]: I should probably add some 🦀 🦀
22:37 gfxstrand[d]: mhenning[d]: Literally just that load0 is the first one
22:38 gfxstrand[d]: And I don't think you're allowed to use load1 without using load0
22:40 karolherbst[d]: sure about that one? mhhh though might be true
22:43 gfxstrand[d]: I seem to recall that one errors
22:43 gfxstrand[d]: But it's been a minute since I tried it.
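A hypothetical sketch of the load-slot behaviour described at the end of the thread: each instruction pair pops macro parameters (the words sent with `NVC697_CALL_MME_DATA`) in order, `$load0` getting the first and `$load1` the next, and using `$load1` without `$load0` is rejected. The in-order FIFO and the rejection rule are assumptions taken from this conversation (gfxstrand was not certain about the latter); check `mme_tu104_sim.c` for the real semantics. `ToyMmeParams` and `execute_pair` are made-up names.

```python
from collections import deque


class ToyMmeParams:
    """Hypothetical model of $load0/$load1 pulling from a macro's data stream.

    Assumes each instruction pair pops parameters in order, load0 first,
    and rejects load1 without load0 (as recalled above, not verified).
    """

    def __init__(self, params):
        # Words pushed via NVC697_CALL_MME_DATA, in submission order.
        self.fifo = deque(params)

    def execute_pair(self, uses_load0, uses_load1):
        """Return the (load0, load1) values one instruction pair would see."""
        if uses_load1 and not uses_load0:
            raise ValueError("load1 used without load0")
        load0 = self.fifo.popleft() if uses_load0 else None
        load1 = self.fifo.popleft() if uses_load1 else None
        return load0, load1
```

Under these assumptions, `$r1 = ADD $load0 $zero` is just `execute_pair(True, False)[0] + 0`: it copies the next macro parameter into `$r1`.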