00:29 airlied[d]: okay anteriorFbSize was a diversion, the calcs end up the same
00:39 _lyude[d]: airlied[d]: it actually happens a bit differently on the newer firmware, one sec
00:40 airlied[d]: if the new fw dies on suspend, then the patch I wrote fixes that and it makes it like the old fw
00:40 _lyude[d]: or wait- hold on I forgot to remove the registry key change notthatclippy[d] suggested before testing
00:43 airlied[d]: that key really only seems to affect blackwell+
00:53 _lyude[d]: https://paste.centos.org/view/df840dc8
00:53 _lyude[d]: airlied[d]: i think it actually does change how channel allocation for gv100 works, there is code for handling that registry key that also gets added. But it's only enabled by default on blackwell so I'm not sure how useful it will be either
00:54 airlied[d]: it changes fault buffers in gv100 code, but we don't use those in nouveau at this level
00:54 _lyude[d]: btw ^ that log is just your patches + the additional suspend/resume patch
00:54 _lyude[d]: airlied[d]: ah, gotcha
00:58 airlied[d]: I do wonder if it's the reserved memory at start of framebuffer screwing it up somehow
00:59 airlied[d]: I don't see that on my blackwell
00:59 airlied[d]: will dig into that stuff a bit more later today/tomorrow
01:00 airlied[d]: it's possible nouveau trashes that memory, then it needs it for resume
01:12 _lyude[d]: airlied[d]: yeah - there was quite a lot of stuff I explored around the MMU lock register https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/src/kernel/gpu/gsp/arch/turing/kernel_gsp_tu102.c#L725 . it doesn't seem like it gets used when booting my machine according to the tracing I did with openrm - but there's a couple of other (some seem like dead code) spots I've noticed
01:12 _lyude[d]: that register getting used https://github.com/NVIDIA/open-gpu-kernel-modules/blob/2ccbad25e1af6a6ee6f38cf569f89f8b65d658ab/src/nvidia/src/kernel/gpu/mem_mgr/arch/ampere/mem_mgr_ga100.c#L217
05:17 airlied[d]: just looked at the fbsr comp tag ga100 stuff, but didn't help either
18:32 marysaka[d]: phomes_[d]: mhenning[d] good news, I found the source of the MMU fault, will push some branch in a bit... could any of you test on blackwell?
18:33 mhenning[d]: marysaka[d]: Sure, I can test on blackwell
18:33 marysaka[d]: the problem is related to VM_BIND implementation for reference
18:34 marysaka[d]: the cleanup callback unrefs on unmap, but by that point the fence is already signaled and userspace will return too early
18:39 mhenning[d]: Oh, interesting. I guess that might also affect non-compression workloads then?
18:40 marysaka[d]: it could yes
18:49 marysaka[d]: mhenning[d]: phomes_[d] mohamexiety[d] https://gitlab.freedesktop.org/marysaka/linux/-/commits/nouveau-mmu-fixes-1
18:49 marysaka[d]: didn't run CTS yet, but it's the top two patches
18:50 marysaka[d]: if those patches are fine, I will clean them up and post them on the mailing list tomorrow
18:51 phomes_[d]: I can test on ada and also blackwell
18:54 marysaka[d]: I also didn't retest the CTS cases only my reproducer 😅
18:54 marysaka[d]: but I will do now just in case it's another bug
18:55 marysaka[d]: yeah no it works fine now on the test cases too
19:31 phomes_[d]: I built rc6 with your patches on top and testing on ada now. The reproducer you posted yesterday passed 10 times in a row. Same with the cts tests that were problematic before
19:32 phomes_[d]: I will continue and do various game tests etc
19:33 marysaka[d]: I see thanks!
19:48 mohamexiety[d]: i wonder if this should be backported given it's an original vm_bind bug. but it also looks like it doesn't show without compression
20:09 marysaka[d]: phomes_[d]: I pushed a new version of the top patch as it was timing out on CTS, now it's running fine so far
20:10 marysaka[d]: (destroying the object too early wasn't useful anyway, only MMU ops needed to happen in the job run callback)
20:10 phomes_[d]: I will test that. I hope that it might fix the issues I am seeing in a bunch of games
20:10 marysaka[d]: :SoniiPray:
20:12 karolherbst[d]: mohamexiety[d]: I wouldn't take any chances with race condition fixes, because a couple of other random bugs could be related
20:15 airlied[d]: marysaka[d]: the first patch shouldn't be needed, is that race actually happening?
20:15 marysaka[d]: yeah first patch isn't needed
20:16 marysaka[d]: it wasn't affecting that race in any way
20:16 airlied_: dakr: opinion on https://gitlab.freedesktop.org/marysaka/linux/-/commit/1b985166eff686819226e3515534b51b66283734
20:48 marysaka[d]: marysaka[d]: full CTS run ended without any regressions on my 4060
21:10 phomes_[d]: the new version fixed all the issues I noticed (several games would not even launch)
21:25 airlied[d]: marysaka[d]: so that patch violates the fence signalling rules, it's more likely we need to fix the lower level fun
21:27 marysaka[d]: airlied[d]: so you want it wrapped in dma_fence_begin_signalling no?
21:27 marysaka[d]: (like what panthor MMU impl does)
21:29 airlied[d]: no you just can't move those things out of free because they hit mutexes that memory is allocated under
21:37 marysaka[d]: hmm not sure what mutexes you are referring to, unless it's the nouveau_uvma one? but in that case this could probably be moved back to the cleanup callback
21:38 marysaka[d]: the things that need to happen during the job run are both `nouveau_uvma_vmm_put` (OP_UNMAP) and `nouveau_uvmm_vmm_put` (`OP_REMAP`), the rest *should* be fine
21:52 airlied[d]: I think the actual bug is in the SPT/LPT handling down lower
21:56 marysaka[d]: The fact that we unref in cleanup is still a problem because we are signaling the completion fence too early
21:56 marysaka[d]: this gives userspace time to queue a map operation before the cleanup even happens
21:59 airlied[d]: that should be fine, the kernel should handle that ordering afaik, but I need to reread a bunch of code
21:59 airlied[d]: but I do think there is a bug in the SPT/LPT reference counting
21:59 marysaka[d]: but I do agree there might be a problem in `ref_sptes` / `unref_sptes`
22:06 marysaka[d]: I still do think that any MMU operation should have been completed by the time the job fence is signaled in any case, as this is what you expect as a user of this api tbh
22:55 cockroach: hi! I'm trying to get hardware decoding of h264 working on an NVAC but according to vainfo it only supports MPEG2: https://paste.centos.org/view/4292c3e2
22:56 cockroach: I tried several versions of mesa, libva and the kernel but I always get the same result. any idea what I'm doing wrong?
22:58 mangodev[d]: cockroach: h264 is unsupported through libva, and has been completely removed in newer mesa versions
22:59 cockroach: mangodev[d]: ah, that would explain things. should I try to get vdpau working then?
22:59 mangodev[d]: i'd ask a developer about exactly why, but off the top of my head, the reason why support was pulled is because libva doesn't give enough info to properly decode h264
23:00 mangodev[d]: cockroach: vdpau support has been entirely removed from newer versions of mesa as well
23:01 cockroach: mangodev[d]: hmm, so the h264 support mentioned on https://nouveau.freedesktop.org/VideoAcceleration.html - is that no longer a thing then?
23:01 mangodev[d]: been dead code for years
23:01 mangodev[d]: it only existed because of old nouveau
23:01 mangodev[d]: the last time it was ever supported was many many years ago
23:01 mangodev[d]: Fermi iirc?
23:01 mangodev[d]: cockroach: most of those docs are probably out of date :/
23:01 cockroach: :(
23:01 mangodev[d]: good news is that if your device can run nvk, there has been work to get h264 nvdec support through vulkan video
23:02 cockroach: oh? I'll give that a try then!
23:02 mangodev[d]: uhhh
23:02 mangodev[d]: probably won't be for some time
23:02 mangodev[d]: the MR has been dead for a bit, although someone here has talked about reviving it
23:03 cockroach: ah so I'm a little late for the old way and a little early for the new one, I suppose.
23:03 mangodev[d]: mangodev[d]: mhenning[d] iirc?
23:03 mangodev[d]: either you or mary
23:03 mangodev[d]: brought up reviving vulkan video support
23:04 mangodev[d]: cockroach: yeah
23:04 mangodev[d]: open source nvidia is kind of at a weird point rn
23:05 mangodev[d]: faith has been working on panvk, and the others are working on fixing nasty MMU faults, memory leaks, and other bad things nouveau-drm side
23:05 cockroach: ah yikes
23:06 cockroach: that does seem more important than bringing video decoding to ancient hardware :)
23:06 mangodev[d]: i hope nova could be a bit more stable, given it's a ground-up rewrite, but it probably won't be ready for some years at the current pace it's going
23:06 cockroach: oh nice, nova. I hadn't heard of that before.
23:06 mangodev[d]: cockroach: tbf, nouveau hasn't really changed anything for old hardware since about… 5? 7? years ago
23:07 cockroach: oh wow, so I'm *really* late
23:07 mangodev[d]: so you could probably work with an older version of mesa that hasn't dropped support
23:07 mangodev[d]: although those code paths are also really buggy
23:07 mangodev[d]: cockroach: the code paths were only removed very recently
23:07 cockroach: ah maybe with the mesa-amber stuff?
23:07 mangodev[d]: they've just…
23:07 mangodev[d]: rotted.
23:08 mangodev[d]: cockroach: mayhaps? i have no experience with legacy nouveau
23:08 mangodev[d]: i run a 1660 super myself
23:08 cockroach: well thanks anyway, good to know that I can stop poking around and trying to find config options.
23:09 mangodev[d]: most of that nouveau site is for legacy nouveau
23:09 mangodev[d]: i don't think it has been updated since 2020
23:09 mangodev[d]: little up-to-date docs exist about nvk
23:10 mangodev[d]: someone here posted a link to a feature matrix a while ago
23:10 mangodev[d]: but other than the mesa3d.org page, there's not a ton to work with
23:10 cockroach: so the best bet would be the proprietary driver then?
23:11 mangodev[d]: i'd love to contribute to the mesa docs, but i don't want to get anything wrong :(
23:11 mangodev[d]: cockroach: yeah probably
23:11 mangodev[d]: through vdpau on x11
23:11 mangodev[d]: (since vdpau doesn't support Wayland)
23:12 cockroach: that sounds so ... 2005 :)
23:13 cockroach: but thanks for the information
23:24 maatee90[d]: What are you using to write Rust code in the project?
23:24 maatee90[d]: I've downloaded meson and rust-analyzer for VS Code. That way some rs files look good and "jump to def" works, but mostly it doesn't. (at least it doesn't emit a bunch of errors either)
23:51 esdrastarsis[d]: mangodev[d]: h264 decoding is working, btw