IRC Logs of #nouveau on irc.freenode.net for 2024-10-16

14:34 gfxstrand[d]: How not functioning is it?
14:34 gfxstrand[d]: RADV has a lot of device info and it's pretty important to get right.
14:35 gfxstrand[d]: But I think I was working on a 7600xt which should be pretty similar
14:36 gfxstrand[d]: No, 7800xt
14:36 gfxstrand[d]: So at least the HW generation and shader stuff should be the same.
14:36 gfxstrand[d]: Image layouts might not be, though.
15:20 redsheep[d]: gfxstrand[d]: Looking back my wording wasn't quite what I meant. I'm still wrangling my wsl into behaving properly to begin with and I haven't even tried building your radv wddm branch yet. It seemed like it might be stuck on software raster and when I tried to troubleshoot my amd drivers on my host they appear borked so I'd probably need to do a round of DDU. Was just thinking if I'm going to bother maybe I should swap the card to make sure it's got a good chance to work, because my igpu is rdna 2, not 3. I'm just going to use my htpc for initial testing though since I know it's got a matching card and working drivers.
15:21 redsheep[d]: Arch wsl doesn't seem to even ship dozen, I wasn't seeing any vulkan driver
15:43 zmike[d]: this looks like a kde bug and the fix should be in their next release
15:43 zmike[d]: so...don't use kde on zink for now
17:37 tiredchiku[d]: they're considering making a 6.2.1.1 release
17:38 tiredchiku[d]: to fix some regressions introduced in 6.2.1
17:51 zaraf: hello, I have a RTX 4060Ti 16GB using nouveau/GSP and at the beginning of the boot process the screen stays black for around 15 seconds and then booting continues (doesn't happen with proprietary driver). tested using kernel 6.11 (fedora 41 beta) but goes back to at least 6.9.x
17:51 zaraf: Example dmesg here https://pastebin.com/763Z7r0h (see 5.256151 following, there's a trace). Would this be something worth reporting or is this expected/known with current state of nouveau/gsp on RTX 4k series?
17:55 zaraf: I will check latest 6.12-rc if this is something worth chasing
17:56 redsheep[d]: I think I see something similar with my 4090. If you don't see it on the prop driver my suspicion would be something to do with the much older 535 gsp in use right now, hoping this kind of thing will improve when we upgrade
18:47 airlied[d]: the backtrace isn't related to GSP or stalling
18:47 airlied[d]: and is non-fatal
19:07 karolherbst[d]: airlied[d]: mhh, but it does take over 30 seconds for nouveau to become ready...
19:07 airlied[d]: well we need to work out what it is doing there 🙂
19:07 airlied[d]: maybe boot with nouveau.debug=trace
19:07 karolherbst[d]: yeah, that might help
19:08 karolherbst[d]: `nouveau.debug=subdev=trace` (I think) might be enough, because that prints how long each subsystem takes to init
19:08 karolherbst[d]: but might as well get the full thing
19:08 airlied[d]: probably want log_buf_len=4M as well
20:11 skeggsb9778[d]: the AE_AML_LOOP_TIMEOUT seems relevant
20:47 airlied[d]: oh maybe it's related then
21:57 gfxstrand[d]: eric_engestrom: we should chat about CI when I get back next year. At this point I think the only blocker is the fact that the GSP still falls over sometimes. I see it about every 3-4 full 36-thread 2-GPU CTS runs. Running 5% of the CTS or whatever we're actually planning to do for CI should make it a lot more stable. If you're happy with the stability, I'd say go ahead and turn it on for pre-merge.
22:00 gfxstrand[d]: I also commented in gitlab
22:09 phomes_[d]: There were good ideas of how to always zero vram mentioned here. I think that is the way to go but they require kernel work too and seem like they would take some time to fix. Do we want to do the driconf thing as a workaround until that materializes? There are games crashing and misrendering today due to that, which we could fix
22:14 phomes_[d]: driconf MR is !29892. We can split the patches for radv out to a separate MR
22:39 gfxstrand[d]: Yeah, I think driconf is a good first step
22:43 airlied[d]: gfxstrand[d]: when gsp dies, what do you mean? like we get a vm fault or the whole thing just stops talking?
23:03 gfxstrand[d]: Spamming "vmm allocation failed"
23:05 airlied: ah okay yeah that's the bad one
23:05 airlied: would need to get Timur's patches and the gsp logs out of that
23:06 skeggsb9778[d]: i did figure out how to decode them a couple of months back - so i could take a look if one sends them to me
23:07 gfxstrand[d]: Maybe tomorrow I can stress test my machine and get a log if you tell me what options I need to enable to get the dump out of it.
23:08 skeggsb9778[d]: Timur has a patch series on the nouveau ml that wasn't yet merged, but you can just copy them out of debugfs with those applied
23:08 gfxstrand[d]: Okay. Link?
23:09 skeggsb9778[d]: https://lists.freedesktop.org/archives/nouveau/2024-September/045658.html
23:09 skeggsb9778[d]: ugh
23:09 skeggsb9778[d]: hangon
23:10 skeggsb9778[d]: https://patchwork.freedesktop.org/series/138492/
23:10 skeggsb9778[d]: that's better
23:12 airlied[d]: last time it was the TLB flushing race
23:12 airlied[d]: we'd get a fault on the bar where you shouldn't get faults