20:16 lovesegfault: Hi everyone
20:16 lovesegfault: the last update to the feature matrix was almost a year ago: https://nouveau.freedesktop.org/wiki/FeatureMatrix/
20:16 imirkin_: no news is good news
20:16 lovesegfault: Have there been any changes to the level of support for the NV130 line?
20:17 lovesegfault: In particular I recall some issues around needing a signed FW from nvidia
20:17 lovesegfault: Curious if anything has changed :)
20:19 imirkin_: pascal?
20:19 imirkin_: signed firmware was released for ctxsw fw
20:19 imirkin_: pmu firmware never will be
20:19 lovesegfault: imirkin_: What's ctxsw and pmu?
20:20 imirkin_: context switching
20:20 imirkin_: power management
20:21 imirkin_: all GL4.5 CTS tests now pass on pascal (but not when run together :) )
20:23 lovesegfault: imirkin_: does this mean nouveau + NV130 will always hog power?
20:23 imirkin_: no
20:23 imirkin_: it means that nouveau + NV130 will never hog power
20:23 imirkin_: even when you want it to
20:23 lovesegfault: now = master or latest kernel
20:23 lovesegfault: Oh, I meant regarding the pmu fw never being released
20:23 imirkin_: right
20:23 imirkin_: the gpu is stuck at boot clocks
20:24 imirkin_: which are a small fraction of the "max" clocks the chips are capable of
20:24 imirkin_: so you get like 5% of the perf
20:24 imirkin_: but on the bright side, a lot less power used
20:24 lovesegfault: Oh, I see
20:24 lovesegfault: Can I just turn it off? I have an intel igpu which I'd rather use
20:25 lovesegfault: Right now I have bumblebee + bbswitch + nvidia drivers
20:25 lovesegfault: and it's brittle and I need this proprietary bs
20:25 imirkin_: the gpu should auto-suspend
20:26 imirkin_: assuming it's in a laptop designed for this sort of thing
20:26 HdkR: Sadly it's been six years since Maxwell. Easy to tell that Nvidia won't ever release PMU blobs
20:26 imirkin_: (which it sounds like it is)
20:27 imirkin_: if it doesn't, complain here; someone should be able to help
20:27 lovesegfault: So on kernel 5.5.6 I should be able to enable nouveau and it will just suspend my nvGPU and use the iGPU?
20:27 imirkin_: hopefully
20:27 imirkin_: it doesn't work in some laptops
20:27 imirkin_: but it does work in others :)
20:27 lovesegfault: It's a `01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P2000 Mobile] (rev ff)`
20:27 imirkin_: that's irrelevant
20:27 lovesegfault: Which is why I worry, these pascal lines are always weirder
20:27 imirkin_: what's relevant is how your ACPI tables are laid out
20:28 lovesegfault: I am yet to learn how ACPI works
20:30 imirkin_: "rarely"
20:32 lovesegfault: lol
20:32 lovesegfault: alright, nuked nvidia, bumblebee, etc and enabled nouveau
20:32 lovesegfault: rebooting, hopefully brb
20:40 lovesegfault: imirkin: I got a kernel panic so large I can't see it all in dmesg
20:40 lovesegfault: lol
20:41 lovesegfault: from gdm trying to start with wayland
20:41 lovesegfault: Where/how/who do I report it to?
20:42 imirkin_: here's good. pastebin
20:42 lovesegfault: https://gist.github.com/365918e7212ff2ca380ca48e163e7b80
20:42 lovesegfault: There :)
20:42 imirkin_: yeah, that's not enough
20:42 imirkin_: can you boot with....
20:42 imirkin_: log_buf_len=4M ?
20:43 lovesegfault: absolutely, one moment
20:43 imirkin_: that should help you keep the start of that
20:43 imirkin_: coz clearly something goes wrong during load
20:43 imirkin_: which is very odd
20:43 imirkin_: do you have the firmware?
20:44 lovesegfault: I think I do, one moment
20:44 imirkin_: ls /lib/firmware/nvidia/gp107
20:45 lovesegfault: imirkin_: This is nixos, stuff is all in different places :P
20:45 imirkin_: ok, well you get the idea
20:46 lovesegfault: yep, does it come from `linux-firmware`?
20:46 lovesegfault: Or is it packaged independently
20:47 lovesegfault: imirkin_: is gp107 a folder or a file?
20:47 imirkin_: folder
20:47 lovesegfault: alright, let me see
20:47 lovesegfault: can you tell me a file inside it?
20:47 lovesegfault: makes it easier to use nix-locate on my cached file list
20:48 lovesegfault: wait
20:48 lovesegfault: I found it
20:48 lovesegfault: `79v4p51z4yqpk16dwpg08h87kf9byypz-firmware-linux-nonfree-2020-01-22/lib/firmware/nvidia/gp107`
20:49 lovesegfault: Rebooting, brb
20:56 lovesegfault: imirkin: https://gist.github.com/5d77ddbc49f2dae9e17c8be41ce4d8e6
20:56 lovesegfault: Bam, got it all the way to the start, I think
20:57 imirkin_: yeah, looks like it
20:57 imirkin_: [ 15.402277] nouveau 0000:01:00.0: gr: init failed, -16
20:58 imirkin_: sigh.
20:58 lovesegfault: What?
20:58 imirkin_: ok, but let's think about this
20:58 imirkin_: you have zero need for this GPU to actually do things other than suspend, right?
20:59 lovesegfault: In an ideal world it would stay available so I can use my HDMI ports, BUT I have lived without them for ~1 year and can continue to do so
20:59 lovesegfault: i.e. Yes, I just need it to not use power
21:01 imirkin_: so then you can boot with nouveau.noaccel=1 nouveau.nofbaccel=1
21:01 imirkin_: i bet that loads ok
21:01 lovesegfault: let me see
21:01 imirkin_: karolherbst: see above for GP107 failing on load
21:01 imirkin_: [ 13.402542] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]
21:01 imirkin_: which in turn causes gr init to fail
21:01 imirkin_: or skeggsb --^
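The two log lines quoted above (the MMIO timeout, then the gr init failure) can be picked out of a long dmesg capture mechanically. The following is a minimal sketch, not part of the original conversation: the regular expressions and the helper names `parse_nouveau_lines` and `errno_name` are illustrative assumptions, and the sample text is just the two lines pasted in the channel. It also maps the `-16` return code to its errno name, which on Linux is `EBUSY`.

```python
import errno
import re

# Assumed line shape: "[ <seconds>] nouveau <pci-address>: <unit>: <message>"
LINE_RE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+nouveau\s+(?P<device>\S+):\s+"
    r"(?P<unit>\w+):\s+(?P<message>.*)$"
)
# Matches the "init failed, -<code>" text seen in the gr line.
INIT_FAILED_RE = re.compile(r"init failed, -(?P<code>\d+)")

# The two lines quoted in the channel, in timestamp order.
SAMPLE_LOG = """\
[ 13.402542] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]
[ 15.402277] nouveau 0000:01:00.0: gr: init failed, -16
"""


def parse_nouveau_lines(log):
    """Return one dict per nouveau log line (time, device, unit, message)."""
    records = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def errno_name(message):
    """Return the symbolic errno name for an 'init failed, -N' message, if any."""
    match = INIT_FAILED_RE.search(message)
    if match is None:
        return None
    return errno.errorcode.get(int(match.group("code")))


if __name__ == "__main__":
    for record in parse_nouveau_lines(SAMPLE_LOG):
        print(record["time"], record["unit"], record["message"], errno_name(record["message"]))
```

Sorting by timestamp makes the ordering imirkin_ describes visible: the bus fault at 13.4 s precedes the gr failure at 15.4 s.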
21:02 lovesegfault: Alright, I enabled those boot options, rebooting and brb
21:05 lovesegfault: imirkin: https://gist.github.com/90ce37e7d258171fe6ac7d227631f4fa
21:05 lovesegfault: bam!
21:05 lovesegfault: worked
21:05 lovesegfault: Let me check the power consumption with tlp
21:05 lovesegfault: s/tlp/powertop/
21:06 imirkin_: check vgaswitcheroo status
21:06 imirkin_: should say DynOff
21:06 lovesegfault: is that in /proc somewhere?
21:06 imirkin_: cat /sys/kernel/debug/vgaswitcheroo/switch
21:07 lovesegfault: 0:IGD:+:Pwr:0000:00:02.0
21:07 lovesegfault: 1:DIS-Audio: :DynOff:0000:01:00.1
21:07 lovesegfault: 2:DIS: :DynOff:0000:01:00.0
21:07 imirkin_: yay
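For anyone wanting to check this state from a script rather than by eye, here is a minimal sketch that parses the three lines pasted above. It is not from the conversation: the field layout (index, name, active flag, power state, PCI address) is inferred from that sample output, and `parse_switch` and `discrete_gpu_suspended` are hypothetical helper names. Reading the real file under `/sys/kernel/debug` normally needs root, so the sketch works on the pasted text instead.

```python
from typing import NamedTuple


class SwitchEntry(NamedTuple):
    index: int
    name: str
    active: bool       # True when the third field is "+"
    power: str         # e.g. "Pwr" or "DynOff"
    pci_address: str


# The output pasted in the channel.
SAMPLE = """\
0:IGD:+:Pwr:0000:00:02.0
1:DIS-Audio: :DynOff:0000:01:00.1
2:DIS: :DynOff:0000:01:00.0
"""


def parse_switch(text):
    """Parse vgaswitcheroo 'switch' output into SwitchEntry tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split only four times: the PCI address itself contains colons.
        index, name, active, power, address = line.split(":", 4)
        entries.append(
            SwitchEntry(int(index), name, active.strip() == "+", power, address.strip())
        )
    return entries


def discrete_gpu_suspended(entries):
    """True if every discrete ("DIS...") entry reports DynOff."""
    discrete = [e for e in entries if e.name.startswith("DIS")]
    return bool(discrete) and all(e.power == "DynOff" for e in discrete)


if __name__ == "__main__":
    entries = parse_switch(SAMPLE)
    for entry in entries:
        print(entry)
    print("discrete GPU suspended:", discrete_gpu_suspended(entries))
```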
21:07 imirkin_: apparently that didn't last...
21:09 karolherbst: imirkin_: my workaround patch should help
21:09 karolherbst: the gr off/on thingy
21:09 imirkin_: oh
21:09 imirkin_: heh
21:12 lovesegfault: imirkin: Something ain't right
21:12 imirkin_: =/
21:12 lovesegfault: if I unplug the power cord my system reboots!
21:12 lovesegfault: :D
21:12 imirkin_: logical...
21:12 lovesegfault: let's try again, let's see what happens
21:13 imirkin_: fwiw nouveau is sensitive to being on AC/DC
21:13 imirkin_: there were some ideas about having different perf level settings for them
21:13 imirkin_: and i think it still hooks into the acpi events
21:13 imirkin_: perhaps something goes wrong there? dunno
21:13 lovesegfault: Can I see logs from previous boot?
21:13 imirkin_: if your syslog caught them, sure
21:14 imirkin_: easiest thing is to be ssh'd in from a diff comp
21:14 lovesegfault: Mind you, it doesn't crash-and-burn reboot, it reboots normally and calmly
21:14 imirkin_: and running "dmesg -w"
21:14 imirkin_: oh
21:14 imirkin_: then it's probably something in your system
21:14 imirkin_: which processes the event
21:14 imirkin_: and decides it's time to go
21:14 lovesegfault: is there a --kernel for journalctl
21:14 imirkin_: i don't do systemd
21:15 lovesegfault: OH
21:15 lovesegfault: I see it
21:15 lovesegfault: `Feb 27 13:09:23 foucault kernel: nouveau 0000:01:00.0: therm: temperature (511 C) hit the 'shutdown' threshold`
21:15 lovesegfault: 511C!
21:15 lovesegfault: wat
21:15 imirkin_: that's fairly toasty
21:16 imirkin_: and/or is 0x1ff
21:16 lovesegfault: and then a kernel panic
21:16 imirkin_: normally we suppress readout when the gpu suspends
21:16 imirkin_: karolherbst: --^
21:16 lovesegfault: https://gist.github.com/70c8d45057a13d97418d526103c17234
21:16 lovesegfault: Here's a partial log
21:16 lovesegfault: journalctl didn't seem to have the whole thing
21:17 imirkin_: the gpu is suspended, but something doesn't quite realize it? dunno
21:17 imirkin_: Feb 27 13:09:23 foucault kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
21:17 imirkin_: (pci is active-low, so when it's off, reads come back as all 1's)
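The arithmetic behind both readings can be checked directly. This is a small sketch under stated assumptions, not driver code: it assumes the temperature comes from the low 9 bits of a 32-bit register read, and that the timer value is printed as a 64-bit quantity. Neither width is stated in the conversation beyond the `0x1ff` remark, and `low_bits` and `looks_powered_off` are hypothetical names.

```python
# What a 32-bit read returns when the device does not respond: all bits set.
ALL_ONES_32 = 0xFFFFFFFF


def low_bits(value, width):
    """Keep only the lowest `width` bits of `value`."""
    return value & ((1 << width) - 1)


def looks_powered_off(raw_read):
    """Heuristic: an all-ones 32-bit read suggests the device is not answering."""
    return raw_read == ALL_ONES_32


if __name__ == "__main__":
    # Assumed 9-bit temperature field: all ones is 0x1ff, i.e. 511.
    temperature = low_bits(ALL_ONES_32, 9)
    print(temperature, hex(temperature))  # 511 0x1ff

    # A 64-bit all-ones value matches the "stalled at" timer reading.
    print(format((1 << 64) - 1, "x"))  # ffffffffffffffff

    print(looks_powered_off(ALL_ONES_32))  # True
```

Under those assumptions the "511 C" shutdown reading is the same symptom as the stalled timer: both are all-ones reads from a GPU that is not answering.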
21:18 lovesegfault: Oh, wait, I got the full logs
21:18 lovesegfault: https://gist.github.com/ea785423d5dd2c87e49c01946023c956
21:18 karolherbst: imirkin_: we already fixed that in 5.4 or 5.3 or so
21:19 karolherbst: ohhh
21:19 karolherbst: imirkin_: this is the GPU not resuming
21:21 karolherbst: ahh, yeah, the broken intel root port
21:21 lovesegfault: the what?
21:21 karolherbst: the intel PCIe root port is buggy (or at least I am convinced it's buggy)
21:22 karolherbst: lovesegfault: top two linux kernel patches will fix all your issues: https://github.com/karolherbst/linux/commits/runpm_fixes
21:22 lovesegfault: karolherbst: Let me try them out
21:22 lovesegfault: does this mean I should also remove the nouveau.noaccel boot flag?
21:22 karolherbst: git branch is based on 5.5.5.. I might just go ahead and rebase it
21:22 karolherbst: lovesegfault: yeah
21:22 lovesegfault: let me try those patches out
21:23 karolherbst: imirkin_: upstream is like: ohh, you base your tests and investigation on undocumented regs and claim the hw has a bug? no way we can accept this theory then :/
21:24 imirkin_: karolherbst: i don't think that's _quite_ what bjorn said...
21:24 imirkin_: but certainly having a PCIe analyzer would strengthen your argument
21:24 karolherbst: I don't think I was referring to what bjorn said
21:24 imirkin_: not to mention your net worth...
21:25 imirkin_: those things are probably like a few hundred K?
21:25 karolherbst: probably
21:25 karolherbst: and you know, we would need them for mxm cards... or ...
21:25 karolherbst: well
21:25 imirkin_: probably can rent one for a day
21:26 karolherbst: I don't have a laptop with an mxm module showing this bug :)
21:26 karolherbst: anyway.. I really should send out my newest version of the patch
21:27 karolherbst: just wanted to test it on my machine for some time
21:29 lovesegfault: Alright, rebuilding the kernel
21:30 lovesegfault: patches applied just fine to 5.5.6 karolherbst
21:32 karolherbst: imirkin_: the sad thing is, we already asked nvidia and intel about this bug... guess what
21:33 HdkR: "It's the other person's fault"
21:34 karolherbst: of course
21:34 karolherbst: fun thing is.... it might be nouveau's fault, but something is weird
21:34 karolherbst: if I unload nouveau, it starts working again
21:34 karolherbst: and I have no idea why
21:35 karolherbst: I am quite sure pci core is doing things differently, but.. mhh
21:39 lovesegfault: karolherbst: so wait, why can't these patches be upstreamed?
21:40 karolherbst: lovesegfault: I didn't say they can't be upstreamed, just that nobody knows what's up and everybody would like to know more about the issue
21:40 lovesegfault: Ah, got it
21:40 karolherbst: anyway, the new version might actually be accepted, I just wanted to test it to make sure it indeed fixes the issue
21:41 lovesegfault: Sure, I'll help as soon as this damn kernel builds :D
21:58 lovesegfault: I'm a brain genius and built the wrong kernel
21:58 lovesegfault: So now building the right one
22:49 lovesegfault: karolherbst: https://thumbs.gfycat.com/ImpartialWhichHorsemouse-size_restricted.gif
22:49 lovesegfault: it works!
22:49 lovesegfault: I've tested plug/unplug a few times too
22:49 lovesegfault: all working
22:50 lovesegfault: karolherbst, imirkin thank you!