17:55aric49: Hi everyone, super quick question... I am running the Nouveau driver on Fedora 30 and am experiencing relatively frequent lockups in which my PC completely freezes (can't move the mouse, everything is at a standstill)... The only thing I can do is Alt + SysRq + REISUB to hard-reboot my laptop
17:55aric49: any idea what I can do going forward to recover from nouveau lockups without having to reboot my machine? Or maybe prevent the lockups from happening altogether?
17:58diogenes_: aric49, how do you know it's nouveau causing the lockups?
17:58karolherbst: diogenes_: because it will be.
17:58aric49: ABRT is showing `WARNING: CPU: 7 PID: 73 at drivers/gpu/drm/nouveau/nvif/vmm.c:68 nvif_vmm_put+0x6a/0x80 [nouveau] [nouveau]` as the crash reason
17:58karolherbst: aric49: nouveau.runpm=0 will probably work around that issue
17:58karolherbst: that's a different error
17:59karolherbst: nouveau.modeset=0 if you don't use the nvidia gpu
17:59diogenes_: aric49, have you tried Xorg? (in case you use Wayland).
18:00aric49: yes, I've tried Xorg and Wayland. I use the NVIDIA GPU for my DisplayPort monitors
18:00RSpliet: aric49: could you share a more complete log, please?
18:00aric49: this crash happens maybe once a day or so
18:00aric49: sure --- one sec
18:00karolherbst: RSpliet: it will be either the secboot failing or the runpm failing bug, sadly :/
18:01RSpliet: karolherbst: just like in https://bugs.freedesktop.org/show_bug.cgi?id=110572 ?
18:01karolherbst: RSpliet: that's a different error :p
18:02RSpliet: Quite :-D Anyway, aric hasn't told us which GPU it is yet, perhaps the full dmesg will give us that info free of charge in a few secs :-)
18:02karolherbst: a full dmesg should tell us which bug it is
18:02aric49: here is the dmesg log
18:03RSpliet: GM107... that's pre secureboot isn't it?
18:03karolherbst: wait.. I know this bug
18:03karolherbst: that's the one I and ilia sometimes have
18:03karolherbst: but skeggsb never
18:04karolherbst: and nobody has any idea what's up
18:04karolherbst: but until now the machine never died from it
18:05RSpliet: aric49: that warning (not an error, apparently) happened 83 seconds into boot. That's unlikely to be the cause of your hang...
18:06aric49: here is the output of me running the dmesg command.. https://gist.github.com/aric49/c9751b21e08f931d29102f6febebd180
18:06karolherbst: aric49: uhm.. did you run dmesg on the boot that froze?
18:07karolherbst: on a working boot that's kind of pointless
18:07karolherbst: maybe journalctl could give you a past log though
18:07karolherbst: but with a hard reset data can be lost
18:07RSpliet: aric49: sorry, I should have been a little clearer :-) ^ what karolherbst says, journalctl should be able to give you a log. If you used the magic SysRq key for rebooting, I suspect there are still traces of useful info in your logs
18:08aric49: sure. let me check
18:08RSpliet: karolherbst: Fun, in that second paste the WARN didn't occur. Sounds like vmm_put is reading potentially-uninitialised data
18:09karolherbst: nobody has any idea about it
18:09karolherbst: not even if there is any harm
18:10karolherbst: sometimes I get it like 100 times and nothing breaks
18:11aric49: any tips for parsing journalctl for nouveau related events?
18:12karolherbst: journalctl --boot -1 --dmesg | grep -i nouveau
18:12karolherbst: and replace -1 with the offset of the past boot that crashed the machine
18:12karolherbst: -1 for the previous one, -2 for the one before that ;)
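The journalctl one-liner above can also be scripted. A minimal Python sketch, assuming `journalctl` is on `PATH` and the journal is persistent (with a volatile journal, past boots are gone after the hard reset). The helper names are hypothetical; `--boot=-1` is equivalent to `--boot -1`, and `--dmesg` limits output to kernel messages as in the command above.

```python
import subprocess


def journal_cmd(boot_offset: int) -> list[str]:
    """journalctl call for a past boot: -1 = previous boot, -2 = the one before."""
    # Positive offsets count from the oldest stored boot instead, so this
    # sketch only accepts 0 (current boot) or negative (relative) offsets.
    if boot_offset > 0:
        raise ValueError("expected 0 or a negative boot offset")
    return ["journalctl", f"--boot={boot_offset}", "--dmesg"]


def nouveau_lines(text: str) -> list[str]:
    """Case-insensitive equivalent of `grep -i nouveau`."""
    return [line for line in text.splitlines() if "nouveau" in line.lower()]


def nouveau_log(boot_offset: int = -1) -> list[str]:
    """Run journalctl and keep only the nouveau-related kernel messages."""
    result = subprocess.run(
        journal_cmd(boot_offset), capture_output=True, text=True, check=True
    )
    return nouveau_lines(result.stdout)
```

For example, `nouveau_log(-2)` returns the nouveau lines from two boots ago. Reading the system journal may require root or membership in the `systemd-journal` group.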
18:15aric49: looks like that's the one ^^
18:15karolherbst: aric49: mhhh :/
18:15karolherbst: that's just a context crash
18:15karolherbst: do you know what triggers it?
18:16aric49: usually if I am doing too many tasks that engage the GPU. For example, watching YouTube videos or opening the GNOME Shell task switcher with the Windows key are normally the biggest culprits
18:17aric49: I can go days without a crash.. but most of the time it happens around once or twice a day
18:17karolherbst: yeah.. well
18:17karolherbst: there are two parts of the issue
18:18karolherbst: 1. mesa (most likely) doing something which crashes the hardware context
18:18karolherbst: 2. X waiting forever for the application to do something
18:18karolherbst: normally crashing the culprit unfreezes your machine :/
18:18karolherbst: but.. we can't really do it automatically
18:19karolherbst: there is no easy way of doing it right now, but we should have a solution for that in the future
18:20RSpliet: karolherbst: My €0.02 on the VMM warning that I haven't seen myself: the "WARN_ON" would have been a lot more helpful if the return value of the nvif_object_mthd() call was reported. Might be worth slightly patching up nvif_vmm_put if you want to find out what goes wrong
18:20aric49: hmmm.. interesting. Thanks for the explanation
18:21aric49: any tips for somehow recovering from a crash without forcing a reboot?
18:21pmoreau: Too many tasks at once, could it be the MT issues striking again? Though I don’t remember those resulting in a context timeout, do you?
18:21karolherbst: aric49: SSH into the machine and try to find the offending process :/
18:21karolherbst: and kill it
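The manual recovery described here (SSH in, find the offending process, kill it) can be approximated with a heuristic: list every process that has a DRM node under `/dev/dri/` open, since GPU clients, including one that crashed its context, typically hold one. This is a sketch under that assumption, not the automatic recovery karolherbst mentions next; `dri_users` is a hypothetical helper, and only when run as root can it see other users' processes.

```python
import os


def dri_users(proc_root: str = "/proc") -> dict[int, str]:
    """Map PID -> command name for processes holding a /dev/dri/* node open.

    Heuristic: GPU clients (Xorg, gnome-shell, browsers, games) keep a DRM
    device open, so a client that hung the GPU is normally in this list.
    Without root, only your own processes' file descriptors are readable.
    """
    users: dict[int, str] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/cpuinfo, ...
        pid_dir = os.path.join(proc_root, entry)
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            targets = [os.readlink(os.path.join(fd_dir, fd)) for fd in os.listdir(fd_dir)]
            with open(os.path.join(pid_dir, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited meanwhile, or permission denied
        if any(t.startswith("/dev/dri/") for t in targets):
            users[int(entry)] = name
    return users
```

From the SSH session, look through the result for the application that was active when the freeze started and send it `SIGTERM` (`kill PID` in the shell, or `os.kill(pid, signal.SIGTERM)` in Python). Xorg and gnome-shell will show up too; killing those ends the whole graphical session rather than just the culprit.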
18:22karolherbst: aric49: I have some patches to do that automatically, but it requires patching the kernel, libdrm and mesa
18:22karolherbst: aric49: also using the nouveau ddx should give you the real process name instead of the logind stuff
18:22aric49: got it.. how do I use the nouveau DDX?
18:23karolherbst: ohh wait, do you actually use the nvidia gpu? because you said it's a laptop
18:23karolherbst: ohh, a P51, in dedicated mode?
18:23karolherbst: maybe you want to use the intel gpu instead
18:23karolherbst: less issues for you overall
18:23karolherbst: there should be a switch in the UEFI settings
18:24aric49: that was my first thought, but I can't use the DisplayPort outputs on my dock if I disable the NVIDIA GPU
18:24karolherbst: and then you can use DRI_PRIME=1 to offload specific applications and do reclocking if you need perf
18:24karolherbst: aric49: you should be able to, with reverse PRIME
18:24karolherbst: we fixed a few issues in that regard
18:24karolherbst: if not, ping Lyude
18:24karolherbst: Lyude should know what's up if something doesn't work
18:24aric49: sorry, reverse prime?
18:25karolherbst: uhm.. something technical :p
18:25karolherbst: it's something the Xorg server does automatically these days
18:25karolherbst: it uses the other GPU for displaying stuff
18:25karolherbst: so if intel is main, but you have some ports on the nvidia GPU, it can be used to display stuff rendered on the intel one
18:25karolherbst: normally that should just work out of the box
18:25karolherbst: maybe there are still some weird issues around...
18:26karolherbst: nouveau.runpm=0 could help, or replugging the display after the machine has booted
18:26aric49: do I configure DRI_PRIME=1 and nouveau.runpm=0 in GRUB? or somewhere else?
18:27karolherbst: when launching the process
18:27karolherbst: like "DRI_PRIME=1 glxinfo"
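`DRI_PRIME=1` is an ordinary environment variable read by Mesa, so it can be set for a single child process from a script as well as from the shell. A minimal sketch with hypothetical helper names, mirroring the `DRI_PRIME=1 glxinfo` example:

```python
import os
import subprocess
from typing import Mapping, Optional


def prime_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy of the environment with DRI_PRIME=1 (render on the secondary GPU)."""
    env = dict(os.environ if base is None else base)  # never mutate the caller's mapping
    env["DRI_PRIME"] = "1"
    return env


def run_offloaded(cmd: list[str]) -> int:
    """Like `DRI_PRIME=1 cmd ...` in a shell: only this child process is affected."""
    return subprocess.run(cmd, env=prime_env()).returncode
```

With offloading working, `run_offloaded(["glxinfo"])` should name the NVIDIA GPU in its `OpenGL renderer string` line, while plain `glxinfo` names the Intel one.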
18:27karolherbst: nouveau.runpm=0 should be added to grub, yeah
18:27karolherbst: but this disables powering off the GPU
18:28karolherbst: so you should only use it if you really have issues with the nvidia gpu when using the intel one as main
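`nouveau.runpm=0` is a kernel command-line parameter, so it belongs on `GRUB_CMDLINE_LINUX` in `/etc/default/grub`, after which the GRUB configuration has to be regenerated. On Fedora, `sudo grubby --update-kernel=ALL --args="nouveau.runpm=0"` does the same for the installed kernels in one step. The sketch below only performs the text edit, assuming the usual single-line, double-quoted form; `add_kernel_param` is a hypothetical helper.

```python
import re

# Matches GRUB_CMDLINE_LINUX="..." (but not GRUB_CMDLINE_LINUX_DEFAULT).
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX=")(.*)(")\s*$')


def add_kernel_param(grub_default: str, param: str = "nouveau.runpm=0") -> str:
    """Return /etc/default/grub text with `param` appended to GRUB_CMDLINE_LINUX.

    Handles only the common single-line, double-quoted form; idempotent.
    """
    out = []
    for line in grub_default.splitlines():
        m = _CMDLINE.match(line)
        if m and param not in m.group(2).split():
            args = (m.group(2) + " " + param).strip()
            line = f"{m.group(1)}{args}{m.group(3)}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Write the result back as root, regenerate the config with `grub2-mkconfig -o` pointing at your `grub.cfg` (its location differs between BIOS and UEFI installs), reboot, and check `/proc/cmdline`. As noted above, this keeps the NVIDIA GPU powered at all times.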
18:28aric49: ah gotcha
18:33aric49: so probably just wait for fixes in mesa and deal with it sometimes crashing until then?
18:33aric49: would it be beneficial to report this to the mesa team?
18:35karolherbst: well, your best bet would be to use the intel GPU as your main one as this would solve a lot of issues. And people are kind of aware of the issues, it's just nothing which can be easily fixed
18:36aric49: got it
18:36aric49: maybe I can start using VGA instead of DisplayPort
18:36aric49: let me see
18:37aric49: but then I wouldn't be able to use any of you guys' wonderful hard work developing a free software nvidia driver ;-D
18:38RSpliet: Well, the hard work will continue to be poured in whether you end up using nouveau or not. VGA is however a step back in picture quality
18:39aric49: yup.. exactly... which is why I'm a little hesitant to switch back to my intel GPU
18:39* HdkR pours a drink of Nouveau
18:40aric49: thanks for the help guys! I super appreciate it
19:12karolherbst: HdkR: uhm, wait, actually, what's the situation in your case? Would you be allowed to contribute to Nouveau now? :D Although I guess that would be hard in your case :/
19:12karolherbst: or maybe not?
19:21HdkR: karolherbst: I could do kernel bits, nothing related to the nouveau shader compiler as far as I am aware
19:29karolherbst: mhh, okay, but what about implementing opengl bits?
19:29karolherbst: or well, command submission in general
21:26neur0sis: Hello, does the RTX 2080 Ti 11GB work with nouveau? I can't get it to work with X, while the display works in a TTY. I don't use NVIDIA's firmware blob. Thanks
21:27neur0sis: black screen when I run startx
21:27gnarface: neur0sis: i don't know, but you might even be the first one to try it
21:28neur0sis: I have 2 NVIDIA cards on my motherboard; the 1050 Ti works nicely, the RTX doesn't
21:28gnarface: known support statuses of various chips here: https://nouveau.freedesktop.org/wiki/FeatureMatrix/
21:28neur0sis: alright, not a big deal anyway
21:28gnarface: check on the feature matrix. if it's not even on there, this might be a prime opportunity for you to provide useful info
21:29neur0sis: will take a look
21:30neur0sis: NVIDIA Corporation TU102
21:30neur0sis: this is the lspci output for the RTX
21:30neur0sis: GP107 for my 1050, which works without problems
21:31gnarface: yea, looks like the TU102 has nothing working so far
21:32gnarface: but really maybe partially because nobody has got their hands on one
21:32neur0sis: don't think it will help, but my kernel is deblobbed (using the FSFLA script)
21:32gnarface: i'd recommend hanging out until a developer shows up to get info from you
21:32gnarface: you might be able to help them get it working
21:32neur0sis: alright, happy to help the dev if needed
21:33gnarface: maybe not immediately, but i'm sure some debugging info is where they start... mmiotraces and kernel debugging dumps. sorry i can't help you with specifics on how though.
21:33neur0sis: I will let run irssi through a screen so I won't be disconnected, brb
21:44karolherbst: neur0sis: I think a Turing card should work with a new enough kernel
21:44neur0sis: https://pastebin.com/0PqLkkb6 lspci -vv of the card; if the devs need any info, query me
21:45neur0sis: I have now, should I use a more recent one?
21:45HdkR: karolherbst: Maybe you should temper their expectations a bit :P
21:45karolherbst: 4.20 got support for gv100
21:45karolherbst: 5.0 should support turing
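The minimum kernel versions from this exchange can be checked against `uname -r` with a small sketch. The table holds only the two versions stated above (4.20 for GV100, 5.0 for Turing such as TU102), and "support" here means the kernel driver knows the chip, i.e. display bring-up, possibly with a software rasterizer for rendering, not full acceleration. `has_basic_support` and `MIN_KERNEL` are hypothetical names.

```python
import platform
import re
from typing import Optional

# Minimum kernel versions as stated in the chat above (initial support only).
MIN_KERNEL = {"GV100": (4, 20), "TU102": (5, 0)}


def kernel_version(release: str) -> tuple[int, int]:
    """Parse major.minor from a `uname -r` string such as '5.0.9-301.fc30.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def has_basic_support(chipset: str, release: Optional[str] = None) -> bool:
    """True if the running (or given) kernel is at least the listed minimum."""
    if release is None:
        release = platform.release()  # same string as `uname -r`
    return kernel_version(release) >= MIN_KERNEL[chipset.upper()]
```

Tuples compare element by element, so `(4, 20) >= (5, 0)` is false while `(5, 1) >= (5, 0)` is true, which avoids the usual pitfall of comparing version strings lexically.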
21:46HdkR: Working well enough to bring up a screen with a software rasterizer is a sad state
21:46karolherbst: not that it is any better with a gv100
21:46karolherbst: but yeah
21:47karolherbst: and with a deblobbed kernel the experience on a gp107 and tu102 will be more or less the same anyway :p
21:47neur0sis: alright, will go for mainline
21:48neur0sis: yeah sure, I just want to avoid swapping the HDMI cable every time I switch between my gaming system and my work system
21:48HdkR: karolherbst: I should file all complaints directly to you that you don't have Turing working and don't have the RT extensions running full speed right? :)
21:49HdkR: Prep a bin to start a trash fire in
21:49neur0sis: + the BIOS doesn't show up on my second GPU (1050)
21:49karolherbst: HdkR: well, or you could help out implementing all of that :p
21:49karolherbst: neur0sis: that's normal
21:49karolherbst: bios usually only supports the primary GPU
21:49HdkR: lol. Implement VK_NV_ray_tracing directly on my Radeon 7 :D
21:49karolherbst: depends highly on the firmware, but usually main GPU only
21:50neur0sis: If nouveau can work with my primary RTX, cool; otherwise I just need to switch, not a big deal. Will try the mainline kernel as recommended
21:52neur0sis: btw, another question: is VDPAU supported by nouveau?
21:53neur0sis: I have libvdpau_nouveau.so, but libvdpau from freedesktop only links to libvdpau_nvidia.so
21:54karolherbst: neur0sis: not with a deblobbed kernel
21:55neur0sis: alright so making a patch to link to libvdpau_nouveau.so is basically useless
21:55neur0sis: or even having libvdpau at all
21:56karolherbst: libvdpau shouldn't link to any implementations actually
21:56karolherbst: the driver gets selected at runtime
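libvdpau is a thin wrapper that loads a backend at runtime rather than linking against one. A simplified model of that lookup, assuming the `VDPAU_DRIVER` environment variable override and the `libvdpau_<name>.so.1` file naming; the real library also asks the X server which driver to use (passed in here as `x_driver`), and the module directory is distribution-dependent, so the default below is only a typical value. `vdpau_backend` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def vdpau_backend(
    env: Mapping[str, str],
    x_driver: Optional[str],
    module_dir: str = "/usr/lib/vdpau",
) -> str:
    """Path of the backend libvdpau would dlopen, in a simplified model.

    VDPAU_DRIVER overrides the driver name the X server reports (x_driver);
    module_dir is distribution-dependent and only a typical default here.
    """
    name = env.get("VDPAU_DRIVER") or x_driver
    if not name:
        raise LookupError("no VDPAU driver name available")
    return os.path.join(module_dir, f"libvdpau_{name}.so.1")
```

In this model, choosing nouveau is a matter of the runtime name (for example `VDPAU_DRIVER=nouveau`), not of patching libvdpau; hardware decoding still depends on NVIDIA's firmware, which a deblobbed kernel does not ship.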
21:58neur0sis: will probably remove the ebuild from my overlay then, https://github.com/g3ngr33n/emergeless/blob/master/x11-libs/libvdpau/files/nouveau.patch <--- useless then?
21:59neur0sis: with and without the nvidia-firmware?
22:00gnarface: the issue with the firmware is that nobody knows how it works or what it does exactly
22:00gnarface: i get the impression it's largely a black box even to nvidia employees
22:00gnarface: so it's a bit of voodoo
22:01gnarface: yes, you'll need it for any hope of video acceleration working
22:01gnarface: if you check that nouveau page though, you'll see that it currently only works for some cards, and nobody really knows why
22:02neur0sis: I'd rather use the output of my second GPU than have a blob on this system
22:02gnarface: i've heard of attempts to reverse engineer it using mmio traces but i don't know if any of those efforts have borne much fruit
22:03gnarface: understandable. essentially you'd have to replace the firmware with something you wrote yourself.
22:03neur0sis: probably beyond my skills to do this
22:03gnarface: i know it's way beyond mine
22:04gnarface: but i also know you'd start by getting mmiotraces from the official firmware while it loads
22:04gnarface: that's the best anyone can do at this point
22:05gnarface: if you can find a way to fake what it's doing, you might make some progress
22:05gnarface: afaik that has been useful for getting the official firmware working with nouveau, but not for replacing it entirely
22:06HdkR: Latest firmwares are also signed, so the hardware wouldn't even load handwritten ones
22:06gnarface: oh yea, that too. that's for what, everything after the 900 series?
22:06gnarface: or including the 900's too?
22:06HdkR: Including 900
22:07HdkR: It's Maxwell2+