17:55 aric49: Hi everyone, super quick question.... I am running the Nouveau driver in Fedora 30 and am experiencing relatively frequent lockups in which my PC completely freezes (can't move mouse, everything is at a standstill)... The only thing I can do is ALT + SYSRQ + REISUB to hard reboot my laptop
17:55 aric49: any idea what I can do going forward to recover from nouveau lockups without having to reboot my machine? Or maybe prevent the lockups from happening altogether?
17:58 diogenes_: aric49, how do you know it's nouveau causing the lockups?
17:58 karolherbst: diogenes_: because it will be.
17:58 aric49: ABRT is showing `WARNING: CPU: 7 PID: 73 at drivers/gpu/drm/nouveau/nvif/vmm.c:68 nvif_vmm_put+0x6a/0x80 [nouveau] [nouveau]` as the crash reason
17:58 karolherbst: aric49: nouveau.runpm=0 will probably workaround that issue
17:58 karolherbst: ohhh
17:58 karolherbst: that's the different error
17:59 karolherbst: nouveau.modeset=0 if you don't use the nvidia gpu
17:59 diogenes_: aric49, have you tried in xorg? (in case you use wayland).
18:00 aric49: yes, I've tried XORG and wayland --- I use the NVIDIA GPU for displayport monitors
18:00 RSpliet: aric49: could you share a bit more of a complete log please?
18:00 aric49: this crash happens maybe once a day or so
18:00 aric49: sure --- one sec
18:00 karolherbst: RSpliet: it will be either the secboot failing or the runpm failing bug, sadly :/
18:01 RSpliet: karolherbst: just like in https://bugs.freedesktop.org/show_bug.cgi?id=110572 ?
18:01 karolherbst: RSpliet: that's a different error :p
18:02 RSpliet: Quite :-D Anyway, aric hasn't told us which GPU it is yet, perhaps the full dmesg will give us that info free of charge in a few secs :-)
18:02 karolherbst: yeah
18:02 karolherbst: a full dmesg should tell us which bug it is
18:02 aric49: here is the dmesg log
18:02 aric49: https://gist.github.com/aric49/edd0438933ca56686d4d5c5ad8c5ba8e
18:03 RSpliet: GM107... that's pre secureboot isn't it?
18:03 karolherbst: ohhh
18:03 karolherbst: wait.. I know this bug
18:03 karolherbst: that's the one I and ilia sometimes have
18:03 karolherbst: but skeggsb never
18:04 karolherbst: and nobody has any idea what's up
18:04 karolherbst: but until now the machine never died from it
18:04 karolherbst: fun
18:05 RSpliet: aric49: that warning (not an error apparently) happened 83 seconds into boot. That's unlikely to be the reason for your hang...
18:06 aric49: here is the output of me running the dmesg command.. https://gist.github.com/aric49/c9751b21e08f931d29102f6febebd180
18:06 karolherbst: aric49: uhm.. you do run dmesg on a machine which froze?
18:07 karolherbst: on a working boot that's kind of pointless
18:07 karolherbst: maybe journalctl could give you a past log though
18:07 karolherbst: but with a hard reset data can be lost
18:07 RSpliet: aric49: sorry, I should have been a little clearer :-) ^ what karolherbst says, journalctl should be able to give you a log. If you used the magic sysrq for rebooting I suspect there's still traces of useful info in your logs
18:08 aric49: sure. let me check
18:08 RSpliet: karolherbst: Fun, in that second paste the WARN didn't occur. Sounds like vmm_put is reading potentially-uninitialised data
18:09 karolherbst: dunno
18:09 karolherbst: nobody has any idea about it
18:09 karolherbst: not even if there is any harm
18:10 karolherbst: sometimes I get it like 100 times and nothing breaks
18:11 aric49: any tips for parsing journalctl for nouveau related events?
18:12 karolherbst: journalctl --boot -1 --dmesg | grep -i nouveau
18:12 karolherbst: and replace -1 with the number of which boot in the past crashed the machine
18:12 karolherbst: -1 for the previous one, -2 for the one before that ;)
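A minimal sketch of the lookup karolherbst describes, assuming the journal from earlier boots is still on disk. The function names are hypothetical helpers, not part of journalctl; only `journalctl --list-boots`, `--boot` and `--dmesg` are real options.

```shell
#!/bin/sh
# List the boots the journal knows about, so you can pick the offset
# of the boot that froze (0 = current, -1 = previous, ...).
list_boots() {
    journalctl --list-boots
}

# Keep only lines mentioning nouveau, case-insensitively (reads stdin).
nouveau_lines() {
    grep -i nouveau
}

# Kernel messages of one past boot, reduced to nouveau lines.
# $1 is the boot offset, e.g. -1 for the previous boot.
nouveau_log_for_boot() {
    journalctl --boot="$1" --dmesg | nouveau_lines
}
```

For example, `nouveau_log_for_boot -2` would show the nouveau kernel messages from two boots ago, if that boot is still recorded.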
18:14 aric49: https://gist.github.com/aric49/f44835fbec517d3747f7bc934f4eb129
18:15 aric49: looks like that's the one ^^
18:15 karolherbst: aric49: mhhh :/
18:15 karolherbst: that's just a context crash
18:15 karolherbst: do you know what triggers it?
18:16 aric49: usually if i am doing too many tasks that engage the GPU. For example watching youtube videos, or engaging the Gnome-shell task switcher with the windows key are normally the biggest culprits
18:17 aric49: i can go days without a crash.. but most of the time it happens around once or twice a day
18:17 karolherbst: mhh
18:17 karolherbst: yeah.. well
18:17 karolherbst: there are two parts of the issue
18:18 karolherbst: 1. mesa (most likely) doing something which crashes the hardware context
18:18 karolherbst: 2. X waiting forever for the application to do something
18:18 karolherbst: normally crashing the culprit unfreezes your machine :/
18:18 karolherbst: but.. we can't really do it automatically
18:19 karolherbst: there is no easy way of doing it right now, but we should have a solution for that in the future
18:20 RSpliet: karolherbst: My €0.02 on the VMM warning that I haven't seen myself: the "WARN_ON" would have been a lot more helpful if the return value of the nvif_object_mthd() call was reported. Might be worth slightly patching up nvif_vmm_put if you want to find out what goes wrong
18:20 aric49: hmmm.. interesting. Thanks for the explanation
18:21 aric49: any tips for somehow recovering from a crash without forcing a reboot?
18:21 pmoreau: Too many tasks at once, could it be the MT issues striking again? Though I don’t remember it resulting in a context timeout, or?
18:21 karolherbst: aric49: sshing into the machine and try to find the offending process :/
18:21 karolherbst: and kill it
18:22 karolherbst: aric49: I have some patches to do that automatically, but it requires patching the kernel, libdrm and mesa
18:22 karolherbst: aric49: also using the nouveau ddx should give you the real process name instead of the logind stuff
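A rough sketch of the manual recovery described above: log in over ssh, look for the process named in the nouveau kernel messages, and kill it. It assumes the kernel log tags the owning process as `name[pid]` (for example `Xorg[1234]`); the helper name and the host in the usage line are hypothetical.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the unique "name[pid]" tokens
# found on nouveau lines. Assumption: the offending process appears in
# that form; with the modesetting driver it may show up as a logind
# process rather than the real client.
channel_owners() {
    grep -i nouveau | grep -oE '[^][ ]+\[[0-9]+\]' | sort -u
}
```

From another machine this could be used as `ssh user@frozen-host 'journalctl --boot --dmesg' | channel_owners`, followed by `kill <pid>` on the frozen machine for the process it reports.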
18:22 aric49: got it.. how do i use the nouveau ddx?
18:23 karolherbst: ohh wait, do you actually use the nvidia gpu? because you said it's a laptop
18:23 karolherbst: ohh p51, dedicated mode?
18:23 karolherbst: maybe you want to use the intel gpu instead
18:23 karolherbst: less issues for you overall
18:23 karolherbst: there should be a switch in the uefi settings
18:24 aric49: that was my first thought -- but i can't use the displayport ports on my dock if i disable the nvidia gpu
18:24 karolherbst: and then you can use DRI_PRIME=1 to offload specific applications and do reclocking if you need perf
18:24 karolherbst: aric49: you should be able to with reverse prime
18:24 karolherbst: we fixed a few issues in that regard
18:24 karolherbst: if not, ping Lyude
18:24 karolherbst: Lyude should know what's up if something doesn't work
18:24 aric49: sorry, reverse prime?
18:25 karolherbst: uhm.. something technical :p
18:25 karolherbst: it's something the Xorg server does automatically these days
18:25 karolherbst: it uses the other GPU for displaying stuff
18:25 karolherbst: so if intel is main, but you have some ports on the nvidia GPU, it can be used to display stuff rendered on the intel one
18:25 karolherbst: normally that should just work out of the box
18:25 karolherbst: maybe there are still some weird issues around...
18:26 karolherbst: nouveau.runpm=0 could help, or replugging the display after the machine was booted
18:26 aric49: do i configure the DRI_PRIME=1 and nouveau.runpm=0 in GRUB? or somewhere else?
18:27 karolherbst: when launching the process
18:27 karolherbst: like "DRI_PRIME=1 glxinfo"
18:27 karolherbst: nouveau.runpm=0 should be added to grub, yeah
18:27 karolherbst: but this disables powering off the GPU
18:28 karolherbst: so you should only use it if you really have issues with the nvidia gpu if using the intel as main
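To make the split concrete: `DRI_PRIME=1` is an environment variable set per process, while `nouveau.runpm=0` is a kernel command-line parameter. The sketch below assumes a Fedora system with `grubby` installed; `has_kernel_arg` is a hypothetical helper for checking whether the parameter took effect.

```shell
#!/bin/sh
# Succeed if the kernel command line in $1 contains the exact argument $2.
has_kernel_arg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel after a reboot:
#   has_kernel_arg "$(cat /proc/cmdline)" nouveau.runpm=0 && echo "runpm disabled"
#
# Add the parameter to all installed kernels (Fedora, assumes grubby):
#   sudo grubby --update-kernel=ALL --args="nouveau.runpm=0"
#
# Offload one application to the NVIDIA GPU and see which renderer it got:
#   DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
```

Keeping the check as a pure string test means it can be tried on any command line without touching the bootloader.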
18:28 aric49: ah gotcha
18:33 aric49: so probably just wait for fixes in mesa and deal with it sometimes crashing until then?
18:33 aric49: would it be beneficial to report this to the mesa team?
18:35 karolherbst: well, your best bet would be to use the intel GPU as your main one as this would solve a lot of issues. And people are kind of aware of the issues, it's just nothing which can be easily fixed
18:36 aric49: got it
18:36 aric49: maybe i can start using VGA instead of display port
18:36 aric49: let me see
18:37 aric49: but then I wouldn't be able to use any of your wonderful hard work developing a free software nvidia driver ;-D
18:38 RSpliet: Well, the hard work will continue to be poured in whether you end up using nouveau or not. VGA is however a step back in picture quality
18:39 aric49: yup.. exactly... which is why i'm a little hesitant to switch back to my intel GPU
18:39 * HdkR pours a drink of Nouveau
18:40 aric49: thanks for the help guys! I super appreciate it
19:12 karolherbst: HdkR: uhm, wait, actually, how is the situation in your case? Would you be allowed to contribute to Nouveau now? :D Although I guess that would be hard in your case :/
19:12 karolherbst: or maybe not?
19:21 HdkR: karolherbst: I could do kernel bits, nothing related to the nouveau shader compiler as far as I am aware
19:29 karolherbst: mhh, okay, but what about implementing opengl bits?
19:29 karolherbst: or well, command submission in general
21:26 neur0sis: Hello, does the rtx 2080 ti 11gb work with nouveau? I can't get it to work with X, while the display works in tty. I do not use the nvidia firmware blob. thanks
21:27 neur0sis: black screen when I startx
21:27 gnarface: neur0sis: i don't know, but you might even be the first one to try it
21:28 neur0sis: I have 2 nvidia cards on my motherboard, the 1050 ti works nicely, the rtx doesn't
21:28 gnarface: known support statuses of various chips here: https://nouveau.freedesktop.org/wiki/FeatureMatrix/
21:28 neur0sis: alright, not a big deal anyway
21:28 gnarface: check on the feature matrix. if it's not even on there, this might be a prime opportunity for you to provide useful info
21:29 neur0sis: will take a look
21:30 neur0sis: NVIDIA Corporation TU102
21:30 neur0sis: this is what lspci returns for the rtx
21:30 gnarface: hmmm
21:30 neur0sis: GP107 for my 1050, which works without problems
21:31 gnarface: yea, looks like the TU102 has nothing working so far
21:32 gnarface: but really maybe partially because nobody has got their hands on one
21:32 neur0sis: don't think it will help but my kernel is deblobbed (using the FSFLA script)
21:32 gnarface: i'd recommend hanging out until a developer shows up to get info from you
21:32 gnarface: you might be able to help them get it working
21:32 neur0sis: alright, happy to help the dev if needed
21:33 gnarface: maybe not immediately, but i'm sure some debugging info is where they start... mmiotraces and kernel debugging dumps. sorry i can't help you with specifics on how though.
21:33 neur0sis: I will leave irssi running in screen so I won't be disconnected, brb
21:44 karolherbst: neur0sis: I think a turing card should work with a new enough kernel
21:44 neur0sis: https://pastebin.com/0PqLkkb6 lspci -vv of the card, if the devs need any info, query me
21:44 neur0sis: 4.20-12
21:45 neur0sis: is what I have now, should I use something more recent?
21:45 HdkR: karolherbst: Maybe you should temper their expectations a bit :P
21:45 karolherbst: 4.20 got support for gv100
21:45 karolherbst: 5.0 should support turing
21:46 HdkR: Working well enough to bring up a screen with software rasterizer is a sad state
21:46 karolherbst: well
21:46 karolherbst: not that it is any better with a gv100
21:46 karolherbst: but yeah
21:47 karolherbst: and with a deblobbed kernel the experience on a gp107 and tu102 will be more or less the same anyway :p
21:47 neur0sis: alright, will go for the mainline
21:48 neur0sis: yeah sure, I'm just trying to avoid swapping the hdmi cable every time I switch between my gaming system and my working system
21:48 HdkR: karolherbst: I should file all complaints directly to you that you don't have Turing working and don't have the RT extensions running full speed right? :)
21:48 karolherbst: sure
21:49 HdkR: Prep a bin to start a trash fire in
21:49 neur0sis: + the bios doesn't show up on my second gpu (1050)
21:49 karolherbst: HdkR: well, or you could help out implementing all of that :p
21:49 karolherbst: neur0sis: that's normal
21:49 karolherbst: bios usually only supports the primary GPU
21:49 HdkR: lol. Implement VK_NV_ray_tracing directly on my Radeon 7 :D
21:49 karolherbst: depends highly on the firmware, but usually main GPU only
21:50 neur0sis: If nouveau could work with my primary rtx, that's cool, otherwise I just need to switch, not a big deal. will try the mainline kernel as recommended
21:52 neur0sis: btw another question, is vdpau supported by nouveau?
21:53 neur0sis: I have libvdpau_nouveau.so, but libvdpau from freedesktop only links to libvdpau_nvidia.so
21:54 karolherbst: neur0sis: not with a deblobbed kernel
21:55 neur0sis: alright so making a patch to link to libvdpau_nouveau.so is basically useless
21:55 neur0sis: or have libvdpau at all even
21:56 karolherbst: libvdpau shouldn't link to any implementations actually
21:56 karolherbst: the driver gets selected at runtime
21:58 neur0sis: will probably remove the ebuild from my overlay then, https://github.com/g3ngr33n/emergeless/blob/master/x11-libs/libvdpau/files/nouveau.patch <--- useless then ?
21:59 neur0sis: with and without the nvidia-firmware ?
22:00 gnarface: the issue with the firmware is that nobody knows how it works or what it does exactly
22:00 gnarface: i get the impression it's largely a black box even to nvidia employees
22:00 gnarface: so it's a bit of voodoo
22:01 gnarface: yes, you'll need it for any hope of video acceleration working
22:01 gnarface: if you check that nouveau page though, you'll see that it currently only works for some cards, and nobody really knows why
22:02 neur0sis: I'd rather use the output of my second gpu than have a blob on this system
22:02 gnarface: i've heard of attempts to reverse engineer it using mmio traces but i don't know if any of those efforts have borne much fruit
22:03 gnarface: understandable. essentially you'd have to replace the firmware with something you wrote yourself.
22:03 neur0sis: probably beyond my skill to do this
22:03 gnarface: i know it's way beyond mine
22:04 gnarface: but i also know you'd start by getting mmiotraces from the official firmware while it loads
22:04 gnarface: that's the best anyone can do at this point
22:05 gnarface: if you can find a way to fake what it's doing, you might make some progress
22:05 gnarface: afaik that has been useful for getting the official firmware working with nouveau, but not for replacing it entirely
22:06 HdkR: Latest firmwares are also signed, so the hardware wouldn't even load handwritten ones
22:06 gnarface: oh yea, that too. that's for what, everything after the 900 series?
22:06 gnarface: or including the 900's too?
22:06 HdkR: Including 900
22:07 HdkR: It's Maxwell2+