17:50aron: hi
17:51aron: I have a GT710 (GK208B) with 2 monitors connected
17:51aron: I see short white lines on both monitors at random locations
17:52karolherbst: aron: like randomly when it's in use?
17:52karolherbst: and what kind of monitors are those?
17:52aron: they appear randomly, but go away if I move a window there
17:52karolherbst: huh...
17:52aron: one is a DSUB other is HDMI monitor
17:53karolherbst: any errors in dmesg?
17:53aron: I don't remember anything
17:53aron: (I had to boot my old system, it was super frustrating)
17:54aron: I can check in a second
17:54karolherbst: okay, cool
17:58aron: "dmesg | grep nouv" yields only some information about the card
17:58karolherbst: mhhh
17:58karolherbst: aron: what desktop are you using and what's mesa's version?
17:59aron: xfce4, mesa-21.2.6
17:59karolherbst: hard to say what's going on, because normally that stuff should work. I'd blame the cables, but given one is HDMI and it's happening on both...
17:59karolherbst: mhhh
17:59karolherbst: aron: compositing enabled in xfwm or disabled?
18:00karolherbst: or is that on by default these days?
18:00aron: it's on by default, but I instantly disable it
18:00aron: I don't really like it
18:00karolherbst: well... without compositing random glitches are not preventable sadly
18:01karolherbst: could be something else though
18:01aron: when I enable it, the short white lines move to a different location :)
18:01karolherbst: ahh.. annoying
18:01karolherbst: do you know if your X server uses the nouveau or modesetting driver?
18:01aron: it's Alpine Linux, based on musl libc, not glibc
18:02karolherbst: I hope that glibc vs musl isn't causing this, but who knows
18:03aron: hmm
18:03aron: (EE) Failed to load module "nouveau" (module does not exist, 0)
18:04karolherbst: guess it uses modesetting then
18:04aron: (II) modeset(0): 720x400@70Hz
18:04aron: etc
18:04karolherbst: which isn't bad, but given you are on mesa 21.2 you might also be running an older kernel that lacks some fixes we made around modesetting.
18:04karolherbst: you could install the nouveau X driver and see if that helps
18:05aron: kernel is 5.15.55
18:05karolherbst: mhh, looks new enough
18:06aron: installing xf86-video-nouveau
18:08aron: reboot done
18:08aron: pesky white lines GONE!
18:08aron: thank you very much!
18:08karolherbst: np
18:08karolherbst: still wondering why it happens though..
18:08karolherbst: probably some bug in glamor or mesa
18:12aron: an unfortunate development: nouveau tends to crash
18:12karolherbst: uhh... that's bad
18:12aron: I'm trying to get some log
18:14aron: trying dmesg first, let's see
18:17aron: yes, I got a few lines of error
18:17aron: [ 211.747682] nouveau 0000:01:00.0: fifo: FB_FLUSH_TIMEOUT
18:17aron: [ 211.747809] nouveau 0000:01:00.0: fifo: CHSW_ERROR 00000001
18:17aron: quite a few FB_FLUSH_TIMEOUT
18:18aron: [ 211.937342] nouveau 0000:01:00.0: fifo: fault 00 [READ] at 0000000003021000 engine 1b [CE2] client 18 [HUB/GR_CE] reason 0c [UNSUPPORTED_KIND] on channel 2 [003fbdc000 Xorg[2893]]
18:18aron: [ 211.937354] nouveau 0000:01:00.0: fifo: channel 2: killed
18:18aron: [ 212.159337] nouveau 0000:01:00.0: DRM: GPU lockup - switching to software fbcon
18:18aron: and then rest in pieces
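The dmesg lines quoted above can be summarized with a short script. This is only a sketch: the regular expression is modelled on the single fault line pasted in this log, and the function name `summarize_nouveau_log` is made up for illustration. Other kernel versions may word these messages differently.

```python
import re

# Pattern modelled on the one fault line quoted in this log (an assumption,
# not a documented format).
FAULT_RE = re.compile(
    r"fifo: fault (?P<code>[0-9a-f]+) \[(?P<access>\w+)\] at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]+)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]+)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]+)\] "
    r"on channel (?P<channel>\d+)"
)


def summarize_nouveau_log(text):
    """Count FB_FLUSH_TIMEOUT messages and collect decoded fifo fault lines."""
    timeouts = 0
    faults = []
    for line in text.splitlines():
        if "FB_FLUSH_TIMEOUT" in line:
            timeouts += 1
        match = FAULT_RE.search(line)
        if match:
            faults.append(match.groupdict())
    return {"fb_flush_timeouts": timeouts, "faults": faults}
```

Feeding it the output of `dmesg` gives a quick count of the timeouts plus the engine and reason of each fault, which is easier to compare across reboots than the raw log.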
18:25aron: https://paste-bin.xyz/77373
18:25aron: full log
18:25aron: (not too long)
18:26aron: software fbcon does not work, tho
18:26aron: I need to reset
18:31karolherbst: mhhh...
18:32karolherbst: might be you are unlucky and nouveau isn't supporting your GPU very well :/
18:32aron: first time I was able to login at least :)
18:32aron: actually, I also think that
18:32aron: according to this:
18:33karolherbst: I am mildly aware that there are those FB_FLUSH_TIMEOUT issues around, but we also don't have a good idea on why that's even happening
18:33aron: https://nouveau.freedesktop.org/CodeNames.html
18:33aron: 01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1) (prog-if 00 [VGA controller])
18:33aron: it's not listed
18:33aron: only 710M
18:33karolherbst: we suspect that our own firmware is busted, but that's a really painful part of the driver
18:34karolherbst: yeah.. though that's more of a "best effort" list. I usually recommend the wikipedia page: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
18:34aron: oh, IC
18:35aron: the funny thing, I have 2 cards, but the second card is for vfio
18:35aron: this little 710 is just for X
18:37karolherbst: might also be that bandwidth is an issue
18:37karolherbst: could check if booting with "nouveau.config=NvClkMode0xf" fixes it
18:37karolherbst: ehh
18:37karolherbst: "nouveau.config=NvClkMode=0xf"
18:37aron: where should I put this?
18:37karolherbst: on the kernel command line
18:38karolherbst: or as a module option via modprobe.conf
18:38karolherbst: but updating the kernel command line with "grubby" is usually straightforward: grubby --update-kernel=ALL --args="nouveau.config=NvClkMode=0xf"
18:39karolherbst: _but_ not sure if your distro even wires all of that up
18:39aron: I just modify the grub.cfg
18:39karolherbst: or that
18:39aron: if it works, I do it properly
18:42aron: yay, login is now successful
18:42karolherbst: interesting...
18:42karolherbst: I'd bet that this also fixes those glitches with the modesetting driver
18:43aron: what does it do, btw?
18:43karolherbst: increases GPU clocks
18:43aron: okay, it does not hurt much
18:43karolherbst: you can check "/sys/kernel/debug/dri/0/pstate" for available perf states
18:44karolherbst: might be that you have a third one in the middle which is "good enough" but doesn't increase power consumption/heat generation too much
18:44aron: 07: core 405 MHz memory 810 MHz
18:44aron: 0f: core 653-954 MHz memory 5010 MHz AC DC *
18:44aron: AC: core 953 MHz memory 5009 MHz
18:44karolherbst: ahh yeah.. only have two then
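The three pstate lines pasted above are regular enough to parse. The sketch below assumes the layout seen in this log: a two-digit lowercase hex id or `AC`/`DC`, followed by `name N MHz` or `name N-M MHz` pairs. The trailing `AC DC *` markers are ignored, and the final `AC:` line is kept separately since it appears to report the clocks currently in effect. `parse_pstate` is a hypothetical helper name.

```python
import re

# Layout assumed from the three lines pasted in this log.
LINE_RE = re.compile(r"^(?P<id>[0-9a-f]{2}|AC|DC): (?P<rest>.*)$")
DOMAIN_RE = re.compile(r"(?P<name>[a-z]+) (?P<lo>\d+)(?:-(?P<hi>\d+))? MHz")


def parse_pstate(text):
    """Return (perf_states, current) parsed from pstate-style text.

    Each clock is stored as a (min_mhz, max_mhz) tuple; a single value
    is stored with min == max.
    """
    states = []
    current = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        clocks = {
            d["name"]: (int(d["lo"]), int(d["hi"] or d["lo"]))
            for d in DOMAIN_RE.finditer(match["rest"])
        }
        if match["id"] in ("AC", "DC"):
            current = {"source": match["id"], "clocks": clocks}
        else:
            states.append({"id": int(match["id"], 16), "clocks": clocks})
    return states, current
```

With the output above this yields two perf states, 0x07 and 0x0f, which matches the "only have two then" reading.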
18:45karolherbst: given that's a 710 anyway, you probably increase power consumption from 5 to 9 W or so :D
18:45karolherbst: anyway... long term we want that to happen automatically, but given that the clocking code is broken for quite a lot of cards...
18:45aron: nice
18:45karolherbst: is it a passively cooled GPU?
18:46aron: yup, it is
18:46karolherbst: might want to make that a "nouveau.config=NvClkMode0xf,NvPmEnableGating=1" instead so you reduce power consumption a little
18:46aron: it was a "design choice"
18:47karolherbst: NvPmEnableGating is a feature we are not sure is 100% reliable, but it does reduce power consumption
18:47karolherbst: so the GPU might stay a bit cooler
18:47aron: (first I tried open nvidia kernel driver module, but this card was not supported, what a shame)
18:47aron: (reboot)
18:48karolherbst: only talking about 10-20% though, but for passively cooled cards it might matter
18:49karolherbst: anyway.. I'd monitor the GPU temperature a bit with the high clocks just in case
18:52aron: it's worse now :)
18:52karolherbst: :(
18:52aron: maybe that missing "=" sign again?
18:52karolherbst: guess there is a bug in the ohhh
18:53karolherbst: yeah...
18:53karolherbst: copy&paste fail
18:53aron: I didn't notice that either :P
18:53aron: let me fix that quick
18:55aron: yep :)
18:55aron: it did the trick
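Since the missing "=" bit twice in this conversation, a small check before rebooting may save a round trip. The sketch below assumes every entry in the option string has the form `Key=value`, as the two options discussed here do; the function names are made up for illustration. It also prints the two spellings mentioned above: the kernel command line argument and the equivalent `options` line for a modprobe configuration file.

```python
def parse_nouveau_config(value):
    """Split 'Key=val,Key2=val2' into a dict.

    Raises ValueError if an entry lacks '=' (assumption: all entries
    are Key=value pairs, as in this conversation).
    """
    options = {}
    for entry in value.split(","):
        key, sep, val = entry.partition("=")
        if not sep or not key or not val:
            raise ValueError(f"malformed entry {entry!r}: expected Key=value")
        options[key] = val
    return options


def kernel_cmdline_arg(options):
    """Format the options for the kernel command line."""
    return "nouveau.config=" + ",".join(f"{k}={v}" for k, v in options.items())


def modprobe_line(options):
    """Format the same options as a modprobe 'options' line."""
    return "options nouveau config=" + ",".join(f"{k}={v}" for k, v in options.items())


if __name__ == "__main__":
    opts = parse_nouveau_config("NvClkMode=0xf,NvPmEnableGating=1")
    print(kernel_cmdline_arg(opts))
    print(modprobe_line(opts))
```

Passing the mistyped `NvClkMode0xf,NvPmEnableGating=1` raises a `ValueError` naming the offending entry instead of producing a string that gets applied without the intended effect.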
18:55aron: oookay, now I can dump the open nvidia drivers
18:56karolherbst: have fun
18:56aron: haha, I have 14 cooling_device
18:56aron: will be fun to find the correct one
18:57aron: bunch of CPUs
19:00aron: it's here: /sys/class/graphics/fb0/device/hwmon/hwmon1
19:00karolherbst: nvidia exposes a hwmon file now?
19:00karolherbst: crazy
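For keeping an eye on the temperature at the higher clocks, the hwmon directory can be read directly. A minimal sketch, assuming the directory contains standard `temp*_input` files reporting millidegrees Celsius, which is the usual hwmon sysfs convention; `read_hwmon_temps` is a hypothetical helper name and the path in the usage note is the one from this log.

```python
from pathlib import Path


def read_hwmon_temps(hwmon_dir):
    """Return {file_name: degrees Celsius} for each temp*_input file.

    Assumes the files hold integer millidegrees Celsius.
    """
    temps = {}
    for sensor in sorted(Path(hwmon_dir).glob("temp*_input")):
        temps[sensor.name] = int(sensor.read_text().strip()) / 1000.0
    return temps
```

For example, `read_hwmon_temps("/sys/class/graphics/fb0/device/hwmon/hwmon1")` on the system above; the hwmon index may change between boots, so the path should not be hardcoded in anything long-lived.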
19:02aron: thanks for the help again
19:03aron: I think I can make some progress with the system now
19:03karolherbst: np
19:20mynacol: karolherbst: Doesn't NvClkMode require a non-hex value? I have NvClkMode=15 for this reason. See https://nouveau.freedesktop.org/KernelModuleParameters.html
19:20mynacol: And thanks for the tip with NvPmEnableGating, trying that now as well
19:32karolherbst: mynacol: I think it can handle both
19:33karolherbst: well.. obviously it does handle both
19:33mynacol: Then you might adapt the documentation? ;)
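On the hex versus decimal question: the two spellings denote the same number, and 0xf lines up with the `0f` perf state id shown by pstate. The snippet below only illustrates that equivalence in Python; it says nothing about how nouveau itself parses the value, and `parse_clk_mode` is a made-up name.

```python
def parse_clk_mode(value):
    """Accept either a hex ('0xf') or a decimal ('15') spelling."""
    # Base 0 tells int() to infer the base from the prefix.
    return int(value, 0)


if __name__ == "__main__":
    print(parse_clk_mode("0xf") == parse_clk_mode("15"))
```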