19:18 dreamingcat: hi. I have GM206 and run Gentoo on kernel 5.14.14 and also Plasma over x11 through latest nouveau. All this ran smooth until I tried to up the kernel to 5.15.0 which alone introduced noticeable and very annoying flicker/tearing in my plasma on almost any window activity. Dmesg and Xorg logs do not show anything related. Trying to play with plasma compositor settings like vsync actually makes flicker/tearing much worse. I am desperate as to where
19:18 dreamingcat: to start to troubleshoot this kind of issue. Please advise.
19:34 karolherbst: dreamingcat: what DDX are you using?
19:34 karolherbst: nouveau or modesetting?
19:34 karolherbst: if nouveau, it might help to turn on DRI3
19:35 dreamingcat: how do I know what DDX I am using?
19:35 karolherbst: check the Xorg log
19:35 karolherbst: either it will be "nouveau" or "modesetting"
19:35 karolherbst: ehh..
19:35 karolherbst: modeset I think
19:35 karolherbst: yeah.. modeset
19:36 karolherbst: but given it's gentoo, I suspect you are using the nouveau one anyway
19:36 dreamingcat: here's my xorg log https://dpaste.com/2EBQN85PF
19:37 karolherbst: yeah, that's nouveau alright
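A rough sketch of the log check described above, for anyone who would rather script it. It assumes the DDX tags its Xorg log lines as `NOUVEAU(0)` or `modeset(0)` (in line with the "modeset" recollection above); the function name and sample lines are made up for illustration, and the log path on a given system may differ.

```python
import re

# Assumption: each DDX prefixes its log lines with "NAME(<screen>)".
# Adjust the patterns if your Xorg log looks different.
DDX_PATTERNS = {
    "nouveau": re.compile(r"\bNOUVEAU\(\d+\)"),
    "modesetting": re.compile(r"\bmodeset\(\d+\)"),
}


def detect_ddx(log_text):
    """Return the DDX whose tag appears on the most lines, or None."""
    counts = {name: 0 for name in DDX_PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in DDX_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


# Hypothetical sample lines, not taken from the pasted log.
sample = (
    "[    10.1] (II) NOUVEAU(0): first hypothetical message\n"
    "[    10.2] (II) NOUVEAU(0): second hypothetical message\n"
)
print(detect_ddx(sample))
```

On a real system the input would be something like the contents of `/var/log/Xorg.0.log`, though the location depends on how X is started.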
19:38 karolherbst: create a file like /etc/X11/xorg.conf.d/99-nouveau.conf or so with this content: https://gist.githubusercontent.com/karolherbst/599645419bebc18d527a35d915ac6d0f/raw/72b072e862df23a654b4e5454f9ac342beddab45/gistfile1.txt
19:38 karolherbst: and try with that
19:38 karolherbst: _but_ to figure out whether you are just unlucky or if the kernel indeed changed something, you could try to git bisect the kernel if you are up to it
19:38 karolherbst: *for it
19:39 karolherbst: using DRI3 has the benefit of reducing CPU load and having better tearing prevention and stuff (requires "auto" tearing prevention in kwin)
19:39 dreamingcat: I guess if the kernel was the culprit then it'd be more popular problem
19:39 karolherbst: but there are some weird issues with dri3 and exa (the accel method we use in the nouveau ddx)
19:39 dreamingcat: but I haven't seen anyone complaining about exactly that
19:39 karolherbst: yeah...
19:39 karolherbst: I consider tearing to be just a part of X and unfixable anyway...
19:39 karolherbst: except by using wayland
19:40 karolherbst: DRI3 _does_ help a lot here though
19:40 dreamingcat: ok, trying now
19:40 karolherbst: maybe it's just the CPU scheduler being different or something silly... and that's why you noticed it more with 5.15...
20:12 dreamingcat: karolherbst, well the results are mixed. Most software ceased to flicker, but some did not. Notably, Firefox still flickers on scrolling. And, surprisingly, plasma's system setting applet still flickers heavily if I just hover the mouse cursor over certain rectangular areas. Other software seems to behave.
20:13 karolherbst: mhhh
20:13 karolherbst: but I guess that's less tearing and more graphical glitches?
20:13 imirkin: dreamingcat: you can also try switching to modeset vs nouveau
20:13 imirkin: i don't know that it'll flicker less, but it'll be different
20:14 imirkin: in fairly unpredictable ways
20:14 imirkin: (i generally recommend 'nouveau', as it's simpler, more stable, and deals with errors. but checking out the modesetting ddx won't harm anything.)
20:14 dreamingcat: karolherbst, no, it's like when I hover over a panel, it turns solid black for a fraction of a second.
20:14 karolherbst: yeah..
20:14 karolherbst: I saw the same with intel actually
20:15 karolherbst: another reason I ditched plasma at some point
20:15 imirkin: still very happy with WindowMaker
20:15 karolherbst: _but_ they could also be bugs in the ddx
20:15 karolherbst: it's always hard to tell
20:15 imirkin: yeah, it's a very complex system, and generally *every* component has issues
20:16 imirkin: and makes invalid assumptions about how the other components operate
20:16 karolherbst: imirkin: but you always mentioned there might be accel related issues with exa and dri3 :/ but seems like that with dri2 it wasn't better...
20:16 imirkin: karolherbst: afaik there are unfixable issues with DRI3 + EXA
20:17 imirkin: (or at least, unfixable without completely redoing EXA)
20:17 imirkin: now, if you don't happen to hit them, then all is well
20:17 imirkin: iirc it was some KDE thing that was esp adept at hitting them
20:17 karolherbst: ahh
20:17 imirkin: the drawing stuff works fine
20:17 imirkin: it's some sync-related issues
20:17 karolherbst: dreamingcat: do you see those issues with compositing disabled?
20:17 imirkin: so you end up with black or flipping issues or something? i don't quite remember.
20:18 imirkin: but those issues don't happen with DRI3
20:18 imirkin: er
20:18 imirkin: sorry
20:18 imirkin: but those issues don't happen with DRI2 <-- is what i meant
20:18 karolherbst: kwin still uses OpenGL so it could also be some driver bug :/
20:18 imirkin: karolherbst: basically with EXA, the server draws to some pixmap
20:18 karolherbst: but it sounded like that those are more application internal issues...
20:19 imirkin: karolherbst: but if that pixmap is shared with other processes, there's no way to enforce the ordering of X drawing and other things drawing
20:19 imirkin: or ... something like that.
20:19 karolherbst: ohh...
20:19 imirkin: this is Not A Thing (tm) with DRI2
20:19 karolherbst: because I guess we don't share those with dri3 disabled
20:20 imirkin: well, DRI2 has no such sharing mechanism
20:20 imirkin: it's server-allocated, and it's a handle
20:20 imirkin: or ... something. i dunno. i haven't thought about it in a long time. MrCooper is the one who diagnosed the more general issue.
20:21 karolherbst: I just tell people to use wayland and move on :D
20:21 imirkin: and i just tell people i don't care about tearing and keep using X ;)
20:21 imirkin: both valid approaches i suppose
20:21 dreamingcat: karolherbst, yes I see them with compositing disabled, although subjectively the black rectangles now appear for shorter time period
20:21 karolherbst: but at least we fixed that OUT OF MEM issue hit by glamor
20:21 karolherbst: dreamingcat: annoying
20:21 imirkin: karolherbst: ah nice
20:21 imirkin: karolherbst: i'm all for fixing glamor ;)
20:22 karolherbst: imirkin: well.. it was a race condition inside nouveau's kernel driver :)
20:22 imirkin: oh. the SUPER thing?
20:22 karolherbst: yeah
20:22 imirkin: i think nouveau ddx was hitting it too
20:22 imirkin: random allocations would fail
20:22 karolherbst: random memory stuff just failed
20:22 karolherbst: yep
20:22 imirkin: which led to those "-2" prints
20:22 karolherbst: yeah
20:22 karolherbst: same thing I guess
20:22 karolherbst: I was able to hit it at some point
20:22 imirkin: the ddx handled the failure though
20:22 imirkin: (you just lost accel on those pixmaps, so ... who cares)
20:23 karolherbst: yeah.. glamor was fixed to not crash as well
20:23 karolherbst: but given how often x gets released...
20:23 imirkin: ah, at least with the ddx it was pretty rare
20:23 imirkin: like ... once a week or so
20:23 imirkin: (but also definitely in bursts)
20:23 imirkin: like 5 would fail in a row
20:23 karolherbst: yeah...
20:24 karolherbst: race conditions are always annoying like this :)
20:24 imirkin: anyways, glad to know that super thing is fixed
20:24 karolherbst: yeah.. it was a very common issue
20:26 karolherbst: dreamingcat: also I think those rectangles could be some synchronisation problems we have in nouveau... I am not sure, but I think I noticed something like that but could never figure out what was the problem :
20:32 dreamingcat: karolherbst, so is it theoretically possible to solve this by moving to wayland?
20:32 karolherbst: dunno
20:33 karolherbst: it could make it less likely.. but e.g. if it's a bug inside mesa it might happen regardless
20:33 karolherbst: I think the bigger question is, are those issues contained inside applications or are windows etc.. also hit by it
20:37 dreamingcat: all right, let's remember that it happened to my system when I tried to use kernel 5.15.0. I am now looking at kernel sources at github ( linux/drivers/gpu/drm/nouveau - is this the right place to look ) and it says the most recent commit was March 25 this year. So I guess this means nothing has actually changed in the nouveau kernel drm driver
20:41 karolherbst: dreamingcat: there should be changes though
20:42 karolherbst: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/gpu/drm/nouveau?h=v5.15
20:43 dreamingcat: yes, figured it out just now
20:47 dreamingcat: well... my linux expertise level is a bit below kernel hacking, so I guess I'll just wait for next kernel release hoping it'll be fixed somehow; meanwhile sticking to 5.14.14
20:48 karolherbst: dreamingcat: I mentioned this a bit further up, but what might help is to figure out what broke it
20:48 karolherbst: it sadly just takes a while :/
20:48 karolherbst: I can also check if I can reproduce it tomorrow on my system as well
20:49 karolherbst: I even have a gm206, so maybe it's a common problem.. who knows
20:49 dreamingcat: also I forgot to mention I have four monitor setup on my GM206
20:49 karolherbst: mhhh
20:49 karolherbst: I assume it also happens if you run just one of them?
20:49 dreamingcat: checking....
20:54 dreamingcat: yes, it happens :(
20:55 karolherbst: okay... it's already 10pm here, so I doubt I will check today if I can hit this issue, so you can either wait until I check tomorrow or you can already start git bisect if you got a lot of spare time (will probably take like 12 full kernel recompiles)
20:56 karolherbst: my CPU isn't too shabby, so even with a distribution config I need like 19 minutes for one full go... but given you use gentoo you might not even need 10 :D
20:56 karolherbst: or an hour, no idea how fast your CPU is
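For a feel of where "like 12 full kernel recompiles" comes from: a bisect is a binary search, so the build count grows roughly with log2 of the number of candidate commits. The sketch below is only back-of-the-envelope arithmetic; the commit count of 4000 is a made-up placeholder, not the real size of the 5.14..5.15 range, and the helper names are hypothetical.

```python
import math


def bisect_steps(num_commits):
    """Approximate number of builds a binary search over num_commits needs."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    if num_commits == 1:
        return 0
    return math.ceil(math.log2(num_commits))


def bisect_hours(num_commits, minutes_per_build):
    """Total build time in hours, ignoring reboots and testing between steps."""
    return bisect_steps(num_commits) * minutes_per_build / 60


# Hypothetical range size; the build times are the 19 and 10 minute
# figures mentioned in the conversation.
commits = 4000
print(bisect_steps(commits))
print(bisect_hours(commits, 19))
print(bisect_hours(commits, 10))
```

Doubling the range only adds one more build, which is why the estimate stays in the same ballpark even if the real commit count is several times larger.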
20:56 dreamingcat: it's ryzen 3 so it's decently fast
20:58 dreamingcat: the thing is I am kind of wary of using vanilla kernels. Gentoo comes with its own kernel patches and I might break something somewhere
20:58 karolherbst: from my experience those patches don't really matter, but maybe imirkin knows more
20:59 karolherbst: let's see..
21:01 dreamingcat: karolherbst: anyway thanks for helping, I'll contact you tomorrow
21:01 karolherbst: ahh... okay.. the changes are quite irrelevant
21:01 karolherbst: okay
21:02 imirkin: karolherbst: fwiw i've never used a "gentoo" kernel on my gentoo system
21:02 karolherbst: imirkin: apparently it matters if you use xattr and compile inside tmpfs
21:03 imirkin: the only time i had problems with self-built kernels that vendor kernels didn't have was in the 2.2 timeframe
21:03 imirkin: er no, 2.0
21:03 imirkin: something to do with the KX133 chipset? i forget.
21:03 imirkin: it ended up that i was missing some weird kernel option
21:40 HdkR: imirkin: If you play in ARM land then self built kernels have problems all over the place :P
21:43 HdkR: I can understand not wanting to deal with that amount of self-harm though
21:44 karolherbst: HdkR: it's a gentoo user though :p
21:45 HdkR: True. What's a bit of kernel pain to add to your userspace pain
21:45 karolherbst: for maximum pain: USE=-* and globally enabled lto and -Ofast
21:47 HdkR: I guess with thinlto now it might not be quite as terrible
21:47 karolherbst: the problem was never lto in itself
21:47 karolherbst: it's buggy software which is
21:48 HdkR: Of course
21:48 karolherbst: Ofast is even weirder though
21:50 HdkR: That's the option for asking programs to break.
21:50 karolherbst: I should have asked about that
21:50 karolherbst: "are you using Ofast?" and then just close as "not our bug" if the answer is yes
21:51 karolherbst: apparently GL and the CTS kind of rely on IEEE float behavior.. who would have guessed
22:28 imirkin: esp fun are programs that enable float exceptions, and then are surprised that llvmpipe triggers them
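A small illustration of why the -Ofast remarks above matter for code that relies on IEEE float behavior. In GCC, -Ofast turns on -ffast-math, which lets the compiler reassociate floating-point operations. Python does not do that on its own, so the sketch below reorders the operands by hand to imitate one possible reassociation; the values are chosen purely for illustration.

```python
def sum_left_to_right(values):
    """Add the values strictly in the given order, as IEEE semantics require."""
    total = 0.0
    for v in values:
        total += v
    return total


# Written order: (1e16 + 1.0) + -1e16. The 1.0 is lost to rounding
# because neighbouring doubles near 1e16 are 2 apart.
strict = sum_left_to_right([1e16, 1.0, -1e16])

# A reassociated order a fast-math compiler would be allowed to pick:
# (1e16 + -1e16) + 1.0. Here the 1.0 survives.
reordered = sum_left_to_right([1e16, -1e16, 1.0])

print(strict, reordered, strict == reordered)
```

The two orderings give 0.0 and 1.0 respectively, so a test suite that expects the strict result can fail once reassociation is permitted.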