08:51qq[IrcCity]: hello. I have garbage https://www.flickr.com/photos/187764219@N06/51171770426/ on GeForce 8400GS with nouveau 1.0.17-1 after suspend.
08:55qq[IrcCity]: Xorg is the latest (1.20.11-1), Linux is also the latest. No other symptoms, nothing suspicious in Xorg.0.log.
08:57qq[IrcCity]: GeForce served me for years (under Linux-4.3).
08:59qq[IrcCity]: I am not sure that power management works correctly on my box, but have no idea how to isolate the trouble.
09:00qq[IrcCity]: And wait, the framebuffer (namely, in text consoles) is fine.
09:15qq[IrcCity]: Amazingly, the mouse pointer is present (you can see it in the upper left corner of flickr.com/photos/187764219@N06/51171770426/) and live, but all other images in X are garbage.
09:23qq[IrcCity]: More precisely, the box has been in a text console (fb) when I suspended it. The text screen resumed without problems, but the next switch to X produced the garbage screen above.
09:26qq[IrcCity]: Should I try to debug the failed X server (while it is still running)?
10:26RSpliet: qq[IrcCity]: better drop a copy of your Xorg.0.log and dmesg on a paste website and share the URL(s) here
11:59qq[IrcCity]: RSpliet: http://www.superstructure.info/linux/5.12/artix-dmesg.txt interesting things can be found at [ 315.428228], [ 484.388585] and [ 4181.027388].
12:00qq[IrcCity]: I begin to suspect that I booted a poor kernel.
12:01qq[IrcCity]: poor quality, that is, flawed.
12:39qq[IrcCity]: RSpliet: should I now kill Xorg and look for more kernel crap? Or do anything to /dev/fb0, /dev/dri/card0 or whatever?
15:57RSpliet: qq[IrcCity]: sorry, had a busy day, I'm not really a nouveau dev anymore. But experience tells me that that looks bad
15:58qq[IrcCity]: These kernel messages?
15:58RSpliet: karolherbst: can you look at that log please? http://www.superstructure.info/linux/5.12/artix-dmesg.txt
15:58RSpliet: yep
15:59RSpliet: looks like buffer management is going udders-up.
16:01qq[IrcCity]: Does it look like reckless programming in the driver rather than merely memory corruption (probably because of unrelated bugs)?
16:07karolherbst: RSpliet: ohhh
16:07karolherbst: okay
16:07karolherbst: I see it
16:07karolherbst: sooo
16:08karolherbst: nouveau_bo_init fails
16:08karolherbst: and we want to clean it up
16:08karolherbst: but we don't have to
16:08karolherbst: I think I saw a patch...
16:08karolherbst: let's see
16:09karolherbst: RSpliet: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.13-rc1&id=925681454d7b557d404b5d28ef4469fac1b2e105 :D
16:10karolherbst: but that fixes something a bit above that
16:10karolherbst: maybe we need to do something similar
16:11karolherbst: ahh yeah
16:12karolherbst: ttm_bo_release is called by ttm_bo_init
16:13karolherbst: yep
16:14karolherbst: qq[IrcCity]: you are the user hitting this?
16:14karolherbst: mind removing the call to nouveau_bo_ref inside nouveau_gem_new?
16:14karolherbst: this should be enough to fix it
16:14qq[IrcCity]: yes, I reported this today.
16:15qq[IrcCity]: Is the git.kernel.org thing a tetnative fix to the buffer eviction crap?
16:15qq[IrcCity]: *tentative
16:15karolherbst: qq[IrcCity]: it's a fix for a similar issue I fixed in the past
16:20qq[IrcCity]: karolherbst: so you suspect my bug to result from another instance of bad nouveau_bo_ref? In another routine?
16:21qq[IrcCity]: Or not necessarily nouveau_bo_ref?
16:21karolherbst: I have no idea what your bug is all about, just that the kernel is doing something stupid
16:23qq[IrcCity]: The garbage really looked like an unrelated chunk of memory got mapped to the framebuffer.
16:23karolherbst: or you are out of VRAM and crap happens
16:24karolherbst: the user after free is already after your issue
16:24karolherbst: something happens before so allocation fails
16:24karolherbst: fixing this will fix the user after free messages, but probably not your actual issue
16:25qq[IrcCity]: the use, you mean?
16:25karolherbst: yes
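[Editor's sketch] The failure mode karolherbst describes between 16:08 and 16:14 (an init routine that already releases the object when it fails, so a second cleanup in the caller touches freed memory) can be illustrated with a small userspace model. This is not the nouveau or TTM code: demo_bo, demo_bo_init, demo_bo_release, demo_bo_new and release_calls are hypothetical names invented for the illustration, and the only thing taken from the chat is the ownership rule that the init path cleans up after itself on failure.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted buffer object. */
struct demo_bo {
    int refcount;
};

/* Counts releases so the behaviour can be checked without touching freed memory. */
static int release_calls;

/* Frees the object; after this call the pointer must not be used again. */
static void demo_bo_release(struct demo_bo *bo)
{
    release_calls++;
    free(bo);
}

/*
 * Hypothetical init that consumes the object on failure, modelled on the
 * chat's remark that ttm_bo_release is called by ttm_bo_init.
 * simulate_failure is an illustration-only switch.
 */
static int demo_bo_init(struct demo_bo *bo, int simulate_failure)
{
    bo->refcount = 1;
    if (simulate_failure) {
        demo_bo_release(bo); /* init cleans up after itself */
        return -1;
    }
    return 0;
}

/*
 * Caller in the corrected form: on init failure it returns without a
 * second unref, because the object is already gone. Dropping another
 * reference here would be the use-after-free discussed above.
 */
static struct demo_bo *demo_bo_new(int simulate_failure)
{
    struct demo_bo *bo = calloc(1, sizeof(*bo));

    if (bo == NULL)
        return NULL;
    if (demo_bo_init(bo, simulate_failure) != 0)
        return NULL; /* do not release bo again */
    return bo;
}
```

In this model a failed demo_bo_new(1) triggers exactly one release, inside the init path. That matches the suggestion at 16:14 to drop the extra unref in the caller, and, as noted at 16:24, it would only silence the use-after-free reports, not explain why the allocation failed in the first place.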
16:35karolherbst: ahh crap
16:35karolherbst: messed up sending the email :/
16:36karolherbst: https://lists.freedesktop.org/archives/nouveau/2021-May/038659.html
16:36imirkin: in what way did you mess it up?
16:37karolherbst: imirkin: didn't send it to the nouveau ML
16:37imirkin: uhm
16:37imirkin: seems to be on the nouveau ML just fine
16:37imirkin: do you mean dri-devel?
16:37karolherbst: imirkin: what you see is my second try
16:37imirkin: oh :)
19:05pmoreau: I see someone played with some CI 🙂; I’ll need to look at that tomorrow and see what is going on.
19:06imirkin: hm?
19:07pmoreau: “Ci Scripts | ci: the ci (!1)” and a couple other similar MRs
19:09pmoreau: https://gitlab.freedesktop.org/nouveau/ci-scripts
19:40karolherbst: :D
19:41karolherbst: yeah..
19:41karolherbst: playing around with how people could submit patches via gitlab with a proper CI pipeline
19:41karolherbst: atm just checkpatch there but also want to add build testing with various configs
20:09pmoreau: 👍️
20:09pmoreau: Do you need runners for the build testing?
20:09karolherbst: not yet
20:09pmoreau: Ok
20:09karolherbst: we can use the runners we get from fdo for the CPU based testing like building stuff
20:09karolherbst: hw testing will be more interesting
20:20karolherbst: pmoreau: main issue is infrastructure though. We have to make sure all forks are on the same instance, so that forking doesn't take 20 minutes :D