08:51 qq[IrcCity]: hello. After suspend I get a garbage screen (https://www.flickr.com/photos/187764219@N06/51171770426/) on a GeForce 8400GS with nouveau 1.0.17-1.
08:55 qq[IrcCity]: Xorg is the latest (1.20.11-1), Linux is also the latest. No other symptoms, nothing suspicious in Xorg.0.log.
08:57 qq[IrcCity]: This GeForce has served me for years (under Linux 4.3).
08:59 qq[IrcCity]: I am not sure that power management works correctly on my box, but I have no idea how to isolate the problem.
09:00 qq[IrcCity]: And wait, the framebuffer (that is, the text consoles) is fine.
09:15 qq[IrcCity]: Amazingly, the mouse pointer is present (you can see it in the upper left corner of flickr.com/photos/187764219@N06/51171770426/) and responsive, but everything else in X is garbage.
09:23 qq[IrcCity]: More precisely, the box was in a text console (fb) when I suspended it. The text screen resumed without problems, but the next switch to X produced the garbage screen above.
09:26 qq[IrcCity]: Should I try to debug the failed X server (while it is still running)?
10:26 RSpliet: qq[IrcCity]: better drop a copy of your Xorg.0.log and dmesg on a paste website and share the URL(s) here
11:59 qq[IrcCity]: RSpliet: http://www.superstructure.info/linux/5.12/artix-dmesg.txt interesting things can be found at [ 315.428228], [ 484.388585] and [ 4181.027388].
12:00 qq[IrcCity]: I'm beginning to suspect that I booted a poor kernel.
12:01 qq[IrcCity]: poor quality, that is, flawed.
12:39 qq[IrcCity]: RSpliet: should I now kill Xorg and look for more kernel crap? Or do something with /dev/fb0, /dev/dri/card0 or whatever?
15:57 RSpliet: qq[IrcCity]: sorry, had a busy day, I'm not really a nouveau dev anymore. But experience tells me that that looks bad
15:58 qq[IrcCity]: These kernel messages?
15:58 RSpliet: karolherbst: can you look at that log please? http://www.superstructure.info/linux/5.12/artix-dmesg.txt
15:58 RSpliet: yep
15:59 RSpliet: looks like buffer management is going udders-up.
16:01 qq[IrcCity]: Does it look like careless programming in the driver, rather than mere memory corruption (possibly caused by unrelated bugs)?
16:07 karolherbst: RSpliet: ohhh
16:07 karolherbst: okay
16:07 karolherbst: I see it
16:07 karolherbst: sooo
16:08 karolherbst: nouveau_bo_init fails
16:08 karolherbst: and we want to clean it up
16:08 karolherbst: but we don't have to
16:08 karolherbst: I think I saw a patch...
16:08 karolherbst: let's see
16:09 karolherbst: RSpliet: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.13-rc1&id=925681454d7b557d404b5d28ef4469fac1b2e105 :D
16:10 karolherbst: but that fixes something a bit above that
16:10 karolherbst: maybe we need to do something similar
16:11 karolherbst: ahh yeah
16:12 karolherbst: ttm_bo_release is called by ttm_bo_init
16:13 karolherbst: yep
16:14 karolherbst: qq[IrcCity]: you are the user hitting this?
16:14 karolherbst: mind removing the call to nouveau_bo_ref inside nouveau_gem_new?
16:14 karolherbst: this should be enough to fix it
16:14 qq[IrcCity]: yes, I reported this today.
16:15 qq[IrcCity]: Is the git.kernel.org thing a tentative fix for the buffer eviction crap?
16:15 karolherbst: qq[IrcCity]: it's a fix for a similar issue I fixed in the past
16:20 qq[IrcCity]: karolherbst: so you suspect my bug results from another instance of a bad nouveau_bo_ref? In another routine?
16:21 qq[IrcCity]: Or not necessarily nouveau_bo_ref?
16:21 karolherbst: I have no idea what your bug is all about, just that the kernel is doing something stupid
16:23 qq[IrcCity]: The garbage really looked like an unrelated chunk of memory got mapped to the framebuffer.
16:23 karolherbst: or you are out of VRAM and crap happens
16:24 karolherbst: the user after free is already after your issue
16:24 karolherbst: something happens before that, so allocation fails
16:24 karolherbst: fixing this will fix the user after free messages, but probably not your actual issue
16:25 qq[IrcCity]: the use, you mean?
16:25 karolherbst: yes
16:35 karolherbst: ahh crap
16:35 karolherbst: messed up sending the email :/
16:36 karolherbst: https://lists.freedesktop.org/archives/nouveau/2021-May/038659.html
16:36 imirkin: in what way did you mess it up?
16:37 karolherbst: imirkin: didn't send it to the nouveau ML
16:37 imirkin: uhm
16:37 imirkin: seems to be on the nouveau ML just fine
16:37 imirkin: do you mean dri-devel?
16:37 karolherbst: imirkin: what you see is my second try
16:37 imirkin: oh :)
19:05 pmoreau: I see someone played with some CI 🙂; I’ll need to look at that tomorrow and see what is going on.
19:06 imirkin: hm?
19:07 pmoreau: “Ci Scripts | ci: the ci (!1)” and a couple other similar MRs
19:09 pmoreau: https://gitlab.freedesktop.org/nouveau/ci-scripts
19:40 karolherbst: :D
19:41 karolherbst: yeah..
19:41 karolherbst: playing around with how people could submit patches via gitlab with a proper CI pipeline
19:41 karolherbst: atm just checkpatch there but also want to add build testing with various configs
20:09 pmoreau: 👍️
20:09 pmoreau: Do you need runners for the build testing?
20:09 karolherbst: not yet
20:09 pmoreau: Ok
20:09 karolherbst: we can use the runners we get from fdo for the CPU based testing like building stuff
20:09 karolherbst: hw testing will be more interesting
20:20 karolherbst: pmoreau: main issue is infrastructure though. We have to make sure all forks are on the same instance, so that forking doesn't take 20 minutes :D