00:09ccr: hmm. I may have found the cause and a kludge-fix of https://gitlab.freedesktop.org/drm/nouveau/-/issues/11 ..
00:42imirkin: ccr: wow, subtle!
00:43imirkin: i had reviewed that commit and also noticed something weird about it. but also couldn't really convince myself that there was any real problem.
00:43ccr: hopefully someone with more understanding can do a cleaner fix, but \:D/
00:43imirkin: skeggsb: --^
00:44imirkin: and danvet, who appears to be offline atm...
00:50skeggsb: i'll take a closer look later this arvo
00:51imirkin: (that's "afternoon" for those who don't speak aussie)
00:53ccr: to my non-kernel-dev eye that whole "revalidate" loop logic seems a bit sketchy, not sure why it is needed .. but shrug
00:57imirkin: orbea: i'm not aware of any such issues in the nouveau ddx
00:58imirkin: orbea: DRI3 having issues with nouveau ddx is a known thing
00:58imirkin: unfortunately EXA and DRI3 aren't really compatible
01:34orbea: imirkin: the right click menu in firefox was a red herring, it came back with modesetting. I guess it just doesn't reproduce right away
01:34orbea: the main issues seem to be all nine related
01:35orbea: with DRI3 i would change xorg sessions and when I came back, the nine game would have frozen video (input still worked)
01:35orbea: with DRI2 it doesn't work at all so :shrug:
01:35imirkin: frozen forever, or just for a bit?
01:36orbea: i had to restart the game
01:36imirkin: also when you "change" sessions, what does that mean precisely?
01:36orbea: i start the game from a empty tty with: xinit /path/to/game.sh -- :1 vt2
01:37orbea: and then vtswitch between xorg sessions
01:37imirkin: i see
01:37imirkin: so like whole different xorg servers running on different vt's
01:37imirkin: wow yeah, can't imagine what that does
01:37imirkin: i did actually fix some stuff
01:37imirkin: that i keep forgetting to release a new ddx for
01:37imirkin: which *might* hit that case
01:38orbea: i could try the 9999 package in a bit assuming it exists
01:38imirkin: if there's a xf86-video-nouveau-9999, give that a shot
01:38orbea: im busy redoing wine which I screwed up...
01:38orbea: i'll let you know later when that is done
01:38imirkin: thankfully nouveau is a bit faster to build :)
01:38orbea: yes :)
01:39imirkin: that commit specifically i think *might* affect you
01:39imirkin: i saw it more with GPUs with very little vram, but the xserver switch could trigger something similar i think
01:40imirkin: (the original reports were on a NV5 gpu with 32mb vram)
01:41orbea: worth checking at least
04:05orbea:doesn't get why gentoo has xorg-drivers 9999 which builds ddx drivers which are clearly not the lastest git commit...
04:05orbea: and I see no 9999 here https://packages.gentoo.org/packages/x11-drivers/xf86-video-nouveau
10:03ccr: danvet, hi. sorry to pester you, but since you authored the commit (referenced in the issue) which caused this problem ( https://gitlab.freedesktop.org/drm/nouveau/-/issues/11 ) perhaps you could look at the ugly "fix" I came up with and perhaps figure out a better solution? :)
10:15danvet: ccr, apologies for entirely missing it
10:16danvet: your fix looks correct
10:16danvet: can you pls submit it to skeggsb and mailing lists, with all the usual things included?
10:17ccr: danvet, I am wondering about the whole "revalidate" loop logic in there, though .. do you remember why it was added in? is there chance that the data being validated could actually change?
10:17danvet: https://dri.freedesktop.org/docs/drm/process/5.Posting.html?highlight=submitting%20patches in case you need extensive howto
10:17ccr: (of course this is just my uninformed view, as I don't know anything about this)
10:18danvet: ccr, do you mean the goto revalidate?
10:18danvet: yeah it's not pretty, but it's needed
10:18danvet: it's a locking inversion problem
10:19danvet: validate is a bit strangely named, it makes sure the buffers are in the right place and stay there
10:19danvet: and to stay there we hold the buffer locks
10:19danvet: but while holding buffer locks you can't take a page fault
10:19danvet: and copying the relocations might cause a page fault
10:20danvet: but since userspace just wrote these relocations, it's very unlikely
10:20danvet: and doing the dummy copy would be a bit wasted
10:20danvet: so hence we just try, and in the unlikely case it fails, we go the slow way
10:20danvet: if I remember this all correctly
10:21ccr: I see.
10:21ccr: another question .. that reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc)); does call copy_from_user() inside it, also inside the fences, is that not harmful? or is it not affected by the fence?
10:22ccr: sorry, my terminology is probably wrong
10:22ccr: but on the patch messages it says "We can't copy_*_user while holding reservations" etc
10:23ccr: and to my unrefined eye that code does copy_from_user() inside the same code region where you moved the logic around to avoid it in the other case
10:24ccr: but alas I don't fully understand what is being held and what requires it, so perhaps it is ok.
10:26ccr:is sorry for asking silly questions. :)
11:33danvet: ccr, it's not silly, since from a quick look I don't see how it works exactly anymore either
11:34ccr: heh :)
11:35ccr: well, it took a year for anyone (?) to notice that there was a problem in the first place .. or at least until I noticed. so no blame on you for not remembering. :D
11:35danvet: ccr, ah right there's no fastpath
11:35danvet: we just say "sucks to be using relocs"
11:35danvet: drop the locks again in validate_fini
11:35danvet: do the copying
11:35danvet: then restart
11:35danvet: could do better, but iirc review said "meh"
11:35danvet: so it's not quite as clever as I described
11:38ccr: well, as long as it's not going to cause deadlocks, I'm fine with it .. just wanted to know if it was something you accidentally missed.
11:39ccr: can't stress enough how little I understand of this anyway.
11:42ccr: I'll try to formulate a My First Kernel Patch(tm) with stuff later today. I wonder if I should squash the two things together into one, as they are kinda related and rather trivial, or submit separately.
11:43danvet: ccr, which two things?
11:43danvet: oh and feel free to pastebin the patch here for some pre-review before submitting
11:44danvet: just ping me
11:44ccr: the u_free() double-free mentioned in the issue and this apply_relocs
11:58ccr: e.g. https://gitlab.freedesktop.org/drm/nouveau/-/issues/11#note_699195
11:59c_nix: hooray! I finally type channel name correctly!
12:00c_nix: I got a following question does display port work on quadro cards ok typically?
12:05karolherbst: typically yes
12:11c_nix: thank you
16:12TimurTabi: If the GSP were to issue an interrupt, is there already a function that will receive it?
16:12TimurTabi: Oops, wrong window
22:53ccr: ah, danvet gone again. anyway .. https://tnsp.org/~ccr/0001-drm-nouveau-fix-relocations-applying-logic-and-a-dou.patch
23:10ccr: the above ^ is my first kernel patch, so I'm open for input in case the description etc. seem wrong/verbose/terse or has something missing/formal stuff. added CC: skeggsb as requested by danvet and nouveau mailing list etc.