10:32 karolherbst: imirkin: if you've got some time, a user is reporting issues with gtk4
10:33 karolherbst: uhm.. on nv50
11:54 HerrSpliet: imirkin: remember when I said I would be happy to do some HDR/4K experiments once I get my new monitor?
11:54 HerrSpliet: About that... that monitor's been delayed by 3 weeks already
11:54 * HerrSpliet cries in Brexit
11:58 karolherbst: RSpliet: have fun :p
11:59 karolherbst: but I do have an HDR/4K display, so I could try out things as well
15:23 imirkin: karolherbst: probably not today, but maybe on the weekend
15:58 biovoid: how does nouveau.noaccel=1 in the kernel differ from NoAccel in xorg.conf? xorg logs say "falling back to NoAccel" with the kernel param, but directly using NoAccel instead gives a substantial drop in performance
16:00 imirkin: noaccel doesn't allow userspace to submit any commands to the GPU
16:00 imirkin: (but still allows fbcon to be accelerated)
16:00 imirkin: NoAccel in the xorg driver is just telling the Xorg driver to use internal xorg rendering instead of accelerated commands
16:01 imirkin: their perf should be identical if you're just using xorg
16:07 biovoid: "just xorg" as opposed to what? the difference I am experiencing is substantial
16:07 imirkin: hm
16:07 imirkin: surprising
16:07 imirkin: which one's slower?
16:08 biovoid: NoAccel in xorg.conf is slower
16:08 imirkin: surprising
16:08 imirkin: don't have an explanation, sorry
16:09 biovoid: Well thanks for the prior insight regardless
16:09 imirkin: perhaps NoAccel enables some stuff which is slower than it needs to be
16:09 imirkin: dunno
16:29 imirkin: biovoid: what are you doing which is slower? i might try it out later
16:29 imirkin: biovoid: also can you confirm which ddx is getting used when the kernel has noaccel=1
16:34 biovoid: imirkin: I just loaded an Xfce session, and simple things like opening menus and moving windows were painfully chunky and sluggish
16:34 imirkin: ok thanks
16:34 imirkin: and what CPU / GPU?
16:35 biovoid: re ddx, think the answer you want is llvmpipe, but if you have a command to get a specific string I can do that
16:35 imirkin: llvmpipe is a gallium backend
16:35 imirkin: the ddx is the driver Xorg loads, which is completely unrelated to mesa
16:36 imirkin: you can work it out from the xorg log
16:36 imirkin: does it say like NOUVEAU(0) or modeset(0) (those are the likely options)
16:37 biovoid: Oh whoops. I see NOUVEAU(0)
16:37 imirkin: in both cases presumably?
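
For reference, the check described above (looking for a tag such as NOUVEAU(0) or modeset(0) in the Xorg log) could be scripted roughly like this. It is only a sketch: the sample log text is made up, the regex is an assumption about how the tagged lines look, and tags other than the ddx may show up in a real log.

```python
import re

# Assumed shape of a tagged Xorg log line: "(II) NOUVEAU(0): ..." or
# "(**) modeset(0): ...". Captures the tag in front of the screen number.
DDX_TAG = re.compile(r"\((?:II|WW|EE|\*\*|==|--|\+\+|!!)\)\s+([A-Za-z_]\w*)\(\d+\):")


def ddx_names(log_text):
    """Return the set of tags of the form NAME(n) found in Xorg log text."""
    return {m.group(1) for m in DDX_TAG.finditer(log_text)}


# Hypothetical excerpt, for illustration only (messages are placeholders).
SAMPLE = """\
[    12.345] (II) LoadModule: "nouveau"
[    12.400] (II) NOUVEAU(0): <message>
[    12.401] (**) NOUVEAU(0): <message>
"""

if __name__ == "__main__":
    print(ddx_names(SAMPLE))
```

On a real system the text would come from the Xorg log file instead of SAMPLE; its location varies by distro and is not given in the chat.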
16:37 biovoid: I'm running dual Opteron 4280 with GTX 670
16:38 imirkin: is that an ancient CPU? (sorry, less familiar with the opteron line), or is it approximately the same age as the GTX 670?
16:38 imirkin: so like ... ~6 years?
16:39 biovoid: iirc the opteron is 2013
16:40 biovoid: maybe 2011
16:41 biovoid: reasonably close to the 670's 2012
16:41 imirkin: ok. but not like the original line from 2006 or whatever :)
16:41 biovoid: heh nah
16:42 imirkin: wow, 670 was 2012? time flies.
16:42 imirkin: yeah, i guess maxwell came out in 2014 or so.
16:55 biovoid: Thinking of upgrading (slightly), but should I expect anything Kepler to require noaccel to function? (I get random freezes without it)
17:07 imirkin: kepler's the best-supported arch
17:07 imirkin: you get reclocking/etc
17:08 imirkin: but nouveau's not perfect, so some software may not play nice with its foibles
17:09 imirkin: i would not imagine a different arch would behave very differently for the same software
17:19 biovoid: To clarify, I'm looking for a configuration which does not require disabling hw accel. Or at least some insight as to why it is so commonly needed
17:21 imirkin: do you run lots of software whose name starts with a 'k'?
17:22 biovoid: You mean like KDE stuff? not particularly
17:23 imirkin: yes :)
17:23 imirkin: esp plasmashell
17:23 imirkin: which i suppose doesn't start with a 'k'
17:23 biovoid: Actually none of my explicitly installed packages start with k, as it so happens :P
17:23 imirkin: lol
17:24 biovoid: but no, no plasma
17:24 imirkin: not even kpathsea?
17:24 imirkin: (which has nothing to do with kde)
17:24 imirkin: anyways, having common hangs is rather unexpected
17:25 imirkin: esp if you're not doing anything fancy
17:25 biovoid: I'm on Parabola—kpathsea appears to be bundled in texlive
17:25 imirkin: there are some GTX 660's which play particularly poorly with nouveau
17:25 imirkin: (but not all GTX 660's)
17:25 imirkin: but i've never heard of that with GTX 670
17:26 biovoid: I've had hangs when literally doing nothing in X
17:26 biovoid: I mean the session existed, but I'd be remoted in and suddenly unresponsive
17:26 imirkin: is there any additional info on top of "it hangs"?
17:26 imirkin: huh
17:26 imirkin: anything in dmesg?
17:27 imirkin: also any type of custom configuration in the xorg.conf for the nouveau driver?
17:29 biovoid: From HangDiagnosis, previously I had problems at 4, but recent attempts seem to stop at 3 (can ssh but can't interact on the physical input)
17:29 biovoid: nothing in dmesg that I recall; I will double-check
17:31 biovoid: minimal xorg.conf—just the Device section with Identifier and Driver
17:32 biovoid: I have GXVBlank on right now but it didn't seem to affect this
17:32 biovoid: GLXV*
17:41 biovoid: I found some saved dmesg logs, but they are several configurations stale; I will get a fresh one later
17:58 imirkin: biovoid: ok =/
17:59 imirkin: definitely weird to get hangs if you're not doing anything
18:08 biovoid: I'm back under my unstable configuration; I will report back if/when it fails
18:12 imirkin: you might consider doing one full reclock
18:12 imirkin: it could be that the default boot clocks produce some sort of instability under load
18:12 imirkin: the blob driver insta-reclocks on load
18:13 imirkin: you can see the list of levels (pstates) in /sys/kernel/debug/dri/0/pstate
18:13 imirkin: and switch to one by echo'ing the id into that file
18:13 imirkin: this is likely to cause a brief flicker
18:15 biovoid: wait, echo back into the same file?
18:16 imirkin: yes
18:16 imirkin: but just the level
18:16 imirkin: e.g. 7
18:16 imirkin: or f or whatever
18:16 biovoid: I have five 2-digit options
18:16 imirkin: (diff boards have diff levels)
18:16 imirkin: you can put the leading 0 in, no problem
18:21 biovoid: imirkin: thanks—I'll wait for a failure as things are, then I'll try cycling through those states
18:22 imirkin: on those boards, usually 7 should be default (look at "AC" for current clocks)
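
A rough sketch of the pstate procedure described above: read the list of levels, find the "AC" row for the current clocks, and write a bare level id back into the same file. The path is the one quoted in the chat. The "id: clocks" line layout, the sample clock values, and the handling of a "DC" row are assumptions made for illustration, and writing to the file needs root.

```python
PSTATE_PATH = "/sys/kernel/debug/dri/0/pstate"  # path quoted in the chat


def parse_pstates(text):
    """Split a pstate listing into (levels, current_clocks).

    Assumes one "id: clocks" entry per line. The "AC" row is taken as the
    current clocks; a "DC" row, if present, is treated the same way
    (assumption).
    """
    levels, current = {}, None
    for line in text.splitlines():
        ident, sep, clocks = line.partition(":")
        ident = ident.strip()
        if not sep or not ident:
            continue
        if ident.upper() in ("AC", "DC"):
            current = clocks.strip()
        else:
            levels[ident] = clocks.strip()
    return levels, current


def set_pstate(level, path=PSTATE_PATH):
    """Write only the level id (e.g. "07" or "0f") into the pstate file."""
    with open(path, "w") as f:
        f.write(level)


# Hypothetical listing; the ids and clock values are illustrative only.
SAMPLE = """\
07: core 324 MHz memory 648 MHz
0a: core 324-862 MHz memory 1620 MHz
0f: core 562-1202 MHz memory 6008 MHz
AC: core 324 MHz memory 648 MHz
"""

if __name__ == "__main__":
    levels, current = parse_pstates(SAMPLE)
    print(sorted(levels), "| current:", current)
```

Cycling through the states would then be a loop over the parsed ids calling set_pstate on each, with a pause in between to watch for the flicker and any instability.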
18:29 biovoid_: well that didn't take long
18:29 imirkin: wow
18:29 imirkin: that bad??
18:29 imirkin: anything in dmesg?
18:30 imirkin: like a SCHED_ERROR or CTXSW_TIMEOUT?
18:42 biovoid: I have a bunch of general protection faults
18:45 imirkin: can you pastebin?
18:47 imirkin: (general protection makes it sound like cpu-side rather than gpu-side issues)
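
The distinction drawn here (SCHED_ERROR / CTXSW_TIMEOUT pointing at the GPU, a general protection fault pointing at the CPU side) can be applied to a saved dmesg with a small filter. This is a sketch only: the marker strings are the ones named in the chat, the grouping is a rule of thumb rather than a diagnosis, and the sample lines are hypothetical.

```python
GPU_MARKERS = ("SCHED_ERROR", "CTXSW_TIMEOUT")   # named in the chat
CPU_MARKERS = ("general protection fault",)      # named in the chat


def classify(line):
    """Return "gpu", "cpu", or None depending on which marker a line contains."""
    if any(marker in line for marker in GPU_MARKERS):
        return "gpu"
    if any(marker in line for marker in CPU_MARKERS):
        return "cpu"
    return None


def summarize(log_text):
    """Count marker hits per side across a whole log."""
    counts = {"gpu": 0, "cpu": 0}
    for line in log_text.splitlines():
        side = classify(line)
        if side is not None:
            counts[side] += 1
    return counts


# Hypothetical log lines, for illustration only.
SAMPLE = """\
[  100.000000] nouveau 0000:01:00.0: fifo: SCHED_ERROR <details>
[  200.000000] general protection fault <details>
[  300.000000] some unrelated message
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```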
18:47 biovoid: I see v4l2 warnings alongside... now that I think of it, I have a bttv card installed (not in use though)—I wonder if that has anything to do with it
18:48 imirkin: i added some Xv sse acceleration logic a long while back
18:48 imirkin: perhaps i screwed up the enablement of it
18:55 biovoid: pastebin said it was too long but I've hosted it here: https://mirrod.in/kernel.log
18:55 imirkin: whoa
18:56 biovoid: ?
18:56 imirkin: yeah, that would definitely seem like v4l is unhappy
18:56 imirkin: those are just WARNING's
18:56 imirkin: but probably not great
18:56 biovoid: Yeah, but I don't see anything that looks like a showstopper
18:57 imirkin: if nothing else, i'd nuke it for now to clean up the logs
18:57 imirkin: if that's an option
18:57 imirkin: ok
18:57 imirkin: so you really do have a nouveau problem
18:57 biovoid: what'd you find?
18:58 imirkin: at the end
18:58 imirkin: http://paste.debian.net/plainh/e8f7f82f
18:59 imirkin: both in kmem_cache_alloc_trace, whatever that is
18:59 imirkin: honestly i'd kill the v4l stuff for now, see if the issues still happen
18:59 imirkin: i dunno what those warnings are, but they can't be good
19:00 imirkin: note that another thread also dies in kmem_cache_alloc_trace, so perhaps nouveau is just a big user who gets caught up in the troubles
19:01 RSpliet: biovoid: perhaps out of abundance, but... is your DRAM alright?
19:02 biovoid: I don't understand the question
19:03 biovoid: oh the `DRAM ECC disabled` thing
19:03 biovoid: no clue
19:03 RSpliet: When there's so many random warnings in your logs, I'd also make sure your hardware is still alright. DRAM can break, and usually it's specific cells that break
19:03 RSpliet: Oh not even that
19:03 RSpliet: ECC is quite often disabled
19:03 imirkin: it's not random logs though
19:03 imirkin: there's one thing which is logging a lot
19:03 imirkin: seemingly some issue in v4l_usercopy or something
19:03 imirkin: perhaps it's doing something wrong or illegal, dunno
19:04 imirkin: actually i guess in v4l_querycap, so maybe bttv driver needs an update
19:04 imirkin: or maybe it needs firmware and you're using some weird distro which requires firmware to be shipped on the ROM of a board rather than uploaded from disk?
19:06 biovoid: this is Linux-libre, if that's what you mean
19:06 imirkin: yes, that is what i mean ;)
19:06 imirkin: iirc bttv stuff needs firmware
19:08 biovoid: Interesting; I have used the card successfully in the past with no issue
19:08 biovoid: Well, not from its active use
19:09 imirkin: maybe not, and i'm wrong
19:09 imirkin: or maybe you weren't using linux-libre at the time
19:12 biovoid: Hm I thought that was under this install but the logs disagree, so you might be right there
19:13 imirkin: last i used bttv proper was ... 2003? something like that
19:13 imirkin: so my memory isn't so crisp
19:42 biovoid: I've blacklisted bttv; no v4l spam yet... but we'll see if I'm done hanging
19:42 imirkin: :)
19:42 imirkin: with nouveau, hangs always lurk
19:42 biovoid: For what it's worth, the card still works under linux-libre ;)
21:45 biovoid: Still hangs, without a sign of v4l in dmesg
21:46 biovoid: only two lines logged at the time of crash
21:46 biovoid: general protection fault, probably for non-canonical address 0x45fb618697bef064: 0000 [#1] PREEMPT SMP NOPTI
21:46 biovoid: CPU: 4 PID: 741 Comm: Xorg Not tainted 5.10.6-gnu-1 #1
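
On the "non-canonical address" wording in that fault line: on x86-64, an address is canonical when its upper bits are all copies of the top implemented virtual-address bit. The sketch below checks the reported value under the assumption of 48-bit virtual addresses (4-level paging); the width actually used on this machine is not stated in the chat.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63 down to (va_bits - 1) of a 64-bit address are all equal.

    va_bits=48 is an assumption (4-level paging); 57 would model 5-level paging.
    """
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


if __name__ == "__main__":
    # Address copied from the fault line above.
    print(is_canonical(0x45FB618697BEF064))
    # Two reference values: top of the lower half and a sign-extended upper-half address.
    print(is_canonical(0x00007FFFFFFFFFFF))
    print(is_canonical(0xFFFF888000000000))
```

The reported value fails the check for either width, which fits a pointer that has been overwritten with garbage rather than a valid kernel or user address.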
21:47 imirkin: can you include the errors in a pastebin somewhere?
21:48 biovoid: What errors?
21:48 imirkin: that error
21:48 imirkin: wait, those were literally the only 2 lines?
21:48 imirkin: there wasn't some long trace?
21:49 imirkin: this is not an error mode i've ever seen tbh
21:49 biovoid: Not at the time of the crash, no
21:49 imirkin: while it's usually a safe bet to blame stuff on nouveau
21:50 imirkin: i'm not 100% sure that it's a nouveau issue
21:58 biovoid: For the moment I'm still playing with pstates I guess... Will also probably try other hardware, but I don't really know where else to look
22:01 biovoid: Oh I don't have drm.debug=14
22:29 biovoid: alright, I got something this time
22:30 biovoid: several traces after the `protection fault` line
22:30 biovoid: put the whole thing on https://mirrod.in/kernel.log
22:30 biovoid: see: nouveau_fence_new+0x33/0xb0
22:42 imirkin: biovoid: i don't think this is a GPU-side issue
22:43 imirkin: biovoid: so you can ignore the advice on pstates
22:45 biovoid: I think I've tried them all anyway now, ha
22:56 imirkin: biovoid: note that btrfs runs into the same problem
22:56 imirkin: i really don't think this is a nouveau issue, just that nouveau gets hosed by something
22:57 imirkin: i'd search around for kmem_cache_alloc_trace
22:57 imirkin: maybe others have run into those kinds of problems
22:57 imirkin: biovoid: someone else had that sort of issue: https://bugzilla.redhat.com/show_bug.cgi?id=1849198
22:58 imirkin: looks like it broke in some kernel, and is working again in some later kernel
22:59 imirkin: someone else with a similar problem: https://lore.kernel.org/lkml/333cfb75-1769-c67f-c56f-c9458368751a@molgen.mpg.de/
23:02 biovoid: Mmm I see