10:32karolherbst: imirkin: if you've got some time, some users report issues with gtk4
10:33karolherbst: uhm.. on nv50
11:54HerrSpliet: imirkin: remember when I said I would be happy to do some HDR/4K experiments once I get my new monitor?
11:54HerrSpliet: About that... that monitor's been delayed by 3 weeks already
11:54* HerrSpliet cries in Brexit
11:58karolherbst: RSpliet: have fun :p
11:59karolherbst: but I do have an HDR/4K display, so I could try out things as well
15:23imirkin: karolherbst: probably not today, but maybe on the weekend
15:58biovoid: how does nouveau.noaccel=1 in the kernel differ from NoAccel in xorg.conf? xorg logs say "falling back to NoAccel" with the kernel param, but directly using NoAccel instead gives a substantial drop in performance
16:00imirkin: noaccel doesn't allow userspace to submit any commands to the GPU
16:00imirkin: (but still allows fbcon to be accelerated)
16:00imirkin: NoAccel in the xorg driver is just telling the Xorg driver to use internal xorg rendering instead of accelerated commands
16:01imirkin: their perf should be identical if you're just using xorg
16:07biovoid: "just xorg" as opposed to what? the difference I am experiencing is substantial
16:07imirkin: which one's slower?
16:08biovoid: NoAccel in xorg.conf is slower
16:08imirkin: don't have an explanation, sorry
16:09biovoid: Well thanks for the prior insight regardless
16:09imirkin: perhaps NoAccel enables some stuff which is slower than it needs to be
16:29imirkin: biovoid: what are you doing which is slower? i might try it out later
16:29imirkin: biovoid: also can you confirm which ddx is getting used when the kernel has noaccel=1
16:34biovoid: imirkin: I just loaded an Xfce session, and simple things like opening menus and moving windows were painfully chunky and sluggish
16:34imirkin: ok thanks
16:34imirkin: and what CPU / GPU?
16:35biovoid: re ddx, think the answer you want is llvmpipe, but if you have a command to get a specific string I can do that
16:35imirkin: llvmpipe is a gallium backend
16:35imirkin: the ddx is the driver Xorg loads, which is completely unrelated to mesa
16:36imirkin: you can work it out from the xorg log
16:36imirkin: does it say like NOUVEAU(0) or modeset(0) (those are the likely options)
16:37biovoid: Oh whoops. I see NOUVEAU(0)
16:37imirkin: in both cases presumably?
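The check imirkin describes (look for NOUVEAU(0) or modeset(0) in the Xorg log) could be scripted roughly as below. This is only a sketch: the function name `ddx_in_log` and the regular expression are illustrative assumptions, and the log location (commonly /var/log/Xorg.0.log) varies between distributions.

```python
import re
from collections import Counter

# Assumption: DDX messages look like "(II) NOUVEAU(0): ..." or
# "(WW) modeset(0): ...", i.e. a message-type marker, the driver name,
# and the screen number in parentheses followed by a colon.
_DDX_LINE = re.compile(
    r"\((?:II|WW|EE|\*\*|==|--|\+\+|!!)\)\s+([A-Za-z0-9_]+)\((\d+)\):"
)


def ddx_in_log(log_text):
    """Count log lines per (driver name, screen number) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _DDX_LINE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def read_xorg_log(path="/var/log/Xorg.0.log"):
    """Read the log; the default path is an assumption, adjust as needed."""
    with open(path, errors="replace") as handle:
        return handle.read()
```

Running `ddx_in_log(read_xorg_log())` once with the kernel parameter and once with the xorg.conf option would show whether the same DDX is loaded in both cases.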
16:37biovoid: I'm running dual Opteron 4280 with GTX 670
16:38imirkin: is that an ancient CPU? (sorry, less familiar with the opteron line), or is it approximately the same age as the GTX 670?
16:38imirkin: so like ... ~6 years?
16:39biovoid: iirc the opteron is 2013
16:40biovoid: maybe 2011
16:41biovoid: reasonably close to the 670's 2012
16:41imirkin: ok. but not like the original line from 2006 or whatever :)
16:41biovoid: heh nah
16:42imirkin: wow, 670 was 2012? time flies.
16:42imirkin: yeah, i guess maxwell came out in 2014 or so.
16:55biovoid: Thinking of upgrading (slightly), but should I expect anything Kepler to require noaccel to function? (I get random freezes without it)
17:07imirkin: kepler's the best-supported arch
17:07imirkin: you get reclocking/etc
17:08imirkin: but nouveau's not perfect, so some software may not play nice with its foibles
17:09imirkin: i would not imagine a different arch would behave very differently for the same software
17:19biovoid: To clarify, I'm looking for a configuration which does not require disabling hw accel. Or at least some insight as to why it is so commonly needed
17:21imirkin: do you run lots of software whose name starts with a 'k'?
17:22biovoid: You mean like KDE stuff? not particularly
17:23imirkin: yes :)
17:23imirkin: esp plasmashell
17:23imirkin: which i suppose doesn't start with a 'k'
17:23biovoid: Actually none of my explicitly installed packages start with k, as it so happens :P
17:24biovoid: but no, no plasma
17:24imirkin: not even kpathsea?
17:24imirkin: (which has nothing to do with kde)
17:24imirkin: anyways, having common hangs is rather unexpected
17:25imirkin: esp if you're not doing anything fancy
17:25biovoid: I'm on Parabola—kpathsea appears to be bundled in texlive
17:25imirkin: there are some GTX 660's which play particularly poorly with nouveau
17:25imirkin: (but not all GTX 660's)
17:25imirkin: but i've never heard of that with GTX 670
17:26biovoid: I've had hangs when literally doing nothing in X
17:26biovoid: I mean the session existed, but I'd be remoted in and suddenly unresponsive
17:26imirkin: is there any additional info on top of "it hangs"?
17:26imirkin: anything in dmesg?
17:27imirkin: also any type of custom configuration in the xorg.conf for the nouveau driver?
17:29biovoid: From HangDiagnosis, previously I had problems at 4, but recent attempts seem to stop at 3 (can ssh but can't interact on the physical input)
17:29biovoid: nothing in dmesg that I recall; I will double-check
17:31biovoid: minimal xorg.conf—just the Device section with Identifier and Driver
17:32biovoid: I have GXVBlank on right now but it didn't seem to affect this
17:41biovoid: I found some saved dmesg logs, but they are several configurations stale; I will get a fresh one later
17:58imirkin: biovoid: ok =/
17:59imirkin: definitely weird to get hangs if you're not doing anything
18:08biovoid: I'm back under my unstable configuration; I will report back if/when it fails
18:12imirkin: you might consider doing one full reclock
18:12imirkin: it could be that the default boot clocks produce some sort of instability under load
18:12imirkin: the blob driver insta-reclocks on load
18:13imirkin: you can see the list of levels (pstates) in /sys/kernel/debug/dri/0/pstate
18:13imirkin: and switch to one by echo'ing the id into that file
18:13imirkin: this is likely to cause a brief flicker
18:15biovoid: wait, echo back into the same file?
18:16imirkin: but just the level
18:16imirkin: e.g. 7
18:16imirkin: or f or whatever
18:16biovoid: I have five 2-digit options
18:16imirkin: (diff boards have diff levels)
18:16imirkin: you can put the leading 0 in, no problem
18:21biovoid: imirkin: thanks—I'll wait for a failure as things are, then I'll try cycling through those states
18:22imirkin: on those boards, usually 7 should be default (look at "AC" for current clocks)
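The pstate procedure described above (list the levels, check the "AC" line, write a level id back into the same file) might be sketched as follows. The file layout assumed here (lines such as "07: core ... memory ..." plus an "AC:" line for the current clocks) is an assumption that can differ between boards and kernels, and the function names are illustrative. Writing to the file needs root and a mounted debugfs.

```python
import re

# Path as given in the chat; the DRI index may be different on other systems.
PSTATE_PATH = "/sys/kernel/debug/dri/0/pstate"


def parse_pstates(text):
    """Return (list of level ids, current clocks taken from the AC line).

    Assumed layout: "<hex id>: <clock description>" per level, and an
    "AC: <clock description>" line describing the current clocks.
    """
    levels, current = [], None
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "AC":
            current = rest.strip()
        elif re.fullmatch(r"[0-9a-f]{1,2}", key):
            # Lowercase hex only, so "AC"/"DC" are never mistaken for levels.
            levels.append(key)
    return levels, current


def set_pstate(level, path=PSTATE_PATH):
    """Write a level id (e.g. "07" or "0f") into the pstate file.

    Equivalent to echo'ing the id into the file; expect a brief flicker.
    """
    with open(path, "w") as handle:
        handle.write(level + "\n")
```

A cautious way to use this is to call `parse_pstates` on the file contents first, note the current clocks, and only then try `set_pstate` with one of the listed ids.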
18:29biovoid_: well that didn't take long
18:29imirkin: that bad??
18:29imirkin: anything in dmesg?
18:30imirkin: like a SCHED_ERROR or CTXSW_TIMEOUT?
18:42biovoid: I have a bunch of general protection faults
18:45imirkin: can you pastebin?
18:47imirkin: (general protection makes it sound like cpu-side rather than gpu-side issues)
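As a rough illustration of the distinction drawn here, a saved dmesg could be sorted into GPU-side and CPU-side hints by searching for the strings mentioned in the chat. This is a sketch under the assumption that these substrings appear verbatim in the log; the marker lists and the name `classify_dmesg` are illustrative, not an exhaustive or authoritative classification.

```python
# Strings taken from the conversation; treating them as a complete list of
# GPU-side or CPU-side indicators is an assumption made for this sketch.
GPU_MARKERS = ("SCHED_ERROR", "CTXSW_TIMEOUT")
CPU_MARKERS = ("general protection fault",)


def classify_dmesg(log_text):
    """Group log lines by whether they contain a GPU-side or CPU-side marker."""
    hits = {"gpu": [], "cpu": []}
    for line in log_text.splitlines():
        if any(marker in line for marker in GPU_MARKERS):
            hits["gpu"].append(line)
        elif any(marker in line for marker in CPU_MARKERS):
            hits["cpu"].append(line)
    return hits
```

A log with only "cpu" hits and no "gpu" hits would be consistent with the suspicion that the GPU itself is not what is failing, though it does not prove it.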
18:47biovoid: I see v4l2 warnings alongside... now that I think of it, I have a bttv card installed (not in use though)—I wonder if that has anything to do with it
18:48imirkin: i added some Xv sse acceleration logic a long while back
18:48imirkin: perhaps i screwed up the enablement of it
18:55biovoid: pastebin said it was too long but I've hosted it here: https://mirrod.in/kernel.log
18:56imirkin: yeah, that would definitely seem like v4l is unhappy
18:56imirkin: those are just WARNING's
18:56imirkin: but probably not great
18:56biovoid: Yeah, but I don't see anything that looks like a showstopper
18:57imirkin: if nothing else, i'd nuke it for now to clean up the logs
18:57imirkin: if that's an option
18:57imirkin: so you really do have a nouveau problem
18:57biovoid: what'd you find?
18:58imirkin: at the end
18:59imirkin: both in kmem_cache_alloc_trace, whatever that is
18:59imirkin: honestly i'd kill the v4l stuff for now, see if the issues still happen
18:59imirkin: i dunno what those warnings are, but they can't be good
19:00imirkin: note that another thread also dies in kmem_cache_alloc_trace, so perhaps nouveau is just a big user who gets caught up in the troubles
19:01RSpliet: biovoid: perhaps out of abundance, but... is your DRAM alright?
19:02biovoid: I don't understand the question
19:03biovoid: oh the `DRAM ECC disabled` thing
19:03biovoid: no clue
19:03RSpliet: When there's so many random warnings in your logs, I'd also make sure your hardware is still alright. DRAM can break, and usually it's specific cells that break
19:03RSpliet: Oh not even that
19:03RSpliet: ECC is quite often disabled
19:03imirkin: it's not random logs though
19:03imirkin: there's one thing which is logging a lot
19:03imirkin: seemingly some issue in v4l_usercopy or something
19:03imirkin: perhaps it's doing something wrong or illegal, dunno
19:04imirkin: actually i guess in v4l_querycap, so maybe bttv driver needs an update
19:04imirkin: or maybe it needs firmware and you're using some weird distro which requires firmware to be shipped on the ROM of a board rather than uploaded from disk?
19:06biovoid: this is Linux-libre, if that's what you mean
19:06imirkin: yes, that is what i mean ;)
19:06imirkin: iirc bttv stuff needs firmware
19:08biovoid: Interesting; I have used the card successfully in the past with no issue
19:08biovoid: Well, not from its active use
19:09imirkin: maybe not, and i'm wrong
19:09imirkin: or maybe you weren't using linux-libre at the time
19:12biovoid: Hm I thought that was under this install but the logs disagree, so you might be right there
19:13imirkin: last i used bttv proper was ... 2003? something like that
19:13imirkin: so my memory isn't so crisp
19:42biovoid: I've blacklisted bttv; no v4l spam yet... but we'll see if I'm done hanging
19:42imirkin: with nouveau, hangs always lurk
19:42biovoid: For what it's worth, the card still works under linux-libre ;)
21:45biovoid: Still hangs, without a sign of v4l in dmesg
21:46biovoid: only two lines logged at the time of crash
21:46biovoid: general protection fault, probably for non-canonical address 0x45fb618697bef064: 0000 [#1] PREEMPT SMP NOPTI
21:46biovoid: CPU: 4 PID: 741 Comm: Xorg Not tainted 5.10.6-gnu-1 #1
21:47imirkin: can you include the errors in a pastebin somewhere?
21:48biovoid: What errors?
21:48imirkin: that error
21:48imirkin: wait, those were literally the only 2 lines?
21:48imirkin: there wasn't some long trace?
21:49imirkin: this is not an error mode i've ever seen tbh
21:49biovoid: Not at the time of the crash, no
21:49imirkin: while it's usually a safe bet to blame stuff on nouveau
21:50imirkin: i'm not 100% sure that it's a nouveau issue
21:58biovoid: For the moment I'm still playing with pstates I guess... Will also probably try other hardware, but I don't really know where else to look
22:01biovoid: Oh I don't have drm.debug=14
22:29biovoid: alright, I got something this time
22:30biovoid: several traces after the `protection fault` line
22:30biovoid: put the whole thing on https://mirrod.in/kernel.log
22:30biovoid: see: nouveau_fence_new+0x33/0xb0
22:42imirkin: biovoid: i don't think this is a GPU-side issue
22:43imirkin: biovoid: so you can ignore the advice on pstates
22:45biovoid: I think I've tried them all anyway now, ha
22:56imirkin: biovoid: note that btrfs runs into the same problem
22:56imirkin: i really don't think this is a nouveau issue, just that nouveau gets hosed by something
22:57imirkin: i'd search around for kmem_cache_alloc_trace
22:57imirkin: maybe others have run into those kinds of problems
22:57imirkin: biovoid: someone else had that sort of issue: https://bugzilla.redhat.com/show_bug.cgi?id=1849198
22:58imirkin: looks like it broke in some kernel, and is working again in some later kernel
22:59imirkin: someone else with a similar problem: https://firstname.lastname@example.org/
23:02biovoid: Mmm I see