07:48shadow: some weird errors are happening when I plug in a second display to this Dell Precision M6500 (NVIDIA G92 / Quadro FX 2800M / Quadro FX 3800M)
07:50shadow: I'm almost at the point of installing a copy of Windows OS (I haven't touched that in 15+ years!) just to see if it works on a supported OS
08:03shadow: should I begin filling out a bug report somewhere in a ticketing system, or does someone want to triage it with me first ?
12:06PaulePanter: Hi. My colleague told me that when Mesa is built with OpenSWR, the X.Org server fails to start when loading the Nouveau module.
12:07PaulePanter: The cause might be an added AVX dependency.
12:09karolherbst: shadow: do you have a kernel log or anything?
12:09karolherbst: PaulePanter: ohh, interesting
12:09karolherbst: I don't see how it can affect things.. but having logs might help?
12:11karolherbst: ahh yeah
12:11karolherbst: "Illegal instruction at address" mhh
12:11PaulePanter: He told me that when building without OpenSWR, two shared libraries (libavx* something) are not built, and then it works.
12:11karolherbst: but this is inside nouveau_dri.so ...
12:12karolherbst: maybe some compiler flags are added
12:12karolherbst: PaulePanter: what version of mesa?
12:14karolherbst: PaulePanter: mind filing a bug against mesa for this?
12:14PaulePanter: karolherbst: With 20.1, but my colleague said it’s also present with Mesa 19.3.
12:14PaulePanter: karolherbst: Will do. Thank you.
12:14karolherbst: I guess some compiler options make their way through gallium or just lto/function inlining whatever breaks stuff
12:14karolherbst: needs a deeper analysis probably
12:25imirkin: shadow: pastebin dmesg, probably?
12:26imirkin: shadow: did it ever work in linux?
13:53PaulePanter: karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3077
13:53PaulePanter: Thank you again.
14:08linkmauve: So I finally managed to bench the 1070 against the UHD630!
14:09linkmauve: Unsurprisingly (to me), it was beaten pretty easily by the Intel one.
14:09linkmauve: The lowest framerate I’ve seen in the intro of Metroid Prime in Dolphin was 36fps, vs. 139fps for Intel.
14:10linkmauve: That’s about four times faster.
16:21imirkin: linkmauve: i'm a bit surprised it got beaten by that much
16:21imirkin: i guess depends on resolution
16:21imirkin: intel (in this configuration) has WAY higher memory bandwidth
16:21imirkin: coz it's not clocked to 10mhz
16:22imirkin: or 50 or whatever it is
16:28RSpliet: Don't think DRAM is ever clocked below 100-150MHz
16:39karolherbst: yeah.. that was only done on fermi/tesla
16:39karolherbst: since kepler it's usually higher
16:39karolherbst: but still.. laughably slow
16:40imirkin: still 10-20x slower than what intel would be getting
16:40karolherbst: probably not
16:41karolherbst: well.. 10x but not 20
16:42karolherbst: seems like on my gp107 it's like 810 vs 7000 MHz
16:42imirkin: with higher end gpu's, iirc, the default is lower
16:43imirkin: would be interesting to see though
16:43karolherbst: not anymore, this was the case on earlier gens
16:43imirkin: ah ok
16:43karolherbst: but even high end keplers boot with high clocks
16:43karolherbst: with.. turing this all changed anyway
16:44karolherbst: as even memory clocks are set on a finer scale
16:44karolherbst: the 780 Ti I checked is 648 vs 7000
16:45karolherbst: so.. kind of in this 10x difference area
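A quick arithmetic check of the ratios quoted above, as a minimal sketch. The clock pairs (810 vs 7000 and 648 vs 7000) are taken from the messages; the function name `clock_ratio` is made up for illustration, and effective bandwidth also depends on bus width, which is not modeled here.

```python
def clock_ratio(boot_mhz, full_mhz):
    """Return how many times faster the full memory clock is than the boot clock."""
    return full_mhz / boot_mhz


# (label, boot clock, full clock) as quoted in the log
for name, boot, full in (("gp107", 810, 7000), ("780 Ti", 648, 7000)):
    print(f"{name}: {clock_ratio(boot, full):.1f}x")
```

Both cards land near the "10x" figure: roughly 8.6x for the gp107 and 10.8x for the 780 Ti.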
16:45imirkin: yeah ok
16:45imirkin: so really just 5x or so
16:45imirkin: since intel is slower
16:45imirkin: but probably 2-3ghz range?
16:45karolherbst: ddr4 can be fast
16:46karolherbst: like 4ghz easily
16:46imirkin: consumer chips meant for standard laptop/desktop + cpu?
16:46karolherbst: the default is around 3.8
16:46karolherbst: or 3.6
16:46imirkin: been a while since i looked at hardware i guess
16:46imirkin: i thought ddr4-2333 was the thing
16:46karolherbst: there are tons of cheap 3GHz ones though
16:47karolherbst: but normally for machines you use for workstations you just pick 3.6 or 3.8
16:47karolherbst: not 2133?
16:47karolherbst: well 2133 is the slowest anyway
16:47karolherbst: then 2400
16:48karolherbst: but the range is like 2133 up to 4800
16:48imirkin: but ultimately that's just the rating
16:48imirkin: it's the cpu (now) that drives it
16:48imirkin: it could be rated for 100thz
16:48karolherbst: yeah.. but most CPUs do XMP
16:48imirkin: which is?
16:48karolherbst: automatic RAM overclocking :p
16:49imirkin: i didn't realize we lived in the future
16:49imirkin: thanks for letting me know :)
16:49karolherbst: it is based on profiles
16:49karolherbst: so there is a "stock" speed like 3200
16:49karolherbst: and then the RAM and the CPU agree on XMP profiles both support
16:49imirkin: pfft, that's just peeking behind the curtain
16:49imirkin: simpler to treat it all as magic
16:49karolherbst: and clock up to 3800 or whatever
16:49imirkin: bigger number = better
16:49imirkin: MOAR FASTER
16:50karolherbst: Zen 2 supports up to 4266 :)
16:50karolherbst: beyond that you have to do your manual overclocking :p
18:56shadow: karolherbst: imirkin: I have persistence on in systemd log, and I can replicate the problem. When the problem occurs nouveau is freaking out and the system becomes mostly unusable (SysRq doesn't respond reliably)
18:58shadow: what I can see as a user is that Gnome display settings briefly shows the second monitor information after it has been plugged in, and then the screen goes mostly black except for showing what looks like a corrupted copy of the BIOS boot screen ("Press F12 for device selection" text is visible very small and that only happens at boot)
18:58shadow: the attached display never shows anything
18:59shadow: any tips or tricks for pulling out the info y'all would like to see from journalctl?
19:37RSpliet: Don't think Tesla clocked DRAM clocks that low either. Was really just Fermi
19:52shadow: RSpliet: what is "disp: ERROR 4 [INVALID_ARG]"? https://gist.github.com/eshattow/f59ca06b1d41a852ea7811c4049ac0a2
19:53RSpliet: No idea. Don't grep logs, there's useful information in the rest of the entries too
19:55shadow: RSpliet: okay, updated
20:04RSpliet: lyude: think you're the closest we're going to get to a display expert
20:04imirkin: you don't think skeggsb is closer?
20:05imirkin: (to a display expert)
20:05RSpliet: imirkin: he's further away... but he wrote the code, so probably ;-)
20:06imirkin: ah, so you mean distance from nvidia hq? :)
20:06imirkin: anyways, let's see what this means, gimme a sec
20:06shadow: imirkin: :)
20:07RSpliet: The method 700: Is it trying to set "pixel depth" (PIOR_SET_CONTROL) - which isn't supported on g92?
20:08imirkin: yeah, that's not in 827d
20:09imirkin: only there starting with 837d
20:09imirkin: 827d = g84, g86, g92
20:09imirkin: g94+ have ... well, something else
20:10imirkin: (diff ones have diff things)
20:10RSpliet: Yep, that's as far as I got. Been scanning through the open-gpu-doc a bit
20:10imirkin: yeah. same. but i know how to use it ;)
20:10imirkin: and know lots of the class ids offhand
20:10RSpliet: I'm learning as we speak :-)
20:10imirkin: although there's a nice README.txt file there
20:10imirkin: which tells you a lot of this info
20:11imirkin: so wtf is pixel depth? i forget. shadow is this a laptop and you're driving LVDS or eDP or something?
20:11RSpliet: Yep, that's how I thought it might be the pixel depth bits that shouldn't be there :-D
20:12shadow: imirkin: this is an older laptop (Dell Precision M6500) with displayport, and I just received a new monitor (BenQ EX3501R ultrawide 100hz)
20:13shadow: if it doesn't work, that's fine, or if it's not supposed to work. What's strange is the responsiveness of the computer slows to a crawl and I see error messages, and the computer is unusable. So I think possibly this is a bug and would like to do my part to provide any information needed.
20:14RSpliet: The other error is trying to set a pixel clock mode that doesn't exist...
20:14imirkin: shadow: no, it's fine. just trying to understand what kind of panel is being driven.
20:14imirkin: shadow: can you provide edid from that panel?
20:15imirkin: oh, it's in your log
20:16imirkin: but it's the wrong edid
20:16imirkin: rather, it's the edid for your main panel
20:17imirkin: shadow: ok, so it's trying to drive the DP panel at 30bpp
20:17imirkin: which it most likely supports, but your hw does not
20:17imirkin: shadow: there have been recent patches to actually force DP to be 8bit no matter what (24bpp)
20:18imirkin: (or less, like 18bpp for some internal panels, but not yours)
20:18imirkin: shadow: i expect if you grab kernel 5.7, everything will be fine
20:18imirkin: or if you set the "max bpc" property to 8 on the connector
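One way to try the "max bpc" suggestion from userspace is through xrandr's `--set` option, sketched here in Python. The property name "max bpc" is the one quoted above; the output name DP-1 is the one used later in this log. Whether the X driver in use actually exposes this property on the connector is an assumption, and `max_bpc_command` is a hypothetical helper that only builds the command line without running it.

```python
import shlex
from typing import List


def max_bpc_command(output: str, bpc: int) -> List[str]:
    """Build the xrandr argv that sets the "max bpc" property on one output."""
    if bpc <= 0:
        raise ValueError("bpc must be positive")
    # xrandr --output <name> --set <property> <value>
    return ["xrandr", "--output", output, "--set", "max bpc", str(bpc)]


cmd = max_bpc_command("DP-1", 8)
# Print a shell-safe version of the command; pass `cmd` to subprocess.run()
# to actually apply it on a running X session.
print(shlex.join(cmd))
```

Running `xrandr --prop` first shows whether the property is listed for the output at all.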
20:18shadow: really exciting to hear that, I will look for one now
20:18imirkin: OTOH, perhaps we always set the pixel depth value, and it will always complain. dunno.
20:19shadow: I'll look for a mainline kernel build to install on this Debian OS
20:20RSpliet: imirkin: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/dispnv50/pior507d.c line 31 is scarily not-very-conditional given that field doesn't exist on 507d
20:21imirkin: RSpliet: ehhh maybe. or maybe or.depth is supposed to be 0 on those GPUs. could go either way.
20:21imirkin: although probably isn't :)
20:21imirkin: so yeah. there should probably be a pior837d
20:22imirkin: and also a thing to make sure depth == "default" for earlier things
20:22imirkin: shadow: unclear if 5.7 will help you actually..
20:26RSpliet: The clock issue is probably "just" nouveau trying to drive a head at a frequency it doesn't support. Mode validation should reject that mode
20:28RSpliet: 639500KHz sounds a bit excessive
20:30imirkin: that's a wee high
20:30shadow: imirkin: I just tried with 5.7.0-rc5 as that was quick and easy to do, no change. Looking for a 5.7.0 final release
20:31RSpliet: shadow: these are bugs worth reporting to... oh, we don't have a bug tracker
20:32shadow: I did look for a bug tracker, but it seemed to be a broken trail of crumbs
20:32RSpliet: Bugzilla is extinct. Gitlab was the promised follow-up, but it's not really set up in a meaningful way
20:33imirkin: yeah, bz was nice
20:33imirkin: had the ux that worked for me (and lots of others, i think)
20:33imirkin: unfortunately had various security issues, and the maintainers threw up their hands and installed gitlab
20:34RSpliet: Querying bugs was a bit slow, but otherwise it worked fine.
20:35imirkin: most importantly it had email integration
20:35imirkin: e.g. bug updates would go to nouveau@, etc
20:35imirkin: and i see email as the great equalizer
20:35imirkin: gitlab is a closed community
20:35shadow: Updating that gist with output from 5.7.0-rc5 kernel
20:36imirkin: but apparently i'm in a minority of people who feel that way, so ... i lose
20:38imirkin: yeah, so it still hates the value "6", which makes sense. but also a bit odd that it still gets a 6 and not a different value
20:44imirkin: shadow: so i'm guessing that there are a few problems
20:44imirkin: like that the display is advertising much higher bw things than the gpu can handle, and somehow we're not filtering them out
20:51shadow: imirkin: that makes sense, I hardly expect this laptop's GPU to drive the two panels as an extended desktop, but I was hopeful for mirrored display in 1080p
20:51RSpliet: shadow: it would be possible once the bugs are fixed, although no idea what kind of maximum refresh rate it'd go to
20:52imirkin: shadow: yeah, but it immediately tries to drive it at like 100hz or whatever
20:52imirkin: and 1440p@100hz is ... a lot
20:52imirkin: i'm guessing
20:53RSpliet: imirkin: aren't there kernel params we can use to force nouveau into a certain mode for that head?
20:53imirkin: 542mhz, looking at the "standard" cvt mode
20:53imirkin: RSpliet: i don't think so
20:53shadow: :) it is a lot, the desktop build I am putting together with RX 5700XT (amdgpu) is not expected to be able to drive it at full rate most of the time
20:53imirkin: shadow: yeah, but when you plug in, i think the DE will just max it out by default.
20:54shadow: imirkin: this last attempt I was able to manage to ctrl-alt-F3, alt-F3, and get to a login on the vt 3; it was responsive enough then I could run "DISPLAY=:0 xrandr"
20:55shadow: maybe I could do that before as well, I'm open to suggestions if there is a workaround or something I may try
20:55RSpliet: shadow: your computer is very busy hearing the NVIDIA GPU nagging that the display block is in an illegal state, and not doing anything about it.
20:57RSpliet: imirkin: I guess the "force-feed DRM/xrandr an EDID with some default modes" doesn't work either?
21:01shadow: updated gist https://gist.github.com/eshattow/f59ca06b1d41a852ea7811c4049ac0a2 for 5.7.0 stable kernel, confirm issue remains.
21:01imirkin: shadow: unfortunately i don't have time to look into it
21:02shadow: imirkin: okay :)
21:02shadow: also xrandr output https://gist.github.com/eshattow/5969050adce1ae423954710e10c26c9e
21:03imirkin: mind doing xrandr --verbose
21:03imirkin: shadow: also mind doing like
21:03shadow: okay will try
21:03imirkin: xrandr --output DP-1 --mode 1920x1080 --right-of LVDS-1
21:03imirkin: er wait
21:03imirkin: don't do that
21:03shadow: maybe the 60hz one 1080i?
21:04imirkin: xrandr --output DP-1 --mode 1920x1080 --rate 60 --right-of LVDS-1
21:05imirkin: i don't know why we don't filter out those crazy high frequencies
21:05imirkin: would be interested in the edid (which should be seen with xrandr --verbose)
21:06karolherbst: imirkin: https://github.com/NVIDIA/open-gpu-doc/commit/7df700e91c0a9d07af30a7f826349865957b28ed
21:07imirkin: karolherbst: cool
21:07karolherbst: not the classes but finally some graphics stuff
21:07imirkin: the CLASS_ERROR_* stuff is nice
21:07karolherbst: also the blcg stuff
21:07karolherbst: imirkin: search for zcull :p
21:08imirkin: i'd rather not
21:08karolherbst: :D it's not that bad
21:08karolherbst: mostly just enable/disable bits and some counters
21:09karolherbst: NV_PGRAPH_PRI_FE_SEMAPHORE_STATE_D_REPORT_ZCULL_STATS0-3 might help implementing it
21:09karolherbst: NV_PGRAPH_CLASS_ERROR_CODE_ERROR_ZCULL_SUBREGION_LIMBO :D
21:11shadow: imirkin: updated gist for the xrandr --verbose output https://gist.github.com/eshattow/5969050adce1ae423954710e10c26c9e
21:12shadow: imirkin: "xrandr --output DP-1 --mode 1920x1080 --rate 60 --right-of LVDS-1" failed to set the display. I don't have the exact error; it was something like crts
21:13karolherbst: imirkin: NV_PGRAPH_TRAPPED_ADDR :O
21:13karolherbst: ohh wait
21:13karolherbst: we already use it :D
21:14imirkin: shadow: fyi, this is the decoded edid for your monitor: https://hastebin.com/ohocejexit.sql
21:14imirkin: oh wow, continuous frequency. never seen that before.
21:18RSpliet: Even the monitor specifies a max dotclock of 600MHz. Why is nouveau trying to set a clock of 639.5MHz?
21:18RSpliet: Is it cocking up its calculations?
21:19imirkin: it's also way out of range of what you can do even with 4 lanes
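The "4 lanes" remark can be checked with a rough link budget. This is a sketch under stated assumptions: the per-lane rates are the standard DisplayPort RBR/HBR/HBR2 figures with 8b/10b coding, which link rate this GPU actually supports is not stated in the log, and protocol overhead beyond the line coding is ignored. The 639500 kHz clock and the 30bpp/24bpp depths come from the discussion above; the function names are made up for illustration.

```python
# Raw per-lane link rates in Gbit/s (standard DisplayPort rates).
LANE_RATE_GBPS = {"RBR": 1.62, "HBR": 2.7, "HBR2": 5.4}


def dp_payload_gbps(rate, lanes=4):
    """Usable bandwidth after 8b/10b coding (8 payload bits per 10 on the wire)."""
    return LANE_RATE_GBPS[rate] * lanes * 8 / 10


def required_gbps(pixel_clock_khz, bpp):
    """Bandwidth needed to carry the active pixel stream."""
    return pixel_clock_khz * 1000 * bpp / 1e9


def fits(pixel_clock_khz, bpp, rate, lanes=4):
    return required_gbps(pixel_clock_khz, bpp) <= dp_payload_gbps(rate, lanes)


for bpp in (30, 24):
    need = required_gbps(639500, bpp)
    for rate in ("HBR", "HBR2"):
        have = dp_payload_gbps(rate)
        print(f"{bpp}bpp needs {need:.2f} Gbit/s, {rate} x4 gives {have:.2f}: "
              f"{'ok' if fits(639500, bpp, rate) else 'too much'}")
```

Under these assumptions a 639.5MHz mode at 30bpp needs about 19.2 Gbit/s, while four HBR lanes carry about 8.64 Gbit/s, so the mode is far out of range at either depth on an HBR link.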
21:19RSpliet: Or am I misinterpreting the clock field :-) thought it was simply in KHz
21:19imirkin: no, you're not
21:19imirkin: it might be in 10khz, i forget
21:19imirkin: either way, the numbers make no sense
21:20RSpliet: If it were 10khz, it would try to set 6.3GHz
21:20imirkin: but it's a 16-bit field ... or is it 21?
21:20RSpliet: I mean, wrong is wrong, there's no limits to how wrong it could be :-D
21:20RSpliet: It's larger
21:21RSpliet: 22 bits even
21:22imirkin: ok, so max value is ~4 million
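The field-width arithmetic from the last few messages, as a small sketch. The 22-bit width and the 639500 value are taken from the discussion; the unit of the field (kHz or 10kHz) is exactly what was in question, so both interpretations are shown. `clock_mhz` is a hypothetical helper name.

```python
FIELD_BITS = 22                      # width quoted in the discussion
MAX_RAW = (1 << FIELD_BITS) - 1      # largest value the field can hold
REQUESTED = 639500                   # raw value seen in the log


def clock_mhz(raw, unit_khz):
    """Convert a raw field value to MHz for a given unit size in kHz."""
    return raw * unit_khz / 1000


print(f"max raw value: {MAX_RAW}")
for unit_khz in (1, 10):
    print(f"unit {unit_khz} kHz: requested {clock_mhz(REQUESTED, unit_khz)} MHz, "
          f"field max {clock_mhz(MAX_RAW, unit_khz)} MHz")
```

The maximum raw value is 4194303, so 639500 fits in the field either way; read as kHz it is 639.5MHz, read as 10kHz units it would be 6395MHz.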
21:49Lyude: imirkin: i can try taking a look soon, I've still got a bit of work to do though :s
21:49Lyude: also if you guys are adding clock limits you should also probably hook them up in mst while you're at it, but someone needs to review my pbn mode_valid patches for drm/i915 first