11:25Ormu: hello, I'm having problems with Arch Linux + Nouveau + Wayland + KDE/Enlightenment. KDE keeps freezing or crashing, and Enlightenment doesn't even start with Wayland (apparently segfaults). GPU is GTX 980 Ti
11:25Ormu: I don't know if this is a Nouveau problem, but at least it's not limited to a single desktop environment
11:26linkmauve: Ormu, it’s probably better to go see the developers of both of these DEs.
11:42dviola: linkmauve: are you sure it's a problem with the DEs? he tried weston also and he's able to reproduce the issue with that as well
11:42linkmauve: Then probably not. :)
11:43linkmauve: A stack trace would be helpful.
11:43Ormu: weston can be started on a tty but it causes that tty to freeze until the weston processes are terminated
11:46Ormu: how to produce a stack trace?
11:46Ormu: should i try starting weston as root?
11:47linkmauve: If you are using systemd, you just have to run `weston` as your user, it will take care of setting up the TTY using logind.
11:47linkmauve: No need for root.
11:48linkmauve: First, try getting the output of this weston process by redirecting its stderr into a file.
12:06Ormu: (X was running on tty1 when I started weston on tty2, and then I used tty3 to terminate weston)
12:16linkmauve: [14:54:14.594] failed to create kms fb: Invalid argument
12:16linkmauve: [14:54:14.594] failed to get drm_fb for bo
12:16linkmauve: Sounds like a DRM issue.
12:17linkmauve: Try writing 0xff into /sys/module/drm/parameters/debug, then start it, close it, write back 0 in that file, and then look at dmesg for what Nouveau has to say about it.
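The toggle linkmauve describes can be sketched in C as below. This is only an illustration: the helper name `write_param` is hypothetical, writing to the real file under /sys needs root, and Weston itself should still be started as the normal user in between the two writes.

```c
#include <stdio.h>

/* Hypothetical helper: write a short string to a parameter file such as
 * /sys/module/drm/parameters/debug.
 * Returns 0 on success, -1 on failure (missing file, no permission, ...). */
int write_param(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputs(value, f) >= 0);

    /* The data is flushed on close, so a rejected write may only show up here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling `write_param("/sys/module/drm/parameters/debug", "0xff")` before starting Weston and `write_param("/sys/module/drm/parameters/debug", "0")` after closing it would bracket the run; the extra DRM messages then show up in dmesg.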
12:17linkmauve: I have to leave, so post that for someone else to debug.
12:18linkmauve: Also, try Ctrl-Alt-Backspace instead of another TTY to close Weston, it doesn’t look any frozen or crashed here.
12:18Ormu: hm, ok
12:19Ormu: is it enough to do dmesg | grep nouveau after doing that ^^^
12:24Ormu: dmesg | grep nouveau: https://bpaste.net/show/76b35d8d4e7e
12:26Ormu: dmesg | grep -i drm: https://bpaste.net/show/0102997f40ee
12:41karolherbst: mslusarz: decided to print the code after COMPUTE.FLUSH = CODE, but it looks like that: https://gist.githubusercontent.com/karolherbst/bdf3b5be320267c93825c4528cce38fc/raw/b6e17b3c08dacf55f1c52b3818272e5a81363019/gistfile1.txt :/
12:44karolherbst: (when doing that mmiotrace+mmt thing)
12:59karolherbst: mhh, fflush helps
13:06karolherbst: wow, mmt or demmt can't keep up with that and my test application doesn't even reach the end
13:11karolherbst: ohh, that's just mmt crashing
13:13karolherbst: mhh, could be some weird SVM interactions
13:15karolherbst: ahh, it only happens with MALLOC_MMAP_THRESHOLD_=0
13:16karolherbst: it's crashing within nvidia libraries
13:16karolherbst: inside the clEnqueueSVMMemFill call
14:57pabs3: I had to reboot my system because input didn't do anything and the display wasn't updating (even when switching VCs) (except the mouse cursor until I replugged it) (but sound was still playing).
14:57pabs3: after reboot I looked at the logs and found this: https://paste.debian.net/hidden/3f18a5b9/
14:57pabs3: any thoughts?
15:02imirkin: karolherbst: there are funny flush issues when demmt crashes
15:03imirkin: Ormu: when in doubt, it's a nouveau issue. tons of people have trouble with KDE. stick to simpler environments.
15:03imirkin: or AMD gpu's
15:14Ormu: imirkin: I'm having similar problems with Enlightenment + Wayland too
15:15karolherbst: imirkin: it seems to be some nvidia bug though, and it wasn't demmt crashing, just envydis never calling fflush
15:15imirkin: enlightenment is also a very heavy/complex environment
15:15imirkin: karolherbst: but when envydis exits its outputs should get flushed anyways
15:15imirkin: oh, but demmt calls in directly...
15:15karolherbst: and was writing into the tracer mark file
15:15imirkin: funny - never ran into anything like that
15:15Ormu: imirkin: and plain Weston started from a tty has problems too
15:15imirkin: karolherbst: sounds like memory corruption maybe then
15:16karolherbst: well, you only get it when you do an mmiotrace + synced mmt+demmt
15:16imirkin: Ormu: ok. well it'd be good to start from a working environment and going from there
15:16imirkin: start with something that works, and then make changes until it doesn't work, to see which change is the one that breaks everything
15:17Ormu: everything works well if I don't use Wayland, e.g. just KDE with X11
15:17Ormu: but none of these three Wayland implementations work properly
15:17imirkin: i see.
15:17imirkin: that's a good starting point.
15:17imirkin: start with weston
15:18imirkin: that one's pretty simple.
15:18imirkin: if that has problems, then everything else will too
15:20imirkin: so ... what happens when you run weston?
15:21imirkin: pabs3: i think a PTE fault from PROP means that the rendertarget is missing somehow
15:22Ormu: imirkin: it prints a few messages and the tty freezes until Weston is terminated -> https://bpaste.net/show/fa5bbc26a74d
15:25imirkin: i see. that's what linkmauve was looking at.
15:26imirkin: as he pointed out, "failed to create kms fb: Invalid argument" does not inspire one with great confidence
15:27imirkin: my guess is that something nouveau does tickles a weston bug
15:27imirkin: let's try something simpler
15:28imirkin: Ormu: can you build code? (what's your general skill level?)
15:29Ormu: imirkin: <linkmauve> Try writing 0xff into /sys/module/drm/parameters/debug, then start it, close it, write back 0 in that file, and then look at dmesg for what Nouveau has to say about it. ----> the result is here: https://bpaste.net/show/76b35d8d4e7e
15:29Ormu: imirkin: depends on code, maybe
15:29imirkin: yeah i saw it -- too much output, i'm not familiar enough with what it's supposed to be
15:30imirkin: grab that git tree, autogen.sh, configure, make
15:30imirkin: (no need to install)
15:30karolherbst: imirkin: mhh, I think I actually have to write a cuda application to get nvidia to actually use their trap handler :( Now I can reliably trigger a fault, but nvidia just messes up as we do
15:30karolherbst: throws an "NVRM: Xid (PCI:0000:01:00): 31, Ch 00000008, engmask 00000101, intr 10000000" into dmesg and the process just hangs (until it gets killed)
15:31imirkin: karolherbst: i spent a bit of time thinking about how to make a presentable api for fault handling and single stepping for shaders
15:31imirkin: karolherbst: i came up with nothing :)
15:31imirkin: the problem is that you have a bunch of parallel invocations
15:31karolherbst: I am not that far either
15:31karolherbst: that isn't a problem though
15:31karolherbst: as you can pause all threads and continue one
15:31imirkin: and you want to be able to look at one or another
15:31imirkin: so how to enable selection of all that to the user
15:31karolherbst: well, warp that is, I doubt we can continue threads
15:31imirkin: in a way that makes sense to them
15:31karolherbst: maybe with volta we can
15:32karolherbst: yeah.. dunno
15:32karolherbst: I am not sooo far yet
15:32karolherbst: main goal is just to dump state of the warp/thread triggering the fault
15:32imirkin: well, i was, at the time, just thinking about how to make that presentable
15:32imirkin: to figure out what the end goal is
15:32imirkin: and i couldn't even see that.
15:32karolherbst: and this is trivial compared to coming up with the debugger API
15:32imirkin: so i didn't spend too much time on it
15:32karolherbst: I see
15:32karolherbst: there are some special regs which can tell the trap handler what happened
15:33imirkin: and esp once you add graphics to the mix
15:33imirkin: i.e. what goes into the making of a single pixel
15:33karolherbst: so my first goal was to read out what happened, and just dump state
15:33imirkin: it's soooo incredibly complicated
15:33karolherbst: I think if we know which shader and which instruction triggered it is already better than what we have today :)
15:34karolherbst: dumping regs to give some context
15:34imirkin: karolherbst: for traps, sure
15:34imirkin: i was looking at it with an eye towards debugging
15:34karolherbst: yeah... would be nice to have, but that would mean spending quite a lot of time on figuring everything out
15:34imirkin: on nv50, we would actually get an interrupt with the 8 bytes of faulting instruction iirc
15:34karolherbst: nvidia's trap handling code is _huge_
15:34imirkin: (or a pointer into the code segment, i forget)
15:34karolherbst: it is so huge, they don't even upload it as one segment
15:35imirkin: with nvc0 ... trap handlers, etc. sad.
15:35karolherbst: and have various calls into each other
15:35imirkin: now you know why those GPUs have so much ram -- to handle the trap code ;)
15:36Ormu: imirkin: done
15:36karolherbst: imirkin: I have 499 flushed to compute code :)
15:36karolherbst: so basically 499 code uploads
15:36karolherbst: for one tiny cl kernel
15:36karolherbst: could be a lot more stuff though
15:36imirkin: Ormu: ok. so once you run kmscube, you SHOULD be able to press any key to exit (or at least space, or escape, or enter)
15:36karolherbst: like the cuda builtin lib
15:36imirkin: if you kill it -- it doesn't exit so gracefully as weston might
15:36imirkin: even if you don't, it might not be so graceful
15:37imirkin: iirc you'll have to switch vt's back and forth
15:37imirkin: when and if it runs, you should see a spinning cube with some texture
15:37Ormu: imirkin: should i launch it from inside X11 or from a tty?
15:37imirkin: a tty. it's like a weston-lite
15:37imirkin: super-lite :)
15:41imirkin: karolherbst: what's going on with the compat-mode support? did you land your patch(es) for that?
15:41karolherbst: imirkin: I only had that one and I pushed it
15:41imirkin: ok awesome
15:41imirkin: are you aware of any additional specific issues?
15:41karolherbst: but we got an internal bug for some VNC setup doing something
15:41karolherbst: but... it shouldn't be a compat issue per se
15:41imirkin: presumably i can't see that bug?
15:41karolherbst: yeah, you can't sadly
15:42karolherbst: it isn't a compat bug though
15:42imirkin: no, that makes me happy
15:42karolherbst: just some application needing more than 3.1
15:42imirkin: if i can't see it, it doesn't exist.
15:42imirkin: well, i'm going to try to get 4.5 up and running
15:42karolherbst: the issue is something like VNC only displays black windows when there is no display attached :)
15:42imirkin: or 4.4 or whatever the max-du-jour is
15:42karolherbst: it should work afaik
15:42karolherbst: at least when I was running the piglit tests the last time they all passed
15:42karolherbst: there are some missing bits
15:43imirkin: which will get filled in over time
15:43karolherbst: ask mareko, he should know more or tarceri
15:43imirkin: or are you aware of anything specific?
15:43imirkin: i mean missing *in nouveau*
15:43imirkin: the feedback/select stuff is missing, but that's more general
15:43karolherbst: nothing if you compare to radeonsi
15:43imirkin: ok cool
15:43karolherbst: no idea if they pushed more patches
15:44imirkin: alright. i'll play around with it. thanks for the update.
15:44karolherbst: would be nice to do a piglit run, one with compat disabled one with enabled
15:44karolherbst: and check all new tests
15:44Ormu: imirkin: it works, and printed this: https://bpaste.net/show/43f5568c6bdc
15:45imirkin: ok. so something works. that's nice.
15:46imirkin: Ormu: what if you downgrade weston? you're using a pretty new version... perhaps it plays with modifiers and fails somehow.
15:47Ormu: hm :o
15:52Ormu: imirkin: which version is recommended? https://archive.archlinux.org/packages/w/weston/
15:52imirkin: dunno. just try one that's older than yours (5.0.0 iirc)
15:54imirkin: i just use Xorg ;)
15:54imirkin: i've played with weston a bit exclusively for testing purposes
15:54Ormu: ok, Weston 2.0.0(-2) installed...
15:57Ormu: this version doesn't start because it can't find libraries... https://bpaste.net/show/ddc0c83f203e
15:58Ormu: let's try v. 3
16:00imirkin: hurray for binary distros
16:04Ormu: imirkin: Weston 3.0.0 works in a tty
16:04imirkin: you should check with the weston folk
16:05imirkin: they'll be much better able to debug this than i
16:09Ormu: here's Weston 3.0.0 output: https://bpaste.net/show/42aa11eeaed9
16:10imirkin: not sure what you're looking for.
16:11Ormu: trying to figure out why neither of those Wayland desktops works :|
16:11imirkin: you said weston worked
16:11Ormu: version 3, but not version 5
16:11imirkin: so go talk to the weston guys
16:14Ormu: hm :|
17:45vita_cell: guys, I can't reclock again
17:46vita_cell: how to mount debug filesystem?
17:46imirkin: mount -t debugfs none /sys/kernel/debug
17:47imirkin: but it's probably already mounted.
17:48vita_cell: no, I am in Void, it doesn't mount automatically, and I dont know why
17:48vita_cell: thanks imirkin, it does work now
18:57pendingchaos: mwk: thoughts on the proposed envyas feature for automatic scheduling information calculation?
18:58pendingchaos: (probably also with an independent tool for it)
19:02mwk: tbh, I don't like it
19:02mwk: the envyas is just a big hack with no understanding of anything, and adding such complexity to it is bound to leave an even bigger mess
19:02mwk: it might be a better idea to do it as a separate preprocessor
19:03mwk: but if you have a plan on how to do it in envyas and want to go through with it, I won't stop you
19:05karolherbst: imirkin: so, compat results are good?
19:06karolherbst: imirkin: why did you start to care that much to actually send a patch though?
19:14imirkin: karolherbst: why not
19:14imirkin: it's a thing to do.
19:15imirkin: i'm gonna try nv50 too
19:15karolherbst: sure, but when I asked you back then you said something like we should check if we are missing anything important or something like that
19:15karolherbst: just wondering
19:16karolherbst: or maybe I misunderstood
19:16imirkin: oh, well there were a few concrete things that i thought were missing
19:16imirkin: but turned out to work
19:16karolherbst: ahh, okay
19:16imirkin: like the whole clip distance thing
19:16imirkin: i'm gonna look at clip vertex for a bit
19:16imirkin: but i think it should all work
19:16imirkin: esp with your fix
19:16karolherbst: there are piglit tests for that I think
19:16karolherbst: at least for the clip vertex stuff
19:17imirkin: well it's mostly around changing the number of planes
19:18imirkin: i may write a more targeted test just to be sure
19:26imirkin: karolherbst: btw, had you noticed that gl_TexCoord issue with crashing? i'm surprised that i had to fix it since it's in common code
19:29karolherbst: uhm, don't know
19:29karolherbst: maybe some regression?
19:29karolherbst: which test was triggering it?
19:30karolherbst: ohh found it
19:30karolherbst: can't remember if I saw it and simply forgot
19:31imirkin: k, well wtvr
19:31imirkin: it's "fixed"
19:31karolherbst: I see
19:31imirkin: with the first patch i sent
19:32imirkin: someone who hasn't yet lost the will to live can do that patch "for real", i.e. doing that opt properly in the presence of a gl_in.
19:32karolherbst: I hope drivers don't depend on that, but hopefully somebody will shout out
19:32imirkin: hmmmm ... everything passes on nv50 too. that's highly suspicious.
19:33karolherbst: well most of the stuff is handled within gallium and codegen anyway
19:33imirkin: well - even if they do, my fix is a true fix - the problem won't occur. just you lose the opt.
19:33Riastradh: So how do I use this mmiotrace thing? Can I set a kernel boot option to store all nouveau mmio in a buffer that I can retrieve from userland later, or do I need to compile a custom kernel?
19:33imirkin: Riastradh: you need some stuff in the kernel
19:33imirkin: is this for netbsd, or linux?
19:33karolherbst: well most distributions ship it though
19:34Riastradh: imirkin: Want to compare linux and netbsd to see why screen blanks on netbsd but not linux.
19:34karolherbst: Riastradh: check cat /sys/kernel/debug/tracing/available_tracers
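The check karolherbst suggests can also be done programmatically. A minimal sketch, assuming the file is a whitespace-separated list of tracer names; the function name `tracer_listed` is made up for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` appears as a whole word in the file at `path`,
 * 0 if it does not, and -1 if the file cannot be opened
 * (for example because debugfs is not mounted). */
int tracer_listed(const char *path, const char *name)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char word[64];
    int found = 0;

    /* %63s reads one whitespace-delimited token at a time. */
    while (fscanf(f, "%63s", word) == 1) {
        if (strcmp(word, name) == 0) {
            found = 1;
            break;
        }
    }

    fclose(f);
    return found;
}
```

With this, `tracer_listed("/sys/kernel/debug/tracing/available_tracers", "mmiotrace")` returning 1 would suggest the running kernel was built with mmiotrace support.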
19:34imirkin: Riastradh: https://wiki.ubuntu.com/X/MMIOTracing
19:34imirkin: this is a good guide
19:34imirkin: but basically it needs kernel support - it messes with the PTE's for the mmio space to cause traps
19:34imirkin: and then those ops get single-stepped
19:34imirkin: results recorded, etc
19:34Riastradh: I want to do this for everything from probe to first mode switch (and maybe a little further).
19:35Riastradh: Oh, I see.
19:35imirkin: however note that the actual mode switches on G80+ are done in a mostly-invisible-to-mmio way
19:35karolherbst: Riastradh: if you want to port it over to netbsd, you can ask me questions, but... don't expect good answers
19:35Riastradh: karolherbst: I already did so; I'm updating it now!
19:35imirkin: karolherbst: he already has.
19:35Riastradh: Updating it partway, anyway.
19:35karolherbst: I see
19:35karolherbst: have fun then
19:35karolherbst: mmiotrace ain't fun to debug
19:35imirkin: Riastradh: which gpu are you testing with?
19:35Riastradh: On 4.4 now, aiming at 4.18 eventually.
19:36Riastradh: (once I get 4.4 to do more than blank the screen)
19:36imirkin: there was a MAJOR update in 4.3
19:36Riastradh: [ 1.044469] nouveau0 at pci1 dev 0 function 0: vendor 10de product 040c (rev. 0xa1)
19:36Riastradh: [ 1.044469] nouveau0: info: NVIDIA G84 (084c00a2)
19:36imirkin: Riastradh: you may also be interested in https://github.com/skeggsb/nouveau/ which is a standalone "userspace" nouveau repository
19:37imirkin: it's on you to provide OS hooks to do mmio
19:37imirkin: but assuming it has pci/etc control it can work
19:37imirkin: doing modesetting is tricky with it though
19:37Riastradh: So, in what way is the mode setting invisible to mmiotrace in G80+?
19:37imirkin: well, in previous GPU's modesetting was done by writing a bunch of registers
19:38imirkin: which would happen over mmio
19:38imirkin: starting with G80, there's actually a display engine which consumes a command stream much like the 3d engine
19:38Riastradh: Oh, but this uses the evo_mthdwhatever to write to a channel in DMAable memory?
19:38imirkin: so in the mmiotrace you'd see the various channel setup/etc being programmed
19:38imirkin: but the actual commands are dma'd
19:38Riastradh: So I guess diffing the mmiotrace might not be so helpful.
19:39imirkin: well ... depends what's getting messed up
19:39imirkin: can you check if 3d works?
19:39Riastradh: I have a diff of the display registers, from older NetBSD to newer NetBSD and from Linux 4.4 to newer NetBSD.
19:39imirkin: i mean, is it just the display that turns off
19:39Riastradh: Dunno, how do I check?
19:39imirkin: but everything else works
19:39imirkin: i'm assuming you can ssh into the machine
19:40imirkin: check that X starts, and the nouveau ddx is all happy
19:40Riastradh: With some work, yes. Mostly I'm just typing blind to save dmesg and reboot.
19:40imirkin: (or modesetting ddx, whichever)
19:40imirkin: ahhh ... hrm
19:40imirkin: well, can you check the xorg logs from such a boot?
19:40imirkin: does X start?
19:40Riastradh: Something went wrong when I started X, can't remember what.
19:40Riastradh: Lemme see.
19:40imirkin: is nouveau ddx all happy?
19:40imirkin: and do you have the full (drm + nouveau) dmesg?
19:41Riastradh: Here's dmesg: https://www.netbsd.org/~riastradh/tmp/20180821/dmesg.20180821.0
19:41Riastradh: Contrast with the old one (from 3.15ish Linux) that works: https://www.netbsd.org/~riastradh/tmp/20180821/old.20180821.0
19:41imirkin: [ 1.044469] drm kern info: nouveau: DRM:00000000:00000002: fini completed in -999845us
19:41imirkin: that's a *fast* computer...
19:42imirkin: ok, so it gets modes from LVDS-1
19:42imirkin: that's a good sign
19:42Riastradh: Yes, I fixed some bugs in mode _detection_ already.
19:42Riastradh: (All my fault -- truncated some 32-bit register somewhere.)
19:43imirkin: are the stack-traces your debugging, or are they indicative of errors?
19:43Riastradh: My debug messages.
19:43Riastradh: No crash in that dmesg.
19:43imirkin: [ 1.044469] nouveau0: autoconfiguration error: error: disp: ERROR 5 [INVALID_STATE] 0b  chid 1 mthd 0080 data 00000000
19:44imirkin: that means you're done.
19:44imirkin: these GPUs are *not* good at recovering from display config errors
19:44imirkin: done == "blank screen"
19:44Riastradh: OK...but that happens in the old working one too.
19:44imirkin: doesn't make it right :p
19:44Riastradh: [ 1.045096] drm kern error: nouveau E[ PDISP][nouveau0] INVALID_STATE [UNK0B] chid 1 mthd 0x0080 data 0x00000000
19:45imirkin: i believe you...
19:45imirkin: ok, well skeggsb could tell you what's wrong with that while drunk and his right eye closed... it'll take me a bit to figure it out.
19:46Riastradh: Hm. The numbers are slightly different.
19:46Riastradh: Working: [ 1.045096] drm kern error: nouveau E[ PDISP][nouveau0] 0x0094: 0x00000000 -> 0xcafe0000
19:46Riastradh: Not working: [ 1.044469] nouveau0: info: disp: 0094: 00000000 -> f0000000
19:46Riastradh: Couple others. Likely to be significant?
19:46imirkin: dunno about that one specifically, but likely something in there
19:46imirkin: some of the changes may be on purpose
19:47Riastradh: Some might just be different memory addresses allocated by NetBSD or something.
19:47imirkin: i assume this doesn't happen on linux at all?
19:47Riastradh: Lemme check.
19:48Riastradh: I don't have linux dmesg handy, oops.
19:49Riastradh: Are these registers documented somewhere?
19:49Riastradh: Oh, presumably these are just the display engine registers whose documentation I was already consulting to compare the mmio dumps.
19:49imirkin: 0x94 is the semaphore handle
19:50imirkin: cafe seems wrong.
19:50Riastradh: Heh. cafe is the working one.
19:50imirkin: that must be a cooked one then.
19:50imirkin: oh yeah. dead is bad.
19:50imirkin: or no. 0xbadf is bad.
19:50imirkin: well, just a fixed handle that's #define'd
19:50imirkin: 0xbadf gets returned via mmio when you read something you're not supposed to
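When scanning register dumps for such reads, a tiny predicate can help. This sketch assumes the marker sits in the upper 16 bits of the 32-bit value, which the discussion does not spell out; the function name is hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: a "you should not have read this" value carries 0xbadf
 * in its upper 16 bits; the lower 16 bits are ignored here. */
int looks_like_bad_mmio_read(uint32_t value)
{
    return (value >> 16) == 0xbadfu;
}
```

Under that assumption, 0xcafe0000 and 0xf0000000 from the logs above would not be flagged.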
19:51Riastradh: OK, don't see that in Linux on 4.4.0.
19:52Riastradh: https://www.netbsd.org/~riastradh/tmp/20180821/mmiodiff-su is the diff of the mmio registers from...crap, either from Linux to NetBSD or from NetBSD to Linux but I forget which.
19:53Riastradh: display mmio registers
19:53imirkin: crap, this is sounding familiar
19:53imirkin: do you know what mode you're setting?
19:53Riastradh: from Linux to NetBSD
19:53Riastradh: This one, I think: [ 1.045096] DRM debug in drm_mode_debug_printmodeline: Modeline 25:"1920x1200" 60 167520 1920 1968 2000 2296 1200 1203 1206 1216 0x48 0xa
19:54imirkin: what color format
19:54Riastradh: I have no idea.
19:54Riastradh: How do I find out?
19:54Riastradh: Or, what is the shape of the answer you're looking for?
19:55Riastradh: Lemme see if I can dig through to find that. While I'm digging, can you expand on what you will do with the answer?
19:55imirkin: we had a screwup of some sort with the C8 formats on ... some gpu's. unfortunately i don't remember which way it went.
19:56imirkin: either gf119+ or pre-gf119
19:56imirkin: C8 = indexed + LUT
19:56Riastradh: Is there a git log that will jog memory?
19:56imirkin: i will look now.
19:56imirkin: in the small repo, 32ed69845a000b8261c24282eb25947ce0178c5b
19:57imirkin: and yes, the issue is with pre-gf119
19:57imirkin: mmmmm ... maybe
19:57imirkin: Riastradh: https://hastebin.com/hibeduleqa.sql
19:57Riastradh: Chosen by drm_mode_legacy_fb_format?
19:58imirkin: wellll depends.
19:58imirkin: i guess maybe
19:58imirkin: right. yes.
19:58imirkin: is bpp == 8
19:58imirkin: or is it higher?
19:59imirkin: if it's higher, you're not affected by this bug
19:59Riastradh: surface_bpp = 32
19:59Riastradh: [ 1.045096] nouveaufb0: framebuffer at 0xffff8000711a2000, size 1920x1200, depth 32, stride 7680
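The numbers in that log line are consistent with a tightly packed 32 bpp framebuffer. A small sketch of the arithmetic, assuming no extra pitch alignment (real drivers may round the stride up); the function name is illustrative only:

```c
#include <stdint.h>

/* Bytes per scanline for a packed framebuffer: width * bpp bits,
 * rounded up to whole bytes. No hardware pitch alignment is applied. */
uint32_t packed_stride(uint32_t width, uint32_t bpp)
{
    return (width * bpp + 7) / 8;
}
```

For 1920 pixels at 32 bpp this gives 7680 bytes, matching the reported stride, so the depth value of 32 printed there behaves like bits per pixel.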
19:59imirkin: right ok. so it's something else.
20:00imirkin: uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t depth)
20:00Riastradh: Of course, if it were broken by this in 4.4, surely that would mean Linux would blank the screen too?
20:00imirkin: to confirm, bpp == 32 or depth == 32?
20:00Riastradh: bpp = 32
20:00Riastradh: The messaging is a little confusing:
20:00imirkin: if the format picked was _C8 then yes (which would only happen on a handful of weird machines)
20:00Riastradh: In nouveau_fbcon_create,
20:01Riastradh: mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
20:01imirkin: on most it would pick _XRGB8888 so nobody noticed
20:01Riastradh: so what I'm looking for is evidence of what surface_bpp is.
20:01Riastradh: surface_bpp flows into
20:01Riastradh: prop_dictionary_set_uint8(dict, "depth", sizes->surface_bpp);
20:01Riastradh: in NetBSD, which is then printed as `depth'.
20:01imirkin: and what's surface_depth btw? is it 24?
20:01Riastradh: I don't know where surface_depth is printed, so I don't know what the depth argument to drm_mode_legacy_fb_format is.
20:02imirkin: well, it's to distinguish e.g. a 32-bit pixel where 24 are used for color vs 30 (10bpp) or 32 (with alpha)
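The bpp/depth distinction can be illustrated with a simplified mapping. This is a sketch, not the kernel's drm_mode_legacy_fb_format: the enum names are hypothetical, and the real function differs between kernel versions in how it treats unexpected combinations.

```c
#include <stdint.h>

/* Hypothetical format tags, standing in for the DRM fourcc codes. */
enum fb_format {
    FMT_INVALID,
    FMT_C8,           /* 8 bpp, indexed + LUT */
    FMT_XRGB1555,
    FMT_RGB565,
    FMT_RGB888,
    FMT_XRGB8888,     /* 32-bit pixel, 24 bits of color */
    FMT_XRGB2101010,  /* 32-bit pixel, 30 bits of color */
    FMT_ARGB8888      /* 32-bit pixel, 24 bits of color + 8 of alpha */
};

/* bpp is the storage size of a pixel; depth is how many of those bits
 * carry meaning. Unknown pairs are reported as FMT_INVALID here. */
enum fb_format legacy_fb_format_sketch(uint32_t bpp, uint32_t depth)
{
    switch (bpp) {
    case 8:
        return depth == 8 ? FMT_C8 : FMT_INVALID;
    case 16:
        if (depth == 15) return FMT_XRGB1555;
        if (depth == 16) return FMT_RGB565;
        return FMT_INVALID;
    case 24:
        return depth == 24 ? FMT_RGB888 : FMT_INVALID;
    case 32:
        if (depth == 24) return FMT_XRGB8888;
        if (depth == 30) return FMT_XRGB2101010;
        if (depth == 32) return FMT_ARGB8888;
        return FMT_INVALID;
    default:
        return FMT_INVALID;
    }
}
```

In this sketch only bpp == 8 ever yields the C8 format, which is why a surface_bpp of 32 rules out the C8 bug regardless of depth.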
20:03imirkin: [ 1.044469] nouveau0: info: disp: 00c0: 00000000 -> ffff0000
20:03Riastradh: But, I can't figure it out immediately, so lemme try printing it.
20:03imirkin: that's weird.
20:03Riastradh: What's that?
20:03imirkin: this is supposed to be the image handle
20:03imirkin: i don't think ffff0000 is a valid image handle
20:03Riastradh: What is an image handle?
20:03Riastradh: What makes it valid?
20:04imirkin: nothing intrinsically, can be anything
20:04imirkin: asyw->image.handle = ctxdma->object.handle;
20:04imirkin: these things tend to count up from 0
20:04Riastradh: I mean: is it a bus DMA address, is it a number that the driver assigns, is it...?
20:04imirkin: just a number in a table iirc
20:05Riastradh: Assigned by nouveau?
20:05imirkin: or i'm full of it...
20:05imirkin: const u8 kind = fb->nvbo->kind;
20:05imirkin: const u32 handle = 0xfb000000 | kind;
20:06imirkin: so first it tries to find the relevant ctxdma object
20:06imirkin: could that kind of code cause problems in netbsd?
20:06imirkin: i.e. u32 | u8
20:06imirkin: i dunno how well defined this stuff is
20:06Riastradh: Not likely but I can try using UINT32_C so it isn't technically undefined behaviour.
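On the u32 | u8 question: with a 32-bit int, the hex constant 0xfb000000 does not fit in int and therefore already has type unsigned int, while the u8 operand is promoted to int and then converted to unsigned, which preserves its value. A small sketch comparing the plain and the UINT32_C spellings (function names are hypothetical; the 32-bit int is an assumption about the platform):

```c
#include <stdint.h>

/* Plain constant: unsigned int on platforms where int is 32 bits. */
uint32_t handle_plain(uint8_t kind)
{
    return 0xfb000000 | kind;
}

/* Explicitly sized constant, as Riastradh proposes. */
uint32_t handle_explicit(uint8_t kind)
{
    return UINT32_C(0xfb000000) | kind;
}
```

Both functions return the same value for every kind from 0 to 255, so this expression is an unlikely source of a NetBSD-only difference.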
20:06Riastradh: Where is that?
20:07Riastradh: (Recall this is from 4.4.143, not Linux master.)
20:08imirkin: but that's unlikely to be the location of the issue...
20:08Riastradh: No function nv50_dmac_ctxdma_new there.
20:08imirkin: do you have a copy of your netbsd dmesg with nouveau.debug=trace ?
20:08Riastradh: No, but I can make it. Anything else you want printed?
20:08imirkin: not yet
20:10Riastradh: I don't see any obvious writes to 0x6100c0 in this version of nouveau.
20:10imirkin: so from your old trace -
20:10imirkin: [ 1.045096] drm kern error: nouveau E[ PDISP][nouveau0] 0x00c0: 0x00000000 -> 0x01000003
20:10imirkin: which looks a lot more like a handle.
20:10imirkin: the writes aren't directly to that reg
20:10imirkin: they're submitted as commands in a pushbuf to the disp engine
20:10Riastradh: Where does it get written?
20:11imirkin: search for 0x00c0
20:11imirkin: in nv50_display.c
20:11Riastradh: evo_mthd(push, 0x00c0, 1);
20:11Riastradh: evo_mthd(push, 0x00c0, 1);
20:11imirkin: the value that comes after is the value
20:11Riastradh: evo_data(push, nv_fb->r_handle);
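The method-then-data pattern can be sketched as follows. The header layout used here, (count << 18) | method, is an assumption about how the evo_mthd macro packs its arguments, and the function names are stand-ins rather than the driver's own.

```c
#include <stdint.h>
#include <stddef.h>

/* ASSUMPTION: a method header packs the data word count above bit 18
 * and the method offset in the low bits. */
static uint32_t *push_mthd(uint32_t *push, uint32_t mthd, uint32_t count)
{
    *push++ = (count << 18) | mthd;
    return push;
}

/* A data word is stored as-is, right after its header. */
static uint32_t *push_data(uint32_t *push, uint32_t data)
{
    *push++ = data;
    return push;
}

/* Queue method 0x00c0 with one data word (the image handle).
 * Returns the number of 32-bit words written to buf. */
size_t queue_image_handle(uint32_t *buf, uint32_t handle)
{
    uint32_t *p = buf;
    p = push_mthd(p, 0x00c0, 1);
    p = push_data(p, handle);
    return (size_t)(p - buf);
}
```

The words land in a buffer that the display engine fetches by DMA, which is why an mmiotrace shows the channel setup but not a register write of the handle itself.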
20:12Riastradh: nv_fb->r_handle = 0xffff0000 | kind;
20:12imirkin: certainly SEEMS on purpose ;)
20:12Riastradh: u8 kind = nouveau_bo_tile_layout(nvbo) >> 8;
20:13Riastradh: Can print that too, I guess.
20:13imirkin: i wouldn't worry about that one.
20:13imirkin: 0 is a perfectly fine kind.
20:13imirkin: kind == memtype
20:13imirkin: can be things other than 0 for various microtiling things
20:13imirkin: esp for depth and ms surfaces. nothing to worry about for scanout.
20:14imirkin: unfortunately i don't know enough about all this to positively say how it works
20:14imirkin: you really want to chase down skeggsb and get him to explain some stuff
20:15Riastradh: skeggsb: Hi! Halp!
20:15imirkin: you'll also get the most benefit from moving to latest kernel, both because various things have been fixed, and more importantly changed around, as well as because people's memory doesn't go back that far.
20:15imirkin: the high level api's are pretty stable, but the impl has variously changed around
20:16karolherbst: Riastradh: is your plan to get a "random" base and port it over or to apply patches until you get there?
20:16Riastradh: I'm still debating how much time to spend chasing this wild goose in 4.4 and how much to spend just updating to 4.18.
20:16Riastradh: karolherbst: ?
20:16karolherbst: I mean, there are two approaches to backporting
20:16imirkin: karolherbst: he ported 3.15 or so a long while back
20:16karolherbst: ahh yeah
20:16imirkin: now trying to bring it up to speed
20:16karolherbst: I highly doubt it is a good idea to port 4.4 today
20:16karolherbst: just get the newest master or something
20:16imirkin: which involves both drm and nouveau itself, as well as all the supporting helpers
20:17karolherbst: ohh, true
20:17Riastradh: I didn't pick 4.4, exactly; just started working on top of some partial work someone else already started on it, and mostly finished it -- at least for intel and radeon.
20:17karolherbst: so NetBSD's drm code is still at 3.15?
20:17Riastradh: Next time I merge will be 4.18, probably.
20:17karolherbst: Riastradh: depending on when the next time is ;)
20:18karolherbst: maybe we want this to be a bit less painful, but I doubt you get many drm devs convinced to care about that
20:18Riastradh: I'm working in a branch on NetBSD, so what I plan to do is merge the current work into NetBSD for intel and radeon, and then start working on 4.18 in another branch.
20:18Riastradh: But that will leave nouveau broken in NetBSD.
20:18Riastradh: So I was kinda hoping to get at least modesetting functional in case I don't get to 4.18.
20:19karolherbst: what GPU are you testing on?
20:19Riastradh: But maybe I should just give up on 4.4 and move on to 4.18 ASAP.
20:19karolherbst: well, it is 8 releases
20:19karolherbst: how long do you need for a port normally?
20:19Riastradh: Some random nvidia thing in an old laptop. Couple other people have reported the same on other devices, though.
20:19Riastradh: This specific one is...
20:20Riastradh: [ 1.045096] nouveau0 at pci1 dev 0 function 0: vendor 10de product 040c (rev. 0xa1)
20:20karolherbst: the problem is, we can't really rely on the product id, but let me check
20:20Riastradh: [ 1.045096] drm kern info: nouveau [ DEVICE][nouveau0] Chipset: G84 (NV84)
20:20Riastradh: [ 1.045096] drm kern info: nouveau [ DEVICE][nouveau0] Family : NV50
20:21bubblethink: Hi. I'm trying to fix tearing for the external monitor in an optimus setup. I found that setting GLXVBlank to true fixes it, but with that set, my laptop's display stops working when there is no external monitor connected
20:21Riastradh: How long to port: doing the 3.15->4.4 merge took about a weekend to get it merged, compiled, and linked.
20:22Riastradh: Then ~3 weeks of evening spare time to fix it all and make it work.
20:22Riastradh: At least, intel and radeon.
20:22karolherbst: sounds like it would be worth to just jump to 4.18
20:22karolherbst: somebody will have to do it anyway
20:22karolherbst: or some later version
20:22karolherbst: and you get more hw support
20:24bubblethink: So it looks like GLXVBlank interferes with regular operation of laptop's display driven by the intel igpu in an optimus setup ?
20:24karolherbst: it is the i915 causing this though
20:25bubblethink: any way to get tear free setup for external monitor ?
20:25karolherbst: ohh, you mean reverse prime?
20:25bubblethink: no, this is regular prime. Intel drives the internal display, and nvidia drives the external display
20:26karolherbst: yeah, that is called reverse prime
20:26bubblethink: when i dock it, i only use the external display
20:26bubblethink: oh ok
20:26bubblethink: this is a thinkpad w530, kepler gpu
20:26karolherbst: prime: render on dedicated, display on integrated. reverse prime: render on integrated, display on dedicated
20:27karolherbst: bubblethink: is it just GLXVBlank what you change in the X config?
20:27karolherbst: or do you change more?
20:27bubblethink: but when the external monitor is connected, isn't it rendered as well as displayed on the dedicated ? I don't
20:27bubblethink: no, just that
20:27bubblethink: i mean, I don't need both the screens to work
20:28bubblethink: i just need the external to work when docked
20:28karolherbst: did you check the screen settings?
20:28karolherbst: maybe something just disables the internal one
20:28bubblethink: so it's not necessary to render on integrated for the laptop's lcd
20:28karolherbst: well, you can't change that for now ;)
20:28karolherbst: so your desktop will always get rendered on the intel one
20:30karolherbst: bubblethink: okay, so, the problem you see is, if you have your laptop undocked, the internal screen is displaying stuff (both with GLXVBlank enabled and disabled), but when you dock it, the internal one gets disabled when GLXVBlank is set?
20:31bubblethink: no, if I set GLXVBlank, I cannot use the laptop normally
20:31bubblethink: i.e., when intel is driving the laptop display
20:31bubblethink: or so I assume
20:31karolherbst: mind sharing your X config file
20:32bubblethink: one sec. How do I generate the current xorg config ? There isn't an explicit xorg config file these days, right ?
20:33karolherbst: well, you have a file where you change GLXVBlank
20:33karolherbst: or not?
20:34bubblethink: yes, that was just a single change in a file under /usr/share/X11/xorg.conf.d
20:34bubblethink: I added this
20:34bubblethink: #Section "Device"
20:34bubblethink: # Identifier "nvidia card"
20:34bubblethink: # BusID "PCI:1:0:0"
20:34bubblethink: # Driver "nouveau"
20:34bubblethink: # Option "GLXVBlank" "true"
20:35karolherbst: Driver "nouveau" <=== is causing it
20:35karolherbst: remember, you are using the intel GPU mainly
20:35karolherbst: you want to change stuff on the intel GPU as well
20:35karolherbst: or at least have an intel device
20:35karolherbst: with autoconfiguration X kind of does something sane
20:35karolherbst: but if you add a device entry, X doesn't scan for more display devices
20:36karolherbst: but only uses the configured one
20:36karolherbst: of course you could setup reverse prime to render on nouveau and display on intel
20:36karolherbst: but then you have your nvidia GPU to be always on
20:36karolherbst: which sucks without a power supply
20:36bubblethink: yeah, that would kill power
20:36karolherbst: soo what you want to have is at least the intel device
20:36bubblethink: so is there any way to apply GLXVBlank only to nouveau without affecting intel ?
20:36karolherbst: try to fix tearing there
20:37karolherbst: but that would require to specify both devices
20:37karolherbst: and maybe some additional config
20:37bubblethink: So if I get my current intel config, and just add it to the file, would it work ?
20:38karolherbst: I really don't know what to do if you want to have two devices there
20:38karolherbst: I expect something else to be done
20:38karolherbst: but that's a question for #xorg
20:38karolherbst: I would try to fix things by changing parameters on the intel driver for now
20:38karolherbst: maybe that helps already
20:39bubblethink: with tearing you mean ?
20:39bubblethink: i.e. intel params for tearing ?
20:41bubblethink: Is there a way to generate X config file from a running session ? If I add a different section for intel, it may be worth a shot
20:42karolherbst: not that I am aware of
20:42karolherbst: but you can just leave everything out
20:42karolherbst: usually you only overwrite stuff you specify
20:43bubblethink: so just add this ?
20:43bubblethink: Section "Device"
20:43bubblethink: Identifier "Intel Graphics"
20:43bubblethink: Driver "intel"
20:43bubblethink: let me try. may get disconnected
20:48bubblethink: ok, for a second i thought it worked
20:48bubblethink: display didn't die
20:48bubblethink: but, it tears
20:48bubblethink: with external monitor
20:48bubblethink: so glxvblank didn't work
20:49bubblethink: adding that intel section negated it somehow
20:49karolherbst: yeah, it might be that there is some deeper issue
20:49karolherbst: I would ask inside #xorg or #intel-gfx maybe they know more
20:49karolherbst: or maybe we do?
20:49karolherbst: imirkin: do you know if we miss something to implement to get tear free reverse prime working?
20:51bubblethink: you may be right that there is something deeper because GLXVBlank is normally set to true (according to man pages), but isn't for prime setups
20:51bubblethink: so perhaps this is a known thing
20:51bubblethink: i was kind of surprised when tearing went away with that change
20:52bubblethink: and then i realized that it borks the internal display
20:52bubblethink: so maybe there's more to it than that
20:52bubblethink: also, with wayland, I don't see tearing
20:52imirkin: karolherbst: tear free reverse prime is roughly impossible.
20:53karolherbst: I don't believe you :p I mean there has to be a way to fix it ;)
20:53bubblethink: at least a hacky way to fix it would be to restart X on dock/undock with the GLXVBlank setting
20:53imirkin: karolherbst: talk to airlied
20:54karolherbst: I mean, it is tear free on windows, isn't it? Even if it means reworking quite a lot, it should be still possible in the end
20:54karolherbst: might just be tons of work
20:55bubblethink: imirkin, the tearing itself seems to go away with the GLXVBlank setting. Is there a way to make it play nice with intel gpu + laptop lcd ?
20:55imirkin: no clue what the question is
20:55karolherbst: bubblethink: the thing is, with your GLXVBlank there wasn't any intel driver used inside X anymore
20:55bubblethink: sorry, i think you missed the initial part of my report
20:55karolherbst: so it was all just nouveau
20:55karolherbst: maybe it works even without stating GLXVBlank
20:55imirkin: i didn't miss it ... it was confusing.
20:55karolherbst: just having the nouveau device there might be enough
20:56imirkin: also your handle is the same length as karolherbst's so i confuse the two streams ;)
20:56karolherbst: mine is a little bit shorter :p but I guess you have a mono spaced font?
20:57imirkin: irc with variable-spaced fonts? blasphemy!
20:57karolherbst: my client also has nice colors, so I get always confused if the color of some person changes :)
20:57karolherbst: mhh I don't like monospaced for written language, feels kind of too technical
20:58imirkin: i find it easier to read
20:58imirkin: in pretty much all circumstances
20:58karolherbst: I guess this could be just being used to it ;)
20:58karolherbst: never really read monospaced text
20:59bubblethink: karolherbst: so you mean just the nouveau section in the conf fixed tearing ? I can try that out
20:59karolherbst: bubblethink: that is my theory for now, yes
21:01bubblethink: brb. trying the change
21:02karolherbst: imirkin: nice! I got something different now!
21:03karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/41485d374352fba454ca769e6047ac63/raw/1cd9fea93bf5e538ca46d5f86833fa639852756b/gistfile1.txt
21:03karolherbst: ever saw that?
21:03imirkin: i haven't run the nvidia blob driver in AGES
21:03imirkin: probably not since the 325 series
21:03karolherbst: well the last line is the normal fault thing
21:04karolherbst: the two lines in the middle only occur if I launch something with cuda-gdb
21:04bubblethink: karolherbst, you are right
21:04karolherbst: well something being a cuda application
21:04bubblethink: same behaviour without the GLXVBlank setting
21:04imirkin: right, makes sense that this would trigger additional debug enables
21:05karolherbst: yeah, but now I want to get cuda-gdb to break at the fault
21:05karolherbst: apparently, cuda-gdb can debug GPU code
21:05karolherbst: uhh nice
21:05karolherbst: :D how fun
21:06karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/950d95681d51313d95cbf0f731ec2113/raw/8772c61091312af5374b18a8b17a641587e1fc70/gistfile1.txt
21:06karolherbst: I want that for nouveau now as well :p
21:07karolherbst: "Thread 1 "cudat" received signal CUDA_EXCEPTION_14, Warp Illegal Address." :)
21:08imirkin: they've obviously put a good bit of thought towards all this
21:08karolherbst: I know
21:09karolherbst: the situation is kind of like there is no point in having open source cuda if you are not able to have all the tooling around open source as well
21:09karolherbst: and the cuda runtime itself is laughably small
21:10karolherbst: I mean the runtime to run stuff on the GPU
21:12bubblethink: are there any compositor tricks for tearing ?
21:12bubblethink: there seem to be quite a few guides here and there, but they all seem to be old, and not sure if they apply to the reverse prime case
21:18bubblethink: any of the new stuff in xorg that supposedly enabled server side glvnd ?
21:27karolherbst: imirkin: I currently fear that nvidia plays ping pong between userspace and kernelspace... so the kernel pings the userspace, because it just trapped the MP and now userspace takes over to handle it.. but hopefully I figure something out from the new traces I do
21:35linkmauve: Ormu’s issue is probably from atomic.
21:36linkmauve: Weston is using this since after 3.0.
21:36imirkin: linkmauve: oh, it REQUIRES atomic?
21:36imirkin: nouveau doesn't advertise it by default
21:36linkmauve: No, but it uses it if it’s available.
21:36imirkin: shouldn't be available...
21:38linkmauve: Oh wait, I misread, it says “[14:54:14.218] DRM: does not support atomic modesetting”.
21:38linkmauve: Nvm then, it’s another issue.
21:41imirkin: yay :)
21:42linkmauve: I would have hoped you’d be able to read the kernel logs for DRM, because I clearly can’t (without some more correlation). ^^'
21:43imirkin: i think we're at about the same level there
21:43imirkin: we generally know what it's supposed to be
21:43imirkin: but have to rtfs to know the specifics
21:43imirkin: and both lazy about doing it ;)
22:07Riastradh: imirkin: Kinda moot now, I suppose, but I have:
22:07Riastradh: [ 1.040818] nouveau_fbcon_create:386: surface_bpp=32, surface_depth=24, pixel_format=875713112
22:07Riastradh: and 100 MB of nouveau trace output.
22:08Riastradh: That's a pretty weird value for pixel_format, isn't it?
22:08imirkin: it's a fourcc
22:09imirkin: >>> chr(0x58), chr(0x52), chr(0x32), chr(0x34)
22:09imirkin: ('X', 'R', '2', '4')
22:09imirkin: aka DRM_FORMAT_XRGB8888
22:09Riastradh: Got it.
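The fourcc decoding imirkin did by hand above can be sketched as a small helper. This is only an illustration: the function names `fourcc_to_str` and `str_to_fourcc` are made up here and are not part of any DRM or nouveau API. It assumes the usual DRM convention that the four ASCII characters are packed with the first character in the least significant byte.

```python
def fourcc_to_str(value: int) -> str:
    """Decode a 32-bit fourcc into its four ASCII characters.

    Assumes the first character sits in the least significant byte,
    so the integer is read as 4 little-endian bytes.
    """
    return value.to_bytes(4, "little").decode("ascii")


def str_to_fourcc(code: str) -> int:
    """Pack a four-character code back into a 32-bit integer."""
    if len(code) != 4:
        raise ValueError("a fourcc has exactly four characters")
    return int.from_bytes(code.encode("ascii"), "little")


# The pixel_format value from the nouveau_fbcon_create log line.
pixel_format = 875713112
print(hex(pixel_format), fourcc_to_str(pixel_format))
```

For the value in the log this gives `0x34325258` and `XR24`, matching the `chr()` output pasted above.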
22:10imirkin: if you can easily upload that (compressed) output, i could take a glance. but no promises on success :)
22:11Riastradh: Yes, moment.
22:17Riastradh: imirkin: https://www.NetBSD.org/~riastradh/tmp/dmesg.20180826.0.gz
22:18imirkin: 404 not found
22:20imirkin: Riastradh: --^
22:21Riastradh: imirkin: https://www.NetBSD.org/~riastradh/tmp/20180826/dmesg.20180826.0.gz
22:21Riastradh: should test URLs before handing them out
22:21imirkin: ok, retrieved. thanks
22:22imirkin: Riastradh: i'll try to have a look tonight after dinner
22:22Riastradh: imirkin: Cool, thanks!
22:23Riastradh: Meanwhile I think I might just merge this into NetBSD and start working on 4.18.
22:23imirkin: Riastradh: which kernel version specifically are you using?
22:23imirkin: ok cool. i'll grab that too.
22:27Riastradh: imirkin: FYI, there will be one notable difference from Linux: the map ioctl is a little bit different, because rather than shoehorning i/o port numbers into pointers with ioread32/iowrite32, we pass around a bus_space_tag_t object in addition to the port/address/whatever in bus_space_handle_t.
22:27Riastradh: The ioctl number is...
22:28imirkin: if ioread/write was broken, you'd be a whole lot more screwed
22:28Riastradh: #define NVIF_IOCTL_V0_MAP_NETBSD 0x0d
22:29Riastradh: Yes, I'm just pointing out that if you look at the trace output and you see 0x0d where you expect 0x07, that's what it's about.
22:29imirkin: ah i see.
22:29imirkin: btw - my friend's dropping netbsd. apparently someone broke /bin/sh which upset him greatly.
22:38Riastradh: Heh. Is your friend Izaac?
22:38imirkin: lucky guess.
22:39Riastradh: I don't really know what the issue is. What I do know is that Robert Elz is fixing various things in /bin/sh, and adding automated tests, and sometimes substantially rototilling parts of it, because the code base is ancient and crusty.
22:39imirkin: yeah, i don't think that's being contested
22:40Riastradh: Whether he's rototilling too much, or breaking things along the way, I dunno.
22:41imirkin: i don't really have a dog in that fight, but he was the only person i actually knew who used netbsd
22:42imirkin: (i also know i'd be livid if someone broke /bin/sh. i'm already pretty pissed off at the gnu ls changes to "help" with encoding issues.)
22:42Riastradh: Is it intentional that a lot of this mmio stuff goes through the nvif ioctl mechanism?
22:42Riastradh: Or did I screw something up in mapping so that it has to use that as a fallback?
22:42imirkin: well, most of the mmio is actually behind the nvif mechanism
22:43Riastradh: I added return checks to a few of the places that do mmio mapping, and I think I killed all the cases where it was failing.
22:43imirkin: only the consumers go through nvif
22:43imirkin: like nv50_display and so on
22:43imirkin: but those don't do a whole lot of mmio
22:43Riastradh: (added __must_check to some of the map routines to be sure)
22:43Riastradh: (or, to nvif_object_map, specifically)
22:44imirkin: the idea is that the nvif stuff should be virtualizable across virtio, for example
22:44imirkin: and everything outside of nvkm should interact with nvkm via nvif
22:44imirkin: (and usif)
22:44imirkin: like what's a place that does mmio that goes through nvif?
22:45Riastradh: I mean like nvif_wr32.
22:46Riastradh: Can use iowrite32, or can use nvif_object_wr.
22:46Riastradh: Seems to often use nvif_object_wr.
22:46Riastradh: Depends on whether (a) the caller did nvif_object_map, and (b) the nvif_object_map succeeded.
22:47Riastradh: Or at least it did last time I looked closely.
22:47Riastradh: Possible that was before I fixed some bugs.
22:49Riastradh: It's possible I'm mixing up some parts of the system; I can't keep the layers of abstraction for mmio straight here, I'm afraid!
22:50Riastradh: Mixing them up in my story, that is. Obviously a lot of the mmio actually is working or else it wouldn't manage, e.g., mode detection.
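The write-path choice Riastradh describes (a direct register write when the object was mapped successfully, otherwise a fallback through the nvif object-write call) can be modelled roughly as below. This is a toy Python model, not the kernel code: the class `ToyNvifObject`, its attributes, and the `path_log` list are all hypothetical names invented for illustration, and the real logic lives in C in the nouveau sources.

```python
class ToyNvifObject:
    """Hypothetical stand-in for an nvif object (not the kernel struct)."""

    def __init__(self):
        self.map_ptr = None   # stays None unless object_map() succeeds
        self.path_log = []    # records which write path each wr32 took
        self.slow_writes = {}  # registers written through the fallback

    def object_map(self, succeed=True):
        """Model nvif_object_map: on success, a direct mapping exists."""
        if succeed:
            self.map_ptr = {}
        return succeed

    def wr32(self, addr, data):
        """Model nvif_wr32: direct write if mapped, else the fallback."""
        if self.map_ptr is not None:
            # Stands in for iowrite32 on the mapped region.
            self.map_ptr[addr] = data
            self.path_log.append("direct")
        else:
            # Stands in for nvif_object_wr going through the ioctl path.
            self.slow_writes[addr] = data
            self.path_log.append("ioctl")


obj = ToyNvifObject()
obj.wr32(0x100, 1)   # not mapped yet, so this takes the fallback
obj.object_map()
obj.wr32(0x100, 2)   # mapped now, so this is a direct write
print(obj.path_log)
```

In this model, seeing many "ioctl" entries would mean either that the caller never mapped the object or that the mapping failed, which are the two cases (a) and (b) listed above.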
22:50Riastradh: Enjoy dinner!
23:58karolherbst: mslusarz: mhh, what is the best way to mmt trace something I run gdb around?