01:06RSpliet1: imirkin: thanks for figuring out #90350
04:43RSpliet: pmoreau: do you still have those test shaders?
04:43RSpliet: think you can repeat the "sat" test for the MAD operation?
04:45RSpliet: it strikes me as odd that MUL can't do SAT but MAD can... and it would be interesting to figure out if we should replace MUL + SAT MOV with SAT MAD, with 0 ($r0?) as third parameter, in some cases
06:54inglor: Hi all, got an issue with nouveau in my system. I get in logs a lot of errors for PDISP like this dmesg error "[ 11.459335] nouveau E[ PDISP][0000:01:00.0] 0x0e5c: 0x00000001"
06:54inglor: I believe it's related to GRUB settings - does anyone have some idea where I should be looking?
06:55inglor: Other than the errors X loads fine (with GDM and Gnome 3)
06:56tobijk_: inglor: i don't think that has to do with grub settings, but just out of curiosity, what settings do you have there? :D
06:59inglor: tobijk_, my grub settings: https://pastebin.mozilla.org/8833018
07:00inglor: Up until some time ago I used to have Nvidia drivers so I think it might be some leftover setting, maybe something to do with console setting?
07:01inglor: I generally followed the wiki from Archlinux: https://wiki.archlinux.org/index.php/Nouveau . Maybe it's something to do with KMS?
07:04tobijk_: inglor: nothing that special in the grub config, you may just have bad luck i'm afraid
07:05inglor: Bad luck with what? To be honest the error is so cryptic so I don't know what to think..
07:05tobijk_: inglor: can you get me the full dmesg?
07:11inglor: tobijk_, here it is: http://fpaste.org/220390/
07:15tobijk_: oh wow that is really noisy, but you say it actually works? :O
07:15inglor: tobijk_, yeah awesome yes! ? :P
07:18tobijk_: i don't know that much about pdisp at all i have to admit, can you file a bug at bugs.freedesktop.org?
07:19tobijk_: Product: xorg - Component: Driver/nouveau
07:32inglor: tobijk_, I'm not sure how to find the Version of nouveau .. It's not part of the package - but I suspect it's the latest
07:36inglor: tobijk_, Bug reported: https://bugs.freedesktop.org/show_bug.cgi?id=90388
07:37inglor: Thanks for help
07:44tobijk_: inglor: sorry was away for a while, thanks for the report!
07:51inglor: no prob i hope it's solved
07:51imirkin: RSpliet: np. pmoreau did test both mul and mad. mad.sat works, mul.sat doesn't. very surprising.
07:53tobijk_: imirkin: any idea about the pdisp errors?
07:53imirkin: someone bisected the errors they were getting to a commit between 3.15 and 3.16 -- https://bugs.freedesktop.org/show_bug.cgi?id=90276
07:54tobijk_: imirkin: its woth 4.0.x
07:54tobijk_: *with
07:55imirkin: tobijk_: and 4.0 is after 3.16 right?
07:55tobijk_: yeah so i hoped it to be fixed in 4.0
07:55imirkin: why?
07:55imirkin: the person doing the bisects in that bug said the issues occur with latest
07:56RSpliet: imirkin: interesting... thanks
07:57inglor: imirkin, tobijk_ Might be related, though the bug is using dual screens - I'm using 1 screen over Display Port
08:01imirkin: inglor: ah ok. are there additional issues on top of the errors, or just the dmesg spam?
08:02inglor: imirkin, nothing I've noticed so far.
08:03inglor: I raised a ticket: 90388
08:03imirkin: inglor: ah ok. btw note that if you plan on playing games you should boot with nouveau.pstate=1 to be able to move to higher clock speeds.
08:04inglor: that's a kernel param? (no time to play games nowadays :/ )
08:04imirkin: it's a module parameter
08:04inglor: ok thanks
08:04imirkin: if nouveau is a module you can also stick it into modprobe.conf
08:06inglor: Is this available only on the latest driver? (the pstate thing)
08:06tobijk_: inglor: after that you can change the perf-lvl, but this may not always work
08:06tobijk_: inglor: that is available since 3.18 or .17
08:07inglor: ok cool, maybe the archwiki needs some updates then :D
08:21bonbons: imirkin: for whatever reason my riva system freezes hard on X startup, with "[ 777.877189] nouveau E[ PGR][0000:01:00.0] NOTIFY nsource: PROTECTION_ERROR" as last message from kernel
08:22bonbons: while X's last message that gets through ssh is about switching resolution
08:24bonbons: (nouveau.debug=debug, increasing Xorg verbosity does not show anything extra either)
08:26imirkin: bonbons: that error is expected on all NV04 systems, and not the cause of your issue
08:27bonbons: any idea how to find out the cause?
08:27imirkin: iirc it has to do with the ddx binding the null object and it doesn't like that (but nv05 is perfectly happy with it)
08:28imirkin: bonbons: sorry no... this is the Diamond one right?
08:28bonbons: Diamond?
08:28imirkin: the brand
08:29bonbons: no, it's "onboard" one of SR440 Intel mainboard
08:29imirkin: oh innnnteresting. does it get reported as NV04 or NV0A?
08:31bonbons: Chipset: NV04 (NV04)
08:33imirkin: hm, ok. what's the pci id?
08:33imirkin: (lspci -nn -d 10de: )
08:36RSpliet: airlied, skeggsb: mind rolling a new libdrm for F21 too?
08:37bonbons: imirkin: 10de:0020, NVIDIA Corporation NV4 [Riva TNT]
08:38imirkin: ok cool. so just the standard NV4 then. a little surprising, i would have expected onboard to be different somehow
08:39tobijk_: RSpliet: wait a bit, .61 breaks my ivy-bridge system, i try to find out what it is right now
08:39RSpliet: tobijk_: rolling an RPM doesn't necessarily mean shipping it yet
08:39RSpliet: it'd make it easier for others to test
08:39imirkin: tobijk_: "The new comment is a bit upside down, but thats not really a problem" -- what does that mean?
08:40tobijk_: i expected the loop to have a !*->priv in it after reading the comment at first
08:40imirkin: why?
08:41imirkin: the while condition is when it's true...
08:41tobijk_: yeah it's all fine, i was just a bit misled at first :)
08:41imirkin: anyways, i sorta see what you mean
08:41imirkin: but not a big deal
08:42tobijk_: yeah the other half sentence meant to say that ;-)
08:42imirkin: right, just wanted to confirm
08:42imirkin: i'm also gonna go and add a ton of comments to the heap
08:42imirkin: this isn't the first time i've figured out how it works
08:43imirkin: but every time i forget and look at it and it's complex
08:43bonbons: imirkin: I guess it's just the addon-card merged into the mainboard :). Full kernel log: https://pastebin.mozilla.org/8833019
08:43imirkin: haha -- nouveau.runpm=1 -- you wish!
08:43tobijk_: RSpliet: well rolling dist are fast today :/ opensuse already has it in its latest snapshot :o
08:44imirkin: bonbons: i assume you can't ssh in either?
08:44imirkin: bonbons: and netconsole doesn't provide more useful info?
08:45imirkin: bonbons: also is there a kernel on which this all worked ok?
08:46RSpliet: tobijk_: Fedora is not really a rolling release distro. besides, there's a community testing mechanism in place that can be used to bring a patch from build to updates
08:46bonbons: once I have started Xorg the machine goes 100% mute, no netconsole output anymore, no sysrq response on PS/2 keyboard, no ping answers (and no SSH either)
08:47bonbons: well all the kernels I tried on it now can't start X and run it successfully. So I guess one of the userspace updates (DDX, Mesa, ...) might have triggered the freeze-always case
08:47bonbons: though I can't tell for sure.
08:47tobijk_: RSpliet: ah interesting, anyway i just try to warn about it, it just shows after a full x stack rebuild, who knows why...
08:49bonbons: imirkin: a month or two ago when I took it out and installed Gentoo on it (compiled on a more powerful system ;) ) I got X working with XFCE on top of it. Since then, no luck and all kernels fail the same (3.19.3, 4.0, 4.0.2) - should be the 3.19.3 that worked in the beginning
08:50imirkin: silly question, but you didn't happen to do -march=native on said more powerful system, did you?
08:53bonbons: no I didn't, it's -march=pentium2 -O2 -fomit-frame-pointer
08:55bonbons: with -march=native I don't think it would run a long time in userspace... too many extra instructions added since then!
08:58bonbons: Xorg stack: xf86-video-nouveau-1.0.11, mesa-10.3.7, xorg-server-1.16.4, libdrm-2.4.59
09:04imirkin: if it's not too difficult, mind trying some older kernels?
09:04imirkin: like 3.14 or something
09:06imirkin: i should probably whip out my NV5 and see if that still works but... not gonna happen today
09:07bonbons: imirkin: that's not too difficult, matter of an hour or so :)
10:12bonbons: imirkin: 3.14.41 produces the same results, kernel&xorg logs at https://pastebin.mozilla.org/8833021
10:13imirkin: hm, i was hoping it was a recently-introduced issue
10:13imirkin: there have been a number of cleanups
10:14imirkin: sorry i'm of little help. perhaps skeggsb will have some ideas of what to try out.
10:14imirkin: i tend to be a bit out of my depth when it comes to things outside of mesa =/
10:17bonbons: heh, no problem with that :), as kernel and console are usable it's a good start so nouveau is not fully broken
10:17imirkin: one thing to try might be xf86-video-nouveau-1.0.10 or 1.0.9
10:17imirkin: i think .11 added a bunch of stuff that might subtly destroy nv4?
10:18imirkin: (vblank-related things, which i could easily imagine were improperly implemented there)
10:18hakzsam: imirkin, Hi, I'll investigate tomorrow the issue you reported to me two days ago. As you have already pinpointed, we need to instantiate a compute object, but if nvf0 is really different from nve0, I'll only fix the crash for now (it's not my priority to implement compute support for nvf0)
10:18imirkin: hakzsam: yeah, just make it not crash. don't worry about compute on nvf0
10:18bonbons: that I can do as well, though if I can get some more debugging output that would be cool
10:19imirkin: bonbons: i'm all for having more debugging output ;)
10:19imirkin: you could do nouveau.debug=trace drm.debug=0xe
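For readers following along: drm.debug is a bitmask of debug categories. The sketch below decodes 0xe, assuming the category bits as defined in kernels of this era (CORE, DRIVER, KMS, PRIME); the table and the helper name decode_drm_debug are illustrative, and newer kernels define additional bits.

```python
# Debug category bits, assumed from the DRM core headers of the 3.x/4.0 era.
# Later kernels add more categories above bit 3.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xE))  # ['DRIVER', 'KMS', 'PRIME']
```

Under that assumption, 0xe turns on everything except the very chatty CORE category.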
10:19hakzsam: imirkin, yes, anyways, my plan is to move MP counters to the kernel in the future :)
10:20bonbons: is there some more or less advanced kms test-suite around which I could run against nouveau/kms?
10:21imirkin: bonbons: drm has a 'modetest' tool which can do various things
10:21imirkin: (like set various modes, and, on nv04, should be able to set up the video overlay as well)
10:22bonbons: yeh I found those tools in libdrm, but I have a hard time understanding how to use them...
10:22imirkin: oh, you could also set NoAccel in your xorg.conf to see if it's an acceleration-gone-wrong thing, or if it's a mode thing
10:25imirkin: RSpliet: btw, if you could double-check that it all works on your G200, that'd be great. although the fact that it worked on nvac should be enough for it to work on G200
10:29bonbons: imirkin: noaccel seems to work much better... at least X starts and machine is still responsive
10:29bonbons: that is on 3.14.41
10:30imirkin: hrmmmmm... maybe that protection error isn't as innocuous as i had thought
10:31imirkin: what if you move nouveau_vieux_dri.so out of the way but keep acceleration enabled?
10:34bonbons: NoAccel also successful on 4.0.2
10:36imirkin: and if it still dies, start adding prints to the driver like it was going out of style. i think there's also some debug stuff that can be enabled.
10:36imirkin: (i mean xf86-video-nouveau)
10:37bonbons: imirkin: renaming both dri/nouveau_vieux_dri.so and mesa/nouveau_vieux_dri.so away does not protect against freeze
10:46bonbons: Accel=none is fine, Accel=glamor freezes, Accel=exa freezes
11:02bonbons: I get the same results with xf86-video-nouveau-1.0.10, now trying to find out where to insert prints in the DDX...
11:03imirkin: yeah, i didn't expect glamor would do any good ;)
11:10imirkin: bonbons: when in doubt, add a print
11:10bonbons: well, question is where to start as I have no idea about the execution flow!
11:12imirkin: start in nv_driver.c
11:12imirkin: also look at nv_accel_common and nv04_exa.c
11:22bonbons: looking at strace output it is rather soon after printing "resize called" in drmmode_xf86crtc_resize()
11:23imirkin: yeah, but that happens with accel disabled as well
11:46bonbons: with accel it exits drmmode_xf86crtc_resize() right after the print, then it attempts ioctl(13, 0xc0406481 with no further output
11:47joi: bonbons: you might try valgrind mmt to trace xorg
11:50joi: 0xc0406481 is DRM_NOUVEAU_GEM_PUSHBUF
11:50joi: which is used to submit commands to the gpu
11:51joi: mmt with demmt will tell you what are those commands
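The mapping from 0xc0406481 to DRM_NOUVEAU_GEM_PUSHBUF can be checked by splitting the request number into its fields. This is a minimal sketch assuming the generic Linux ioctl layout used on x86 (2 direction bits, 14 size bits, 8 type bits, 8 number bits); decode_ioctl is a hypothetical helper, and the two constants are the values from the DRM and nouveau uapi headers as far as I recall them.

```python
def decode_ioctl(request):
    """Split an ioctl request number into its fields (x86 layout assumed)."""
    return {
        "dir": (request >> 30) & 0x3,      # 3 means read and write
        "size": (request >> 16) & 0x3FFF,  # size of the argument struct in bytes
        "type": chr((request >> 8) & 0xFF),
        "nr": request & 0xFF,
    }


# Values assumed from the uapi headers: driver-private DRM ioctls start at
# DRM_COMMAND_BASE, and the nouveau pushbuf ioctl is number 0x41 above it.
DRM_COMMAND_BASE = 0x40
DRM_NOUVEAU_GEM_PUSHBUF = 0x41

fields = decode_ioctl(0xC0406481)
print(fields)  # {'dir': 3, 'size': 64, 'type': 'd', 'nr': 129}
print(fields["nr"] - DRM_COMMAND_BASE == DRM_NOUVEAU_GEM_PUSHBUF)  # True
```

The type character 'd' is the DRM ioctl magic, and the 64-byte size matches a read/write argument struct for the pushbuf submission.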
12:27bonbons: joi: cool, going to try (and hope valgrind will accept to start without extra symbol information, otherwise trying out will take some preparation time!)
12:50imirkin_: joi: does it record on ioctl submit or return? sounds like the hang happens in the kernel...
12:51imirkin_: bonbons: which pushbuf submit does it hang on?
12:51imirkin_: (you're looking for PUSH_KICK or nouveau_pushbuf_submit)
12:53joi: imirkin_: mmt catches both
12:53imirkin_: ok. hopefully it makes it to disk (or nfs root? i hope... for the sake of your fs...)
12:53joi: bonbons: mmt does not care about debug info
12:55bonbons: hopefully it can get the logs out via ssh properly or does proper fsync/fdatasync, otherwise most interesting details will be missing!
12:58joi: you can patch mmt to fsync after each entry
13:00joi: it's as easy as: http://fpaste.org/220458/raw/
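The idea of syncing after each entry, so that the last record survives a hard freeze, looks roughly like this. It is a generic Python illustration and not the mmt patch linked above; append_entry and the file name are made up for the example.

```python
import os


def append_entry(path, line):
    """Append one log line and force it to stable storage before returning."""
    with open(path, "a") as log:
        log.write(line + "\n")
        log.flush()             # push Python's buffer to the kernel
        os.fsync(log.fileno())  # ask the kernel to write it to disk


append_entry("trace.log", "about to submit pushbuf")
```

This is slow, since every entry waits for the disk, but on a machine that is expected to hang it trades speed for a complete trace.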
13:04bonbons: imirkin_: calling nouveau_pushbuf_kick(0x8cf4d18, 0x8cf4cc8) from xf86-video-nouveau-1.0.11/src/nv04_exa.c:158
13:05bonbons: that is NV04EXASolid()
13:05imirkin_: bonbons: not that it really matters, but did you happen to log w/h?
13:06bonbons: not yet, though it could be 1280x1024 if it's screen size
13:07imirkin_: hmmm... i wonder if the max is 1024
13:07imirkin_: that'd be sad.
13:08imirkin_: hmmmm
13:09joi: (demmt decodes commands on return from ioctl, so you will need this patch to see the last ioctl: http://fpaste.org/220459/raw/)
13:11bonbons: imirkin_: calling PUSH_KICK(0x9abfd18) from NV04EXASolid(), pPixmap=0x9aedec8, w=1280, h=1024, x=0, y=0, x2=1280, y2=1024
13:11imirkin_: ok. well good to know it's not like -1 or something dumb like that
13:14imirkin_: uhm... wtf?!
13:14imirkin_: NVAccelInitRectangle is never run
13:15imirkin_: oh. there's a macro.
13:20imirkin_: bonbons: ftr, all pre-nv50 chips use that same EXASolid logic
13:22joi: nv04_graph_intr should print channel info after "NOTIFY nsource: PROTECTION_ERROR"
13:22joi: so it probably crashes in nouveau_client_name
13:22imirkin_: that'd be a particularly sad place to die in =/
13:22joi: yup
13:23imirkin_: btw, he's on 32-bit... might be relevant.
13:29joi: actually, before channel info there's supposed to be "nstatus:..."
13:30imirkin_: this is what it should look like: https://bugs.freedesktop.org/show_bug.cgi?id=68854
13:36bonbons: joi: I might be missing something though valgrind fails on me with: Xorg: ../sysdeps/x86_64/cacheinfo.c:304: handle_intel: Assertion `maxidx >= 2' failed.
13:36imirkin_: uhm. x86_64?
13:36joi: heh, that's valgrind core :(
13:37bonbons: good question! might be it does not compile properly under linux32 personality on x86_64...
13:39joi: it's better, it's different source tree
13:40joi: there's no cacheinfo.c in our valgrind
13:40imirkin_: bonbons: i don't see how you're getting a nsource print, but no nstatus..
13:41imirkin_: bonbons: see gr/nv04.c:nv04_gr_intr
13:41bonbons: imirkin_: aren't those two in distinct printk() calls?
13:41imirkin_: yeah, but there's nothing between them
13:41imirkin_: and the bitfield_print is a pretty innocuous function as well
13:43bonbons: imirkin_: unless system gets broken in-between due to pci/network calls (too few packets for loss of netconsole UDP packets I would say)
13:43imirkin_: is serial an option btw?
13:43imirkin_: old motherboard like that certainly has a serial port
13:44imirkin_: (maybe even kgdb? that seems like too much to ask for)
13:45bonbons: it has serial port, though have not tried it recently. For kgdb, well I never played with it yet!
13:46imirkin_: me neither
13:48bonbons: let me find a nullmodem cable for the serial console...
14:14bonbons: imirkin_: huh, the freeze seems to happen because of netconsole (might be flaky 8139too network card driver?)
14:15bonbons: with netconsole disabled I get: [ 991.177072] nouveau E[ PGR][0000:01:00.0] NOTIFY nsource: PROTECTION_ERROR nstatus: PROTECTION_FAULT
14:15imirkin_: and do you get a line below that?
14:15bonbons: [ 991.177132] nouveau E[ PGR][0000:01:00.0] ch 1 [Xorg[209]] subc 2 class 0x0042 mthd 0x0180 data 0x00003a04
14:15imirkin_: ah cool
14:16imirkin_: that's the same error as the other nv04 user was getting, so i think that this one's expected
14:16bonbons: and 3 further occurrences (with subc 6, class 0x00{44,43,19})
14:17imirkin_: yep, that's what he saw as well
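When comparing several of these PGR lines, for example against another user's report, it can help to pull the fields out mechanically. The sketch below parses the line format exactly as quoted above; the regular expression and the name parse_pgr_line are illustrative and assume the format does not vary between kernels.

```python
import re

# Matches the "ch N [client] subc N class 0x.. mthd 0x.. data 0x.." tail
# of the PGR error line as it appears in this log.
PGR_LINE = re.compile(
    r"ch (?P<ch>\d+) \[(?P<client>.+)\] "
    r"subc (?P<subc>\d+) class (?P<cls>0x[0-9a-fA-F]+) "
    r"mthd (?P<mthd>0x[0-9a-fA-F]+) data (?P<data>0x[0-9a-fA-F]+)"
)


def parse_pgr_line(line):
    """Return the decoded fields of a PGR error line, or None if no match."""
    m = PGR_LINE.search(line)
    if m is None:
        return None
    return {
        "ch": int(m.group("ch")),
        "client": m.group("client"),
        "subc": int(m.group("subc")),
        "class": int(m.group("cls"), 16),
        "mthd": int(m.group("mthd"), 16),
        "data": int(m.group("data"), 16),
    }


line = ("[ 991.177132] nouveau E[ PGR][0000:01:00.0] ch 1 [Xorg[209]] "
        "subc 2 class 0x0042 mthd 0x0180 data 0x00003a04")
print(parse_pgr_line(line))
```

For the quoted line this yields channel 1, client Xorg[209], subchannel 2, class 0x42, method 0x180 and data 0x3a04.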
14:17bonbons: now, for my todo list: find out what kills the system with netconsole active...
14:17imirkin_: i think it's the NvNull object being unhappy
14:17imirkin_: wait, so it all works fine with netconsole disabled?
14:18bonbons: yes, without netconsole system keeps running
14:18imirkin_: and you see things on the screen/etc?
14:19imirkin_: i.e. excessive debugging is the cause of this issue? :)
14:19bonbons: yes, I do. Excessive debugging, not even, just running netconsole to catch whatever kernel has to say in case it crashes!
14:25bonbons: imirkin_: what kind of condition is nv04_gr_intr() being run at (interrupt status, interrupts masked?, possible locks being held)?
14:26imirkin_: pppretty sure it's with interrupts disabled
14:26imirkin_: but not 100% sure
14:29imirkin_: bonbons: fairly sure it gets invoked by nvkm_mc_intr, which in turn is the function passed to request_irq
14:29imirkin_: so... whatever mode irq functions are run in.
14:29imirkin_: if it's not cool to print from intr, you should know that nouveau does that a lot
14:31bonbons: printing from interrupts should work (that's what netpoll is for), unless there is something weird with the driver or some locks
14:31imirkin_: nothing i'm aware of... but who knows, these things all feed into one another
14:31imirkin_: perhaps try running with lockdep?
14:33bonbons: yeah, but I will first have a look at the 8139too driver (though no more this evening, too late already!)
14:34imirkin_: yeah... if you said the rtl8139 driver or hw had some bug, i would not be amazed ;)
14:34joi: it seems there's some problem with notifier object setup; as it depends on agp being enabled I wonder what will happen when you boot with agp disabled...
14:35imirkin_: joi: fwiw i've used a pci version of nv5 just fine
14:35bonbons: there were complaints about rtl8139 and netconsole in the past so might be there is something left :/
14:36joi: imirkin_: yes, but maybe notifier objects on nv4 should be set up differently?
14:36imirkin_: sure. tbh i never fully grasped all these 'objects'
14:36imirkin_: the other nv04 user also had agp
14:37imirkin_: i've only used nv05 with nouveau in a rather anachronistic system -- core i7-920
14:37joi: hehe
14:37imirkin_: no agp for me
14:37imirkin_: while i've been less-than-successful at getting rid of all my various junk, i'm trying not to acquire too much *more* of it
14:38joi: bonbons: can you boot with nouveau.agpmode=0 ?
14:40joi: this is the code I think might be relevant to PROTECTION_ERRORs: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_abi16.c#n464
14:42imirkin_: but afaik most people with agp cards don't hit problems
14:42imirkin_: [at least not related to this]
14:44bonbons: joi: booting with nouveau.agpmode=0 makes no visible difference
14:50bonbons: thanks for looking at it and sorry for the fuss caused by netconsole triggering the freeze
14:50bonbons: getting some sleep now
14:51imirkin_: np :) always fun to play with ancient systems
14:51bonbons: that's what they are there for :)