01:06 RSpliet1: imirkin: thanks for figuring out #90350
04:43 RSpliet: pmoreau: do you still have those test shaders?
04:43 RSpliet: think you can repeat the "sat" test for the MAD operation?
04:45 RSpliet: it strikes me as odd that MUL can't do SAT but MAD can... and it would be interesting to figure out if we should replace MUL + SAT MOV with SAT MAD with 0 ($r0?) as third parameter in some cases
06:54 inglor: Hi all, got an issue with nouveau in my system. I get in logs a lot of errors for PDISP like this dmesg error "[ 11.459335] nouveau E[ PDISP][0000:01:00.0] 0x0e5c: 0x00000001"
06:54 inglor: I believe it's related to GRUB settings - does anyone have some idea where I should be looking?
06:55 inglor: Other than the errors X loads fine (with GDM and Gnome 3)
06:56 tobijk_: inglor: i dont think that has to do with grub settings, but just out of curiosity, what settings do you have there? :D
06:59 inglor: tobijk_, my grub settings: https://pastebin.mozilla.org/8833018
07:00 inglor: Up until some time ago I used to have Nvidia drivers so I think it might be some leftover setting, maybe something to do with console setting?
07:01 inglor: I generally followed the wiki from Archlinux: https://wiki.archlinux.org/index.php/Nouveau . Maybe it's something to do with KMS?
07:04 tobijk_: inglor: nothing that special in the grub config, you may just have bad luck i'm afraid
07:05 inglor: Bad luck with what? To be honest the error is so cryptic so I don't know what to think..
07:05 tobijk_: inglor: can you get me the full dmesg?
07:11 inglor: tobijk_, here is it: http://fpaste.org/220390/
07:15 tobijk_: oh wow that is really noisy, but you say it actually works? :O
07:15 inglor: tobijk_, yeah awesome yes! ? :P
07:18 tobijk_: i dont know that much about pdisp at all i have to admit, can you file a bug at bugs.freedesktop.org?
07:19 tobijk_: Product: xorg - Component: Driver/nouveau
07:32 inglor: tobijk_, I'm not sure how to find the Version of nouveau .. It's not part of the package - but I suspect it's the latest
07:36 inglor: tobijk_, Bug reported: https://bugs.freedesktop.org/show_bug.cgi?id=90388
07:37 inglor: Thanks for help
07:44 tobijk_: inglor: sorry was away for a while, thanks for the report!
07:51 inglor: no prob i hope it's solved
07:51 imirkin: RSpliet: np. pmoreau did test both mul and mad. mad.sat works, mul.sat doesn't. very surprising.
07:53 tobijk_: imirkin: any idea about the pdisp errors?
07:53 imirkin: someone bisected the errors they were getting to a commit between 3.15 and 3.16 -- https://bugs.freedesktop.org/show_bug.cgi?id=90276
07:54 tobijk_: imirkin: its woth 4.0.x
07:54 tobijk_: *with
07:55 imirkin: tobijk_: and 4.0 is after 3.16 right?
07:55 tobijk_: yeah so i hoped it to be fixed in 4.0
07:55 imirkin: why?
07:55 imirkin: the person doing the bisects in that bug said the issues occur with latest
07:56 RSpliet: imirkin: interesting... thanks
07:57 inglor: imirkin, tobijk_ Might be related, though the bug is using dual screens - I'm using 1 screen over Display Port
08:01 imirkin: inglor: ah ok. are there additional issues on top of the errors, or just the dmesg spam?
08:02 inglor: imirkin, nothing I've noticed so far.
08:03 inglor: I raised a ticket: 90388
08:03 imirkin: inglor: ah ok. btw note that if you plan on playing games you should boot with nouveau.pstate=1 to be able to move to higher clock speeds.
08:04 inglor: that's a kernel param? (no time to play games nowadays :/ )
08:04 imirkin: it's a module parameter
08:04 inglor: ok thanks
08:04 imirkin: if nouveau is a module you can also stick it into modprobe.conf
08:06 inglor: Is this available only on the latest driver? (the pstate thing)
08:06 tobijk_: inglor: after that you can change the perf-lvl, but this may not always work
08:06 tobijk_: inglor: that is available since 3.18 or .17
08:07 inglor: ok cool, maybe the archwiki needs some updates then :D
08:21 bonbons: imirkin: for whatever reason my riva system freezes hard on X startup, with "[ 777.877189] nouveau E[ PGR][0000:01:00.0] NOTIFY nsource: PROTECTION_ERROR" as last message from kernel
08:22 bonbons: while X's last message that gets through ssh is about switching resolution
08:24 bonbons: (nouveau.debug=debug, increasing Xorg verbosity does not show anything extra either)
08:26 imirkin: bonbons: that error is expected on all NV04 systems, and not the cause of your issue
08:27 bonbons: any idea how to find out the cause?
08:27 imirkin: iirc it has to do with the ddx binding the null object and it doesn't like that (but nv05 is perfectly happy with it)
08:28 imirkin: bonbons: sorry no... this is the Diamond one right?
08:28 bonbons: Diamond?
08:28 imirkin: the brand
08:29 bonbons: no, it's "onboard" one of SR440 Intel mainboard
08:29 imirkin: oh innnnteresting. does it get reported as NV04 or NV0A?
08:31 bonbons: Chipset: NV04 (NV04)
08:33 imirkin: hm, ok. what's the pci id?
08:33 imirkin: (lspci -nn -d 10de: )
08:36 RSpliet: airlied, skeggsb: mind rolling a new libdrm for F21 too?
08:37 bonbons: imirkin: 10de:0020, NVIDIA Corporation NV4 [Riva TNT]
08:38 imirkin: ok cool. so just the standard NV4 then. a little surprising, i would have expected onboard to be different somehow
08:39 tobijk_: RSpliet: wait a bit, .61 breaks my ivy-bridge system, i try to find out what it is right now
08:39 RSpliet: tobijk_: rolling an RPM doesn't necessarily mean shipping it yet
08:39 RSpliet: it'd make it easier for others to test
08:39 imirkin: tobijk_: "The new comment is a bit upside down, but thats not really a problem" -- what does that mean?
08:40 tobijk_: i expected the loop to have a !*->priv in it after reading the comment at first
08:40 imirkin: why?
08:41 imirkin: the while condition is when it's true...
08:41 tobijk_: yeah its all fine, i was just a bit misled at first :)
08:41 imirkin: anyways, i sorta see what you mean
08:41 imirkin: but not a big deal
08:42 tobijk_: yeah the other half sentence meant to say that ;-)
08:42 imirkin: right, just wanted to confirm
08:42 imirkin: i'm also gonna go and add a ton of comments to the heap
08:42 imirkin: this isn't the first time i've figured out how it works
08:43 imirkin: but every time i forget and look at it and it's complex
08:43 bonbons: imirkin: I guess it's just the addon-card merged into the mainboard :). Full kernel log: https://pastebin.mozilla.org/8833019
08:43 imirkin: haha -- nouveau.runpm=1 -- you wish!
08:43 tobijk_: RSpliet: well rolling dist are fast today :/ opensuse already has it in its latest snapshot :o
08:44 imirkin: bonbons: i assume you can't ssh in either?
08:44 imirkin: bonbons: and netconsole doesn't provide more useful info?
08:45 imirkin: bonbons: also is there a kernel on which this all worked ok?
08:46 RSpliet: tobijk_: Fedora is not really a rolling release distro. besides, there's a community testing mechanism in place that can be used to bring a patch from build to updates
08:46 bonbons: once I have started Xorg the machine goes 100% mute, no netconsole output anymore, no sysrq response on PS/2 keyboard, no ping answers (and no SSH either)
08:47 bonbons: well all the kernels I tried on it now can't start X and run it successfully. So I guess one of the userspace updates (DDX, Mesa, ...) might have triggered the freeze-always case
08:47 bonbons: though I can't tell for sure.
08:47 tobijk_: RSpliet: ah interesting, anyway i just try to warn about it, it just shows after a full x stack rebuild, who knows why...
08:49 bonbons: imirkin: a month or two ago when I took it out and installed Gentoo on it (compiled on a more powerful system ;) ) I got X working with XFCE on top of it. Since then, no luck and all kernels fail the same (3.19.3, 4.0, 4.0.2) - should be the 3.19.3 that worked in the beginning
08:50 imirkin: silly question, but you didn't happen to do -march=native on said more powerful system, did you?
08:53 bonbons: no I didn't, it's -march=pentium2 -O2 -fomit-frame-pointer
08:55 bonbons: with -march=native I don't think it would run a long time in userspace... too many extra instructions added since then!
08:58 bonbons: Xorg stack: xf86-video-nouveau-1.0.11, mesa-10.3.7, xorg-server-1.16.4, libdrm-2.4.59
09:04 imirkin: if it's not too difficult, mind trying some older kernels?
09:04 imirkin: like 3.14 or something
09:06 imirkin: i should probably whip out my NV5 and see if that still works but... not gonna happen today
09:07 bonbons: imirkin: that's not too difficult, matter of an hour or so :)
10:12 bonbons: imirkin: 3.14.41 produces the same results, kernel&xorg logs at https://pastebin.mozilla.org/8833021
10:13 imirkin: hm, i was hoping it was a recently-introduced issue
10:13 imirkin: there have been a number of cleanups
10:14 imirkin: sorry i'm of little help. perhaps skeggsb will have some ideas of what to try out.
10:14 imirkin: i tend to be a bit out of my depth when it comes to things outside of mesa =/
10:17 bonbons: heh, no problem with that :), as kernel and console are usable it's a good start so nouveau is not fully broken
10:17 imirkin: one thing to try might be xf86-video-nouveau-1.0.10 or 1.0.9
10:17 imirkin: i think .11 added a bunch of stuff that might subtly destroy nv4?
10:18 imirkin: (vblank-related things, which i could easily imagine were improperly implemented there)
10:18 hakzsam: imirkin, Hi, I'll investigate tomorrow about the issue you reported to me two days ago. As you have already pinpointed, we need to instantiate a compute object, but if nvf0 is really different from nve0, I'll only fix the crash at first (it's not my priority to implement compute support for nvf0)
10:18 imirkin: hakzsam: yeah, just make it not crash. don't worry about compute on nvf0
10:18 bonbons: that I can do as well, though if I can get some more debugging output that would be cool
10:19 imirkin: bonbons: i'm all for having more debugging output ;)
10:19 imirkin: you could do nouveau.debug=trace drm.debug=0xe
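For reference, drm.debug is a bitmask of DRM debug categories. A minimal sketch of how 0xe breaks down, assuming the category bit values used by kernels of that era (core=0x1, driver=0x2, kms=0x4, prime=0x8). The names DRM_DEBUG_BITS and decode_drm_debug are illustrative only, and later kernels define additional bits:

```python
# Assumed drm.debug category bits (4.0-era kernels); newer kernels add more.
DRM_DEBUG_BITS = {
    0x1: "core",
    0x2: "driver",
    0x4: "kms",
    0x8: "prime",
}

def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]

print(decode_drm_debug(0xe))
```

Under those assumptions, 0xe enables everything except the noisy core category.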
10:19 hakzsam: imirkin, yes, anyways, my plan is to move MP counters to the kernel in the future :)
10:20 bonbons: is there some more or less advanced kms test-suite around which I could run against nouveau/kms?
10:21 imirkin: bonbons: drm has a 'modetest' tool which can do various things
10:21 imirkin: (like set various modes, and, on nv04, should be able to set up the video overlay as well)
10:22 bonbons: yeh I found those tools in libdrm, but I have a hard time understanding how to use them...
10:22 imirkin: oh, you could also set NoAccel in your xorg.conf to see if it's an acceleration-gone-wrong thing, or if it's a mode thing
10:25 imirkin: RSpliet: btw, if you could double-check that it all works on your G200, that'd be great. although the fact that it worked on nvac should be enough for it to work on G200
10:29 bonbons: imirkin: noaccel seems to work much better... at least X starts and machine is still responsive
10:29 bonbons: that is on 3.14.41
10:30 imirkin: hrmmmmm... maybe that protection error isn't as innocuous as i had thought
10:31 imirkin: what if you move nouveau_vieux_dri.so out of the way but keep acceleration enabled?
10:34 bonbons: NoAccel also successful on 4.0.2
10:36 imirkin: and if it still dies, start adding prints to the driver like it was going out of style. i think there's also some debug stuff that can be enabled.
10:36 imirkin: (i mean xf86-video-nouveau)
10:37 bonbons: imirkin: renaming both dri/nouveau_vieux_dri.so and mesa/nouveau_vieux_dri.so away does not protect against freeze
10:46 bonbons: Accel=none is fine, Accel=glamor freezes, Accel=exa freezes
11:02 bonbons: I get the same results with xf86-video-nouveau-1.0.10, now trying to find out where to insert prints in the DDX...
11:03 imirkin: yeah, i didn't expect glamor would do any good ;)
11:10 imirkin: bonbons: when in doubt, add a print
11:10 bonbons: well, question is where to start as I have no idea about the execution flow!
11:12 imirkin: start in nv_driver.c
11:12 imirkin: also look at nv_accel_common and nv04_exa.c
11:22 bonbons: looking at strace output it is rather soon after printing "resize called" in drmmode_xf86crtc_resize()
11:23 imirkin: yeah, but that happens with accel disabled as well
11:46 bonbons: with accel it exits drmmode_xf86crtc_resize() right after the print, then it attempts ioctl(13, 0xc0406481 with no further output
11:47 joi: bonbons: you might try valgrind mmt to trace xorg
11:50 joi: 0xc0406481 is DRM_NOUVEAU_GEM_PUSHBUF
11:50 joi: which is used to submit commands to the gpu
11:51 joi: mmt with demmt will tell you what are those commands
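For reference, the request number seen in strace can be taken apart by hand. A minimal sketch, assuming the generic Linux ioctl encoding used on x86 (8 bits command number, 8 bits type character, 14 bits argument size, 2 bits direction) and that driver-private DRM ioctls start at 0x40. The helper name decode_ioctl is illustrative only:

```python
# Assumed layout (asm-generic/ioctl.h, as used on x86):
# bits 0-7 command number, 8-15 type character, 16-29 argument size, 30-31 direction.
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}
DRM_COMMAND_BASE = 0x40  # driver-private DRM ioctls are numbered from here

def decode_ioctl(request):
    """Split an ioctl request number into its encoded fields."""
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": DIRECTIONS[(request >> 30) & 0x3],
    }

fields = decode_ioctl(0xC0406481)
print(fields)
print(hex(fields["nr"] - DRM_COMMAND_BASE))  # driver-private command index
```

The type character 'd' marks a DRM ioctl, and the driver-private index 0x41 is what corresponds to the pushbuf submit joi names above.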
12:27 bonbons: joi: cool, going to try (and hope valgrind will accept to start without extra symbol information, otherwise trying out will take some preparation time!)
12:50 imirkin_: joi: does it record on ioctl submit or return? sounds like the hang happens in the kernel...
12:51 imirkin_: bonbons: which pushbuf submit does it hang on?
12:51 imirkin_: (you're looking for PUSH_KICK or nouveau_pushbuf_submit)
12:53 joi: imirkin_: mmt catches both
12:53 imirkin_: ok. hopefully it makes it to disk (or nfs root? i hope... for the sake of your fs...)
12:53 joi: bonbons: mmt does not care about debug info
12:55 bonbons: hopefully it can get the logs out via ssh properly or does proper fsync/fdatasync, otherwise most interesting details will be missing!
12:58 joi: you can patch mmt to fsync after each entry
13:00 joi: it's as easy as: http://fpaste.org/220458/raw/
13:04 bonbons: imirkin_: calling nouveau_pushbuf_kick(0x8cf4d18, 0x8cf4cc8) from xf86-video-nouveau-1.0.11/src/nv04_exa.c:158
13:05 bonbons: that is NV04EXASolid()
13:05 imirkin_: bonbons: not that it really matters, but did you happen to log w/h?
13:06 bonbons: not yet, though it could be 1280x1024 if it's screen size
13:07 imirkin_: hmmm... i wonder if the max is 1024
13:07 imirkin_: that'd be sad.
13:08 imirkin_: hmmmm
13:09 joi: (demmt decodes commands on return from ioctl, so you will need this patch to see the last ioctl: http://fpaste.org/220459/raw/)
13:11 bonbons: imirkin_: calling PUSH_KICK(0x9abfd18) from NV04EXASolid(), pPixmap=0x9aedec8, w=1280, h=1024, x=0, y=0, x2=1280, y2=1024
13:11 imirkin_: ok. well good to know it's not like -1 or something dumb like that
13:14 imirkin_: uhm... wtf?!
13:14 imirkin_: NVAccelInitRectangle is never run
13:15 imirkin_: oh. there's a macro.
13:20 imirkin_: bonbons: ftr, all pre-nv50 chips use that same EXASolid logic
13:22 joi: nv04_graph_intr should print channel info after "NOTIFY nsource: PROTECTION_ERROR"
13:22 joi: so it probably crashes in nouveau_client_name
13:22 imirkin_: that'd be a particularly sad place to die in =/
13:22 joi: yup
13:23 imirkin_: btw, he's on 32-bit... might be relevant.
13:29 joi: actually, before channel info there's supposed to be "nstatus:..."
13:30 imirkin_: this is what it should look like: https://bugs.freedesktop.org/show_bug.cgi?id=68854
13:36 bonbons: joi: I might be missing something though valgrind fails on me with: Xorg: ../sysdeps/x86_64/cacheinfo.c:304: handle_intel: Assertion `maxidx >= 2' failed.
13:36 imirkin_: uhm. x86_64?
13:36 joi: heh, that's valgrind core :(
13:37 bonbons: good question! might be it does not compile properly under linux32 personality on x86_64...
13:39 joi: it's better, it's different source tree
13:40 joi: there's no cacheinfo.c in our valgrind
13:40 imirkin_: bonbons: i don't see how you're getting a nsource print, but no nstatus..
13:41 imirkin_: bonbons: see gr/nv04.c:nv04_gr_intr
13:41 bonbons: imirkin_: aren't those two in distinct printk() calls?
13:41 imirkin_: yeah, but there's nothing between them
13:41 imirkin_: and the bitfield_print is a pretty innocuous function as well
13:43 bonbons: imirkin_: unless system gets broken in-between due to pci/network calls (too few packets for loss of netconsole UDP packets I would say)
13:43 imirkin_: is serial an option btw?
13:43 imirkin_: old motherboard like that certainly has a serial port
13:44 imirkin_: (maybe even kgdb? that seems like too much to ask for)
13:45 bonbons: it has a serial port, though I have not tried it recently. For kgdb, well I never played with it yet!
13:46 imirkin_: me neither
13:48 bonbons: let me find a nullmodem cable for the serial console...
14:14 bonbons: imirkin_: huh, the freeze seems to happen because of netconsole (might be flaky 8139too network card driver?)
14:15 bonbons: with netconsole disabled I get: [ 991.177072] nouveau E[ PGR][0000:01:00.0] NOTIFY nsource: PROTECTION_ERROR nstatus: PROTECTION_FAULT
14:15 imirkin_: and do you get a line below that?
14:15 bonbons: [ 991.177132] nouveau E[ PGR][0000:01:00.0] ch 1 [Xorg[209]] subc 2 class 0x0042 mthd 0x0180 data 0x00003a04
14:15 imirkin_: ah cool
14:16 imirkin_: that's the same error as the other nv04 user was getting, so i think that this one's expected
14:16 bonbons: and 3 further occurrences (with subc 6, class 0x00{44,43,19})
14:17 imirkin_: yep, that's what he saw as well
14:17 bonbons: now, for my todo list: find out what kills the system with netconsole active...
14:17 imirkin_: i think it's the NvNull object being unhappy
14:17 imirkin_: wait, so it all works fine with netconsole disabled?
14:18 bonbons: yes, without netconsole system keeps running
14:18 imirkin_: and you see things on the screen/etc?
14:19 imirkin_: i.e. excessive debugging is the cause of this issue? :)
14:19 bonbons: yes, I do. Excessive debugging, not even, just running netconsole to catch whatever kernel has to say in case it crashes!
14:25 bonbons: imirkin_: what kind of condition is nv04_gr_intr() being run at (interrupt status, interrupts masked?, possible locks being held)?
14:26 imirkin_: pppretty sure it's with interrupts disabled
14:26 imirkin_: but not 100% sure
14:29 imirkin_: bonbons: fairly sure it gets invoked by nvkm_mc_intr, which in turn is the function passed to request_irq
14:29 imirkin_: so... whatever mode irq functions are run in.
14:29 imirkin_: if it's not cool to print from intr, you should know that nouveau does that a lot
14:31 bonbons: printing from interrupts should work (that's what netpoll is for), unless there is something weird with the driver or some locks
14:31 imirkin_: nothing i'm aware of... but who knows, these things all feed into one another
14:31 imirkin_: perhaps try running with lockdep?
14:33 bonbons: yeah, but I will first have a look at the 8139too driver (though no more this evening, too late already!)
14:34 imirkin_: yeah... if you said the rtl8139 driver or hw had some bug, i would not be amazed ;)
14:34 joi: it seems there's some problem with notifier object setup; as it depends on agp being enabled I wonder what will happen when you boot with agp disabled...
14:35 imirkin_: joi: fwiw i've used a pci version of nv5 just fine
14:35 bonbons: there were complaints about rtl8139 and netconsole in the past so might be there is something left :/
14:36 joi: imirkin_: yes, but maybe notifier objects on nv4 should be set up differently?
14:36 imirkin_: sure. tbh i never fully grasped all these 'objects'
14:36 imirkin_: the other nv04 user also had agp
14:37 imirkin_: i've only used nv05 with nouveau in a rather anachronistic system -- core i7-920
14:37 joi: hehe
14:37 imirkin_: no agp for me
14:37 imirkin_: while i've been less-than-successful at getting rid of all my various junk, i'm trying not to acquire too much *more* of it
14:38 joi: bonbons: can you boot with nouveau.agpmode=0 ?
14:40 joi: this is the code I think might be relevant to PROTECTION_ERRORs: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_abi16.c#n464
14:42 imirkin_: but afaik most people with agp cards don't hit problems
14:42 imirkin_: [at least not related to this]
14:44 bonbons: joi: booting with nouveau.agpmode=0 makes no visible difference
14:50 bonbons: thanks for looking at it and sorry for the fuss caused by netconsole triggering the freeze
14:50 bonbons: getting some sleep now
14:51 imirkin_: np :) always fun to play with ancient systems
14:51 bonbons: that's what they are there for :)