02:01 hakzsam: imirkin, uploading constbufs for compute using the 3d chan works fine, I'll drop that patch
03:18 karolherbst: oh meh, there might be differences in the zcull stuff with gf110, because nvidia optimized the zcull engines for tessellation, maybe looking at pre gf110 is easier :)
05:02 karolherbst: imirkin: I think there will be a dri3 enabled intel ddx ebuild soon, I talked to some of the gentoo-x11 team
06:00 ashmew2: That's nice
06:00 ashmew2: I had to build it myself
06:59 ashmew2: imirkin: I'm not sure if that bug about honoring the selected GPU is the same as my case.
06:59 ashmew2: But I don't know enough about DRI2 yet :)
07:13 karlmag: http://www.slackware.com/~karlmag/nouveau/debug/
07:13 karlmag: put a 650 Ti in my main machine and it freezes quite quickly.
07:13 karlmag: getting back-traces too
07:13 karlmag: see URL
07:15 karlmag: actually, it's not a full freeze.. the pointer can still move, but that's about it.
07:16 karlmag: dunno if that trace tells people able to read them anything useful.
07:17 imirkin: that's a _really_ high address... probably some sort of use after free? dunno.
07:18 karlmag: which address what? :)
07:18 imirkin: read fault at 3300131000
07:20 karlmag: machine has 64GB ram, 650 Ti has 2GB. 4 monitors: 3x 1920x1080 + 1x 1920x1200
07:21 imirkin: please tell me you're not running an i386 + PAE kernel...
07:21 karlmag: I'm pretty sure the trace talks about lib64 :)
07:22 karlmag: so no, I'm not.
07:22 karlmag: 4.4.1
07:22 karlmag: 64bit
07:22 imirkin: the stuff in xorg are just "gpu has hung" messages
07:23 imirkin: no other error-looking things in dmesg prior?
07:23 karlmag: anything else I can do (quick and fairly easy) to get anything more useful out of this?
07:23 imirkin: also if you can get kwin's output that'd be interesting
07:23 karlmag: nope... but let me double check
07:23 imirkin: did it have a pushbuf submit fail maybe?
07:23 imirkin: [which would cause a ton of junk to be printed to its stderr]
07:25 karlmag: nothing in dmesg
07:26 karolherbst: imirkin: I get "No provider of glXBindTexImageEXT found. Requires one of: GLX extension "GLX_EXT_texture_from_pixmap"" when starting kwin with nouveau
07:26 karolherbst: but then it is just crashing
07:26 imirkin: errrr... tfp should be supported
07:26 karolherbst: could be prime related then
07:32 karlmag: not sure if kwin output to anywhere?
07:33 imirkin: maybe there's a ~/.xsession-errors?
07:34 karlmag: hmm.. nope..
07:36 karolherbst: maybe journalctl ;)
07:36 karlmag: heh... nah
07:39 karlmag: Feb 7 16:39:10 vorlon kernel: [ 2694.563172] nouveau 0000:01:00.0: fifo: PBDMA0: 04000000 [ACQUIRE] ch 2 [023faf0000 Xorg[1255]] subc 0 mthd 001c data 00001004
07:39 karlmag: syslog gets filled with that stuff
07:40 imirkin: karlmag: i recently pushed a change to mesa
07:40 imirkin: karlmag: which may or may not help with that sort of thing
07:40 karlmag: that's cute.. :-P -rw-r----- 1 root root 2266122040 Feb 7 16:40 syslog
07:41 karlmag: imirkin: ok
07:41 imirkin: karlmag: iow, my recommendation is to grab mesa-git and see what happens
07:42 karlmag: ... and I was just intending to do a quick test with that 650 Ti today... :-P
07:43 karlmag: imirkin: I'll consider my options. Thanks so far.
07:44 karlmag: the size of that syslog file is just hilarious
07:47 karlmag: I did one ls about 10 seconds before the one I pasted and also booted quite quickly after that.. the first ls was about 200MB less than the second, and the file at boot is about 200MB more still...
07:56 karolherbst: I guess something writes a lot of stuff in there
07:59 karlmag: karolherbst: looks like that message I pasted was written many times/second
08:03 karlmag: anything from about 5k up to over 37k lines in the log/second
08:03 karlmag: that I've seen now
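
A rate like the one quoted above can be estimated by counting matching lines per timestamp. This is only a minimal sketch: it assumes the classic syslog prefix seen in the pasted message (month, day, HH:MM:SS), and both the helper name `lines_per_second` and the filter string are illustrative choices, not part of any existing tool.

```python
from collections import Counter

def lines_per_second(lines, needle="fifo: PBDMA"):
    """Count log lines containing `needle`, grouped by syslog timestamp (1 s resolution)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        parts = line.split()
        if len(parts) < 3:
            continue
        # In this (assumed) format the first three fields are month, day, HH:MM:SS.
        counts[" ".join(parts[:3])] += 1
    return counts
```

Passing an open file object (for example the syslog file, whose path depends on the distribution) iterates line by line, so even a multi-gigabyte log does not need to fit in memory.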
08:06 karlmag: nah.. gives up now.. reverts to a different card. I'll revisit this sometime later I presume
09:15 karolherbst: any name suggestions for a "current_load" file inside debugfs?
09:15 karolherbst: I want to clean up my pmu counter stuff
09:16 imirkin: engine_counters?
09:17 karolherbst: mhh
09:17 karolherbst: I don't expose the counters though
09:18 karolherbst: currently I put in there {core,memory,video,pcie}_counter_value * 0xff / tick_counter_value
09:18 imirkin: so then maybe provide the values
09:19 imirkin: also that should be * 0x100
09:19 karolherbst: that would make the code more complicated on the pmu in general (both things)
09:19 karolherbst: sending over 4 32bit values isn't as easy as one
09:20 karolherbst: I also don't know if there is any use for those pmu counters except reclocking
09:21 karolherbst: and they aren't per application/channel/whatever, they are always for the complete gpu
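
The difference between the two scale factors discussed above (0xff and 0x100) shows up in a small worked example. This is a sketch only: the function name `scaled_load` is hypothetical, and integer (truncating) division is an assumption about how the value would be computed, not something stated in the discussion.

```python
def scaled_load(counter_value, tick_value, scale=0xff):
    """Integer load estimate: busy ticks over total ticks, multiplied by `scale`."""
    if tick_value == 0:
        return 0  # no ticks elapsed yet, avoid dividing by zero
    return counter_value * scale // tick_value

# Half load and full load under both scale factors.
half_ff = scaled_load(50, 100, 0xff)     # 127
half_100 = scaled_load(50, 100, 0x100)   # 128
full_ff = scaled_load(100, 100, 0xff)    # 255
full_100 = scaled_load(100, 100, 0x100)  # 256
```

With 0x100 a half-loaded engine maps to exactly 128, while a fully loaded one yields 256, which no longer fits in a single byte; with 0xff the maximum is 255.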
10:36 karolherbst: I completely don't get the new nvif stuff...
10:36 karolherbst: are there some rules for the header file names or is it completely random
10:38 imirkin: karolherbst: examples please
10:38 karolherbst: ohhh there is class.h
10:38 karolherbst: yeah nvif/class.h helps a lot
10:44 karolherbst: skeggsb: is there already a place for the pmu engine counters in nvif?
12:46 Javantea: mmiotracing the nvidia driver into X isn't working (hang at black screen with underscore), is there any way I can debug it? Geforce GTX 970m?
12:47 karolherbst: those stacks with the erroring mmiotracing look really odd
12:48 karolherbst: it is basically like this: do_page_fault; scheduler_stuff; nvidia_stuff; do_page_fault; nvidia_stuff and now my question is, why is nvidia stuff executed while handling a page fault :/
12:48 karolherbst: Javantea: ohhh depends
12:48 karolherbst: Javantea: check dmesg over ssh or something
12:48 karolherbst: or kernel logs
12:49 karolherbst: Javantea: and why do you want to trace? or what is wrong with nouveau on your gpu?
12:50 karolherbst: (despite being not able to provide acceleration due to missing firmware blobs)
12:50 Javantea: nouveau isn't supported on my gpu to my knowledge, I'd like to grab the firmware, get backlight working
12:50 Javantea: and then work toward acceleration
12:51 imirkin: Javantea: backlight is generally not handled by anything relating to nouveau
12:51 karolherbst: k, mmiotrace might help with backlight indeed, but that may or may not be true. Also you can't do anything for acceleration, because you will need firmware not released by nvidia yet
12:52 karolherbst: Javantea: is it a nvidia only laptop with intel gpu?
12:52 Javantea: nvidia only laptop, no intel gpu, works with nvidia driver
12:53 Javantea: imirkin: xbacklight -dec 10 works with nvidia driver, not with nouveau
12:53 karolherbst: Javantea: this could have different reasons, you should first check if there is a provider working inside /sys/class/backlight
12:54 karolherbst: but yeah, maybe nouveau messed up here
12:54 imirkin: Javantea: oh ok. didn't realize it was nvidia-only
12:55 Javantea: /sys/class/backlight has acpi_video0, but doesn't actually work
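
Checking which backlight providers are registered, as suggested above, can be scripted. A minimal sketch, assuming the standard sysfs layout in which each provider directory exposes `brightness` and `max_brightness` files; the helper name `backlight_providers` is made up for illustration, and a provider being listed says nothing about whether writing to it actually changes the panel.

```python
import os

def backlight_providers(root="/sys/class/backlight"):
    """Return {provider_name: (brightness, max_brightness)} for each entry under `root`."""
    result = {}
    if not os.path.isdir(root):
        return result
    for name in sorted(os.listdir(root)):
        values = []
        for attr in ("brightness", "max_brightness"):
            try:
                with open(os.path.join(root, name, attr)) as f:
                    values.append(int(f.read().strip()))
            except (OSError, ValueError):
                # Attribute missing or unreadable: record it as unknown.
                values.append(None)
        result[name] = tuple(values)
    return result
```

On the laptop discussed here this would be expected to list only `acpi_video0`, which matches what was reported, but the values returned depend on the machine.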
12:59 Javantea: so you guys mean that it's not possible to extract the firmware from nvidia driver?
12:59 imirkin: it's definitely possible
12:59 imirkin: it's in there somewhere :)
13:00 Javantea: I'm a reverse engineer, so I have a shot at finding it
13:00 imirkin: normally it shows up in the mmiotrace
13:01 imirkin: but... for gm20x it doesn't, because it's uploaded using a diff mechanism
13:01 Javantea: ah, okay seems straightforward
13:01 imirkin: if you say so
13:02 imirkin: i think the gpu dma's it from system memory
13:02 Javantea: dmesg shows [ 377.466748] mmiotrace: unexpected secondary hit for address 0xffffc90006001070 on CPU 0. which happens right when I modprobe nvidia
13:02 imirkin: so... you'd have to figure out what bit of system memory and when to read it :)
13:02 imirkin: yeah, we've been seeing that recently
13:02 Javantea: and [ 377.466766] BUG: unable to handle kernel paging request at ffff8800de000008
13:02 imirkin: it's been too long since anyone has even thought about how mmiotrace worked
13:02 imirkin: to remember precisely what that means or how to fix it
13:03 Javantea: right, the logs from November 2, 2015 say that it's probably an unhandled instruction in the nvidia kernel blob
13:04 karolherbst: Javantea: it is a known issue
13:04 karolherbst: imirkin: ohhh I plan to fix it
13:04 karolherbst: because it is annoying me too much now
13:04 karolherbst: I think I also know what the error is
13:04 Javantea: cool, I will look at different re methods for now
13:05 karolherbst: Javantea: you might want to play with the acpi
13:05 karolherbst: ...
13:05 karolherbst: acpi_backlight kernel parameter
13:06 karolherbst: it might be that nvidia does something, which might be also handled by the acpi backlight module
13:06 karolherbst: or so
13:06 karolherbst: but usually when nouveau gets loaded, the acpi_backlight module shouldn't be exposed anymore
13:07 Javantea: okay
13:08 karolherbst: imirkin: ohhh I bet something went wrong while the page_fault stuff was rewritten to c (or other asm->c conversions)
13:39 Old: ello
13:40 Guest48836: anyone on here?
13:42 Guest48836: if so and if anyone feels like it i have issues with an older mac (Imac g4 700 (half sphere)) and lubuntu 12.04, feel free to email me, vipersec.947@gmail.com
13:43 orbea: imirkin_: I built the latest stable MPlayer and I'm still getting the mesa user error... No dmesg spam at last though. http://dpaste.com/0J4P53K
13:43 orbea: *least
13:45 orbea: fixed paste: http://dpaste.com/09FBEGW
13:45 imirkin: Guest48836: you're unlikely to get any help with old software, sorry
13:46 imirkin: orbea: mplayer shouldn't use GL at all
13:46 imirkin: oh. you're using -vo gl.
13:46 imirkin: don't do that.
13:46 imirkin: use -vo vdpau
13:46 Guest48836: perfect, 3 months wasted.
13:47 imirkin: ....
13:47 imirkin: orbea: btw, that's a totally different error you're seeing
13:47 imirkin: orbea: mplayer is doing some glTexParameter with an invalid pname (probably wants GL_APPLE_something and doesn't check for it)
13:48 orbea: ah, my mistake
13:49 orbea: is there a reason to use vdpau instead of gl? I was testing both
13:49 imirkin: yeah. coz vdpau works.
13:49 orbea: heh, sounds good enough
14:25 karolherbst: weird, in case of this mmiotrace error, the kernel hangs on a task which does not exist with the command "nvidia-smi" which shouldn't be started at all...
14:26 Javantea: perhaps it's being executed by nvidia_drv.so
14:27 karolherbst: I do modprobe nvidia.ko
14:27 karolherbst: maybe there is a stupid udev rule
14:27 imirkin: probably :)
14:27 imirkin: or systemd being its usual helpful self
14:28 karolherbst: ohhh right, there is a udev rule
14:29 karolherbst: no, nvidia-drivers should never ever load nvidia by itself... how insane is this
14:30 karolherbst: what the hell: https://gist.github.com/karolherbst/0b55712f6f3a708841cd
14:31 karolherbst: why would anybody want to call nvidia-smi after nvidia gets loaded
14:33 karolherbst: if that fixes it (I really hope not)....
14:33 karolherbst: ....
14:33 karolherbst: wtf
14:34 karolherbst: k, now something serious
14:37 karolherbst: okay
14:38 karolherbst: it seems like mmiotrace has no problems handling the module loading part
14:38 karolherbst: but if userspace actually tries to use the nvidia stuff, then it gets messed up
14:53 karolherbst: ohh yeah, corrupted journals again :/
14:53 imirkin: yay systemd
14:57 karolherbst: the more I look at the stack the more strange it looks
14:58 karolherbst: this part bothers me most: https://gist.github.com/karolherbst/0b55712f6f3a708841cd
14:58 karolherbst: is there some kind of scheduling happening or something?
15:00 karolherbst: and __do_page_fault+0x261 points to arch/x86/mm/fault.c:1096 which is the vmalloc_fault call (RIP: 0010:[<ffffffff81087db1>] [<ffffffff81087db1>] vmalloc_fault+0x1c1/0x280)
15:01 karolherbst: and this vmalloc_fault points to ./arch/x86/include/asm/pgtable.h:567
15:01 karolherbst: return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
15:32 karolherbst: now the stupid idea: I will just let the kernel do its thing on the second hit :)
15:33 karolherbst: "mmiotrace: 4MB pages are not currently supported: 0xffffc90010001070" mhhh
15:37 karolherbst: ohhh
15:37 karolherbst: imirkin: I think I got it then
15:37 karolherbst: yeah that makes sense now
15:38 karolherbst: see how that address offset is bigger than 1024?
15:38 karolherbst: mmiotrace seems to handle only small pages (or whatever it does) and only loads 4k into that address
15:39 imirkin: ohhhh it's a 4MB page?
15:39 karolherbst: yeah
15:39 imirkin: kill 'em
15:39 imirkin: kill 'em with fire
15:39 karolherbst: so we get a second hit at the same address
15:39 imirkin: or make mmiotrace work with it
15:39 imirkin: you should see if there's an easy way to break it up into 4K pages
15:40 imirkin: er actually... hm
15:40 imirkin: no reason you couldn't do this with a 4MB page either
15:40 imirkin: except iirc the way those work is that all the info is in the PDE only, and there are no PTE's
15:41 karolherbst: mmiotrace just intercepts those page faults anyway and they should get normally handled
15:41 karolherbst: maybe mmiotrace just doesn't know the address+offset was already handled
15:41 karolherbst: and pagefaults
15:41 karolherbst: but the kernel thinks, wait I already handled that one
15:43 karolherbst: or something like that
15:43 karolherbst: could be part of the generic tracer code though
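
The page-size point above can be made concrete by splitting the address from the mmiotrace message into a page base and an offset for both page sizes. This is a sketch of the arithmetic only: the helper `split_address` is hypothetical, and it does not model how mmiotrace or the kernel page tables actually handle the fault.

```python
PAGE_4K = 4 * 1024
PAGE_4M = 4 * 1024 * 1024

def split_address(addr, page_size):
    """Return (page base, offset within page) for a power-of-two page size."""
    mask = page_size - 1
    return addr & ~mask, addr & mask

addr = 0xffffc90010001070  # address taken from the mmiotrace message above
base_4k, off_4k = split_address(addr, PAGE_4K)  # (0xffffc90010001000, 0x70)
base_4m, off_4m = split_address(addr, PAGE_4M)  # (0xffffc90010000000, 0x1070)
```

Viewed as part of a 4MB page, the access sits at offset 0x1070, which is past the first 4K; a tracer that only accounts for a single 4K page at the start of the mapping would therefore not cover it.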
21:48 Jayhost: Any nightcrawlers?
22:39 karolherbst: ohh yeah nice, we are competitive with a 710, big deal :D
22:59 Arbition: heh I read that
23:01 Arbition: GK208, I wonder what similarities that shares with my laptop chip
23:01 karolherbst: what do you have?
23:01 Arbition: 730M
23:01 Arbition: GK208M
23:01 karolherbst: then quite a lot
23:01 karolherbst: like nearly everything
23:02 karolherbst: except the vbios
23:02 Arbition: so the M part means very little?
23:02 karolherbst: it usually means nothing
23:02 karolherbst: the mobile chips are still a bit slower in general
23:03 karolherbst: though not much
23:03 Arbition: that's to do with power envelopes though, more than yield or chip arch
23:03 Arbition: at least I would have thought
23:03 karolherbst: I have a mobile GK106 chip and it is just around 30% slower than the fastest desktop one
23:03 karolherbst: nah, it is the same chip
23:04 karolherbst: maybe the vbios is more tuned regarding power consumption though
23:04 karolherbst: or they use lower voltages than compared to the desktop ones
23:04 Arbition: well I was here a couple of months ago with funny reclocking issues which may be related to that...?
23:04 karolherbst: maybe, maybe not
23:05 Arbition: anyway, now that a new kernel is out, I might have another go with reclocking
23:05 karolherbst: mhh
23:05 karolherbst: it didn't change much though
23:05 Arbition: hmm
23:05 Arbition: well the issue I had was it'd lock up
23:05 karolherbst: voltage is a problem still