03:22pmoreau: Are there any specific requirements before interacting with the flip mechanism (e.g. flip_stop or flip_next)?
11:47Karlton: "[17889.874618] nouveau E[ DRM] GPU lockup - switching to software fbcon" Can I fix this without rebooting?
11:48imirkin: what are you trying to fix?
11:48imirkin: no way to get hw fbcon back afaik
11:48imirkin: but sw fbcon is just fine...
11:50Karlton: imirkin: I am trimming an api trace but KDE is slow as hell now after I broke it while trying to launch a game xD
11:50imirkin: ah ok. yeah, dunno... the gpu lockup recovery is variably successful
11:51imirkin: you can try a suspend/resume cycle
11:51specing: Karlton: you know what is funny?
11:51specing: [676056.784498] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
11:52imirkin: specing: are you using bumblebee?
11:52imirkin: and/or manually futzing with acpi
11:53specing: imirkin: nope
11:53specing: faulty G84
11:54specing: I'm going to see how long it lasts on nouveau
11:54specing: currently at 21 days
11:54specing: nvidia-drivers lasts ~59 days
11:56Karlton: gawd, sw fbcon sucks!
11:57imirkin: Karlton: why are you even using fbcon?
11:58Karlton: imirkin: Isn't that where it goes by default after GPU lockup?
11:59imirkin: Karlton: fbcon = the thing that drives vt's
12:01Karlton: imirkin: Then what should I be using?
12:02imirkin: i dunno... X comes to mind
12:02Karlton: well I was before it died...
12:02imirkin: so... restart it
12:03specing: we don't just restart our computers
12:03imirkin: specing: glad you understand :)
12:04imirkin: up 64 days, 12:10
12:10specing: up 65 days, 6:38
12:10specing: har har
12:11imirkin: i'll catch up soon!
12:13imirkin: well that's just great. i managed to (accidentally) repro the "multiple buffers on list" issue
12:13imirkin: mlankhorst: looks like it happens when i use vdpau and drag the window around
12:15imirkin: mlankhorst: which is weird coz the thing that gets printed by nouveau as the rejected thing doesn't actually have the same buffer on there multiple times
12:15imirkin: it complains about the last buffer on the list though
12:16imirkin: mlankhorst: skeggsb: http://hastebin.com/ocuvocoxir.sm
12:16imirkin: this is on 3.17
12:16Karlton: imirkin: restarting X made everything freeze up, had to do a hard reboot
12:17imirkin: libdrm 2.4.60
12:17imirkin: Karlton: doh, i guess it didn't really recover
12:17imirkin: Karlton: i think suspend/resume tends to work more often as a "fix it" tactic
12:19imirkin: mlankhorst: skeggsb: looks like a new issue in 2.4.60 -- 2.4.59 works fine
12:20imirkin: which has all the thread-safe fixes :p
12:20imirkin: i'll bisect... but i kinda know where it's going to lead me...
12:25imirkin: mlankhorst: yeah, as i suspected, commit 5ea6f1c3262888 is first bad
12:34mlankhorst: uh oh o.O
12:37mlankhorst: imirkin: can you try to reproduce with valgrind?
12:38imirkin: valgrind's all happy
12:38imirkin: i can add debug things or valgrind flags or whatever you want... just let me know
12:38mlankhorst: bleh i should try to get my debugfs patches upstream, would have made this easier to debug
12:38mlankhorst: try with --free-fill=cc ?
12:40imirkin: valgrind's all good
12:41imirkin: looks like it's always the last buffer
12:41imirkin: and also the 13th buffer, but that could just be mplayer-specific
12:41imirkin: does it not repro for you?
12:41mlankhorst: yeah no surprise there..
12:41mlankhorst: I can only test on gk20a right now
12:41imirkin: mplayer -vo vdpau -- just move it around
12:41mlankhorst: which probably has no vdpau
12:41imirkin: no, not on the chip anyways
12:41imirkin: it has a separate video encoder/decoder thing
12:42mlankhorst: it's always the last buffer because of a lifetime thing, probably..
12:43mlankhorst: can you dump the BO in userspace?
12:43imirkin: "the bo"?
12:43imirkin: you mean the bo list?
12:44imirkin: which userspace? mesa before it submits? or in libdrm?
12:44imirkin: k, let's see if i can figure it out....
12:45imirkin: know offhand which function i should be looking at?
12:45mlankhorst: does the pushbuf ioctl unref somewhere?
12:48imirkin: so i set NOUVEAU_LIBDRM_DEBUG=1
12:48imirkin: which dumps all the things before submitting them
12:48imirkin: and it was the same as the thing printed when it errors
12:50mlankhorst: meh, can you check from lsof how many times card0 was opened?
12:51imirkin: let's say i don't know how to do that...
12:54mlankhorst: lsof /dev/dri/card*
12:55imirkin: that just shows me who's using it
12:55imirkin: not how many times it was opened
12:55imirkin: anyways, right now i have 3 from X, and 1 plugin-container
12:56imirkin: mplayer also opens it 3 times
12:56imirkin: mlankhorst: http://hastebin.com/epitikoroy.vbs
12:56imirkin: (X also has 1 "mem" and 2 numbered fd's)
13:01mlankhorst: imirkin: eep, 2 fds. shouldn't be the case..
13:01imirkin: mlankhorst: X does the same thing
13:01mlankhorst: oh well then
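
As imirkin points out, lsof shows which processes hold the device, not how many open() calls were made. One rough way to get a per-process descriptor count is to walk /proc/<pid>/fd and count the symlinks that point at the device node. This is a minimal sketch, assuming a Linux-style /proc layout; the helper name and its proc_root parameter are made up for illustration. Descriptors duplicated via dup() or inherited across fork() are counted as well, so the result is an upper bound on the number of opens, not an exact figure.

```python
import os
from collections import Counter

def count_device_fds(device="/dev/dri/card0", proc_root="/proc"):
    """Return {pid: number of fds pointing at `device`} (hypothetical helper)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "self"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target == device:
                counts[int(entry)] += 1
    return dict(counts)
```

Calling count_device_fds() as root on the affected machine would give one count per PID, which could then be compared against the "3 from X, 3 from mplayer" numbers above.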
13:01mlankhorst: imirkin: does glxinfo list vdpau_interop?
13:01imirkin: not to say that X is the metric for perfection, but merely pointing it out :)
13:02imirkin: probably. but mplayer doesn't use X
13:02imirkin: yeah, it's listed
13:02mlankhorst: ok probably fine then
13:02imirkin: this is just straight-up vdpau
13:03imirkin: but it does open/close the device a few times for good luck
13:03mlankhorst: imirkin: yeah but still should be fine, screen's shared.
13:03mlankhorst: does it use any kind of multithreading?
13:03imirkin: mesa 10.5.1 ftr
13:03imirkin: i *highly* doubt it
13:04imirkin: let's see what mpv does
13:04imirkin: ah, i've uninstalled it.
13:04imirkin: o well
13:05imirkin: oh, but fwiw, windowmaker likes to do that crazy X call when moving windows around
13:05mlankhorst: imirkin: hm, do pushbufs have refcounting for bo's on the validation list?
13:05imirkin: XBlockClients or something
13:05imirkin: i forget
13:05imirkin: the one that destroys the universe for a short while
13:06imirkin: tbh i dunno too much about all those various details
13:10joi: maybe mmt+demmt can answer this question?
13:13imirkin: ooh, good idea
13:15imirkin: how do i know which thing it errors on?
13:17joi: if you add -e ioctl-raw you should see ioctl return values
13:24joi: oh, actually, you don't need it - when ioctl returns error it's printed as "err: %d"
13:25imirkin: joi: how are ioctl errors displayed in demmt?
13:26joi: "err:" is appended to decoded text
13:26imirkin: not seeing that in the trace =/
13:26imirkin: running with -e all
13:29imirkin: joi: hm, well i guess it's never seeing the error
13:29imirkin: which is odd, since both the kernel and libdrm definitely complain
13:29joi: but you see "nouveau: kernel rejected pushbuf: Invalid argument" in dmesg when it happens?
13:30joi: eh, not in dmesg of course
13:30imirkin: that message is in the stdout/stderr of the app
13:30imirkin: dmesg has the validate failures
13:30joi: hmm, you could catch stderr/stdout with mmt
13:42joi: imirkin_: on completely different topic, did you look into why shaders@glsl-fs-lots-of-tex does not pass?
13:42imirkin: well, i guess we know the answer to that question -- no, i can't catch stderr/stdout
13:43imirkin: display updates died somehow... nothing in dmesg =/
13:44imirkin: i'll play with this some more later when my raid array rebuilds. gr.
13:45buhman: needs more ssd
13:45imirkin: meh, it's a bunch of 2T drives
13:46RSpliet: yes, because SSDs are so reliable that you never have to rebuild your RAID array
13:46buhman: needs more magnetic tape
13:46imirkin: they don't make SSD's that big in my price range
13:46imirkin: (certainly not back when i bought this set up)
13:46imirkin: magnetic tape also not as big (or reliable) as many people believe
13:46imirkin: if you want a few T of data, the *cheapest* thing is a hdd
13:47buhman: I wonder if a sufficiently large raid0 tape array would provide acceptable performance
13:47buhman: or, volume group composed of tape drives, or something like that
13:51imirkin: let me know when you try it out
13:51imirkin: joi: re glsl-lots-of-tex -- the test is wrong
13:51imirkin: joi: should just be removed
13:52imirkin: coz with the CSE that the glsl ir does, it doesn't even test the thing it claims
13:53joi: I'm wondering why it passes on intel
13:54imirkin: coz it's not THAT wrong :)
13:54imirkin: it's only a little wrong
13:54imirkin: basically the intermediate value is like 2.55 / 256
13:54imirkin: it's expecting that it gets rounded up to 3
13:55imirkin: but nvidia rounds it down to 2
13:55imirkin: (when storing to the RGBA8888 unorm surface)
13:55imirkin: however this is totally unspecified behaviour
13:56imirkin: (i might have the specific numbers wrong, but that's the general idea of why it fails)
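
To make the rounding disagreement concrete, here is a small sketch of two ways a float could be converted to an 8-bit unorm channel. The chat does not say which rounding modes the hardware or the test actually use, so round-to-nearest versus truncation is an assumption chosen only to show how the quoted value (roughly 2.55 / 256) can land on either 3 or 2. The function names are hypothetical.

```python
import math

def to_unorm8_nearest(x):
    """Convert a float in [0, 1] to 8-bit unorm, rounding to nearest."""
    x = min(max(x, 0.0), 1.0)
    return int(math.floor(x * 255.0 + 0.5))

def to_unorm8_truncate(x):
    """Same conversion, but dropping the fractional part instead."""
    x = min(max(x, 0.0), 1.0)
    return int(x * 255.0)

value = 2.55 / 256  # approximate intermediate value quoted in the chat
print(to_unorm8_nearest(value), to_unorm8_truncate(value))  # prints "3 2"
```

The scaled value is about 2.54, so a test that expects 3 depends on the implementation rounding to nearest.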
14:55whompy: imirkin: I saw in the notes on the y-tiling patch that you need a piglit run on an nv50. I haven't run much with piglit on nouveau yet, but seem to remember hearing about threading issues or some such.
14:56imirkin: whompy: yeah, see http://people.freedesktop.org/~imirkin/ for how i run it
14:56imirkin: whompy: if you could just try that one piglit test first, that'd be great
14:57imirkin: apparently G80 has some sort of additional restrictions on 3d tiling that are gone in G84+
14:57imirkin: difficult to bring myself to care though :)
14:58imirkin: maybe if i found one i'd care more...
15:03whompy: Which test is it that you would like to see?
15:04imirkin: whompy: bin/texelFetch fs sampler3D 98x129x1-98x129x9 -auto -fbo
15:04whompy: Ok, thanks!
15:04imirkin: should fail on master, pass with my change
15:04imirkin: if it still fails, pastebin the text output and i'll have more questions for you :)
15:23whompy: imirkin: fails before, passes after on nv50 as intended. :)
15:23imirkin: whompy: awesome, thanks for checking
15:24whompy: No problem. Gave me an excuse to fix my piglit setup on here anyway.
15:24imirkin: this was with a nva5 right?
15:35imirkin: calim: thoughts on http://lists.freedesktop.org/archives/nouveau/2015-April/020449.html ? intuitively makes sense since we only had one tile mode to texture setup... but not sure why you had that limit in the first place
16:39imirkin: glennk: do you know if it's common to have 3d textures whose height is > 16 * 2^depth?
16:39glennk: you mean z rather than height?
16:40imirkin: no, i mean height :)
16:40imirkin: basically there was an issue in the minification of 3d textures
16:41imirkin: s.t. if you ended up with a miplevel whose depth was 1 but height > 32, then fail.
16:41imirkin: wondering if this can happen in real life, or only piglit
16:41imirkin: [height > 16 on nv50]
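
The condition described here can be checked for a given texture size with a short sketch. It assumes the usual mip chain rule where each dimension is halved per level and clamped at 1, and takes the height limits (32, or 16 on nv50) straight from the messages above. The function names are made up for illustration and this is not the nouveau code.

```python
def mip_dims(width, height, depth, level):
    """Size of one mip level: halve each dimension per level, clamp at 1."""
    return (max(1, width >> level),
            max(1, height >> level),
            max(1, depth >> level))

def affected_levels(width, height, depth, height_limit=32):
    """List mip levels whose depth is 1 while height still exceeds the limit."""
    num_levels = max(width, height, depth).bit_length()
    hits = []
    for level in range(num_levels):
        _, h, d = mip_dims(width, height, depth, level)
        if d == 1 and h > height_limit:
            hits.append(level)
    return hits

print(affected_levels(98, 129, 2))      # limit 32
print(affected_levels(98, 129, 2, 16))  # nv50 limit
print(affected_levels(98, 129, 9))      # deeper texture, same base height
```

Under these assumptions, a 98x129 texture only hits the problematic case when its depth is small relative to its height, which matches the "few z layers, more detail in x/y" use case glennk mentions next.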
16:43glennk: only thing i can think of are some fluid simulations with only a few z layers, but more detail in x/y
16:44imirkin: like water in some random game?
16:45glennk: those tend to be just flat 2d
16:46imirkin: what's a fluid simulation then?
16:47glennk: this one comes to mind http://www.nvidia.com/coolstuff/demos#!/box-of-smoke
16:48glennk: hmm, or cloud rendering
16:48imirkin: ah heh ok
16:48imirkin: so like fog
16:53calim: imirkin: it would be nicer if you moved that condition into nv50_tex_choose_tile_dims, just pass level0depth to it ...
16:54calim: or some such parameter
16:54calim: nice bug catch btw.
16:55calim: eeeehlegance :)
16:55imirkin: calim: hm, yeah, that makes sense too
16:55calim: the limit is what the blob does (or did), I figured it's a performance optimization so I copied it
16:56imirkin: ah ok. it (sorta) makes sense... good with keeping it
16:57imirkin: that last param will probably be more like layout_3d && pt->depth0 > 1