00:04karolherbst: skeggsb: https://github.com/karolherbst/nouveau/commits/clk_fixes
00:04catphish: unigine just gives me a black screen :(
00:06imirkin_: catphish: you have to be running a compositor for dri2-based offloading to work properly
00:06catphish: i would have assumed i was
00:06imirkin_: catphish: btw, i'd recommend switching to DRI3
00:06catphish: i'd prefer to do that
00:06imirkin_: (that optimus guide has explanations)
00:06catphish: i'm not clear how to switch to DRI3
00:07catphish: mostly not sure if my builds of everything support it
00:07imirkin_: you're using the intel ddx... you have to enable it somehow
00:08imirkin_: or you're using an old intel ddx maybe?
00:08imirkin_: ah no, looks like it's fine...
00:08imirkin_: iirc you have to do like Option "DRI3" "on"
00:08imirkin_: or something
00:09imirkin_: Option "DRI" "3"
00:09imirkin_: although allegedly 3 should be on by default
00:09ClaudiusMaximus: i upgraded my laptop kernel to 4.15, still have the bug
00:10imirkin_: ClaudiusMaximus: yeah, i'm sure that bug is not affected by anything other than mesa
00:10imirkin_: karolherbst: + if (IS_ERR_VALUE(ret) && ret != -EACCES)
00:10ClaudiusMaximus: imirkin_: ok, thanks
00:10imirkin_: what if ret == -EACCES?
00:11catphish: how do i check if it's using DRI3?
00:11imirkin_: LIBGL_DEBUG=verbose glxinfo > /dev/null
00:11imirkin_: should say if it's using DRI2 or DRI3
00:11catphish: libGL: Using DRI2 for screen 0
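
[Editor's sketch] The check imirkin_ describes can be scripted. This is a minimal sketch, assuming the debug line always has the shape catphish pasted ("libGL: Using DRI2 for screen 0"); the function name `dri_versions` is made up for illustration. LIBGL_DEBUG output goes to stderr, which is why the command above can discard stdout and still show the line.

```python
import re

# Line format taken from the log above, e.g. "libGL: Using DRI3 for screen 0".
_DRI_LINE = re.compile(r"^libGL: Using DRI(\d) for screen (\d+)[ \t]*$", re.MULTILINE)

def dri_versions(debug_output):
    """Map screen number -> DRI version found in LIBGL_DEBUG=verbose output.

    Hypothetical helper: it only parses text, it does not run glxinfo itself.
    """
    return {int(screen): int(version)
            for version, screen in _DRI_LINE.findall(debug_output)}

if __name__ == "__main__":
    sample = "libGL: Using DRI2 for screen 0\n"
    print(dri_versions(sample))
```

To feed it real data, one option is to capture stderr from `glxinfo` with `subprocess.run(..., capture_output=True, text=True)` while passing `LIBGL_DEBUG=verbose` in the environment.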
00:11catphish: brb changed a setting, sorry no irc bouncer
00:12ClaudiusMaximus: briefly tried out the pstate stuff too, my laptop only has nvidia card so i assume it's always on so always safe to change? or does xorg need to be running at least (didn't try without it)?
00:12karolherbst: imirkin_: no clue, I just copied how it was used elsewhere
00:12karolherbst: I am sure I looked into it and it was fine when I did
00:13karolherbst: imirkin_: EACCES: 'power.disable_depth' is different from 0
00:13karolherbst: and devices only suspend if disable_depth == 0
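
[Editor's sketch] The condition imirkin_ quoted, `IS_ERR_VALUE(ret) && ret != -EACCES`, modelled in Python for illustration only (the real code is kernel C). It assumes the kernel convention that return values in [-4095, -1] are error codes, and it follows karolherbst's reading that -EACCES just means runtime PM is disabled, so the device will not suspend and the call need not be treated as a failure. `should_bail` is a hypothetical name.

```python
import errno

MAX_ERRNO = 4095  # kernel convention: values in [-MAX_ERRNO, -1] are error codes

def is_err_value(ret):
    """Python stand-in for the kernel's IS_ERR_VALUE() on a signed return value."""
    return -MAX_ERRNO <= ret <= -1

def should_bail(ret):
    """Mirror of `IS_ERR_VALUE(ret) && ret != -EACCES` from the quoted patch line."""
    return is_err_value(ret) and ret != -errno.EACCES

if __name__ == "__main__":
    # 0 and 1 are success values, -EACCES is tolerated, other errors are fatal.
    for ret in (0, 1, -errno.EACCES, -errno.EIO):
        print(ret, should_bail(ret))
```

So to imirkin_'s question: with this condition, ret == -EACCES falls through as if the call had succeeded.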
00:14stratact: lachs0r: I just upgraded to 4.16.1 and used the nvidia drivers and I had no issues with it so far.
00:15stratact: lachs0r: how did you get the data lost though?
00:15lachs0r: memory corruption
00:16karolherbst: skeggsb: there is an unchecked pm_runtime_get_sync call inside nouveau_display_hpd_work by the way
00:16lachs0r: also freezing in the worst possible moments
00:17stratact: lachs0r: I'll keep those 2 in mind
00:17lachs0r: I can reproduce the latter very reliably by switching my projector output on and off via xrandr
00:18catphish: libGL: Using DRI3 for screen 0
00:18lachs0r: seems to be easier to trigger with more recent drivers
00:18lachs0r: (it now happens instantly)
00:19catphish: and unigine now works :)
00:20imirkin_: and you don't have to do the explicit offload thing now
00:20catphish: i noticed that, works great :)
00:20catphish: thank you for helping!
00:21imirkin_: np. hope you get higher framerates with nouveau on unigine than the intel chip
00:22imirkin_: afaik in most lower-end optimus setups it's barely any win at all
00:23catphish: it used to be significant using the binary driver, just a lot of hassle
00:23catphish: i'll compare
00:23imirkin_: well, with nouveau you'll get like 60% of the perf
00:23catphish: well that's a start
00:24imirkin_: and on maxwell there are known artifacts
00:24imirkin_: (although their cause is not understood)
00:24catphish: i might try this with the binary driver again some time if it supports this method of offload, but nouveau might be good enough for my needs
00:24imirkin_: in the future, pick an amd gpu :)
00:25catphish: the difference for me is 16fps vs 20
00:26catphish: *13 fps vs 20
00:26imirkin_: intel vs nouveau? or nouveau vs blob?
00:26catphish: intel vs nouveau
00:26imirkin_:hopes nouveau is the 20...
00:27catphish: it is :)
00:27imirkin_: in what? some unigine demo?
00:28catphish: tested again, unigine demo, yes, on lower settings, 14 fps vs 24 fps
00:28catphish: so definitely worthwhile (if you can ignore the heat / fans)
00:29imirkin_: well this thing ain't magic... gotta put some power through it in order to make the pixels go
00:30catphish: well that's great anyway, offloading that actually works as it's supposed to
00:30catphish: does the binary nvidia driver support this kind of offload?
00:31imirkin_: they definitely didn't before. i believe with GLvnd they might? not sure.
00:31imirkin_: part of the sticking point is that you can only have one libGL... and mesa's knows how to load various drivers
00:31catphish: oh well, probably not worth breaking my perfectly good setup
00:31imirkin_: nvidia's doesn't
00:31catphish: i see
00:31imirkin_: (but of course mesa's can't load nvidia's...)
00:32imirkin_: so now there's a shared libGLvnd frontend
00:32imirkin_: which redispatches stuff
00:32imirkin_: so that's part of the problem
00:32imirkin_: the other part is the buffer sharing/etc
00:32imirkin_: i *think* they've been implementing dma-buf support in their drivers
00:32catphish: when are nvidia releasing a nice open source driver that supports everything and integrates properly?
00:32imirkin_: but again, no first-hand knowledge
00:32catphish:backs away slowly
00:32imirkin_: whenever you stop buying their hw
00:33imirkin_: hence ... get a nice amd board next time.
00:33catphish: tbh i think with intel doing amd GPUs on-chip, things may be about to change
00:33catphish: i only have nvidia by default
00:33imirkin_: so until that changes, nvidia has no interest in releasing a nice open-source driver.
00:34catphish: is AMD's better?
00:34imirkin_: which is why i advise anyone who will listen to stop buying nvidia
00:34imirkin_: AMD has a dedicated engineering team making open-source drivers.
00:34imirkin_: (as does intel)
00:34imirkin_: (but intel doesn't make powerful add-on boards)
00:35imirkin_: in the meanwhile, nvidia has made it impossible for nouveau to do reclocking on GM20x+
00:35imirkin_: (ok, we might get lucky for GM20x, but definitely GP10x is out)
00:35catphish: seems odd, reclocking seems like a core function
00:36imirkin_: part of the whole signed firmware thing
00:36imirkin_: so we can't provide our own firmware
00:36catphish: is this stuff all reverse engineered, or do they release limited public specs?
00:36imirkin_: and they won't release theirs in a redistributable fashion
00:36imirkin_: mostly RE'd
00:36catphish: oh, firmware too, eww
00:36imirkin_: they have, on occasion, provided clarifications
00:37imirkin_: and they have provided "more" documentation on how to operate the displays
00:37imirkin_: here's everything they've ever released
00:37catphish: well i'm not going to read that, because i value my sanity and i should sleep now
00:37catphish: thanks again
00:38catphish: wow that's "sparse"
00:38imirkin_: most of the info was already known at time of release
00:38imirkin_: they did clarify a few things which were nice
00:38catphish: intel need to get into high spec GPUs
00:38imirkin_: and it's nice to move things from the "probably" to the "certainly" pile
00:39juri_: ooh. more reading material. thanks.
00:53nbtenney: skeggsb, imirkin: I'm apparently the other user that actually uses separate AC and DC settings. Always seems to work fine here on a tesla
00:58nbtenney: Anyone hear tell of issues with garbage rendering upon resume from suspend with nouveau?
00:58imirkin_: like a single frame?
00:58imirkin_: or forever?
00:58nbtenney: Seems to be triggered by gdm. Switching to TTY and restarting the service makes it less angry
00:58imirkin_: i had not heard of such issues.
00:59nbtenney: I've had near zero time with the machine lately, and I'm debating whether I care to investigate
01:21nbtenney: Another thing I've been experiencing forever and have been too lazy to report...
01:21nbtenney: This pops up every boot: ix.io/17qE
01:21nbtenney: No clue if it actually matters or what it means
01:25imirkin: that's fine
01:25imirkin: just something in our parser that's going too far
01:42imirkin: ClaudiusMaximus: repro'd on G92 but not GF108
01:44nbtenney: Nice. I'm searching around for any logs/errors... No dice
01:45nbtenney: Garbage on the screen, but no messages in logs that I can find
01:45imirkin: let's think about this... lines means y is ok, but x is always 0
01:47imirkin: ok yeah. that generated shader code is buggy.
01:47imirkin: now to figure out where i mess it up...
01:47imirkin: grrr... why isn't it merging those texqueries...
01:49imirkin: looks fine pre-ra... but ra's messed up?? wtf
01:51imirkin: whaaaa... why's it joining.... grrr
01:51imirkin: i'm gonna be unhappy when i figure this out.
01:58imirkin: yeah. very unhappy.
01:58imirkin: skeggsb: do you have any idea what JOIN_MASK_TEX is about? it seems like the code desperately tries to make defs == srcs regs for tex ops on nv50-era gpu's
01:59imirkin: oh. i see. that's by necessity. the encoding only allows one register field.
02:00imirkin: but only for a couple ops?
02:03imirkin: ah, but for those tex ops, still need it in same regs for the narrow op
02:03imirkin: grrr.... too many things to unwind. i'll start with the actual RA bug i guess.
02:19imirkin: ClaudiusMaximus: this fixes it for me: https://hastebin.com/ufiqodedim.swift
02:30imirkin: ugh. ok. LTSRC and LTDST use the same bitfield. nevermind. so that's just like a hard requirement. oh well.
02:33imirkin: now ... why aren't those two TXQ's just joined into one
02:33imirkin: i thought cse was supposed to do that
02:36imirkin: hm. i guess not.
02:54imirkin: well this kinda sucks. can't run any of the UE4 demos on the G92. not enough vram =/
03:00imirkin: actually 2 of them die because either we submit too fast or the gpu can't keep up
03:21ClaudiusMaximus: imirkin: bit over my head, but is the root cause something like "sampler2D textureSize() register can only be read once per shader invocation"?
03:24imirkin: ClaudiusMaximus: root cause is "RA fail for texture-related ops"
03:24imirkin: basically if multiple texture ops reuse the same source, then ... ka-boom
03:24imirkin: in this case, the zero LOD got CSE'd to be a single value
03:25imirkin: and the second textureSize got a bogus LOD
03:25imirkin: RA = register allocation
03:25ClaudiusMaximus: ah ok
03:25imirkin: CSE = common subexpression elimination
03:25ClaudiusMaximus: yep, done a bit of compilation, but out of practice, and for higher level stuff than real machines (i compiled to C)
03:26imirkin: this has been an issue since the dawn of time, but i guess it's hard to hit in practice
03:26imirkin: since it's rare to have multiple texture ops that take identical args
03:27ClaudiusMaximus: question is, why didn't the textureSize(tex, 0) get CSE'd rather than just the 0 ?
03:27imirkin: (and i think on top of that, it may have only been happening with texture ops that only take a single arg... so like texsize and 1d-texturing. not too common.)
03:27imirkin: coz our CSE doesn't work like that
03:27imirkin: it should have.
03:27imirkin: i thought it would/should.
03:27imirkin: but we don't have that logic in place
03:27imirkin: to merge multiple result masks together
04:12imirkin: ClaudiusMaximus: ended up doing it a bit differently: https://patchwork.freedesktop.org/patch/216042/
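
[Editor's sketch] A toy model of the failure imirkin describes, not the actual codegen and not necessarily what the linked patch does. It assumes, per the discussion above, that these tex ops have a single register field, so the result lands in the same register the LOD was read from. The register names and mip sizes are made up.

```python
LEVEL_SIZES = {0: 256, 1: 128, 2: 64}  # made-up mip level -> texture size

def txq(regs, reg):
    """Toy texture-size query: reads the LOD from `reg`, writes the size back into `reg`.

    Returns the LOD it actually used so the caller can see what went wrong.
    """
    lod = regs[reg]
    regs[reg] = LEVEL_SIZES.get(lod, -1)  # -1 stands for a garbage result
    return lod

def shared_source():
    """Both queries read the single CSE'd zero held in r0."""
    regs = {"r0": 0}
    return txq(regs, "r0"), txq(regs, "r0")

def separate_sources():
    """Each query gets its own copy of the zero."""
    regs = {"r0": 0, "r1": 0}
    return txq(regs, "r0"), txq(regs, "r1")

if __name__ == "__main__":
    print(shared_source())     # second query sees the first query's result as its LOD
    print(separate_sources())  # both queries see LOD 0
```

In the shared case the first query overwrites the zero with its own result, which matches "the second textureSize got a bogus LOD".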
04:35ClaudiusMaximus: getting some black rectangle glitches with fragmentarium, i suspect it's due to the shader computation being too heavy / getting killed by a timeout? don't remember it from the nvidia driver on the same machine http://mathr.co.uk/tmp/fragmentarium-mandelbulb.png gets worse (multiple rectangles) when i increase the raymarching step count
04:37skeggsb: ClaudiusMaximus: NOUVEAU_SHADER_WATCHDOG=0 might help rule that in/out as a cause
04:42ClaudiusMaximus: skeggsb: i set that env var when running the program? it doesn't help if so
04:43skeggsb: yeah, it was a long-shot ;)
04:44skeggsb: ClaudiusMaximus: btw, what chipset?
04:46ClaudiusMaximus: skeggsb: 01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce G 105M] (rev a1) (prog-if 00 [VGA controller])
04:50ClaudiusMaximus: https://github.com/3Dickulus/FragM this is the version of fragmentarium i am using, fwiw; built with cmake -DNVIDAGL4PLUS=OFF . or so (forgot exactly the name of the flag to get it to work with my old laptop hardware)
11:07karolherbst: imirkin: you have access to dawn of war 2, right?
11:07karolherbst: seems like I see quite a lot of misrendering, even in the benchmark
11:35imirkin: karolherbst: probably do... all the texture dirtying needs to be redone in light of some mesa changes, i haven't had a chance to do it yet
11:35imirkin: also maxwell+ has a variety of rendering fail unrelated to any of that
11:35karolherbst: yeah, but the misrendering is like everything is mostly black :)
11:36karolherbst: but yeah, might be the issue as well, dunno
11:36imirkin: ClaudiusMaximus: known issue ... basically the shaders crap out after a while, taking whatever color was in the output regs at the time.
11:37imirkin: ClaudiusMaximus: https://bugs.freedesktop.org/show_bug.cgi?id=78161
11:38imirkin: skeggsb: if you happen to have any clever ideas as to how to fix it, i'm all ears
11:40karolherbst: imirkin: did you find what the issue is? Like RA overwriting values or something like that?
11:45karolherbst: imirkin: that thing renders only black on pascal for opt levels below 2
11:45imirkin: sending email now.
11:48karolherbst: imirkin: I actually think I hit the bug where we overwrite live values... but I totally can't remember now
11:49imirkin: on tesla?
11:49karolherbst: RA bug
11:52imirkin: well hopefully my explanation makes sense.
11:54karolherbst: yeah, well I was more wondering what hit this issue, but yeah, it makes sense
11:54karolherbst: or why you wrote the patch in the first place :p
11:54imirkin: oh. look at scrollback.
11:55imirkin: ClaudiusMaximus supplied a test program that hit this issue
11:57karolherbst: imirkin: I think we have to put another print into RA, because the live ranges sometimes don't fit the post SSA opt version
11:57imirkin: with NV50_PROG_DEBUG=255 iirc there's a print after all the various fixups that happen pre-ra
11:58imirkin: maybe not. in which case, feel free to add one.
11:58karolherbst: ohh I kind of thought 7 is max value which makes sense...
11:59karolherbst: but mhh, no, I got the last print
11:59imirkin: perhaps it is
11:59karolherbst: there is a weird live range I found
11:59imirkin: i don't bother with remembering the bitfields
11:59karolherbst: [23 35)
11:59karolherbst: for %58, which isn't bad at first
11:59karolherbst: 35: mov u32 %r113 0x00000000 (0)
11:59karolherbst: which is like totally unrelated
12:00imirkin: 35 is an open interval
12:00imirkin: i.e. up to but not including 35
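
[Editor's sketch] The half-open range notation in one line of Python, using karolherbst's numbers for %58. `is_live` is a hypothetical helper, not a function from the nv50 codegen.

```python
def is_live(start, end, pos):
    """True if instruction number `pos` falls inside the half-open range [start, end)."""
    return start <= pos < end

if __name__ == "__main__":
    # %58 has live range [23 35): the bra at 34 is covered, the mov at 35 is not.
    print(is_live(23, 35, 34))
    print(is_live(23, 35, 35))
```

That is why the unrelated `mov` at 35 does not conflict with %58.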
12:00karolherbst: 34: eq %c112 bra BB:9 (0)
12:00karolherbst: ohh wait
12:00imirkin: makes sense
12:00imirkin: it's a loop
12:00imirkin: (just guessing since you never showed the code)
12:02karolherbst: mhh, a phi instruction isn't included in the range, is this normal?
12:02karolherbst: 9: phi 10: unrelated value, ranges are 9) and [10
12:02karolherbst: 8 is a bra into the node starting with the phi
12:04karolherbst: although phi values also start one after the phi, so I guess that is fine
12:05skeggsb: imirkin: what GPUs does that happen on? just tesla?
12:06imirkin: skeggsb: yep
12:06imirkin: karolherbst: for the %97 node?
12:06imirkin: oh, no - %58 you said?
12:06karolherbst: mhh but I may have found something
12:06skeggsb: i don't suppose if you send method 0xde4 with a high value it does any better?
12:07skeggsb: ie. does our default context have a low shader timeout?
12:07skeggsb: on gf100 we send "0" for our default context
12:07imirkin: skeggsb: will try it.
12:07skeggsb: i have nfi how to determine that on pre-gf100, as we init grctx directly, rather than sending stuff down the pipe and saving it
12:08imirkin: karolherbst: i dunno, those live intervals make sense.
12:08imirkin: don't mess with %58 inside the loop
12:08karolherbst: yeah, I wasn't sure about the phi thing
12:09imirkin: which makes sense - it's set outside the loop
12:09imirkin: but used inside the loop
12:09imirkin: skeggsb: de4 == WATCHDOG_TIMER :)
12:09skeggsb: imirkin: yes
12:10imirkin: i tried setting values higher than 0x18
12:10imirkin: with no effect
12:10skeggsb: what about "0" ?
12:10imirkin: mmmm ... let's see...
12:12skeggsb: there go all my ideas then!
12:13imirkin: extensive repertoire, i see...
12:13skeggsb: naturally ;_)
12:19karolherbst: imirkin: "NODE[%102, 2 colors]" got no reg assigned in the debug output ...
12:20karolherbst: maybe it just got lost, dunno
12:20karolherbst: but that's weird
12:21imirkin: did it get merged with %100?
12:21imirkin: not sure exactly how that works
14:40karolherbst: imirkin: do we have some ways to change what code codegen generates depending on env variable in release builds?
14:43imirkin_: i don't think so
14:45karolherbst: mhh okay
14:46karolherbst: I have to touch that code anyway, because I forgot to hash in that we may use nir vs TGSI. Screwed up the cache
14:47karolherbst: ..... crap, I ctrl+c again with meson/ninja
14:47karolherbst: now my build dir is screwed up again
14:48imirkin_: yeah, that's happened to me a few times
14:48imirkin_: it's really bad at recovery
14:48imirkin_: not such a big deal if a build takes a minute
14:48imirkin_: but for me it's way longer
14:49imirkin_: and it has to build tons of useless stuff. like nir and spirv things =/
14:49karolherbst: if at least unity builds would work with mesa
14:49karolherbst: 2000 -> 270 steps
14:50karolherbst: well 290
15:04imirkin_: clearing out my cpu fan was a very nice improvement
15:04imirkin_: basically tops out at 60C now, instead of getting thermally throttled all the time at 97C
15:06karolherbst: yeah, I limited my desktop CPU with TLP, so the fans don't get annoyingly loud
15:07karolherbst: although it is mainly for laptops, I also set a super short HDD sleep timeout, because I don't really use the HDD anyway
15:21imirkin_: it wasn't so much about fan loudness as effectiveness
15:21imirkin_: and i also applied fresh thermal paste
15:22imirkin_: hadn't done it in quite a while, and i think the contact gets worse every time i move
15:25karolherbst: well mine is in the office and I didn't want to annoy the others :p
15:25karolherbst: because I tend to compile quite often