00:04 karolherbst: skeggsb: https://github.com/karolherbst/nouveau/commits/clk_fixes
00:04 catphish: unigine just gives me a black screen :(
00:06 imirkin_: catphish: you have to be running a compositor for dri2-based offloading to work properly
00:06 catphish: i would have assumed i was
00:06 imirkin_: catphish: btw, i'd recommend switching to DRI3
00:06 catphish: i'd prefer to do that
00:06 imirkin_: (that optimus guide has explanations)
00:06 catphish: i'm not clear how to switch to DRI3
00:07 catphish: mostly not sure if my builds of everything support it
00:07 imirkin_: you're using the intel ddx... you have to enable it somehow
00:08 imirkin_: or you're using an old intel ddx maybe?
00:08 imirkin_: ah no, looks like it's fine...
00:08 imirkin_: iirc you have to do like Option "DRI3" "on"
00:08 imirkin_: or something
00:09 * catphish hits google
00:09 imirkin_: Option "DRI" "3"
00:09 imirkin_: although allegedly 3 should be on by default
00:09 ClaudiusMaximus: i upgraded my laptop kernel to 4.15, still have the bug
00:10 imirkin_: ClaudiusMaximus: yeah, i'm sure that bug is not affected by anything other than mesa
00:10 imirkin_: karolherbst: + if (IS_ERR_VALUE(ret) && ret != -EACCES)
00:10 ClaudiusMaximus: imirkin_: ok, thanks
00:10 imirkin_: what if ret == -EACCES?
00:10 catphish: brb
00:11 catphish: how do i check if it's using DRI3?
00:11 imirkin_: LIBGL_DEBUG=verbose glxinfo > /dev/null
00:11 imirkin_: should say if it's using DRI2 or DRI3
00:11 catphish: libGL: Using DRI2 for screen 0
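The check described above can be scripted. This is a minimal sketch, assuming the debug line always has the form quoted in the log (`libGL: Using DRI3 for screen 0`); the helper name `detect_dri` is hypothetical and not part of Mesa.

```python
import re

# Matches the line libGL prints when LIBGL_DEBUG=verbose is set,
# e.g. "libGL: Using DRI3 for screen 0" (format taken from the log above).
_DRI_LINE = re.compile(r"^libGL: Using (DRI\d) for screen (\d+)\s*$", re.MULTILINE)


def detect_dri(debug_text):
    """Return {screen_number: "DRI2" or "DRI3"} parsed from libGL debug output."""
    return {int(screen): dri for dri, screen in _DRI_LINE.findall(debug_text)}
```

The debug text is what remains on the terminal after `LIBGL_DEBUG=verbose glxinfo > /dev/null`, so capture stderr rather than stdout when feeding this function.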
00:11 catphish: brb changed a setting, sorry no irc bouncer
00:12 ClaudiusMaximus: briefly tried out the pstate stuff too, my laptop only has nvidia card so i assume it's always on so always safe to change? or does xorg need to be running at least (didn't try without it)?
00:12 karolherbst: imirkin_: no clue, I just copied how it was used elsewhere
00:12 karolherbst: I am sure I looked into it and it was fine when I did
00:13 karolherbst: imirkin_: EACCES: 'power.disable_depth' is different from 0
00:13 karolherbst: and devices only suspend if disable_depth == 0
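The return-value check under discussion can be modelled outside the kernel. This is a rough Python sketch of the quoted condition, not the actual patch; `is_err_value` imitates the kernel's `IS_ERR_VALUE()` for signed integers, and `is_fatal_pm_error` is a hypothetical name.

```python
import errno

MAX_ERRNO = 4095  # the kernel's IS_ERR_VALUE() treats -MAX_ERRNO..-1 as error codes


def is_err_value(ret):
    """Python model of the kernel's IS_ERR_VALUE() for a signed return value."""
    return -MAX_ERRNO <= ret < 0


def is_fatal_pm_error(ret):
    """Mirror the quoted check: an error code, but not -EACCES.

    Per the explanation above, -EACCES means power.disable_depth != 0,
    so the device would not runtime-suspend anyway and the call is not
    treated as a failure.
    """
    return is_err_value(ret) and ret != -errno.EACCES
```

With this model, `0` and positive returns count as success, `-EACCES` is tolerated, and any other negative errno is reported as an error.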
00:14 stratact: lachs0r: I just upgraded to 4.16.1 and used the nvidia drivers and I had no issues with it so far.
00:14 lachs0r: okay
00:15 stratact: lachs0r: how did you get the data lost though?
00:15 lachs0r: memory corruption
00:16 karolherbst: skeggsb: there is an unchecked pm_runtime_get_sync call inside nouveau_display_hpd_work by the way
00:16 lachs0r: also freezing in the worst possible moments
00:17 stratact: lachs0r: I'll keep those 2 in mind
00:17 lachs0r: I can reproduce the latter very reliably by switching my projector output on and off via xrandr
00:18 catphish: libGL: Using DRI3 for screen 0
00:18 catphish: :)
00:18 lachs0r: seems to be easier to trigger with more recent drivers
00:18 lachs0r: (it now happens instantly)
00:19 catphish: and unigine now works :)
00:20 imirkin_: and you don't have to do the explicit offload thing now
00:20 catphish: i noticed that, works great :)
00:20 catphish: thank you for helping!
00:21 imirkin_: np. hope you get higher framerates with nouveau on unigine than the intel chip
00:22 imirkin_: afaik in most lower-end optimus setups it's barely any win at all
00:23 catphish: it used to be significant using the binary driver, just a lot of hassle
00:23 catphish: i'll compare
00:23 imirkin_: well, with nouveau you'll get like 60% of the perf
00:23 catphish: well that's a start
00:24 imirkin_: and on maxwell there are known artifacts
00:24 imirkin_: (although their cause is not understood)
00:24 catphish: i might try this with the binary driver again some time if it supports this method of offload, but nouveau might be good enough for my needs
00:24 imirkin_: in the future, pick an amd gpu :)
00:25 catphish: the difference for me is 16fps vs 20
00:26 catphish: *13 fps vs 20
00:26 imirkin_: intel vs nouveau? or nouveau vs blob?
00:26 catphish: intel vs nouveau
00:26 imirkin_: nice
00:26 * imirkin_ hopes nouveau is the 20...
00:27 catphish: it is :)
00:27 imirkin_: in what? some unigine demo?
00:28 catphish: tested again, unigine demo, yes, on lower settings, 14 fps vs 24 fps
00:28 catphish: so definitely worthwhile (if you can ignore the heat / fans)
00:29 imirkin_: well this thing ain't magic... gotta put some power through it in order to make the pixels go
00:29 catphish: :)
00:30 catphish: well that's great anyway, offloading that actually works as it's supposed to
00:30 catphish: does the binary nvidia driver support this kind of offload?
00:31 imirkin_: they definitely didn't before. i believe with GLvnd they might? not sure.
00:31 imirkin_: part of the sticking point is that you can only have one libGL... and mesa's knows how to load various drivers
00:31 catphish: oh well, probably not worth breaking my perfectly good setup
00:31 imirkin_: nvidia's doesn't
00:31 catphish: i see
00:31 imirkin_: (but of course mesa's can't load nvidia's...)
00:32 imirkin_: so now there's a shared libGLvnd frontend
00:32 imirkin_: which redispatches stuff
00:32 imirkin_: so that's part of the problem
00:32 imirkin_: the other part is the buffer sharing/etc
00:32 imirkin_: i *think* they've been implementing dma-buf support in their drivers
00:32 catphish: when are nvidia releasing a nice open source driver that supports everything and integrates properly?
00:32 imirkin_: but again, no first-hand knowledge
00:32 * catphish backs away slowly
00:32 imirkin_: whenever you stop buying their hw
00:33 imirkin_: hence ... get a nice amd board next time.
00:33 catphish: tbh i think with intel doing amd GPUs on-chip, things may be about to change
00:33 catphish: i only have nvidia by default
00:33 imirkin_: so until that changes, nvidia has no interest in releasing a nice open-source driver.
00:34 catphish: is AMD's better?
00:34 imirkin_: which is why i advise anyone who will listen to stop buying nvidia
00:34 imirkin_: AMD has a dedicated engineering team making open-source drivers.
00:34 imirkin_: (as does intel)
00:34 catphish: cool
00:34 imirkin_: (but intel doesn't make powerful add-on boards)
00:35 imirkin_: in the meanwhile, nvidia has made it impossible for nouveau to do reclocking on GM20x+
00:35 imirkin_: (ok, we might get lucky for GM20x, but definitely GP10x is out)
00:35 catphish: seems odd, reclocking seems like a core function
00:36 imirkin_: part of the whole signed firmware thing
00:36 imirkin_: so we can't provide our own firmware
00:36 catphish: is this stuff all reverse engineered, or do they release limited public specs?
00:36 imirkin_: and they won't release theirs in a redistributable fashion
00:36 imirkin_: mostly RE'd
00:36 catphish: oh, firmware too, eww
00:36 imirkin_: they have, on occasion, provided clarifications
00:37 imirkin_: and they have provided "more" documentation on how to operate the displays
00:37 imirkin_: http://download.nvidia.com/open-gpu-doc/
00:37 imirkin_: here's everything they've ever released
00:37 catphish: well i'm not going to read that, because i value my sanity and i should sleep now
00:37 catphish: thanks again
00:38 imirkin_: enjoy
00:38 catphish: wow that's "sparse"
00:38 imirkin_: most of the info was already known at time of release
00:38 imirkin_: they did clarify a few things which were nice
00:38 catphish: intel need to get into high spec GPUs
00:38 imirkin_: and it's nice to move things from the "probably" to the "certainly" pile
00:39 juri_: ooh. more reading material. thanks.
00:53 nbtenney: skeggsb, imirkin: I'm apparently the other user that actually uses separate AC and DC settings. Always seems to work fine here on a tesla
00:55 imirkin_: =]
00:58 nbtenney: Anyone hear tell of issues with garbage rendering upon resume from suspend with nouveau?
00:58 imirkin_: like a single frame?
00:58 imirkin_: or forever?
00:58 nbtenney: Seems to be triggered by gdm. Switching to TTY and restarting the service makes it less angry
00:58 imirkin_: i had not heard of such issues.
00:59 nbtenney: I've had near zero time with the machine lately, and I'm debating whether I care to investigate
00:59 imirkin_: "no"
01:21 nbtenney: Another thing I've been experiencing forever and have been too lazy to report...
01:21 nbtenney: This pops up every boot: ix.io/17qE
01:21 nbtenney: No clue if it actually matters or what it means
01:25 imirkin: that's fine
01:25 imirkin: just something in our parser that's going too far
01:42 imirkin: ClaudiusMaximus: repro'd on G92 but not GF108
01:44 nbtenney: Nice. I'm searching around for any logs/errors... No dice
01:45 nbtenney: Garbage on the screen, but no messages in logs that I can find
01:45 imirkin: let's think about this... lines means y is ok, but x is always 0
01:47 imirkin: ok yeah. that generated shader code is buggy.
01:47 imirkin: now to figure out where i mess it up...
01:47 imirkin: grrr... why isn't it merging those texqueries...
01:49 imirkin: looks fine pre-ra... but ra's messed up?? wtf
01:51 imirkin: whaaaa... why's it joining.... grrr
01:51 imirkin: i'm gonna be unhappy when i figure this out.
01:58 imirkin: yeah. very unhappy.
01:58 imirkin: skeggsb: do you have any idea what JOIN_MASK_TEX is about? it seems like the code desperately tries to make defs == srcs regs for tex ops on nv50-era gpu's
01:59 imirkin: oh. i see. that's by necessity. the encoding only allows one register field.
01:59 imirkin: super.
02:00 imirkin: but only for a couple ops?
02:03 imirkin: ah, but for those tex ops, still need it in same regs for the narrow op
02:03 imirkin: grrr.... too many things to unwind. i'll start with the actual RA bug i guess.
02:19 imirkin: ClaudiusMaximus: this fixes it for me: https://hastebin.com/ufiqodedim.swift
02:30 imirkin: ugh. ok. LTSRC and LTDST use the same bitfield. nevermind. so that's just like a hard requirement. oh well.
02:33 imirkin: now ... why aren't those two TXQ's just joined into one
02:33 imirkin: i thought cse was supposed to do that
02:36 imirkin: hm. i guess not.
02:54 imirkin: well this kinda sucks. can't run any of the UE4 demos on the G92. not enough vram =/
02:56 HdkR: lol
03:00 imirkin: actually 2 of them die because either we submit too fast or the gpu can't keep up
03:21 ClaudiusMaximus: imirkin: bit over my head, but is the root cause something like "sampler2D textureSize() register can only be read once per shader invocation"?
03:24 imirkin: ClaudiusMaximus: root cause is "RA fail for texture-related ops"
03:24 imirkin: basically if multiple texture ops reuse the same source, then ... ka-boom
03:24 ClaudiusMaximus: RA?
03:24 imirkin: in this case, the zero LOD got CSE'd to be a single value
03:25 imirkin: and the second textureSize got a bogus LOD
03:25 imirkin: RA = register allocation
03:25 ClaudiusMaximus: ah ok
03:25 imirkin: CSE = common subexpression elimination
03:25 ClaudiusMaximus: yep, done a bit of compilation, but out of practice, and for higher level stuff than real machines (i compiled to C)
03:26 imirkin: this has been an issue since the dawn of time, but i guess it's hard to hit in practice
03:26 imirkin: since it's rare to have multiple texture ops that take identical args
03:27 ClaudiusMaximus: question is, why didn't the textureSize(tex, 0) get CSE'd rather than just the 0 ?
03:27 imirkin: (and i think on top of that, it may have only been happening with texture ops that only take a single arg... so like texsize and 1d-texturing. not too common.)
03:27 imirkin: coz our CSE doesn't work like that
03:27 imirkin: it should have.
03:27 imirkin: i thought it would/should.
03:27 ClaudiusMaximus: ok
03:27 imirkin: but we don't have that logic in place
03:27 imirkin: to merge multiple result masks together
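The CSE behaviour described above can be illustrated with a toy pass. This is only a sketch over a made-up instruction format, not nouveau's codegen; the `Instr` type, the `mask` field and the `x`/`y` components are assumptions for illustration. Because the result mask is part of the lookup key, the two identical zero LODs are merged while the two texture queries are not, which leaves both queries reading the same source value.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str        # name of the value this instruction defines
    op: str          # opcode, e.g. "mov" or "txq"
    args: tuple      # source operands (value names or literals)
    mask: str = ""   # which result components are written (toy model)


def cse(instrs):
    """Drop instructions that recompute an already available (op, args, mask)."""
    seen = {}      # (op, args, mask) -> dest of the first instruction computing it
    rename = {}    # eliminated dest -> surviving dest
    out = []
    for ins in instrs:
        # Rewrite sources that refer to values eliminated earlier.
        args = tuple(rename.get(a, a) for a in ins.args)
        key = (ins.op, args, ins.mask)
        if key in seen:
            rename[ins.dest] = seen[key]
            continue
        seen[key] = ins.dest
        out.append(ins._replace(args=args))
    return out


program = [
    Instr("%lod0", "mov", (0,)),
    Instr("%w", "txq", ("tex", "%lod0"), mask="x"),
    Instr("%lod1", "mov", (0,)),
    Instr("%h", "txq", ("tex", "%lod1"), mask="y"),
]
optimized = cse(program)
```

After the pass, `%lod1` is gone and both `txq` instructions take `%lod0` as their LOD source, which is the "multiple texture ops reuse the same source" situation that tripped up register allocation. Merging the two queries into one would need extra logic to combine their masks, which this sketch deliberately lacks.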
04:12 imirkin: ClaudiusMaximus: ended up doing it a bit differently: https://patchwork.freedesktop.org/patch/216042/
04:35 ClaudiusMaximus: getting some black rectangle glitches with fragmentarium, i suspect it's due to the shader computation being too heavy / getting killed by a timeout? don't remember it from the nvidia driver on the same machine http://mathr.co.uk/tmp/fragmentarium-mandelbulb.png gets worse (multiple rectangles) when i increase the raymarching step count
04:37 skeggsb: ClaudiusMaximus: NOUVEAU_SHADER_WATCHDOG=0 might help rule that in/out as a cause
04:42 ClaudiusMaximus: skeggsb: i set that env var when running the program? it doesn't help if so
04:43 skeggsb: yeah, it was a long-shot ;)
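Setting the variable only for the program under test, as asked above, can be sketched like this. The helper name is hypothetical; only the variable name `NOUVEAU_SHADER_WATCHDOG` and the value `0` come from the log.

```python
import os


def env_without_shader_watchdog(base_env=None):
    """Return a copy of the environment with NOUVEAU_SHADER_WATCHDOG=0 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["NOUVEAU_SHADER_WATCHDOG"] = "0"
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the application, so the setting does not leak into the rest of the session.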
04:44 skeggsb: ClaudiusMaximus: btw, what chipset?
04:46 ClaudiusMaximus: skeggsb: 01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce G 105M] (rev a1) (prog-if 00 [VGA controller])
04:50 ClaudiusMaximus: https://github.com/3Dickulus/FragM this is the version of fragmentarium i am using, fwiw; built with cmake -DNVIDAGL4PLUS=OFF . or so (forgot exactly the name of the flag to get it to work with my old laptop hardware)
11:07 karolherbst: imirkin: you have access to dawn of war 2, right?
11:07 karolherbst: seems like I see quite a lot of misrendering, even in the benchmark
11:35 imirkin: karolherbst: probably do... all the texture dirtying needs to be redone in light of some mesa changes, i haven't had a chance to do it yet
11:35 imirkin: also maxwell+ has a variety of rendering fail unrelated to any of that
11:35 karolherbst: yeah, but the misrendering is like everything is mostly black :)
11:36 karolherbst: but yeah, might be the issue as well, dunno
11:36 imirkin: ClaudiusMaximus: known issue ... basically the shaders crap out after a while, taking whatever color was in the output regs at the time.
11:37 imirkin: ClaudiusMaximus: https://bugs.freedesktop.org/show_bug.cgi?id=78161
11:38 imirkin: skeggsb: if you happen to have any clever ideas as to how to fix it, i'm all ears
11:40 karolherbst: imirkin: did you find what the issue is? Like RA overwriting values or something like that?
11:45 karolherbst: imirkin: that thing renders only black on pascal for opt levels below 2
11:45 imirkin: sending email now.
11:48 karolherbst: imirkin: I actually think I hit the bug where we overwrite live values... but I totally can't remember now
11:49 imirkin: on tesla?
11:49 karolherbst: generally
11:49 karolherbst: RA bug
11:52 imirkin: ah
11:52 imirkin: well hopefully my explanation makes sense.
11:54 karolherbst: yeah, well I was more wondering what hit this issue, but yeah, it makes sense
11:54 karolherbst: or why you wrote the patch in the first place :p
11:54 imirkin: oh. look at scrollback.
11:55 imirkin: ClaudiusMaximus supplied a test program that hit this issue
11:55 karolherbst: ahh
11:57 karolherbst: imirkin: I think we have to put another print into RA, because the live ranges sometimes don't fit the post SSA opt version
11:57 imirkin: with NV50_PROG_DEBUG=255 iirc there's a print after all the various fixups that happen pre-ra
11:58 imirkin: maybe not. in which case, feel free to add one.
11:58 karolherbst: ohh I kind of thought 7 is max value which makes sense...
11:59 karolherbst: but mhh, no, I got the last print
11:59 imirkin: perhaps it is
11:59 karolherbst: actually
11:59 karolherbst: there is a weird live range I found
11:59 imirkin: i don't bother with remembering the bitfields
11:59 karolherbst: [23 35)
11:59 karolherbst: for %58, which isn't bad at first
11:59 karolherbst: but
11:59 karolherbst: 35: mov u32 %r113 0x00000000 (0)
11:59 karolherbst: which is like totally unrelated
12:00 imirkin: 35 is an open interval
12:00 imirkin: i.e. up to but not including 35
12:00 karolherbst: 34: eq %c112 bra BB:9 (0)
12:00 karolherbst: ohh wait
12:00 imirkin: makes sense
12:00 karolherbst: right
12:00 karolherbst: ...
12:00 imirkin: it's a loop
12:00 imirkin: (just guessing since you never showed the code)
12:00 karolherbst: yeah.
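The half-open interval reading explained above can be checked with a one-line predicate. This is a small sketch using the range `[23 35)` quoted for `%58`; the function name is hypothetical.

```python
def in_live_range(start, end, serial):
    """True if instruction number `serial` lies in the half-open range [start, end)."""
    return start <= serial < end


# Range quoted above for %58: [23 35)
covers_branch = in_live_range(23, 35, 34)   # instruction 34, the loop branch
covers_mov = in_live_range(23, 35, 35)      # instruction 35, the unrelated mov
```

Instruction 34 is inside the range and instruction 35 is not, so the value is live up to the loop branch and the `mov` at 35 does not conflict with it.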
12:02 karolherbst: mhh, a phi instruction isn't included in the range, is this normal?
12:02 karolherbst: 9: phi 10: unrelated value, ranges are 9) and [10
12:02 karolherbst: 8 is a bra into the node starting with the phi
12:03 karolherbst: https://gist.github.com/karolherbst/dc7b6c920047482006083a9f60e3d08d
12:04 karolherbst: although phi values also start one after the phi, so I guess that is fine
12:05 skeggsb: imirkin: what GPUs does that happen on? just tesla?
12:06 imirkin: skeggsb: yep
12:06 imirkin: karolherbst: for the %97 node?
12:06 imirkin: oh, no - %58 you said?
12:06 karolherbst: yeah
12:06 karolherbst: mhh but I may have found something
12:06 skeggsb: i don't suppose if you send method 0xde4 with a high value it does any better?
12:07 skeggsb: ie. does our default context have a low shader timeout?
12:07 skeggsb: on gf100 we send "0" for our default context
12:07 imirkin: skeggsb: will try it.
12:07 skeggsb: i have nfi how to determine that on pre-gf100, as we init grctx directly, rather than sending stuff down the pipe and saving it
12:08 imirkin: karolherbst: i dunno, those live intervals make sense.
12:08 imirkin: don't mess with %58 inside the loop
12:08 karolherbst: yeah, I wasn't sure about the phi thing
12:09 imirkin: which makes sense - it's set outside the loop
12:09 imirkin: but used inside the loop
12:09 imirkin: skeggsb: de4 == WATCHDOG_TIMER :)
12:09 skeggsb: imirkin: yes
12:10 imirkin: i tried setting values higher than 0x18
12:10 imirkin: with no effect
12:10 skeggsb: what about "0" ?
12:10 imirkin: mmmm ... let's see...
12:11 imirkin: nope
12:12 skeggsb: there go all my ideas then!
12:12 karolherbst: ufff...
12:13 imirkin: extensive repertoire, i see...
12:13 skeggsb: naturally ;_)
12:19 karolherbst: imirkin: "NODE[%102, 2 colors]" got no reg assigned in the debug output ...
12:20 karolherbst: maybe it just got lost, dunno
12:20 karolherbst: but that's weird
12:21 imirkin: did it get merged with %100?
12:21 imirkin: not sure exactly how that works
14:40 karolherbst: imirkin: do we have some ways to change what code codegen generates depending on env variable in release builds?
14:43 imirkin_: i don't think so
14:45 karolherbst: mhh okay
14:46 karolherbst: I have to touch that code anyway, because I forgot to hash in that we may use nir vs TGSI. Screwed up the cache
14:47 imirkin_: ah
14:47 karolherbst: ..... crap, I ctrl+c again with meson/ninja
14:47 karolherbst: now my build dir is screwed up again
14:48 karolherbst: :(
14:48 imirkin_: yeah, that's happened to me a few times
14:48 imirkin_: it's really bad at recovery
14:48 imirkin_: not such a big deal if a build takes a minute
14:48 imirkin_: but for me it's way longer
14:49 imirkin_: and it has to build tons of useless stuff. like nir and spirv things =/
14:49 karolherbst: :(
14:49 karolherbst: if at least unity builds would work with mesa
14:49 karolherbst: 2000 -> 270 steps
14:50 karolherbst: well 290
15:04 imirkin_: clearing out my cpu fan was a very nice improvement
15:04 imirkin_: basically tops out at 60C now, instead of getting thermally throttled all the time at 97C
15:06 karolherbst: :)
15:06 karolherbst: yeah, I limited by desktop CPU with TLP, so the fans don't get annoyingly loud
15:06 karolherbst: *my
15:07 karolherbst: although it is mainly for laptops, I also set a super short HDD sleep timeout, because I don't really use the HDD anyway
15:21 imirkin_: it wasn't so much about fan loudness as effectiveness
15:21 imirkin_: and i also applied fresh thermal paste
15:22 imirkin_: hadn't done it in quite a while, and i think the contact gets worse every time i move
15:25 karolherbst: well mine is in the office and I didn't want to annoy the others :p
15:25 karolherbst: because I tend to compile quite often