10:34 mupuf: nouveau still crashes for kde... but so does the blob :o
10:34 mupuf: more debug to come later!
10:35 mupuf: but I am back on nvidia (hopefully on Nouveau soon) on my desktop PC, because of my 4K monitor. Hopefully it makes me want to contribute more again :)
11:09 karolherbst: mupuf: concurrency
11:09 karolherbst: mupuf: skeggsb is working on it (tm)
12:06 mupuf: karolherbst: a similar problem is visible on the blob ;)
12:06 mupuf: it keeps on crashing too
12:06 karolherbst: odd
12:06 karolherbst: maybe it's also a plasma bug
12:06 mupuf: I doubt it, even vt switching is blocked
12:09 karolherbst: mhh, interesting
12:09 karolherbst: okay well
12:09 karolherbst: sometimes plasma also gets stuck for me on intel
12:09 karolherbst: killing kwin usually solves that problem
12:09 pmoreau: mupuf: VT switching has been a long standing problem for the blob. I think they fixed it at some point, but it came back recently.
12:09 karolherbst: but the session is still up and running
12:10 mupuf: karolherbst: interesting. It works flawlessly on Intel for me
12:10 pmoreau: karolherbst: I haven’t had that issue on Intel, thankfully.
12:10 karolherbst: well with "sometimes" I mean like once a month
12:10 karolherbst: and for long running sessions
12:11 karolherbst: but I also still use intel ddx with SNA and dri3
12:11 mupuf: I use modesetting
12:38 pmoreau: I use modesetting as well
12:45 karolherbst: maybe I should use modesetting as well
12:47 imirkin: iirc modesetting way over-flushes, which may help work around various buggy app behavior
19:28 karolherbst: imirkin: any idea what the unknown is? 00000038: 00070f00 50b00000 $p3 add rm f32 $r28 neg $r0 neg $r0 [unknown: 00000000 00300000]
19:28 karolherbst: uhhh
19:28 karolherbst: there is also an unknown instruction
19:29 karolherbst: ffe007ff 001f8000
19:29 karolherbst: odd
19:29 karolherbst: it's the first instruction
19:31 karolherbst: anyway, here are the KHR-GL45.texture_cube_map_array.sampling outputs: https://gist.github.com/karolherbst/6eedd2bd5c1e037714e64d168b87299b
19:31 karolherbst: maybe you see something I might miss
19:32 imirkin_: karolherbst: did you decode it with the right thing?
19:32 imirkin_: or is that in demmt?
19:32 karolherbst: demmt
19:32 karolherbst: specified -me6
19:32 imirkin_: oh
19:32 imirkin_: that's bs
19:32 imirkin_: it's fake data
19:33 imirkin_: don't worry about it
19:33 karolherbst: okay
19:33 imirkin_: it uploads a dummy TCP
19:33 imirkin_: it's necessary to have a TES without TCP because the shader header's # of patch attributes is still used
19:34 karolherbst: okay
19:34 karolherbst: I was just baffled why the first insn was no sched
19:35 imirkin_: whoa weird
19:35 imirkin_: quadop rn f32 add add add add $r0 dall l0 $r0 0x0
19:35 imirkin_: i wonder what that's about
19:35 karolherbst: yeah
19:36 karolherbst: the test is: GL_RGBA8 TextureGrad without mutabilities
19:37 karolherbst: I could get the shaders as well
19:38 karolherbst: although only the fragment shader is important here I guess
19:39 karolherbst: https://gist.github.com/karolherbst/6eedd2bd5c1e037714e64d168b87299b#file-frag-frag
19:40 karolherbst: mapping that to nvidia will be fun
19:42 karolherbst: maybe we should start to display those SUBOPS things more readably
19:44 karolherbst: diff tool, easy
19:44 karolherbst: ohhh
19:45 karolherbst: imirkin_: they have a quadpop
19:45 karolherbst: tex p (quadpop) texbar
19:45 karolherbst: nouveau doesn't put a quadpop there
19:45 karolherbst: otherwise it looks pretty much identical
19:46 karolherbst: didn't resolve register references yet
19:47 karolherbst: yeah, there are a lot more quadpops
19:47 karolherbst: quadop rn f32 add add add add $r0 dall l1 $r4 0x0
19:55 imirkin_: make sure to push the nouveau logic through envydis as well
19:55 imirkin_: so that you see all the flags
19:55 imirkin_: e.g. dall vs not, etc
19:55 karolherbst: yeah
19:55 imirkin_: pretty sure we have dall now :)
19:55 karolherbst: already did
19:56 karolherbst: now I try to rename the registers so that I can compare both shaders
19:57 imirkin_: easiest to just put them up side-by-side
19:57 karolherbst: yeah
19:57 karolherbst: in "quadop rn f32 mov2 add mov2 add $r0 dall l0 $r8 $r3"
19:57 karolherbst: is $r0 written to?
19:58 karolherbst: no idea how those quadop OPs work
19:58 imirkin_: so
19:58 imirkin_: the idea is
19:58 imirkin_: you have a quad
19:58 imirkin_: i.e. 4 values
19:58 imirkin_: er
19:58 imirkin_: 4 pixels
19:59 imirkin_: in each of those pixels, $rN has a (different) value
19:59 imirkin_: if you do "add $r0 $r1 $r2", it will update $r0 to be $r1 + $r2 in each of the respective pixels
19:59 imirkin_: quadop allows one lane to look at another lane's values
20:00 karolherbst: yeah okay
20:00 imirkin_: so ... for_each_lane_i, $r0_i = op_i($r8_0, $r3_i) [or vice-versa, i forget which arg is pinned]
20:00 karolherbst: okay
20:00 imirkin_: where $r0_i means $r0 in lane i
20:00 imirkin_: so that "dall l0" means $r8_0
20:00 imirkin_: could have been l1
20:00 imirkin_: etc
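[Editor's sketch] The quadop explanation above can be modelled in a few lines of Python. This is only an illustration, not the hardware definition: it assumes the first source is the one pinned to lane N (imirkin says he is not sure which argument is pinned), and it assumes "add" is a + b, "subr" is b - a and "mov2" passes the second operand through. The names QUAD_OPS and quadop are made up for this sketch.

```python
# Per-lane operations. The meaning given to each name is an assumption
# made for illustration; only the names themselves appear in the log.
QUAD_OPS = {
    "add":  lambda a, b: a + b,
    "subr": lambda a, b: b - a,   # assumed: reversed subtract
    "mov2": lambda a, b: b,       # assumed: pass the second operand through
}

def quadop(ops, pinned_lane, src0, src1):
    """Model one quadop over a quad of 4 lanes.

    ops         -- four op names, one per destination lane
    pinned_lane -- lane whose src0 value every lane sees (the "l0"/"l1" part)
    src0, src1  -- per-lane register values (lists of 4)
    Returns the new per-lane values of the destination register.
    """
    if not (len(ops) == len(src0) == len(src1) == 4):
        raise ValueError("a quad has exactly 4 lanes")
    pinned = src0[pinned_lane]  # assumed: first source is the pinned one
    return [QUAD_OPS[op](pinned, src1[lane]) for lane, op in enumerate(ops)]

# "quadop ... add add add add $r0 dall l0 $r0 0x0":
# under these assumptions it copies lane 0's value into every lane.
r0 = [1.0, 2.0, 3.0, 4.0]
broadcast = quadop(["add"] * 4, 0, r0, [0.0] * 4)

# "quadop ... mov2 add mov2 add $r0 dall l0 $r8 $r3"
mixed = quadop(["mov2", "add", "mov2", "add"], 0,
               [10.0, 20.0, 30.0, 40.0], [1.0, 2.0, 3.0, 4.0])
```

Under these assumptions, an "add add add add ... 0x0" quadop is a broadcast of one lane's value to the whole quad, which may be why the blob emits it even though the add itself looks like a no-op.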
20:01 karolherbst: are the "a[0xa8]" accesses portable between nvidia and nouveau?
20:01 karolherbst: I guess not, but want to be sure
20:01 imirkin_: well, that's just a shader varying access
20:01 imirkin_: (for the interp)
20:02 imirkin_: the specific positions might vary based on ... a number of things
20:02 karolherbst: okay
20:02 karolherbst: just wondering, because the accesses are generally the same
20:02 karolherbst: but nvidia once reads from 0xa8 where nouveau reads from 0x9c
20:03 karolherbst: but I will see that later after I worked through those shaders
20:03 imirkin_: we might eliminate varyings it doesn't? dunno
20:04 karolherbst: for now I want to rename the registers to better compare both
20:04 imirkin_: yea
20:04 karolherbst: in "tex p lauto all dall $r0:$r1:$r2:$r3 acube $t8 $s0 $r4:$r5:$r6:$r7" which registers are changing?
20:04 karolherbst: the first group?
20:04 imirkin_: destination is $r0q
20:04 karolherbst: k
20:05 imirkin_: erm
20:05 imirkin_: that's too many args...
20:05 imirkin_: oh, acube
20:05 imirkin_: so $r4 is the array index, and $r5..r7 are the cube coords
20:06 imirkin_: acube = array cube; tcube = texture cube (i.e. not an array)
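[Editor's sketch] The operand layout imirkin describes for the acube tex can be written down as a tiny helper. The function name and the dictionary keys are hypothetical; the only facts taken from the log are that the destination is a 4-register group ($r0q) and that, for acube, the first source register is the array index and the next three are the cube coordinates.

```python
def split_acube_sources(src_regs):
    """Split the source registers of an 'acube' tex as described in the log:
    first register is the array index, the next three are the cube coords."""
    if len(src_regs) != 4:
        raise ValueError("acube expects 4 source registers")
    return {"array_index": src_regs[0], "cube_coords": src_regs[1:]}

# tex p lauto all dall $r0:$r1:$r2:$r3 acube $t8 $s0 $r4:$r5:$r6:$r7
operands = split_acube_sources(["$r4", "$r5", "$r6", "$r7"])
```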
20:06 karolherbst: but I am actually surprised how much both shaders look alike
20:06 imirkin_: well it's not a COMPLETE accident
20:06 karolherbst: :D
20:06 imirkin_: we actually do something a bit different for textureGrad in vertex shaders
20:06 karolherbst: but nvidia has more instructions
20:06 imirkin_: i wonder if the nvidia way is required
20:07 karolherbst: yeah, first I fix fragment shaders, maybe it fixes everything else, but if I get less than 144 errors I am happy
20:07 imirkin_: [by which i mean we do the exact same thing in vertex shaders, while nvidia is slightly different]
20:08 karolherbst: ohh, okay
20:08 imirkin_: they do all the lanes' derivatives from l0 i think
20:08 imirkin_: dunno if it's required, or an artifact of something else
20:08 imirkin_: either way, i haven't seen anything to indicate our way doesn't work
20:10 karolherbst: ohh well, I see significant differences already
20:10 karolherbst: but currently I wouldn't know how that maps to anything
20:10 imirkin_: entirely possible that nouveau is doing something totally dumb... it's hard to trace through all that stuff.
20:11 imirkin_: i don't think i ever wrote anything out
20:28 mupuf: hmm, seems like my 960 is stable, but not my 1060. Odd...
20:29 mupuf: (still with the blob)
21:31 karolherbst: imirkin: okay nice, we do indeed do some things differently
21:32 karolherbst: just looking at this block already: https://gist.github.com/karolherbst/afc9aa8c3a12c16c20ddd828cd2d73b2
21:32 karolherbst: the last three ops
21:42 imirkin_: did i mess something up?
21:45 karolherbst: I'll give a better overview shortly
21:45 karolherbst: updated: https://gist.github.com/karolherbst/afc9aa8c3a12c16c20ddd828cd2d73b2
21:45 karolherbst: complete until the first texbar
21:46 imirkin_: 0x9c is probably the array index
21:46 imirkin_: sounds like i do mess it up?
21:46 karolherbst: looks like it
21:46 karolherbst: but only a small mess up
21:46 imirkin_: oh no... 0x8c is
21:46 karolherbst: wondering what that quadpop is all about
21:46 imirkin_: hmmmmmmmmm
21:47 imirkin_: so a rotation of G/H/I...
21:47 karolherbst: yeah
21:47 karolherbst: and
21:47 karolherbst: $I is wrongly filled?
21:47 karolherbst: might be related
21:47 imirkin_: i wonder if it's different for the diff lanes
21:47 imirkin_: can you do the same thing for l1?
21:47 karolherbst: yeah
21:48 imirkin_: oh ffs
21:48 imirkin_: i think i know what's going on
21:48 imirkin_: stupid goddamn GL y-flip
21:48 karolherbst: :O
21:49 imirkin_: maybe
21:49 imirkin_: no, maybe not
21:49 karolherbst: what's 0xa8 vs 0x9c?
21:49 imirkin_: yeah, needs investigation
21:49 imirkin_: no clue
21:49 karolherbst: k
21:49 imirkin_: i suspect that nvidia is just better at packing attributes
21:52 imirkin_: karolherbst: could also be that nvidia packs the attribs differently, so IGH is right for us, and GHI is right for nvidia
21:53 karolherbst: mhh, might be
21:53 karolherbst: anyway, is there some magic going on with l1 mov b32 $r17 $r0 ?
21:53 imirkin_: yeah
21:53 imirkin_: so that moves ... let's see ...
21:55 imirkin_: either l1 of $r0 into all lanes of $r17
21:55 imirkin_: or no
21:55 imirkin_: probably it moves $r0_l1 into $r17_l1
21:55 imirkin_: so $r17 in other lanes is unaffected
21:55 karolherbst: mhhh
21:55 imirkin_: mov takes a lanemask... on gm107 that's the 0xf argument at the end - means all lanes.
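[Editor's sketch] A lane-masked mov, as imirkin guesses it behaves, could be modelled like this. It assumes bit i of the mask enables lane i, so that "l1" corresponds to mask 0x2 and 0xf means all four lanes; neither the bit order nor the function name masked_mov comes from the log.

```python
def masked_mov(dst, src, lanemask):
    """Copy src into dst only in the lanes enabled by lanemask.

    Assumption: bit i of lanemask enables lane i (0xf = all 4 lanes).
    Lanes that are not enabled keep their old dst value.
    """
    return [src[lane] if lanemask & (1 << lane) else dst[lane]
            for lane in range(4)]

r0  = [1, 2, 3, 4]
r17 = [0, 0, 0, 0]

only_l1   = masked_mov(r17, r0, 0x2)  # "l1 mov b32 $r17 $r0", per imirkin's guess
all_lanes = masked_mov(r17, r0, 0xf)  # plain mov with the 0xf mask
```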
21:57 karolherbst: it looks like we simply miss a few quadops in the end
21:57 karolherbst: nvidia does 12 quadops more than nouveau
21:57 karolherbst: but more later
21:58 imirkin_: i don't think those quadops get you anything
21:59 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n1097
21:59 imirkin_: this is what we do
21:59 imirkin_: imho the comments are pretty decent
22:01 imirkin_: so yeah. i wonder if this needs to account for the stupid y-flip
22:01 imirkin_: although ... the textureGrad ... hrm
22:02 imirkin_: yeah shouldn't matter.
22:02 karolherbst: you'll love the differences
22:12 imirkin_: karolherbst: can i see the TGSI btw?
22:12 imirkin_: i have a little doubt....
22:14 karolherbst: imirkin_: https://gist.github.com/karolherbst/6eedd2bd5c1e037714e64d168b87299b#file-nouveau-L579-L591
22:14 imirkin_: ok, so TEMP[0] is the dPdy
22:15 imirkin_: which gets IN[1].w + IN[2].xy
22:15 imirkin_: ok, yeah - makes sense.
22:17 imirkin_: and right - the I comes before GH. so it all makes sense.
22:17 imirkin_: nouveau packs attributes, nvidia does not
22:17 imirkin_: (or rather, higher levels above nouveau pack attributes)
22:17 imirkin_: so that explains why those are different.
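[Editor's sketch] The packing explanation can be checked with simple offset arithmetic. The sketch assumes generic inputs start at a[0x80], that each input slot is a 16-byte vec4, and that the unpacked layout puts dPdy in its own slot (IN[2].xyz); none of that is stated in the log, and attr_offset is a hypothetical helper. Under those assumptions the numbers line up with the 0x8c, 0x9c and 0xa8 accesses mentioned earlier.

```python
GENERIC_BASE = 0x80  # assumed start of generic inputs in a[] space

def attr_offset(index, component):
    """Byte offset of IN[index].<component>, assuming 16-byte vec4 slots."""
    return GENERIC_BASE + 16 * index + 4 * "xyzw".index(component)

# Packed (nouveau, per the TGSI): dPdy = IN[1].w, IN[2].x, IN[2].y
packed_dpdy = [attr_offset(1, "w"), attr_offset(2, "x"), attr_offset(2, "y")]

# Unpacked (assumed for nvidia): dPdy has its own slot, IN[2].xyz
unpacked_dpdy = [attr_offset(2, c) for c in "xyz"]

# Array index in IN[0].w, matching imirkin's "0x8c is"
array_index = attr_offset(0, "w")

print([hex(o) for o in packed_dpdy])
print([hex(o) for o in unpacked_dpdy])
```

If the assumptions hold, the packed layout starts dPdy at 0x9c and the unpacked one ends it at 0xa8, which is consistent with one read differing between the two shaders and with the I-before-GH ordering.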
22:32 mangix: aaaaand 4.12.5 on fedora can't load GNOME properly
22:33 mangix: i2c fix wasn't backported
22:38 karolherbst: imirkin_: might be interesting to know if the packing has any negative influence
22:38 imirkin_: thoroughly unlikely.
22:44 karolherbst: with IGH modified it looks a little better
22:44 karolherbst: but I am sure we forgot some quadops, had to begin from scratch, because I wasn't concentrating
22:45 karolherbst: and we might have to replace some quadop subr mov2 subr mov2 with add mov2 add mov2
22:45 imirkin_: so that would imply that the Y coordinate flip thing matters
22:46 karolherbst: I can show you l1 shortly
22:50 karolherbst: imirkin_: https://gist.github.com/karolherbst/361ce013e080aa00f7e927f5360c47d4
22:51 karolherbst: especially this "quadop rn f32 add add add add $AW dall l1 $O 0x0"
22:53 karolherbst: but l2 is even more different
23:13 karolherbst1: imirkin: done https://gist.github.com/karolherbst/361ce013e080aa00f7e927f5360c47d4
23:14 tobijk: karolherbst: actually a diff would be good imho :)
23:14 karolherbst: use your favourite diff tool
23:15 karolherbst: I like a more visual diff tool for things like this
23:15 karolherbst: a patch file doesn't cut it
23:15 tobijk: so kompare?! :D
23:15 karolherbst: I use meld
23:16 karolherbst: it seems like every tool is silly except meld
23:16 karolherbst: but meld has perf issues
23:16 tobijk: anyway, meant for the pastebin
23:16 karolherbst: meld has the biggest advantage, that you can simply create "blank" views without the need of opening files
23:16 karolherbst: so you can simply paste stuff into it
23:17 karolherbst: I am a little confused about things like "quadop rn f32 add add add add $BE dall l0 $BA 0x0"
23:20 tobijk: huh, i'm blind, where do we use $AG around there?
23:20 tobijk: (nouveau)
23:21 karolherbst: uhh, I had a little mistake in there as well
23:21 karolherbst: at the end
23:22 karolherbst: updated: https://gist.github.com/karolherbst/361ce013e080aa00f7e927f5360c47d4
23:23 karolherbst: imirkin: do we optimize those quadop adds with 0x0 away? wondering why nvidia doesn't do it
23:24 karolherbst: although they also do quite a lot of quadpop and quadon
23:25 tobijk: yeah i wonder what those are for
23:25 karolherbst: okay but we have some add/subr issues there
23:26 karolherbst: maybe those are the only relevant changes
23:27 karolherbst: uhhh wait
23:28 tobijk: the tex in nouveau:72 to nvidia:82 ?
23:28 tobijk: $O vs $BU
23:28 karolherbst: that's maybe also fine
23:28 karolherbst: quadop rn f32 add add add add $BU dall l2 $O 0x0
23:33 karolherbst: yeah
23:33 karolherbst: I think imirkin is right
23:33 karolherbst: Y is flipped
23:34 karolherbst: let me test it
23:46 karolherbst: okay, nearly the same as nvidia
23:54 karolherbst: yay
23:54 karolherbst: pass
23:56 karolherbst: okay, now the full test
23:59 karolherbst: 96 fails vs 144
23:59 karolherbst: :)