00:00 karolherbst: mhh, running ConstantFolding twice: total instructions in shared programs : 5726032 -> 5719281 (-0.12%) total gprs used in shared programs : 663147 -> 662790 (-0.05%)
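(Editor's note: the percentages quoted above follow directly from the before/after totals. A minimal sketch that recomputes them; the helper name `percent_change` is made up for illustration and is not part of any shader-db tooling.)

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Totals quoted in the message above.
instructions = percent_change(5726032, 5719281)
gprs = percent_change(663147, 662790)

print(f"instructions: {instructions:.2f}%")  # instructions: -0.12%
print(f"gprs: {gprs:.2f}%")                  # gprs: -0.05%
```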
00:10 HdkR: Whoa, massive NIR patchset :D
00:10 HdkR: Hopefully last revision? :)
00:13 karolherbst: HdkR: yeah, I guess so
00:13 karolherbst: maybe some minor cleanups here and there, but nothing I couldn't also do later
00:14 karolherbst: perf is terrible at the moment, but this can be improved as well
00:18 HdkR: Perf is always bad when you haven't spent years optimizing. Sounds like a good time
00:20 karolherbst: well
00:20 karolherbst: we run the stuff through the same optimizer :)
00:20 karolherbst: but nir does a much better job optimizing loops and so on
00:20 karolherbst: no idea when we will get that inside codegen
00:20 karolherbst: if at all
00:31 karolherbst: imirkin: would be nice if you could take a look at my CTS fixes this weekend
00:31 karolherbst: https://patchwork.freedesktop.org/series/45307/ and https://patchwork.freedesktop.org/series/45313/
00:33 karolherbst: uhm, ignore the imageLoad for now
00:33 karolherbst: just the blitter
00:33 karolherbst: something in piglit hit that predicate assertion, which I need to track down
12:37 karolherbst: imirkin: any thoughts on turning on compatibility profiles? I ran those new piglit tests and the only real issue was clipvertex outputs in geometry shaders, but I fixed that.
12:38 karolherbst: we could just expose 4.3 and see what issues we run into
12:38 karolherbst: I guess the alternative is, that certain applications just won't start
13:19 imirkin: karolherbst: yeah, clipping is the main thing
13:19 imirkin: i was thinking i might spend a bit of time with it
13:19 imirkin: since i know pretty much exactly how to fix it
13:27 karolherbst: imirkin: I came up with this patch to fix the shader side: https://github.com/karolherbst/mesa/commit/76acf37de18c0f3c1b37ebece93c8c5de51da703#diff-98aa617caad923f36d410f28e02e398f
13:27 karolherbst: I am sure there is more work needed for tessellation, but there aren't tests for that yet I think
13:27 karolherbst: but overall it seems like things are just working out fine
13:32 karolherbst: imirkin: but if you don't mind I will simply send this patch to the ML + enabling compat 4.3 and we could just go from there. I am not aware of other issues that we have piglit tests for right now
13:34 karolherbst: mhh, just found this: https://gist.githubusercontent.com/karolherbst/5f2ead93d09e91202595abd21ea8ab51/raw/49da5175a35ac3b6cf311e4169428fe1110c30ad/gistfile1.txt
13:34 karolherbst: we could also simply use $r4q in the export :(
13:50 karolherbst: pendingchaos: envyas crashes when I try to compile your mme method mme9097_inc_inv_counter
13:50 pendingchaos: yeah, you have to comment out some line in macro.c
13:50 pendingchaos: line 120 it seems
13:51 karolherbst: mhhh
13:54 pendingchaos: I think imirkin said it's some ambiguity in the syntax with that line and line 145
14:01 karolherbst: pendingchaos: well, it looks like the CTS tests are happy with it
14:24 pendingchaos: karolherbst: would it be acceptable to include the PUSH_KICK in final patches? otherwise https://hastebin.com/jebobexuzo.diff fails for me
14:24 pendingchaos: I don't know why it makes the test pass though
14:25 karolherbst: I think the CTS test failed once for me as well, but I couldn't reproduce that. I don't know that much about all the push buffer stuff so rather ask imirkin about that
14:28 pendingchaos: imirkin: ^ (in case the mention wasn't enough)
14:31 pendingchaos: from running the modified piglit test, the failure was completely reproducible
14:31 pendingchaos: so I wonder if that's a different problem
14:33 pendingchaos: I was able to make the piglit test pass in some runs by throwing in some serializes and MmeShadowRamControl(s) (according to switchbrew) though
15:25 rhyskidd: is there a good spot to store/add to existing knowledge of the PMU microcode firmware? so distinct from the hw itself
15:25 karolherbst: rhyskidd: what do you mean?
15:25 karolherbst: you mean documentation about the closed pmu firmware?
15:26 rhyskidd: RE'd documentation about the closed pmu firmware
15:26 rhyskidd: interfaces, commands etc
15:42 karolherbst: mhh, maybe something inside envytools? but I don't think there is much
15:43 rhyskidd: yes, mwk mentioned to keep rnndb hw-focused, so not including potentially version-specific microcode sw interfaces or info
15:46 karolherbst: well I don't know a different place where to put it
15:47 karolherbst: rhyskidd: you could also just create a repository on gitlab and put it there
16:14 rhyskidd: ok
17:13 karolherbst: imirkin: do you have any wip code or any conclusions from that packed_depth_stencil.blit.depth32f_stencil8 test fail? I think we might need a super complex vertex shader to handle this, but I hope you have more insights here and tell me that we won't need it
19:56 karolherbst: uhm, when I have a "join $whatever" does the join happen before or after the execution of that instruction?
19:56 karolherbst: we don't have that on maxwell anymore, right?
20:03 mwk: rhyskidd: just stuff it in envytools, but not in the hw-specific dirs
20:03 mwk: eg docs/fw or rnndb/fw
20:04 mwk: I think there's already something in the docs about pmu fw
20:09 rhyskidd: ok, thanks
20:14 karolherbst: imirkin: tarceri wrote something about OpenGL Windows games needing compat profiles on wine. I was mainly thinking about first fixing the issues I can verify, in case somebody forces a compat profile for those.
20:16 rhyskidd: pendingchaos: great contributions to nouveau recently btw
20:20 pendingchaos: thanks
20:21 rhyskidd: always great to see new contributors doing cool things
20:33 imirkin: karolherbst: if your changes as-is didn't trigger any problems, then the current tests are heavily insufficient
20:33 imirkin: or perhaps tarceri hasn't pushed some stuff out yet
20:33 karolherbst: imirkin: yeah, that was the only fix I had to do
20:33 imirkin: either way, i don't want to push out known-broken patches
20:33 karolherbst: well, it doesn't enable anything yet
20:34 imirkin: might as well fix it up in one go
20:34 karolherbst: it just fixes a few things if you force a compat profile and run into this code path
20:34 imirkin: i'm not talking about substantial patches
20:34 imirkin: s/patches/changes/
20:34 imirkin: why not spend the extra 10 minutes on it
20:35 karolherbst: well mainly because I wouldn't know if it works or not
20:39 karolherbst: so I would rather spend more time writing those tests or ping tarceri about that and then wait until we have those and then continue from there
20:39 karolherbst: I kind of get the feeling that implementing this for tess shaders might be a bit more work than for geom
20:41 karolherbst: anyway, didn't want to spend too much time on that. Mainly just enabling and see which tests are easy to fix
20:45 imirkin: less work than for geom
20:46 imirkin: can just directly reuse the VS logic for TES
20:46 imirkin: just have to make sure the recompiles get triggered appropriately
20:46 imirkin: in the nvc0_shader_validate logic
20:49 karolherbst: ohh, okay
20:50 karolherbst: for eval and ctrl shaders I assume?
20:53 karolherbst: imirkin: I think the recompile should already be triggered. there isn't really anything shader-type specific in the validation logic
20:54 karolherbst: maybe that part: "if (nvc0->dirty_3d & (NVC0_NEW_3D_CLIP | (NVC0_NEW_3D_VERTPROG << stage)))" but I think this should be actually fine, no?
20:54 karolherbst: this is guarding nvc0_upload_uclip_planes
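(Editor's note: the quoted condition only behaves as intended if the per-stage program dirty bits sit on consecutive bit positions in stage order, so that shifting the vertex-program bit by `stage` lands on that stage's bit. A minimal sketch of that idea follows. The bit positions, the stage numbering and the function name are all hypothetical, chosen for illustration; they are not copied from the nvc0 driver.)

```python
# Hypothetical dirty bits; the real driver values may differ.
NEW_3D_VERTPROG = 1 << 0  # vertex program changed
NEW_3D_TCTLPROG = 1 << 1  # tessellation control program changed
NEW_3D_TEVLPROG = 1 << 2  # tessellation evaluation program changed
NEW_3D_GMTYPROG = 1 << 3  # geometry program changed
NEW_3D_CLIP = 1 << 8      # user clip state changed

# Assumed stage numbering: 0 = vertex, 1 = tess ctrl, 2 = tess eval, 3 = geometry.


def needs_uclip_upload(dirty_3d: int, stage: int) -> bool:
    """Mirror of the quoted check: clip state or the given stage's program is dirty."""
    return bool(dirty_3d & (NEW_3D_CLIP | (NEW_3D_VERTPROG << stage)))


# A dirty tess eval program triggers the upload only when stage 2 is the one checked.
print(needs_uclip_upload(NEW_3D_TEVLPROG, 2))  # True
print(needs_uclip_upload(NEW_3D_TEVLPROG, 0))  # False
# A clip state change triggers it regardless of stage.
print(needs_uclip_upload(NEW_3D_CLIP, 3))      # True
```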
21:00 mwk: rhyskidd: oh, and it might be a good idea to annotate any information you find with a blob version number
21:00 mwk: most of it is stable, but changes do happen
21:01 karolherbst: meh... ported the gk110 rsq to gm107, but I get kind of random results :(
21:05 HdkR: What is kepler rsq versus maxwell rsq?
21:05 karolherbst: I meant the fp64 one
21:06 HdkR: ah
21:06 HdkR: Nightmare fuel you mean
21:07 karolherbst: yes
21:07 karolherbst: I am sure it is something stupid
21:09 karolherbst: https://github.com/karolherbst/mesa/commit/d0720b690239fbc69cff2b50945b971fae74f81c
21:11 karolherbst: ohh wait... the CTS is testing different arguments every time
21:11 karolherbst: no wonder the output is random
21:12 HdkR: hah
21:12 karolherbst: but still
21:12 karolherbst: the result is very wrong
21:12 karolherbst: -7.25432e+24 vs 2 :)
21:17 HdkR: karolherbst: No use of mufu rsq64h?
21:17 karolherbst: I don't want to change the code
21:17 karolherbst: just port whatever we kind of have for gk110
21:17 HdkR: ah
21:18 HdkR: It cuts down a decent chunk of the work
21:19 HdkR: Still enormous obviously :P
22:21 karolherbst: ahh, I get a MISALIGNED_REG in dmesg
22:21 karolherbst: I just wrote a shader_test file to be able to test it pain-free
22:21 HdkR: ah, that would be an issue
22:27 HdkR: I don't see any misalignments in that commit though...
22:29 HdkR: Unless it's those dsetp instructions?
22:29 HdkR: doesn't understand the order there
22:42 karolherbst: well
22:42 karolherbst: envyas already verifies that things are sane
22:42 karolherbst: maybe some sched screwup
22:43 HdkR: sched should throw a misaligned_reg though
22:43 HdkR: shouldn't*
22:43 karolherbst: ohh, true
22:44 karolherbst: so 32 bit vs 64 bit reg
22:46 karolherbst: sched screwup generates an ILLEGAL_INSTR_ENCODING
22:46 karolherbst: anyway, checking block by block then
22:47 HdkR: oops, lol
22:49 karolherbst: I am not entirely sure about the f2f thingies
22:49 karolherbst: but we will see which instruction is the faulting one
22:51 karolherbst: ...
22:51 karolherbst: it was a f2f
22:51 HdkR: `f2f ftz f32 f64 $r5 $r6` That one?
22:51 karolherbst: yeah
22:51 karolherbst: but I need to fix the others as well
22:52 karolherbst: f2f dty sty would have made sense
22:52 karolherbst: but apparently it is f2f sty dty
22:52 HdkR: oh
22:52 karolherbst: checking the gk110 code again, but...
22:52 HdkR: Gross, okay. If it has that order of operands then that is an easy mistake
22:53 HdkR: destination as leftmost please :P
22:58 karolherbst: mhh, still something odd
22:59 karolherbst: but no errors in dmesg
23:01 karolherbst: HdkR: https://github.com/karolherbst/mesa/commit/e351d7c10d267321b758a65d96d850969718fc00
23:03 karolherbst: HdkR: isetp is a bit weird: isetp CC type lop $p0 $p1 $r0 $r1 $p2
23:03 karolherbst: I am not 100% sure for what $p1 is used, but I think isetp can output two predicates
23:03 karolherbst: one for the result and the other being !result
23:04 karolherbst: or something weird like that
23:04 karolherbst: again, not 100% sure
23:04 karolherbst: anyway, 1 means true
23:10 HdkR: Right
23:11 HdkR: Just wrong output instead of fault now?
23:15 karolherbst: yes
23:16 karolherbst: meh... I think I found it
23:17 karolherbst: iadd $r3 $r2 -0x1
23:17 karolherbst: -0x1 is 0x7ffffff not 0xffffffff
23:17 karolherbst: stupid
23:19 HdkR: hehe
23:19 imirkin: -1 is 0xffffffff ...
23:19 karolherbst: imirkin: well, envyas doesn't accept that :)
23:20 imirkin: iadd32i
23:20 karolherbst: I know
23:20 karolherbst: but, no
23:20 imirkin: or iadd -1
23:20 imirkin: er
23:20 karolherbst: "iadd32i $r3 $r2 0xffffffff"
23:20 imirkin: iadd neg 1
23:20 karolherbst: ahh -1 works
23:20 imirkin: coz it's a signed thing
23:21 imirkin: it gets encoded properly
23:21 imirkin: with a short immediate, it's the 1-bit + 19 bit thing
23:22 HdkR: Your assembler just derping on negative hex I guess is the issue?
23:22 HdkR: and not doing positive imm + negation?
23:22 karolherbst: mhh still something wrong
23:22 imirkin: HdkR: it's an assembler, not an instruction selection thing
23:22 HdkR: Right
23:23 imirkin: it SHOULD be able to accept a 0xffffffff immediate for a short integer imm though...
23:23 imirkin: or not. not sure.
23:23 imirkin: updates were made in that area
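(Editor's note: a minimal sketch of the arithmetic behind the immediate discussion above. As a 32-bit two's complement value, -1 is 0xffffffff, and a short signed immediate described as "1-bit + 19 bit" can hold -1 as a sign bit plus 19 low bits. The helper names are made up, and the split models only the 1 + 19 division mentioned in the chat, not the actual bit positions in the instruction word. It does not claim to reproduce what envyas accepts.)

```python
def to_u32(value: int) -> int:
    """Bit pattern of `value` as a 32-bit two's complement word."""
    return value & 0xFFFFFFFF


def fits_short_signed(value: int, bits: int = 20) -> bool:
    """True if `value` is representable as a `bits`-wide signed integer."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def split_short_imm(value: int, bits: int = 20) -> tuple[int, int]:
    """Split a signed value into (sign bit, low bits) of a `bits`-wide field."""
    if not fits_short_signed(value, bits):
        raise ValueError(f"{value:#x} does not fit in a {bits}-bit signed immediate")
    field = value & ((1 << bits) - 1)
    sign = field >> (bits - 1)
    low = field & ((1 << (bits - 1)) - 1)
    return sign, low


print(hex(to_u32(-1)))                # 0xffffffff
print(split_short_imm(-1))            # (1, 524287), i.e. sign = 1, low = 0x7ffff
print(fits_short_signed(0xFFFFFFFF))  # False when read as a positive number
```

Read as an unsigned literal, 0xffffffff is far outside the 20-bit signed range, while -1 is well inside it, which is one plausible reading of why writing -1 worked where the hex literal did not.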
23:23 karolherbst: subr is subtraction, right?
23:23 imirkin: reversed order
23:23 imirkin: i.e. b - a instead of a - b
23:23 karolherbst: ohhhhh
23:24 karolherbst: mhh
23:24 karolherbst: do we have that on maxwell?
23:24 imirkin: sure
23:24 imirkin: i think :)
23:24 imirkin: iadd neg a b
23:24 imirkin: right?
23:24 karolherbst: yeah
23:25 karolherbst: but mhh
23:25 karolherbst: b is an immediate
23:25 karolherbst: uhm wait...
23:26 karolherbst: got it
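(Editor's note: a minimal sketch of the equivalence discussed above, that a reversed subtract computes b - a and can be expressed as an add with the first operand negated. The function names and the `neg_a` flag are made up for illustration and model only 32-bit wrapping arithmetic, not the hardware instructions themselves.)

```python
MASK32 = 0xFFFFFFFF


def subr(a: int, b: int) -> int:
    """Reversed subtract: b - a, wrapped to 32 bits."""
    return (b - a) & MASK32


def iadd(a: int, b: int, neg_a: bool = False) -> int:
    """32-bit add with an optional negate modifier on the first operand."""
    if neg_a:
        a = -a
    return (a + b) & MASK32


# "subr a b" and "iadd neg a b" give the same 32-bit result.
print(hex(subr(5, 3)))              # 0xfffffffe
print(hex(iadd(5, 3, neg_a=True)))  # 0xfffffffe
```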
23:28 karolherbst: I am a bit confused about the "clamp" here: shl b32 $r4 $r4 clamp 0x14
23:28 karolherbst: but clamp is the default behaviour, no?
23:28 karolherbst: meaning if I write "shl $r4 $r4 0x14" for gm107 it should be equal, no?
23:32 HdkR: Should be
23:32 HdkR: Even though it doesn't really matter for immediate
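(Editor's note: a minimal sketch of why the mode would not matter for an immediate such as 0x14. It assumes "clamp" means the shift amount is clamped to 32, so larger shifts produce 0, and that the alternative is wrapping the shift amount modulo 32. Both semantics are assumptions made for illustration; the function names are made up.)

```python
MASK32 = 0xFFFFFFFF


def shl_clamp(value: int, shift: int) -> int:
    """Left shift with the shift amount clamped to 32 (assumed clamp semantics)."""
    return (value << min(shift, 32)) & MASK32


def shl_wrap(value: int, shift: int) -> int:
    """Left shift with the shift amount taken modulo 32 (assumed wrap semantics)."""
    return (value << (shift & 31)) & MASK32


# For a shift of 0x14 (20), both modes agree.
print(hex(shl_clamp(1, 0x14)), hex(shl_wrap(1, 0x14)))  # 0x100000 0x100000
# They only differ once the shift amount reaches 32 or more.
print(hex(shl_clamp(1, 32)), hex(shl_wrap(1, 32)))      # 0x0 0x1
```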
23:33 karolherbst: but uhm, the error is quite small now
23:33 karolherbst: 15.4355 vs -15.5
23:33 karolherbst: maybe some odd error somewhere
23:35 karolherbst: oh wait no, it actually should return 0