00:00karolherbst: mhh, running ConstantFolding twice: total instructions in shared programs : 5726032 -> 5719281 (-0.12%) total gprs used in shared programs : 663147 -> 662790 (-0.05%)
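The percentages quoted above can be re-derived from the raw counts. This is only a worked check of the arithmetic; the helper name `percent_change` is made up for the example and is not part of any shader-db tooling.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

# Counts copied from the message above.
instructions = percent_change(5726032, 5719281)
gprs = percent_change(663147, 662790)

print(f"instructions: {instructions:.2f}%")
print(f"gprs: {gprs:.2f}%")
```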
00:10HdkR: Whoa, massive NIR patchset :D
00:10HdkR: Hopefully last revision? :)
00:13karolherbst: HdkR: yeah, I guess so
00:13karolherbst: maybe some minor cleanups here and there, but nothing I couldn't also do later
00:14karolherbst: perf is terrible at the moment, but this can be improved as well
00:18HdkR: Perf is always bad when you haven't spent years optimizing. Sounds like a good time
00:20karolherbst: we run the stuff through the same optimizer :)
00:20karolherbst: but nir does a much better job optimizing loops and so on
00:20karolherbst: no idea when we will get that inside codegen
00:20karolherbst: if at all
00:31karolherbst: imirkin: would be nice if you could take a look at my CTS fixes this weekend
00:31karolherbst: https://patchwork.freedesktop.org/series/45307/ and https://patchwork.freedesktop.org/series/45313/
00:33karolherbst: uhm, ignore the imageLoad for now
00:33karolherbst: just the blitter
00:33karolherbst: something in piglit hit that predicate assertion, which I need to track down
12:37karolherbst: imirkin: any thoughts on turning on compatibility profiles? I ran those new piglit tests and the only real issue was clipvertex outputs in geometry shaders, but I fixed that.
12:38karolherbst: we could just expose 4.3 and see what issues we run into
12:38karolherbst: I guess the alternative is, that certain applications just won't start
13:19imirkin: karolherbst: yeah, clipping is the main thing
13:19imirkin: i was thinking i might spend a bit of time with it
13:19imirkin: since i know pretty much exactly how to fix it
13:27karolherbst: imirkin: I came up with this patch to fix the shader side: https://github.com/karolherbst/mesa/commit/76acf37de18c0f3c1b37ebece93c8c5de51da703#diff-98aa617caad923f36d410f28e02e398f
13:27karolherbst: I am sure there is more work needed for tessellation, but there aren't tests for that yet I think
13:27karolherbst: but overall it seems like things are just working out fine
13:32karolherbst: imirkin: but if you don't mind I will simply send this patch to the ML + enabling compat 4.3 and we could just go from there. I am not aware of other issues that we have piglit tests for right now
13:34karolherbst: mhh, just found this: https://gist.githubusercontent.com/karolherbst/5f2ead93d09e91202595abd21ea8ab51/raw/49da5175a35ac3b6cf311e4169428fe1110c30ad/gistfile1.txt
13:34karolherbst: we could also simply use $r4q in the export :(
13:50karolherbst: pendingchaos: envyas crashes when I try to compile your mme method mme9097_inc_inv_counter
13:50pendingchaos: yeah, you have to comment out some line in macro.c
13:50pendingchaos: line 120 it seems
13:54pendingchaos: I think imirkin said it's some ambiguity in the syntax with that line and line 145
14:01karolherbst: pendingchaos: well, it looks like the CTS tests are happy with it
14:24pendingchaos: karolherbst: would it be acceptable to include the PUSH_KICK in final patches? otherwise https://hastebin.com/jebobexuzo.diff fails for me
14:24pendingchaos: I don't know why it makes the test pass though
14:25karolherbst: I think the CTS test failed once for me as well, but I couldn't reproduce that. I don't know that much about all the push buffer stuff so rather ask imirkin about that
14:28pendingchaos: imirkin: ^ (in case the mention wasn't enough)
14:31pendingchaos: from running the modified piglit test, the failure was completely reproducible
14:31pendingchaos: so I wonder if that's a different problem
14:33pendingchaos: I was able to make the piglit test pass in some runs by throwing in some serializes and MmeShadowRamControl(s) (according to switchbrew) though
15:25rhyskidd: is there a good spot to store/add to existing knowledge of the PMU microcode firmware? so distinct from the hw itself
15:25karolherbst: rhyskidd: what do you mean?
15:25karolherbst: you mean documentation about the closed pmu firmware?
15:26rhyskidd: RE'd documentation about the closed pmu firmware
15:26rhyskidd: interfaces, commands etc
15:42karolherbst: mhh, maybe something inside envytools? but I don't think there is much
15:43rhyskidd: yes, mwk mentioned to keep rnndb hw-focused, so not including potentially version-specific microcode sw interfaces or info
15:46karolherbst: well I don't know a different place where to put it
15:47karolherbst: rhyskidd: you could also just create a repository on gitlab and put it there
17:13karolherbst: imirkin: do you have any wip code or any conclusions from that packed_depth_stencil.blit.depth32f_stencil8 test fail? I think we might need a super complex vertex shader to handle this, but I hope you have more insights here and tell me that we won't need it
19:56karolherbst: uhm, when I have a "join $whatever" does the join happen before or after the execution of that instruction?
19:56karolherbst: we don't have that on maxwell anymore, right?
20:03mwk: rhyskidd: just stuff it in envytools, but not in the hw-specific dirs
20:03mwk: eg docs/fw or rnndb/fw
20:04mwk: I think there's already something in the docs about pmu fw
20:09rhyskidd: ok, thanks
20:14karolherbst: imirkin: tarceri wrote something about OpenGL Windows games needing compat profiles on wine. I was mainly thinking about first fixing the issues I can verify, in case somebody forces a compat profile for those.
20:16rhyskidd: pendingchaos: great contributions to nouveau recently btw
20:21rhyskidd: always great to see new contributors doing cool things
20:33imirkin: karolherbst: if your changes as-is didn't trigger any problems, then the current tests are heavily insufficient
20:33imirkin: or perhaps tarceri hasn't pushed some stuff out yet
20:33karolherbst: imirkin: yeah, that was the only fix I had to do
20:33imirkin: either way, i don't want to push out known-broken patches
20:33karolherbst: well, it doesn't enable anything yet
20:34imirkin: might as well fix it up in one go
20:34karolherbst: it just fixes a few things if you force a compat profile and run into this code path
20:34imirkin: i'm not talking about substantial patches
20:34imirkin: why not spend the extra 10 minutes on it
20:35karolherbst: well mainly because I wouldn't know if it works or not
20:39karolherbst: so I would rather spend more time writing those tests or ping tarceri about that and then wait until we have those and then continue from there
20:39karolherbst: I kind of get the feeling that implementing this for tess shaders might be a bit more work than for geom
20:41karolherbst: anyway, didn't want to spend too much time on that. Mainly just enabling and see which tests are easy to fix
20:45imirkin: less work than for geom
20:46imirkin: can just directly reuse the VS logic for TES
20:46imirkin: just have to make sure the recompiles get triggered appropriately
20:46imirkin: in the nvc0_shader_validate logic
20:49karolherbst: ohh, okay
20:50karolherbst: for eval and ctrl shaders I assume?
20:53karolherbst: imirkin: I think the recompile should already be triggered. there isn't really anything shader type specific in the validation logic
20:54karolherbst: maybe that part: "if (nvc0->dirty_3d & (NVC0_NEW_3D_CLIP | (NVC0_NEW_3D_VERTPROG << stage)))" but I think this should be actually fine, no?
20:54karolherbst: this is guarding nvc0_upload_uclip_planes
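A minimal sketch of how a per-stage dirty-bit test like the quoted condition behaves. The bit positions below are hypothetical and picked only for illustration; the real `NVC0_NEW_3D_*` values live in the nvc0 driver headers. The sketch assumes, as the `<< stage` in the quoted line suggests, that the per-stage program bits are consecutive starting at the vertex program bit.

```python
# Hypothetical bit positions, NOT the real nvc0 values.
NEW_3D_CLIP = 1 << 0
NEW_3D_VERTPROG = 1 << 1  # stage 0; later stages assumed to use the following bits

def needs_uclip_upload(dirty_3d, stage):
    """Mirror of the quoted guard: true if the clip state or the program
    bound to the given stage index is marked dirty."""
    return bool(dirty_3d & (NEW_3D_CLIP | (NEW_3D_VERTPROG << stage)))

# Clip state dirty: the guard fires regardless of which stage is checked.
print(needs_uclip_upload(NEW_3D_CLIP, 3))
# Only the stage-2 program is dirty: fires for stage 2, not for stage 0.
print(needs_uclip_upload(NEW_3D_VERTPROG << 2, 2))
print(needs_uclip_upload(NEW_3D_VERTPROG << 2, 0))
```

Under these assumptions the guard itself is not tied to one shader type, which matches the point made in the messages above.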
21:00mwk: rhyskidd: oh, and it might be a good idea to annotate any information you find with a blob version number
21:00mwk: most of it is stable, but changes do happen
21:01karolherbst: meh... ported the gk110 rsq to gm107, but I get kind of random results :(
21:05HdkR: What is kepler rsq versus maxwell rsq?
21:05karolherbst: I meant the fp64 one
21:06HdkR: Nightmare fuel you mean
21:07karolherbst: I am sure it is something stupid
21:11karolherbst: ohh wait... the CTS is testing different arguments every time
21:11karolherbst: no wonder the output is random
21:12karolherbst: but still
21:12karolherbst: the result is very wrong
21:12karolherbst: -7.25432e+24 vs 2 :)
21:17HdkR: karolherbst: No use of mufu rsq64h?
21:17karolherbst: I don't want to change the code
21:17karolherbst: just port whatever we kind of have for gk110
21:18HdkR: It cuts down a decent chunk of the work
21:19HdkR: Still enormous obviously :P
22:21karolherbst: ahh, I get a MISALIGNED_REG in dmesg
22:21karolherbst: I just wrote a shader_test file to be able to test it painfree
22:21HdkR: ah, that would be an issue
22:27HdkR: I don't see any misalignments in that commit though...
22:29HdkR: Unless it's those dsetp instructions?
22:29HdkR:doesn't understand the order there
22:42karolherbst: envyas already verifies that things are sane
22:42karolherbst: maybe some sched screwup
22:43HdkR: sched should throw a misaligned_reg though
22:43karolherbst: ohh, true
22:44karolherbst: so 32 bit vs 64 bit reg
22:46karolherbst: sched screwup generates an ILLEGAL_INSTR_ENCODING
22:46karolherbst: anyway, checking block by block then
22:47HdkR: oops, lol
22:49karolherbst: I am not entirely sure about the f2f thingies
22:49karolherbst: but we will see which instruction is the faulting one
22:51karolherbst: it was a f2f
22:51HdkR: `f2f ftz f32 f64 $r5 $r6` That one?
22:51karolherbst: but I need to fix the others as well
22:52karolherbst: f2f dty sty would have made sense
22:52karolherbst: but apparently it is f2f sty dty
22:52karolherbst: checking the gk110 code again, but...
22:52HdkR: Gross, okay. If it has that order of operands then that is an easy mistake
22:53HdkR: destination as leftmost please :P
22:58karolherbst: mhh, still something odd
22:59karolherbst: but no errors in dmesg
23:01karolherbst: HdkR: https://github.com/karolherbst/mesa/commit/e351d7c10d267321b758a65d96d850969718fc00
23:03karolherbst: HdkR: isetp is a bit weird: isetp CC type lop $p0 $p1 $r0 $r1 $p2
23:03karolherbst: I am not 100% sure for what $p1 is used, but I think isetp can output two predicates
23:03karolherbst: one for the result and the other being !result
23:04karolherbst: or something weird like that
23:04karolherbst: again, not 100% sure
23:04karolherbst: anyway, 1 means true
23:11HdkR: Just wrong output instead of fault now?
23:16karolherbst: meh... I think I found it
23:17karolherbst: iadd $r3 $r2 -0x1
23:17karolherbst: -0x1 is 0x7ffffff not 0xffffffff
23:19imirkin: -1 is 0xffffffff ...
23:19karolherbst: imirkin: well, envyas doesn't accept that :)
23:20karolherbst: I know
23:20karolherbst: but, no
23:20imirkin: or iadd -1
23:20karolherbst: "iadd32i $r3 $r2 0xffffffff"
23:20imirkin: iadd neg 1
23:20karolherbst: ahh -1 works
23:20imirkin: coz it's a signed thing
23:21imirkin: it gets encoded properly
23:21imirkin: with a short immediate, it's the 1-bit + 19 bit thing
23:22HdkR: Your assembler just derping on negative hex I guess is the issue?
23:22HdkR: and not doing positive imm + negation?
23:22karolherbst: mhh still something wrong
23:22imirkin: HdkR: it's an assembler, not an instruction selection thing
23:23imirkin: it SHOULD be able to accept a 0xffffffff immediate for a short integer imm though...
23:23imirkin: or not. not sure.
23:23imirkin: updates were made in that area
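A rough sketch of the "1-bit + 19 bit" short immediate mentioned above, assuming it is a 20-bit two's-complement value split into a sign bit and the low 19 bits. The function names are made up for the example, and the actual bit positions inside the instruction word are left out on purpose.

```python
def to_signed32(value):
    """Reinterpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def split_short_imm(value, bits=20):
    """Split a 32-bit immediate into (sign_bit, low_bits) for a signed field
    of the given width. Raises ValueError if the value does not fit."""
    signed = to_signed32(value)
    limit = 1 << (bits - 1)
    if not -limit <= signed < limit:
        raise ValueError(f"{value:#x} does not fit in a {bits}-bit signed immediate")
    field = signed & ((1 << bits) - 1)
    return field >> (bits - 1), field & (limit - 1)

# -1 and 0xffffffff are the same 32-bit pattern, so both fit the short form.
print(split_short_imm(-1))
print(split_short_imm(0xFFFFFFFF))
print(split_short_imm(0x14))
```

Whether envyas accepts the `0xffffffff` spelling for the short form is a separate parser question, as noted in the messages above; the sketch only shows that the value itself is representable.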
23:23karolherbst: subr is subtraction, right?
23:23imirkin: reversed order
23:23imirkin: i.e. b - a instead of a - b
23:24karolherbst: do we have that on maxwell?
23:24imirkin: i think :)
23:24imirkin: iadd neg a b
23:25karolherbst: but mhh
23:25karolherbst: b is an immediate
23:25karolherbst: uhm wait...
23:26karolherbst: got it
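The relation between a reversed subtract and "iadd neg a b" described above, modelled on 32-bit integers. This only models the arithmetic; the function names are illustrative and do not correspond to any envyas or codegen API.

```python
MASK32 = 0xFFFFFFFF

def iadd(a, b, neg_a=False, neg_b=False):
    """32-bit add with optional negation of either source operand."""
    if neg_a:
        a = -a
    if neg_b:
        b = -b
    return (a + b) & MASK32

def sub(a, b):
    """Plain subtract: a - b."""
    return (a - b) & MASK32

def subr(a, b):
    """Reversed subtract: b - a."""
    return (b - a) & MASK32

# subr(a, b) gives the same result as an add with the first source negated.
print(hex(subr(5, 3)), hex(iadd(5, 3, neg_a=True)))
```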
23:28karolherbst: I am a bit confused about the "clamp" here: shl b32 $r4 $r4 clamp 0x14
23:28karolherbst: but clamp is the default behaviour, no?
23:28karolherbst: meaning if I write "shl $r4 $r4 0x14" for gm107 it should be equal, no?
23:32HdkR: Should be
23:32HdkR: Even though it doesn't really matter for immediate
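A small model of why the clamp modifier makes no difference for an immediate like 0x14. It assumes that "clamp" means shift amounts of 32 or more produce 0, and that the alternative wraps the shift amount modulo 32. Both function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF

def shl_clamp(value, shift):
    """Shift left with the amount clamped to 32, so large shifts give 0."""
    shift = min(shift & MASK32, 32)
    return ((value & MASK32) << shift) & MASK32

def shl_wrap(value, shift):
    """Shift left using only the low 5 bits of the shift amount."""
    return ((value & MASK32) << (shift & 31)) & MASK32

# For an in-range immediate such as 0x14 both variants agree.
print(hex(shl_clamp(0x3, 0x14)), hex(shl_wrap(0x3, 0x14)))
# They only differ once the shift amount reaches 32 or more.
print(hex(shl_clamp(0x3, 33)), hex(shl_wrap(0x3, 33)))
```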
23:33karolherbst: but uhm, the error is quite small now
23:33karolherbst: 15.4355 vs -15.5
23:33karolherbst: maybe some odd error somewhere
23:35karolherbst: oh wait no, it actually should return 0