13:10 karolherbst: mhh, maybe I'll take some time today and look into dual issuing (especially in the maxwell ISA)...
13:10 karolherbst: would be cool to have it
13:26 karolherbst: nice, the perf counters for dual issuing seem to be equal to gm200.. nice :)
13:27 karolherbst: nice nice :)
13:27 karolherbst: maybe I only expose those for now
13:28 karolherbst: inst_executed = inst_issued1 + 2*inst_issued2 and inst_issued2 is 0 without enabling dual issuing :)
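The counter relation above can be turned into a quick dual-issue ratio. A minimal sketch, assuming the two counter values have already been read out; `IssueCounters`, `instExecuted` and `pairedFraction` are illustrative names, not nouveau or CUDA profiler APIs.

```cpp
#include <cstdint>

// Counter values as read from the perf counters named above (illustrative struct).
struct IssueCounters {
    uint64_t inst_issued1; // issue slots that carried a single instruction
    uint64_t inst_issued2; // issue slots that carried a dual-issued pair
};

// inst_executed = inst_issued1 + 2 * inst_issued2
uint64_t instExecuted(const IssueCounters &c) {
    return c.inst_issued1 + 2 * c.inst_issued2;
}

// Share of executed instructions that were issued as part of a pair.
double pairedFraction(const IssueCounters &c) {
    const uint64_t executed = instExecuted(c);
    if (executed == 0)
        return 0.0;
    return static_cast<double>(2 * c.inst_issued2) / static_cast<double>(executed);
}
```

With 80 single-issue slots and 10 dual-issue slots, 100 instructions are executed and 20% of them went out as part of a pair.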
13:49 karolherbst: MOV and s2r can be dual issued, interesting
18:34 karolherbst: imirkin: if I have a BasicBlock or an Instruction being the last of a BB, how would I get the "binary" next instruction?
18:34 karolherbst: like imagine you have a simple if else and it gets optimized through predicated
18:35 imirkin: right
18:35 karolherbst: *predicates
18:35 imirkin: so
18:35 imirkin: this is where bb edges come in
18:35 imirkin: these are used to lay out the BB's in a linear sequence
18:35 imirkin: each instruction has a "serial" which indicates its sequence
18:36 imirkin: not sure if there's an easy way to look up the instruction at serial+1 though
18:36 karolherbst: yeah..but I need it for dual issuing :)
18:36 imirkin: the dual-issue stuff tends to be done at the very end
18:36 karolherbst: yep
18:36 imirkin: so like
18:37 imirkin: (gimme a min to find the thing)
18:37 karolherbst: I am sure we get this wrong for kepler as well btw
18:37 imirkin: i remember it's somewhere illogical
18:37 karolherbst: or... dunno
18:37 imirkin: unfortunately... where is that illogical place
18:38 karolherbst: the issue is if you have two instructions predicated and one has a not
18:38 karolherbst: so we have three bbs in play
18:38 karolherbst: and we usually compare each of those with the cfg next bbs first instruction
18:38 imirkin: right.
18:38 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp#n254
18:38 imirkin: so look at that prepareEmission thing for example
18:38 imirkin: this will, within a bb, try to reorder stuff
18:38 imirkin: there is no cross-bb stuff
18:39 imirkin: (basically on nv50, "short" instructions have to be in pairs. but not all instructions can be short instructions)
18:40 imirkin: i think hakzsam is the one who wrote all the scheduler stuff for maxwell
18:40 karolherbst: ahh, similar problem
18:40 karolherbst: yeah, I know
18:40 karolherbst: but there is no dual issue support yet ;)
18:40 imirkin: and calim for kepler :)
18:41 karolherbst: I am less concerned about kepler though as getting dual issuing wrong has no negative impact besides a small perf one
18:41 karolherbst: on maxwell... I already broke shaders :)
18:41 karolherbst: right now I have plot3d broken
18:42 karolherbst: imirkin: ehh.. this code cheats as well and just sets the enc size to 8 :/
18:42 karolherbst: for the last instruction in case it wasn't merged with the previous one
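A simplified model of the nv50 pairing constraint and the size-8 fallback mentioned above. It assumes a short encoding is 4 bytes, a long one 8 bytes, and that a short instruction can only stay short next to another short one; `Insn` and `assignSizes` are hypothetical and, unlike the real prepareEmission, this sketch does no reordering.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record: only tracks whether a
// 4-byte ("short") encoding exists for it.
struct Insn {
    bool canBeShort;
    unsigned encSize = 0; // filled in by assignSizes(): 4 or 8 bytes
};

// Greedily pair adjacent short-capable instructions within one basic block.
// A short-capable instruction without a short partner is widened to 8 bytes,
// mirroring the "just set the enc size to 8" fallback for the last one.
// Returns the total encoded size of the block in bytes.
unsigned assignSizes(std::vector<Insn> &bb) {
    unsigned total = 0;
    for (std::size_t i = 0; i < bb.size(); ++i) {
        if (bb[i].canBeShort && i + 1 < bb.size() && bb[i + 1].canBeShort) {
            bb[i].encSize = 4;
            bb[i + 1].encSize = 4;
            total += 8;
            ++i; // skip the partner we just consumed
        } else {
            bb[i].encSize = 8;
            total += 8;
        }
    }
    return total;
}
```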
18:42 imirkin: ;)
18:43 karolherbst: mhh
18:43 karolherbst: bbArray
18:44 karolherbst: I guess that's the only way
18:44 karolherbst: but from an instruction or bb you also don't get the index :/
18:44 karolherbst: or will that be equal to bb->id
18:45 karolherbst: probably not
18:45 karolherbst: but it's a bit too late anyway
18:46 karolherbst: "for (j = func->bbCount - 1; j >= 0 && !func->bbArray[j]->binSize; --j);" heh :D
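A minimal sketch of the lookup being discussed: finding the instruction that actually follows another one in the emitted binary, using the layout order of bbArray rather than CFG successors, and skipping empty blocks, similar in spirit to the quoted loop. The structs and `binaryNext` are hypothetical stand-ins for the nv50_ir classes; the block-to-index map is built once up front so each lookup stays cheap.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified IR: each basic block holds its instructions in
// emission order, and bbArray lists the blocks in final layout order.
struct Instruction { int serial; };
struct BasicBlock { std::vector<Instruction *> insns; };

struct Layout {
    std::vector<BasicBlock *> bbArray;
    std::unordered_map<const BasicBlock *, std::size_t> pos; // bb -> index in bbArray

    explicit Layout(std::vector<BasicBlock *> order) : bbArray(std::move(order)) {
        for (std::size_t j = 0; j < bbArray.size(); ++j)
            pos[bbArray[j]] = j;
    }

    // The instruction that follows insns[idx] of `bb` in the binary, or
    // nullptr if there is none. Within a block that is simply the next entry;
    // for the last instruction of a block we walk forward in layout order
    // (not along CFG edges) and skip blocks that emit nothing.
    Instruction *binaryNext(const BasicBlock *bb, std::size_t idx) const {
        if (idx + 1 < bb->insns.size())
            return bb->insns[idx + 1];
        for (std::size_t j = pos.at(bb) + 1; j < bbArray.size(); ++j)
            if (!bbArray[j]->insns.empty())
                return bbArray[j]->insns.front();
        return nullptr;
    }
};
```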
18:46 karolherbst: yeah well..
18:48 karolherbst: ohhhh
18:48 karolherbst: wait
18:49 karolherbst: we do sched calculation after the emitter...
18:49 karolherbst: yeah, that's sane
19:05 karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/0fb15df113bb99826ea31dbcbaa4c3e1a56ef944
19:06 karolherbst: fixes the plot3d rendering issue
19:06 karolherbst: dual issue rate went up a lot as well
19:06 karolherbst: :)
19:06 karolherbst: by roughly 20%
19:28 imirkin: karolherbst: you could do the bbArray calc once at the start
19:28 imirkin: karolherbst: but with cross-bb stuff it's tricky
19:28 imirkin: since there could be multiple in-edges
19:29 imirkin: i don't think it really makes sense to allow dual-issue in those cases
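One conservative way to encode that rule, as a sketch: only consider pairing across a block boundary when the layout-next block's sole predecessor is the current block. `Block` and `mayDualIssueAcross` are hypothetical; a real pass would take the predecessors from the CFG's in-edges.

```cpp
#include <vector>

// Hypothetical, simplified CFG node: only the predecessor list matters here.
struct Block {
    std::vector<const Block *> preds; // incoming CFG edges
};

// Only allow pairing the last instruction of `cur` with the first one of its
// layout successor `next` when `next` can be entered solely by falling
// through from `cur`. With any other in-edge, control may arrive at `next`
// from elsewhere, so no cross-block pair is formed.
bool mayDualIssueAcross(const Block &cur, const Block &next) {
    return next.preds.size() == 1 && next.preds.front() == &cur;
}
```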
19:34 karolherbst: well...
19:35 karolherbst: yeah, might be not worth the effort, yes
19:35 karolherbst: but you still have to compare against the right one
19:35 karolherbst: imirkin: anyway, perf results: https://gist.github.com/karolherbst/874bd7d75ad35175723a507303b3a36b
19:35 karolherbst: mhh.. just not high enough to actually care/risk it still
19:35 karolherbst: and juliafp64 is broken
19:35 karolherbst: and JuliaFP32 probably as well somehow
19:36 imirkin: did you ever get a chance to run CTS btw?
19:36 imirkin: now that all the tests should pass on their own, at least
19:36 karolherbst: on that or something I forgot?
19:36 karolherbst: ohh
19:36 karolherbst: that
19:36 karolherbst: not yet
19:36 imirkin: kepler and/or any other gens you feel like
19:36 karolherbst: and I don't have the proper GPUs here except pascal.. and a maxwell.. and a kepler one.. *sigh* :D
19:36 karolherbst: and now even a turing one
19:36 karolherbst: all laptops
19:36 imirkin: i like your definition of "don't have the proper GPUs"
19:36 imirkin: "only 4 generations, so weak!"
19:37 karolherbst: yeah.. but laptops, you know
19:37 karolherbst: I am convinced one of those has a broken firmware
19:37 karolherbst: anyway
19:37 karolherbst: I guess I will find some time for that in the future... I still want to be able to create OS images on the fly for testing
19:38 karolherbst: and then I just netboot those laptops and run tests on them
19:38 karolherbst: would be cool
19:39 karolherbst: it's also for the CI thing and then I would just wire up the CTS for now...
19:40 karolherbst: but it takes soo long to run
19:40 karolherbst: :/
19:40 karolherbst: imirkin: did you manage to use piglit for the CTS run?
19:40 karolherbst: never bothered with it yet
19:41 airlied: still can't get piglit to run a cts
19:41 airlied: cts crashes generating the test case list
19:42 karolherbst: ohh
19:42 karolherbst: 20% of instructions are issued as pairs in pixmark_julia_fp32
19:42 karolherbst: but I get the feeling it looks different
19:43 karolherbst: imirkin: https://i.imgur.com/K5syNDq.png
19:47 imirkin: karolherbst: never tried
19:47 imirkin: karolherbst: do a tracediff with glretrace
19:47 imirkin: it will compare frame-by-frame (or draw-by-draw)
19:48 karolherbst: good idea
19:48 karolherbst: but the dual issuing rate is higher than expected
19:49 karolherbst: it's not like on kepler where you can dual issue two alu instructions
19:50 karolherbst: and this is still without my reordering pass
22:27 karolherbst: ehhh.. I think isCommutationLegal is buggy
22:27 karolherbst: ohhh, no
22:27 karolherbst: I am just stupid
22:31 imirkin: it's pretty well tested :)
22:35 karolherbst: imirkin: actually wondering if this is correct or not as this doesn't really show much of an impact: https://github.com/karolherbst/mesa/commit/4126293bc476485e089b2488ca57e55b368ee15c
22:36 imirkin: that seems SUPER expensive, esp for a large BB though
22:36 imirkin: i'd feel better if you limited your search of C to some number of instructions
22:36 karolherbst: it's super cheap on kepler :p
22:36 karolherbst: but yeah
22:36 imirkin: like a BB with 500 instructions, and it'll take an hour
22:36 imirkin: or 1000
22:36 karolherbst: uhm.. maybe
22:36 imirkin: e.g. shadertoy.com :)
22:36 karolherbst: pixmark piano compile time is still fine
22:36 imirkin: i'm not talking about reasonable shaders
22:37 karolherbst: :D
22:37 karolherbst: fun fact, this pass has no impact on pixmark piano shaders on maxwell
22:37 imirkin: // we can't move fixed, flow instructions and instruction marked as join
22:37 imirkin: we also can't move across such instructions
22:38 imirkin: i.e. when you hit an op like that, stop looking
22:38 karolherbst: dunno
22:38 karolherbst: mhh
22:38 imirkin: e.g. BAR
22:38 imirkin: or EMIT
22:38 karolherbst: ahh, right
22:38 imirkin: if fixed means the op can't move
22:38 imirkin: but every other op can move around it
22:38 karolherbst: I think this pass is just broken anyway
22:38 imirkin: then fixed doesn't mean much
22:38 karolherbst: mhh, right
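Combining the two suggestions above (cap the search window, and stop at the first fixed, flow or join instruction instead of moving anything across it) gives a scan like the one below. This is a sketch under assumptions: `Op`, `findPairCandidate` and the `canPair` callback are hypothetical, and `canPair` stands in for whatever pairing and commutation checks the real pass does.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, simplified op record.
struct Op {
    bool fixed = false; // e.g. BAR, EMIT: must not move, nothing moves across
    bool flow = false;  // control-flow instruction
    bool join = false;  // instruction marked as join
};

// Look for an instruction after `start` that could be hoisted next to it.
// The scan is capped at `maxLookahead` instructions so huge blocks stay
// cheap, and it stops at the first fixed/flow/join op instead of looking
// past it. Returns the candidate's index, or -1 if none was found.
long findPairCandidate(const std::vector<Op> &bb, std::size_t start,
                       std::size_t maxLookahead,
                       const std::function<bool(std::size_t, std::size_t)> &canPair) {
    for (std::size_t k = 1; k <= maxLookahead && start + k < bb.size(); ++k) {
        const std::size_t c = start + k;
        const Op &op = bb[c];
        if (op.fixed || op.flow || op.join)
            return -1; // barrier: neither move it nor move anything across it
        if (canPair(start, c))
            return static_cast<long>(c);
    }
    return -1;
}
```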
22:52 karolherbst: ehhh.. uff
22:53 karolherbst: I forgot a i->prev
23:38 pendingchaos: karolherbst: in case you didn't know, someone else is also working on dual-issue: https://github.com/devkitPro/uam/commits/dual-issue-2
23:39 karolherbst: this is my stuff :p mainly
23:39 karolherbst: and the top commit was my idea :D
23:39 karolherbst: it even says so in the commit
23:45 pendingchaos: I think there's some dual-issue stuff that isn't in your v3 branch though (mainly on the second page)
23:47 linkmauve: pendingchaos, can you feed these compiled programs back to Mesa?
23:49 pendingchaos: I'm not sure what you mean?
23:50 pendingchaos: what compiled programs are you referring to?
23:51 karolherbst: pendingchaos: yeah.. but the maxas stuff is a bit weird and I was testing with it... I want to be a bit more conservative on the rules
23:52 linkmauve: pendingchaos, the .dksh your README is talking about.
23:52 linkmauve: Or how else can one use them?
23:52 linkmauve: Also is it only for homebrew or do you plan on making that usable with Nouveau on Linux (my main target)?
23:53 pendingchaos: it's a compiler for https://github.com/devkitPro/deko3d (not my readme btw)
23:54 linkmauve: Ah, meh.
23:55 karolherbst: I still don't understand why implementing a completely new proprietary API is an achievable goal, but.. oh well
23:55 karolherbst: at least the compiler is mostly the same
23:56 karolherbst: especially since they already got mesa to work with their homebrew stuff