13:10karolherbst: mhh, maybe I'll take some time today and look into dual issuing (especially in the maxwell ISA)...
13:10karolherbst: would be cool to have it
13:26karolherbst: nice, the perf counters for dual issuing seem to be equal to gm200.. nice :)
13:27karolherbst: nice nice :)
13:27karolherbst: maybe I only expose those for now
13:28karolherbst: inst_executed = inst_issued1 + 2*inst_issued2, and inst_issued2 is 0 without enabling dual issuing :)
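A minimal sketch of the counter relationship above: if inst_executed = inst_issued1 + 2*inst_issued2, then the share of executed instructions that went out as half of a dual-issued pair is 2*inst_issued2 / inst_executed. Only the counter names come from the log; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two perf counters named in the log.
struct IssueCounters {
    uint64_t inst_issued1; // issue slots that carried a single instruction
    uint64_t inst_issued2; // issue slots that carried a dual-issued pair
};

// Assumes the relationship stated above:
// inst_executed = inst_issued1 + 2 * inst_issued2.
uint64_t instExecuted(const IssueCounters &c) {
    return c.inst_issued1 + 2 * c.inst_issued2;
}

// Fraction of executed instructions that were issued as half of a pair.
// With dual issue disabled, inst_issued2 is 0 and this returns 0.
double pairedFraction(const IssueCounters &c) {
    const uint64_t executed = instExecuted(c);
    assert(executed > 0 && "no instructions executed");
    return static_cast<double>(2 * c.inst_issued2) / executed;
}
```

For example, 80 single issues plus 10 paired issues give 100 executed instructions, 20 of which were issued in pairs.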
13:49karolherbst: MOV and s2r can be dual issued, interesting
18:34karolherbst: imirkin: if I have a BasicBlock or an Instruction being the last of a BB, how would I get the "binary" next instruction?
18:34karolherbst: like imagine you have a simple if else and it gets optimized through predicated
18:35imirkin: this is where bb edges come in
18:35imirkin: these are used to lay out the BB's in a linear sequence
18:35imirkin: each instruction has a "serial" which indicates its sequence
18:36imirkin: not sure if there's an easy way to look up the instruction at serial+1 though
18:36karolherbst: yeah.. but I need it for dual issuing :)
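One way to get the "binary next" instruction is to build a serial-indexed table once, after the BBs have been laid out linearly. This assumes serials are unique and contiguous in the final layout, which the log does not confirm. Insn, Block and SerialIndex are hypothetical simplified types, not the real nv50_ir classes.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins; not the real nv50_ir types.
struct Insn {
    int serial; // position in the final linear layout (assumed contiguous)
};

struct Block {
    std::vector<Insn *> insns; // instructions in emission order
};

// Built once after layout, so "the instruction at serial + 1" becomes a
// single hash lookup instead of a walk over BBs and edges.
class SerialIndex {
public:
    explicit SerialIndex(const std::vector<Block *> &layout) {
        for (Block *bb : layout)
            for (Insn *i : bb->insns) {
                assert(bySerial_.count(i->serial) == 0 && "serials must be unique");
                bySerial_[i->serial] = i;
            }
    }

    // Next instruction in binary order, possibly in another BB; nullptr at the end.
    Insn *next(const Insn *i) const {
        auto it = bySerial_.find(i->serial + 1);
        return it == bySerial_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<int, Insn *> bySerial_;
};
```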
18:36imirkin: the dual-issue stuff tends to be done at the very end
18:36imirkin: so like
18:37imirkin: (gimme a min to find the thing)
18:37karolherbst: I am sure we get this wrong for kepler as well btw
18:37imirkin: i remember it's somewhere illogical
18:37karolherbst: or... dunno
18:37imirkin: unfortunately... where is that illogical place
18:38karolherbst: the issue is if you have two instructions predicated and one has a not
18:38karolherbst: so we have three bbs in play
18:38karolherbst: and we usually compare each of those with the first instruction of the next BB in the CFG
18:38imirkin: so look at that prepareEmission thing for example
18:38imirkin: this will, within a bb, try to reorder stuff
18:38imirkin: there is no cross-bb stuff
18:39imirkin: (basically on nv50, "short" instructions have to be in pairs. but not all instructions can be short instructions)
18:40imirkin: i think hakzsam is the one who wrote all the scheduler stuff for maxwell
18:40karolherbst: ahh, similar problem
18:40karolherbst: yeah, I know
18:40karolherbst: but there is no dual issue support yet ;)
18:40imirkin: and calim for kepler :)
18:41karolherbst: I am less concerned about kepler though, as getting dual issuing wrong has no negative impact besides a small perf one
18:41karolherbst: on maxwell... I already broke shaders :)
18:41karolherbst: right now I have plot3d broken
18:42karolherbst: imirkin: ehh.. this code cheats as well and just sets the enc size to 8 :/
18:42karolherbst: for the last instruction in case it wasn't merged with the previous one
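A rough sketch of the nv50 pairing rule and the fallback described above, under a hypothetical model where each instruction is reduced to its encoding size: 4 bytes ("short") or 8 bytes. Unlike prepareEmission, it does not try to reorder anything; it only widens a short instruction to the 8-byte form when there is no short neighbour to pair it with, which is the same idea as setting the enc size to 8 for an unmerged last instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: encSize[i] is 4 (short) or 8 (long) bytes.
// Short encodings must come in pairs; any short instruction that cannot be
// paired with the one right after it is widened to the long encoding.
void fixShortPairs(std::vector<int> &encSize) {
    std::size_t i = 0;
    while (i < encSize.size()) {
        assert(encSize[i] == 4 || encSize[i] == 8);
        if (encSize[i] == 8) {
            ++i;                        // long instruction: nothing to pair
        } else if (i + 1 < encSize.size() && encSize[i + 1] == 4) {
            i += 2;                     // two shorts form a valid pair
        } else {
            encSize[i] = 8;             // unpaired short: fall back to long form
            ++i;
        }
    }
}
```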
18:44karolherbst: I guess that's the only way
18:44karolherbst: but from an instruction or bb you also don't get the index :/
18:44karolherbst: or will that be equal to bb->id
18:45karolherbst: probably not
18:45karolherbst: but it's a bit too late anyway
18:46karolherbst: "for (j = func->bbCount - 1; j >= 0 && !func->bbArray[j]->binSize; --j);" heh :D
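The quoted one-liner, spelled out as a standalone function: it walks the block array from the end and stops at the last block that emitted any code. BB is a hypothetical stand-in that carries only the binSize field the loop reads.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in: only the field the quoted loop reads.
struct BB {
    unsigned binSize; // bytes emitted for this block; 0 if nothing was emitted
};

// Index of the last block with a non-zero binary size, or -1 if none.
int lastNonEmptyBlock(const std::vector<BB> &bbArray) {
    assert(bbArray.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    int j = static_cast<int>(bbArray.size()) - 1;
    while (j >= 0 && bbArray[j].binSize == 0)
        --j;
    return j;
}
```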
18:46karolherbst: yeah well..
18:49karolherbst: we do sched calculation after the emitter...
18:49karolherbst: yeah, that's sane
19:05karolherbst: imirkin: https://github.com/karolherbst/mesa/commit/0fb15df113bb99826ea31dbcbaa4c3e1a56ef944
19:06karolherbst: fixes the plot3d rendering issue
19:06karolherbst: dual issue rate went up a lot as well
19:06karolherbst: by roughly 20%
19:28imirkin: karolherbst: you could do the bbArray calc once at the start
19:28imirkin: karolherbst: but with cross-bb stuff it's tricky
19:28imirkin: since there could be multiple in-edges
19:29imirkin: i don't think it really makes sense to allow dual-issue in those cases
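A sketch of computing cross-BB pairing partners once up front, as suggested, and refusing to pair across a boundary whenever the next code-emitting block (or any empty block skipped on the way) has more than one in-edge. Blk and crossBlockPartners are hypothetical. The sketch also ignores whether a block actually falls through into its successor, which a real pass would have to check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified block; not the real nv50_ir type.
struct Blk {
    std::vector<int> insns; // placeholder instructions; only emptiness matters here
    int inEdges;            // number of CFG predecessors
};

// For each block in layout order, the index of the block holding the next
// emitted instruction, or -1 when pairing across the boundary is not allowed.
std::vector<int> crossBlockPartners(const std::vector<Blk> &layout) {
    const int n = static_cast<int>(layout.size());
    std::vector<int> partner(n, -1);
    for (int b = 0; b < n; ++b) {
        assert(layout[b].inEdges >= 0);
        int j = b + 1;
        bool ok = true;
        // Skip blocks that emit nothing, but give up as soon as a block on
        // the way can be entered from more than one place.
        while (j < n) {
            if (layout[j].inEdges > 1) {
                ok = false;
                break;
            }
            if (!layout[j].insns.empty())
                break;
            ++j;
        }
        if (ok && j < n)
            partner[b] = j;
    }
    return partner;
}
```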
19:35karolherbst: yeah, might not be worth the effort, yes
19:35karolherbst: but you still have to compare against the right one
19:35karolherbst: imirkin: anyway, perf results: https://gist.github.com/karolherbst/874bd7d75ad35175723a507303b3a36b
19:35karolherbst: mhh.. just not high enough to actually care/risk it still
19:35karolherbst: and juliafp64 is broken
19:35karolherbst: and JuliaFP32 probably as well somehow
19:36imirkin: did you ever get a chance to run CTS btw?
19:36imirkin: now that all the tests should pass on their own, at least
19:36karolherbst: on that or something I forgot?
19:36karolherbst: not yet
19:36imirkin: kepler and/or any other gens you feel like
19:36karolherbst: and I don't have the proper GPUs here except pascal.. and a maxwell.. and a kepler one.. *sigh* :D
19:36karolherbst: and now even a turing one
19:36karolherbst: all laptops
19:36imirkin: i like your definition of "don't have the proper GPUs"
19:36imirkin: "only 4 generations, so weak!"
19:37karolherbst: yeah.. but laptops, you know
19:37karolherbst: I am convinced one of those has a broken firmware
19:37karolherbst: I guess I will find some time for that in the future... I still want to be able to create OS images on the fly for testing
19:38karolherbst: and then I just netboot those laptops and run tests on them
19:38karolherbst: would be cool
19:39karolherbst: it's also for the CI thing, and then I would just wire up the CTS for now...
19:40karolherbst: but it takes soo long to run
19:40karolherbst: imirkin: did you manage to use piglit for the CTS run?
19:40karolherbst: never bothered with it yet
19:41airlied: still can't get piglit to run a cts
19:41airlied: cts crashes generating the test case list
19:42karolherbst: 20% of instructions are issued as pairs in pixmark_julia_fp32
19:42karolherbst: but I get the feeling it looks different
19:43karolherbst: imirkin: https://i.imgur.com/K5syNDq.png
19:47imirkin: karolherbst: never tried
19:47imirkin: karolherbst: do a tracediff with glretrace
19:47imirkin: it will compare frame-by-frame (or draw-by-draw)
19:48karolherbst: good idea
19:48karolherbst: but the dual issuing rate is higher than expected
19:49karolherbst: it's not like on kepler where you can dual issue two alu instructions
19:50karolherbst: and this is still without my reordering pass
22:27karolherbst: ehhh.. I think isCommutationLegal is buggy
22:27karolherbst: ohhh, no
22:27karolherbst: I am just stupid
22:31imirkin: it's pretty well tested :)
22:35karolherbst: imirkin: actually wondering if this is correct or not as this doesn't really show much of an impact: https://github.com/karolherbst/mesa/commit/4126293bc476485e089b2488ca57e55b368ee15c
22:36imirkin: that seems SUPER expensive, esp for a large BB though
22:36imirkin: i'd feel better if you limited your search of C to some number of instructions
22:36karolherbst: it's super cheap on kepler :p
22:36karolherbst: but yeah
22:36imirkin: like a BB with 500 instructions, and it'll take an hour
22:36imirkin: or 1000
22:36karolherbst: uhm.. maybe
22:36imirkin: e.g. shadertoy.com :)
22:36karolherbst: pixmark piano compile time is still fine
22:36imirkin: i'm not talking about reasonable shaders
22:37karolherbst: fun fact, this pass has no impact on pixmark piano shaders on maxwell
22:37imirkin: // we can't move fixed, flow instructions and instruction marked as join
22:37imirkin: we also can't move across such instructions
22:38imirkin: i.e. when you hit an op like that, stop looking
22:38imirkin: e.g. BAR
22:38imirkin: or EMIT
22:38karolherbst: ahh, right
22:38imirkin: if fixed means the op can't move
22:38imirkin: but every other op can move around it
22:38karolherbst: I think this pass is just broken anyway
22:38imirkin: then fixed doesn't mean much
22:38karolherbst: mhh, right
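A sketch of the bounded search suggested above: inspect at most `window` instructions and stop at the first one that nothing may be moved across (fixed, flow, or join instructions, e.g. BAR or EMIT). Ins, findCandidate, the single "barrier" flag and the forward scan direction are assumptions for illustration; the commit under discussion may search differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instruction flags.
struct Ins {
    bool barrier;  // fixed / flow / join: cannot move, and nothing moves across it
    bool pairable; // could pair with the instruction at `pos`
};

// Scan forward from `pos`, looking at no more than `window` instructions.
// Returns the index of the first pairable candidate, or -1 if a barrier or
// the window limit is reached first. Cost is O(window) per query instead of
// O(size of the BB).
int findCandidate(const std::vector<Ins> &bb, std::size_t pos, std::size_t window) {
    assert(pos < bb.size());
    for (std::size_t k = pos + 1; k < bb.size() && k <= pos + window; ++k) {
        if (bb[k].barrier)
            return -1; // can't look past fixed/flow/join instructions
        if (bb[k].pairable)
            return static_cast<int>(k);
    }
    return -1;
}
```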
22:52karolherbst: ehhh.. uff
22:53karolherbst: I forgot a i->prev
23:38pendingchaos: karolherbst: in case you didn't know, someone else is also working on dual-issue: https://github.com/devkitPro/uam/commits/dual-issue-2
23:39karolherbst: this is my stuff :p mainly
23:39karolherbst: and the top commit was my idea :D
23:39karolherbst: it even says so in the commit
23:45pendingchaos: I think there's some dual-issue stuff that isn't in your v3 branch though (mainly on the second page)
23:47linkmauve: pendingchaos, can you feed these compiled programs back to Mesa?
23:49pendingchaos: I'm not sure what you mean?
23:50pendingchaos: what compiled programs are you referring to?
23:51karolherbst: pendingchaos: yeah.. but the maxas stuff is a bit weird and I was testing with it... I want to be a bit more conservative on the rules
23:52linkmauve: pendingchaos, the .dksh your README is talking about.
23:52linkmauve: Or how else can one use them?
23:52linkmauve: Also is it only for homebrew or do you plan on making that usable with Nouveau on Linux (my main target)?
23:53pendingchaos: it's a compiler for https://github.com/devkitPro/deko3d (not my readme btw)
23:54linkmauve: Ah, meh.
23:55karolherbst: I still don't understand why implementing a completely new proprietary API is an achievable goal, but.. oh well
23:55karolherbst: at least the compiler is mostly the same
23:56karolherbst: especially since they already got mesa to work with their homebrew stuff