00:41imirkin: HdkR: glBlendBarrier(). overdraw without _coherent is undefined.
00:52HdkR: imirkin: Yea, I found that out. Didn't realize it was undefined without coherent
00:54imirkin: allows the desktop GPUs to keep up :)
01:15HdkR: imirkin: Even though Intel and AMD can do FBFetch without much issue? ;)
01:16imirkin: intel on gen9+, and i don't think amd can
01:16HdkR: I thought Vega introduced some sort of FBFetch thing?
01:17imirkin: i think pascal has something too
01:17HdkR: I may also be making crap up though
01:17imirkin: check where _coherent is supported
01:17imirkin: should map nicely
01:17airlied: if crap comes with a patch then it's better :-P
01:19imirkin: huh. surprising. looks like maxwell+ supports _coherent
01:19imirkin: probably some rast option
01:19imirkin: (including GM10x)
01:19HdkR: Correct. It supports coherent
01:19imirkin: coz i don't think there's any op to do it
01:20imirkin: maybe the shared memory hack? dunno
01:20imirkin: would have to trace
01:27HdkR: imirkin: What's the shared memory hack?
12:24pendingchaos: ah, codegen has a sort of fallthrough for basic blocks like with switch statements in C/C++
12:24pendingchaos: which is used to implement branching
13:15imirkin: pendingchaos: yeah. there are missing bra's all over the place
13:27imirkin: HdkR: just guessing as to its potential existence, but like that GLES ext that lets you store stuff per frag which becomes accessible to the next frag shader for that position
13:27imirkin: which one could implement using shared memory? dunno
13:27HdkR: ...How did you know I was about to say something
13:28imirkin: coz i'm standing over your shoulder
13:28imirkin: knock knock
13:28imirkin: follow the white rabbit
13:29HdkR: Could that actually be implemented with shared memory?
13:31HdkR: Trying to remember how shared memory works. Haven't really messed with it
13:31HdkR: Something like up to 96KB/warp?
13:32imirkin: depends on model
13:33imirkin: i think yeah on maxwell+
13:33HdkR: Although I think there is an issue where the state of the shared memory isn't kept across invocations?
13:33imirkin: but afaik it's not accessible
13:33imirkin: but perhaps there's some way? dunno
13:33HdkR: Also you can't use LDS/STS outside of compute, so you would need to LD/ST in to the shared window
13:33imirkin: no shared window either
13:33imirkin: at least ... wasn't
13:34imirkin: but again, my knowledge mostly centers on fermi/kepler
13:36HdkR: At that point you may as well stuff it into a device-local-only SSBO though
13:38HdkR: Scariest thing is the random breakage that could occur due to not knowing whether shared data sticks around across invocations, and I'm assuming nothing prevents some other work from getting in between the invocations and corrupting it
13:53pendingchaos: karolherbst: I think I'll override things like setSrc() and swapSources() in PhiInstruction to print a warning
14:05pendingchaos: imirkin: wouldn't it be more robust to not have the basic block fallthrough until codegen?
14:05pendingchaos: and codegen orders basic blocks and takes advantage of fallthrough to remove unneeded jumps
14:05pendingchaos: the current approach feels a little fragile
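[Editor's sketch of the alternative pendingchaos describes above: every block carries an explicit successor, and the emitter decides at layout time which jumps are redundant. This is a minimal illustration, not nouveau codegen; `Block`, `emitWithFallthrough`, and the textual "bra"/"exit" output are hypothetical names invented for the example, and each block is assumed to have a single unconditional successor.]

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified block: control always transfers to `next`
// (an index into the layout vector), or the program exits if next < 0.
struct Block {
    std::string name;
    int next;
};

// Emit a listing for blocks in layout order. A "bra" is only emitted when
// the successor is NOT the block laid out immediately afterwards; otherwise
// control simply falls through and the jump is dropped.
std::vector<std::string> emitWithFallthrough(const std::vector<Block>& layout) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < layout.size(); ++i) {
        const Block& b = layout[i];
        assert(b.next < static_cast<int>(layout.size()));
        out.push_back(b.name + ":");
        if (b.next < 0) {
            out.push_back("  exit");
        } else if (static_cast<std::size_t>(b.next) != i + 1) {
            out.push_back("  bra " + layout[b.next].name);
        }
        // else: successor is the next block in the layout, so fall through
    }
    return out;
}
```

[With this shape, the IR never relies on block order; fallthrough exists only as an emission-time optimization, which is the robustness argument being made in the chat.]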
14:15HdkR: LLVM kills fallthrough branches at the MachineIR level ;)
14:20pendingchaos: not the other way around (adding fallthrough during or right before code generation)?
14:24HdkR: It's before code generation, but it is in the MachineInstr form. Can't remember if it is pre or post RA. I think Pre
14:51RSpliet: HdkR: is that the Cuda definition of "shared memory" (aka. OpenCL "Local memory")? If so, 96KiB/warp sounds like an awful lot. Sure that's not per work-group?
14:51RSpliet: Thread block... whatever
14:53RSpliet: Yeah, defo per thread block/work-group ;-)
15:23HdkR: RSpliet: Ah, per CTA then
15:23HdkR: Guess that makes sense
15:30RSpliet: HdkR: I've only looked at them from the hardware-limitations side, in which case Kepler has up to 48KiB available per SM and can only schedule one work-group at a time. Maxwell or Pascal started allowing multiple TBs to be scheduled in parallel on a single SM (in an attempt to reduce stalling time), but added more local mem to compensate.
15:31HdkR: I see
16:06pendingchaos: imirkin: what does the condition for GM107's exit do?
16:35pendingchaos: seems, at least for exit, the condition code should actually be a predicate
16:41pendingchaos: seems I was feeding nvdisasm wrong actually
18:35karolherbst: pendingchaos: yeah well, the thing is, if you "disable" getSrc for PhiInstructions, then there is no point in having it inherit Instruction in the first place
19:00pendingchaos: karolherbst: It's simpler than having a BaseInstruction and a PhiInstruction and NormalInstruction (both inheriting from BaseInstruction) though
19:00pendingchaos: though I think phi instructions are stored specially
19:01karolherbst: pendingchaos: well, but that would be the better design ;)
19:01karolherbst: pendingchaos: kind of, they are the first instructions inside a BB
19:01karolherbst: we _can_ be more explicit about it and simply say, they are no normal instructions
19:01karolherbst: because of those things
19:02karolherbst: the thing with overriding setSrc is that it would make it illegal to use PhiInstruction in places where you use Instruction
19:02karolherbst: and this is a big headache
19:02karolherbst: normally you would say: okay, we can just use it here, but now you have to be sure it isn't a PhiInstruction
19:05pendingchaos: I don't think there is much code calling setSrc on random instructions?
19:05pendingchaos: if it were to assert or print a warning, afaik biggest thing to do would be changing ValueDef::replace() to handle PhiInstructions
19:07karolherbst: pendingchaos: a lot of code in peephole, no?
19:08pendingchaos: changing a phi instruction's source before this change had to be done carefully already though
19:08karolherbst: pendingchaos: with my nir stuff I have 400 setSrc calls
19:08karolherbst: 351 without it
19:09karolherbst: pendingchaos: yeah, practically this is fine. the class design isn't really that good as it is, but it works alright overall
19:09karolherbst: I just don't want to have more hacky things added to it
19:10pendingchaos: I think the setSrc calls in peephole are fine
19:11pendingchaos: it seems to only call it when it knows what type the instruction is, or when iterating over a basic block's instructions starting with getEntry()
19:12karolherbst: that's what I meant with "practically this is fine"
19:13pendingchaos: I think I'll create an updated patch with setSrc() overridden to print a warning
19:14karolherbst: I think I would still rather want to have some kind of class hierarchy to better deal with that
19:15karolherbst: I doubt there would be that many changes actually needed
19:15karolherbst: well, only if something iterates over all instructions inside a BB
19:15karolherbst: but we can already handle it inside the loop
19:16karolherbst: and won't have to change code all over the place
19:17karolherbst: in the end it is a question of maintainability and how painful it would be to write code in the future or make the code readable for people not involved, etc...
19:17karolherbst: and those hidden dependencies are always painful to get at first
19:59pendingchaos: karolherbst: how about performing a sanity check for phi instructions in GCRA::doCoalesce?
19:59karolherbst: depends on what we want to verify
19:59Lyude: RSpliet: poke, "16:15 <freenode#nouveau> <RSpliet> nyef: lyude is the person to ask further about this" <-- what's up?
20:00karolherbst: pendingchaos: what kind of illegal state do you want to detect?
20:00nyef: Lyude: I was noticing that the blob tends to reload the context-switch microcode fairly often.
20:00karolherbst: there isn't really much which doesn't go, except sources having no BB attached
20:00pendingchaos: that basicBlocks.size() == srcCount() (slightly simplified) and that each source is defined in its basic block (or in one that dominates it)
20:00karolherbst: and maybe all BBs should be different
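[Editor's sketch of the checks just listed: one incoming block per source, every source has a defining block, that block is (or dominates) the matching incoming block, and all incoming blocks are distinct. This is an assumption-laden illustration, not the GCRA::doCoalesce code; `Phi`, `PhiSource`, `phiIsSane`, and the immediate-dominator array representation are all hypothetical.]

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// idom[b] is the immediate dominator of block b; the entry block is its
// own immediate dominator. Returns true if block a dominates block b
// (a block is taken to dominate itself).
bool dominates(const std::vector<int>& idom, int a, int b) {
    for (;;) {
        assert(b >= 0 && static_cast<std::size_t>(b) < idom.size());
        if (a == b) return true;
        if (idom[b] == b) return false;  // reached the entry block
        b = idom[b];
    }
}

struct PhiSource {
    int defBlock;  // block that defines the value, or -1 if none attached
};

struct Phi {
    std::vector<PhiSource> srcs;
    std::vector<int> incoming;  // incoming block for each source, same order
};

bool phiIsSane(const Phi& phi, const std::vector<int>& idom) {
    if (phi.incoming.size() != phi.srcs.size()) return false;
    std::set<int> seen;
    for (std::size_t i = 0; i < phi.srcs.size(); ++i) {
        const int def = phi.srcs[i].defBlock;
        if (def < 0) return false;                               // no BB attached
        if (!seen.insert(phi.incoming[i]).second) return false;  // duplicate BB
        if (!dominates(idom, def, phi.incoming[i])) return false;
    }
    return true;
}
```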
20:00Lyude: nyef: oh? what generation
20:01nyef: I have an NVAF / MCP89.
20:01Lyude: i'm assuming this is related to powergating as well?
20:01karolherbst: pendingchaos: well, I would rather want to have the API written in a way, we don't have to do that in the first place though
20:01nyef: Ah... probably not in my case, but it could be for all I know?
20:01Lyude: I'm not sure :P, tbh I don't work with the firmware/µcode stuff at all
20:02Lyude: i remember on some of the tesla generations they had to continuously reupload firmware to work around some issues from elpg
20:33pendingchaos: karolherbst: I think I'll see if having a parent class with read-only sources for both Instruction and PhiInstruction could be reasonably unintrusive
20:37karolherbst: pendingchaos: yeah, I think that would lead to a cleaner code design
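[Editor's sketch of the hierarchy being agreed on here: a parent class exposing sources read-only, with mutation offered only by the subclasses in the form each can keep consistent. The class names follow the ones floated earlier in the chat (BaseInstruction, NormalInstruction, PhiInstruction), but the members are hypothetical and much simpler than the real nv50_ir.h types.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Value { int id; };
struct BasicBlock { int id; };

// Read-only view of sources shared by every instruction kind.
class BaseInstruction {
public:
    virtual ~BaseInstruction() = default;
    std::size_t srcCount() const { return srcs_.size(); }
    const Value* getSrc(std::size_t i) const { return srcs_.at(i); }
protected:
    std::vector<const Value*> srcs_;
};

// Ordinary instructions may rewrite any source slot freely.
class NormalInstruction : public BaseInstruction {
public:
    void setSrc(std::size_t i, const Value* v) {
        if (i >= srcs_.size()) srcs_.resize(i + 1, nullptr);
        srcs_[i] = v;
    }
};

// Phis have no setSrc(): a source can only be added together with its
// incoming block, so the two lists cannot drift apart.
class PhiInstruction : public BaseInstruction {
public:
    void addIncoming(const BasicBlock* bb, const Value* v) {
        assert(bb != nullptr && v != nullptr);
        srcs_.push_back(v);
        blocks_.push_back(bb);
    }
    const BasicBlock* getIncomingBlock(std::size_t i) const { return blocks_.at(i); }
private:
    std::vector<const BasicBlock*> blocks_;
};
```

[Code that only reads sources can take a `const BaseInstruction&`; code that calls setSrc() has to name NormalInstruction, so passing a phi by accident becomes a compile error rather than a runtime warning. The cost is the churn in type declarations and casts noted below.]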
23:00pendingchaos: karolherbst: it's turning out to be pretty big so far: 470 insertions(+), 375 deletions(-)
23:00pendingchaos: 238 of the changes are mostly code being moved around in nv50_ir.h
23:00pendingchaos: a lot of it is also changing type declarations or adding casts