12:33zmike: eric_engestrom: when is the next branchpoint? I didn't see it in the docs
13:50simon-perretta-img: The demote_to_helper_invocation spec states that stores should be suppressed, but doesn't specify how that applies to atomics; is it safe enough to assume it follows the gl_HelperInvocation spec (atomic op return values are undefined), or should atomic ops essentially turn into memory loads?
14:05glehmann: undefined helper return values should be fine
14:06glehmann: amd drivers implement it like that at least
14:09alyssa: simon-perretta-img: undefined, but I don't have a spec citation
14:09simon-perretta-img: Cool cool, thanks both!
14:10alyssa: (undefined value, not undefined behaviour, although that's actually an interesting question.)
14:10alyssa: anyway uh we have a nir pass for this
14:10alyssa: predicate_helper_writes or something like that
14:10alyssa: should work for demote too if you do the right magic sequence
14:10alyssa: or maybe I'm thinking of terminate
14:10alyssa: idk, just copy whatever i did in my powervr compiler, this works there
14:13simon-perretta-img: To be honest I was considering handling it in the backend instead, replacing the store addresses in demoted/helper invocations and null buffer descriptors with a dummy buffer, just saves on a bit of control flow
14:16alyssa: it's allowed
14:16alyssa: on AGX, assuming the r/e cycle count table is accurate, the if-endif isn't any more expensive than the pile of bcsels
14:17alyssa: so I didn't bother
14:17simon-perretta-img: Fair
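The two ways of suppressing helper-invocation stores discussed above (guarding the store with an if/endif versus redirecting the address to a dummy buffer) can be sketched in plain C. This only models the semantics on a single lane. It is not NIR or any backend's ISA, and the names `dummy_sink`, `store_guarded`, and `store_redirected` are hypothetical, made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scratch location that absorbs stores from helper lanes.
 * A real driver would point at a dummy buffer instead. */
static uint32_t dummy_sink;

/* Variant 1: guard the store with control flow (the if/endif form). */
void store_guarded(uint32_t *addr, uint32_t value, bool is_helper)
{
    if (!is_helper)
        *addr = value;
}

/* Variant 2: always issue the store, but select the address first so
 * that helper lanes write to the dummy sink (the bcsel-style form). */
void store_redirected(uint32_t *addr, uint32_t value, bool is_helper)
{
    uint32_t *target = is_helper ? &dummy_sink : addr;
    *target = value;
}
```

Both variants leave the real destination untouched for helper lanes. Which one is cheaper depends on the hardware's relative cost of branches and selects.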
14:17alyssa: Faith and I have talked about adding some sort of predicated load/store stuff to NIR
14:17alyssa: I'm not sure what that would look like yet
14:18alyssa: but it seems like "@store_if(value, address, condition 1-bit bool)" would be useful to a bunch of backends
14:18alyssa: and would let nir_opt_peephole_select be a lot more effective
14:18alyssa: the trap here is that IDK the right level of generality
14:18alyssa: do we want to be able to predicate any intrinsic? probably not
14:19glehmann: I wouldn't worry about it during bring up, stores/atomics in FS are not that common
14:19alyssa: do we want to add 50 new intrinsics that are just predicated variants of existing ones? probably not
14:19alyssa: do we want to add a new source to 50 existing intrinsics? probably not
14:19alyssa: I think NAK has a backend version of peephole select using hw predication (?)
14:19simon-perretta-img: Makes sense yeah, that'd be useful for us too but I see the generality issue
14:20alyssa: but doing it in NIR pre-RA would bring benefits for scheduling (bigger basic blocks & less control flow)
14:21alyssa: but like... atomics are uncommon & heavyweight anyway
14:21alyssa: so if we just did load_global/store_global that would cover probably 80% of compute shader use cases in practice
14:22alyssa: (and then replacing a pile of bcsel with hw predication is probably doable.)
14:22alyssa: (doable in backend post-RA i mean)
14:23simon-perretta-img: That sounds like it'd be a good first step, yeah; beyond that, the other sorts of ops I can think of that could be useful to use predication on are basically already in that form (demote/terminate_if, etc.)
14:24alyssa: right
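As a rough illustration of why a predicated store would help if-conversion, the sketch below models the proposed "@store_if(value, address, condition)" as a plain C helper and shows a branchy function next to its flattened equivalent. This is an assumption-laden model, not NIR: `store_if`, `kernel_branchy`, and `kernel_flat` are hypothetical names, and nothing here reflects what nir_opt_peephole_select actually does today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a predicated store: writes only when the
 * 1-bit condition is true. Hardware with predication would execute
 * this without a branch; the C 'if' only models the semantics. */
void store_if(uint32_t value, uint32_t *address, bool condition)
{
    if (condition)
        *address = value;
}

/* Before: the store sits inside control flow, which splits the
 * function into several basic blocks. */
void kernel_branchy(uint32_t *out, uint32_t x)
{
    if (x > 10) {
        uint32_t v = x * 2;
        out[0] = v;
    }
}

/* After: the side-effect-free computation is hoisted and the store
 * becomes predicated, leaving one straight-line block to schedule. */
void kernel_flat(uint32_t *out, uint32_t x)
{
    uint32_t v = x * 2;
    store_if(v, &out[0], x > 10);
}
```

The flattened form is only valid because the hoisted multiply has no side effects. The store itself is what needed a predicated variant.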
14:26simon-perretta-img: I feel like going even more general with predication could make sense as part of e.g. something like a big control-flow overhaul to introduce constructs beyond the single while {} loop form perhaps
14:26simon-perretta-img: But I suspect that's not in the cards anytime soon :P
14:31alyssa: I don't want to think about the implications of that
14:34simon-perretta-img: Understandable haha
14:46glehmann: loops in NIR have to be simple, and most backends can optimize them well enough at the moment to whatever the hw provides
14:49simon-perretta-img: Oh no for sure; I was deliberately choosing a relatively unlikely scenario by using that as the example
14:51simon-perretta-img: I have seen a paper before that modelled predicated execution using a psi instruction, but don't think it really covered ops like stores where no defs are being produced
14:58alyssa: glehmann: switch statements are where NIR is being held back
14:59glehmann: sure, but I don't think they would gain AMD much
14:59glehmann: unless the condition is uniform, then we can use a jump table
14:59glehmann: I guess NAK could always use a jump table...
15:00glehmann: but I think there's also low hanging fruit for switches in vtn
21:22mlankhorst: airlied: ping?
21:55airlied: mlankhorst: pong
21:55airlied: I've pulled your xe patch into the branch here fyi, will post it on a resend