18:51 haasn: is there a difference between subgroupAdd(1u) and subgroupBallotBitCount(subgroupBallot(true)) ?
18:51 haasn: other than requiring a different subgroup feature set
18:55 haasn: for context, I want to optimize some histogram measurement code with if (subgroupAllEqual(bin)) { if (subgroupElect()) { atomicAdd(hist[bin], gl_SubgroupSize); }} else { atomicAdd(hist[bin], 1u); /* slow path */ }
18:55 haasn: but this does not produce correct results
18:55 haasn: because sometimes not all invocations are active
19:05 pendingchaos: subgroupBallotBitCount(subgroupBallot(true)) is faster, because ACO does not optimize the former at the moment
19:05 haasn: interesting
19:05 haasn: seems like low-hanging fruit to optimize arithmetic ballots of cvals
19:06 haasn: that aside I'm a bit perplexed by why GLSL doesn't have an innate primitive for counting the number of active invocations
19:06 haasn: I guess because it's redundant with that
19:06 haasn: or because they expect most shader authors to be able to tell which shader invocations will be active and which not
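
A minimal sketch of the histogram fast path from the discussion above, with gl_SubgroupSize replaced by a count of the invocations that are actually active. The buffer names, bindings, workgroup size, 256-bin layout, and the `& 0xFFu` binning are all assumptions made for illustration; only the subgroup logic comes from the log. The early return is what leaves some invocations inactive in a partially filled subgroup.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic  : require  // subgroupElect
#extension GL_KHR_shader_subgroup_vote   : require  // subgroupAllEqual
#extension GL_KHR_shader_subgroup_ballot : require  // subgroupBallot, subgroupBallotBitCount

// Hypothetical layout: workgroup size, bindings and names are illustrative only.
layout(local_size_x = 64) in;
layout(std430, binding = 0) readonly buffer Input     { uint values[]; };
layout(std430, binding = 1)          buffer Histogram { uint hist[256]; };

void main() {
    uint idx = gl_GlobalInvocationID.x;
    // Out-of-range invocations exit here, so later subgroup operations
    // may run with fewer than gl_SubgroupSize active invocations.
    if (idx >= uint(values.length())) {
        return;
    }

    uint bin = values[idx] & 0xFFu;  // hypothetical binning into 256 bins

    if (subgroupAllEqual(bin)) {
        // Count only the invocations that reached this point.
        // subgroupAdd(1u) yields the same number, but needs
        // GL_KHR_shader_subgroup_arithmetic instead of the ballot extension.
        uint activeCount = subgroupBallotBitCount(subgroupBallot(true));
        if (subgroupElect()) {
            atomicAdd(hist[bin], activeCount);
        }
    } else {
        atomicAdd(hist[bin], 1u);  // slow path: one atomic per invocation
    }
}
```

The count is taken inside the subgroupAllEqual branch so that it is evaluated by the same set of invocations that the elected invocation represents.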