05:24 endrift: Hmm, is the modulo operator (on integers) just really slow on Maxwell shaders? I have some shader code that conditionally does x -= x % y to grid-align coordinates to an arbitrary (read: npot) grid, but when it's enabled it's super slow on TX1 using Nouveau
05:24 endrift: it's fine everywhere else
05:25 imirkin: endrift: nvidia GPUs don't have integer division
05:26 endrift: that would do it
05:26 HdkR: None of them have integer division, you're relying on bigger GPUs brute forcing it
05:26 endrift: I was just wondering if that was the case
05:27 endrift: so is it possible to make this faster on my end? Or do I just have to suffer?
05:27 endrift: casting to float, dividing, then back to int seems...less than ideal
05:27 HdkR: If you can do BFEs it'll be faster
05:27 endrift: I might be able to munge it into some weird reciprocal division I guess
05:27 endrift: BFE?
05:27 HdkR: bitfield extract, and masking, etc
05:27 imirkin: if you do x % <immediate value> then it'll be faster
05:28 endrift: it's configurable via uniform unfortunately
05:28 imirkin: maybe blob inlines it, dunno
05:28 imirkin: oh, also i recommend making these values unsigned
05:28 imirkin: modulo with signed quantities is extra-annoying
05:29 imirkin: i.e. ensure that both x and y have unsigned types
05:29 endrift: ohhh fair
05:31 imirkin: (like wtf is -10 % -2? who knows. takes extra ops to figure it out)
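[note] A rough sketch in C of why signed modulo costs extra ops when the hardware only offers an unsigned path. It assumes C-style truncating semantics, where the result takes the sign of the left operand; GLSL itself leaves `%` undefined when either operand is negative. The helper name `srem_via_unsigned` is hypothetical, and this is not a description of what Nouveau actually emits.

```c
#include <assert.h>

/* Hypothetical helper: signed remainder built from an unsigned one.
   Matches C's truncating %, so the result takes the sign of x.
   y must be nonzero. */
static int srem_via_unsigned(int x, int y) {
    /* Take absolute values in unsigned arithmetic to avoid overflow. */
    unsigned ux = x < 0 ? 0u - (unsigned)x : (unsigned)x;
    unsigned uy = y < 0 ? 0u - (unsigned)y : (unsigned)y;

    /* The only "real" work: an unsigned remainder. */
    unsigned r = ux % uy;

    /* Fix up the sign afterwards; r < |y|, so the cast cannot overflow. */
    return x < 0 ? -(int)r : (int)r;
}
```

With unsigned operands the two absolute-value steps and the final sign fix-up all disappear, which is the saving imirkin is pointing at.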
06:02 endrift: given the limited range of both sides I wonder if there are any weird tricks I can do
06:03 endrift: lhs is always going to be between 0 and 255, rhs is always going to be between 2 and 8, both inclusive
06:04 HdkR: Use the shader5 extension to get uint8_t types? Maybe mesa will optimize it for you :P
06:31 endrift: ok, I did it the way the compiler does integer division by constant :P
06:31 endrift: multiply by a shifted up reciprocal, then shift down
06:31 endrift: since the number of divisors was small I can just precompute it and stuff it in a table
06:32 endrift: hopefully I can get away with using 20 bits of integer in the process though
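[note] A minimal sketch in C of the table-of-reciprocals approach endrift describes, for the stated ranges (lhs 0..255, rhs 2..8). The shift of 12, the table contents, and the names `RECIP`, `div_small`, `grid_align`, and `verify_all` are all assumptions for illustration, not endrift's actual shader code. The shift of 12 is chosen here so that the largest intermediate product, 255 * 2048, stays below 2^20.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point shift: reciprocals are scaled by 2^12. */
enum { RECIP_SHIFT = 12 };

/* RECIP[y] = ceil(2^12 / y) for y in 2..8; entries 0 and 1 are unused. */
static const uint32_t RECIP[9] = { 0, 0, 2048, 1366, 1024, 820, 683, 586, 512 };

/* Quotient x / y without a divide.
   Only valid for 0 <= x <= 255 and 2 <= y <= 8. */
static uint32_t div_small(uint32_t x, uint32_t y) {
    return (x * RECIP[y]) >> RECIP_SHIFT;
}

/* Equivalent of x -= x % y: snap x down to the nearest multiple of y. */
static uint32_t grid_align(uint32_t x, uint32_t y) {
    return div_small(x, y) * y;
}

/* Exhaustively compare against the built-in operators over the whole
   input range, and check the product stays within 20 bits. */
static void verify_all(void) {
    for (uint32_t y = 2; y <= 8; ++y) {
        for (uint32_t x = 0; x <= 255; ++x) {
            assert(div_small(x, y) == x / y);
            assert(grid_align(x, y) == x - x % y);
            assert(x * RECIP[y] < (1u << 20));
        }
    }
}
```

The rounding-up reciprocal is exact here because the error term (at most 7) times the largest x (255) is well under 2^12; wider input ranges would need a larger shift and a fresh check.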