13:09 pendingchaos: imirkin: isn't 1 << 11 here too little space for NVC0_CB_AUX_BINDLESS_INFO: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nve4_compute.c#n614 ?
14:24 kwizart: hi there, I have a nouveau DATA_ERROR I cannot decode using nouveau on armhfp for jetson tk1 on a 4.18.7-300.fc29 kernel (and fedora 29 userspace)
14:24 kwizart: nouveau 57000000.gpu: gr: DATA_ERROR 0000009c [] ch 2 [0400155000 gnome-shell[1007]] subc 0 class a297 mthd 0d78 data 00000004
14:24 kwizart: everything works under Xorg
15:07 RSpliet: Is there a reason why the blob might not be emitting sin/cos instructions on Kepler?
15:14 RSpliet: Oh nm, that's the "sin()" vs. "native_sin()" OpenCL oddity
15:22 pendingchaos: imirkin: the (samples+2)>>2, samples > 2 thing sounds good
15:22 pendingchaos: I think I'll create a newer version with it
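
A quick tabulation of the two expressions mentioned above, as a sketch only. The log does not say what they are used for, so the helper name `split_samples` and the choice of sample counts 1, 2, 4, 8 are assumptions made for illustration. For those inputs the two values happen to sum to log2(samples), which would fit a per-axis split such as 1x1, 2x1, 2x2, 4x2, but that reading is a guess.

```python
def split_samples(samples: int) -> tuple[int, int]:
    """Evaluate the two expressions from the chat for one sample count.

    Hypothetical helper: the name and the pairing of the two values
    are illustrative assumptions, not taken from any driver source.
    """
    first = (samples + 2) >> 2   # the "(samples+2)>>2" expression
    second = int(samples > 2)    # the "samples > 2" expression, as 0 or 1
    return first, second


# Assumed inputs: power-of-two sample counts up to 8.
table = {s: split_samples(s) for s in (1, 2, 4, 8)}

for samples, (first, second) in table.items():
    print(samples, first, second, 1 << (first + second))
```

The last printed column reconstructs the sample count from the two values, which is what makes the per-axis interpretation plausible for these four inputs.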
15:53 karolherbst: RSpliet: ;)
16:05 RSpliet: karolherbst: guess I can't blame the parboil devs for falling for the same fallacy... but they do drop 60% perf on the floor there :-/
16:06 karolherbst: RSpliet: relaxed-math
16:07 karolherbst: RSpliet: -cl-fast-relaxed-math
16:07 RSpliet: It's hardly relaxed though...
16:07 karolherbst: == -cl-finite-math-only and -cl-unsafe-math-optimizations
16:07 karolherbst: -cl-finite-math-only : assume no NaN
16:07 karolherbst: -cl-unsafe-math-optimizations: all the other stuff :p
16:07 karolherbst: and breaks compliance
16:08 karolherbst: and makes mad == fma
16:08 RSpliet: The error bounds on native_sin(): for x in [-π,π], the maximum absolute error is 2^-21.41, and larger otherwise.
16:08 karolherbst: and stuff
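
To make the "mad == fma" remark concrete, here is a minimal sketch of why a fused multiply-add can differ from a separate multiply and add. It uses Python doubles rather than OpenCL floats, and it emulates the fused operation with exact rationals (`fractions.Fraction`) instead of calling a hardware instruction. The function names and input values are chosen for illustration and say nothing about what any particular compiler emits.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    # One rounding: evaluate a*b+c exactly, then round to a double once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs picked so that the exact product, 1 - 2**-60, is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = mul_add_unfused(a, b, c)
fused = mul_add_fused(a, b, c)
print(unfused, fused)
```

The unfused path loses the low-order bits of the product before the addition, so the small residual cancels to zero. The fused path keeps them.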
16:08 karolherbst: I am sure that -cl-fast-relaxed-math gives you a _huge_ perf improvement
16:08 RSpliet: Whereas regular sin() has 2ULP max error
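
The two bounds quoted above are in different units (absolute error versus ULPs), so a rough conversion helps when comparing them. The sketch below expresses the quoted absolute bound of 2^-21.41 in single-precision ULPs at a given result magnitude. It only does arithmetic on the quoted numbers and does not measure any hardware. The helper names `ulp_f32` and `abs_bound_in_ulps` are hypothetical.

```python
import struct

ABS_BOUND = 2.0 ** -21.41  # absolute error bound quoted in the chat


def ulp_f32(x: float) -> float:
    """Gap between |x| (rounded to binary32) and the next larger binary32."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here


def abs_bound_in_ulps(result_magnitude: float) -> float:
    """Express ABS_BOUND in binary32 ULPs at the given result magnitude."""
    return ABS_BOUND / ulp_f32(result_magnitude)


# Assumed sample magnitudes for the sine result.
for magnitude in (0.5, 2.0 ** -10):
    print(magnitude, abs_bound_in_ulps(magnitude))
```

Because an absolute bound does not shrink with the result, it corresponds to only a few ULPs for results near 0.5 but to thousands of ULPs for small results, whereas a 2 ULP bound scales with the result.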
16:10 RSpliet: I get a 60% perf increase by writing "native_sin()" manually in the kernels, no need for compiler flags :-P
16:11 RSpliet: Thing is, I'm not *that* interested in performance, I care about the instruction mix. And having 666 instructions in an 840 instruction program just to perform an ever-so-slightly-higher-precision sin/cos doesn't sound relevant to what I wish to extract from the benchmarks.
16:25 karolherbst: :p I see
16:25 RSpliet: That being said, unsafe-math-opts gives a pretty similar effect to manually using the native_sin()/native_cos() operations
16:26 RSpliet: Whoa what happened here...
16:26 RSpliet: 00000300: e0001de7 4003ffff B bra 0x300
16:26 RSpliet: ^ that location is supposed to contain sched codes :')
16:32 RSpliet: This instruction should never be reached, as the preceding instruction is an exit, but still... nasty!