05:05 mareko: those u_default helpers are already fast paths
16:18 karolherbst: not sure that's true for all use cases. in CL those basically end up stalling queues, because you have to execute them on the CPU while everything else gets run on the GPU.
16:20 karolherbst: there is already the issue with get_query_result that causes performance issues like that
16:59 karolherbst: I have a CTS test where "find_next_divisor" (part of intel isl) is 35% of the CPU cycles