09:11hallo1: Actually, I have tried NPU-GPU cooperation using the custom operator API in the closed-source SDK, but it is very slow. I think that is mainly because the tensor data gets converted from NCHWc to NCHW and back again by the CPU on every custom op call
09:25hallo1: In fact, the GPU can process NCHWc data too (if you know how to program it), so this conversion should be optional, but RK does not give an option ¯\_ (ツ)_/¯
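
For readers unfamiliar with the layouts, here is a minimal sketch of the round trip being described. It is not the SDK's code. It assumes NCHWc means the channels are split into blocks of `block` lanes, stored as `[N, ceil(C/block), H, W, block]` with zero padding in unused lanes; the function names and the block size are hypothetical. The point is that every element is touched once in each direction, so doing this on the CPU around every custom op call costs two full passes over the tensor.

```python
# Sketch only: layout and names are assumptions, not the vendor SDK's API.

def nchw_to_nchwc(src, n, c, h, w, block):
    """Pack a flat NCHW list into a flat NCHWc list (channel-blocked)."""
    c_blocks = (c + block - 1) // block          # channel blocks, rounded up
    dst = [0] * (n * c_blocks * h * w * block)   # unused lanes stay zero
    for ni in range(n):
        for ci in range(c):
            cb, lane = divmod(ci, block)         # block index, lane in block
            for hi in range(h):
                for wi in range(w):
                    s = ((ni * c + ci) * h + hi) * w + wi
                    d = (((ni * c_blocks + cb) * h + hi) * w + wi) * block + lane
                    dst[d] = src[s]
    return dst


def nchwc_to_nchw(src, n, c, h, w, block):
    """Unpack a flat NCHWc list back into a flat NCHW list, dropping padding."""
    c_blocks = (c + block - 1) // block
    dst = [0] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            cb, lane = divmod(ci, block)
            for hi in range(h):
                for wi in range(w):
                    s = (((ni * c_blocks + cb) * h + hi) * w + wi) * block + lane
                    d = ((ni * c + ci) * h + hi) * w + wi
                    dst[d] = src[s]
    return dst


if __name__ == "__main__":
    # Tiny example: 1 image, 3 channels, 2x2 pixels, hypothetical block of 2.
    nchw = list(range(12))
    packed = nchw_to_nchwc(nchw, 1, 3, 2, 2, 2)
    restored = nchwc_to_nchw(packed, 1, 3, 2, 2, 2)
    print(len(packed), restored == nchw)
```

A GPU kernel that indexes the blocked layout directly (using the same offset arithmetic as `d` in `nchw_to_nchwc`) would not need either pass, which is why the conversion could in principle be optional.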