14:56f_: tomeu: Seems like the bridge died last night, so you lost operator status. Just gave you back +o.
14:56f_: Consider enabling AUTOOP too.
15:42tomeu: cool, thanks, will give it a look
16:01 hallo1: Currently I'm pretty confused about why all other activation functions are significantly slower than ReLU on RKNPU2.
16:02 hallo1: For example, in my benchmark of Yolov5s, the model is 1.8x faster if every Sigmoid in the model is replaced by ReLU
16:02 phh: because only ReLU is done in hardware
16:03 phh: for other activations it goes to a fallback implementation
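A minimal sketch of the Sigmoid-to-ReLU swap hallo1 describes, assuming a PyTorch YOLOv5s model whose nn.SiLU/nn.Sigmoid activation modules are replaced in place before export; the torch.hub model name is illustrative, and the network would need fine-tuning afterwards since changing the activations changes its outputs:

```python
import torch
import torch.nn as nn

def replace_activations(module: nn.Module) -> None:
    """Recursively swap SiLU/Sigmoid activations for ReLU, in place."""
    for name, child in module.named_children():
        if isinstance(child, (nn.SiLU, nn.Sigmoid)):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_activations(child)

# Assumption: pulling yolov5s from torch.hub; any nn.Module works the same way.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
replace_activations(model)
# The swapped model would then be fine-tuned and re-exported (e.g. to ONNX)
# so that only the hardware-supported ReLU remains in the graph.
```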
16:03 hallo1: If I understand correctly, the NPU has a LUT for other functions
16:03 hallo1: It shouldn't be that slow
16:06 phh: I haven't seen a LUT in the TRM, but I could have missed it. Either way, as I understand it, for the moment tomeu only wired things for MobileNet8
16:06 hallo1: Yeah, this was tested under the closed-source SDK
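For reference, a rough sketch of how such a per-layer benchmark might be run with Rockchip's RKNN-Toolkit2, the closed-source SDK mentioned above; the calls follow the toolkit's published examples, but the file names, target platform, and exact signatures should be treated as assumptions:

```python
from rknn.api import RKNN  # Rockchip RKNN-Toolkit2 Python API

rknn = RKNN()
rknn.config(target_platform='rk3588')        # assumption: RK3588-class NPU
rknn.load_onnx(model='yolov5s_relu.onnx')    # hypothetical exported model
rknn.build(do_quantization=False)
rknn.export_rknn('yolov5s_relu.rknn')
rknn.init_runtime(target='rk3588')           # run on the attached board
rknn.eval_perf()                             # prints per-layer timings
rknn.release()
```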
16:08 phh: ah ok. that section of the TRM is rather hard to read, but as I understand it only RELU (or rather "RELUX"...) is supported. Technically MUL, ADD and SUB are also listed, but hum, yeah
16:10 hallo1: There are some LUT configuration registers in the DPU part, as seen in the TRM.
16:12 phh: (+ abs + neg + floor + ceil)
16:13 phh: ah yeah you're right, I've always skimmed over those considering how hard they are to understand
16:20 phh: anyway, a LUT requires a memory access, which can't be done in just one clock cycle, so the performance loss is not too surprising
16:20 phh: with the exception of SRAM. It's possible that co-operation between the GPU and NPU could be useful, but even though Android's NNAPI allows that, it doesn't look like tflite does
16:22 f_: I hope they didn't quit because of the +o I set..