09:15 tomeu: pH5: so I ended up looking at using more of the internal SRAM, which should benefit the i.MX8M as well
09:15 tomeu: we were using it only for weights, and I'm looking at using it as well for caching the input tensor
09:15 tomeu: preliminary results look quite promising :)
09:15 tomeu: numbers in a bit
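As an illustration of the idea tomeu describes (a sketch only, not the actual etnaviv/teflon code; the struct, function, and policy below are hypothetical), splitting a fixed on-chip SRAM budget between the weight cache and an input-tensor cache could look roughly like this:

    /* Hypothetical sketch: partition the NPU's on-chip SRAM between weights
     * and the input tensor. All names and the allocation policy are made up
     * for illustration; only the general idea comes from the discussion. */
    #include <stdint.h>

    struct sram_plan {
       uint32_t weight_cache_size;  /* bytes of SRAM reserved for weights */
       uint32_t input_cache_size;   /* bytes of SRAM left for the input tensor */
    };

    static struct sram_plan
    plan_sram_usage(uint32_t sram_size, uint32_t weights_size, uint32_t input_size)
    {
       struct sram_plan plan = {0};

       /* Weights are reused across the whole inference, so cache them first. */
       plan.weight_cache_size = weights_size < sram_size ? weights_size : sram_size;

       /* Whatever remains can cache (part of) the input tensor. */
       uint32_t remaining = sram_size - plan.weight_cache_size;
       plan.input_cache_size = input_size < remaining ? input_size : remaining;

       return plan;
    }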
09:27 pH5: tomeu: awesome. while you're at it, there is still a place with 512 KiB of SRAM hard-coded in create_nn_config() (where the kernel_pattern values are determined).
09:28 tomeu: ah, will fix it
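A minimal sketch of the kind of fix pH5 is pointing at, assuming the per-device SRAM size is available somewhere (the device struct and helper below are made up for illustration; only create_nn_config() and kernel_pattern come from the log):

    /* Hypothetical sketch: create_nn_config() would compare against the
     * device's actual on-chip SRAM size instead of a hard-coded 512 KiB when
     * deciding which kernel_pattern to use. Names are illustrative only. */
    #include <stdbool.h>
    #include <stdint.h>

    #define DEFAULT_SRAM_SIZE (512 * 1024)   /* the old hard-coded assumption */

    struct npu_device {
       uint32_t sram_size;   /* on-chip SRAM size reported for this SoC, in bytes */
    };

    static bool
    kernel_fits_in_sram(const struct npu_device *dev, uint32_t kernel_size)
    {
       uint32_t sram_size = dev->sram_size ? dev->sram_size : DEFAULT_SRAM_SIZE;
       return kernel_size <= sram_size;
    }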
09:28 tomeu: looks like the image cache helps mobilenetv1 to go from 9.9ms to 6.6ms, and ssdlite mobiledet from 27ms to 24.8ms
09:28 pH5: regarding your comment about the zero-point yesterday, it appears that indeed coefficients are stored with the zero point subtracted, so zero was actually zero for all examples I've thrown at the blob so far.
09:29 tomeu: hmm, that could be because you are using a model with INT8 weights, instead of UINT8
09:30 tomeu: the HW seems to be only UINT8, so the blob converts to unsigned
09:30 tomeu: oh, but that would only be the case if the zero point were added, so ignore this
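A worked example of pH5's observation above (the interpretation is an assumption, not verified against the blob): if coefficients are stored with the zero point subtracted, a quantized weight that equals the zero point ends up as a literal 0 in the stream.

    /* Assumed behaviour: stored coefficient = quantized value - zero point,
     * so a weight representing real 0.0 (i.e. equal to the zero point) is
     * written out as 0. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
       uint8_t zero_point = 13;   /* asymmetric UINT8 zero point */
       uint8_t q_weight   = 13;   /* weight that represents the real value 0.0 */

       int16_t stored = (int16_t)q_weight - zero_point;
       printf("stored coefficient: %d\n", stored);   /* prints 0 */
       return 0;
    }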
09:35 pH5: switching from uint8 to int8 weights for the same values and zero point (13 in the example I've just tried) only changes the coef_zero_point field (from 0x0d to 0x8d).
09:35 tomeu: ok, so I guess the Huffman compression masks that difference
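For reference, the 0x0d to 0x8d change pH5 reports is exactly a 128 offset, which is the standard re-bias between uint8 and int8 quantization of the same real values (reading it that way is an assumption): a uint8 zero point of 13 corresponds to an int8 zero point of 13 - 128 = -115, whose raw byte value is 0x8d.

    /* Worked example of the coef_zero_point observation: re-biasing the same
     * zero point between uint8 and int8 representations differs by 128. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
       uint8_t zp_u8 = 0x0d;                    /* 13 */
       int8_t  zp_s8 = (int8_t)(zp_u8 - 128);   /* -115 */

       printf("int8 zero point: %d (raw byte 0x%02x)\n",
              zp_s8, (uint8_t)zp_s8);           /* -115 (0x8d) */
       return 0;
    }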