08:09 tomeu: pH5: looks like the NPU in the s905d3 also uses huffman compression
08:10 tomeu: this is the first 12 bytes:
08:10 tomeu: 10 04 00 01 00 01 00 00 00 00 00 00
08:10 tomeu: does it look like that to you?
08:23 pH5: tomeu: that does look familiar. see struct etna_nn_header_v8 in !27769, that would be version=1, run_length_size=4, run_length_table=[0,1,0,1].
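A minimal sketch of decoding those first bytes the way pH5 reads them. The layout is an assumption inferred from the example: the version sits in the high nibble of byte 0 (so 0x10 gives version=1), byte 1 is run_length_size, and the next run_length_size bytes are the meaningful part of run_length_table. The low nibble of byte 0 is kept as an unnamed, hypothetical `low_flags` field. This is not the actual etna_nn_header_v8 definition from !27769.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NNHeaderPrefix:
    low_flags: int          # hypothetical: byte 0, bits 0-3, not interpreted here
    version: int            # assumed: byte 0, bits 4-7
    run_length_size: int    # assumed: byte 1
    run_length_table: List[int]  # assumed: next run_length_size bytes


def parse_header_prefix(data: bytes) -> NNHeaderPrefix:
    """Decode the leading bytes of a compressed weight buffer (assumed layout)."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    b0 = data[0]
    size = data[1]
    if len(data) < 2 + size:
        raise ValueError("buffer shorter than the run-length table")
    return NNHeaderPrefix(
        low_flags=b0 & 0x0F,
        version=(b0 >> 4) & 0x0F,
        run_length_size=size,
        run_length_table=list(data[2:2 + size]),
    )


# The first 12 bytes tomeu posted from the s905d3 NPU buffer.
first12 = bytes.fromhex("10 04 00 01 00 01 00 00 00 00 00 00")
print(parse_header_prefix(first12))
```

Decoding by hand with shifts and masks, rather than mirroring C bitfields, avoids depending on compiler-specific bitfield ordering.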
08:31 tomeu: nice
08:31 tomeu: I'm still not getting an interrupt back even with an identical cmdstream, so I guess I have the power sequence wrong
09:01 pH5: tomeu: which chip ids does this NPU report? I'd be interested to know if the corresponding hwdb entry has the NN_XYDP0 bit set.
10:59 tomeu: this is 0x8000,
10:59 tomeu: 0x7131, // ChipRevision
10:59 tomeu: so no
11:09 pH5: interesting. so it may not be quite as easy as NN_XYDP0 -> v8 -> huffman compression
11:10 pH5: OTOH, from the kernel flop reset it looks like it doesn't use the same separated out bias correction that the npu on i.mx8mp does, so maybe the encoding is different as well.
11:10 pH5: compare e.g.:
11:11 pH5: https://github.com/khadas/android_vendor_amlogic_common_npu/blob/khadas-vim4-r-64bit/hal/kernel/arch/gc_hal_kernel_hardware_func_flop_reset_config.h#L708-L715
11:11 pH5: and
11:11 pH5: https://github.com/nxp-imx/linux-imx/blob/lf-6.1.36-2.1.0/drivers/mxc/gpu-viv/hal/kernel/arch/gc_hal_kernel_hardware_func_flop_reset_config.h#L231-L242
11:16 tomeu: guess it makes sense that the compressed buffer's format isn't very stable, as that is the part of the NPU that keeps being improved
11:20 tomeu: unfortunately, I need to figure out why the NPU doesn't seem to be fully powered up before I get to this :/
11:28 tomeu: and that is fixed now :)
13:09 pH5: I think that NPU might be VSIMULATOR_CONFIG=VIPNANOQI_PID0XA1 in the TIM-VX emulator. If so, the encoding is clearly different.
13:11 tomeu: that sounds like some variant more similar to the NPU in the A311D (with 8 NN cores)
13:11 tomeu: this should be something like vippico_v3
13:12 tomeu: with customerID 0x99 instead of 0xa1
13:15 tomeu: # 0. Set correct NPU target name for your device, you can learned this from your soc vendor
13:15 tomeu: export VSIMULATOR_CONFIG=PID_0x99
13:15 tomeu: so maybe just that? PID_0x99
13:16 pH5: VIPPICO_V3_PID0X99
13:16 tomeu: that worked?
13:18 pH5: seems to, running now (I'm just producing all 256 [x 0 0 0] 2x2 kernels and listing the bitstreams)
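A sketch of the weight sets pH5 describes: every 2x2 uint8 kernel whose first tap takes each value 0..255 while the other three taps stay 0. Presumably this isolates how a single value is encoded in the bitstream. The row-major flattening and the space-separated printout are assumptions. How conv2x2.cc actually takes weights on its command line isn't shown in the log.

```python
from typing import List


def single_tap_kernels() -> List[List[int]]:
    """All 256 2x2 uint8 kernels of the form [x, 0, 0, 0], flattened row-major."""
    return [[x, 0, 0, 0] for x in range(256)]


kernels = single_tap_kernels()
# Print the first few as space-separated values (hypothetical format).
for k in kernels[:3]:
    print(" ".join(str(v) for v in k))
```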
13:21 pH5: yup, same results though. http://paste.debian.net/1315063 (nano-si+) vs http://paste.debian.net/1315064 (pico-v3)
13:22 tomeu: by same results you mean that the simulator generated the same compressed BO for both? or that they are different in the same way we already have seen?
13:24 pH5: The simulator seems to produce the same minimal bitstreams for VIPNANOQI_PID0XA1 and VIPPICO_V3_PID0X99.
13:26 tomeu: hmm, ok
13:27 tomeu: I will see what I get here in a bit. what are the parameters for the conv2d that you pasted?
13:31 pH5: tomeu: that's output from https://gitlab.freedesktop.org/pH5/ask-tim/-/blob/main/src/conv2x2.cc with the weights overwritten via cmdline, so no scaling, no bias, zero point 0.
13:32 pH5: stored the binary with VIV_VX_ENABLE_SAVE_NETWORK_BINARY=1, grepped for the bitstream in scripts/nb_dump.py's output.
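pH5 grepped nb_dump.py's output. A swapped-in alternative, sketched here, is to search the raw saved binary for a known byte prefix, assuming that prefix is known in advance (for example the header bytes tomeu posted). The file name below is a placeholder, not the name the SDK actually writes.

```python
from typing import List


def find_all(haystack: bytes, needle: bytes) -> List[int]:
    """Return every offset at which needle occurs in haystack (overlaps allowed)."""
    if not needle:
        raise ValueError("empty needle")
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


# Hypothetical usage: "network.nb" stands in for the file written when
# VIV_VX_ENABLE_SAVE_NETWORK_BINARY=1 is set.
# with open("network.nb", "rb") as f:
#     blob = f.read()
# print(find_all(blob, bytes.fromhex("10 04 00 01 00 01")))
```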
15:05 tomeu: pH5: btw, it may be useful to you to run the simulator with VIV_VX_ENABLE_HUFFMAN=0
15:05 tomeu: here the HW hangs with it though
15:06 tomeu: ah no, sorry, I was thinking of disabling compression, but that's another env var
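A sketch of scripting simulator runs with the environment variables mentioned in this log: VSIMULATOR_CONFIG, VIV_VX_ENABLE_SAVE_NETWORK_BINARY=1 and, optionally, VIV_VX_ENABLE_HUFFMAN=0. The helper names and the test binary path are hypothetical. Unrequested variables are dropped from the copied environment so each run depends only on the arguments.

```python
import os
import subprocess
from typing import Dict, List


def simulator_env(config: str, save_binary: bool = True,
                  disable_huffman: bool = False) -> Dict[str, str]:
    """Build the environment for one simulator run."""
    env = dict(os.environ)
    env["VSIMULATOR_CONFIG"] = config
    if save_binary:
        env["VIV_VX_ENABLE_SAVE_NETWORK_BINARY"] = "1"
    else:
        env.pop("VIV_VX_ENABLE_SAVE_NETWORK_BINARY", None)
    if disable_huffman:
        env["VIV_VX_ENABLE_HUFFMAN"] = "0"
    else:
        env.pop("VIV_VX_ENABLE_HUFFMAN", None)
    return env


def run_case(cmd: List[str], config: str, disable_huffman: bool = False) -> int:
    """Run one test binary under the simulator and return its exit status."""
    env = simulator_env(config, disable_huffman=disable_huffman)
    return subprocess.run(cmd, env=env).returncode


# Hypothetical usage; "./conv2x2" is a placeholder for however the ask-tim
# conv2x2 test is built and invoked:
# run_case(["./conv2x2"], "VIPPICO_V3_PID0X99", disable_huffman=True)
```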
15:11 tomeu: pH5: have you already looked at research papers about compressing weights with Huffman? maybe some share some implementation details that match this HW
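For reference when comparing against papers, here is a textbook Huffman code built over raw weight byte values. It is a generic sketch and not the NPU's scheme, which per the header above also carries a run-length table whose role isn't established here.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable, List


def huffman_code(symbols: Iterable[int]) -> Dict[int, str]:
    """Build a (non-canonical) Huffman code mapping each symbol to a bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        (only,) = freq
        return {only: "0"}
    # Heap entries: (count, unique tiebreak, {symbol: code suffix so far}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing 0 and 1 respectively.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encoded_bits(symbols: List[int], code: Dict[int, str]) -> int:
    """Total number of bits needed to encode symbols with code."""
    return sum(len(code[s]) for s in symbols)


weights = [0] * 12 + [1] * 3 + [255]
code = huffman_code(weights)
print(code, encoded_bits(weights, code), "bits vs", 8 * len(weights))
```

With mostly-zero weights, like the [x 0 0 0] kernels, the zero symbol gets the shortest code.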
15:33 pH5: tomeu: a few look vaguely similar, but I haven't found a close match yet.
16:03 tomeu: pH5: have you considered already that there may be some requantization going on?
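One hypothetical form such requantization could take is affine uint8 requantization. Dequantize with the input scale and zero point, then requantize with new ones, rounding half up and clamping to 0..255. The rounding mode and the per-tensor parameters are assumptions. With pH5's setup (no scaling, zero point 0) and identity output parameters, this maps every weight to itself.

```python
import math
from typing import List


def requantize(q: List[int], scale: float, zero_point: int,
               new_scale: float, new_zero_point: int) -> List[int]:
    """Map uint8 values from (scale, zero_point) to (new_scale, new_zero_point)."""
    out = []
    for v in q:
        real = scale * (v - zero_point)                       # dequantize
        r = math.floor(real / new_scale + 0.5) + new_zero_point  # round half up
        out.append(min(255, max(0, r)))                        # clamp to uint8
    return out


print(requantize([0, 1, 2, 128, 255], 1.0, 0, 2.0, 0))
```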