12:08 tomeu: phh: btw, I can confirm that the buffer pointed to by REG_DPU_RDMA_RDMA_BS_BASE_ADDR contains the biases
12:09 phh: Ah bs is biases, cool
15:42 tomeu: so, I got to that dreaded point in REing in which sending bit-identical buffers to the HW results in a hang
15:42 tomeu: phh: any ideas? https://paste.debian.net/1309916/
15:43 tomeu: this is what the kernel prints:
15:43 tomeu: [ 29.149629] RKNPU: job: ffff00010210ab00, wait_count: 1, continue wait: 0, commit elapse time: 6131398us, wait time: 6131402us, timeout: 6000000us... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/jKDPPDnOUoNYlOWwDDLvfDbd>)
15:44 tomeu: that causes a hang that requires rebooting the machine
15:44 phh: is it possible you omit some leading 0 0 0 after last instr?
15:44 phh: it's possible 0 0 0 instruction has some magical meaning to pc
15:45 tomeu: could be, indeed, that was causing trouble in the VSI NPU
15:45 tomeu: I haven't paid attention to padding yet, and my scripts trim the zeroes from the start and end of the buffers
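The padding question above can be checked mechanically: rather than trimming, a dump script could record how many zero words sit at each end of a buffer, so the replay can restore them. A minimal sketch, assuming 64-bit command words (an assumption, not something this log confirms); `zero_run_t` and `count_zero_padding` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: number of all-zero words at each end of a buffer. */
typedef struct {
    size_t leading;
    size_t trailing;
} zero_run_t;

/* Count zero words at the start and end of buf without modifying it.
 * For an all-zero buffer, every word is reported as leading and none as trailing. */
static zero_run_t count_zero_padding(const uint64_t *buf, size_t n_words)
{
    zero_run_t run = { 0, 0 };
    assert(buf != NULL || n_words == 0);

    /* Walk forward over the leading zero words. */
    while (run.leading < n_words && buf[run.leading] == 0)
        run.leading++;

    /* Walk backward over the trailing zero words, never crossing the leading run. */
    while (run.trailing < n_words - run.leading &&
           buf[n_words - 1 - run.trailing] == 0)
        run.trailing++;

    return run;
}
```

Logging both counts next to the trimmed payload would make it possible to tell whether a replayed buffer differs from the original only in its padding.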
15:55 tomeu: no luck so far
15:56 phh: why is the input all -?
15:57 tomeu: because the blob has zeroes in what I think is the input buffer
15:57 tomeu: REG_CNA_FEATURE_DATA_ADDR
15:57 tomeu: haven't figured out why yet
15:58 phh: you didn't miss any cpu2device memory barrier?
15:58 tomeu: not that I know, but I guess I should be dumping other ioctls besides submit
15:59 phh: well my test code does ret = ioctl(fd, DRM_IOCTL_RKNPU_MEM_SYNC, &m_sync);
15:59 phh: of all gems basically
15:59 phh: with RKNPU_MEM_SYNC_TO_DEVICE
15:59 tomeu: yeah, I do that on unmap, after writing
15:59 phh: ah you unmap. you don't close though?
15:59 tomeu: nope
15:59 phh: fwiw I don't unmap
16:00 tomeu: ah, I don't really unmap either, I just do that in the unmap callback in gallium
16:00 tomeu: I only sync there
16:00 tomeu: all my BOs have RKNPU_MEM_KERNEL_MAPPING, wonder if that could ever be a problem
16:00 tomeu: hmm, guess it could
16:01 phh: I don't really see how, but yeah, the only kernel_mapping i have is for task_obj_addr
16:01 tomeu: and that one isn't accessed by the HW, so I guess it could be it
16:08 phh: yeah it's reasonable it's the only one with kernel mapping, it's just that I fail to see why having a kernel mapping would break this
16:08 tomeu: haven't checked, but I would imagine their driver skipping something needed for HW-accessible buffers
16:09 tomeu: doesn't seem to be it though :(
16:20 phh: you're using three cores, fwiw my test code only used one
16:20 phh: so
16:20 phh: .subcore_task = {
16:20 phh: // Only use core 1, nothing for core 2/3
16:20 phh: {
16:20 phh: .task_start = 0,
16:20 phh: .task_number = 1,
16:20 phh: }, { 0, 0}, {0, 0},
16:20 phh: },
16:21 phh: and .core_mask = 1,
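To answer "how many cores does this submit actually target?", the per-core task ranges can be folded into a bitmask and compared with `core_mask`. A minimal sketch: the `sketch_*` types are hypothetical stand-ins that only mirror the field names quoted above, not the real driver's uapi structs, and the mapping of entry `i` to bit `i` is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CORES 3

/* Hypothetical mirror of the fields quoted in the chat, not the real uapi. */
struct sketch_subcore_task {
    uint32_t task_start;
    uint32_t task_number;
};

struct sketch_submit {
    uint32_t core_mask;
    struct sketch_subcore_task subcore_task[SKETCH_MAX_CORES];
};

/* Return a bitmask with bit i set when subcore_task[i] has at least one task.
 * Assumption: entry i corresponds to bit i of core_mask. */
static uint32_t sketch_cores_with_work(const struct sketch_submit *s)
{
    uint32_t mask = 0;
    assert(s != NULL);

    for (size_t i = 0; i < SKETCH_MAX_CORES; i++) {
        if (s->subcore_task[i].task_number > 0)
            mask |= UINT32_C(1) << i;
    }
    return mask;
}
```

If the returned mask differs from `core_mask`, the submit describes work for cores it does not enable (or the reverse), which is worth logging before the ioctl is issued.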
16:22 phh: I also have task_base_addr = instr_dma (so phys address of task)
16:25 phh: I don't understand how task_base_addr can be null, because driver does
16:25 phh: 274: REG_WRITE(args->task_base_addr, RKNPU_OFFSET_PC_DMA_BASE_ADDR);
17:07 tomeu: I don't think I'm using 3 cores, I checked in the kernel, and rknpu_job_subcore_commit_pc gets called only for core 0
17:07 tomeu: why do you say that task_base_addr is NULL?
17:09 tomeu: I think next I should add the rest of the ioctls to the trace logs, and probably also all register writes that the kernel driver does
17:10 phh: well in your pastebin ` .task_base_addr = 0x0,`
17:10 phh: which is null in my book
17:12 tomeu: ah, I think it's correct for that to be NULL
17:12 tomeu: For feature DMA, weight DMA, DPU DMA, PPU DMA, the address
17:12 tomeu: is set as offset address. Final address appear on AXI bus is base
17:12 tomeu: address + offset address.
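The quoted rule (bus address = base address + offset address) can be sketched as below. `sketch_bus_address` is a hypothetical name, and treating a sum that does not fit in 32 bits as invalid is an assumption; the log only says addresses are 32 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute base + offset as described in the quoted documentation.
 * Returns false when the sum does not fit in 32 bits (assumed to be invalid). */
static bool sketch_bus_address(uint32_t base, uint32_t offset, uint32_t *out)
{
    assert(out != NULL);

    uint64_t sum = (uint64_t)base + (uint64_t)offset;
    if (sum > UINT32_MAX)
        return false;

    *out = (uint32_t)sum;
    return true;
}
```

With a base of 0 the offset passes through unchanged, which is consistent with a submit that leaves `task_base_addr` at 0 and puts absolute DMA addresses in the per-task fields.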
17:12 tomeu: wonder why they add this kind of functionality
17:13 phh: dirt-cheap dirt-broken iommu?
17:13 tomeu: ah, makes sense
17:13 tomeu: this is supposed to work without an iommu as well
17:13 phh: not really -_-'
17:13 phh: especially since it's still only 32bits
17:14 tomeu: well, the whole memory management is bonkers
17:14 tomeu: guess the SW is written by HW engineers
17:14 phh: I don't understand why I have `tasks[0].regcmd_addr = instr_dma` `task_base_addr = instr_dma` and result was working iirc...
17:15 phh: that must be wrong, I agree that task_base_addr = instr_dma would just break everything according to the documentation
17:20 tomeu: tasks[0].regcmd_addr = instr_dma is ok, right?
17:20 tomeu: task_base_addr = instr_dma sounds wrong
17:20 phh: yeah I agree it sounds very wrong
17:20 tomeu: but maybe it is disabled by some bit
17:20 phh: maybe my memory is faulty and I never managed to get an IRQ back...
17:21 tomeu: but Jasbir got it working, he was comparing the contents of the output BO
17:24 phh: did you send RKNPU_ACT_RESET?
17:24 tomeu: yep
17:24 tomeu: https://gitlab.freedesktop.org/tomeu/mesa/-/tree/rocket?ref_type=heads
17:25 tomeu: have pushed here
17:25 phh: I don't understand .task_number = 3, your instructions only have one task
17:25 tomeu: I will add ioctls and register writes to the traces
17:25 tomeu: and see
17:25 tomeu: fortunately, there are far fewer ioctl calls and registers than for VSI... shouldn't be too bad