21:09dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1305278276545286144/NvidiaGSPLogs.zip?ex=673272a0&is=67312120&hm=4583a5f5591ae915f29483a1b21b88f6849266668bc00c6db87a13e5114abff4&
21:09dwlsalmeida[d]: skeggsb9778: Hi there 🙂
21:09dwlsalmeida[d]: There you go, all four logs.
21:09dwlsalmeida[d]: I've included the logs for 4 different videos btw: one of them decodes OK (`NvidiaLog_I`), `I_P` and `AUD_2frames` decode but frame 1 is identical to frame 0, and `AUD_full` crashes
21:10airlied[d]: zhiwanglinux: hey so when I applied that patch I get 0x1f error code back from the ADD_VGPU_TYPE rpc
21:11dwlsalmeida[d]: dwlsalmeida[d]: I am on 6.12-rc6 btw
21:11dwlsalmeida[d]: gfxstrand[d]: does NVK work on Ada Lovelace?
21:11gfxstrand[d]: Yup. That's what's in my laptop
21:11dwlsalmeida[d]: I will need a new card for the AV1 stuff, thinking about a 4060
21:11dwlsalmeida[d]: do you think that's good?
21:12gfxstrand[d]: Should be fine
21:12dwlsalmeida[d]: ack ty
21:27zhiwanglinux[d]: airlied[d]: Replied in the PM.
21:37skeggsb9778[d]: dwlsalmeida[d]: all the logs (even NvidiaLog_I) show nvdec0 errors
21:37skeggsb9778[d]: are you sending method 0x44 somewhere, for some reason?
21:38dwlsalmeida[d]: skeggsb9778[d]: ah yes, they all print this to dmesg btw:
21:38dwlsalmeida[d]: [ 339.940285] nouveau 0000:01:00.0: gsp: rc engn:00000013 chid:32 type:68 scope:1 part:233
21:38dwlsalmeida[d]: [ 339.940295] nouveau 0000:01:00.0: fifo:6606c307:0004:0020:[gst-launch-1.0[3133]] errored - disabling channel
21:38dwlsalmeida[d]: [ 339.940300] nouveau 0000:01:00.0: gst-launch-1.0[3133]: channel 32 killed!
21:38dwlsalmeida[d]: 0x44..
21:40dwlsalmeida[d]: while I run this again, what is 0x44? Is it something `WAIT_FOR_IDLE` related?
21:43skeggsb9778[d]: i just see a message about class 0xc4b0 mthd 0x44 not being valid
21:43skeggsb9778[d]: right before the crash
21:43skeggsb9778[d]: no idea if it's related 🙂
21:43dwlsalmeida[d]: ok, I can debug this
21:52dwlsalmeida[d]: yeah, I don't think there's anything `0x44` in `c5b0.h`
21:52dwlsalmeida[d]: subchannel 4 was previously assigned to `cls_copy`, I wonder if I need to reassign it once I am done decoding a frame?
21:53dwlsalmeida[d]: maybe somebody else is trying to submit method `0x00000044` and it's being sent to the video decoder by mistake
21:54skeggsb9778[d]: why are you using subchannel 4?
21:59dwlsalmeida[d]: IIRC that is what the blob was using
22:03dwlsalmeida[d]: We are using a tracer written by avhe[d]
22:03dwlsalmeida[d]: Says subchannel 4:
22:03dwlsalmeida[d]: Forwarding memory access for inst "mov dword ptr [rcx + 0x8c], edx" (14 -> 0x7c1f9f03208c)
22:03dwlsalmeida[d]: Ioctl on 36 (/dev/nvidiactl): req 0xc030462b (type 70 'F', dir 3, nr 0x2b, size 0x30)
22:03dwlsalmeida[d]: Alloc: class 0xc4b0 (NVC4B0_VIDEO_DECODER), root 0xc1d00141, parent 0x80000012, handle 0x80000030, flags 0, alloc params: (nil), size 0
22:03dwlsalmeida[d]: Ioctl on 36 (/dev/nvidiactl): req 0xc020462a (type 70 'F', dir 3, nr 0x2a, size 0x20)
22:03dwlsalmeida[d]: Control: cmd 0x906f0101 (NV906F_CTRL_GET_CLASS_ENGINEID), handle 0x80000012
22:03dwlsalmeida[d]: Intercepted access @ 0x7c1f5c862994 to fake 0x7c1f9f03108c -> real 0x7c1f9f03208c (/dev/nvidia0)
22:03dwlsalmeida[d]: Nvdec kickoff detected with offset 0x4
22:03dwlsalmeida[d]: Gpfifo entries: 0x20030004 0x00001001 -> off 0x120030004, len 0x4
22:03dwlsalmeida[d]: 0x000000: 20018000 0000c4b0 20018081 5ffffffe
22:03dwlsalmeida[d]: Method 000000 (0x20018000): type 1, size 1, subchannel 4, reg 0000000000 (NVA16F_SET_OBJECT)
22:03dwlsalmeida[d]: Method 0x0002 (0x20018081): type 1, size 1, subchannel 4, reg 0x00000204 (NVC7B0_SET_WATCHDOG_TIMER)
22:03dwlsalmeida[d]: Gpfifo entries: 0x20054018 0x00001801 -> off 0x120054018, len 0x6
22:03dwlsalmeida[d]: 0x000000: 20050017 20029000 00000001 00040005
22:03dwlsalmeida[d]: 0x000010: 00000000 00000001
22:03dwlsalmeida[d]: Method 000000 (0x20050017): type 1, size 5, subchannel 0, reg 0x0000005c (unknown)
22:03dwlsalmeida[d]: 0x20029000
22:03dwlsalmeida[d]: 0x00000001
22:03dwlsalmeida[d]: 0x00040005
22:03dwlsalmeida[d]: 0000000000
22:03dwlsalmeida[d]: 0x00000001
22:03dwlsalmeida[d]: i.e.:
22:03dwlsalmeida[d]: Method 000000 (0x20018000): type 1, size 1, subchannel 4, reg 0000000000 (NVA16F_SET_OBJECT)
22:03dwlsalmeida[d]: although this says `NVA16F_SET_OBJECT`, while here I have `NV906F_SET_OBJECT`
22:06skeggsb9778[d]: yeah no, it's probably fine (dev_ram.ref.txt says subchannel is ignored on non-gr runlists)
22:08skeggsb9778[d]: "4" is generally used for copy engines, and i wondered if something was going on there
22:13avhe[d]: dwlsalmeida[d]: those names are hardcoded for my card but it doesn't really matter
22:14dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1305294540281872434/Pushbuf_I_P.txt?ex=673281c5&is=67313045&hm=9ba2b2e4888899c69c14f306223049d19f83440f5480baf2c98a621844979abc&
22:14dwlsalmeida[d]: Btw, I have dumped all the pushbufs for the `I_P` video ^
22:14dwlsalmeida[d]: airlied[d]: can you guess where this `0x44` comes from? I can't see anything in this file
22:16avhe[d]: i think that would be `mthd 0110 NVC597_WAIT_FOR_IDLE`, since 0x44<<2 == 0x110
22:17dwlsalmeida[d]: But the error says:
22:17dwlsalmeida[d]: > i just see a message about class 0xc4b0 mthd 0x44 not being valid
22:17dwlsalmeida[d]: this seems unrelated with `c4b0`?
22:18dwlsalmeida[d]: but yeah, there's this extra WAIT_FOR_IDLE thing that is different between NVK and the blob. I have 0 clue where that comes from..
22:22skeggsb9778[d]: avhe[d]: yeah, i wondered if it might mean that
22:23skeggsb9778[d]: if something in nvk is sending that to a video channel, it might be worth *not* doing that and seeing if it helps
22:24skeggsb9778[d]: i had a look at the dump above, and it seems to get a bit scrambled after/around the nvdec error
22:25dwlsalmeida[d]: you mean this?
22:25dwlsalmeida[d]: NVDEC - Error code 0[0x00000000] HDR 20010044 subch 0 NINC
22:25dwlsalmeida[d]: mthd 0110 NVC597_WAIT_FOR_IDLE
22:25dwlsalmeida[d]: .V = (0x0)
22:43airlied[d]: probably try killing with WFI and seeing if it helps
22:49dwlsalmeida[d]: ^ can you expand a bit on this?
22:50airlied[d]: is your latest tree anywhere?
22:50dwlsalmeida[d]: Interesting, if we comment this out, the error goes away:
22:50dwlsalmeida[d]: } else if (barriers & NVK_BARRIER_RENDER_WFI) {
22:50dwlsalmeida[d]: /* If this comes from a vkCmdSetEvent, we don't need to wait */
22:50dwlsalmeida[d]: // if (wait)
22:50dwlsalmeida[d]: // P_IMMD(p, NVA097, WAIT_FOR_IDLE, 0);
22:50dwlsalmeida[d]: } else {
22:51dwlsalmeida[d]: nothing on dmesg too
22:51airlied[d]: yup it's likely you shouldn't emit that on video queues at all
22:51dwlsalmeida[d]: airlied[d]: Let me get you a link
22:52airlied[d]: I don't think we should emit any of the idle or cache invalidation methods on the video queue
22:53airlied[d]: at least on radv we just nop out pipeline barriers on video queues
22:54dwlsalmeida[d]: ok, I now get two frames in `I_P` and both are correct
22:54dwlsalmeida[d]: progress, finally!
22:58dwlsalmeida[d]: The pipeline barrier is probably GStreamer trying to do an image layout transition on the frame it has just decoded; why shouldn't we emit this WAIT_FOR_IDLE thing?
22:58dwlsalmeida[d]: i.e.:
22:58dwlsalmeida[d]: new_layout = (self->layered_dpb || pic->dpb) ?
22:58dwlsalmeida[d]: VK_IMAGE_LAYOUT_VIDEO_DECODE_DST_KHR :
22:58dwlsalmeida[d]: VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR;
22:58dwlsalmeida[d]: gst_vulkan_operation_add_frame_barrier (priv->exec, pic->out,
22:58dwlsalmeida[d]: VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
22:58dwlsalmeida[d]: VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR, new_layout, NULL);
22:59airlied[d]: because it's not something the video queue needs, or if it does need it, it's not in the form the video queue requires
23:02dwlsalmeida[d]: dwlsalmeida[d]: btw, the issue seems resolved, but if you want to browse anything else, I am currently hacking on your C code:
23:02dwlsalmeida[d]: https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/blob/nvk-vulkan-video-dave/src/nouveau/vulkan/nvk_video.c?ref_type=heads#L425
23:07dwlsalmeida[d]: skeggsb9778[d]: thanks a lot for your help, I would have never suspected this `WAIT_FOR_IDLE` business otherwise
23:25skeggsb9778[d]: no problem