03:32gfxstrand[d]: dwlsalmeida[d]: That looks kinda like a stride/tiling problem. Or at least the first few rows could be explained that way.
03:34gfxstrand[d]: The rest is harder to guess
05:35avhe[d]: if anything i'd say the stride is ok since we have these sort of "diagonal gradients" which ime are typical of hevc corruptions
05:36avhe[d]: if there was a stride issue they would get messed up
05:39avhe[d]: there are also these b&w checkerboard patterns which look like an idct or whatever
05:40airlied[d]: it might just be garbage memory from a previous run,
05:50avhe[d]: would be interesting to see if the iframe of this file decodes or not
08:19airlied[d]: _lyude: I'm starting to wonder if we should turn off immediate vblank disables on nouveau to see if it helps some of those weird races we saw before
08:37g0z: has anyone seen menus in some apps not appearing correctly with Nouveau? like I'll open a menu and it'll be tiny until I mouse over each entry on the menu. (specifically in this case Transmission a bittorrent client, GTK version, fedora 41 pkg). also some dialogs will have transparent bits in it where they weren't before.
08:45avhe[d]: dwlsalmeida[d]: i haven't checked the dpb management code as this stuff gives me headaches (and i don't believe your issue looks like a dpb problem), but here are some comments on the rest:
08:45avhe[d]: - your implementation is missing scaling list related code atm, not sure if this is an issue for your problematic samples
08:45avhe[d]: - same thing with long-term references, though i doubt this is needed yet
08:45avhe[d]: - <https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/blob/nvk-vulkan-video-h265/src/nouveau/vulkan/video/decode/h265.rs?ref_type=heads#L507-513>
08:45avhe[d]: these appear to be missing the right-shift by 8
08:45avhe[d]: - <https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/blob/nvk-vulkan-video-h265/src/nouveau/vulkan/video/decode/h265.rs?ref_type=heads#L827-832>
08:45avhe[d]: these are supposed to be the offset *within the filter buffer*, but this seems to push the gpu addresses rather?
08:45avhe[d]: also c/p error on HevcBsdCtrlOffset
08:45avhe[d]: - <https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/blob/nvk-vulkan-video-h265/src/nouveau/vulkan/video/decode/h265.rs?ref_type=heads#L725-731>
08:45avhe[d]: this would overwrite the column widths with the row heights? and i think you have an off-by-one with the xxx_minus1 business
08:45avhe[d]: also generally this function doesn't seem to match what i had on tegra. the first part of the code seems based on what i was writing to the `tile_thing` variable, at a 0x380 offset from the start of the tile sizes buffer, but yours writes to the same location as the main tile sizes
08:45avhe[d]: finally perhaps make sure that this works when tile sizes are disabled in the pps (ie, that num_tile_columns/rows_minus1, and column_width_minus1/row_height_minus1 are non-zero), i see that i have a special path for that in my code
08:45avhe[d]: - <https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/blob/nvk-vulkan-video-h265/src/nouveau/vulkan/video/decode/h265.rs?ref_type=heads#L992-994>
08:45avhe[d]: c/p error, i believe this can be removed
08:45avhe[d]: - <https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/blob/nvk-vulkan-video-h265/src/nouveau/vulkan/video/decode/h265.rs?ref_type=heads#L1027>
08:45avhe[d]: afaik this is something to be used when performing dithering to 8-bit, you're supposed to switch between 0 and 1 between frames, presumably to avoid banding. i imagine using 2 bypasses or disables this step
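To make the tile-size comments concrete, here is a minimal C sketch of the tile grid derivation from H.265 §6.5.1, assuming the decoder needs per-tile column widths and row heights in CTBs. It keeps columns and rows in separate arrays (the overwrite avhe points out) and adds 1 to each `*_minus1` syntax element (the off-by-one). The struct and function names are hypothetical, and the layout NVDEC expects in the tile sizes buffer (including the `tile_thing` region at 0x380) is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Array bounds chosen from the largest per-level tile limits in H.265 Annex A. */
#define MAX_TILE_COLS 20
#define MAX_TILE_ROWS 22

/* Hypothetical subset of PPS tile fields, named after the H.265 syntax elements. */
struct pps_tiles {
    int uniform_spacing_flag;
    int num_tile_columns_minus1;
    int num_tile_rows_minus1;
    int column_width_minus1[MAX_TILE_COLS];
    int row_height_minus1[MAX_TILE_ROWS];
};

/* Computes one dimension of the tile grid in CTBs.
 * pic_in_ctbs is PicWidthInCtbsY for columns or PicHeightInCtbsY for rows. */
static void tile_dim(int uniform, int num_minus1, const int *size_minus1,
                     int pic_in_ctbs, uint16_t *out)
{
    int n = num_minus1 + 1;
    if (uniform) {
        /* Uniform spacing: tile boundaries at floor(i * pic / n). */
        for (int i = 0; i < n; i++)
            out[i] = (uint16_t)((i + 1) * pic_in_ctbs / n - i * pic_in_ctbs / n);
        return;
    }
    int used = 0;
    for (int i = 0; i < n - 1; i++) {
        out[i] = (uint16_t)(size_minus1[i] + 1); /* syntax element stores size - 1 */
        used += out[i];
    }
    out[n - 1] = (uint16_t)(pic_in_ctbs - used); /* last tile takes the remainder */
}

/* Column widths and row heights go to separate arrays so neither overwrites the other. */
static void tile_sizes(const struct pps_tiles *pps, int pic_w_ctbs, int pic_h_ctbs,
                       uint16_t col_w[MAX_TILE_COLS], uint16_t row_h[MAX_TILE_ROWS])
{
    assert(pps->num_tile_columns_minus1 < MAX_TILE_COLS);
    assert(pps->num_tile_rows_minus1 < MAX_TILE_ROWS);
    tile_dim(pps->uniform_spacing_flag, pps->num_tile_columns_minus1,
             pps->column_width_minus1, pic_w_ctbs, col_w);
    tile_dim(pps->uniform_spacing_flag, pps->num_tile_rows_minus1,
             pps->row_height_minus1, pic_h_ctbs, row_h);
}
```

With tiles disabled in the PPS, both counts are 0 and either path yields a single tile covering the whole picture, which is the case worth checking separately.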
09:09airlied[d]: scaling lists might be the answer for that one
09:09airlied[d]: I've vague memories of junk like that at some point
10:47avhe[d]: daniel said scaling lists were disabled for this one so probably not
10:49avhe[d]: i think the sao/bsdctrl and coloc/filter buffer stuff would be the most important issue presently (and also easiest to fix)
11:50dwlsalmeida[d]: The very first frame doesn’t decode for this file btw, since you asked
11:51dwlsalmeida[d]: DPB works, in fact I can decode a 300 frame file just fine
11:53dwlsalmeida[d]: This branch has the bsd and sao buffers in their own memories, but I’ve already tried to place them together with the filter buffer, it doesn’t work either
11:53dwlsalmeida[d]: (Your math for the offsets also doesn’t match the blob btw)
11:55dwlsalmeida[d]: Scaling lists and long-term references are not used in most of the broken stuff, especially references are not the problem here because even I frames are broken
11:56dwlsalmeida[d]: I have hardcoded the blob values for BsdCtrl and Sao btw, it doesn’t work either
11:59dwlsalmeida[d]: You’re right, there’s a bug in the loop for the tile size things, however most of the time, uniform tiles are being used so the branch you commented on is not taken. This is the case for this video too
12:04dwlsalmeida[d]: So, when uniform tiles are used, my code is essentially equal to yours here:
12:04dwlsalmeida[d]: https://github.com/averne/FFmpeg/blob/nvtegra/libavcodec/nvtegra_hevc.c#L191
12:04dwlsalmeida[d]: (also, if the tile buffer was broken, AMP_B or anything else really wouldn't decode)
12:20avhe[d]: dwlsalmeida[d]: i'm not sure you can do that, nvidia comments says this should be aggregated in the filter buffer <https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/nvdec_drv.h#L385-L386>
12:20avhe[d]: then the values in the picture setup indicate the relevant offsets within that buffer
12:21dwlsalmeida[d]: yeah you're right, but what I am trying to say is that I also hardcoded the values from the blob
12:21dwlsalmeida[d]: and it also doesn't work
12:21avhe[d]: but did your filter buffer have the same size as they did when you tried that
12:22avhe[d]: if not the offsets would point outside the buffer
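avhe's point is that the SAO and BSD ctrl data live inside the filter buffer and the picture setup only carries their offsets, so hardcoded offsets only make sense if the buffer is at least as large as the blob's. A minimal C sketch of packing and bounds-checking such offsets, assuming a filter → SAO → BSD ctrl order and byte-granular offsets; the region sizes, order, alignment and units are all assumptions, not taken from nvdec_drv.h, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes of the regions sharing the filter buffer. */
struct filter_regions {
    uint64_t filter_bytes;
    uint64_t sao_bytes;
    uint64_t bsd_ctrl_bytes;
};

/* What the picture setup would carry: offsets relative to the start of the
 * filter buffer, not GPU virtual addresses. */
struct filter_offsets {
    uint32_t sao_offset;
    uint32_t bsd_ctrl_offset;
};

static uint64_t align_up(uint64_t v, uint64_t a)
{
    assert(a != 0);
    return (v + a - 1) / a * a;
}

/* Places SAO and BSD ctrl after the filter data. Returns false if the last
 * region would end past buffer_size, i.e. the offsets would point outside it. */
static bool layout_filter_buffer(const struct filter_regions *r, uint64_t buffer_size,
                                 uint64_t align, struct filter_offsets *out)
{
    uint64_t sao = align_up(r->filter_bytes, align);
    uint64_t bsd = align_up(sao + r->sao_bytes, align);
    if (bsd + r->bsd_ctrl_bytes > buffer_size || bsd > UINT32_MAX)
        return false;
    out->sao_offset = (uint32_t)sao;
    out->bsd_ctrl_offset = (uint32_t)bsd;
    return true;
}
```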
12:22dwlsalmeida[d]: I assume that the kernel driver would report mmu faults if so, but not sure
12:23dwlsalmeida[d]: I just assumed that the filter buffer would be large enough
12:23avhe[d]: it's possible the microcode validates stuff prior to launching the hardware
12:25avhe[d]: the sample you sent was quite high resolution so i could imagine it wasn't
12:26avhe[d]: maybe you could find from the tracer the size of the map containing the filter buffer, then hardcode in your impl accordingly
12:26dwlsalmeida[d]: is there a way to do that with the tracer?
12:27avhe[d]: i posted some guidelines back then https://discord.com/channels/1033216351990456371/1034184951790305330/1301537120099500063
12:28avhe[d]: but if you can post a dump i can take a look
12:31dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1324716713341751419/push_blob_AMP_A_Samsung_7.txt?ex=67792a14&is=6777d894&hm=2b521c09df0b77e375f3af966b51165c969fc58f2b9685f10be3cf981862d42c&
12:37dwlsalmeida[d]: avhe[d]: All our buffers are sized for 8K basically, so yes it's big enough
12:38avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1324718447594045511/image.png?ex=67792bb1&is=6777da31&hm=bdce0888bb22fb234a6009314b0f107dd724d57c9b3ecae9a945350192bcb5b9&
12:38avhe[d]: i believe that would be the one
12:39dwlsalmeida[d]: dwlsalmeida[d]: that's because we don't have any data for the current stream when the client is allocating the VkMemories for the VkVideoSession object, i.e.:
12:39dwlsalmeida[d]: case VK_VIDEO_CODEC_OPERATION_DECODE_H265_BIT_KHR: {
12:39dwlsalmeida[d]: const int CTU_SIZE = 64;
12:39dwlsalmeida[d]: const int MB_SIZE = 16;
12:39dwlsalmeida[d]: const int aligned_w = align(vid->vk.max_coded.width, CTU_SIZE);
12:39dwlsalmeida[d]: const int aligned_h = align(vid->vk.max_coded.height, CTU_SIZE);
12:39dwlsalmeida[d]: size_t coloc_size = (aligned_w * aligned_h) + (aligned_w * aligned_h / MB_SIZE);
12:39dwlsalmeida[d]: size_t filter_size = 480 * aligned_h;
12:39dwlsalmeida[d]: size_t colmv_size = aligned_w * aligned_h / MB_SIZE;
12:40dwlsalmeida[d]: hmm..interesting
12:41dwlsalmeida[d]: this says `0x1000000`, which is indeed larger than ours:
12:41dwlsalmeida[d]: p/x filter_size
12:41dwlsalmeida[d]: $1 = 0x3c0000
12:44dwlsalmeida[d]: in any case, the blob size doesn't match the math on your tegra code btw:
12:44dwlsalmeida[d]: >>> hex ((480 + 3840 + 60)*1600)
12:44dwlsalmeida[d]: '0x6aef00'
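To put the numbers side by side, here is the session sizing from the snippet above as a self-contained C function; `align_to` and the struct are hypothetical stand-ins for the driver's helpers. A `max_coded` height that aligns to 8192 gives exactly the 0x3c0000 printed in gdb, well below the blob's 0x1000000 mapping (which, per avhe, also includes the intra_top buffer). The tegra expression `(480 + 3840 + 60) * 1600` evaluates to 0x6aef00; the meaning of its individual terms is not spelled out in this log, so it is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define CTU_SIZE 64
#define MB_SIZE 16

static size_t align_to(size_t v, size_t a)
{
    assert(a != 0);
    return (v + a - 1) / a * a;
}

struct h265_session_sizes {
    size_t coloc, filter, colmv;
};

/* Same expressions as the session allocation snippet: only the session's
 * max_coded extent is known here, not the stream being decoded. */
static struct h265_session_sizes h265_session_sizes(size_t max_w, size_t max_h)
{
    size_t w = align_to(max_w, CTU_SIZE);
    size_t h = align_to(max_h, CTU_SIZE);
    struct h265_session_sizes s = {
        .coloc = w * h + w * h / MB_SIZE,
        .filter = 480 * h,
        .colmv = w * h / MB_SIZE,
    };
    return s;
}
```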
12:45avhe[d]: dwlsalmeida[d]: yep but it seems the intra_top buffer is included in that big map
12:45avhe[d]: so in reality it's probably smaller than that
12:45avhe[d]: in any case it's worth trying to match the blob and see if that fixes anything
12:47avhe[d]: dwlsalmeida[d]: yeah it would be nice to see the expressions used in nvcuvid... and also to understand whether that actually matters
13:07dwlsalmeida[d]: yeah, even with `filter_size = 0x1000000` and
13:07dwlsalmeida[d]: nvh265.HevcSaoBufferOffset = 3800;
13:07dwlsalmeida[d]: nvh265.HevcBsdCtrlOffset = 34200;
13:07dwlsalmeida[d]: We still get the same thing
13:08avhe[d]: hmm shame i was really hoping that would be it
13:09avhe[d]: how complicated would it be if i wanted to try that out myself? build linux with the nvdec patch i've seen floating around, build your branch, test with ffmpeg?
13:10dwlsalmeida[d]: yeah, except I am testing with gstreamer instead, but I can give you a branch
13:10dwlsalmeida[d]: debugging this with ffmpeg is still another step I have to do...
13:12dwlsalmeida[d]: it's a mountain of work and I am usually stuck on these puzzling errors so I didn't have the time to get ffmpeg working too yet
13:12avhe[d]: i might give it a try in the weekend
13:12dwlsalmeida[d]: yeah alright, that would help!
13:14dwlsalmeida[d]: I'll poke Tony on the vulkan video tsg call today about these buffers, not sure how much he'll be able to disclose anyways because of all the red tape around this
13:14dwlsalmeida[d]: but let's see, he's the nvidia guy for vulkan video, he might know the answer
13:16avhe[d]: yeah, a set of expressions based on video parameters would be nice
13:17avhe[d]: hopefully he can move things, considering the published headers already contain this sort of macro for certain things
13:17dwlsalmeida[d]: fyi, removing this doesn't solve it either:
13:17dwlsalmeida[d]: https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/blob/nvk-vulkan-video-h265/src/nouveau/vulkan/video/decode/h265.rs?ref_type=heads#L992-994
13:19avhe[d]: yeah i don't see how a deblocking bug could lead to such a drastic visual corruption 😄
13:21dwlsalmeida[d]: actually this sao buffer is apparently a part of the loop filter IIUC
13:22dwlsalmeida[d]: so yeah, it's related
13:22dwlsalmeida[d]: ah, I forgot to say one thing
13:23dwlsalmeida[d]: if you remove all of the implementation, you get the same error
13:23dwlsalmeida[d]: something is making this bail out so early that I don't think it's even looking at the picture parameters
13:40avhe[d]: shot in the dark, maybe try adding a 0 before the main bitstream? hevc slices are prefixed by a 3-byte 000001 startcode, but nvdec might want 4 bytes instead
13:40avhe[d]: see <https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-bad/gst-libs/gst/vulkan/gstvkdecoder-private.c#L1191> and <https://github.com/averne/FFmpeg/blob/e932809b8fc4c2a434cc2f2360e0e12de901b268/libavcodec/vulkan_decode.c#L212> vs <https://github.com/averne/FFmpeg/blob/e932809b8fc4c2a434cc2f2360e0e12de901b268/libavcodec/nvtegra_hevc.c#L617-L623>
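The experiment avhe suggests amounts to prepending one zero byte so each slice starts with 00 00 00 01 instead of 00 00 01. A minimal C sketch, assuming the slice data is handed over as one contiguous buffer that begins at its start code; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* If data begins with the 3-byte Annex B start code 00 00 01, returns a newly
 * allocated copy with one extra leading zero (00 00 00 01 ...) and stores its
 * length in *out_len. Returns NULL if there is no 3-byte start code at the
 * front or allocation fails. The caller frees the result. */
static uint8_t *widen_start_code(const uint8_t *data, size_t len, size_t *out_len)
{
    assert(out_len != NULL);
    if (len < 3 || data[0] != 0x00 || data[1] != 0x00 || data[2] != 0x01)
        return NULL;
    uint8_t *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    buf[0] = 0x00;
    memcpy(buf + 1, data, len);
    *out_len = len + 1;
    return buf;
}
```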
13:41avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1324734337421152309/image.png?ex=67793a7e&is=6777e8fe&hm=b7d93cf43dbbd7b120f20ab51f73f93d49860be25fe38bd446fb5e3cc172e371&
13:43dwlsalmeida[d]: wait, really?
13:43dwlsalmeida[d]: we do start with a 00 00 01 start code
13:43avhe[d]: ¯\_(ツ)_/¯
13:44avhe[d]: i remember testing whether this was actually required and it looked like not
13:45dwlsalmeida[d]: nope
13:45dwlsalmeida[d]: same error
13:46dwlsalmeida[d]: (also, I do have some 20% of the tests passing, if the 00 00 01 start code didn't work, that wouldn't be possible)
13:46avhe[d]: yeah guess so
13:49dwlsalmeida[d]: funny, AMP_B works even with 00 00 00 01, I guess it just discards everything until it reads the actual 3 byte thing
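That guess matches how an Annex B byte-stream parser typically behaves: it scans for the 3-byte pattern, so one extra leading zero is simply skipped. A sketch of such a scan, assuming nothing about NVDEC's actual firmware; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first 00 00 01 start code in data, or len if none
 * is found. With a 4-byte prefix (00 00 00 01) the match is found one byte
 * later, so the extra leading zero is effectively discarded. */
static size_t find_start_code(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i + 2 < len; i++) {
        if (data[i] == 0x00 && data[i + 1] == 0x00 && data[i + 2] == 0x01)
            return i;
    }
    return len;
}
```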
13:54dwlsalmeida[d]: Is there some sort of EOS marker?
13:54dwlsalmeida[d]: I remember someone from nvidia saying that the hardware would bail if it didn’t read that in h264
14:30dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1324746646415867964/Screenshot_2025-01-03_at_11.30.04.png?ex=677945f5&is=6777f475&hm=494c447c31b0fa8b4ebc68bc23304de40c39c13e235f7e903a20e1313a6dc504&
14:30dwlsalmeida[d]: the bottom part is leftover from a previous decode
15:10avhe[d]: dwlsalmeida[d]: not that i know for hevc
16:04_lyude[d]: airlied[d]: not sure where the message about the vblanks went but I'm down to try that
18:34dwlsalmeida[d]: Is there a way to choose which GPU is used through the ICD stuff?
19:01dwlsalmeida[d]: Just found out that things don’t work as well on Ada vs Turing sadly :/
19:12HdkR: dwlsalmeida[d]: https://docs.mesa3d.org/envvars.html#vulkan-mesa-device-select-layer-environment-variables Is handy for that
19:13dwlsalmeida[d]: HdkR: ty, looks exactly like what I wanted!
19:48dwlsalmeida[d]: Btw I just realized that this exists:
19:48dwlsalmeida[d]: #define NVC9B0_ERROR_NONE (0x00000000)
19:48dwlsalmeida[d]: #define NVC9B0_OS_ERROR_EXECUTE_INSUFFICIENT_DATA (0x00000001)
19:48dwlsalmeida[d]: #define NVC9B0_OS_ERROR_SEMAPHORE_INSUFFICIENT_DATA (0x00000002)
19:48dwlsalmeida[d]: #define NVC9B0_OS_ERROR_INVALID_METHOD (0x00000003)
19:48dwlsalmeida[d]: #define NVC9B0_OS_ERROR_INVALID_DMA_PAGE (0x00000004)
19:48dwlsalmeida[d]: ...
19:48dwlsalmeida[d]: And
19:48dwlsalmeida[d]: #define NVC9B0_H264_VLD_ERR_SEQ_DATA_INCONSISTENT (0x00004001)
19:48dwlsalmeida[d]: #define NVC9B0_H264_VLD_ERR_PIC_DATA_INCONSISTENT (0x00004002)
19:48dwlsalmeida[d]: #define NVC9B0_H264_VLD_ERR_SLC_DATA_BUF_ADDR_OUT_OF_BOUNDS (0x00004100)
19:48dwlsalmeida[d]: #define NVC9B0_H264_VLD_ERR_BITSTREAM_ERROR (0x00004101)
19:48dwlsalmeida[d]: #define NVC9B0_H264_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID (0x000041F8)
19:48dwlsalmeida[d]: ...
19:49dwlsalmeida[d]: I wonder if the NVIDIA people know how we can access these
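If a status word can be read back at all (which is exactly the open question), a lookup table over these defines would make logs readable. A minimal C sketch covering only the codes quoted above, not the ones elided with "..."; the table and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the codes quoted in the chat; the class header defines more. */
static const struct { uint32_t code; const char *name; } nvc9b0_errors[] = {
    { 0x00000000, "NVC9B0_ERROR_NONE" },
    { 0x00000001, "NVC9B0_OS_ERROR_EXECUTE_INSUFFICIENT_DATA" },
    { 0x00000002, "NVC9B0_OS_ERROR_SEMAPHORE_INSUFFICIENT_DATA" },
    { 0x00000003, "NVC9B0_OS_ERROR_INVALID_METHOD" },
    { 0x00000004, "NVC9B0_OS_ERROR_INVALID_DMA_PAGE" },
    { 0x00004001, "NVC9B0_H264_VLD_ERR_SEQ_DATA_INCONSISTENT" },
    { 0x00004002, "NVC9B0_H264_VLD_ERR_PIC_DATA_INCONSISTENT" },
    { 0x00004100, "NVC9B0_H264_VLD_ERR_SLC_DATA_BUF_ADDR_OUT_OF_BOUNDS" },
    { 0x00004101, "NVC9B0_H264_VLD_ERR_BITSTREAM_ERROR" },
    { 0x000041F8, "NVC9B0_H264_VLD_ERR_CTX_DMA_ID_CTRL_IN_INVALID" },
};

#define NUM_ERRORS (sizeof nvc9b0_errors / sizeof nvc9b0_errors[0])

/* Returns the define name for a code, or NULL if it is not in the table. */
static const char *nvc9b0_error_name(uint32_t code)
{
    for (size_t i = 0; i < NUM_ERRORS; i++)
        if (nvc9b0_errors[i].code == code)
            return nvc9b0_errors[i].name;
    return NULL;
}

/* Reverse lookup by define name; returns false if the name is unknown. */
static bool nvc9b0_error_code(const char *name, uint32_t *out)
{
    assert(name != NULL && out != NULL);
    for (size_t i = 0; i < NUM_ERRORS; i++) {
        if (strcmp(nvc9b0_errors[i].name, name) == 0) {
            *out = nvc9b0_errors[i].code;
            return true;
        }
    }
    return false;
}
```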
19:53Jasper[m]: Hmmm, to come back to the general issues on gm20b I've been having. Any idea what this error means? It's repeated a billion times: "detected fb_set_par error, error code: -16"
19:53Jasper[m]: iirc mesa-utils cannot find a display either, even though I do have a display
23:12dwlsalmeida[d]: Hey Averne, fyi I pushed my latest H265 code to the MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31867
23:18avhe[d]: thanks i'll try to play around with it