01:21 dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1331071054096891914/Screenshot_2025-01-20_at_22.15.04.png?ex=67904805&is=678ef685&hm=a13afa155e54893548f245b8cd1fbe25387d548fa1fa8fbbc4fb0f33fd85d3d6&
01:21 dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1331071054436896891/Screenshot_2025-01-20_at_22.16.26.png?ex=67904805&is=678ef685&hm=941162f8bd5d250ea7cd1986bdfcdc9d70fd16db5a752915005d547ad85f29a2&
01:21 dwlsalmeida[d]: avhe[d]: this..doesn't look like what we had in mind earlier today, `0101` is width and height in sbs, the rest is... frankly beyond me
01:24 dwlsalmeida[d]: the only pattern I could find was e.g. `0xbb - 0x02` or `0x1ba - 0xbb`, which always seems to be `tile_size - 2`
01:25 dwlsalmeida[d]: I confirmed that the PROB_TAB_READ stuff is indeed the default prob table (at least for frame 0, of course)
01:26 dwlsalmeida[d]: also `tile_offset` is not being considered at all
01:27 dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1331072639619436584/av14tiles.ivf?ex=6790497f&is=678ef7ff&hm=23d6916a9cf013e1be62e6c2a31d3517f70a50a58a0e4547604525da9c3e01ae&
01:27 dwlsalmeida[d]: This is the input FYI
02:47 rinlovesyou[d]: tiredchiku[d]: 6.2.4 here but it runs!
03:01 tiredchiku[d]: rinlovesyou[d]: do have that working here too, I explicitly need testing on 6.2.5
03:01 rinlovesyou[d]: ah
03:29 tiredchiku[d]: from what I can tell it might be a qt bug
03:29 tiredchiku[d]: unable to open konsole in gnome shell either
04:03 tiredchiku[d]: nvm
04:03 tiredchiku[d]: mangohud env var related jank
05:57 tiredchiku[d]: :Thonk:
05:57 tiredchiku[d]: does the nvidia prop driver have a separate CTS mode?
05:58 tiredchiku[d]: was investigating a gamescope issue with present-wait + wsi, and on prop, `dEQP-VK.wsi.{xlib,xcb,wayland}.present_id_wait.wait.two_swapchains` failed
05:58 tiredchiku[d]: or rather
05:58 tiredchiku[d]: froze
09:05 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1331187924716359700/image.png?ex=6790b4dd&is=678f635d&hm=d90590f35ab7a519908dd44a6f1415833acb3309583ca7c7c9bf4ae20dfb81f2&
09:05 avhe[d]: dwlsalmeida[d]: your data matches exactly with what ffmpeg calculates into "tile_group_info"
09:05 avhe[d]: the first 2 u8's are width/height_in_sbs
09:06 avhe[d]: then the next u32 (at offset 2, there's no padding) is tile_offset in the pic above
09:07 avhe[d]: and the final u32 (offset 6) is tile_offset+tile_size
09:09 avhe[d]: it looks like Vulkan doesn't directly give you the data you'd need, I'd look into the code here to derive it <https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/av1dec.c#L426-L469>
09:12 avhe[d]: wait actually the data should be in VkVideoDecodeAV1PictureInfoKHR.pTileOffsets/pTileSizes
09:13 avhe[d]: if I'm reading the ffmpeg vulkan code right
15:31 dwlsalmeida[d]: yeah, according to Nicolas, the offset is userspace dependent
15:37 dwlsalmeida[d]: btw, it's a 10-byte padding at the end, right?
15:38 dwlsalmeida[d]: judging by how the last 10 are zeroed out on each row
15:39 dwlsalmeida[d]: or simply align(16), as you said previously
16:16 avhe[d]: dwlsalmeida[d]: those are offsets within the compressed bitstream so yeah I imagine
16:17 avhe[d]: dwlsalmeida[d]: It's 6 bytes, the 4 zeroes you are seeing before that are the upper 16 bits of the tile start and end offsets
16:18 avhe[d]: it would be clearer if you did a byte-wise dump instead of dwords probably, since the tile start/end u32 are not in-between dwords
16:47 avhe[d]: are sitting in-between*
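The per-tile record layout described above (two u8 fields, a u32 start offset at byte 2, a u32 end offset at byte 6, then 6 bytes of padding up to a 16-byte stride) can be sketched as follows. This is only an illustration of the discussion: little-endian byte order, the field names, and the sample values are assumptions, not something confirmed by the hardware documentation.

```python
import struct

# Assumed layout of one entry, packed with no padding between fields:
#   u8  width_in_sbs   (offset 0)
#   u8  height_in_sbs  (offset 1)
#   u32 tile_start     (offset 2)  offset of the tile in the bitstream
#   u32 tile_end       (offset 6)  tile_start + tile_size
#   6 padding bytes    (offset 10) rounds the entry up to 16 bytes
# Little-endian ("<") is an assumption.
ENTRY = struct.Struct("<BBII6x")


def parse_tile_entry(raw: bytes) -> dict:
    """Decode one 16-byte entry into named fields (names are hypothetical)."""
    width, height, start, end = ENTRY.unpack(raw)
    return {
        "width_in_sbs": width,
        "height_in_sbs": height,
        "tile_start": start,
        "tile_end": end,
        "tile_size": end - start,
    }


def bytewise_dump(raw: bytes) -> str:
    """Byte-wise hex dump, easier to read than dwords for unaligned u32s."""
    return raw.hex(" ")


# Illustrative values only: a 1x1 superblock tile spanning 0x02..0xbb.
sample = ENTRY.pack(1, 1, 0x02, 0xBB)
print(bytewise_dump(sample))
print(parse_tile_entry(sample))
```

Dumped byte by byte, the two u32 fields are visibly straddling dword boundaries, which is why a dword-wise dump makes the upper halves of the offsets look like extra padding.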
17:08 dwlsalmeida[d]: Lynne, are you around?
17:09 dwlsalmeida[d]: ^ do we need to wait on the previous frame before decoding the current one?
17:09 dwlsalmeida[d]: I assume that we can just submit as many jobs as we want, and the semaphores will handle the synchronization for us
17:12 Lynne: we don't wait on the previous frame, we use semaphores
17:12 Lynne: the only wait happens once we've submitted all frames in the ring buffer for decoding and we need a free slot
17:14 dwlsalmeida[d]: in which case, you wait on the most recent submission fence?
18:05 avhe[d]: i don't think nvdec can switch between jobs anyway so this is not a problem on current cards
18:05 avhe[d]: might be something to think about for 5090 support if you want to submit to the 2 engine instances
18:17 dwlsalmeida[d]: it's not about switching between jobs, but about letting the CPU parse the bitstream as fast as it can, without waiting
18:33 dwlsalmeida[d]: Lynne
18:33 dwlsalmeida[d]: I'm writing the gstreamer av1 vulkan code, I was hoping to return the first non-used entry in the dpb array, but I just realized that doesn't work, so I am switching to your implementation.
18:33 dwlsalmeida[d]: There is just one thing I didn't understand, when is `vp->ref_slots[j]` filled? is the driver supposed to move `vp->ref_slot` into a given `vp->ref_slots` entry somehow?
18:53 redsheep[d]: dwlsalmeida[d]: Just so you know, edits don't propagate to the irc side
18:54 dwlsalmeida[d]: redsheep[d]: I thought editing something would count as a new submission?
18:54 dwlsalmeida[d]: as if you had typed the whole thing again
18:55 orowith2os[d]: yeah, it resends the entire message
18:55 redsheep[d]: The old bridge did, this one doesn't. It avoids spam by just ignoring it. If you need to get a correction over to someone you're talking to on irc you need a new message
18:55 orowith2os[d]: urp
18:55 orowith2os[d]: I'll shut up
18:56 orowith2os[d]: I wonder when the bridge changed
18:56 redsheep[d]: Few months ago I think? Don't quite remember
18:58 redsheep[d]: I still use edits since at least the discord people won't see my typos but if it's substantial it's worth a new message
19:00 dwlsalmeida[d]: anyways,
19:00 dwlsalmeida[d]: Here's the corrected version:
19:00 dwlsalmeida[d]: Lynne
19:00 dwlsalmeida[d]: i'm writing the gstreamer av1 vulkan code, for the slot_idx, I was hoping to return the first non-used entry in the dpb array, but I just realized that doesn't work, so I am switching to your implementation.
19:00 dwlsalmeida[d]: There is just one thing I didn't understand, when is `vp->ref_slots[j]` filled? is the driver supposed to move `vp->ref_slot` into a given `vp->ref_slots` entry somehow?
19:05 asdqueerfromeu[d]: Since you seem to be moving to AV1, how's the progress with H.264 decoding? How many frames can the driver decode before causing a GPU hang?
19:09 dwlsalmeida[d]: asdqueerfromeu[d]: if not for a couple of minor things, h264 and HEVC could basically be shipped to users already
19:09 karolherbst[d]: orowith2os[d]: last year
19:10 dwlsalmeida[d]: dwlsalmeida[d]: in terms of actually being able to decode things correctly without crashing (i.e.: conformance), they're similar to VA-API, V4L2, etc
19:11 dwlsalmeida[d]: just gotta figure out why downloading the image from the GPU is so slow
19:11 karolherbst[d]: I'm curious.. how much of the decoding needs to be shader assisted?
19:11 dwlsalmeida[d]: asdqueerfromeu[d]: and also, we need an API to force the same GOB height to both the luma and chroma planes
19:13 dwlsalmeida[d]: karolherbst[d]: I don't think shaders are involved at all? From what we discovered, the FW is just talking to a Hantro decoder behind the scenes
19:13 karolherbst[d]: I see
19:14 dwlsalmeida[d]: the same IP used in a lot of other SoCs out there
19:14 karolherbst[d]: it might have been encoding, but I know that some hardware needed to run shaders for certain codecs with certain configs
19:14 karolherbst[d]: but yeah.. makes sense if it's not needed for decoding at all
19:15 karolherbst[d]: though
19:15 karolherbst[d]: wiki claims this: "Cards with this feature set use a combination of the PureVideo hardware and software running on the shader array to decode HEVC (H.265) as partial/hybrid hardware video decoding."
19:15 karolherbst[d]: set E which is... maxwell 2nd gen?
19:15 karolherbst[d]: maxwell in general
19:16 karolherbst[d]: but I think it's the only gen where it applies
19:16 dwlsalmeida[d]: we've been discussing how probability updates will work for VP9 and AV1, this is a bit unclear tbh
19:16 dwlsalmeida[d]: in V4L2 this is all done by the CPU
19:16 dwlsalmeida[d]: here..well, we assume the firmware does it
19:16 karolherbst[d]: I see...
19:17 karolherbst[d]: could be the firmware launching shaders
19:17 karolherbst[d]: could be ... actual shaders
19:17 karolherbst[d]: but yeah.. wouldn't be the first time
19:17 karolherbst[d]: anyway, I was just curious
23:49 Lynne: dwlsalmeida[d]: no, the _least_ recent one
23:51 dwlsalmeida[d]: ^ ah I figured out the slot thing
23:51 dwlsalmeida[d]: Lynne: Makes sense to be the least recent one
23:52 Lynne: the driver doesn't touch user-given structs, we don't copy vp->ref_slot into vp->ref_slots either
23:52 Lynne: we construct new structs for each frame
23:52 Lynne: statefulness is evil when it comes to decoding
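The submission scheme described above (submit without waiting on the previous frame, and only block once every slot in the ring buffer is in flight, then wait on the least recent submission) can be sketched roughly like this. It is a toy simulation, not the FFmpeg or GStreamer code: `FakeFence`, `DecodeRing`, and `wait_log` are hypothetical names, and a real implementation would wait on a Vulkan fence or timeline semaphore instead.

```python
from collections import deque


class FakeFence:
    """Stand-in for a GPU fence; wait() only records that it was called."""

    def __init__(self, frame_id: int):
        self.frame_id = frame_id
        self.waited = False

    def wait(self) -> None:
        self.waited = True


class DecodeRing:
    """Fixed number of in-flight decode slots, reused oldest-first."""

    def __init__(self, num_slots: int):
        self.num_slots = num_slots
        self.in_flight = deque()  # oldest submission sits at the left
        self.wait_log = []        # frame ids the CPU had to block on

    def submit(self, frame_id: int) -> FakeFence:
        # Block only when every slot is busy, and then only on the
        # least recent submission, which is the first one to finish.
        if len(self.in_flight) == self.num_slots:
            oldest = self.in_flight.popleft()
            oldest.wait()
            self.wait_log.append(oldest.frame_id)
        fence = FakeFence(frame_id)
        self.in_flight.append(fence)
        return fence


ring = DecodeRing(num_slots=3)
for frame in range(5):
    ring.submit(frame)
print(ring.wait_log)
```

With three slots and five frames, the CPU keeps parsing ahead and only blocks twice, each time on the oldest frame still in flight.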
23:57 dwlsalmeida[d]: Would you know anything about hantro, by chance?
23:57 dwlsalmeida[d]: I am a bit frustrated today, not going to lie
23:58 dwlsalmeida[d]: Been trying the av1 stuff very aggressively but running out of ideas