IRC Logs of #nouveau on irc.freenode.net for 2025-01-20

07:30 karolherbst: with spam it's better to ping dwfreed ^^
07:31 karolherbst: uhm.. the elon musk thing above the usual suspect
07:31 karolherbst: (ohh.. you already got pinged in another channel, sorry for the noise then)
12:37 dragonfoodye: 397−43−43−69−69−144=29 and 230−43−43−69−69−3=3 so 43+43+69+69+3−230=-3 43+43+69+69-397=-29 so those are the both ends for 138 141-138=3 141+141-138=144 but you do not see the intermediate so to elaborate 43+43+69+69+3=227 512-227=285 and 285-256=29, hence 227+29+29 is 285 285-141-141=3 that is already it when you do 29+29 from first terms you end up in 29+29-26-26 where as if you do
12:37 dragonfoodye: 32+227-256 you end up in 3, as the delta is gotten, 141+3 is now 144/2 searched value 72, maybe not your fauvorite, but magic value just adds three. and we stay in plus optimism. divide by two has several tricks too. smallest possible storage, aka most aggressive compression. you end up having only one memory value to input to calculator.
15:17 dwlsalmeida[d]: dwlsalmeida[d]: @averne ideas?
15:17 dwlsalmeida[d]: dwlsalmeida[d]: I mean avhe[d]
15:37 avhe[d]: dwlsalmeida[d]: if I'm reading this right, the layout is something like this:
```c
#include <stdint.h>

struct nvdec_tile {
    uint8_t width;
    uint8_t height;
    uint32_t tile_start;
    uint32_t tile_end;
} __attribute__((packed, aligned(0x10)));
```
15:38 avhe[d]: the layout is a bit strange, but definitely aligned to 16
15:38 avhe[d]: that might have been why you were thrown off
15:39 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1330924638695460904/image.png?ex=678fbfa9&is=678e6e29&hm=0fba8771963c6d420ff7ef0eaf066a2984b0a1de71e26285de38bc407b0d423b&
15:39 avhe[d]: Anyway on Tegra the library is passed data like this (from https://developer.nvidia.com/docs/drive/drive-os/archives/6.0.3/linux/sdk/api_reference/nvmedia_2mm_2inc_2public_2nvmedia__common__decode_8h_source.html)
15:40 avhe[d]: As for the tile_info, it's just the tile start/end positions (I got this from vdpau: <https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vdpau_av1.c#L290-L291>)
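If tile_info really is just per-tile start/end positions, filling it might look like the sketch below. It assumes each tile is described by an offset and size relative to the start of its tile group, plus the byte position (`group_base`) where that tile group was appended to the accumulated bitstream buffer. All names (`tile_extent`, `fill_tile_info`, `group_base`) are hypothetical, and the exclusive end (`start + size`) is an assumption; this is not copied from FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tile description, as a parser might report it. */
struct tile_extent {
    uint32_t offset; /* bytes from the start of the tile group */
    uint32_t size;   /* bytes of tile data */
};

/* Write the start/end byte positions of tiles first_tile..first_tile+count-1
 * into tile_info as interleaved pairs: [2*n] = start, [2*n + 1] = end.
 * ASSUMPTION: end is exclusive (start + size). */
static void fill_tile_info(uint32_t *tile_info, size_t first_tile,
                           const struct tile_extent *tiles, size_t count,
                           uint32_t group_base)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = group_base + tiles[i].offset;
        assert(tiles[i].size <= UINT32_MAX - start); /* end must not overflow */
        tile_info[2 * (first_tile + i)]     = start;
        tile_info[2 * (first_tile + i) + 1] = start + tiles[i].size;
    }
}
```

For two tiles of 10 and 20 bytes in a tile group placed at byte 100, this yields the pairs (100, 110) and (110, 130).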
15:44 dwlsalmeida[d]: I am getting a zeroed out buffer
15:44 dwlsalmeida[d]: A bit puzzled to be honest
15:45 dwlsalmeida[d]: Nothing on dmesg either
15:46 dwlsalmeida[d]: I tried uploading some random data to the tile buffer and that doesn’t make any difference
15:47 dwlsalmeida[d]: Submitting a bogus address for the YUV offsets doesn't do anything either
15:47 dwlsalmeida[d]: So the hardware doesn’t even attempt to decode anything it seems
15:51 avhe[d]: Are you reproducing 1:1 what the blob writes to these maps? https://discord.com/channels/1033216351990456371/1034184951790305330/1329132422113001575
15:53 avhe[d]: I actually got started on re'ing the AV1 code a few days ago (that's why I was able to figure out the tile size layout quickly), and it seems there is quite a bit of data to fill out
15:56 dwlsalmeida[d]: avhe[d]: These look different here
15:56 dwlsalmeida[d]: Hold on
15:56 avhe[d]: At a glance, I'm seeing stuff using data from the `AV1_GLOBAL_MODEL` buffer, the `AV1_SET_FILM_GRAIN`, and the `PROB_TAB_WRITE`/`PROB_TAB_READ` maps, on the cpu
15:57 avhe[d]: "using" i.e. reading or writing
15:58 avhe[d]: actually there's a situation where a memcpy happens from one of the PROB_TAB_WRITE maps to one of the PROB_TAB_READ maps
15:59 avhe[d]: might be something you want to do with the copy engine instead
15:59 dwlsalmeida[d]: what I get is https://pastebin.com/afeH8Tcv
15:59 avhe[d]: I also have the buffer sizing/allocation figured out if you need
16:00 dwlsalmeida[d]: avhe[d]: this is the same old "probability table on the cpu" discussion we had last week
16:00 avhe[d]: yeah but in this case it's just a simple copy... so pretty easy to do on the gpu
16:00 avhe[d]: but I don't have the full view yet
16:01 dwlsalmeida[d]: do you feel like having a look at my code so far? maybe you can spot something I didn't, who knows
16:01 dwlsalmeida[d]: I also have the code to dump the pic params in the tracer
16:02 avhe[d]: Sure but I doubt I'll spot anything
16:04 avhe[d]: dwlsalmeida[d]: So it seems the only potentially meaningful difference is that the cuvid library pushes something to SET_COLOC_DATA_OFFSET, but the tvmr one doesn't?
16:06 dwlsalmeida[d]: not the only one, also missing `SET_SUB_SAMPLE_*` stuff, `SET_HISTOGRAM_OFFSET`
16:06 avhe[d]: tvmr also has something at SET_HISTOGRAM_OFFSET but afaik this is not required (I believe they use histogram data in the VIC engine in some situations, to do some fancy color correction stuff)
16:07 avhe[d]: oh hmm I think the `SET_SUB_SAMPLE_` stuff has to do with drm/encryption (at least the library takes the address of both these maps when setting up decryption params)
16:28 dwlsalmeida[d]: avhe[d]: ok so the GStreamer code is
16:28 dwlsalmeida[d]: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/8316
16:28 dwlsalmeida[d]: the av1 nvk branch is https://gitlab.freedesktop.org/dwlsalmeida/mesa/-/tree/nvk-vulkan-video-av1?ref_type=heads
16:29 dwlsalmeida[d]: The code to dump the pic params in the tracer is https://pastebin.com/WmCM6c7m
16:29 dwlsalmeida[d]: although I think you don't need that, I made sure that the pic params are similar enough, and as I said, the YUV buffers are not touched at all
16:32 avhe[d]: Nothing written to nvdec_status_s?
16:35 dwlsalmeida[d]: well...good idea
16:35 dwlsalmeida[d]: that has been so useless thus far that I didn't even bother
16:35 dwlsalmeida[d]: but let's check that anyways
16:39 avhe[d]: Well it would confirm whether the job was successfully queued
16:39 avhe[d]: (which I expect it does but you never know)
16:42 dwlsalmeida[d]: If you mess with stream_len, you get an MMU fault
16:42 dwlsalmeida[d]: So I assume this part is actually working
16:44 avhe[d]: So I guess that means that the microcode actually kicks off the engines?
16:46 avhe[d]: Anyway I'd try looking within the AV1_GLOBAL_MODEL, PROB_TAB_WRITE, PROB_TAB_READ maps, seeing if you're missing data
16:50 dwlsalmeida[d]: avhe[d]: I guess this is basically what the “execute” method does?
17:00 avhe[d]: not sure whether this starts up the controller or the actual hardware
17:42 dwlsalmeida[d]: avhe[d]: error_status == 11
17:46 dwlsalmeida[d]: I guess the NVIDIA people said in the past that this is just a bitfield with some cryptic name for the bits
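If error_status is indeed a plain bitfield, 11 is 0b1011, i.e. bits 0, 1 and 3 set. A small helper for splitting such a value into bit indices (for example when logging from the tracer) is sketched below; the helper name is hypothetical, and since the bit names are not known here it only reports indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the indices of the set bits of status (lowest first) into out,
 * writing at most max_out entries, and return the total number of set bits.
 * Bit meanings are unknown; this only splits the value up. */
static size_t status_set_bits(uint32_t status, unsigned *out, size_t max_out)
{
    assert(out != NULL || max_out == 0);
    size_t n = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if (status & (UINT32_C(1) << bit)) {
            if (n < max_out)
                out[n] = bit;
            n++;
        }
    }
    return n;
}
```

With status 11 this reports three bits: 0, 1 and 3. Comparing these indices across runs (e.g. with error concealment toggled) is one way to narrow down which bit is which.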
17:50 dwlsalmeida[d]: What do you think about this tile layout?
17:50 dwlsalmeida[d]: https://elixir.bootlin.com/linux/v6.12.6/source/drivers/media/platform/verisilicon/rockchip_vpu981_hw_av1_dec.c#L566
17:52 avhe[d]: dwlsalmeida[d]: from what I see, x0 and y0 don't match what nvdec needs, but start and end do
17:53 avhe[d]: it is also 16 bytes long
17:54 dwlsalmeida[d]: It doesn't seem to match any of the structs they have in the header:
17:54 dwlsalmeida[d]: typedef struct _AV1TileInfo_OLD
17:54 dwlsalmeida[d]: {
17:54 dwlsalmeida[d]: unsigned char width_in_sb;
17:54 dwlsalmeida[d]: unsigned char height_in_sb;
17:54 dwlsalmeida[d]: unsigned char tile_start_b0;
17:54 dwlsalmeida[d]: unsigned char tile_start_b1;
17:54 dwlsalmeida[d]: unsigned char tile_start_b2;
17:54 dwlsalmeida[d]: unsigned char tile_start_b3;
17:54 dwlsalmeida[d]: unsigned char tile_end_b0;
17:54 dwlsalmeida[d]: unsigned char tile_end_b1;
17:54 dwlsalmeida[d]: unsigned char tile_end_b2;
17:54 dwlsalmeida[d]: unsigned char tile_end_b3;
17:54 dwlsalmeida[d]: unsigned char padding[6];
17:54 dwlsalmeida[d]: } AV1TileInfo_OLD;
17:54 dwlsalmeida[d]: typedef struct _AV1TileInfo
17:54 dwlsalmeida[d]: {
17:54 dwlsalmeida[d]: unsigned char width_in_sb;
17:54 dwlsalmeida[d]: unsigned char padding_w;
17:54 dwlsalmeida[d]: unsigned char height_in_sb;
17:54 dwlsalmeida[d]: unsigned char padding_h;
17:54 avhe[d]: dwlsalmeida[d]: So we know one of these bits is either MB_SYNTAX or EC_DONE https://discord.com/channels/1033216351990456371/1034184951790305330/1315672391346815046
17:54 dwlsalmeida[d]: } AV1TileInfo;
17:54 dwlsalmeida[d]: typedef struct _AV1TileStreamInfo
17:54 dwlsalmeida[d]: {
17:54 dwlsalmeida[d]: unsigned int tile_start;
17:54 dwlsalmeida[d]: unsigned int tile_end;
17:54 dwlsalmeida[d]: unsigned char padding[8];
17:54 f_: dwlsalmeida[d]: don't paste here use a pastebin........
17:54 dwlsalmeida[d]: } AV1TileStreamInfo;
17:54 avhe[d]: you can try disabling error concealment to figure out which is whi
17:55 dwlsalmeida[d]: f_: hey man I was under the impression that small pastes wouldn't be so bad, you can see above that I've used pastebin today
17:55 avhe[d]: dwlsalmeida[d]: actually, it matches AV1TileInfo_OLD?
17:56 avhe[d]: I guess b0-3 stands for byte 0-3
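Taking the guess that b0-b3 are bytes 0-3 of a 32-bit position, an entry of the `_OLD` layout could be filled byte by byte as below. The field list is copied from the AV1TileInfo_OLD paste above (the reserved-looking `_AV1TileInfo_OLD` tag is dropped); the helper `make_tile_info_old` is hypothetical, and b0 being the least significant byte is an assumption. That byte order is what a plain `uint32_t` store produces on little-endian CPUs, which would also make this layout identical to the packed `nvdec_tile` struct.

```c
#include <assert.h>
#include <stdint.h>

/* Fields copied from the AV1TileInfo_OLD definition pasted above. */
typedef struct {
    unsigned char width_in_sb;
    unsigned char height_in_sb;
    unsigned char tile_start_b0;
    unsigned char tile_start_b1;
    unsigned char tile_start_b2;
    unsigned char tile_start_b3;
    unsigned char tile_end_b0;
    unsigned char tile_end_b1;
    unsigned char tile_end_b2;
    unsigned char tile_end_b3;
    unsigned char padding[6];
} AV1TileInfo_OLD;

/* Only byte members, so no compiler padding: 2 + 4 + 4 + 6 = 16 bytes. */
_Static_assert(sizeof(AV1TileInfo_OLD) == 16, "AV1TileInfo_OLD must be 16 bytes");

/* Hypothetical helper. ASSUMPTION: b0 is the least significant byte. */
static AV1TileInfo_OLD make_tile_info_old(unsigned width_sb, unsigned height_sb,
                                          uint32_t start, uint32_t end)
{
    assert(width_sb <= 0xff && height_sb <= 0xff); /* must fit in one byte */
    AV1TileInfo_OLD t = {0}; /* also zeroes the padding */
    t.width_in_sb   = (unsigned char)width_sb;
    t.height_in_sb  = (unsigned char)height_sb;
    t.tile_start_b0 = (unsigned char)(start & 0xff);
    t.tile_start_b1 = (unsigned char)((start >> 8) & 0xff);
    t.tile_start_b2 = (unsigned char)((start >> 16) & 0xff);
    t.tile_start_b3 = (unsigned char)((start >> 24) & 0xff);
    t.tile_end_b0   = (unsigned char)(end & 0xff);
    t.tile_end_b1   = (unsigned char)((end >> 8) & 0xff);
    t.tile_end_b2   = (unsigned char)((end >> 16) & 0xff);
    t.tile_end_b3   = (unsigned char)((end >> 24) & 0xff);
    return t;
}
```

Writing the bytes explicitly keeps the result independent of host endianness, at the cost of some verbosity compared to storing a `uint32_t` directly.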
17:57 dwlsalmeida[d]: well not really, because width and height in sb are `u8`
17:58 dwlsalmeida[d]: but in the kernel driver, there's a padding
17:58 dwlsalmeida[d]: maybe that's to account for the compiler padding?
17:59 avhe[d]: yeah, looks like nvidia moved the padding around
17:59 avhe[d]: in any case the hantro code and nvdec struct you found pretty much confirm what I re'd above
18:00 avhe[d]: I'm curious about the _OLD thing though, maybe that changed in later gens?
18:03 dwlsalmeida[d]: f_: how much does it take to break the bot? Would a couple of lines do it already?
18:03 f_: dwlsalmeida[d]: it's not a question of if a bot breaks
18:04 f_: but it's spammy to paste lines directly in the chat, on IRC that is
18:04 f_: and frowned upon
18:04 f_: Now, I don't know what discord users do in that regard
18:13 dwlsalmeida[d]: avhe[d]: what about this padding[6] business at the end?
18:16 avhe[d]: this is what I translated to `__attribute__((aligned(0x10)))`
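The equivalence between an explicit `padding[6]` and `__attribute__((packed, aligned(0x10)))` can be checked at compile time, as in the sketch below. The struct and helper names are hypothetical, and the attribute form is GCC/Clang-specific. Both forms come out at 16 bytes with `tile_start` at offset 2 and `tile_end` at offset 6. One difference remains: the explicit form has alignment 1, so only the attribute form also requires the buffer itself to start on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Explicit form, as in AV1TileInfo_OLD: byte members only, alignment 1. */
struct tile_explicit {
    uint8_t width_in_sb;
    uint8_t height_in_sb;
    uint8_t tile_start[4];
    uint8_t tile_end[4];
    uint8_t padding[6];
};

/* Attribute form: packed removes the padding before the 32-bit members,
 * aligned(0x10) rounds the size up to 16 and sets 16-byte alignment. */
struct tile_attr {
    uint8_t  width;
    uint8_t  height;
    uint32_t tile_start;
    uint32_t tile_end;
} __attribute__((packed, aligned(0x10)));

_Static_assert(sizeof(struct tile_explicit) == 16, "explicit form is 16 bytes");
_Static_assert(sizeof(struct tile_attr) == 16, "attribute form is 16 bytes");
_Static_assert(offsetof(struct tile_explicit, tile_start) ==
               offsetof(struct tile_attr, tile_start), "same tile_start offset");
_Static_assert(offsetof(struct tile_explicit, tile_end) ==
               offsetof(struct tile_attr, tile_end), "same tile_end offset");

/* Address of entry i when viewing a tile buffer through the attribute form,
 * which expects the buffer base to be 16-byte aligned. */
static struct tile_attr *tile_attr_at(void *base, size_t i)
{
    assert(((uintptr_t)base & 0xf) == 0);
    return (struct tile_attr *)base + i;
}
```

Either way, consecutive entries are 16 bytes apart, so entry 2 starts at byte 32 of the buffer.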
18:21 dwlsalmeida[d]: avhe[d]: this crashes FYI
18:22 avhe[d]: like, segfaults?
18:22 dwlsalmeida[d]: no, MMU fault
18:23 dwlsalmeida[d]: let's try the OLD one
18:23 avhe[d]: well... at least it's doing something now?
19:55 dwlsalmeida[d]: well, apparently the PROB_TAB_READ buffer does have data
19:55 dwlsalmeida[d]: at least the tracer will dump it
19:56 dwlsalmeida[d]: I'll try loading the default prob table into it
19:56 dwlsalmeida[d]: the global model buffer too..
19:57 dwlsalmeida[d]: PROB_TAB_WRITE is empty
21:01 tiredchiku[d]: can someone on arch confirm that they're able to load up a zink+nvk plasma wayland session?
21:02 tiredchiku[d]: Plasma 6.2.5
22:45 dwlsalmeida[d]: ok, loading the default probs in PROB_TAB_READ actually works
22:46 dwlsalmeida[d]: I mean, it now produces some gray output, but at least it's not all zeroes anymore
23:31 arrigopapito: 29+3-52=-20 and -20+29+3=12 -20+29+29=38 38-12-12=14 and 38-52=-14 that situation one wants to land. where a constant based logics reach both signs of the same value. 14 and -14, now finally 14 can be cancelled or eliminated, however 29+29-52=6 38-6-6=26 and hence 26-14=12 , and final answer is 29-12+26=43+26=69