00:11 dwlsalmeida[d]: dwlsalmeida[d]: ok this tile size business is definitely something that must be figured out... hardcoding `0x00030003` makes it work.
00:11 dwlsalmeida[d]: Looking at your code, i.e.:
00:11 dwlsalmeida[d]: /* the shift converts CTB units into 16-sample units: log2(ctb_size) - 4 */
00:11 dwlsalmeida[d]: uint16_t *tile_thing = sizes + 0x380;
00:11 dwlsalmeida[d]: if (pps->uniform_spacing_flag) {
00:11 dwlsalmeida[d]:    /* uniform spacing: store cumulative column/row boundaries */
00:11 dwlsalmeida[d]:    for (i = 0; i < pps->num_tile_columns; ++i)
00:11 dwlsalmeida[d]:       *tile_thing++ = (i + 1) * sps->ctb_width / pps->num_tile_columns <<
00:11 dwlsalmeida[d]:          (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
00:11 dwlsalmeida[d]:    for (i = 0; i < pps->num_tile_rows; ++i)
00:11 dwlsalmeida[d]:       *tile_thing++ = (i + 1) * sps->ctb_height / pps->num_tile_rows <<
00:11 dwlsalmeida[d]:          (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
00:11 dwlsalmeida[d]: } else {
00:11 dwlsalmeida[d]:    /* explicit spacing: running sum of per-tile widths/heights from the PPS */
00:11 dwlsalmeida[d]:    sum = 0;
00:11 dwlsalmeida[d]:    for (i = 0; i < pps->num_tile_columns; ++i)
00:11 dwlsalmeida[d]:       *tile_thing++ = (sum += pps->column_width[i]) <<
00:11 dwlsalmeida[d]:          (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
00:11 dwlsalmeida[d]:    sum = 0;
00:11 dwlsalmeida[d]:    for (i = 0; i < pps->num_tile_rows; ++i)
00:11 dwlsalmeida[d]:       *tile_thing++ = (sum += pps->row_height[i]) <<
00:11 dwlsalmeida[d]:          (sps->log2_diff_max_min_coding_block_size + sps->log2_min_cb_size - 4);
00:12 dwlsalmeida[d]: }
00:12 dwlsalmeida[d]: /* per-tile (width, height) pairs in CTB units, row-major */
00:12 dwlsalmeida[d]: for (i = 0; i < pps->num_tile_rows; ++i) {
00:12 dwlsalmeida[d]:    for (j = 0; j < pps->num_tile_columns; ++j) {
00:12 dwlsalmeida[d]:       sizes[0] = pps->column_width[j];
00:12 dwlsalmeida[d]:       sizes[1] = pps->row_height[i];
00:12 dwlsalmeida[d]:       sizes += 2;
00:12 dwlsalmeida[d]:    }
00:12 dwlsalmeida[d]: }
00:12 dwlsalmeida[d]: At least the middle part looks like the spec here?
00:12 dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1320906873876250705/Screenshot_2024-12-23_at_21.12.10.png?ex=676b4de3&is=6769fc63&hm=6c28523e0b86cefdfbf49e25899c81af67958c14536ffddd7a5e83ea7c2f28f8&
00:12 dwlsalmeida[d]: except for this `0x380` offset, and the last for loop, which I haven't figured out yet
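For reference, the uniform-spacing rule from the HEVC spec (6.5.1), which the screenshot appears to show, derives per-tile widths as below; the code above instead stores the cumulative boundary (i + 1) * W / N directly, scaled into 16-sample units. A sketch in the spec's own variable names:

    /* HEVC 6.5.1 uniform tile spacing: per-tile column widths in CTBs.
     * Row heights follow the same pattern with PicHeightInCtbsY and
     * num_tile_rows_minus1. */
    for (i = 0; i <= num_tile_columns_minus1; i++)
       colWidth[i] = ((i + 1) * PicWidthInCtbsY) / (num_tile_columns_minus1 + 1) -
                     ( i      * PicWidthInCtbsY) / (num_tile_columns_minus1 + 1);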
00:14 dwlsalmeida[d]: Also, what is this `clc9b0::SetIntraTopBufOffset`? It shows up in the tracer, but I don't know the right size, nor whether I should be providing any data vs. treating it as an opaque buffer
00:17 dwlsalmeida[d]: If it's not there we get an MMU fault, so it's being accessed somehow.
00:17 dwlsalmeida[d]: I noticed that they provide a size for the AV1 one:
00:17 dwlsalmeida[d]: // AV1 Intra Top buffer
00:17 dwlsalmeida[d]: #define AV1_INTRA_TOP_BUF_SIZE NVDEC_ALIGN(8*8192)
00:17 dwlsalmeida[d]: But it's unclear if I can steal that for HEVC too
07:08 avhe[d]: dwlsalmeida[d]: i honestly don't remember what this is about (going by how i named it, it looks like i never did). at a glance, it looks like tile boundaries maybe?
07:16 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1321013509546311680/image.png?ex=676bb133&is=676a5fb3&hm=b408bd848e429fed903c5b524cf24dc7d0e114870c8bf0a2e076b67ce6f29b72&
07:16 avhe[d]: dwlsalmeida[d]: this doesn't exist on maxwell, that's why i don't have that in my code. looking at the library for xavier (ie. volta), it seems like a per-decoder-instance buffer with a fixed width of 0x10000. it gets cpu-mapped depending on some flag, but i can't see anything writing to it
07:16 avhe[d]: so, same size as the av1 buffer
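If that observation holds, a minimal sketch of sizing the HEVC buffer the same way; the HEVC_INTRA_TOP_BUF_SIZE name is made up here, and NVDEC_ALIGN and the 8*8192 (= 0x10000) size are taken from the AV1 define above:

    // Hypothetical HEVC counterpart, assuming the per-instance 0x10000 size
    // observed on Volta carries over unchanged.
    #define HEVC_INTRA_TOP_BUF_SIZE NVDEC_ALIGN(8*8192)  // 8*8192 == 0x10000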
12:31 Green: Is AD103 usable with no firmware blobs, such as with linux-libre?
13:59 tiredchiku[d]: openrm supports VM_BIND, right?
14:27 gfxstrand[d]: I assume it does in some form. IDK how the API works, though.
14:32 tiredchiku[d]: different question then, does NVK support running _without_ VM_BIND?
14:32 gfxstrand[d]: No. Not anymore
14:33 gfxstrand[d]: Well, sort-of.
14:33 gfxstrand[d]: You could probably make some compute stuff work without it.
14:33 tiredchiku[d]: :myy_TinyThink:
14:33 tiredchiku[d]: working on fixing up yusuf's openrm winsys code
14:34 gfxstrand[d]: But for full image support it's a hard requirement.
14:34 tiredchiku[d]: and `nouveau_ws_device` has a `has_vm_bind` bool
14:34 gfxstrand[d]: Oh, that's mostly so that we can detect old kernels and fail to enumerate if we see one.
14:34 tiredchiku[d]: I thought I'd default it to true for now, but couldn't find info on openrm supporting it
14:34 tiredchiku[d]: oh
14:35 mohamexiety[d]: you can probably safely assume that openrm has something similar
14:35 mohamexiety[d]: just not it specifically / it won't have the same API
14:35 gfxstrand[d]: I'm sure openrm supports it in some form. It's impossible to implement Vulkan on Nvidia without VM_BIND.
14:35 mohamexiety[d]: yeah, also CUDA etc
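For context, a minimal sketch of what a VM_BIND-shaped interface boils down to; the function names here are invented for illustration and match neither the nouveau uAPI nor whatever openrm exposes:

    /* Invented names, illustration only: the essence is explicit GPU VA
     * management, which Vulkan sparse resources depend on. */
    uint64_t va = vm_reserve_va(dev, size, align); /* reserve a GPU VA range   */
    vm_bind(dev, va, bo, 0, size);                 /* map a BO into that range */
    /* ... later, unmap (or rebind a different BO at the same VA) without the
     * application ever seeing the address change: */
    vm_unbind(dev, va, size);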
14:36 gfxstrand[d]: Also, I wouldn't bother porting the nouveau_ws stuff. It's pretty much only helpful for dealing with a bit of DRM weirdness.
14:36 tiredchiku[d]: not doing all of it, no
14:37 tiredchiku[d]: just the very basics that nvkmd needs
14:38 tiredchiku[d]: like, nvkmd calls `nvkmd_nouveau_try_create_pdev()`, which then calls `nouveau_ws_device_new()`
14:38 tiredchiku[d]: stuff like that
14:38 gfxstrand[d]: Yeah, point is that that abstraction may not be useful for openrm at all.
14:38 tiredchiku[d]: tiredchiku[d]: fwiw, notthatclippy[d]
14:38 tiredchiku[d]: gfxstrand[d]: ..huh
14:39 gfxstrand[d]: It's kinda nice to have an extra layer there for nouveau.ko because of dma-buf import/export rules and the way we enumerate devices.
14:41 tiredchiku[d]: so you'd recommend plugging nvkmd directly into openrm?
14:41 tiredchiku[d]: and not bothering with going through winsys?
14:41 gfxstrand[d]: So like nouveau_ws_device is useful because we need to create one for the nvkmd_phys_dev in order to fill out nv_device_info and we also need one for nvkmd_dev. It's helpful to have the other abstraction so we can share code between the two.
14:42 tiredchiku[d]: I see
14:42 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1321125823180181514/0PM1yxK.png?ex=676c19cd&is=676ac84d&hm=cd9eaaf6204bb2a588e4ee2620a628df39e1b8741194dfa7d5177eb85bdfc381&
14:42 tiredchiku[d]: something like this then, that directly calls to rmapi
14:42 gfxstrand[d]: But that's just because nouveau.ko has no way to give us class information without creating a full context.
14:42 tiredchiku[d]: yeah, did see the tmp_ctx stuff
14:43 gfxstrand[d]: tiredchiku[d]: Yeah. I'd start there and see how bad the mismatch is and what internal abstractions we need to build.
14:44 gfxstrand[d]: But I wouldn't assume that the annoyances with nouveau.ko will match openrm.
14:44 tiredchiku[d]: okay!
14:44 tiredchiku[d]: time to throw out everything I did yesterday 😅
14:44 tiredchiku[d]: https://tenor.com/view/throw-out-rage-parks-and-rec-nick-offerman-ron-swanson-gif-17688925
14:45 gfxstrand[d]: I'm sure openrm will have annoyances. Don't get me wrong there. 😅 Just that they'll be different ones.
14:45 tiredchiku[d]: yeah, that's fair
14:45 tiredchiku[d]: I was thinking I'd set up an openrm winsys layer and just run conditionals to call them in nvkmd
14:45 tiredchiku[d]: but I suppose this makes more sense
14:45 gfxstrand[d]: Go ahead and keep what you've got. Just copy+paste it in.
14:46 gfxstrand[d]: tiredchiku[d]: Oh, you definitely want openrm to be its own NVKMD backend in its own subfolder.
14:47 gfxstrand[d]: But go ahead and copy+paste the nouveau folder if that helps you get started.
14:48 tiredchiku[d]: yee
14:48 tiredchiku[d]: thanks :)
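As a starting point, a hypothetical skeleton of what such a backend entry point might look like, mirroring the nvkmd_nouveau_try_create_pdev() path mentioned above; everything below except that naming pattern is invented, and real rmapi probing would replace the comments:

    #include <fcntl.h>
    #include <unistd.h>

    /* Hypothetical sketch only -- not the actual nvkmd API surface. */
    VkResult
    nvkmd_openrm_try_create_pdev(struct vk_object_base *log_obj,
                                 struct nvkmd_pdev **pdev_out)
    {
       int fd = open("/dev/nvidiactl", O_RDWR | O_CLOEXEC); /* rmapi control node */
       if (fd < 0)
          return VK_ERROR_INCOMPATIBLE_DRIVER;              /* openrm not present */

       /* query chip and class info via rmapi, fill out nv_device_info,
        * allocate and return the nvkmd_pdev ... */

       close(fd);
       return VK_SUCCESS;
    }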
19:47 airlied[d]: Pretty sure you need to use UVM interfaces for VM mgmt
19:55 notthatclippy[d]: You only need uvm if you want to service remote pagefaults I think. For simple mapping of sysmem to GPU and vidmem to CPU you can just go through rmapi.
19:56 notthatclippy[d]: I don't think mesa ever relies on seamless migration, HMM, and so on, so rmapi should be sufficient.
19:58 notthatclippy[d]: Also, keep in mind that mapping to GPU goes entirely through ioctl, but to map to CPU it needs a specific dance of ioctl and mmap commands. Best trace how the proprietary stack does it.
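A rough sketch of the shape of that dance; NV_ESC_RM_MAP_MEMORY does exist in open-gpu-kernel-modules, but which fd the ioctl targets, the parameter struct, and the flags are assumptions to verify against a trace, as suggested:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    /* Sketch only: the two-step ioctl + mmap pattern for CPU mappings. */
    void *map_to_cpu(int ctl_fd, size_t size)
    {
       /* a separate device fd backs the eventual mmap */
       int map_fd = open("/dev/nvidia0", O_RDWR | O_CLOEXEC);

       /* step 1: NV_ESC_RM_MAP_MEMORY ioctl on the control fd so RM sets up
        * the mapping -- parameter struct elided, so just marking the use: */
       (void)ctl_fd;

       /* step 2: materialize the CPU mapping on the device fd */
       return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
    }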
20:47 airlied[d]: Oh cool, I thought the NVIDIA Vulkan driver went via UVM for some stuff like sparse bindings but I haven't looked too closely