04:26airlied[d]: skeggsb9778[d]: dwlsalmeida[d] so I wonder should we expose a flag to userspace so it knows nvdec might work, or should we just create a context on driver start to check for exposing extensions?
04:28skeggsb9778[d]: just bump the driver minor version or patchlevel?
04:30airlied[d]: hmm I wonder do we even read those, but yeah that might be sufficient, or a getparam
04:32airlied[d]: ah yeah we bumped it for vmbind, so perhaps that'll work
04:39airlied[d]: dwlsalmeida[d]: I think the userspace patches need to be a bit further along so we can validate the kernel change properly as uAPI instead of hacks
04:40airlied[d]: and I wonder if we should consider what enabling encode would look like at the same time
04:49skeggsb9778[d]: yeah, may as well expose nvdec/nvenc *and* ofa
04:49skeggsb9778[d]: nvjpg too while you're at it
04:49skeggsb9778[d]: they don't require anything special afaik
04:49skeggsb9778[d]: RM handles all that now
05:03skeggsb9778[d]: (you might want to test allocating the class, and sending a SET_OBJECT mthd for each at least, to be sure)
05:03skeggsb9778[d]: i've done that from the kernel already though, so it should work
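A minimal sketch of how userspace might gate the video engines on a bumped driver version, as suggested above. The struct and function names are hypothetical and nothing here is taken from the actual patches. In real code the triple would presumably come from libdrm's `drmGetVersion()` (`version_major`, `version_minor`, `version_patchlevel`), and the threshold would be whatever version the kernel change ends up bumping to.

```c
#include <stdbool.h>

/* Hypothetical container for a DRM driver version triple. */
struct sketch_drm_version {
   int major;
   int minor;
   int patchlevel;
};

/* Returns true when `have` is the same as or newer than `want`,
 * comparing major first, then minor, then patchlevel. */
static bool
sketch_version_at_least(struct sketch_drm_version have,
                        struct sketch_drm_version want)
{
   if (have.major != want.major)
      return have.major > want.major;
   if (have.minor != want.minor)
      return have.minor > want.minor;
   return have.patchlevel >= want.patchlevel;
}
```

A driver would call this once at device creation and only advertise the video extensions when it returns true. A getparam-style query would avoid tying the feature to a version number, at the cost of new uAPI.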
05:03airlied[d]: what is ofa for?
05:04skeggsb9778[d]: not a clue, i've never looked it up except a quick google a year or so ago - "optical flow acceleration" iirc
05:08airlied[d]: ah might be used in VR
05:09tiredchiku[d]: optical flow API is also used for DLSS FrameGen afaik
05:23babblebones[d]: airlied[d]: Not on Linux at least; SteamVR's optical flow dumps were never implemented on the vulkan vrcomp, and monado doesn't use them, although maybe one day it could for framegen
05:23babblebones[d]: There's technically better depth based approaches but apps need to stop being dumb and submit depth to XR
09:30avhe[d]: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s91024-nvidia-optical-flow-sdk-in-turing-gpus.pdf
09:30avhe[d]: if i understand this right, OFA is the motion estimation block inside NVENC exposed as a separate engine?
09:32avhe[d]: and yeah, applications are basically framegen and video classification
10:17mohamexiety[d]: Yeah the big usecase I know of is DLSS framegen
10:34dwlsalmeida[d]: airlied[d]: I need to hack something together for ampere and Ada too
10:35airlied[d]: I think they should work fine with that patch
10:35dwlsalmeida[d]: As soon as I post that, the people working on GStreamer will try to test it, and they might not be on Turing
10:35dwlsalmeida[d]: Oh great
10:38airlied[d]: Pretty sure I wrote initial code on an ada
20:07airlied[d]: dwlsalmeida[d]: have you rebased onto mesa master?
23:48dwlsalmeida[d]: airlied[d]: I am working on the Rust port, now that I got the C stuff to work reasonably well
23:51dwlsalmeida[d]: I think I am failing to upload the push buffer somehow, what is this `mem_offset` thing?
23:51dwlsalmeida[d]: VkResult map_result = nvkmd_mem_map(mem, &dev->vk.base,
23:51dwlsalmeida[d]:                                     NVKMD_MEM_MAP_RD, NULL,
23:51dwlsalmeida[d]:                                     &map);
23:51dwlsalmeida[d]: if (map_result == VK_SUCCESS) {
23:51dwlsalmeida[d]:    struct nv_push push = {
23:51dwlsalmeida[d]:       .start = mem->map + mem_offset,
23:51dwlsalmeida[d]:       .end = mem->map + mem_offset + p->range,
23:51dwlsalmeida[d]:    };
23:53dwlsalmeida[d]: I am copying stuff to `mem->map` directly:
23:53dwlsalmeida[d]: VkResult
23:53dwlsalmeida[d]: nvk_cmd_buffer_append_rust_push(struct nvk_cmd_buffer *cmd,
23:53dwlsalmeida[d]:                                 uint32_t *data,
23:53dwlsalmeida[d]:                                 uint32_t dw_count)
23:53dwlsalmeida[d]: {
23:53dwlsalmeida[d]:    VkResult result;
23:53dwlsalmeida[d]:    if (!cmd->rust.mem) {
23:53dwlsalmeida[d]:       result = nvk_cmd_buffer_alloc_mem(cmd, false, &cmd->rust.mem);
23:53dwlsalmeida[d]:       if (result != VK_SUCCESS)
23:53dwlsalmeida[d]:          return result;
23:53dwlsalmeida[d]:    }
23:53dwlsalmeida[d]:    if (dw_count > cmd->rust.mem->mem->size_B / 4) {
23:53dwlsalmeida[d]:       nvk_cmd_buffer_flush_push(cmd);
23:53dwlsalmeida[d]:    }
23:53dwlsalmeida[d]:    memcpy(cmd->rust.mem->mem->map, data, dw_count * 4);
23:53dwlsalmeida[d]:    cmd->rust.dw_count += dw_count;
23:53dwlsalmeida[d]:    return VK_SUCCESS;
23:53dwlsalmeida[d]: }
23:59airlied[d]: seems about right, but I'm not 100% sure on all the new nvkmd code, as long as you are only putting one push into one allocation you don't need the offset I don't think
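Reading the first snippet above, `mem_offset` appears to be the byte offset of one push inside the mapped allocation, with `p->range` as its length in bytes. The sketch below illustrates that reading with hypothetical stand-in types (`sketch_mem`, `sketch_push`), not the real nvkmd or `nv_push` structures. It also shows an append helper that writes after the dwords already used, which is one way several pushes could share an allocation. With a single push per allocation the offset is simply 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a CPU-mapped buffer; not the real nvkmd_mem. */
struct sketch_mem {
   uint8_t *map;    /* CPU pointer to the start of the mapping */
   size_t size_B;   /* total size of the mapping in bytes */
};

/* Hypothetical view of one push, mirroring the .start/.end pair above. */
struct sketch_push {
   const uint32_t *start;
   const uint32_t *end;
};

/* Build a view of range_B bytes that begins offset_B bytes into the
 * mapping. Fails if the range is not dword-aligned or leaves the mapping. */
static bool
sketch_push_view(const struct sketch_mem *mem, size_t offset_B,
                 size_t range_B, struct sketch_push *out)
{
   assert(mem->map != NULL);
   if (offset_B % 4 != 0 || range_B % 4 != 0)
      return false;
   if (offset_B > mem->size_B || range_B > mem->size_B - offset_B)
      return false;

   out->start = (const uint32_t *)(mem->map + offset_B);
   out->end = (const uint32_t *)(mem->map + offset_B + range_B);
   return true;
}

/* Copy dw_count dwords after the *dw_used dwords already written, so
 * earlier data is not overwritten. Fails when the data does not fit. */
static bool
sketch_append(struct sketch_mem *mem, size_t *dw_used,
              const uint32_t *data, size_t dw_count)
{
   size_t capacity_dw = mem->size_B / 4;
   if (*dw_used > capacity_dw || dw_count > capacity_dw - *dw_used)
      return false;

   memcpy(mem->map + *dw_used * 4, data, dw_count * 4);
   *dw_used += dw_count;
   return true;
}
```

Under this reading, a second push appended after two dwords would be described by an offset of 8 bytes and a range equal to its own length, whereas copying every push to the start of the mapping only works while each allocation holds exactly one push.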