00:15Jasper[m]: airlied, just for my confirmation (in case I misread), did the "sync dma after setup is called" patch you made for this regression https://lore.kernel.org/nouveau/CAPM=9txeL+WxYuuGYyhGouXYC0=Y=YS=k=-4G74JbfT2mvkn2g@mail.gmail.com/ land in 6.12?
00:22airlied: seems to be
13:28dwlsalmeida[d]: Hey, a colleague is trying to build NVK:
13:28dwlsalmeida[d]: $ meson builddir -Dvulkan-drivers=nouveau -Dvideo-codecs=h264dec -Dprefix=pwd/builddir/install -Dgallium-drivers=nouveau
13:28dwlsalmeida[d]: /DEV/mesa/src/nouveau/winsys/./nouveau_bo.h:41:4: error: unknown type name 'atomic_uint_fast32_t'
13:28dwlsalmeida[d]: Unable to generate bindings: clang diagnosed error:/DEV/mesa/src/nouveau/winsys/./nouveau_bo.h:41:4: error: unknown type name 'atomic_uint_fast32_t'
13:28dwlsalmeida[d]: anyone knows what this is?
13:29dwlsalmeida[d]: it's ubuntu 22.04 btw,
14:19marysaka[d]: dwlsalmeida[d]: Seems to be this? https://gitlab.freedesktop.org/mesa/mesa/-/issues/11869
14:20dwlsalmeida[d]: marysaka[d]: thanks Mary!
15:09gfxstrand[d]: Ugh... Maybe bindgen isn't using C11?
15:10gfxstrand[d]: Or maybe it's trying to treat the header as C++? Tough if that were the case it should still work.
15:11gfxstrand[d]: I'm not particularly happy with how fragile bindgen has been. 🫤
15:14gfxstrand[d]: Jasper[m]: Tegra X1 is Maxwell so NVK itself should be fine. However, there are kernel synchronization issues which cause problems. GL seems to mostly dodge them for some reason but they hammer NVK.
15:46dwlsalmeida[d]: gfxstrand[d]: IMHO bindgen is so underfunded given how extremely important it is
15:47dwlsalmeida[d]: Or at least it looks underfunded
15:49karolherbst[d]: when using the meson builtin, bindgen should get the proper std flag passed in from the project setting... I hope 🙃
15:50karolherbst[d]: maybe ubuntu's bindgen is weird?
16:04gfxstrand[d]: dwlsalmeida[d]: Yeah, it's not great. But also, from the bug, it looks like LLVM was that thing that changed behavior on us. I can't say I'm surprised...
16:10karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11869#note_2715259 ahh yeah.. classic :ferrisUpsideDown:
16:10karolherbst[d]: I think this is one of the issues which only happens with distro provided bindgens
16:11karolherbst[d]: and it kinda has to do how bindgen loads clang
16:16lingm[d]: Was there any build error caused by bindgen which would have been reproducible with the crate version? To me it looks like we always see those kind of problems because Mesa and maybe the kernel are one of the very few projects using bindgen's command line interface.
16:18zmike[d]: I'm back
16:18zmike[d]: I assume everyone is using zink+nvk successfully now
16:19karolherbst[d]: I'm actually considering using the rust API moving forward, because there are a few things I want to use 🙃
16:20karolherbst[d]: but I also don't want to write 100 wrap files
16:20karolherbst[d]: the API allows you to specify the type for macro constants, which is something which is something I'd like to use
16:20lingm[d]: oh, that reminds me. there was an MR you might want to rb/ack/merge. sec, let me search
16:20karolherbst[d]: uhm...
16:21karolherbst[d]: ohh. mhh I have an idea
16:21karolherbst[d]: I could just do a mesa-bindgen project being all cargo as a dependency 😄
16:21lingm[d]: RiiR meson. with blackjack and build scripts
16:23lingm[d]: karolherbst[d]: this is the mr you made me think of <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32357>
16:24karolherbst[d]: yeah, but bindgen has more deps 😄
16:31dwlsalmeida[d]: lingm[d]: You mean cargo?
16:31dwlsalmeida[d]: This is a joke, do not kill me
16:52Booti: Hi, do you know what is the current status of the kernel module for GA107 (Ampere, laptop)?
17:23gfxstrand[d]: karolherbst[d]: We could just stop using C11 atomics and use p_atomic instead. I'm already using p_atomic for the decrement. C11 atomics aren't 100% C++ compatible anyway (which caused problems for unit tests in the past).
17:26gfxstrand[d]: Oh, I guess I'm not using p_atomic. I'm using the C11 things. But still...
17:27marysaka[d]: We do use p_atomic for query pools available part if I remember correctly
17:31mhenning[d]: Yeah, we've gotten enough different reports of it that it might be worthwhile to rewrite at least that usage
17:32mhenning[d]: I could put a patch together if nobody is typing it up already
17:33gfxstrand[d]: I also might be worth someone spinning up an Ubuntu VM or container and figuring out why it's failing. If bindgen is missing args, we should fix that.
17:33gfxstrand[d]: Before we start changing code.
17:34gfxstrand[d]: But I'm also a fan of dropping the C11 atomic given how much of a headache it's been. 🤷🏻♀️
17:34mhenning[d]: The reports generally indicate that it's fixed by changing installed clang header files / libclang versions
17:34mhenning[d]: so it sounds like maybe their compilers are slightly broken in a rather common way
17:35mhenning[d]: because of that I kind of doubt that a bindgen arg will fix it
17:35gfxstrand[d]: But does it compile the C/C++ code okay?
17:36mhenning[d]: Yeah, the reports only ever have errors in bindgen
17:38gfxstrand[d]: That's why I'm wondering if we just need to pass `-std=c2x`. Maybe the command line default isn't the same as the API default.
17:38mhenning[d]: eg. https://gitlab.freedesktop.org/mesa/mesa/-/issues/12197 uses gcc as the main c compiler and only reports issues with bindgen's clang
17:40mhenning[d]: gfxstrand[d]: Yeah, that's fair. That might be worth checking
17:55karolherbst[d]: gfxstrand[d]: I think the issue is that bindgen simply picks the wrong libclang.so file and then things break in subtle ways
17:56karolherbst[d]: and usually if you get a newer major version of clang while keeping bindgen around it starts to break
18:37redsheep[d]: zmike[d]: From what I understand tiredchiku[d] has mostly been able to get it working with plasma Wayland now
18:38redsheep[d]: Apparently it was a qt bug in conjunction with me still having the nvidia userspace installed causing the session to try to use the wrong icd which broke all of my efforts last year
18:41redsheep[d]: Having to do a workaround isn't ideal, but also people trying to use nvk+zink should avoid also having the nvidia drivers installed since you already have to work around them blocking nouveau loading so 🤷
18:42zmike[d]: yeah if you're doing both you have to be prepared to do all the switcharounds
19:01Jasper[m]: @_oftc_gfxstrand[d]:matrix.org thanks for the notes! I'm having many other issues with the gl and kernel driver already anyways so I'm ways away from having anything working at all
19:02Jasper[m]: I will try to fetch more logs this evening and try to get some stuff better set up w.r.t. kernel config and including gpu firmware into the initfs for example
19:03Jasper[m]: Along those lines, does anyone have a good config for a Tegra X1 device that may have a functioning GPU? I can at least have something to compare my current setup to.
19:24gfxstrand[d]: karolherbst[d]: If true, that's... aggregating.
19:24karolherbst[d]: yeah, so just use bindgen from cargo and you won't have to deal with that
19:24gfxstrand[d]: So you think it's using the C++ API from one version and then loading the wrong one?
19:26karolherbst[d]: I have no idea, at some point I felt adventurous and wanted to figure out what exactly went wrong, but then I saw it using this dynamic library loading thing and I lost interest, because it's kinda more of a mess than just "bindgen picking the wrong libclang". It's more an implication of how this lib loader crate is functioning
19:26karolherbst[d]: but I don't really know _why_ it's breaking things
19:26karolherbst[d]: but I think bindgen gets compiled against one version, then using another can break things
19:26karolherbst[d]: like...
19:26karolherbst[d]: it's weird
19:27karolherbst[d]: it makes sense once you check how this works from a development perspective
19:28karolherbst[d]: https://docs.rs/libloading/latest/libloading/ is the thing
19:28karolherbst[d]: and between that and bindgen is clang-sys
19:28mhenning[d]: Looking at the ubuntu package, it looks like bindgen statically links libclang - there's no dependency on any kind of libclang package nor any so in the package contents https://packages.ubuntu.com/oracular/bindgen
19:28karolherbst[d]: it's not linked
19:28karolherbst[d]: it's loaded at runtime
19:29mhenning[d]: Okay, but you can install bindgen without any libclang on ubuntu then
19:29karolherbst[d]: _though_ I think it _can_ be linked statically through clang-sys, but uhm.. not sure
19:29karolherbst[d]: I mean...
19:29karolherbst[d]: removing libclang fixes the problem for users
19:30karolherbst[d]: there is also the issue of clang resource files
19:30asdqueerfromeu[d]: dwlsalmeida[d]: NVK is still building fine on Chaotic-AUR (which uses an Arch chroot)
19:30mhenning[d]: My guess is that it's related to data files rather than the so itself
19:30karolherbst[d]: there are multiple layers of pain
19:30karolherbst[d]: yeah.. and maybe it tries to also fetch at the wrong location or something, but I _think_ it depends on libclang being installed and then depending on that it searches for resource files or something weird like that
19:30karolherbst[d]: it's really weird
19:31karolherbst[d]: what I do know is, that bindgen from cargo doesn't have that problem
19:31karolherbst[d]: and I'm sure if one bindgen is linked statically against libclang, it's that one
19:34mhenning[d]: Yeah, I'm playing with an ubuntu vm right now and I'm getting `/home/mel/comp/mesa/src/./util/memstream.h:31:10: fatal error: 'stdbool.h' file not found` from bindgen which is one I haven't seen before
19:35mhenning[d]: It is correctly passing `-std=c11` so that's not the issue
19:35karolherbst[d]: same issue, just different version or so
19:36karolherbst[d]: or maybe different file to be generated
19:36tiredchiku[d]: redsheep[d]: blisto[d] has it working on plasma wayland too
19:36blisto[d]: Yaha
19:37karolherbst[d]: I'm sure we could add some workaround to meson to make it more reliable, but I'm sure it's some debugging effort to have a more predictable model on how to prevent it from failing like this
19:42gfxstrand[d]: mhenning[d]: That makes sense. If it's statically linked against 17 but only the 18 files exist or something like that... That's a mess. 😩
19:44mhenning[d]: Yeah. The ubuntu mesa package build-depends on libclang-19-dev, which is presumably how they get it working there
19:45mhenning[d]: But actually can't find any indication in the bindgen package as to why that's the correct version to install - there's really no documentation on that from a user perspective
20:37karolherbst[d]: there is some magic stuff going on with the libloading crate
20:37karolherbst[d]: or rather clang-sys abstracts the version away
20:57mhenning[d]: Does meson have an equivalent of `compiler.compiles` for bindgen? It would be nice if we could at least give a useful error message on systems with the issue
21:05karolherbst[d]: yeah it's `rust.bindgen`
21:05karolherbst[d]: oh wait
21:05karolherbst[d]: you mean to check if something compiles...
21:06mhenning[d]: Yeah
21:16karolherbst[d]: it's fun. the bindgen I have from cargo does use my system libclang
21:17karolherbst[d]: `bindgen include/c99_alloca.h -- --version` or some other header, doesn't really matter
22:37dwlsalmeida[d]: hey, sadly I need to map a `VkBuffer` for the video stuff, are you guys OK if I store the `VkDeviceMemory` used in `nvk_bind_buffer_memory`
22:39airlied[d]: dwlsalmeida[d]: is it just to initialise it? or for something bad?
22:39dwlsalmeida[d]: I need to parse the slice header, like ANV does
22:39airlied[d]: no we don't want to do that
22:40airlied[d]: I'm 99.9% sure NVIDIA doesn't require that
22:40airlied[d]: anv doing it is a gross hack which violates vulkan API
22:41dwlsalmeida[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31867/diffs#fcacaa3a8247e8ab06d0c24ca05fa42b7d32d845_0_741
22:41dwlsalmeida[d]: notably:
22:41dwlsalmeida[d]: // XXX: we also need to compute the bits for the long term RPS.
22:41dwlsalmeida[d]: // Unfortunately this is not in the Vulkan Video API, but it should
22:41dwlsalmeida[d]: // be doable by inferring from the values in the SPS, PPS and
22:41dwlsalmeida[d]: // StdVideoH265PictureInfo.
22:41dwlsalmeida[d]: //
22:41dwlsalmeida[d]: // It would be great if we had some
22:41dwlsalmeida[d]: // `std_pic_info.NumBitsForLTRefPicSetInSlice` instead.
22:42airlied[d]: so infer it from the values in the sps/pps and picture info
22:42airlied[d]: that shouldn't need to map the slice header
22:42airlied[d]: we can ask NVIDIA what the ended up doing, but I expect it might be quicker to figure it out ourselves, I don't have a direct line to the video codec folks
22:44airlied[d]: do you have a video that needs it? might help figure it out
22:48dwlsalmeida[d]: yeah, the problem is that I just realized we can't, actually.
22:48dwlsalmeida[d]: If long term references are disabled, then we can infer from the SPS because all syntax elements are read with `u()`, i.e.: they're not arithmetic-coded, meaning we know exactly how many bits were read.
22:48dwlsalmeida[d]: which is the trick I used here:
22:48dwlsalmeida[d]: if pps.flags.output_flag_present_flag() != 0 {
22:48dwlsalmeida[d]: sw_hdr_skip_len += 1; // output_flag_present_flag
22:48dwlsalmeida[d]: }
22:48dwlsalmeida[d]: if sps.flags.separate_colour_plane_flag() != 0 {
22:48dwlsalmeida[d]: sw_hdr_skip_len += 2; // colour_plane_id
22:48dwlsalmeida[d]: }
22:48dwlsalmeida[d]: i.e.:
22:48dwlsalmeida[d]: output_flag_present_flag is `u(1)`, separate_colour_plane_flag is `u(2)`, so we can just add the number of bits to get the final value to skip.
22:48dwlsalmeida[d]: Just to be clear, here is the equivalent parser code in `vk_video.c`:
22:48dwlsalmeida[d]: if (pps->flags.output_flag_present_flag)
22:48dwlsalmeida[d]: /* pic output flag */
22:48dwlsalmeida[d]: vl_rbsp_u(&rbsp, 1); <--- 1 bit here
22:48dwlsalmeida[d]: if (sps->flags.separate_colour_plane_flag)
22:48dwlsalmeida[d]: /* colour_plane_id */
22:48dwlsalmeida[d]: vl_rbsp_u(&rbsp, 2); <--- 2 bits here
22:48dwlsalmeida[d]: ... and so on
22:48dwlsalmeida[d]: But if long term references are in place, then we have to read arithmetic-coded syntax elements:
22:48dwlsalmeida[d]: if (sps->flags.long_term_ref_pics_present_flag) {
22:48dwlsalmeida[d]: unsigned num_lt_sps = 0;
22:48dwlsalmeida[d]: if (sps->num_long_term_ref_pics_sps > 0)
22:48dwlsalmeida[d]: num_lt_sps = vl_rbsp_ue(&rbsp); <--- like here
22:48dwlsalmeida[d]: unsigned num_lt_pics = vl_rbsp_ue(&rbsp); <--- or here
22:48dwlsalmeida[d]: unsigned num_refs = num_lt_pics + num_lt_sps;
22:48dwlsalmeida[d]: These read a variable number of bits by definition
22:48dwlsalmeida[d]: i.e.: this trick doesn't work
22:51dwlsalmeida[d]: the proper solution is to get `NumBitsForLTRefPicSetInSlice` in the API, apparently someone forgot
22:51airlied[d]: it might be worth asking Tony @ nvidia if they can provide some ideas,
22:52airlied[d]: Intel missing things I generally can buy, NVIDIA usually don't drop the ball so badly :-
22:54dwlsalmeida[d]: are you sure they don't parse the header in the blob? because if they do, then they didn't really forget, they just cheat (like I am trying to do now lol)
22:54airlied[d]: you can't technically parse the header and be a proper implementation
22:54airlied[d]: unless you use a compute shader
22:54airlied[d]: and generate commands to submit work
22:54airlied[d]: which I don't think they would do
22:55airlied[d]: vulkan allows you to record the command stream without the buffer ever having been filled at that point
22:57airlied[d]: if you have to do a CPU operation on a resource inside a vkCmd you lose, I hacked anv because I lost, and in theory intel has fw to do it properly, it just requires a different interface
22:59dwlsalmeida[d]: hmm, I see
23:00dwlsalmeida[d]: avhe[d]: do you know what this `bBitParsingDisable` business is? maybe we don't have to provide this skip length value if we set this to 1, just a wild idea
23:00dwlsalmeida[d]: yeah Dave, I don't know really, I will try asking Tony in the vulkan video tsg call
23:02avhe[d]: dwlsalmeida[d]: nope sorry
23:03avhe[d]: but they do have a NVC9B0_SET_APPLICATION_ID_ID_HEVC_PARSER pseudo-codec that i believe they use for encrypted bitstreams
23:03avhe[d]: so that's possibly a way forward
23:05avhe[d]: airlied[d]: fwiw vp9 with backwards updates might be even worse, because the probability update has to be performed on the cpu, which forces an explicit synchronization
23:05dwlsalmeida[d]: avhe[d]: you just confirmed what I asked about AV1 the other day, unfortunately
23:05dwlsalmeida[d]: I was hoping that wasn't the case
23:05avhe[d]: in my tegra stuff i just wait at each frame but you might be able to get more clever about it
23:05dwlsalmeida[d]: because this just sucks really
23:06avhe[d]: dwlsalmeida[d]: i haven't looked at av1, so maybe
23:07dwlsalmeida[d]: they provide a symbol count, I don't think there's a reason to do so other than have the driver do the prob updates :/
23:09avhe[d]: well, at least inside the tegra library they have a function that retrieves the entropy buffer, and it aborts if the codec isn't vp9
23:10avhe[d]: so it might be that the av1 block manages that stuff internally
23:11dwlsalmeida[d]: btw since you're here, do you recognize this scan order?
23:11dwlsalmeida[d]: [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15];
23:11dwlsalmeida[d]: The scaling matrix is sent in raster order, but the driver expects the order above
23:12dwlsalmeida[d]: I have 0 idea what this is, but if you permute the 4x4 table like that, it works
23:13dwlsalmeida[d]: I noticed that this is not the case on tegra, apparently
23:13avhe[d]: <https://github.com/FFmpeg/FFmpeg/blob/nvtegra/libavcodec/vulkan_hevc.c#L186>
23:13avhe[d]: isn't it just that stuff?
23:15dwlsalmeida[d]: this looks like up-right diagonal order, which is
23:15dwlsalmeida[d]: static const guint8 uprightdiagonal_4x4[16] = {
23:15dwlsalmeida[d]: 0, 4, 1, 8,
23:15dwlsalmeida[d]: 5, 2, 12, 9,
23:15dwlsalmeida[d]: 6, 3, 13, 10,
23:16dwlsalmeida[d]: 7, 14, 11, 15
23:16dwlsalmeida[d]: };
23:16dwlsalmeida[d]: not the same thing
23:17dwlsalmeida[d]: not zig-zag order either:
23:17dwlsalmeida[d]: static const guint8 zigzag_4x4[16] = {
23:17dwlsalmeida[d]: 0, 1, 4, 8,
23:17dwlsalmeida[d]: 5, 2, 3, 6,
23:17dwlsalmeida[d]: 9, 12, 13, 10,
23:17dwlsalmeida[d]: 7, 11, 14, 15,
23:17dwlsalmeida[d]: };
23:21redsheep[d]: Isn't that just a 4 bit count with the two most and least significant digits swapped?
23:21avhe[d]: not sure then, on tegra this is just raster order it seems
23:21avhe[d]: i'd be surprised if the discrete cards did any different
23:27dwlsalmeida[d]: redsheep[d]: which one?
23:28redsheep[d]: dwlsalmeida[d]: This one. I thought you were just asking what pattern it is
23:28avhe[d]: dwlsalmeida[d]: anyway if i had to call it anything this would be vertical raster order
23:28avhe[d]: since it just goes by column instead of row
23:28redsheep[d]: Exactly
23:30dwlsalmeida[d]: ok I totally did not realize this
23:31dwlsalmeida[d]: for the 8x8 matrix, it's
23:31dwlsalmeida[d]: [
23:31dwlsalmeida[d]: 0, 8, 16, 24, 32, 40, 48, 56, 1, 9, 17, 25, 33, 41, 49, 57, 2, 10,
23:31dwlsalmeida[d]: 18, 26, 34, 42, 50, 58, 3, 11, 19, 27, 35, 43, 51, 59, 4, 12, 20,
23:31dwlsalmeida[d]: 28, 36, 44, 52, 60, 5, 13, 21, 29, 37, 45, 53, 61, 6, 14, 22, 30,
23:31dwlsalmeida[d]: 38, 46, 54, 62, 7, 15, 23, 31, 39, 47, 55, 63,
23:31dwlsalmeida[d]: ];
23:31dwlsalmeida[d]: so, same pattern I guess..
23:33redsheep[d]: Looks like it, yeah. Swapping the most and least at the halfway point is how it will always look swapping rows with columns
23:33redsheep[d]: At least for a square
23:38avhe[d]: are you sure it's not a business of having to undo the conversion to diagonal order, that leaves you with the matrix transposed
23:42dwlsalmeida[d]: I didn't realize that either ^
23:45dwlsalmeida[d]: btw, the spec says:
23:45dwlsalmeida[d]: ScalingList4x4, ScalingList8x8, ScalingList16x16, and ScalingList32x32 correspond to ScalingList[0], ScalingList[1], ScalingList[2], and ScalingList[3], respectively, as defined in section 7.3.4 of the ITU-T H.265 Specification;
23:45dwlsalmeida[d]: The lists are parsed in diagonal order, so apparently GStreamer is just wrong by converting to raster order before submitting to the driver?
23:45airlied[d]: probably is, I think ffmpeg had same problem