13:17 djdeath3483[d]: gfxstrand[d]: any reason that vk_pipeline_precomp_shader has a SHA1 hash rather than blake3?
13:17 djdeath3483[d]: gfxstrand[d]: other than maybe some other drivers are using that, but we could otherwise switch the runtime to use blake3 for its precomp
13:17 gfxstrand[d]: Yeah, we could use blake3 for precomp
13:18 djdeath3483[d]: cool, thanks
13:18 djdeath3483[d]: working on a runtime version of KHR_pipeline_binary
13:18 gfxstrand[d]: It's mostly because that's what we use for hash_shader_stage and apparently that's fixed at SHA1 forever
13:18 djdeath3483[d]: having everything at 32B is nicer to the app I think
13:19 djdeath3483[d]: gfxstrand[d]: do you mind if I make a variant of that function for blake3?
13:20 djdeath3483[d]: or maybe I should make the sha1 variant use the output of blake3 hashing to generate a sha1 🔥
13:20 gfxstrand[d]: A variant is probably fine. I honestly don't remember the details on why drivers want sha1. Something is baked into Proton somewhere.
13:21 djdeath3483[d]: module identifier seems to be 32B too
13:21 djdeath3483[d]: dunno lol
13:21 gfxstrand[d]: <a:shrug_anim:1096500513106841673>
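For reference, a minimal sketch of what a blake3 variant of the stage-hashing helper could look like, assuming the upstream BLAKE3 C API (blake3_hasher_init/update/finalize) that Mesa vendors; the function name and the exact inputs hashed here are illustrative, not the real vk_pipeline_hash_shader_stage signature:

```c
#include <stdint.h>
#include <string.h>
#include "blake3.h"

/* blake3 emits BLAKE3_OUT_LEN (32) bytes, which lines up with the 32B
 * keys mentioned above; SHA1 only gives 20. */
static void
hash_shader_stage_blake3(const void *spirv, size_t spirv_size,
                         const char *entrypoint,
                         uint8_t out[BLAKE3_OUT_LEN])
{
   blake3_hasher hasher;
   blake3_hasher_init(&hasher);
   blake3_hasher_update(&hasher, spirv, spirv_size);
   blake3_hasher_update(&hasher, entrypoint, strlen(entrypoint) + 1);
   blake3_hasher_finalize(&hasher, out, BLAKE3_OUT_LEN);
}
```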
14:45 gfxstrand[d]: ahuillet: FYI: Some of the video headers in open-gpu-doc don't build. There's a leftover `#endif` at the end that blows up: https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/clc7b7.h#L306
14:45 gfxstrand[d]: Looks like it was left over from some of the stripping
14:50 gfxstrand[d]: mohamexiety[d]: Can you please finish off the queue alloc MR?
14:50 gfxstrand[d]: I'm rebasing dwlsalmeida 's video stuff and it's gonna want that
14:55 mohamexiety[d]: ohh woops, sorry. got distracted by the other stuff. will do
15:22 mohamexiety[d]: gfxstrand[d]: fixed. running vk api.* cts to verify
16:20 mohamexiety[d]: gfxstrand[d]: fixed pushed and rebased. passes api.* cts
16:29 mhenning[d]: gfxstrand[d]: fwiw, I'm currently working on exposing transfer-only queues which will probably also be useful for video
16:41 gfxstrand[d]: mhenning[d]: Transfer-only queues will be useful in general
16:41 gfxstrand[d]: https://www.collabora.com/news-and-blog/news-and-events/mesa-25.2-brings-new-hardware-support-for-nouveau-users.html
16:41 gfxstrand[d]: And the MR is merged. 😄
16:44 mhenning[d]: gfxstrand[d]: About this - do we need to land https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36393 before 25.2 releases?
16:45 mhenning[d]: I think kepler doesn't currently pass cts on 25.2 or main, iirc
16:48 snowycoder[d]: mhenning[d]: Some KeplerA hardware doesn't, and even with that MR there seem to still be occasional faults due to scheduling.
16:49 snowycoder[d]: (also, SSAO doesn't work and I have no idea where to begin with that 😢)
16:52 gfxstrand[d]: mhenning[d]: It's already tagged fixes but yes we should
16:52 gfxstrand[d]: snowycoder[d]: SSAO?
16:55 snowycoder[d]: gfxstrand[d]: Screen Space Ambient Occlusion seems broken in The Talos Principle and a similar problem seems to occur in pixmark and other games.
16:55 snowycoder[d]: This is both on KeplerA and B, with or without scheduling active
16:55 gfxstrand[d]: Right
16:55 gfxstrand[d]: There's also a bug report about flickering grass in a game
16:55 gfxstrand[d]: I asked for a GL apitrace but I haven't seen anything yet
16:55 snowycoder[d]: snowycoder[d]: Old Screenshot:
16:55 gfxstrand[d]: I suspect we still have an encoding error or two
16:56 mhenning[d]: gfxstrand[d]: I'm not worried about it being tagged fixes, I'm worried about us reviewing and landing it in time for the possible release on weds
16:56 gfxstrand[d]: Sure
16:58 mhenning[d]: snowycoder[d]: You could try getting a renderdoc capture, and then try to pinpoint which shader is going wrong using renderdoc
17:06 huntercz122[d]: as always, Michael only linked the commit
18:34 gfxstrand[d]: Yeah...
18:36 gfxstrand[d]: I guess I could start putting the blog post link in the commit message. 😂
18:36 gfxstrand[d]: But the reality is that Michael reads commits, not MRs.
19:20 gfxstrand[d]: Ugh... gpuinfo.org is down
19:21 mhenning[d]: https://vulkan.gpuinfo.org/ works for me right now
19:21 gfxstrand[d]: I'm getting a 520
19:21 gfxstrand[d]: Weird
19:23 gfxstrand[d]: Works if I VPN to the US. Weird.
19:36 gfxstrand[d]: dwlsalmeida: Where's that kernel patch I need for video
19:37 gfxstrand[d]: nvm. Found it
19:37 dwlsalmeida[d]: it's in the MR
19:51 gfxstrand[d]: Of course it doesn't apply. 🤦🏻‍♀️
19:51 gfxstrand[d]: It's okay. I've reworked it
20:22 gfxstrand[d]: Okay, I've got stuff rebased and running CTS now.
20:23 gfxstrand[d]: Video is a bit broken but I think that's because the image tiling hacks are needed for video but are busted for 3D. I can figure that out. It's just a bit annoying.
20:23 gfxstrand[d]: Also, I'm the first person to run any of this code on Blackwell, so...
20:24 gfxstrand[d]: And with mohamexiety[d]'s nvk_queue patch, that gets rid of a bunch of the worst of the hacks.
20:25 mohamexiety[d]: When this lands we will get HW accel encode/decode via vulkan video right?
20:25 gfxstrand[d]: decode
20:25 gfxstrand[d]: encode isn't implemented yet
20:26 mohamexiety[d]: Ahh fair
20:26 mohamexiety[d]: But yeah decode is super good and generally more used anyways
20:26 gfxstrand[d]: But dwlsalmeida[d] has H.264, H.265, and AV1 mostly working.
20:27 mohamexiety[d]: Yep, really really nice
20:30 airlied[d]: encode shouldn't be insane to hook up either, valve used the radv encode for steam on deck
20:32 gfxstrand[d]: Yeah. We've had people complaining that OBS streaming doesn't work. I'd like to hook it up eventually.
20:33 gfxstrand[d]: But YouTube being able to use your GPU is the first task
20:34 esdrastarsis[d]: gfxstrand[d]: with zink video?
20:35 airlied[d]: not sure I'll ever get zink video done, my work focus has changed direction a bit 😛
20:35 gfxstrand[d]: I'm sure there's some reason why Zink video is good for AI. 😛
20:35 gfxstrand[d]: Just get creative
20:36 gfxstrand[d]: But yeah, we really need to get Zink video working if NVK video is going to be broadly useful.
20:37 zmike[d]: I'm planning to finish it before I leave on vacation
20:37 zmike[d]: one way
20:37 gfxstrand[d]: Should work with gstreamer and anything that uses that, so it's not totally useless. And I think VLC and ffmpeg have Vulkan now.
20:37 zmike[d]: **or another**
20:37 gfxstrand[d]: https://tenor.com/view/one-way-or-another-blondie-deborah-harry-80s-music-im-gonna-getcha-gif-18168957
20:38 zmike[d]: it's pretty close, and it might even have been the case that it didn't work last time because of driver bugs
20:38 zmike[d]: AMD just swept through and implemented dmabuf handling
20:38 mohamexiety[d]: Nicee then that's promising
20:38 zmike[d]: which I had attempted to hack in, and then the MR lay fallow for like 2 years
20:39 HdkR: zmike will finish it before their one-way vacation :nodders:
20:40 HdkR: Will definitely be nice to see vulkan video in more places :)
20:43 gfxstrand[d]: airlied[d]: Is there any reason why you added transfer to the video queue? Or just because NVIDIA does? Because the kernel will reject creating the copy subchannel if I ask for it.
20:43 airlied[d]: I think I just copied NVIDIA, and because the subchannel should be there
20:44 airlied[d]: but I don't think it needs transfer
20:45 linkmauve: Oh, is transfer not needed for host->device and device->host copies?
20:45 linkmauve: In my Vulkan video driver I've had to also lie that I support graphics in my queue for GStreamer to accept it, and lie that I support compute for ffmpeg to accept it.
20:45 linkmauve: I'd like to fix that in both GStreamer and ffmpeg eventually.
20:46 linkmauve: But first, I'm trying to make it work with as much lying as required.
20:47 airlied[d]: I thought ffmpeg used separate queues if they were available
20:49 gfxstrand[d]: I mean, I can create two separate queues if I need to. It'll just be super annoying to plumb through since we'll have to distinguish per-push
20:49 gfxstrand[d]: And we'll have to sync between them
20:53 gfxstrand[d]: I wonder if there are tests for this. (Probably not.)
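To make the transfer-bit question concrete, here is a hedged sketch of how an application would locate a decode-capable queue family without assuming it also advertises transfer; the helper name is hypothetical, and only core Vulkan plus the VK_KHR_video_queue queue flag is used:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <vulkan/vulkan.h>

/* Hypothetical helper: returns the first video-decode-capable queue
 * family, or UINT32_MAX if there is none. */
static uint32_t
find_video_decode_family(VkPhysicalDevice pdev, bool *has_transfer)
{
   uint32_t count = 0;
   vkGetPhysicalDeviceQueueFamilyProperties(pdev, &count, NULL);
   VkQueueFamilyProperties *props = calloc(count, sizeof(*props));
   vkGetPhysicalDeviceQueueFamilyProperties(pdev, &count, props);

   uint32_t family = UINT32_MAX;
   for (uint32_t i = 0; i < count; i++) {
      if (props[i].queueFlags & VK_QUEUE_VIDEO_DECODE_BIT_KHR) {
         /* Unlike graphics/compute families, a video family does not
          * implicitly support transfer. If the bit is absent, copies
          * in and out of decode images need a second queue plus
          * semaphore sync: the per-push plumbing discussed above. */
         *has_transfer =
            (props[i].queueFlags & VK_QUEUE_TRANSFER_BIT) != 0;
         family = i;
         break;
      }
   }
   free(props);
   return family;
}
```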
20:53 linkmauve: airlied[d], in my case I exclusively have a Vulkan video queue, and no other Vulkan available.
20:54 mhenning[d]: If we're allowed to expose video without transfer that sounds preferable
20:54 airlied[d]: is that some ARM platform? I've thought a vulkan to v4l2 layer would be nice to have
21:00 linkmauve: Yes, that's exactly what I'm writing.
21:01 linkmauve: Testing on the Rockchip rk3588 and on the AllWinner A64.
21:01 linkmauve: But it should work on any V4L2 stateless decoder.
21:01 linkmauve: Currently only doing H.264, but AV1 will be next.
21:02 airlied: linkmauve: nice!
21:02 linkmauve: I haven't tested running the CTS just yet, I've only implemented enough to get both GStreamer and ffmpeg to be happy with it.
21:04 airlied: linkmauve: is it part of mesa?
21:05 linkmauve: No, it's a full from-scratch Rust thingy.
21:05 airlied: I'd consider moving it towards mesa even in rust, just because getting something delivered to distros is much easier :-)
21:06 linkmauve: Yeah I know that, but my last attempt at getting something into Mesa didn't really end up anywhere.
21:06 linkmauve: I'm not sure how useful it'd be to integrate it if it reuses no part of Mesa though.
21:07 linkmauve: But relying on Mesa's CI infrastructure and such would be helpful anyway.
21:13 dwlsalmeida[d]: gfxstrand[d]: Correction, AV1 is not working at all, but I recently poked the right people at NVIDIA and got a pretty detailed email on how H.265 works
21:13 dwlsalmeida[d]: So perhaps we can do the same for av1
21:14 gfxstrand[d]: Okay
21:14 gfxstrand[d]: I'm running into some CTS fails now. I'll work on debugging them tomorrow. Looks like something funky going on with slots.
21:16 dwlsalmeida[d]: These should be working fine
21:16 dwlsalmeida[d]: They pass on fluster
21:17 dwlsalmeida[d]: Also we should probably figure out why this is 10x slower than my C implementation from 2 years ago
21:17 dwlsalmeida[d]: We've established it's not anything Rust-related
21:17 dwlsalmeida[d]: It's something in some other part of NVK that I can't really pinpoint
21:17 dwlsalmeida[d]: I do have that old C branch though
21:19 mohamexiety[d]: Btw unrelated but this may be interesting if we can leverage it in gfx somehow https://mastodon.social/@never_released/114971653257425075 gfxstrand[d] mhenning[d]
21:19 gfxstrand[d]: Yeah, I saw
21:19 dwlsalmeida[d]: gfxstrand[d]: ah, I remember that I managed to isolate it to downloading the image from the gpu btw
21:19 dwlsalmeida[d]: That's what's taking much longer
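For what it's worth, one way to confirm the download really dominates is to bracket it with GPU timestamps. A sketch assuming a two-slot VK_QUERY_TYPE_TIMESTAMP query pool that has already been created and reset; decoded_image, readback_buf, and region are placeholders:

```c
#include <vulkan/vulkan.h>

/* Hypothetical helper: record the decoded-image readback between two
 * timestamps so the copy can be timed in isolation. */
static void
record_timed_readback(VkCommandBuffer cmd, VkQueryPool query_pool,
                      VkImage decoded_image, VkBuffer readback_buf,
                      const VkBufferImageCopy *region)
{
   vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                       query_pool, 0);
   vkCmdCopyImageToBuffer(cmd, decoded_image,
                          VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                          readback_buf, 1, region);
   vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                       query_pool, 1);
   /* After the submit's fence signals, vkGetQueryPoolResults() gives
    * both ticks; the delta times timestampPeriod is the copy time in
    * nanoseconds. */
}
```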
21:19 mohamexiety[d]: Shared memory is the same HW as L1 cache so if we can do that it should be a lot more graceful than spilling to L2/global mem
21:20 gfxstrand[d]: It would be interesting to play around with that flag in nvcc and see what it generates
21:27 mhenning[d]: mohamexiety[d]: I think we typically hit L1 for spills though?
21:28 mohamexiety[d]: Interesting, the changelog calls out that they spilled to L2 but not sure if this is a CUDA thing or just a general HW thing
21:29 mhenning[d]: Maybe they're talking about cases where data doesn't fit in L1?
21:30 mhenning[d]: That is, I think they spill to L1 too but maybe this is useful if L1 entries get evicted really quickly?
21:31 mohamexiety[d]: Yeah that's plausible too :thonk: