13:17djdeath3483[d]: gfxstrand[d]: any reason that vk_pipeline_precomp_shader has a SHA1 hash rather than blake3?
13:17djdeath3483[d]: gfxstrand[d]: other than maybe some other drivers are using that, but we could otherwise switch the runtime to use blake3 for its precomp
13:17gfxstrand[d]: Yeah, we could use blake3 for precomp
13:18djdeath3483[d]: cool, thanks
13:18djdeath3483[d]: working on a runtime version of KHR_pipeline_binary
13:18gfxstrand[d]: It's mostly because that's what we use for hash_shader_stage and apparently that's fixed at SHA1 forever
13:18djdeath3483[d]: having everything at 32B is nicer to the app I think
13:18djdeath3483[d]: gfxstrand[d]: do you mind if I make a variant of that function for blake3?
13:20djdeath3483[d]: or maybe I should make the sha1 variant use the output of blake3 hashing to generate a sha1
13:20gfxstrand[d]: A variant is probably fine. I honestly don't remember the details on why drivers want sha1. Something is baked into Proton somewhere.
13:21djdeath3483[d]: module identifier seems to be 32B too
13:21djdeath3483[d]: dunno lol
13:21gfxstrand[d]: <a:shrug_anim:1096500513106841673>
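The digest-size point above (SHA-1 is 20 bytes, BLAKE3 is 32 bytes, matching the 32B module identifier) can be sketched as follows. Mesa uses BLAKE3, which is a third-party package in Python, so `hashlib.blake2b` with a 32-byte digest stands in here purely to illustrate the sizes; the shader blob is made up.

```python
import hashlib

# Hypothetical shader blob, just to have bytes to hash.
spirv = b"example shader blob"

# SHA-1 always produces a 20-byte digest.
sha1_digest = hashlib.sha1(spirv).digest()

# BLAKE3 produces 32 bytes by default; blake2b with digest_size=32
# stands in for it here since blake3 isn't in the stdlib.
blake3_like = hashlib.blake2b(spirv, digest_size=32).digest()

print(len(sha1_digest), len(blake3_like))  # 20 32
```

A uniform 32-byte key for everything (precomp hashes, module identifiers, pipeline binary keys) is what makes the scheme nicer for apps, since they never have to handle two key widths.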
14:45gfxstrand[d]: ahuillet: FYI: Some of the video headers in open-gpu-doc don't build. There's a leftover `#endif` at the end that blows up: https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/clc7b7.h#L306
14:45gfxstrand[d]: Looks like it was left over from some of the stripping
14:50gfxstrand[d]: mohamexiety[d]: Can you please finish off the queue alloc MR?
14:50gfxstrand[d]: I'm rebasing dwlsalmeida 's video stuff and it's gonna want that
14:55mohamexiety[d]: ohh woops, sorry. got distracted by the other stuff. will do
15:22mohamexiety[d]: gfxstrand[d]: fixed. running vk api.* cts to verify
16:20mohamexiety[d]: gfxstrand[d]: fixed pushed and rebased. passes api.* cts
16:29mhenning[d]: gfxstrand[d]: fwiw, I'm currently working on exposing transfer-only queues which will probably also be useful for video
16:41gfxstrand[d]: mhenning[d]: Transfer-only queues will be useful in general
16:41gfxstrand[d]: https://www.collabora.com/news-and-blog/news-and-events/mesa-25.2-brings-new-hardware-support-for-nouveau-users.html
16:41gfxstrand[d]: And the MR is merged.
16:44mhenning[d]: gfxstrand[d]: About this - do we need to land https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36393 before 25.2 releases?
16:45mhenning[d]: I think kepler doesn't currently pass cts on 25.2 or main, iirc
16:48snowycoder[d]: mhenning[d]: Some KeplerA hardware doesn't, and even with that MR there seem to still be occasional faults due to scheduling.
16:49snowycoder[d]: (also, SSAO doesn't work and I have no idea where to begin with that)
16:52gfxstrand[d]: mhenning[d]: It's already tagged fixes but yes we should
16:52gfxstrand[d]: snowycoder[d]: SSAO?
16:55snowycoder[d]: gfxstrand[d]: Screen Space Ambient Occlusion seems broken in The Talos Principle and a similar problem seems to occur in pixmark and other games.
16:55snowycoder[d]: This is both on KeplerA and B, with or without scheduling active
16:55gfxstrand[d]: Right
16:55gfxstrand[d]: There's also a bug report about flickering grass in a game
16:55gfxstrand[d]: I asked for a GL apitrace but I haven't seen anything yet
16:55snowycoder[d]: Old Screenshot:
16:55gfxstrand[d]: I suspect we still have an encoding error or two
16:56mhenning[d]: gfxstrand[d]: I'm not worried about it being tagged fixes, I'm worried about us reviewing and landing it in time for the possible release on weds
16:56gfxstrand[d]: Sure
16:58mhenning[d]: snowycoder[d]: You could try getting a renderdoc capture, and then try to pinpoint which shader is going wrong using renderdoc
17:06huntercz122[d]: as always, Michael only linked the commit
18:34gfxstrand[d]: Yeah...
18:36gfxstrand[d]: I guess I could start putting the blog post link in the commit message.
18:36gfxstrand[d]: But the reality is that Michael reads commits, not MRs.
19:20gfxstrand[d]: Ugh... gpuinfo.org is down
19:21mhenning[d]: https://vulkan.gpuinfo.org/ works for me right now
19:21gfxstrand[d]: I'm getting a 520
19:21gfxstrand[d]: Weird
19:23gfxstrand[d]: Works if I VPN to the US. Weird.
19:36gfxstrand[d]: dwlsalmeida: Where's that kernel patch I need for video
19:37gfxstrand[d]: nvm. Found it
19:37dwlsalmeida[d]: it's in the MR
19:51gfxstrand[d]: Of course it doesn't apply.
19:51gfxstrand[d]: It's okay. I've reworked it
20:22gfxstrand[d]: Okay, I've got stuff rebased and running CTS now.
20:23gfxstrand[d]: Video is a bit broken but I think that's because the image tiling hacks are needed for video but are busted for 3D. I can figure that out. It's just a bit annoying.
20:23gfxstrand[d]: Also, I'm the first person to run any of this code on Blackwell, so...
20:24gfxstrand[d]: And with mohamexiety[d]'s nvk_queue patch, that gets rid of a bunch of the worst of the hacks.
20:25mohamexiety[d]: When this lands we will get HW accel encode/decode via vulkan video right?
20:25gfxstrand[d]: decode
20:25gfxstrand[d]: encode isn't implemented yet
20:26mohamexiety[d]: Ahh fair
20:26mohamexiety[d]: But yeah decode is super good and generally more used anyways
20:26gfxstrand[d]: But dwlsalmeida[d] has H.264, H.265, and AV1 mostly working.
20:27mohamexiety[d]: Yep, really really nice
20:30airlied[d]: encode shouldn't be insane to hook up either, valve used the radv encode for steam on deck
20:32gfxstrand[d]: Yeah. We've had people complaining that OBS streaming doesn't work. I'd like to hook it up eventually.
20:33gfxstrand[d]: But YouTube being able to use your GPU is the first task
20:34esdrastarsis[d]: gfxstrand[d]: with zink video?
20:35airlied[d]: not sure I'll ever get zink video done, my work focus has changed direction a bit
20:35gfxstrand[d]: I'm sure there's some reason why Zink video is good for AI.
20:35gfxstrand[d]: Just get creative
20:36gfxstrand[d]: But yeah, we really need to get Zink video working if NVK video is going to be broadly useful.
20:37zmike[d]: I'm planning to finish it before I leave on vacation
20:37zmike[d]: one way
20:37gfxstrand[d]: Should work with gstreamer and anything that uses that so it's not totally useless. And I think VLC and ffmpeg have Vulkan now.
20:37zmike[d]: **or another**
20:37gfxstrand[d]: https://tenor.com/view/one-way-or-another-blondie-deborah-harry-80s-music-im-gonna-getcha-gif-18168957
20:38zmike[d]: it's pretty close, and it might even have been the case that it didn't work last time because of driver bugs
20:38zmike[d]: AMD just swept through and implemented dmabuf handling
20:38mohamexiety[d]: Nicee then that’s promising
20:38zmike[d]: which I had attempted to hack in and then the MR lay fallow for like 2 years
20:39HdkR: zmike will finish it before their one-way vacation :nodders:
20:40HdkR: Will definitely be nice to see vulkan video in more places :)
20:43gfxstrand[d]: airlied[d]: Is there any reason why you added transfer to the video queue? Or just because NVIDIA does? Because the kernel will reject creating the copy subchannel if I ask for it.
20:43airlied[d]: I think I just copied nvidia and because the subchannel should be there
20:44airlied[d]: but I don't think it needs transfer
20:45linkmauve: Oh, is transfer not needed for host->device and device->host copies?
20:45linkmauve: In my Vulkan video driver I’ve had to also lie that I support graphics in my queue for Gstreamer to accept it, and lie that I support compute for ffmpeg to accept it.
20:45linkmauve: I’d like to fix that in both Gstreamer and ffmpeg eventually.
20:46linkmauve: But first, I’m trying to make it work with as much lying as required.
20:47airlied[d]: I thought ffmpeg used separate queues if they were available
20:49gfxstrand[d]: I mean, I can create two separate queues if I need to. It'll just be super annoying to plumb through since we'll have to distinguish per-push
20:49gfxstrand[d]: And we'll have to sync between them
20:53gfxstrand[d]: I wonder if there are tests for this. (Probably not.)
20:53linkmauve: airlied[d], in my case I exclusively have a Vulkan video queue, and no other Vulkan available.
20:54mhenning[d]: If we're allowed to expose video without transfer that sounds preferable
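The queue-family haggling above comes down to which VkQueueFlagBits a family advertises: per the Vulkan spec, graphics- or compute-capable queues support transfer implicitly, but a decode-only queue need not expose the transfer bit at all, which is what trips up apps that filter on it. A minimal sketch of that selection logic, with the flag values taken from the spec and a made-up device layout:

```python
# VkQueueFlagBits values from the Vulkan spec.
GRAPHICS, COMPUTE, TRANSFER = 0x1, 0x2, 0x4
VIDEO_DECODE_KHR = 0x20

def find_decode_family(families):
    """Return the index of the first queue family advertising video decode,
    or None if the device has no decode queue."""
    for idx, flags in enumerate(families):
        if flags & VIDEO_DECODE_KHR:
            return idx
    return None

# Hypothetical device: family 0 is graphics+compute+transfer,
# family 1 is decode-only with no TRANSFER bit, as discussed above.
families = [GRAPHICS | COMPUTE | TRANSFER, VIDEO_DECODE_KHR]

idx = find_decode_family(families)
print(idx, bool(families[idx] & TRANSFER))  # 1 False
```

An app that requires TRANSFER (or GRAPHICS/COMPUTE) on the decode family would reject family 1 here, which is exactly the lying-to-Gstreamer/ffmpeg situation described above.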
20:54airlied[d]: is that some ARM platform? I've thought a vulkan to v4l2 layer would be nice to have
21:00linkmauve: Yes, that’s exactly what I’m writing.
21:01linkmauve: Testing on the Rockchip rk3588 and on the AllWinner A64.
21:01linkmauve: But it should work on any V4L2 stateless decoder.
21:01linkmauve: Currently only doing H.264, but AV1 will be next.
21:02airlied: linkmauve: nice!
21:02linkmauve: I haven’t tested running the CTS just yet, I’ve only implemented enough to get both Gstreamer and ffmpeg to be happy with it.
21:04airlied: linkmauve: is it part of mesa?
21:05linkmauve: No, it’s a full from-scratch Rust thingy.
21:05airlied: I'd consider moving it towards mesa even in rust, just because getting something delivered to distros is much easier :-)
21:06linkmauve: Yeah I know that, but my last attempt at getting something into Mesa didn’t really end up anywhere.
21:06linkmauve: I’m not sure how useful it’d be to integrate it if it reuses no part of Mesa though.
21:07linkmauve: But relying on Mesa’s CI infrastructure and such would be helpful anyway.
21:13dwlsalmeida[d]: gfxstrand[d]: Correction, AV1 is not working at all, but I recently poked the right people at nvidia and got a pretty detailed email on how h265 works
21:13dwlsalmeida[d]: So perhaps we can do the same for av1
21:14gfxstrand[d]: Okay
21:14gfxstrand[d]: I'm running into some CTS fails now. I'll work on debugging them tomorrow. Looks like something funky going on with slots.
21:16dwlsalmeida[d]: These should be working fine
21:16dwlsalmeida[d]: They pass on fluster
21:17dwlsalmeida[d]: Also we should probably figure out why this is 10x slower than my C implementation from 2 years ago
21:17dwlsalmeida[d]: We’ve established it’s not anything Rust related
21:17dwlsalmeida[d]: It’s something in some other part of NVK that I can’t really pinpoint
21:17dwlsalmeida[d]: I do have that old C branch though
21:19mohamexiety[d]: Btw unrelated but this may be interesting if we can leverage it in gfx somehow https://mastodon.social/@never_released/114971653257425075 gfxstrand[d] mhenning[d]
21:19gfxstrand[d]: Yeah, I saw
21:19dwlsalmeida[d]: gfxstrand[d]: ah, I remember that I managed to isolate it to downloading the image from the gpu btw
21:19dwlsalmeida[d]: That’s what’s taking much longer
21:19mohamexiety[d]: Shared memory is the same HW as L1 cache so if we can do that it should be a lot more graceful than spilling to L2/global mem
21:20gfxstrand[d]: It would be interesting to play around with that flag in nvcc and see what it generates
21:27mhenning[d]: mohamexiety[d]: I think we typically hit L1 for spills though?
21:28mohamexiety[d]: Interesting, the changelog calls out that they spilled to L2 but not sure if this is a CUDA thing or just a general HW thing
21:29mhenning[d]: Maybe they're talking about cases where data doesn't fit in L1?
21:30mhenning[d]: That is, I think they spill to L1 too but maybe this is useful if L1 entries get evicted really quickly?
21:31mohamexiety[d]: Yeah that’s plausible too :thonk: