01:59fdobridge: <jekstrand> We're going to have a driver ID for NVK in the next header update: https://github.com/KhronosGroup/Vulkan-Docs/pull/1983#event-7819126207
02:01fdobridge: <jekstrand> Don't anyone implement KHR_driver_info. There's an MR outstanding for it and I'd like Yusuf to be able to get a patch in.
02:02fdobridge: <jekstrand> I'll bump the header and rebase on Friday, probably.
02:04fdobridge: <alyssarzg> ✨
02:06fdobridge: <karolherbst🐧🦀> :ferrisBongo:
03:28fdobridge: <jekstrand> Gotta love it when an obvious C++ and C feature is still missing from Rust...
03:28fdobridge: <jekstrand> https://github.com/rust-lang/rust/issues/76560
03:28fdobridge: <jekstrand> I can't have a const integer generic parameter and do math on it. 😫
03:29fdobridge: <jekstrand> Trying to make isaspec work for Rust and I wanted to have a BitSet<N> struct where N is the number of bits. But I can't divide by 32 to size my array. 😕
03:35mhenning: jekstrand: I'd claim that C doesn't really have that feature either
03:35mhenning: but yeah, some of those types of features have been taking a while
03:36fdobridge: <jekstrand> C doesn't have it officially in the same way C++ does with constexpr, but most C compilers will treat things as constants as long as they can reasonably fold them.
03:36fdobridge: <jekstrand> Then again, C doesn't have generics so...
03:36fdobridge: <jekstrand> But you can do it within a macro and it's generally fine
03:37mhenning: you might be able to do it with a rust macro?
03:39mhenning: yeah, I seem to be able to do `let array: [i32; 64 / 32];` locally
03:40mhenning: so I think the restriction is on type-level integers at the moment
04:10mhenning: jekstrand: Yeah, the macro thing seems to work. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=039959cf3419f562b42b0523498fe9dd
04:10mhenning: although not sure about msrv for that
06:53airlied: Test case 'dEQP-VK.sparse_resources.buffer.transfer.sparse_binding.buffer_size_2_10'.. Pass (Passed)
07:08airlied: dEQP-VK.sparse_resources.image_sparse_binding.* also all passing
07:17airlied: okay sparse residency will require some more thought I expect
07:19airlied: dakr: have to think a bit more about sparse residency and the kernel API
07:19airlied: we have to be able to not fault on read/writes to areas that aren't mapped
07:21airlied: skeggsb_: you might also have considered how to deal with that case; I'll have to look at other implementations to see what ideas they had
07:25airlied: fdobridge: <jekstrand> maybe you too
07:29airlied: might need the ability to map a range with no backing, but have it not fault, then map proper bindings over the top of it
08:51fdobridge: <marysaka> I pushed my initial stuff for the MME Fermi builder as previously discussed https://gitlab.freedesktop.org/marysaka/mesa/-/tree/maxwell-mme
08:52fdobridge: <marysaka> Still very much a work in progress, but most of the operations that have tu104 counterparts are defined here
08:53fdobridge: <marysaka> (There are also some commits that move the tu104 mme code to its own subdirectory; I still need to unify the mme_builder interface again but that's a story for another time)
09:13fdobridge: <karolherbst🐧🦀> very cool!
15:31dakr: airlied: wonder if from a UAPI pov it might be enough to just accept binds without a gem handle, mark the address range and map in a zero filled dummy page on fault or so. though, I hope the mmu supports something better than that.
16:37fdobridge: <jekstrand> We do want the ability to do VA range reservations. In particular, the NV page tables make a distinction between fully unbound and unbound but "safely faults". The latter is needed for sparse resources so you can detect faults in the shader without taking a full fault that kills the context.
16:38fdobridge: <jekstrand> IDK what happens if you do "normal" shader access or non-shader access to such a memory region. Maybe it returns zero?
16:38fdobridge: <karolherbst🐧🦀> tex instructions have a predicate output for that e.g.
16:39fdobridge: <jekstrand> Generally, though, "map a page on fault" with GPUs is an absurdly hard problem.
16:39fdobridge: <jekstrand> Yeah, I know. The question is what happens if you don't provide a predicate or use a non-sparse form or hit it from vertex fetch or something.
16:39fdobridge: <karolherbst🐧🦀> there is no no-sparse form afaik (edited)
16:40fdobridge: <jekstrand> Ok, so it probably just returns zero or garbage
16:40fdobridge: <karolherbst🐧🦀> there is just a predicate indicating if the access was sparse or not
16:40fdobridge: <jekstrand> Or maybe black
16:40fdobridge: <karolherbst🐧🦀> yeah.. no idea what the actual data will be when loaded
16:40fdobridge: <karolherbst🐧🦀> but you'll know when you load from sparse
16:41fdobridge: <🌺 ¿butterflies? 🌸> At least for the compute side - you have as much time as you need for that.
16:41fdobridge: <🌺 ¿butterflies? 🌸> Includes mapping such a page from disk
16:42fdobridge: <🌺 ¿butterflies? 🌸> Before resuming the task
16:42fdobridge: <jekstrand> In theory, yes. But getting the locking to work out inside the kernel isn't as easy as it sounds.
16:43fdobridge: <🌺 ¿butterflies? 🌸> Oh, I only have details on how UVM does it - never really poked into nouveau
16:43fdobridge: <karolherbst🐧🦀> okay.. LDG also has a flag to indicate sparse memory
16:43fdobridge: <karolherbst🐧🦀> just not LD
16:45fdobridge: <karolherbst🐧🦀> wouldn't be surprised if the register content stays the same
16:45fdobridge: <🌺 ¿butterflies? 🌸> (Interestingly, CUDA doesn’t provide such a thing as far as I know)
16:47fdobridge: <jekstrand> It's mostly a D3D12/Vulkan feature as far as I know.
16:50fdobridge: <🌺 ¿butterflies? 🌸> https://cdn.discordapp.com/attachments/1034184951790305330/1042481757259313172/image0.jpg
16:50fdobridge: <🌺 ¿butterflies? 🌸> Hmmm, it’s in OptiX too.
16:53fdobridge: <🌺 ¿butterflies? 🌸> CUDA sparse textures - 11.1
16:53fdobridge: <🌺 ¿butterflies? 🌸> https://github.com/NVIDIA/optix-toolkit/tree/master/DemandLoading
17:08fdobridge: <🌺 ¿butterflies? 🌸> > The OptiX Demand Loading library allows hardware-accelerated sparse textures to be loaded on demand, which greatly reduces memory requirements, bandwidth, and disk I/O compared to preloading textures. It works by maintaining a page table that tracks which texture tiles have been loaded into GPU memory. An OptiX closest-hit program fetches from a texture by calling library device code that checks the page table to see if the
17:08fdobridge: <🌺 ¿butterflies? 🌸> … ok so this impl isn’t like HMM land at all - but much simpler conceptually