00:21fdobridge: <airlied> I killed nearly all of dd.h
00:22fdobridge: <airlied> like just after we dropped classic
00:22fdobridge: <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14100
00:36fdobridge: <karolherbst🐧🦀> mhh, and that didn't lead to lower driver overhead or something?
00:36fdobridge: <karolherbst🐧🦀> benchmarking Civilization 5 might give some results
00:36fdobridge: <karolherbst🐧🦀> end game saves have like 500k gl calls per frame
00:41fdobridge: <airlied> draw is still indirect, but I think the alternative is a branch, which may or may not be better
00:44fdobridge: <karolherbst🐧🦀> well.. most of those calls aren't draws
00:44fdobridge: <karolherbst🐧🦀> but yeah..
00:44fdobridge: <karolherbst🐧🦀> I'd benchmark with heavily CPU bound games, but indirect vs direct calls might not matter all that much overall.. mhh
00:45fdobridge: <karolherbst🐧🦀> maybe with LTO it makes more of a difference
00:45fdobridge: <karolherbst🐧🦀> or even less so
00:49Eighth_Doctor: what is fdobridge?
00:55airlied: bridge to discord
01:06Eighth_Doctor: you have a discord :o
03:07karolherbst: apparently we do
04:25airlied: dakr, jekstrand : okay I have to rethink the new uapi buffer allocs a bit, I think I've tied the non-sparse ones a bit too close to the kernel api
04:25airlied: forcing the over-alignment problem
04:57airlied: I expect we'll have to treat buffers rather differently to images (at least images with kind flags)
04:58airlied: and burn some VM space mappings for non-sparse images
05:20fdobridge: <airlied> @gfxstrand so I probably need someone to think this over with. I think device memory should allocate a VMA range that then gets used for buffers and 0-kind images? and images that have a kind, and sparse images, should allocate a private VMA space and bind the mem bo into it? will this affect aliasing anywhere?
05:22fdobridge: <gfxstrand> Sounds right. Should work.
05:22fdobridge: <airlied> okay I'll go kick that around a bit and see where it ends up
05:23fdobridge: <gfxstrand> Cool
06:28fdobridge: <airlied> okay I've updated the branch, I think it should work properly
14:50fdobridge: <karolherbst🐧🦀> @airlied I might also look at the new UAPI from a CL perspective, because I need proper userptr support and kind of SVM as well
14:50fdobridge: <karolherbst🐧🦀> did you look into those parts already?
15:52fdobridge: <gfxstrand> Allocation shouldn't care about alignment for the most part. It should allocate an integer number of pages but otherwise shouldn't care.
15:52fdobridge: <gfxstrand> @airlied ^^
15:53fdobridge: <gfxstrand> Also, the kernel needs to stop assigning addresses. That's our job. We can align the base address of the memory object to whatever we want.
16:06fdobridge: <karolherbst🐧🦀> how are we going to support that without it becoming ioctl-calling hell? provide an "initial" VM address on bo allocation? do two ioctls every time we allocate new bos, or have an ioctl where we can pass a list of bos+addresses? Not sure how much overhead even matters here.
16:06fdobridge: <karolherbst🐧🦀>
16:06fdobridge: <karolherbst🐧🦀> Also.. do we already have any driver doing that completely from userspace?
17:52fdobridge: <gfxstrand> ANV has been assigning its own addresses since forever.
17:53fdobridge: <gfxstrand> Iris has never let the kernel assign addresses.
17:53fdobridge: <gfxstrand> I think the radeon drivers do address assignment in libdrm.
17:53fdobridge: <gfxstrand> As for how, yeah, it means two ioctls to allocate but meh. Vulkan uses sub-allocation so we really shouldn't have bajillions of BOs.
17:54fdobridge: <gfxstrand> And two ioctls aren't bad if your kernel driver is well-written.
17:54fdobridge: <pixelcluster> I think allocating memory and binding is one ioctl each for amdgpu as well
18:00anholt: note: for virtgpu-native-context it's nice to be able to allocate and bind in one ioctl, since ioctl round trip time is preposterous.
18:17fdobridge: <gfxstrand> I think we can batch all the binds, just not the allocations, so we can amortize it well enough if that's a problem.
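[Editor's note: for illustration, the two-step model gfxstrand describes looks roughly like the sketch below: one ioctl creates the BO, a second binds it at a userspace-chosen address. Every struct and ioctl name here is a hypothetical stand-in, not any real driver's uapi; only `drmIoctl()` and `DRM_IOWR()` are real libdrm/drm.h interfaces.]

```c
#include <stdint.h>
#include <xf86drm.h> /* drmIoctl(); pulls in drm.h for DRM_IOWR() */

/* Hypothetical uapi structs for this sketch. */
struct example_gem_create { uint64_t size; uint32_t handle; uint32_t pad; };
struct example_vm_map     { uint32_t handle; uint32_t pad; uint64_t addr; uint64_t range; };

/* Hypothetical ioctl numbers, for illustration only. */
#define EXAMPLE_IOCTL_GEM_CREATE DRM_IOWR(0x40, struct example_gem_create)
#define EXAMPLE_IOCTL_VM_MAP     DRM_IOWR(0x41, struct example_vm_map)

/* ioctl #1 allocates the BO; ioctl #2 maps it at a VA that userspace,
 * not the kernel, picked. */
static int alloc_and_map(int fd, uint64_t size, uint64_t addr, uint32_t *handle)
{
   struct example_gem_create create = { .size = size };
   if (drmIoctl(fd, EXAMPLE_IOCTL_GEM_CREATE, &create))
      return -1;

   struct example_vm_map map = { .handle = create.handle, .addr = addr, .range = size };
   if (drmIoctl(fd, EXAMPLE_IOCTL_VM_MAP, &map))
      return -1;

   *handle = create.handle;
   return 0;
}
```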
18:42fdobridge: <airlied> The new API is all userspace allocated vma
18:42fdobridge: <airlied> Not sure if there was a question there
18:44fdobridge: <airlied> @karolherbst🐧🦀 userptr is another can of worms, maybe dakr's next thing
18:58fdobridge: <karolherbst🐧🦀> sure, but how do you tell the kernel about your allocations?
19:00fdobridge: <airlied> with a bind ioctl
19:00fdobridge: <karolherbst🐧🦀> right, and I asked how that one is designed: do you have to call it for each bo, can you submit an initial placement when allocating, or can you submit it in batches?
19:01fdobridge: <karolherbst🐧🦀> or rather what would be the end plan here
19:01fdobridge: <karolherbst🐧🦀> interesting.. because I tried to figure out where exactly that happens, maybe I'm just blind 🙂
19:02fdobridge: <gfxstrand> Search for `vma_heap`
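[Editor's note: `vma_heap` refers to Mesa's `util_vma_heap` helper in `src/util/vma.h`, which drivers such as ANV use to hand out GPU virtual addresses from userspace. A minimal sketch of its use; the heap bounds, alignment, and BO size below are made up for illustration.]

```c
#include <stdint.h>
#include "util/vma.h" /* Mesa's userspace VA allocator */

static void example(void)
{
   struct util_vma_heap heap;

   /* Manage an example VA range: 4 GiB starting at 1 MiB. */
   util_vma_heap_init(&heap, 0x100000, 4ull << 30);

   /* Pick an address for a BO; the kernel only learns about it later,
    * via a VM_BIND-style ioctl. Returns 0 on failure. */
   uint64_t bo_size = 64 * 4096;
   uint64_t addr = util_vma_heap_alloc(&heap, bo_size, 0x1000 /* alignment */);

   /* ... bind the BO at addr, use it, unbind ... */

   /* Return the range to the heap when the BO is freed. */
   util_vma_heap_free(&heap, addr, bo_size);
   util_vma_heap_finish(&heap);
}
```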
19:02fdobridge: <karolherbst🐧🦀> yeah.. probably a good idea if one can batch it.. could even be smart about it and just collect the changes ... or make it part of the command submission one
19:03fdobridge: <airlied> No we will not be making it part of command submits
19:03fdobridge: <karolherbst🐧🦀> though with userspace command submission one might not want to mix those two anyway
19:03fdobridge: <gfxstrand> Iris tells the kernel addresses on every draw call because i915 is dumb.
19:03fdobridge: <karolherbst🐧🦀> yeah.. that sounds a bit... heavy
19:03fdobridge: <gfxstrand> Yeah, that's not going in command submit.
19:04fdobridge: <gfxstrand> I will nak that so hard...
19:04fdobridge: <airlied> The vm bind ioctl has alloc/free and map/unmap
19:04fdobridge: <airlied> The kernel tree has more documentation
19:04fdobridge: <karolherbst🐧🦀> sure, but that's per bo I assume?
19:04fdobridge: <karolherbst🐧🦀> oh well.. maybe that's fine
19:05fdobridge: <karolherbst🐧🦀> well.. just means we'll probably have to change that later if it becomes overhead one might be able to optimize away
19:07fdobridge: <gfxstrand> If we need to do that, we can batch `VM_BIND` ioctls. Not batch them in with other things. Just batch them by themselves.
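[Editor's note: the uapi under discussion was still in flux at this point; the sketch below only illustrates the shape airlied describes, with alloc/free ops for managing VA regions, map/unmap ops for attaching BO backing, and an op array so binds can be batched in one call. All names and fields are hypothetical; the real definitions live in the kernel branch airlied mentions.]

```c
#include <linux/types.h>

/* Hypothetical op codes mirroring the alloc/free + map/unmap split. */
#define EXAMPLE_VM_BIND_OP_ALLOC 0 /* reserve a VA region */
#define EXAMPLE_VM_BIND_OP_FREE  1 /* release a VA region */
#define EXAMPLE_VM_BIND_OP_MAP   2 /* back [addr, addr+range) with a BO */
#define EXAMPLE_VM_BIND_OP_UNMAP 3 /* drop the backing */

struct example_vm_bind_op {
	__u32 op;
	__u32 handle;    /* GEM handle, for MAP */
	__u64 addr;      /* userspace-chosen GPU VA */
	__u64 bo_offset; /* offset into the BO, for MAP */
	__u64 range;     /* length of the region/mapping */
};

/* One ioctl carries an array of ops, so binds batch by themselves. */
struct example_vm_bind {
	__u32 op_count;
	__u32 pad;
	__u64 op_ptr;    /* user pointer to struct example_vm_bind_op[] */
};
```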
19:09fdobridge: <airlied> it's never been a problem on radv, usually if those things are overhead you just cache in userspace if you can
19:09fdobridge: <airlied> at least for GL Drivers
19:10fdobridge: <airlied> we could maybe consider a bo alloc + vma bind combined thing, but I didn't want to disturb the current bo alloc ioctl more than necessary
19:17fdobridge: <gfxstrand> Keep it dumb for now
19:17fdobridge: <airlied> yeah I've seen no reason to change it, if I thought it was important I'd have designed it that way 😛
19:18fdobridge: <airlied> like even batching makes no real sense, since vulkan doesn't really work like that
19:19fdobridge: <airlied> if someone is writing a new GL driver on top of this, then just use the pb bufmgr stuff
19:19fdobridge: <gfxstrand> Vulkan will once we do sparse.
19:19fdobridge: <gfxstrand> vkQueueBindSparse is a batch thing. Within a given sparse bind, we may be binding multiple discontiguous ranges.
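[Editor's note: for reference, this is the batching gfxstrand means; a single `vkQueueBindSparse` call can carry several discontiguous binds. The handles below (`queue`, `sparse_buffer`, `mem`, `fence`) are assumed to already exist, and the sizes are arbitrary.]

```c
#include <vulkan/vulkan.h>

/* Bind two discontiguous 64 KiB ranges of a sparse buffer in one call. */
static void bind_two_ranges(VkQueue queue, VkBuffer sparse_buffer,
                            VkDeviceMemory mem, VkFence fence)
{
   const VkSparseMemoryBind binds[2] = {
      { .resourceOffset = 0,         .size = 65536,
        .memory = mem, .memoryOffset = 0 },
      { .resourceOffset = 4 * 65536, .size = 65536,
        .memory = mem, .memoryOffset = 65536 },
   };
   const VkSparseBufferMemoryBindInfo buf_bind = {
      .buffer = sparse_buffer,
      .bindCount = 2,
      .pBinds = binds,
   };
   const VkBindSparseInfo info = {
      .sType = VK_STRUCTURE_TYPE_BIND_SPARSE_INFO,
      .bufferBindCount = 1,
      .pBufferBinds = &buf_bind,
   };
   vkQueueBindSparse(queue, 1, &info, fence);
}
```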
19:20fdobridge: <gfxstrand> starts reading up on proc macros in Rust. Help! Somebody save me before I do something stupid!
19:20fdobridge: <airlied> sparse is fully implemented
19:20fdobridge: <airlied> in that branch
19:20fdobridge: <airlied> it even passes all the cts tests
19:21fdobridge: <airlied> except for a couple where the gpu is too slow due to lack of reclocking
19:23fdobridge: <gfxstrand> cool
19:47fdobridge: <gfxstrand> Well, by "fully", I assume you just mean sparse binding, not sparse residency.
19:48fdobridge: <gfxstrand> NIL doesn't have nearly enough helpers for sparse residency yet
19:50fdobridge: <airlied> then clearly magic is fine, since it passes CTS with sparse residency enabled
19:50fdobridge: <airlied> sparseBinding = true
19:50fdobridge: <airlied> sparseResidencyBuffer = true
19:51fdobridge: <airlied> sparseResidencyImage2D = true
19:51fdobridge: <airlied> sparseResidencyImage3D = true
19:51fdobridge: <airlied> sparseResidency2Samples = true
19:51fdobridge: <airlied> sparseResidency4Samples = true
19:51fdobridge: <airlied> sparseResidency8Samples = true
19:51fdobridge: <airlied> sparseResidency16Samples = true
19:51fdobridge: <airlied> sparseResidencyAliased = true
19:52fdobridge: <airlied> maybe it's an accidental pass, would be good to get nil up to speed so we can validate the uapi
19:53fdobridge: <airlied> ah yeah should fill in nvk_GetImageSparseMemoryRequirements2 a bit more 🙂
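[Editor's note: `nvk_GetImageSparseMemoryRequirements2` is NVK's implementation of the standard Vulkan entrypoint; per image aspect it has to report a `VkSparseImageMemoryRequirements` like the one sketched below. The structure is the real Vulkan one, but every value here is a placeholder, not NVK's actual numbers.]

```c
#include <vulkan/vulkan.h>

/* What the entrypoint ultimately reports per image aspect.
 * All values are illustrative placeholders. */
static const VkSparseImageMemoryRequirements example_reqs = {
   .formatProperties = {
      .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
      .imageGranularity = { 256, 256, 1 }, /* sparse block size in texels */
      .flags = 0,
   },
   .imageMipTailFirstLod = 4, /* mips >= 4 live in the mip tail */
   .imageMipTailSize = 65536,
   .imageMipTailOffset = 0,
   .imageMipTailStride = 0,
};
```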
19:59airlied: dakr, jekstrand : pushed a fix to new uapi, I didn't fully sync the headers
20:00fdobridge: <gfxstrand> I'm not super worried about that.
20:00fdobridge: <gfxstrand> I mean, yeah, it'd be great, but there's a pile of compiler work involved in sparse residency and I don't want to do that in codegen.
22:01fdobridge: <karolherbst🐧🦀> yeah.. we'd have to rework how codegen deals with predicates :/