06:05fdobridge: <bylaws> Oh this isn't an impl of the falcon exploit
06:06fdobridge: <bylaws> Sad
06:53fdobridge: <lingm> anholt: yeah, setting `rust-analyzer.linkedProjects` is how you enable rust-analyzer for now. however, warnings and errors still aren't shown inline, so it's only partially working.
16:19gfxstrand: dakr: It doesn't look like we're doing any validation of drm_nouveau_exec_push::va_len and we should be. It's a 64-bit value but it's practically limited to 24 bits with the top bit being used to indicate NO_PREFETCH
16:19gfxstrand: We should at least be validating that the top 40 bits are zero
16:21gfxstrand: I'd send the patch but I'm not sure where you'd want to do that validation. nouveau_exec_ucopy seems like a place or maybe nouveau_exec_job_init()?
16:21gfxstrand: It's annoying because it needs to be validated after we've copied from userspace.
16:22gfxstrand: We should probably also assert that it's a multiple of 4?
16:23gfxstrand: But the size assert is necessary because otherwise we end up truncating to a 32-bit signed integer inside the kernel.
16:32gfxstrand: I think I found a place to put it
16:45gfxstrand: airlied: Is it definitely too late to make UAPI changes?
16:46karolherbst: if nobody reports regressions :P
16:47gfxstrand: :P
16:47gfxstrand: I really hate how NO_PREFETCH is weird implicit UAPI
16:49dakr: gfxstrand: would you want to add a separate field, like split the va_len into two 32bit ones?
16:51gfxstrand: dakr: Yeah, I think that's what we'd do. Have a va_len and flags
16:52gfxstrand: Validate that va_len is at most 23 bits and then have NO_PREFETCH be a flag
16:52gfxstrand: But IDK how much I care. The hardware hasn't changed in forever and we're basically just giving the kernel a thing to shove in the HW.
16:53gfxstrand: But also if that's the way we're looking at it, why not just hand it the one u64 and be done with it? Why split it into two pices at all?
16:53gfxstrand: bah!
16:53gfxstrand: I hate making decisions
16:53gfxstrand: I just hate that we're currently sitting half-way between two sane things.
16:55airlied: gfxstrand: no we can still make them
16:55gfxstrand: So I guess the question is what do we want?
16:55gfxstrand: We could make it "here's an array of u64 to shove in the ring. Have fun!"
16:55gfxstrand: We could make it u64 va, u32 va_size, and u32 flags.
16:56dakr: I think we can probably just keep the 64-bit one, but properly document it.
16:56gfxstrand: We could do that, too
16:56dakr: And put the flag / masks in the uAPI header.
16:57gfxstrand: If the hardware ever does change, though, then we have old hardware encoded in uAPI
16:57dakr: right..
16:57airlied: for referenxe we can change uapi up until 6.6 releases
16:58dakr: On the other hand, how likely is it that the HW changes in this respect?
16:58gfxstrand: If we do "here's an array of u64 to shove in the ring" then it's clearly HW dependent and it's userspace's job to understand that hardware. Given that NVIDIA's FW queues are intended to be driven by hardware anyway, that doesn't actually seem like a bad idea.
16:58gfxstrand: *driven by userspace
16:58dakr: Maybe the 24bit limit expands, but that wouldn't be an issue.
16:58gfxstrand: If we do "va+size+flags" then it's clearly the kernel's job to make that work on any hardware and the only thing that changes from one HW to the next are potential limits.
16:59gfxstrand: And, sure, that leaves us with an arbitrary 32-bit limit on size but, realistically, no one wants 2GB command buffers anyway
17:00gfxstrand: How likely is it that hardware changes? IDK. Given the growth we're seeing in memory sizes, 40 bits for addresses is going to get limiting.
17:01gfxstrand: Are they going to just require command buffers to live in the bottom 40 forever or one day jump to a format that uses 128 bits per pushbuf?
17:02dakr: The latter would require a version bump either way, unless we reserve for that now, right?
17:03gfxstrand: If we change now to va+size+flags, we can handle the later entirely inside the kernel.
17:03gfxstrand: As it is now, we can handle it by NVK knowing its magic semi-HW/semi-UAPI bit moved.
17:04dakr: Maybe I misunderstood that, what's the shift of NO_PREFETCH?
17:04gfxstrand: NO_PREFETCH lives in the top bit of the 64-bit pushbuf HW packet.
17:05gfxstrand: It's 40 bits of address, 23 bits of size, and 1 NO_PREFETCH bit.
17:05gfxstrand: low-to-high
17:05gfxstrand: We set NO_PREFETCH by setting bit 23 of the size that we pass to the kernel and the kernel just hands that through to the HW unvalidated.
17:06gfxstrand: We also aren't validating that the va_start is in-bounds
17:07gfxstrand: We just hand the hardware (va | (va_size << 40)) and hope for the best.
17:08dakr: yeah, maybe we should split it up then.
17:09gfxstrand: And, to be clear, I'm over-all fine with a plan that trusts userspace to not be dumb and trusts the hardware to catch us if we are. The firmware is designed to be driven by userspace. But what we're doing right now isn't really that, either. It's some weird in-between where we pretend to have the kernel know what's going on but then we're hacking around the UAPI to set the NO_PREFETCH bit.
17:10dakr: Right, I agree that's inconsistent.
17:10gfxstrand: I think the most DRM-friendly way to do it would be va+size+flags
17:11dakr: Yep, let's do that.
17:11dakr: Want me to send a patch?
17:12gfxstrand: Please.
17:13gfxstrand: ping me with the patchwork link and I'll make a corresponding NVK MR.
17:13gfxstrand: It's only been a couple of weeks, we can eat the breakage.
17:13dakr: Alright, sure.
17:15gfxstrand: dakr: If we order things va, va_len, flags then I think we'll only actually break pre-Turing
17:15gfxstrand: and only when NO_PREFETCH is used
17:41fdobridge: <georgeouzou> CTS shader-object tests
17:41fdobridge: <georgeouzou> Pass: 62943, Fail: 4, Skip: 179014, Flake: 1, Duration: 1:30, Remaining: 0
17:42fdobridge: <karolherbst🐧🦀> kinda fast
17:42fdobridge: <georgeouzou> i also found a bug in some tests
17:42fdobridge: <karolherbst🐧🦀> shader-objects is this new thing, right?
17:42fdobridge: <georgeouzou> I get more errors on pipeline/shader-object*
17:42fdobridge: <karolherbst🐧🦀> this "only works on nvidia without pain" feature, no?
17:42fdobridge: <georgeouzou> haha yeah
17:42fdobridge: <karolherbst🐧🦀> hehe
17:43fdobridge: <georgeouzou> but it needs more refactoring
17:43fdobridge: <georgeouzou> now i duplicated some code with the pipelines
17:43fdobridge: <karolherbst🐧🦀> right...
17:43fdobridge: <karolherbst🐧🦀> I think from a mesa perspective a driver should be able to implement only shader-objects and everything else layers on top of it, but....
17:43fdobridge: <karolherbst🐧🦀> 🙃
17:44fdobridge: <georgeouzou> I think we can do this on NVK fairly easily
17:44fdobridge: <karolherbst🐧🦀> yeah
17:44fdobridge: <georgeouzou> but the other drivers are going to be a pain
17:44fdobridge: <karolherbst🐧🦀> yep
17:44fdobridge: <karolherbst🐧🦀> I think this can be an optional thing
17:44fdobridge: <karolherbst🐧🦀> and we can prototype this in nvk first
17:48fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> How about GPL?
17:49fdobridge: <georgeouzou> it could go like shader-objects -> GPL -> monolithic-pipelines
18:31fdobridge: <mohamexiety> so what did implementing this involve? didn't expect we'd get this until a fair bit later 😮
18:31fdobridge: <karolherbst🐧🦀> shader-objects basically map 1:1 to nvidia hardware
18:32fdobridge: <karolherbst🐧🦀> the hardware doesn't use monolithic blobs at all
18:32fdobridge: <karolherbst🐧🦀> so you compile one stage and use the result as is, no information of previous/next stages required at all
18:38fdobridge: <gfxstrand> If we'd already had a pipeline on ESO implementation, I would have done ESO from day 1
18:38fdobridge: <mohamexiety> I see. I didn't expect it would be this much easier
18:38fdobridge: <mohamexiety> so like, this is actually a viable path for NVK? overheads and all considered?
18:40fdobridge: <karolherbst🐧🦀> probably
18:42fdobridge: <mohamexiety> interesting
18:46fdobridge: <georgeouzou> We now push less state (more dynamic),
18:46fdobridge: <georgeouzou> with pipelines we pushed grouped state.
18:46fdobridge: <georgeouzou> Yet, I haven't measured the performance difference. I only do validation testing.
18:49fdobridge: <georgeouzou> We now push less state (more dynamic),
18:49fdobridge: <georgeouzou> with pipelines we pushed grouped state, on pipeline switches. (edited)
18:49fdobridge: <mohamexiety> I see I see, thanks
18:50fdobridge: <gfxstrand> Not really. We had a few things we grouped but it wasn't much at all. Most of the state has been going through the dynamic state mechanism which has a bit of CPU overhead but shouldn't be bad.
18:52fdobridge: <georgeouzou> yes, its not much less... but for example there are cases where the fragment state is not pushed again at all and the vertex/geometry shader is only changed.
20:41fdobridge: <airlied> got the results from some of the flakes, a lot of corrupt stencil content, looks tiling related and blank color/depth sometimes, but not seeing evictions happening
20:49fdobridge: <maba_kalox> Question, how stupid is idea to try to port amd implementation?
20:55fdobridge: <gfxstrand> Probably not a great idea. Hi ahead and reference it if you'd like but I doubt you can copy+paste much. Also, IIRC, the AMD one tries to unify stuff more than I'd like.
20:58fdobridge: <maba_kalox> Would you consider Intel implementation to be better reference?
20:58fdobridge: <gfxstrand> Maybe
20:59fdobridge: <gfxstrand> At the end of the day, though, it's just a version of `nvk_CmdDraw()` that loops
21:00fdobridge: <maba_kalox> Okay, probably will poke author of extension, for suggestions 😅
21:05fdobridge: <airlied> nope don't bother extension authors
21:05fdobridge: <airlied> the extension is the ideas
21:07fdobridge: <airlied> though zmike is around, it's likely you should be able to work out how to do it from the existing implementations and the current nvidia draw code, but I doubt we got time to give direct line-by-line pointers
21:16fdobridge: <maba_kalox> Instructions clear.
21:32fdobridge: <gfxstrand> If you start with the extension appendix in the spec, it tells you everything added by the extension. From that and what's already there in nvk_cmd_draw.c, you should be able to get an idea of what to do.
22:14dakr: @gfxstrand: https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/uapi-no-prefetch
23:22gfxstrand: dakr: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24840
23:23gfxstrand: dakr: Your patch looks pretty good.
23:43dakr: @gfxstrand: looks to much better in nvk as well
23:44dakr: @gfxstrand: sent out the patch