01:12 airlied[d]: Page table entries being wrong was my other thought though I've chased that rabbit down a few holes before to no avail
03:02 cubanismo[d]: Yeah, I haven't thought hard about it in a long time. I'd go with what works
03:02 orowith2os[d]: did someone mention having corruption problems with Discord in here before? I recall something like that.
03:02 orowith2os[d]: (Discord on the web, at least. In Firefox.)
03:02 orowith2os[d]: er, rather, just small glitches here n there.
03:03 cubanismo[d]: It just worries me I'm forgetting something or we're getting lucky on timing in the proprietary driver.
03:03 orowith2os[d]: mangodev[d]: found you.
03:03 orowith2os[d]: Hi! Happening here on RADV too.
03:03 orowith2os[d]: experienced it opening an image popup.
03:12 mangodev[d]: orowith2os[d]: D:
03:12 mangodev[d]: so it may just be a mesa thing right now?
03:12 mangodev[d]: or zink?
03:12 orowith2os[d]: probably zink.
03:12 mangodev[d]: something funky is that it only happens in quadrants of the viewport
03:12 mangodev[d]: and some things are exempt?
03:13 mangodev[d]: e.g. on youtube, video thumbnails disappear, except for the area affected by `border-radius`
03:13 orowith2os[d]: I can consistently get it along the right of one image, rarely anywhere else (but it DOES happen)
03:14 orowith2os[d]: haven't tested too many other sites; my college stuff seems to be okay, so it hasn't affected me very much.
03:15 mangodev[d]: orowith2os[d]: strange
03:15 mangodev[d]: for me on nvk, happens very frequently on firefox
03:16 mangodev[d]: in a variety of places
03:16 mangodev[d]: but most often
03:16 mangodev[d]: - images (on dark mode backgrounds???)
03:16 mangodev[d]: - anywhere with an element with `backdrop-filter: blur()` visible
03:16 mangodev[d]: - anywhere with a good amount of translucency
03:17 orowith2os[d]: would be interesting to try hacking together a basic webpage that has everything to reproduce it.
03:17 orowith2os[d]: I can reproduce when clicking the image I fw'ed under light mode.
03:18 mangodev[d]: i said the dark mode thing because it's very apparent on discord's blog
03:18 mangodev[d]: but it seems to mostly disappear in light mode
03:18 mangodev[d]: in dark mode, the images flicker like crazy
03:19 mangodev[d]: something else to note is that the flicker sticks to one corner per tab, and whether the quadrant is clipped or not depends on scroll position… maybe some type of imprecision?
04:28 airlied[d]: gfxstrand[d]: okay finally got a few mins, the storm + membar fix does seem to be the best combo
04:56 gfxstrand[d]: Then let's go with that. If you want to post the patches, I can try my best to review the storm patch tomorrow (though IRQs really aren't my area).
04:57 gfxstrand[d]: I think this is going to fix a lot of the random hangs and faults users are seeing in the wild.
05:12 gfxstrand[d]: orowith2os[d]: mangodev[d] I think we still have synchronization issues in Zink. Likely the same issues that make non-kopper WSI a mess. Just most apps never hit them. Web browsers, however, have an annoying habit of building their own compositor frameworks out of EGLImage and friends and abusing the shit out of it.
05:14 orowith2os[d]: I'm surprised zink is this flawless, though. I've been daily driving it for months now, and not a single big deal breaker. It's amazing, even with these small quirks ;)
05:14 orowith2os[d]: I think the latest problem I had was thunderbird not starting.
05:15 orowith2os[d]: I need to go see if my nixos install should be switched over, though.
05:15 orowith2os[d]: I don't remember if it was just apps, or everything.
05:18 gfxstrand[d]: IDK if I'd say "flawless". It feels like every few months I go beat on it with hammers until I fix more synchronization bugs. I do feel like we're finally reaching the long trail, though.
05:19 gfxstrand[d]: Like, most stuff working except some flickering in Firefox and Chrome is pretty amazing.
05:19 orowith2os[d]: minimally flawed?
05:20 redsheep[d]: Is the long trail a figure of speech that I somehow missed? Google says it's a trail in vermont 😂
05:20 gfxstrand[d]: But synchronization is hard, especially when you have to implement GL's implicit "just do stuff; it'll work" synchronization on top of Vulkan.
05:21 gfxstrand[d]: redsheep[d]: I meant "long tail". That might be more Googleable.
05:22 redsheep[d]: Oh that makes a lot more sense. Normal distributions and exponential decay and all that.
05:22 gfxstrand[d]: Yup
05:23 gfxstrand[d]: Like, there are always bugs. All software has them. But if you beat on it long enough and manage to fix more than you regress, eventually it gets less and less.
05:24 redsheep[d]: I thought we had established that NVK and Zink have no bugs
05:24 gfxstrand[d]: Lol
05:26 gfxstrand[d]: Hopefully in another few releases, we'll have fixed enough of the bugs that other driver teams will start eyeing Zink and looking at their GL drivers as tech debt they'd rather be rid of. I don't expect the mobile drivers to switch any time soon but Iris and radeonsi could be on the chopping block.
05:27 gfxstrand[d]: We've been talking about this as a thing that'd be neat for as long as Vulkan has been around but it's taken a while to get there.
05:27 redsheep[d]: I'll have to get things spun back up to test it all again. Those pesky sync bugs in zink and the kernel, and some in nvk, have been the bane of the desktop pretty much since the beginning of that being possible to set up
05:28 gfxstrand[d]: Pull the top two patches from my nvk kernel tree. They're starting to look like magic for fixing NVK sync.
07:21 snowycoder[d]: gfxstrand[d]: That will surely make the phoronix comments angry
07:22 chikuwad[d]: <a:chopitall:1007465260397121566>
07:29 chikuwad[d]: gfxstrand[d]: with address binding report, do you want me to address all the comments or trash everything and report in nvkmd_dev_alloc_mem() and wherever we free it?
09:00 ermine1716[d]: gfxstrand[d]: Does it mean it would be unwise to consider iris/radeonsi for contribution?
11:01 karolherbst[d]: Okay
11:01 karolherbst[d]: I think I found another bottleneck
11:01 karolherbst[d]: I was doing testing for cl_khr_external_semaphore on zink.. on anv: `real 0m0,199s`, on nvk: `real 0m2,015s`
11:02 karolherbst[d]: all this test is doing is to import/export semaphores between OpenCL and vulkan
11:02 karolherbst[d]: Not sure what exactly the bottleneck is... but it is a lot slower on nvk
11:03 karolherbst[d]: with the `cl_khr_semaphore` tests the difference is a lot smaller...
11:03 mohamexiety[d]: did you try this testing with that MR?
11:03 mohamexiety[d]: the compute one
11:03 karolherbst[d]: ohh.. good point
11:04 karolherbst[d]: tho I don't think this test actually launches compute jobs? but maybe it does
11:04 karolherbst[d]: nah, still super slow
11:05 karolherbst[d]: mohamexiety[d]: on which subchan are semaphores signaled and waited on?
11:05 karolherbst[d]: or is that all done in the kernel?
11:07 mohamexiety[d]: I am not sure, I just thought to ask since it was something opencl related it would probably have been all compute
11:08 karolherbst[d]: well.. compute only really matters for launching actual kernels
11:08 karolherbst[d]: most of the API is just copying things around 😄
11:08 marysaka[d]: might be worth doing a NVK_DEBUG=push_dump
11:09 karolherbst[d]: https://gist.githubusercontent.com/karolherbst/bdceee62c5ca5909331aa96d132b7976/raw/bee1fef95fda34eb682b16dc266bd018f0d7a294/gistfile1.txt
11:10 karolherbst[d]: or maybe it's just creating new channels between tests? dunno
11:11 karolherbst[d]: though I am smart and do reuse pipe_contexts 😄
11:11 karolherbst[d]: ehh `mthd 0118 unknown method`
11:11 karolherbst[d]: I broke it apparently
11:31 marysaka[d]: that's weird huh
11:38 mohamexiety[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13795 does the HW have native int bitfield extract?
11:42 karolherbst[d]: correct
11:42 karolherbst[d]: it used to tho
11:45 karolherbst[d]: there is a `BMSK` instruction tho to create a bitfield mask
11:53 mohamexiety[d]: well the test does pass on NV drivers so I guuuuuesss there's room for improvement here on our end
11:53 karolherbst[d]: well...
11:53 karolherbst[d]: `BMSK` would return a 0 mask
11:54 karolherbst[d]: but anyway
11:54 karolherbst[d]: could check codegen on nvidia, but I doubt it's _that_ much better
11:54 karolherbst[d]: though could always make the UB less weird
12:33 gfxstrand[d]: snowycoder[d]: Oh, they'll have a field day no matter what
12:34 gfxstrand[d]: ermine1716[d]: No, they'll probably be around for a while yet. But they may be on the chopping block eventually.
12:35 gfxstrand[d]: karolherbst[d]: Kernel
12:37 gfxstrand[d]: chikuwad[d]: Probably nvkmd. Though we need to audit what objects we're passing in to log_obj so that we don't end up logging memory to the wrong thing.
12:38 phomes_[d]: I am still not seeing any of the gnome-shell timeouts anymore. Does it make sense that interacting with firefox and games would result in a timeout from the gnome-shell process?
12:40 gfxstrand[d]: Shouldn't
12:41 gfxstrand[d]: karolherbst[d]: I think we can go from 5 to 3 instructions.
12:41 phomes_[d]: It was reproducing before by scrolling in some youtube shorts in firefox. And then suddenly: `kernel: nouveau 0000:01:00.0: gnome-shell[6765]: job timeout, channel 12 killed!`
12:42 karolherbst[d]: gfxstrand[d]: with BMSK?
12:42 karolherbst[d]: probably
12:42 gfxstrand[d]: phomes_[d]: Yeah, I think that was our kernel bug
12:45 gfxstrand[d]: karolherbst[d]: Yeah, it could be bmsk, plop, shf.
12:45 karolherbst[d]: there is also UBMSK, so that's good
12:46 karolherbst[d]: it's basically ((1 << b) - 1) << a
12:46 gfxstrand[d]: Unless the number of bits or shift isn't an immediate, in which case we may need a prmt, too
12:46 gfxstrand[d]: Because I think bmsk takes those as a single packed source.
12:46 karolherbst[d]: BMSK can clamp or wrap
12:47 gfxstrand[d]: Still better than 5 instructions, though.
12:47 karolherbst[d]: nah
12:47 karolherbst[d]: bmsk takes real sources
12:47 karolherbst[d]: a is a reg
12:47 gfxstrand[d]: Okay, cool
12:47 karolherbst[d]: b can be whatever
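A minimal Python sketch of the mask described above, `((1 << b) - 1) << a`, where `a` is the start bit and `b` the number of bits; the real instruction's clamp/wrap handling of out-of-range `b` is not modeled here:

```python
def bmsk(a: int, b: int) -> int:
    """Bitfield mask sketch: b one-bits starting at bit a.

    Illustrative only; BMSK's clamp vs. wrap modes are not modeled.
    """
    return (((1 << b) - 1) << a) & 0xFFFFFFFF
```

For example, `bmsk(4, 8)` produces `0xFF0`, and `bmsk(0, 0)` produces the zero mask mentioned earlier.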
12:48 karolherbst[d]: might be worth wiring up as a real nir thing to optimize some shaders? might help
12:48 gfxstrand[d]: Also, bfi can be 3: shf, bmsk, plop3.
12:48 karolherbst[d]: yeah..
12:49 gfxstrand[d]: Because plop3 can do both the mask AND and the OR
12:49 karolherbst[d]: right
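As a sketch, the three-instruction bfi lowering discussed above (shf for the shift, bmsk for the mask, plop3 doing the mask AND and the OR in one go) computes the following in plain Python; this is an illustration of the idea, not NAK's actual lowering:

```python
def bfi(base: int, insert: int, offset: int, bits: int) -> int:
    """Bitfield insert sketch mirroring the shf + bmsk + plop3 sequence."""
    mask = (((1 << bits) - 1) << offset) & 0xFFFFFFFF   # bmsk
    shifted = (insert << offset) & 0xFFFFFFFF           # shf
    return (base & ~mask) | (shifted & mask)            # plop3: AND + OR
```

E.g. inserting the low 4 bits of `0xAB` at offset 0 of `0xFF` yields `0xFB`.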
12:50 gfxstrand[d]: karolherbst[d]: There is a nir thing but I'm not sure it's exactly what we want. We'll have to evaluate that.
12:50 karolherbst[d]: there is? haven't seen it.. at least it's not called "mask" 😄
12:56 chikuwad[d]: gfxstrand[d]: hmmm, okay
12:56 chikuwad[d]: I'll see what I can do :froge:
12:57 gfxstrand[d]: karolherbst[d]: Bfm
12:57 karolherbst[d]: ah
12:57 gfxstrand[d]: chikuwad[d]: We still need to report on images and buffers, though.
12:57 karolherbst[d]: yeah.. looks like it
12:58 karolherbst[d]: looks like it's wrapping BMSK
12:58 chikuwad[d]: gfxstrand[d]: yeah
12:59 chikuwad[d]: gotta report on all vulkan objects
13:01 karolherbst[d]: I shouldn't distract myself and should land ldsm instead 😄
15:16 gfxstrand[d]: Oh, come on. You know you want to hack on bitfield ops. I'm sure you can come up with some reason why those are important for AI.
15:19 mohamexiety[d]: remember, you don't know where the bottleneck is now. we took care of all the easy things. it could be _anywhere_ :oaml:
15:21 karolherbst[d]: gfxstrand[d]: 😄
15:22 karolherbst[d]: here is the current version of the shader, if you find places to use it 😛 https://gist.github.com/karolherbst/c99cb27f6560b293fd45b7e5880d098f
15:22 karolherbst[d]: actually...
15:24 karolherbst[d]: yeah well.. mhh
15:24 karolherbst[d]: doubt
17:34 karolherbst[d]: checked how much of an impact all the opts have in my local branch: https://gist.githubusercontent.com/karolherbst/847843578a762d43606388e8be955c85/raw/add64e71294133b475ecba17a299f7f702c1f2dc/gistfile1.txt
17:35 karolherbst[d]: the load store vectorization helps a bit...
17:35 karolherbst[d]: and my opt_barrier hacks 😭
17:35 karolherbst[d]: getting rid of useless barriers apparently is the biggest improvement right after the compute MME stuff
17:36 karolherbst[d]: maybe I focus on that.. shouldn't be too terrible lol
17:38 karolherbst[d]: yeah.. it's like 5%
17:38 karolherbst[d]: mhhh
17:38 karolherbst[d]: it's one of those things that annoyingly hides other improvements..
17:39 karolherbst[d]: like on top of all the other patches it's like +10%, but on top of the compute MME stuff it's more like 6%
17:41 karolherbst[d]: maybe I should take a look at the push buffer and see if I can spot more easy 3D/compute opts
17:42 karolherbst[d]: mhhh actually
17:42 karolherbst[d]: let me try the subc stuff again.. combined with the compute MME it should help...
17:44 asdqueerfromeu[d]: I had to remove the NVIDIA Vulkan ICD in order for SDDM to not freeze
17:44 karolherbst[d]: mhh not really..
17:44 karolherbst[d]: yeah...
17:44 karolherbst[d]: it's nvidia being buggy
17:44 karolherbst[d]: just file bugs against them
17:45 asdqueerfromeu[d]: Also the R570 GSP doesn't solve the HDMI audio issues for me (so it might actually be some broken nouveau code instead)
17:51 karolherbst[d]: mhhh
17:52 karolherbst[d]: okay I see `NVC797_INVALIDATE_SAMPLER_CACHE_NO_WFI` and `NVC797_INVALIDATE_SHADER_CACHES_NO_WFI` being executed on 3D still...
17:53 karolherbst[d]: looks like something calls `nvk_cmd_buffer_begin_graphics` without reason?
17:54 mhenning[d]: I think we unconditionally begin_graphics at the beginning of the command buffer
17:55 karolherbst[d]: doesn't help that I'm seeing those `unknown method` things as well..
17:55 gfxstrand[d]: Yeah. We don't have compute-only yet
17:56 mhenning[d]: Yeah, compute-only is on my todo list after some of the current MRs land
17:56 gfxstrand[d]: mhenning[d]: , airlied[d] If the sync patches from Dave and I fix sync for copy queues, what are your thoughts on landing copy queue support? I kinda don't want to land it in NVK if all known shipping kernels are busted. But I also don't know how to gate it on a bugfix that we plan to backport.
17:57 karolherbst[d]: doesn't help in any case
17:57 karolherbst[d]: and it's the only 3D stuff left
17:57 karolherbst[d]: soooo...
17:57 gfxstrand[d]: gfxstrand[d]: Just ship the kernel patches, wait 3 months, and land the NVK bits and hope people got updates in that time?
17:58 gfxstrand[d]: Add a `NOUVEAU_GETPARAM_SYNC_ISNT_BUSTED`?
17:58 mhenning[d]: Yeah, I'm not really sure
17:58 karolherbst[d]: the other thing marysaka[d] wanted to look into, but I could do that as well... uploading the QMDs via `SET_INLINE_QMD`
17:58 gfxstrand[d]: I'm not coming up with a lot of great options here. 😂
17:59 karolherbst[d]: ehh `LOAD_INLINE_QMD_DATA`
17:59 karolherbst[d]: atm it's a host memcpy, right?
17:59 asdqueerfromeu[d]: gfxstrand[d]: Including 6.16?
18:00 mhenning[d]: asdqueerfromeu[d]: Yeah, patches haven't even hit the list yet
18:00 karolherbst[d]: yeah... using `LOAD_INLINE_QMD_DATA` should help a lot here...
18:00 mhenning[d]: but 6.16 will probably get it as a backport
18:01 mhenning[d]: gfxstrand[d]: I guess another option would be to gate transfer queues on 6.17+ even though that isn't strictly required
18:01 mhenning[d]: but yeah, I'm not sure. none of the options are great
18:07 mhenning[d]: actually maybe that isn't too bad. bump the interface version in 6.17 and require that for transfer, backport the fixes but not the version bump
18:14 karolherbst[d]: mhhh...
18:14 karolherbst[d]: the hw throws `SKEDCHECK22_INVALIDATE_ACTIVE_QMD`
19:03 karolherbst[d]: mhhh.. managed to get it working with the old `SEND_SIGNALING_PCAS_B` method...
19:09 gfxstrand[d]: mhenning[d]: Yeah, that's probably a decent plan.
19:11 karolherbst[d]: okay.. so those tests do launch 60 compute shaders per sub-benchmark..
19:11 karolherbst[d]: which also means 60 qmd uploads
19:12 karolherbst[d]: do we have a qmd dump env variable thingy?
19:12 gfxstrand[d]: no
19:12 karolherbst[d]: I wonder if those are actually different in any significant way
19:12 gfxstrand[d]: We don't even have a struct dumper
19:13 karolherbst[d]: mhh
19:13 gfxstrand[d]: We have a struct_parser.py but it's just for dumping rust files.
19:13 gfxstrand[d]: Wouldn't be too hard to add a parser to it, though.
19:13 karolherbst[d]: I wonder if I can reuse the same qmd memory thing and just update parts through the push buffer and get more perf
19:14 gfxstrand[d]: <a:shrug_anim:1096500513106841673>
19:14 karolherbst[d]: so I managed to ditch the memcpy at least
19:14 gfxstrand[d]: Did it help?
19:14 karolherbst[d]: unclear.. still have to figure out how it all works
19:15 karolherbst[d]: I suspect if you switch to a new qmd the hw invalidates everything anyway
19:15 gfxstrand[d]: AFAICT, the main benefit to inline QMDs is being able to decode them and the fact that the command streamer is really good at pulling stuff across PCI.
19:15 karolherbst[d]: right
19:15 gfxstrand[d]: karolherbst[d]: Yeah, I've never seen the blob push a partial QMD
19:16 karolherbst[d]: but was it using LOAD_INLINE_QMD_DATA? (or well any other push buffer based upload mechanism)
19:16 gfxstrand[d]: gfxstrand[d]: But now that we're placing them in VRAM, I think most of the GPU runtime benefit is already there.
19:16 gfxstrand[d]: karolherbst[d]: The blob does, yeah.
19:16 karolherbst[d]: I mean.. it helps if you reuse the same thing over and over again I figure
19:16 karolherbst[d]: probably
19:16 gfxstrand[d]: Not sure
19:17 karolherbst[d]: for now I want to check how different the QMD is
19:17 gfxstrand[d]: Depends on how the caching and state tracking works.
19:17 karolherbst[d]: if it's 60 times the same one...
19:17 karolherbst[d]: then I want to see what happens if it's only uploaded once
19:17 gfxstrand[d]: Fair
19:17 karolherbst[d]: it's like 60 compute dispatches back to back
19:17 gfxstrand[d]: Yeah, if it's the same QMD a bunch of times, we might as well use the same data
19:18 karolherbst[d]: hence me wanting to dump it 😄
19:20 gfxstrand[d]: Yeah, I'd love to have a struct dumper and a way to dump some stuff like QMDs. I'd also love to dump texture and sampler headers if we could figure out how when everything is bindless.
19:22 karolherbst[d]: uhhhh
19:22 karolherbst[d]: 2x perf
19:22 karolherbst[d]: 🙃
19:22 karolherbst[d]: so yeah...
19:22 karolherbst[d]: I uhm.. used a static variable to only upload it once
19:23 karolherbst[d]: and apparently the first sub-benchmark passes validation even
19:23 karolherbst[d]: the second fails
19:23 karolherbst[d]: but yeah...
19:23 karolherbst[d]: ~52TFLops -> 95TFlops for the first one
19:23 karolherbst[d]: let's write this up properly to figure out if it's real or not
19:24 karolherbst[d]: they change the workgroup size between tests
19:24 karolherbst[d]: and I think the shader also changes
19:29 airlied[d]: gfxstrand[d]: mhenning[d] we should land transfer queues behind an env var now I think, but I'm not against adding an getparam, it's just annoying as it's two fixes needed
19:31 karolherbst[d]: okay.. but it does impact perf a lot
19:31 karolherbst[d]: yeah, I need to write up a proper patch there
19:32 mhenning[d]: mhenning[d]: airlied[d] do you prefer a getparam over this?
19:33 airlied[d]: Interface version change means never benefitting from backported fixes
19:33 airlied[d]: And both the sync bugs are needed fixes
19:36 mhenning[d]: right, the sync bug fixes would be backported, only the transfer queue stuff would be gated on the interface version bump
19:43 gfxstrand[d]: airlied[d]: Where should these fence patches be sent? drm-misc-fixes?
19:43 airlied[d]: Yes need to add fixes and cc stable tags
19:43 gfxstrand[d]: airlied[d]: Also, is there any reason why you only added the dtor to ga100 and not ga102?
19:44 airlied[d]: I will recheck today, and send them both out with all tags
19:44 gfxstrand[d]: (IDK why they're separate. As far as I can tell, we should just make everything ga100 and delete ga102)
19:46 gfxstrand[d]: I'm trying to grok this IRQ storm patch so I can RB it once it hits the list
19:47 airlied[d]: At one point I said userspace, I meant DRM driver
19:55 gfxstrand[d]: I'm so looking forward to this bugfix landing. This damn thing has been plaguing us for so long...
19:57 gfxstrand[d]: airlied[d]: I think this is a good plan for now. mhenning[d] You okay with rebasing your MR and adding the ENV var? That way at least phomes_[d] and folks have it for testing and we can get a feel for when and where it helps. We can replace the ENV var with a kernel check once we've done the API bump.
19:57 gfxstrand[d]: Or maybe we just drop the checks altogether in 6 months.
19:59 gfxstrand[d]: And once we land !37016, we can start exposing a compute queue on AmpereB+
20:26 mhenning[d]: gfxstrand[d]: you mean https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37016 and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36823 and probably some other fixes
20:28 mhenning[d]: gfxstrand[d]: yeah, can do. want to review it?
20:34 gfxstrand[d]: mhenning[d]: MME for sure because that's the thing causing us to enable the 3D subc on compute queues.
20:34 gfxstrand[d]: mhenning[d]: Yeah, I can stick it in my short queue.
20:36 mhenning[d]: gfxstrand[d]: right, but I'm saying mme isn't enough
20:40 gfxstrand[d]: We need subc switching as well?
20:42 gfxstrand[d]: I guess we probably need to sort out a bunch of that barrier stuff either way. 😕
20:45 mhenning[d]: gfxstrand[d]: yes, we need to stop putting some of those barriers on the graphics queue. Paving the way for compute-only is actually the main reason I wrote that MR
20:55 airlied[d]: okay patches on the list
20:57 airlied[d]: or at least once RH's outgoing SMTP server does its job
20:58 gfxstrand[d]: mhenning[d]: That's fair. We'll need to give Blackwell a good hard think, though. I'm starting to get some ideas of how we can model it but my brain's not working great today.
20:59 gfxstrand[d]: I think I'm done trying to merge patches today. CI seems pretty hosed. 😭
21:02 ermine1716[d]: Time for tea
21:05 mohamexiety[d]: about blackwell actually
21:05 mohamexiety[d]: does anyone else find it weird how the compute MME MR got a 10x uplift on blackwell? :thonk:
21:05 mohamexiety[d]: like
21:06 mohamexiety[d]: I thought the subc switches were gone entirely. but clearly something isn't correct with that assumption
21:07 mohamexiety[d]: or is this controlled by the kernel and we're still doing it even when it's not needed?
21:08 karolherbst[d]: okay. mhh.. the QMDs are identical except a single address... and that's the first const buffer
21:16 karolherbst[d]: mhhhh
21:16 karolherbst[d]: I bet it's the root desc
21:17 karolherbst[d]: I have an idea
21:18 snowycoder[d]: For someone who doesn't know the hardware, what's the root desc?
21:18 karolherbst[d]: it's not a hardware thing
21:18 karolherbst[d]: it's just a buffer with stuff the shader pulls
21:19 karolherbst[d]: metadata basically
21:19 karolherbst[d]: metadata like addresses of other buffers or... image handles or other random things
21:19 karolherbst[d]: sooo...
21:20 karolherbst[d]: we could make the QMD identical in each launch, but we'd need to reuse the same buffer for 1. the root desc and 2. the qmd
21:20 karolherbst[d]: VA, not buffer
21:20 karolherbst[d]: root desc and qmd can be updated through pushbuffers instead of memcpy
21:20 mhenning[d]: yeah, root desc is how we find everything in memory
21:21 karolherbst[d]: maybe I manage to prototype it all out tomorrow, but I'd have to keep buffers/sub allocs around somewhere and not sure where exactly...
21:22 karolherbst[d]: maybe allocate storage for a root desc + qmd per shader?
21:22 karolherbst[d]: the qmd is trivially small
21:22 karolherbst[d]: root desc isn't huge either
21:22 karolherbst[d]: compared to the shader it probably doesn't even matter
21:23 karolherbst[d]: yeah... that's going to be interesting to see if the perf indeed increases by a significant amount or not.
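The dedup idea above (60 back-to-back dispatches whose QMDs are byte-identical, so only the first needs an upload) can be sketched as follows; class and method names are hypothetical, and a real version would also have to account for the root descriptor address baked into the QMD:

```python
class QmdCache:
    """Sketch: skip re-emitting a QMD whose bytes match the last upload.

    Hypothetical illustration of the idea discussed above, not NVK code.
    """

    def __init__(self):
        self.last = None
        self.uploads = 0

    def dispatch(self, qmd: bytes) -> None:
        if qmd != self.last:
            self.uploads += 1  # would emit LOAD_INLINE_QMD_DATA here
            self.last = qmd
```

With 60 identical dispatches in a row, only the first one triggers an upload.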
21:42 karolherbst[d]: anybody else want to review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36841 ? Otherwise I'll CTS it and merge it
21:42 karolherbst[d]: tomorrow or so
21:42 karolherbst[d]: it just deletes a bunch of code from nak 😄