15:01gfxstrand[d]: Yeah, we hit a lot of GSP locks. We can cut that in half by adding a device enumeration uAPI that doesn't require me to create a context just to get class versions.
15:04karolherbst[d]: mhhh.. right
15:04karolherbst[d]: couldn't you cache the context though?
15:05karolherbst[d]: but I guess it would be useful for multi GPU setups anyway
15:16gfxstrand[d]: Yeah, we could theoretically optimize a bit by letting the first nvk_device steal the context somehow.
15:21gfxstrand[d]: I've tried to avoid going there because doing it reliably is a bit tricky but we could.
15:32gfxstrand[d]: Alright... New sample mask shenanigans are looking pretty good...
15:32gfxstrand[d]: CTS run will take a bit but I think I'll merge VK_EXT_post_depth_coverage today
15:33gfxstrand[d]: Also, TIL that NIR will optimize `x * 2 + N` to `iadd3 x x N`
15:33gfxstrand[d]: "OMG! What are you doing NIR?!? Where did my 2 go?!? Oh...."
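The rewrite above is valid because `x * 2` equals `x + x` under wrapping integer arithmetic, so the multiply-add collapses into a single three-input add. Here is a minimal sketch of that identity in plain Python, assuming 32-bit wraparound. The helper names `imul_iadd` and `iadd3` are hypothetical models for illustration, not NIR or NAK APIs.

```python
MASK32 = 0xFFFFFFFF  # assumption: model 32-bit wrapping integers


def imul_iadd(x, n):
    """Model of x * 2 + n with 32-bit wraparound."""
    return ((x * 2) + n) & MASK32


def iadd3(a, b, c):
    """Model of a three-input integer add with 32-bit wraparound."""
    return (a + b + c) & MASK32


# The two forms agree, including in cases that overflow 32 bits.
for x, n in [(0, 0), (3, 5), (0x7FFFFFFF, 1), (0xFFFFFFFF, 0xFFFFFFFF)]:
    assert imul_iadd(x, n) == iadd3(x, x, n)
```

This only shows that the two forms compute the same value. Whether the three-input add is actually faster on a given GPU is a separate question.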
16:45tiredchiku[d]: :o
16:45tiredchiku[d]: mary seems to have just rebased her mesh shaders MR
16:45tiredchiku[d]: exciting
16:45tiredchiku[d]: will test it out on the desktop when I have it by the end of this week :saigeheart:
16:48marysaka[d]: I rebased it but be aware that task + mesh isn't done at all yet tiredchiku[d]
16:49marysaka[d]: that and final rebase broke gl_Layer so will need to dig into that more next week
16:49tiredchiku[d]: hm, I was thinking of throwing Alan Wake 2 at it
16:49tiredchiku[d]: but if you have things to work out I can do something else in the meanwhile too
16:49tiredchiku[d]: like actually set up my testing installation 😅
18:46gfxstrand[d]: FYI for those interested, https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29194 should give a minor perf boost for some MSAA apps. IDK which ones as the workaround I deleted was pretty specific, but it may help something.
18:47gfxstrand[d]: Okay, it's EXT_descriptor_buffer time!
18:53gfxstrand[d]: I think George might have had patches somewhere
18:54gfxstrand[d]: But IDK that he got it totally working
18:59redsheep[d]: I would have sworn there's an MR opened for the beginnings of that implementation but I do not see it, and it's not in his branches in gitlab afaict
19:00redsheep[d]: So I guess it was just something talked about here. I don't remember the specifics but are you going to be attempting the more clever path that tries to map nvidia hardware more closely to that extension, or are you thinking more along the lines of doing what nvidia does?
19:01redsheep[d]: IIRC that discussion was about having more or less indirection?
19:05gfxstrand[d]: Yeah, he was going to try for zero indirection
19:07redsheep[d]: Hopefully that is possible, it would be nice for vkd3d perf
19:07redsheep[d]: As it is today vkd3d games are generally faster than dxvk games, for whatever reason
19:08gfxstrand[d]: Yeah
23:02karolherbst[d]: gfxstrand[d]: not quite sure if that's always better on nvidia :ferrisUpsideDown:
23:02karolherbst[d]: actually... I don't think that's ever better
23:04karolherbst[d]: uhhh
23:04karolherbst[d]: it's complicated
23:04karolherbst[d]: it actually impacts wait counts
23:05karolherbst[d]: I think that's one of the optimizations you might be able to do, but it needs to be done in nak probably if you want to actually only do it if there is a benefit in doing so
23:06karolherbst[d]: and how the result is used, and where the sources come from
23:06karolherbst[d]: yes.. imad and iadd3 belong to different instruction groups
23:07karolherbst[d]: but there are also cases where `IMAD` would be faster than `MOV`, and it's the same reason