03:36fdobridge: <gfxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28756
03:36fdobridge: <gfxstrand> I don't hate it...
03:37fdobridge: <gfxstrand> It took me a while to come up with a wrapper around all of the struct messiness that I didn't mind too much and ultimately I think traits were the right call.
03:37fdobridge: <gfxstrand> I tried like 3 or 4 other attempts with macros and they were all a bit of a disaster.
04:08fdobridge: <gfxstrand> Okay, I've mostly worked around that. I'm now only importing the hardware header for the duration of the struct declaration. It still doesn't totally fix versions but it's a lot better.
04:11fdobridge: <gfxstrand> I kinda want to tweak the struct parser now to emit a module per struct or something.
04:11fdobridge: <gfxstrand> That would let me totally fix the problem such that you literally cannot get it wrong within the context of the little submodule there.
04:20fdobridge: <gfxstrand> I'm just not sure a submodule is the right way to organize it. Then again, I'm not sure that Rust has a better way.
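A minimal sketch of the "module per struct plus a trait" idea discussed above. It assumes a generated module per hardware class version and a trait that abstracts over them. The module names, method offsets, and the `push_set_foo` helper are all hypothetical and are not taken from the merge request.

```rust
// Hypothetical generated modules: one per class version, so only that
// version's method offsets are in scope inside each module.
mod kepler_3d {
    pub const SET_FOO: u32 = 0x0100; // made-up offset
}

mod turing_3d {
    pub const SET_FOO: u32 = 0x0104; // made-up offset
}

/// Abstracts over the per-version method encodings (hypothetical).
trait ClassMethods {
    const SET_FOO: u32;
}

struct Kepler;
struct Turing;

impl ClassMethods for Kepler {
    const SET_FOO: u32 = kepler_3d::SET_FOO;
}

impl ClassMethods for Turing {
    const SET_FOO: u32 = turing_3d::SET_FOO;
}

/// Emits a (method, value) pair. The class type parameter selects the
/// offset, so a caller cannot pair one version's offset with another's.
fn push_set_foo<C: ClassMethods>(push: &mut Vec<u32>, value: u32) {
    push.push(C::SET_FOO);
    push.push(value);
}
```

Calling `push_set_foo::<Turing>(&mut push, 9)` appends `[0x0104, 9]`. The version is chosen once, at the type level, instead of by whichever header happens to be imported.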
07:55ad__: Lyude: posted the logs you asked for related to the Ada Lovelace backlight. As always, take your time; as a maintainer I guess things are quite stressful.
08:31jumpingAI: 624+724+931-512 is step1 is 112+724+931 twice of the sum and then -512 is step2 is 112+624+724+724...1536-212-112-419=400+300+93 this added to step2 -512 is 512+624...+300+93..subtract step1 and 1*sum remove 2*1024 is 400 and 1024-400=624
08:32jumpingAI: if you want no duplicates increasing set, this is much more space friendly than bitset, but i have much more powerful sets, that are indexed as needed.
14:49fdobridge: <zmike.> @gfxstrand can you confirm something for me? if I do:
14:49fdobridge: <zmike.> * `grep sparse -hr ~/src/VK-GL-CTS/external/openglcts/data/gl_cts/data/mustpass/gl/khronos_mustpass/main/ >| sparse.txt`
14:49fdobridge: <zmike.> * `MESA_LOADER_DRIVER_OVERRIDE=zink deqp-runner run --deqp ./glcts --caselist sparse.txt --output sparse`
14:49fdobridge: <zmike.> I get a ton of hangs
14:50fdobridge: <zmike.> the issue seems to be running multiple versions of the same tests simultaneously
15:01fdobridge: <mhenning> @zmike. What kernel are you on? I believe 6.8.5 has a fix for a known sparse bug. (The patch is "nouveau/uvmm: fix addr/range calcs for remap operations")
15:01fdobridge: <zmike.> I'm on 6.8.5
15:24fdobridge: <gfxstrand> I can try after my current CTS attempt either finishes or dies in a fire.
15:51fdobridge: <zmike.> great
15:51fdobridge: <zmike.> also idk if you've seen but https://gerrit.khronos.org/c/vk-gl-cts/+/14349 is the first big test split CL
15:51fdobridge: <gfxstrand> Should be about another hour
15:51fdobridge: <zmike.> and it's amazing
15:52fdobridge: <zmike.> cuts down sparse* test runtimes from 3+ minutes to ~5s in deqp-runner
15:52fdobridge: <gfxstrand> \o/
15:52fdobridge: <gfxstrand> copy image and swizzle when?
15:53fdobridge: <zmike.> they're on the list
15:53fdobridge: <zmike.> the problem is our funding is only for 40h/week, and this is split 50/50 between porting kc and test splitting
15:53fdobridge: <zmike.> so both projects are moving slowly
15:54fdobridge: <gfxstrand> Yeah
15:54fdobridge: <zmike.> but I think by the end of the year we should be able to cut multiple minutes off glcts runtime for every ci job and also we'll actually be getting full coverage with public builds
15:54fdobridge: <gfxstrand> \o/
18:17fdobridge: <gfxstrand> `60/60 sessions passed, conformance test PASSED`
18:17fdobridge: <gfxstrand> Hell, yeah!
18:41fdobridge: <zmike.> !
18:41fdobridge: <zmike.> is such a thing even possible
18:44fdobridge: <zmike.> seriously though congrats
18:57fdobridge: <airlied> Was that with the kernel patch I posted or just good luck?
18:57fdobridge: <airlied> Also yay!
19:08fdobridge: <gfxstrand> It took a few tries because of random fails but we got there. Turing and Ada are running now.
19:09fdobridge: <gfxstrand> Luck.
19:22fdobridge: <gfxstrand> Really, though, most of the work has been figuring out the right combination of stupid things (like don't use Wayland) to get it to pass.
19:23fdobridge: <gfxstrand> That and a lot of waiting.
19:23fdobridge: <gfxstrand> Seriously. Way too much waiting.
19:23fdobridge: <airlied> This is why I have 10 machines in my office :-p
19:24fdobridge: <gfxstrand> And trying to not slowly go insane from not having my test machine available.
19:24fdobridge: <gfxstrand> Normally, I just have a million cores in my one machine. 🙃
19:25fdobridge: <!DodoNVK (she) 🇱🇹> What's the next goal? Rendering Cyberpunk 2077 accurately (without RT because RE is hard)? Performance optimizations (NVK is really slow in some weird scenarios)?
19:25fdobridge: <airlied> Modifiers :-p
19:31RSpliet: surely the thing to do is to focus on the performance of critical benchmarks... like vkgears
19:48fdobridge: <airlied> @gfxstrand I realised years ago I'd rather something take longer and not block me than wait. I use a crappy NUC for my central desktop and currently 5 machines are in active use for projects.
19:49fdobridge: <gfxstrand> Yeah, that's getting close to the top of the pile.
19:50fdobridge: <gfxstrand> Next project is to rework cbuf0 to use in-place updates because I think those cause less internal stalling. I'm really hoping that helps perf a good bit on the big cards.
19:51fdobridge: <zmike.> @gfxstrand I gotta be honest you're putting me in a real dilemma here
19:51fdobridge: <zmike.> https://cdn.discordapp.com/attachments/1034184951790305330/1229881874990759956/image.png?ex=66314bc1&is=661ed6c1&hm=605162fa796ccb61fd1f1e823106607fc9fb1ac7f3a618a49ccf6ccf683ca4a6&
19:51fdobridge: <gfxstrand> Yeah, I really need to build out a second desktop. I've got the case and PSU and a wishlist on Amazon. I just need to buy the parts and do it.
19:52fdobridge: <gfxstrand> What is the funniest thing of all time in this case?
19:52fdobridge: <zmike.> NAK the NAK conformance submission ofc
19:54fdobridge: <gfxstrand> I mean, if you want to be the reason why your driver isn't conformant, you do you I guess. 🤷🏻♀️
19:55fdobridge: <zmike.> you have to look at it in the abstract
19:55fdobridge: <zmike.> obviously I won't, but imagine
19:55fdobridge: <gfxstrand> But you can't NAK NAK any harder than NAK has already been NAK'd.
19:56fdobridge: <zmike.> yeah now you're getting it
19:56fdobridge: <zmike.> this could be the final NAK
19:58fdobridge: <zmike.> so I assume this means I never have to run cts on nvk again
19:58fdobridge: <zmike.> or do any other form of testing
19:59fdobridge: <airlied> as long as you abuse your power to ensure cts gets no more tests
19:59fdobridge: <zmike.> ...what if I'm already abusing my powers to do the opposite
20:01fdobridge: <zmike.> I think it's not the same as vkcts though
20:01fdobridge: <zmike.> glcts can't withdraw versions
20:01fdobridge: <zmike.> if you pass on any version you're done forever
20:02fdobridge: <gfxstrand> Yup
20:02fdobridge: <gfxstrand> I mean, I may want to run on new hardware
20:02fdobridge: <gfxstrand> Or I might not care for GL because damn this has been a giant pain.
20:03fdobridge: <gfxstrand> I'm not even going to try for GLES right now. Too many EGL issues that aren't my fault.
20:03fdobridge: <airlied> also is a nvk/zink submission just a zink submission?
20:03fdobridge: <zmike.> I thought EGL was passing?
20:03fdobridge: <gfxstrand> Not last I checked. That's where GLES runs go to die
20:03fdobridge: <zmike.> :stressheadache:
20:03fdobridge: <gfxstrand> I'll submit as NVK+Zink
20:04fdobridge: <airlied> @zmike. did you get rules for conforming zink?
20:04fdobridge: <gfxstrand> With a conformant products list. That's my plan, anyway.
20:04fdobridge: <zmike.> in what regard
20:04fdobridge: <zmike.> according to the current policy, a layered implementation is conformant after 2 submissions are conformant
20:04fdobridge: <zmike.> 1 of which can be software
20:05fdobridge: <airlied> ah okay, so is zink already passing that?
20:05fdobridge: <zmike.> this is #2
20:05fdobridge: <zmike.> img was #1
20:05fdobridge: <zmike.> though it's unclear to me how they managed to pass
20:05fdobridge: <gfxstrand> Has anyone gotten lavapipe+Zink passing?
20:05fdobridge: <zmike.> I think it should?
20:06fdobridge: <zmike.> I remember running it overnight a while ago and then fixing the fails it found
20:06fdobridge: <gfxstrand> Probably given that NVK passes now.
20:06fdobridge: <zmike.> nvk required a lot more core mesa work to fix fails than any other driver :fullheadache:
20:07fdobridge: <airlied> I'm not sure llvmpipe passes at the moment, I should burn a test box at some point
20:09fdobridge: <gfxstrand> Yes but none of the bugs were in NVK. I just have all the features. 😝
20:10fdobridge: <zmike.> stencil export :fullheadache:
20:10fdobridge: <gfxstrand> Oof. Low blow...
20:10fdobridge: <zmike.> also if you have a list of the EGL fails somewhere
20:10fdobridge: <gfxstrand> I don't know why Nvidia doesn't have that one. It's so bad.
20:10fdobridge:<zmike.> is resigned to running more znvk cts
20:10fdobridge: <zmike.> yes, stencil fallback is cancer
20:11fdobridge: <zmike.> I think @airlied and I have each fixed it multiple times now
20:11fdobridge: <gfxstrand> `cts-runner` for 5 min.
20:11fdobridge: <zmike.> hm
20:11fdobridge: <zmike.> does the base egl list have fails?
20:11fdobridge: <zmike.> last time I tried that it passed
20:11fdobridge: <!DodoNVK (she) 🇱🇹> What's the policy for adding new extensions in OpenGL?
20:11fdobridge: <zmike.> in what regard
20:11fdobridge: <gfxstrand> It's semi-random, iirc. I suspect a kernel thing but it's been a while.
20:12fdobridge: <zmike.> so...not a zink bug
20:12fdobridge: <zmike.> speaking of kernel, I'm guessing the sparse thing from earlier is kernel related
20:12fdobridge: <gfxstrand> Yeah, I really need to replace my horrible stencil resolve pass with something that writes stencil through the color pipe.
20:13fdobridge: <airlied> doesn't maxwell have stencil export?
20:13fdobridge: <zmike.> just use `util_blitter_stencil_fallback` !
20:13fdobridge: <airlied> I thought nvidia added it eventually, but I could be mixing it up
20:13fdobridge: <zmike.> I don't think so?
20:14fdobridge: <airlied> yeah I think I'm probably mixing it up with s8 or someat
20:15fdobridge: <gfxstrand> Nope. Not even Ada has it
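Without stencil export, the usual workaround is to rebuild the stencil value one bit per draw. First clear the destination stencil. Then, for each of the 8 bits, draw with the stencil write mask set to that bit, REPLACE against an all-ones reference, and discard fragments whose source bit is 0. I believe `util_blitter_stencil_fallback` works roughly like this, but the code below is only a CPU simulation of the technique under that assumption. Treating the buffers as plain `u8` slices is a simplification.

```rust
/// Simulates copying an 8-bit stencil buffer without stencil export:
/// one clear, then eight draws that each write a single bit through
/// the stencil write mask.
fn stencil_copy_per_bit(src: &[u8]) -> Vec<u8> {
    // Clear the destination stencil to 0.
    let mut dst = vec![0u8; src.len()];
    for bit in 0..8 {
        let write_mask: u8 = 1 << bit;
        let reference: u8 = 0xff; // stencil op REPLACE with an all-ones ref
        for (d, &s) in dst.iter_mut().zip(src) {
            // The fragment shader samples the source stencil and discards
            // when this bit is 0, so nothing is written for this pixel.
            if s & write_mask == 0 {
                continue;
            }
            // REPLACE under the write mask only touches the current bit.
            *d = (*d & !write_mask) | (reference & write_mask);
        }
    }
    dst
}
```

The cost is 8 full-screen draws plus a clear instead of a single draw. That is why writing stencil through some other path is attractive.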
20:16fdobridge: <zmike.> @gfxstrand also re: sparse, I haven't checked at all but you may want to look at porting !18507
20:16fdobridge: <zmike.> this massively optimizes the longest sparse buffer cts case (down to less than a second iirc)
20:16fdobridge: <zmike.> though maybe some of the tc work that happened around then optimized it from the frontend side...hard to remember
20:17fdobridge: <gfxstrand> You mean like https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/vulkan/nvk_queue_drm_nouveau.c?ref_type=heads#L114
20:18fdobridge: <zmike.> yeah cool
20:20fdobridge: <zmike.> even after the split there's still one sparse buffer test that takes 20-30s, so I was suspecting you might not be batching
20:20fdobridge: <zmike.> but I also didn't look any deeper
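"Batching" here means collecting bind operations and handing them to the kernel together rather than issuing one call per bind. The sketch below takes that one step further by coalescing consecutive binds that are contiguous in both GPU VA and BO offset. `BindOp` and `coalesce` are hypothetical and do not describe NVK or the nouveau uAPI.

```rust
/// Hypothetical sparse bind operation: map `size` bytes at GPU VA `va`
/// to `bo_offset` within a single backing BO.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BindOp {
    va: u64,
    bo_offset: u64,
    size: u64,
}

/// Merges consecutive ops that are contiguous in both VA and BO offset,
/// so the whole batch can be submitted in one call with fewer entries.
fn coalesce(ops: &[BindOp]) -> Vec<BindOp> {
    let mut out: Vec<BindOp> = Vec::with_capacity(ops.len());
    for &op in ops {
        if let Some(last) = out.last_mut() {
            let va_contiguous = last.va + last.size == op.va;
            let bo_contiguous = last.bo_offset + last.size == op.bo_offset;
            if va_contiguous && bo_contiguous {
                last.size += op.size;
                continue;
            }
        }
        out.push(op);
    }
    out
}
```

For example, four back-to-back 64 KiB binds of one BO collapse into a single 256 KiB bind. A gap in either address space keeps the ops separate.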
20:26fdobridge: <redsheep> This is exciting, I'd like to test that. If you're at the point of looking for low-hanging fruit for performance, @rinlovesyou found that NVK currently spends quite a bit of time in CopyImage, around 20x as much as the prop driver
20:27fdobridge: <gfxstrand> Yeah, we're way over-synchronizing on copies.
20:27fdobridge: <rinlovesyou> here's nvk vs prop on dolphin
20:27fdobridge: <rinlovesyou> https://cdn.discordapp.com/attachments/1034184951790305330/1229890961258188800/image.png?ex=66315437&is=661edf37&hm=eea98389ee22b95fbb3aac7a555db56de0163564d97a87b5e3851919cd9a8547&
20:27fdobridge: <rinlovesyou> https://cdn.discordapp.com/attachments/1034184951790305330/1229890961572892692/image.png?ex=66315437&is=661edf37&hm=1b2c3077ef489cfa8530d2c0e015d6f7b789899f27b37d5ed9f2b18299a600e2&
20:27fdobridge: <gfxstrand> I don't think it's anyone's bottleneck right now but it's a known issue.
20:28fdobridge: <rinlovesyou> in minecraft with zink & shaders this can end up being >5ms
20:28fdobridge: <gfxstrand> Oof
20:28fdobridge: <rinlovesyou> https://cdn.discordapp.com/attachments/1034184951790305330/1229891150199128254/image.png?ex=66315464&is=661edf64&hm=83a99d0430950cb92e3e8bce7ce0df3ec7212f884e8d553464ebafa35c7ad0be&
20:28fdobridge: <redsheep> Yeah second from the top time sink in Minecraft is lots
20:28fdobridge: <gfxstrand> Feel free to file a bug with all that info.
20:29fdobridge: <rinlovesyou> EndRenderPass is another big timesink
20:29fdobridge: <gfxstrand> Unfortunately, I don't know how low that fruit hangs. I think the answer is to do async copies but then we have to figure out how to synchronize them.
20:29fdobridge: <rinlovesyou> in proprietary, it's so negligible it doesn't even seem to appear in the "Top pipelines"
20:30fdobridge: <redsheep> Ah okay
20:30fdobridge: <gfxstrand> We do basically nothing in EndRenderPass.
20:30fdobridge: <gfxstrand> Not unless it's MSAA.
20:30fdobridge: <redsheep> It wouldn't be for Minecraft shaders
20:31fdobridge: <rinlovesyou> it *is* in dolphin
20:31fdobridge: <gfxstrand> Yeah...
20:31fdobridge: <rinlovesyou> why is msaa such a hog on performance anyways
20:32fdobridge: <rinlovesyou> even nvidia proprietary struggles with it
20:32fdobridge: <gfxstrand> Memory bandwidth
20:32fdobridge: <rinlovesyou> it was causing some huge VR issues on proprietary for me haha
20:32fdobridge: <rinlovesyou> game would sometimes slow to a crawl in certain situations, MSAA off fixed that
20:33fdobridge: <gfxstrand> Another project on the short list is to figure out color compression. I suspect it's not too hard but there's some R/E work to do there.
20:33fdobridge: <gfxstrand> That should help MSAA a lot
20:33fdobridge: <airlied> yeah msaa without compression gonna suck
20:34fdobridge: <redsheep> Yeah evidently 1 TB/s memory bandwidth still isn't enough when I turn on msaa
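A back-of-the-envelope check of why uncompressed MSAA is a bandwidth problem. The helper below is illustrative, and the numbers in the usage note assume a 3840x2160 RGBA8 color target (4 bytes per sample) with no compression.

```rust
/// Bytes written by one full pass over an uncompressed multisampled
/// color target: width * height * bytes_per_sample * samples.
fn msaa_color_bytes(width: u64, height: u64, bytes_per_sample: u64, samples: u64) -> u64 {
    width * height * bytes_per_sample * samples
}
```

With those assumptions, one write of the 4x MSAA target is 132,710,400 bytes (about 127 MiB), four times the 33,177,600 bytes of the single-sampled target. At 60 fps that is roughly 8 GB/s for a single write, before depth, blending reads, overdraw, or the resolve pass.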
20:47RSpliet: wait... does NVIDIA store the unresolved MSAA render target in DRAM?
20:50fdobridge: <redsheep> Based on my tile-based rasterization testing, that seems to depend on the generation and whether you have enough cache to manage it. In practice I'd expect that, even with heavy MSAA, it's sitting in L2 cache for me, but I don't have any way to fully confirm that yet
20:51fdobridge: <redsheep> Ada is the exception though, probably every other generation usually ends up having them in vram when tbr is off
20:51RSpliet: right, but you end up writing the unresolved MSAA RT out to DRAM (which may or may not stay in your cache hierarchy), and then after that you do a separate resolve pass... that sounds expensive :-P
20:51RSpliet: no wonder NVIDIA wants you to do FXAA
20:51RSpliet: or whatever their software AA is called, I think FXAA is like 25 years ago now :D
20:51fdobridge: <ahuillet> *DLAA
20:52RSpliet: DL == Deep Learning?
20:52fdobridge: <ahuillet> yeah. FXAA hasn't been the thing for a while now
20:52fdobridge: <rinlovesyou> yeah, just like DLSS
20:52fdobridge: <redsheep> It's native resolution dlss
20:53RSpliet: sorry, I tend to ignore AI-based solutions. I'm sure they're clever, but... yeah
20:53fdobridge: <leopard1907> As long as the result is good, does it really matter?
20:53RSpliet: for the end-user no :-)
20:54RSpliet: I'm just an old school technician with a knee-jerk reaction to AI, wouldn't read anything into it other than that I'm probably angrily yelling at a cloud
20:55fdobridge: <ahuillet> https://www.youtube.com/watch?v=d5knHzv0IQE is worth watching
20:55fdobridge: <Sid> https://tenor.com/view/old-man-yells-at-cloud-yelling-old-man-news-the-simpsons-gif-17741451
20:55fdobridge: <ahuillet> not AA-specific but similar concept.
20:55fdobridge: <redsheep> I mean, until someone writes a hand-tuned algorithm that does better, ML seems like the solution
20:55fdobridge: <redsheep> FSR 2.2 still ain't it
20:56RSpliet: are we talking AMD's marketing FSR, or Khronos' fragment shading rate?
20:56fdobridge: <Sid> in 3.1 we trust
20:56fdobridge: <Sid> AMD's upscaler
20:58RSpliet: ahuillet: already thrown a neural net at attachment rate FSR/VRS?
21:02airlied: RSpliet: the unresolved msaa is normally compressed
21:03airlied: so less memory bw at least
21:03RSpliet: airlied: framebuffer compression, or something more special for MSAA? ... I guess that's the NVIDIA secret sauce :-)
21:06fdobridge: <karolherbst🐧🦀> big news coming in soon :ferrisBongo:
21:24airlied: RSpliet: I assume it keeps a second plane with an index of areas where the samples are the same
21:25airlied: at least amd had a plane per sample, and a lookup table that decided if you needed to access them
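A toy model of the scheme airlied guesses at above. Most interior pixels have identical samples, so a side plane can flag those pixels and only one value needs to be fetched. Only edge pixels pay for every sample. This is a hypothetical illustration and does not describe NVIDIA's or AMD's actual compression formats.

```rust
/// Hypothetical per-pixel storage: either one shared color (flagged as
/// uniform in a side plane) or every sample stored separately.
enum PixelStorage {
    Uniform(u32),
    PerSample(Vec<u32>),
}

/// Stores a pixel once when all of its samples match.
fn compress_pixel(samples: &[u32]) -> PixelStorage {
    match samples.split_first() {
        Some((&first, rest)) if rest.iter().all(|&s| s == first) => PixelStorage::Uniform(first),
        _ => PixelStorage::PerSample(samples.to_vec()),
    }
}

/// Bytes of color data touched when reading the pixel back, assuming
/// 4-byte samples (side-plane metadata not counted).
fn stored_bytes(p: &PixelStorage) -> usize {
    match p {
        PixelStorage::Uniform(_) => 4,
        PixelStorage::PerSample(v) => 4 * v.len(),
    }
}
```

For a 4x pixel fully inside one triangle this touches 4 bytes instead of 16. A pixel on a triangle edge still costs all 16.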
22:48fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1229926345803567114/IMG_20240416_171434805_AE.jpg?ex=6631752b&is=661f002b&hm=65054255c3f22bc47c3831c6f4567ec9982654f57d661ea2936bd0a808ed1c87&
22:51fdobridge: <!DodoNVK (she) 🇱🇹> Is this the only NVK mug?
22:54fdobridge: <gfxstrand> Currently, yes. I'm going to post a link for others to get them, too, but I'm waiting for samples so that I'm reasonably sure that people aren't buying crap.
22:55fdobridge: <karolherbst🐧🦀> :ferrisBongo:
22:55fdobridge: <gfxstrand> For instance, everything is a bit weirdly pixelated and washed out on this mug. It looks better in the picture than in person. I've tweaked the image I uploaded and should get a 2nd, hopefully better mug in a couple weeks.
22:56fdobridge: <gfxstrand> The pixelated part is fine
22:56fdobridge: <gfxstrand> The way it causes things to wash out is less fine
22:56fdobridge: <gfxstrand> But the NVK logo really is perfect for mugs. :triangle_nvk:
23:05fdobridge: <nanokatze> the middle of that triangle is too dark
23:06fdobridge: <nanokatze> someone used unorm instead of srgb
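Why UNORM vs sRGB shows up most in the midtones: the same stored value means very different light levels depending on which interpretation is applied. These are the standard sRGB transfer functions, shown as a reference sketch on normalized values in [0, 1].

```rust
/// sRGB decode: encoded value in [0, 1] -> linear light.
fn srgb_to_linear(c: f64) -> f64 {
    if c <= 0.04045 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// sRGB encode: linear light in [0, 1] -> encoded value.
fn linear_to_srgb(l: f64) -> f64 {
    if l <= 0.0031308 {
        l * 12.92
    } else {
        1.055 * l.powf(1.0 / 2.4) - 0.055
    }
}
```

An encoded 0.5 decodes to about 0.214 linear, while a linear 0.5 encodes to about 0.735. Treating one as the other therefore visibly shifts mid-greys, while black and white are unaffected.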
23:08fdobridge: <mohamexiety> yeah the logo is really nice for mugs and shirts imo
23:08fdobridge: <mohamexiety> NVK merch will be based
23:22fdobridge: <zmike.> I wear my shirt regularly
23:50fdobridge: <bylaws> There are nvk shirts?
23:53fdobridge: <bylaws> Or better put, where can you get an nvk shirt?
23:55fdobridge: <zmike.> Build a time machine and go to XDC 2022