03:15 tiredchiku[d]: gfxstrand[d]: so when you say we have to do the clamp in the shader, do I have to edit the shader itself or can that be done by the driver?
03:22 gfxstrand[d]: It means a compiler pass, probably in NVK, that modifies the shader to add the saturate.
03:23 airlied[d]: do nvidia use a shader variant?
03:25 skeggsb9778[d]: gfxstrand[d]: btw, i found this, which probably explains better than I did: https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/turing/tu104/dev_usermode.ref.txt#L114
03:30 tiredchiku[d]: gfxstrand[d]: wahoo, time to poke into NAK
03:34 gfxstrand[d]: airlied[d]: Not sure what they do for pipelines. But you can't use variants with ESO.
03:34 tiredchiku[d]: oh this appears to be more of a codegen thing
03:34 gfxstrand[d]: skeggsb9778[d]: Oh, nice!
03:35 tiredchiku[d]: hope I'm looking at the right part of code:
03:35 tiredchiku[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/codegen/nv50_ir_from_nir.cpp#L1087-1134
03:36 gfxstrand[d]: NVK doesn't use codegen
03:37 gfxstrand[d]: I really should delete the codegen support. 🤔
03:40 gfxstrand[d]: You should be looking at a combination of nvk_shader.c and NAK.
03:40 tiredchiku[d]: okie
03:40 gfxstrand[d]: We probably want to put it in NVK as part of FS output lowering.
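A rough model of what "adding the saturate" to the fragment depth output means, written in plain C rather than as a real compiler pass. `struct fs_outputs` and `lower_fs_depth_saturate()` are hypothetical stand-ins, not NVK or NAK code. An actual FS output lowering would rewrite the depth store in the shader IR instead of touching a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the values a fragment shader writes out. */
struct fs_outputs {
    bool writes_depth; /* shader writes FragDepth */
    float depth;
};

/* Clamp x to [0, 1]. NaN handling is not modeled: comparisons with
 * NaN are false, so a NaN input is returned unchanged. */
static float saturate(float x)
{
    if (x < 0.0f)
        return 0.0f;
    if (x > 1.0f)
        return 1.0f;
    return x;
}

/* Hypothetical lowering step: when the driver needs a [0, 1] clamp that
 * the fixed-function hardware will not apply, saturate the depth output
 * before it leaves the shader. */
static void lower_fs_depth_saturate(struct fs_outputs *out, bool need_clamp)
{
    assert(out != NULL);
    if (need_clamp && out->writes_depth)
        out->depth = saturate(out->depth);
}
```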
03:42 mhenning[d]: if anyone wants to tinker with a possible perf improvement: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31498
03:55 gfxstrand[d]: mhenning[d]: I saw that. Thanks a bunch! There's a lot of work we need to do in NAK yet to get it to where it needs to be and, without actually reading the code yet, that looks like a badly needed piece of that puzzle.
03:57 tiredchiku[d]: okay, I'll have to sit down with a clear head to figure out where to prod around in NAK to get this extension going
03:57 tiredchiku[d]: turns out staring at code half an hour after you wake up is not a good idea!
05:23 redsheep[d]: mhenning[d]: Wow -29% code size is incredible
06:36 redsheep[d]: Current Mesa main vs 31498
06:36 redsheep[d]: Horizon Zero Dawn benchmark score 4843 → 4928
06:36 redsheep[d]: Talos 1 75 fps → 76
06:36 redsheep[d]: The Witness 97 fps → 97
06:36 redsheep[d]: Maybe most of my shaders are stuck cached and so not showing improvement? Or maybe these are just somehow bound somewhere else?
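For scale, the relative changes in the numbers above work out as follows. `percent_change()` is just a helper for this arithmetic; the values in the comment are computed from the scores quoted in the log.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Horizon Zero Dawn: percent_change(4843, 4928) ~= +1.76 %
 * Talos 1:           percent_change(75, 76)     ~= +1.33 %
 * The Witness:       percent_change(97, 97)     ==  0 %   */
```

So the measured gains land between 0 and 2 %, far below the -29 % code-size figure quoted earlier.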
06:40 tiredchiku[d]: you should be able to clear the cache fairly trivially
06:41 redsheep[d]: Yeah, I will have to clear to test later. I would have expected reduced size to improve utilization and therefore perf
06:41 tiredchiku[d]: `rm -rf ~/.cache/mesa_shader_cache*`
13:55 mhenning[d]: redsheep[d]: That might be correct. What we're seeing in shaderdb is that certain shaders see huge improvements but others see only a 3 percent or less improvement - it likely depends on how the control flow in the shader is structured
13:55 mhenning[d]: So a lot of apps will see more modest improvements in code size
13:58 mhenning[d]: on top of that, any app that's strictly memory bound won't see fps improvements from these kinds of improvements, which makes it even harder to observe an actual improvement in frame times
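A toy model of why a smaller shader does not always show up in frame times. Assume a shader's time is roughly the larger of its ALU time and its memory time. This is a simplified roofline-style assumption, not a measurement of NAK or any GPU, and it further treats a code-size cut as an equal cut in ALU cycles. Under that model, fewer instructions only help while ALU work dominates. `shader_time()` and `speedup()` are hypothetical helpers.

```c
#include <assert.h>

/* Toy cost model: the shader is limited by whichever is slower,
 * instruction issue or memory traffic. Ignores latency hiding,
 * occupancy, and caches. */
static double shader_time(double alu_cycles, double mem_cycles)
{
    assert(alu_cycles >= 0.0 && mem_cycles >= 0.0);
    return alu_cycles > mem_cycles ? alu_cycles : mem_cycles;
}

/* Speedup from cutting ALU cycles by `alu_reduction` (0.29 means 29 %
 * fewer), with memory time unchanged. */
static double speedup(double alu_cycles, double mem_cycles,
                      double alu_reduction)
{
    double before = shader_time(alu_cycles, mem_cycles);
    double after = shader_time(alu_cycles * (1.0 - alu_reduction), mem_cycles);
    return before / after;
}
```

With 100 ALU cycles against 200 memory cycles, a 29 % ALU cut changes nothing. Swap the two and the same cut gives roughly a 1.41× speedup.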
16:11 tiredchiku[d]: hmm how would I check if depthClampZeroOne and depth
16:12 tiredchiku[d]: our rasterization state doesn't have any members for those extensions, like it does for depth_clamp_enable
16:19 gfxstrand[d]: `nvk_device::vk::enabled_extensions` and `nvk_device::vk::enabled_features`.
16:22 tiredchiku[d]: is there a difference between them being enabled and them being specifically requested?
16:23 tiredchiku[d]: gfxstrand[d]: also this appears to be rust, I'm poking around nvk_cmd_draw.c
16:23 gfxstrand[d]: Not with this extension, no.
16:23 gfxstrand[d]: It's all one big global device enable. 😩
16:24 tiredchiku[d]: unfun
16:24 tiredchiku[d]: so can I assume they're always requested :D
16:25 gfxstrand[d]: No
16:25 gfxstrand[d]: Because the behavior actually changes based on whether or not they're requested
16:25 tiredchiku[d]: or, well, always enabled
16:25 tiredchiku[d]: oh
16:25 tiredchiku[d]: right, yeah
16:25 gfxstrand[d]: It also changes based on the depth format
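A minimal sketch of how the outcome depends on the `depthClampZeroOne` feature, `VK_EXT_depth_range_unrestricted`, and the depth attachment format. This is based on my reading of the "Depth Clamping and Range Adjustment" rules in the Vulkan spec, so check it against the current spec text before relying on it. `struct depth_result` and `adjust_depth()` are hypothetical names. `zmin` and `zmax` are the viewport's min(minDepth, maxDepth) and max(minDepth, maxDepth). The user-defined range from VK_EXT_depth_clamp_control is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* `defined` is false where the spec leaves the value undefined. */
struct depth_result {
    float z;
    bool defined;
};

static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Sketch of the clamp/range step applied to fragment depth zf. */
static struct depth_result adjust_depth(float zf, float zmin, float zmax,
                                        bool depth_clamp_enable,
                                        bool clamp_zero_one_feature,
                                        bool range_unrestricted_ext,
                                        bool float_depth_format)
{
    assert(zmin <= zmax);
    struct depth_result r = { zf, true };

    /* depthClampEnable: clamp to the viewport depth range first. */
    if (depth_clamp_enable)
        r.z = clampf(r.z, zmin, zmax);

    if (clamp_zero_one_feature) {
        /* Float depth + unrestricted range: value passes through.
         * Otherwise it is clamped to [0, 1]. */
        if (!(float_depth_format && range_unrestricted_ext))
            r.z = clampf(r.z, 0.0f, 1.0f);
    } else {
        /* Without the feature, out-of-range values are undefined. */
        if (r.z < zmin || r.z > zmax)
            r.defined = false;
        if (!float_depth_format && (r.z < 0.0f || r.z > 1.0f))
            r.defined = false;
    }
    return r;
}
```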
16:25 tiredchiku[d]: hmm
16:29 tiredchiku[d]: basically I'm trying to fix it in the pipeline itself, instead of messing with the compiler to modify the shader
16:29 gfxstrand[d]: That might work
16:30 tiredchiku[d]: https://pastebin.com/WVNd4btF
16:30 tiredchiku[d]: is what I have so far
16:30 tiredchiku[d]: but I'm struggling to figure out how to actually initialize the bools I defined correctly based on whether the extensions are specifically requested
16:35 tiredchiku[d]: probably just missing something obvious that tells me how we do extension state tracking
16:36 gfxstrand[d]: There's no state. You just need to fish out the device and look at the enables.
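A minimal sketch of "fishing out the device and looking at the enables". The field names `enabled_extensions.EXT_depth_range_unrestricted` and `enabled_features.depthClampZeroOne` are the ones used later in this log (`dev->vk.enabled_...`). The structs below are simplified stand-ins for the real device tables, and `depth_enables_from_device()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real tables are generated and much larger. */
struct ext_table_stub { bool EXT_depth_range_unrestricted; };
struct features_stub { bool depthClampZeroOne; };
struct vk_device_stub {
    struct ext_table_stub enabled_extensions;
    struct features_stub enabled_features;
};
struct nvk_device_stub { struct vk_device_stub vk; };

struct depth_enables {
    bool clamp_zero_one;
    bool range_unrestricted;
};

/* Both bits are fixed at device creation, so there is no per-draw
 * state to track: read them straight off the device. */
static struct depth_enables
depth_enables_from_device(const struct nvk_device_stub *dev)
{
    assert(dev != NULL);
    struct depth_enables e = {
        .clamp_zero_one = dev->vk.enabled_features.depthClampZeroOne,
        .range_unrestricted =
            dev->vk.enabled_extensions.EXT_depth_range_unrestricted,
    };
    return e;
}
```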
16:38 tiredchiku[d]: aha
16:38 tiredchiku[d]: got you, thanks
16:48 tiredchiku[d]: depthRangeUnrestricted is never advertised as a device feature in any driver
16:58 tiredchiku[d]: anyway, fished what I needed out of dev and pdev, let's see if those tests still fail
17:21 tiredchiku[d]: oh
17:27 gfxstrand[d]: tiredchiku[d]: Yeah it is.
17:28 gfxstrand[d]: We advertise it in NVK
17:28 gfxstrand[d]: Well, we advertise the extension. There is no feature bit
17:29 tiredchiku[d]: yeah, that's what I meant
17:29 gfxstrand[d]: It's an old extension from back when we thought skipping the enable bit if there was only one thing to enable was a good idea
17:29 tiredchiku[d]: anyway, I think I've got it. the tests pass
17:30 tiredchiku[d]: just gonna do a clean up and run the tests for every depth* case, then push
17:30 tiredchiku[d]: no compiler pass needed either :D
17:30 gfxstrand[d]: 😄
17:30 gfxstrand[d]: Go ahead and push it and I can kick off a run in another hour or so
17:35 tiredchiku[d]: that's okay, I can do the run too
17:35 tiredchiku[d]: it can happen in the background while I work on something else
17:37 tiredchiku[d]: oh I borked it in the clean up 🤣
17:37 tiredchiku[d]: let's see what I changed
17:38 tiredchiku[d]: huh
17:38 tiredchiku[d]: why is z_clamp evaluating to false
17:39 tiredchiku[d]: `const bool z_clamp = dyn->rs.depth_clamp_enable;`
17:39 gfxstrand[d]: Yeah, but `*depth*` may be a lot of tests. If it takes more than an hour or so, let me know.
17:39 tiredchiku[d]: only one way to find out 😅
17:40 tiredchiku[d]: tiredchiku[d]: is depth_clamp not enabled when depth_clamp_zero_one is?
17:40 gfxstrand[d]: They're not quite the same thing
17:41 tiredchiku[d]: I'm guessing I can't set depth_clamp as enabled if depth_clamp_zero_one is requested either then
17:43 gfxstrand[d]: There are too many ways to clamp depth. 😂
17:43 tiredchiku[d]: far too many
17:44 tiredchiku[d]: oh
17:44 gfxstrand[d]: I *think* this is effectively just a 0/1 version of the depth_clamp_control extension which was recently merged.
17:45 gfxstrand[d]: Which I'm now realizing may also be wrong. 🤡
17:45 tiredchiku[d]: OH I am stupid, the tests pass because I set unrestricted range to false to test my ternary logic 🤣
17:45 tiredchiku[d]: still need to figure out how to get instance extensions
17:45 gfxstrand[d]: Instance extensions?
17:46 gfxstrand[d]: It had better not be an instance extension
17:46 tiredchiku[d]: ..am I understanding the terminology wrong
17:46 gfxstrand[d]: They should both be device extensions
17:47 tiredchiku[d]: if an app requests depth_range_unrestricted, is that not an instance extension in the sense that it is requested by the vulkan instance
17:48 gfxstrand[d]: The app requests it as part of creating the device
17:49 gfxstrand[d]: Instance extensions are pretty rare. They're only used in cases where you need to do something before device creation.
17:50 tiredchiku[d]: ahhh, so it creates a device, not an instance, got it
17:50 tiredchiku[d]: understood the terminology wrong
17:51 gfxstrand[d]: Well, it first creates an instance, then enumerates all available physical devices and then creates a device based on one of those physical devices.
17:51 tiredchiku[d]: so that's the missing piece of the puzzle
17:51 tiredchiku[d]: physical device is not the same as vulkan device
17:51 gfxstrand[d]: Most extensions are device extensions because they depend on the card you have plugged in and the driver. Instance extensions are only for stuff that's really system-wide.
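Device extensions reach the driver through `vkCreateDevice()` as the `ppEnabledExtensionNames` array (length `enabledExtensionCount`) in `VkDeviceCreateInfo`. Instance extensions go through `VkInstanceCreateInfo` instead. The sketch below shows the name lookup a driver does on that list. `extension_requested()` is a hypothetical helper, not the Mesa runtime's code, which builds a full enable table from the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true if `name` appears in the app's device extension list. */
static bool extension_requested(const char *const *names, uint32_t count,
                                const char *name)
{
    assert(names != NULL || count == 0);
    for (uint32_t i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0)
            return true;
    }
    return false;
}
```

Features such as `depthClampZeroOne` do not come through this name list. They arrive as feature structs (for example `VkPhysicalDeviceDepthClampZeroOneFeaturesEXT`) chained into the `pNext` of `VkDeviceCreateInfo`.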
17:52 tiredchiku[d]: right, got it
17:52 tiredchiku[d]: gfxstrand[d]: which MR is today's test run for :3
17:53 gfxstrand[d]: I'm currently double-checking !25576
17:57 tiredchiku[d]: oh, neat
17:57 tiredchiku[d]: I keep forgetting that nvk is not just src/nouveau/vulkan
17:57 tiredchiku[d]: but also src/vulkan (since we use vulkan common code)
18:04 lingm[d]: Offtopic but not sure where or whom else to ask, Faith: What's the MSRV of deqp-runner?
18:04 tiredchiku[d]: where's our enabled_vk_instance_extension_table
18:17 asdqueerfromeu[d]: tiredchiku[d]: Having almost ten copies of X11/Wayland WSI code isn't ideal though (so it definitely makes sense)
18:30 gfxstrand[d]: lingm[d]: I don't know that we have one. Honestly, I'm happy to bump that pretty aggressively. It's only used by CI systems and developers.
18:31 lingm[d]: Great, that's what I hoped to hear.
19:24 tiredchiku[d]: I'm gonna cry
19:25 tiredchiku[d]: tiredchiku[d]: I was looking for ways to check if it's enabled everywhere
19:25 tiredchiku[d]: and I was SO close with checking it via pdev
19:25 tiredchiku[d]: tiredchiku[d]: .-.
19:26 tiredchiku[d]: the answer was in front of me the whole time
19:29 tiredchiku[d]: I did dev->vk.enabled_features.depthClampZeroOne;
19:29 tiredchiku[d]: but it didn't strike me that I could also just do dev->vk.enabled_extensions.EXT_depth_range_unrestricted;
19:35 tiredchiku[d]: gfxstrand[d]: pushed
19:35 tiredchiku[d]: in case your CTS run hasn't begun
19:35 tiredchiku[d]: full `*depth*` run breaks for me due to some GSP shenanigans
19:35 tiredchiku[d]: [13192.890556] nouveau 0000:01:00.0: deqp-vk[357032]: channel 304 killed!
19:35 tiredchiku[d]: [13192.902519] nouveau 0000:01:00.0: gsp: mmu fault queued
19:35 tiredchiku[d]: [13192.904248] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:304 type:31 scope:1 part:233
19:38 tiredchiku[d]: ..nvm it broke other tests
19:38 tiredchiku[d]: kekw
19:50 tiredchiku[d]: hmm
19:50 tiredchiku[d]: so I'm not handling what to do when both are enabled
20:04 tiredchiku[d]: yeah, I'll look at this tomorrow
20:04 tiredchiku[d]: I broke more than I fixed :P
20:34 tiredchiku[d]: ok, the logic is right
20:35 tiredchiku[d]: it's just that
20:35 tiredchiku[d]: `const bool clamp_zero_one = dev->vk.enabled_features.depthClampZeroOne;` is always evaluating to 1
20:44 tiredchiku[d]: oh, that's not it
20:44 tiredchiku[d]: only the input_negative tests are failing
20:44 tiredchiku[d]: interesting
22:24 gfxstrand[d]: redsheep[d]: Discord deleted all of those. 😭
22:29 valentineburley[d]: they are still up for me
22:30 redsheep[d]: Yeah they still load for me
22:31 redsheep[d]: If you have some other way you want me to send them I can
22:32 redsheep[d]: gfxstrand[d]: It's probably just bugged for you, discord doesn't seem to overly enjoy 450 MB of video in one message
22:33 redsheep[d]: For some reason, despite me setting it to fixed-rate CPU encoding at really high quality, the Talos Principle clip is absolutely massive and is 2/3 of it on its own
22:34 redsheep[d]: Ran out of time to debug what obs did there
22:35 gfxstrand[d]: okay. I'll refresh
22:36 redsheep[d]: redsheep[d]: To be clear, the clips were only supposed to be 6 Mbps, I mean I set the encoder to the slowest value allowed
22:37 gfxstrand[d]: I got them now
22:37 gfxstrand[d]: Can you tell me the names of the games? I know some of them but not all
22:39 redsheep[d]: gfxstrand[d]: SuperTuxKart, Horizon Zero Dawn, The Talos Principle, Cyberpunk 2077, Deep Rock Galactic, The Witness
22:41 redsheep[d]: It was uh... Not exactly easy to try to come up with something interesting to look at
22:43 redsheep[d]: If you want to know settings, all of these were high or ultra at native 4K, except Cyberpunk, which was upscaled from 1080p to 4K with FSR2
22:44 redsheep[d]: Not that it made a difference; Cyberpunk barely sped up at all going from 4K to 1080p
22:45 redsheep[d]: I can't wait to see what the performance data says. Some of these games not speeding up when changing certain settings just makes no sense.
22:49 redsheep[d]: Something super bizarre has to be going on for Cyberpunk perf not to be very very strongly correlated with resolution
22:52 redsheep[d]: There's probably something dominating the frame time that has nothing to do with output resolution, but I have no idea what
23:28 gfxstrand[d]: That can happen if we're stalling too much, doing something bad in a VS, or if there are a bunch of compute shaders taking time.
23:28 gfxstrand[d]: Certainly gives me something to look at, though.
23:38 gfxstrand[d]: It at least means we're not FS-bound
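One way to put a number on "not FS-bound" is to assume frame time is a fixed per-frame cost plus a cost proportional to pixel count, t = a + b·p, and solve for a and b from two resolutions. This is a deliberately crude linear model that ignores CPU limits, upscaling passes, and non-linear effects. `split_frame_time()` is a hypothetical helper.

```c
#include <assert.h>

struct frame_split {
    double fixed_ms;    /* a: cost that does not scale with resolution */
    double per_mpix_ms; /* b: cost per million pixels */
};

/* Fit t = a + b * p through two (megapixels, frame time) measurements. */
static struct frame_split split_frame_time(double mpix_lo, double ms_lo,
                                           double mpix_hi, double ms_hi)
{
    assert(mpix_hi > mpix_lo);
    struct frame_split s;
    s.per_mpix_ms = (ms_hi - ms_lo) / (mpix_hi - mpix_lo);
    s.fixed_ms = ms_lo - s.per_mpix_ms * mpix_lo;
    return s;
}
```

1920×1080 is about 2.07 Mpix and 3840×2160 about 8.29 Mpix. If the two frame times are nearly equal, `per_mpix_ms` comes out near zero and almost all of the frame lands in `fixed_ms`. That points at stalls, vertex work, or compute passes rather than fragment shading.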