04:25fdobridge: <theevilskeleton> https://social.treehouse.systems/@TheEvilSkeleton/111401331877473698
04:25fdobridge: <theevilskeleton> > Update 2: 24 hours later, I'm happy with where this is going. Here's an update:
04:25fdobridge: <theevilskeleton> >
04:25fdobridge: <theevilskeleton> > - Tables have clearer borders
04:25fdobridge: <theevilskeleton> > - High contrast options for light and dark styles (painful)
04:25fdobridge: <theevilskeleton> > - Drop shadow under the navigation bar
04:25fdobridge: <theevilskeleton> > - Dark style is slightly lighter
04:25fdobridge: <theevilskeleton> >
04:25fdobridge: <theevilskeleton> > (Please forgive me for using Chrome. Firefox has a bug that gets stuck in high contrast after enabling it, which makes it really difficult and annoying to test)
04:27fdobridge: <clangcat> Table looks a lot nicer
04:29fdobridge: <theevilskeleton> Indeed
06:43fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> What could `[ 893.792720] nouveau 0000:59:00.0: fifo:PBDMA0: CTXNOTVALID chid:2` mean? :nouveau:
07:56fdobridge: <karolherbst🐧🦀> userspace tries to submit to a context which doesn't exist (anymore)
08:03fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://gitlab.freedesktop.org/nouveau/mesa/-/issues/84
10:45fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> How easy would it be for nouveau to support dynamic MUX switching/Advanced Optimus? :nouveau:
10:49fdobridge: <karolherbst🐧🦀> easy
10:49fdobridge: <karolherbst🐧🦀> the pain is on the userspace side
10:50fdobridge: <karolherbst🐧🦀> from the kernel perspective it's simple: you disable `eDP1` and enable `eDP2` and tell the kernel to flip the mux, done
10:50fdobridge: <karolherbst🐧🦀> it's just userspace being broken
10:50fdobridge: <karolherbst🐧🦀> like
10:50fdobridge: <karolherbst🐧🦀> completely
10:50fdobridge: <karolherbst🐧🦀> every major compositor needs a rewrite
10:50fdobridge: <karolherbst🐧🦀> soooooo....
12:30fdobridge: <mohamexiety> wait it doesn't work even with prop driver? 😮
12:31fdobridge: <karolherbst🐧🦀> how could it?
12:31fdobridge: <karolherbst🐧🦀> nvidia wrote a proposal though to add more KMS properties for it
12:32fdobridge: <karolherbst🐧🦀> but the issue is really how compositors are handling their compositing contexts
12:32fdobridge: <karolherbst🐧🦀> by default they start doing the compositing on the iGPU
12:32fdobridge: <karolherbst🐧🦀> and if you disable all displays on your iGPU and have some on the dGPU, they still compose on the iGPU
12:32fdobridge: <karolherbst🐧🦀> making it all very pointless to switch
12:33fdobridge: <karolherbst🐧🦀> compositors really have to use one rendering context per display/gpu
12:33fdobridge: <karolherbst🐧🦀> and move things between them
12:33fdobridge: <karolherbst🐧🦀> so that if you move your desktop from an iGPU to a dGPU display, you also move the compositing from the iGPU to the dGPU
12:34fdobridge: <karolherbst🐧🦀> and then you also need working PSR in order to deal with flickering while migrating
12:34fdobridge: <karolherbst🐧🦀> but that's part of the KMS proposal nvidia made
12:34fdobridge: <karolherbst🐧🦀> but for now, moving to the other eDP display would decrease performance
12:35fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Is it possible to make them compose on both iGPU and dGPU (across different outputs)?
12:35fdobridge: <mohamexiety> I see. I didn't know how it worked here. thanks!
12:35fdobridge: <karolherbst🐧🦀> well.. yes
12:35fdobridge: <karolherbst🐧🦀> they just have to rework their code
12:35fdobridge: <karolherbst🐧🦀> could lead to a big rework
12:35fdobridge: <karolherbst🐧🦀> could be small
12:35fdobridge: <karolherbst🐧🦀> don't know
12:35bencoh: I think I remember reading about some compositors starting to implement some kind of gpu hotplug support to tackle that issue
12:35fdobridge: <karolherbst🐧🦀> hotplugging is unrelated
12:35bencoh: (wayland compositors)
12:36fdobridge: <karolherbst🐧🦀> it has to account for the fact that the GPU _might_ go away, sure
12:36fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I wonder what Windows does when you connect a display that's wired to the dGPU while iGPU is being used for the internal display
12:36fdobridge: <karolherbst🐧🦀> but hotplugging isn't the reason it's needed
12:36fdobridge: <karolherbst🐧🦀> it just makes it harder to get right
12:36fdobridge: <pac85> They only have to export and import the dmabufs for surfaces right?
12:36fdobridge: <karolherbst🐧🦀> well
12:36fdobridge: <pac85> What else would they share among contexts
12:37fdobridge: <karolherbst🐧🦀> from an API pov it's really just a robustness context and that context gets killed
12:37fdobridge: <karolherbst🐧🦀> applications could move from one to the other GPU
12:37fdobridge: <karolherbst🐧🦀> apple did/does this via rendering context resets
12:37fdobridge: <karolherbst🐧🦀> similar to robustness
12:38fdobridge: <karolherbst🐧🦀> so if they want to migrate an app, they kill the context and the new one gets allocated on the other GPU
12:38fdobridge: <pac85> How do most apps not die
12:38fdobridge: <karolherbst🐧🦀> they have to opt in
12:38fdobridge: <pac85> Uh makes sense
12:38fdobridge: <karolherbst🐧🦀> otherwise they always run on the dGPU
12:38fdobridge: <karolherbst🐧🦀> soooo
12:38fdobridge: <karolherbst🐧🦀> apps are kinda forced to support it
12:38fdobridge: <karolherbst🐧🦀> and toolkits handle it anyway
12:38fdobridge: <karolherbst🐧🦀> well.. Apple's at least
12:39fdobridge: <karolherbst🐧🦀> but yeah.. they've implemented it like 8? years ago
12:39fdobridge: <pac85> ~~ having more than one process accessing the gpu and displaying stuff fullscreen like in the voodoo was a mistake ~~
12:39fdobridge: <karolherbst🐧🦀> and apps started on a dGPU display were rendered on the dGPU
12:40fdobridge: <karolherbst🐧🦀> and migrated to the iGPU once you move the window over
12:40fdobridge: <karolherbst🐧🦀> or disconnected the display
12:40fdobridge: <karolherbst🐧🦀> or something
12:40fdobridge: <karolherbst🐧🦀> apps could prevent the dGPU from shutting down (e.g. games)
12:40fdobridge: <pac85> Uhm
12:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I think they had advanced Optimus since 2010
12:40fdobridge: <karolherbst🐧🦀> there was even this menubar widget to configure it 😄
12:41fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://youtu.be/8L77ARC4ENI?t=28
12:41fdobridge: <karolherbst🐧🦀> https://gfx.io/switching.html
12:42fdobridge: <karolherbst🐧🦀> guess it's older
12:42fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Here's it going wrong because of a broken dGPU
12:42fdobridge: <karolherbst🐧🦀> but on older macbooks you had to reboot or restart your session
12:42fdobridge: <karolherbst🐧🦀> so they gradually moved to a dynamic mux thing
12:42fdobridge: <karolherbst🐧🦀> but yeah....
12:43fdobridge: <karolherbst🐧🦀> we kinda have to fix this :ferrisUpsideDown:
12:43fdobridge: <karolherbst🐧🦀> and it's going to be a lot of work
12:43fdobridge: <karolherbst🐧🦀> I think gamescope is doing it properly
12:43fdobridge: <karolherbst🐧🦀> at least the compositing part
12:43fdobridge: <pac85> Uh
12:44fdobridge: <karolherbst🐧🦀> Well.. that's at least how I understood Joshie on that matter
12:45fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Gamescope's Wayscope support is poor though
12:46fdobridge: <karolherbst🐧🦀> yeah... might be, dunno 🙂
12:48fdobridge: <pac85> Wayscope?
12:50fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Wayland support in gamescope
13:41fdobridge: <theevilskeleton> One of my friends made the fd.o logo adapt to dark style 👌
13:41fdobridge: <theevilskeleton> https://cdn.discordapp.com/attachments/1034184951790305330/1173618753792135188/2023-11-13_08-41-07.mp4?ex=65649ca3&is=655227a3&hm=075ed3afda94cb7039aa15621e5d8343e24d3c99245a1e741159fd4eb447a3b0&
13:52fdobridge: <karolherbst🐧🦀> ohh, the test :ferrisUpsideDown:
13:52fdobridge: <karolherbst🐧🦀> for 10 seconds I was like "what's the difference?"
13:52fdobridge: <theevilskeleton> lmao
13:53fdobridge: <pac85> The text?
13:53fdobridge: <theevilskeleton> Yeah, that's what he means
13:54fdobridge: <karolherbst🐧🦀> ohh, the text :ferrisUpsideDown: (edited)
14:44fdobridge: <dadschoorse> ffmaz/fmulz in NAK when?
14:48fdobridge: <gfxstrand> Those are the ones with 0*inf=0 behavior?
14:49fdobridge: <gfxstrand> I think that's just a bit on the instruction so shouldn't be too hard to wire up. Are there tests somewhere?
14:50fdobridge: <dadschoorse> 0*anything (including inf and NaN)
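A minimal sketch of the multiply semantics being discussed, assuming only what is said above: zero times anything (including inf and NaN) yields zero. The function names `fmul` and `fmulz` here are plain Python stand-ins, not NAK or NIR code, and the sign of the resulting zero is not modelled.

```python
import math


def fmul(a: float, b: float) -> float:
    """Ordinary floating-point multiply: 0 * inf and x * NaN give NaN."""
    return a * b


def fmulz(a: float, b: float) -> float:
    """Multiply where a zero operand always forces a zero result.

    Assumption for this sketch: the result is +0.0 regardless of operand
    signs; real hardware may pick the sign differently.
    """
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b


if __name__ == "__main__":
    for a, b in [(0.0, math.inf), (math.nan, 0.0), (2.0, 3.0)]:
        print(a, b, fmul(a, b), fmulz(a, b))
```

The two functions only differ when one operand is zero and the other is inf or NaN, which is why a handful of test cases is enough to tell them apart.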
14:50fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I remember DXVK enabling those instructions just for NVK (so I guess it's codegen-exclusive)
14:51fdobridge: <dadschoorse> I don't think there are tests given that it's produced by an algebraic opt and not an extension
14:55fdobridge: <gfxstrand> Even if it's just an amber file that has a bit of SPIR-V and tests a handful of cases so I know it's working properly, I think that's enough.
14:58fdobridge: <pendingchaos> vkrunner fmulz test: https://pastebin.com/raw/2QzTzX5g
15:11fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Does Vulkan CTS have any tests for storage on 3D images (which is what triggers an assert in NVK)? :triangle_nvk:
15:14fdobridge: <gfxstrand> Possibly but we may not have enabled the feature.
15:14fdobridge: <gfxstrand> Though I thought I did....
15:15fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> You're still doing the `1 << 11` assert in the NAK branch
15:16fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Which trips on the Overwatch 2 menu screen for some reason (if I remove it the characters have weird lighting issues on both codegen and NAK)
15:17fdobridge: <gfxstrand> Right, yeah. That has nothing to do with 3d
15:17fdobridge: <gfxstrand> I'll be getting rid of that today, I think.
15:18fdobridge: <gfxstrand> That's one of my planned post-merge activities.
15:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> The comment mentions storage on 3D images (so maybe there was another assert that was removed?)
15:18fdobridge: <gfxstrand> Quite possibly. I wired up a couple more extensions in that area during XDC.
15:19fdobridge: <gfxstrand> Oh, right... I see what you mean now.
15:19fdobridge: <gfxstrand> No, the assertion is there because of codegen's batshit handling of 3D storage images. NAK doesn't do that nonsense so we can just use the image handles as-is.
15:19fdobridge: <gfxstrand> It's not because 3D is unimplemented, it's just crazy
15:21fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I wonder what could explain this glitchy rendering on NAK then: https://discord.com/channels/1033216351990456371/1034184951790305330/1169302273000755371
15:22fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I'll retest OW2 with the latest NAK code to see if that issue is still there
15:29fdobridge: <gfxstrand> So, I'm not actually sure which bits I want to set to get fmulz...
15:29fdobridge: <gfxstrand> I don't think we want it to flush denorms
15:29fdobridge: <karolherbst🐧🦀> I think that's implicit
15:29fdobridge: <karolherbst🐧🦀> ehh wait
15:29fdobridge: <karolherbst🐧🦀> might not be
15:29fdobridge: <karolherbst🐧🦀> yeah, seems like maxwell has two bits here
15:29fdobridge: <karolherbst🐧🦀> nvm me
15:30fdobridge: <karolherbst🐧🦀> mhhh
15:30fdobridge: <karolherbst🐧🦀> though.. `.FMZ` does imply `.FTZ`
15:31fdobridge: <gfxstrand> Wait, FMZ implies FTZ?
15:31fdobridge: <gfxstrand> Ugh.
15:31fdobridge: <karolherbst🐧🦀> at least in the assembler 🙂
15:31fdobridge: <karolherbst🐧🦀> not sure if the hardware thinks the same
15:32fdobridge: <karolherbst🐧🦀> but it probably does
15:32fdobridge: <gfxstrand> I guess I can add more cases to the vkrunner test
15:39fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> It still is
15:39fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1173648370775117934/Screenshot_20231113_173825.png?ex=6564b838&is=65524338&hm=96ad944326f1190221b344c5e1706aabe65221810269fd857a16185716a54413&
15:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> But the performance is quite something 🚀
15:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1173648626736705617/Screenshot_20231113_173411.png?ex=6564b875&is=65524375&hm=8b0b1394d3668729781bfd8ae6f42d025ceed166dbfd16820953085e9e23729f&
15:41fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> This is the only error that appears: `[154260.407508] nouveau 0000:01:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE] ch 4 [00ffa3f000 Overwatch.exe[158385]] subc 3 class 902d mthd 0224 data eaf70f40`
15:41fdobridge: <karolherbst🐧🦀> mhhh
15:42fdobridge: <karolherbst🐧🦀> let's see
15:42fdobridge: <gfxstrand> Yup, it flushes... Ugh. 😩
15:42fdobridge: <karolherbst🐧🦀> so.. offset is `eaf70f40`
15:42fdobridge: <karolherbst🐧🦀> it might be that the offset is signed 🙃
15:42fdobridge: <gfxstrand> So, yeah, we can't flip on the optimization unless we set FTZ on everything.
15:42fdobridge: <karolherbst🐧🦀> ehh wait
15:43fdobridge: <karolherbst🐧🦀> it's a 64 bit address
15:43fdobridge: <karolherbst🐧🦀> the heck...
15:43fdobridge: <karolherbst🐧🦀> ohh
15:43fdobridge: <karolherbst🐧🦀> alignment
15:43fdobridge: <karolherbst🐧🦀> yeah.. so something is up with the alignment I guess because it can't really be anything else here
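A quick way to look at the alignment of the logged `data` word, as a sketch only. It treats `eaf70f40` from the error line as a plain integer (it may be just one part of a wider address), and the alignment the method actually requires is not stated in the log; `largest_pow2_alignment` and `is_aligned` are hypothetical helper names.

```python
def largest_pow2_alignment(value: int) -> int:
    """Return the largest power of two that divides value."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    # Isolating the lowest set bit gives the largest power-of-two divisor.
    return value & -value


def is_aligned(value: int, alignment: int) -> bool:
    """Check value against a power-of-two alignment."""
    return value & (alignment - 1) == 0


if __name__ == "__main__":
    data = 0xEAF70F40  # the 'data' field from the DATA_ERROR line
    print(hex(data), "is aligned to", largest_pow2_alignment(data), "bytes")
    for alignment in (64, 128, 256):
        print(alignment, is_aligned(data, alignment))
```

If the method expects anything stricter than the value's natural alignment, that would be consistent with an `INVALID_VALUE` report, but that is a guess rather than something the log confirms.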
15:47fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Is there a way to figure out where the alignment is wrong? Could push_dump logs help?
15:48fdobridge: <karolherbst🐧🦀> mhhhh
15:48fdobridge: <karolherbst🐧🦀> I think we need something new
15:49fdobridge: <karolherbst🐧🦀> @gfxstrand dakr: I think we need a small addition to the UAPI: add a flag so that a synced command submission also returns an error to userspace if a non-fatal error is hit, like `INVALID_VALUE` ones, so we can more easily identify and debug submissions with slightly faulty stuff
15:49fdobridge: <gfxstrand> Yeah, that's not a bad idea.
15:55fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I'm just looking at tinier details because incorrect rendering in Overwatch 2 is kind of annoying me :triangle_nvk:
15:57fdobridge: <dadschoorse> that is strange, I don't see how it makes the hw simpler 🐸
15:57fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> That Overwatch 2 issue and weird GR class errors when running SuperTuxKart OpenGL or DXVK v1.10+ games with GSP are probably my top 2 nouveau annoyances right now :nouveau:
16:00fdobridge: <dadschoorse> flushing by default is what amd does. d3d11/12 flushes 32bit denorms by default too.
16:01fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> But NAK suddenly became performant thanks to one MR that mhenning made so that's very good NVK progress (and it definitely should be merged 💪)
16:04fdobridge: <gfxstrand> Does DXVK enforce that via float controls?
16:04fdobridge: <dadschoorse> d3d11 dxvk does
16:06fdobridge: <dadschoorse> just to be clear: radv's default is preserve 16bit/64bit denorms, 32bit ftz. d3d is the same
16:07fdobridge: <gfxstrand> Right. If the CUDA docs are to be believed, FTZ only works on 32-bit anyway
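For reference, a rough sketch of what flushing 32-bit denorms to zero means at the bit level. It models only a single value being flushed and assumes the sign of the flushed zero is kept; `ftz_f32` is a hypothetical helper, not something taken from NAK, codegen or CUDA.

```python
import struct


def ftz_f32(x: float) -> float:
    """Round x to float32, then flush a denormal result to signed zero."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 and mantissa != 0:
        # Denormal: zero exponent field with a non-zero mantissa.
        return -0.0 if bits >> 31 else 0.0
    # Normal values, zeros, infinities and NaNs pass through unchanged.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 1e-40, -1e-40, 1e-30):
        print(value, ftz_f32(value))
```

Here `1e-40` is below the smallest normal float32 value (about 1.18e-38), so it is flushed, while `1e-30` is a normal value and survives rounding to float32.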
16:07dakr: karolherbst: Makes sense. Need to find a way to properly propagate those errors up and connect them to the corresponding job though.
16:07fdobridge: <gfxstrand> So I think we can get the optimization and behavior we want, we might just need to make it float-controls-aware somehow.
16:07fdobridge: <karolherbst🐧🦀> I think it's fair to not use `.FTZ` when 32 bit denorms are to be preserved explicitly
16:08fdobridge: <karolherbst🐧🦀> ehh
16:08fdobridge: <karolherbst🐧🦀> `.FMZ`
16:08fdobridge: <karolherbst🐧🦀> if neither is specified, does it have to be consistent within a shader invocation?
16:09karolherbst: dakr: yeah.. not quite sure how easy that is. I think there are some global error masks for it. Worst case it's a context global config.
16:09fdobridge: <gfxstrand> Without float controls, things are loosey-goosey enough that we can probably do that but I'd rather not.
16:09karolherbst: and uhhh.. we kinda have to pipe that through
16:09karolherbst: or just set it on the first submission
16:11fdobridge: <dadschoorse> fwiw, d3d9 dxvk does not use any float control settings, and that's the use case for fmulz
16:13dakr: karolherbst: I think it could be quite tricky. The only thing I can think of at a first glance is trying to fetch the sequence number of the fence context, but that might be racy..
16:13karolherbst: yeah.....
16:14karolherbst: sooo
16:14karolherbst: maybe we can whack it from userspace
16:14karolherbst: _if_ we can set the error mask of the faults I think..
16:14karolherbst: but I never really understood what bit does what and it kinda depends on the GPU arch
16:15karolherbst: but "make all errors fatal" is just enabling/disabling all bits, so should be fine
16:15fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> You've already whacked MMIO registers from userspace so that isn't new to you
16:15karolherbst: that's why I had the idea :D
16:15karolherbst: but not sure which MMIO reg to poke...
16:16karolherbst: I know that nvgpu added some related regs to their list of allowed ones
16:39fdobridge: <gfxstrand> I guess if we make FTZ the default for 32-bit, we could still do it.
16:44fdobridge: <gfxstrand> Annoying that nvidia unnecessarily tied the two flags, though.
16:45fdobridge: <mhenning> fwiw, codegen has FTZ as the default for non-compute shaders. although I'm not entirely sure why - I'm not aware of us having a perf penalty for handling denorms
16:46fdobridge: <gfxstrand> Probably because that's what nvidia does. 🤷🏻♀️
16:46fdobridge: <gfxstrand> I mean, I haven't actually verified that that's what they do but they probably do
16:47fdobridge: <karolherbst🐧🦀> just ignore the float controls for `fmulz` :ferrisCluelesser:
16:47fdobridge: <karolherbst🐧🦀> nobody is ever going to file a bug for it anyway
16:48fdobridge: <gfxstrand> Nope. I'm not going to make a buggy compiler and hope no one notices.
16:48fdobridge: <karolherbst🐧🦀> is it buggy if it's working as expected?
16:48fdobridge: <karolherbst🐧🦀> 😛
16:49fdobridge: <mhenning> yeah, randomly changing denorm behavior is pretty terrible
16:49fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Or a slow one 🐢
16:51fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Hopefully MR !53 will be merged soon (if it doesn't cause CTS regressions) :ferris:
17:03fdobridge: <gfxstrand> So if d3d9 is the only thing that cares about fmulz and if it wants denorm flushing anyway... Maybe we can make `nir_op_fmulz` flush as its canonical behavior?
17:12fdobridge: <mhenning> The thing is, we currently generate fmulz from pattern matching in opt_algebraic, and I don't know that we can tell that we're running under d3d9 and want flushing
17:15fdobridge: <gfxstrand> We can with enough hackery
17:15fdobridge: <gfxstrand> It already checks that SignedZeroPreserve isn't set
17:15fdobridge: <gfxstrand> We can make the check a bit smarter
18:42fdobridge: <dadschoorse> that would be painful for amd because fmulz uses the global mode there
18:54fdobridge: <gfxstrand> Merged. Thanks!
18:54fdobridge: <gfxstrand> Oh, well that's annoying.
18:55fdobridge: <gfxstrand> I mean, our "global mode" is going to be "add a flag to all the instructions" it's just that FMZ implies FTZ so we can't do it when the global mode isn't the right thing.
18:55fdobridge: <gfxstrand> I'll think some more about it.
18:56fdobridge: <gfxstrand> We can finagle the flags somehow. I just have to figure out how. 🙃
19:08fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Typo moment: `nouveau: Add initial headers and meson for the new compoiler`
19:11fdobridge: <esdrastarsis> Codegen runs the Vulkan Scene example (Sascha Willems) at 75 fps and NAK runs at 1100 fps 😱
19:11fdobridge: <esdrastarsis> Before it ran at around 20 fps iirc (edited)
19:13fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> This is what I observed too (so it isn't new for me)
19:14fdobridge: <karolherbst🐧🦀> weird 😄
20:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Same method as Overwatch 2: `[171236.420478] nouveau 0000:01:00.0: gr: DATA_ERROR 00000004 [INVALID_VALUE] ch 2 [00ffe41000 gta3.exe[205065]] subc 3 class 902d mthd 0224 data ed537540`
20:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I'm not sure that's how the game is supposed to look 🐸
20:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1173718635169206332/Screenshot_20231113_221549.png?ex=6564f9a9&is=655284a9&hm=7527dcdd04d320802890d6e548641ca8484ba0149ec1c9939eef6a159a40536d&
20:25fdobridge: <clangcat> I haven't played gta3 since I was a kid but I think it looks about normal. I don't remember so much fog but that might just be cause of the mission.
20:28fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> This definitely isn't normal though (I played this save file a lot while testing WineD3D and various other stuff)
20:28fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1173721031123402883/Screenshot_20231113_222624.png?ex=6564fbe4&is=655286e4&hm=35c55a38b2efc0554c6ffbbe8702e682dfbaff88640c173b2a2f83418ca45ec6&
20:29fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I wonder if it's related to the fmulz thing 🤔
20:30fdobridge: <mhenning> The ability to use fmulz is just an optimization - we should still produce correct output without it
20:35fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> NFS Most Wanted 2012 has big gains with NAK (previously it was a massive slideshow with skipping audio at 1080p highest settings without SSAA and AO)
20:35fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1173722899438383215/Screenshot_20231113_223240.png?ex=6564fda1&is=655288a1&hm=c5ece1bcbd7b81cab19a37733638dd8e50d6d9842b3ca5b9071e3e0b0bfd8749&
20:46fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Here's codegen for reference
20:46fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1173725703989117059/Screenshot_20231113_224617.png?ex=6565003e&is=65528b3e&hm=ac937f164dcf5ef3597ab7cb8843c12e1589dbcd4221e0a1a51aeb840529d6da&
20:48fdobridge: <karolherbst🐧🦀> I wonder if scheduling is trashed or something...
20:48fdobridge: <karolherbst🐧🦀> ohh
20:48fdobridge: <karolherbst🐧🦀> codegen is faster
20:48fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I mean there's a weird fog issue on NAK
20:50fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> NAK is now always faster than codegen BTW
20:50fdobridge: <karolherbst🐧🦀> ahh
20:56fdobridge: <gfxstrand> It's the UBO caching settings.
20:57fdobridge: <gfxstrand> It'll get even faster once we get bindless UBOs hooked up and maybe a little cbuf lowering for older gens.
20:59fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> This fog issue definitely happens inside a pixel shader because it only happens with `NVK_USE_NAK=fs`
21:08fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Should I send the SPIR-V shaders dumped by DXVK? 🤔
21:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Here's `NAK_DEBUG=print NVK_USE_NAK=fs` :ferris:
21:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1173733772051628062/gta3_nak.log?ex=656507c2&is=655292c2&hm=f13d1250a55cc91f95583504c196175bec7ea27d9d063070eb68a5d18653d299&
21:24fdobridge: <esdrastarsis> what about zcull?
21:36fdobridge: <mhenning> Do you think you could get a renderdoc vulkan trace? I might have time to hack at this later
21:36fdobridge: <mhenning> @asdqueerfromeu ^
21:38fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I haven't tried RenderDoc before (but I might try to make a trace with it later)
22:13fdobridge: <gfxstrand> https://tenor.com/view/mario-movie-peach-here-we-go-mario-kart-lets-go-gif-4834955107827977209
22:23fdobridge: <esdrastarsis> In Rust we trust
22:56fdobridge: <esdrastarsis> I think NVK has improved a little...
22:56fdobridge: <esdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1173758280082923571/13-11-23-195405-SCREEN.jpg?ex=65651e95&is=6552a995&hm=cf2cd99a1d9bcc53b8c874ef996f81ff4d5703d9aa4dc067e5bbcba818aab677&
22:57fdobridge: <karolherbst🐧🦀> :ferrisBongo:
23:05fdobridge: <mohamexiety> wow, 67 FPS
23:05fdobridge: <mohamexiety> that's actually very playable 😮
23:05fdobridge: <mohamexiety> nice!
23:12fdobridge: <esdrastarsis> lol
23:12fdobridge: <esdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1173762250306228284/13-11-23-201154-SCREEN.jpg?ex=65652247&is=6552ad47&hm=e59e6383a7c47d8ad17fa1d0fd6ed2c3defbd92df29efd5501be78f027ca5334&
23:27fdobridge: <gfxstrand> Ooh, alpha blending bug from the looks of it
23:27fdobridge: <gfxstrand> Could be the same issue as the other game, TBH.
23:28fdobridge: <gfxstrand> Go ahead and file a bug. As long as you're on Turing+, I don't mind seeing app bugs now. RenderDoc traces exhibiting the bug would be preferred.
23:28fdobridge: <gfxstrand> Hrm... I have an idea, actually.
23:29fdobridge: <gfxstrand> Does it do that with codegen?
23:33fdobridge: <esdrastarsis> yes
23:33fdobridge: <esdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1173767645582065704/13-11-23-203258-SCREEN.jpg?ex=6565274e&is=6552b24e&hm=3b95e84f5b9978683ab437da681f597e49a673125d222b4897bed203bbb4e8be&
23:35fdobridge: <karolherbst🐧🦀> I kinda wonder why codegen is _that_ slow, or maybe NAK just has the better pipeline.. does NAK use actual UBOs yet?
23:36fdobridge: <karolherbst🐧🦀> looks like something isn't transparent even though it should be
23:38fdobridge: <gfxstrand> Nope but NAK does now use the Constant caching mode for UBOs
23:42fdobridge: <gfxstrand> Really, like 70% of our perf delta is just UBOs.
23:42fdobridge: <karolherbst🐧🦀> ahh
23:42fdobridge: <gfxstrand> I know what needs to be done but it requires even more compiler work. We'll get there.
23:42fdobridge: <karolherbst🐧🦀> so just the `.CONSTANT` thing?
23:42fdobridge: <gfxstrand> Yup
23:42fdobridge: <karolherbst🐧🦀> impressive
23:42fdobridge: <karolherbst🐧🦀> but well.. caches are there for a reason 😄
23:43fdobridge: <karolherbst🐧🦀> I guess you'll start with bindless UBOs first?
23:45fdobridge: <gfxstrand> Yeah
23:45fdobridge: <gfxstrand> That's the plan.
23:46fdobridge: <gfxstrand> Though using `.CONSTANT` massively improves things in the mean time.
23:46fdobridge: <gfxstrand> And we'll eventually want something better than `.CONSTANT` for pre-Turing but I want to get UBOs working first so I know how it'll fit in better.
23:47fdobridge: <karolherbst🐧🦀> I'm surprised it's that much of a difference already
23:48fdobridge: <karolherbst🐧🦀> but I think HdkR also said, that on newer hardware .CONSTANT is entirely special
23:51HdkR: It's good stuff
23:51fdobridge: <gfxstrand> Yeah, I wouldn't be surprised if .CONSTANT on Turing+ is pretty similar to a bindless UBO. The difference being that a bindless UBO can be referenced from an ALU op directly whereas .CONSTANT is a separate load op.
23:52fdobridge: <karolherbst🐧🦀> the caching will also work differently
23:52fdobridge: <karolherbst🐧🦀> the bindless ubo address contains the size
23:52HdkR: yea, encoding directly in the instruction is a small win still
23:52fdobridge: <karolherbst🐧🦀> so the hardware could actually pull more into the cache already
23:52fdobridge: <karolherbst🐧🦀> or less...
23:52fdobridge: <karolherbst🐧🦀> depending on things I guess
23:54fdobridge: <gfxstrand> The hardware is probably going to be pulling either 4K pages or 64B lines.
23:54fdobridge: <gfxstrand> Or maybe a different line size. IDK. I wouldn't be surprised if NV GPUs have like a 384B line or something odd like that.
23:55fdobridge: <gfxstrand> In any case, as long as it's no more than a page, it won't fault regardless of bounds checking.
23:55fdobridge: <karolherbst🐧🦀> yeah... something something
23:55fdobridge: <gfxstrand> for bound UBOs, I think they DMA into the cache up-front or something crazy like that.
23:56fdobridge: <karolherbst🐧🦀> yeah, they should
23:56fdobridge: <karolherbst🐧🦀> otherwise it would be kinda pointless