10:52fdobridge: <nanokatze> who can I poke with questions about nouveau's vm_bind (edited)
10:52fdobridge: <nanokatze> the async case specifically
10:53fdobridge: <Sid> https://dontasktoask.com
10:53fdobridge: <Sid> just ask, whoever knows will likely answer
10:54fdobridge: <nanokatze> eh sure whatever
10:55fdobridge: <nanokatze> does vm_bind in nouveau block the caller while waiting for in-syncobjs? (edited)
10:59fdobridge: <nanokatze> @mohamexiety I see your sparse MR uses sync vm_bind, but would you happen to know the answer to above still?
11:00fdobridge: <nanokatze> oops nvm I didn't read the MR fully
11:11fdobridge: <mohamexiety> that's a good question actually. I know the op itself is blocked, but not sure about the caller
11:16fdobridge: <airlied> No, the caller isn't blocked for async binds
11:18fdobridge: <nanokatze> thanks, cool
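
The behaviour airlied describes (the bind op waits for its in-syncobjs, the caller does not) can be pictured with a toy model. This is only a sketch of the general pattern: `ToyBindQueue`, `submit_async` and the use of `threading.Event` as a stand-in for a syncobj are all assumptions made for illustration, not nouveau's implementation or uAPI.

```python
import threading


class ToyBindQueue:
    """Toy model of an asynchronous bind queue (hypothetical, not nouveau)."""

    def __init__(self):
        self.applied = []  # bind ops that have actually been carried out

    def submit_async(self, op, in_fence, out_fence):
        # Hand the op to a worker and return immediately. Only the worker
        # waits on the in-fence, so the submitting caller is never blocked.
        def worker():
            in_fence.wait()          # the op itself is blocked here
            self.applied.append(op)  # "perform" the bind
            out_fence.set()          # signal completion to later waiters

        thread = threading.Thread(target=worker)
        thread.start()
        return thread


queue = ToyBindQueue()
in_fence, out_fence = threading.Event(), threading.Event()
worker = queue.submit_async("map buffer", in_fence, out_fence)

# Control is back with the caller although the in-fence has not signalled.
pending_after_submit = not out_fence.is_set()

in_fence.set()  # the dependency completes later
worker.join()
```

In this model a synchronous bind would correspond to calling `in_fence.wait()` on the caller's own thread before returning.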
15:10fdobridge: <gfxstrand> That's codegen falling over. What GPU do you have?
15:11fdobridge: <pavlo_it_115> GP107 😢 (edited)
15:11fdobridge: <gfxstrand> Ah, okay. Yeah, that's old enough that NAK doesn't work yet.
15:16fdobridge: <pavlo_it_115> @gfxstrand What can you say about it? Are there any guesses as to why the "cpu overhead" occurs?
15:16fdobridge: <pavlo_it_115> https://discord.com/channels/1033216351990456371/1034184951790305330/1209855970029740063
15:16fdobridge: <pavlo_it_115> https://discord.com/channels/1033216351990456371/1034184951790305330/1209953180444663878
15:16fdobridge: <pavlo_it_115> https://discord.com/channels/1033216351990456371/1034184951790305330/1209955958680322109
15:16fdobridge: <pavlo_it_115> Is it somehow related to NAK?
15:22fdobridge: <gfxstrand> CPU overhead has nothing to do with NAK.
15:25fdobridge: <gfxstrand> Also, without debug symbols for NVK, it's really hard to tell what's going on in those.
15:27fdobridge: <pavlo_it_115> So it's going to be really hard to figure out what's going on with zink?
15:31fdobridge: <redsheep> Is there documentation somewhere on building with debug symbols?
15:32fdobridge: <redsheep> I'd like to be able to see what's going on with CPU side performance, though insight into GPU side perf is probably more important.
15:35fdobridge: <!DodoNVK (she) 🇱🇹> Just use the debug buildtype in Meson I think
15:36fdobridge: <zmike.> debugoptimized*
15:39fdobridge: <!DodoNVK (she) 🇱🇹> I kind of forgot about -Og
15:39fdobridge: <karolherbst🐧🦀> that's not debug optimized
15:40fdobridge: <karolherbst🐧🦀> for CPU profiling you want to compile at the same optimization level as your release build anyway
15:40fdobridge: <karolherbst🐧🦀> you just want to have debug information
15:43fdobridge: <!DodoNVK (she) 🇱🇹> So it's just -O2 -g then? 🐸
15:43fdobridge: <karolherbst🐧🦀> basically
15:44fdobridge: <karolherbst🐧🦀> I don't know what the deal is with Og, because every time I used it, it didn't help with debugging
15:44fdobridge: <!DodoNVK (she) 🇱🇹> Maybe it prevents the `<optimized out>` stuff in gdb?
15:45fdobridge: <karolherbst🐧🦀> it doesn't
15:51fdobridge: <karolherbst🐧🦀> yeah, I honestly have no idea what's the point of Og (edited)
15:55fdobridge: <!DodoNVK (she) 🇱🇹> -Os does actually reduce Wine size quite a bit (but it also caused a weird bug once)
17:30fdobridge: <gfxstrand> @karolherbst So, for bound textures we just stuff them in a CB?
17:30fdobridge: <gfxstrand> Am I reading this right?
17:32fdobridge: <karolherbst🐧🦀> correct
17:33fdobridge: <karolherbst🐧🦀> there are methods on the channel to select which CB
17:34fdobridge: <karolherbst🐧🦀> with Volta+ you define the CB in the instruction encoding
17:40fdobridge: <gfxstrand> The 14 bits of index... Is that an offset or offset / 4?
17:41fdobridge: <karolherbst🐧🦀> / 4
17:41fdobridge: <gfxstrand> cool
17:41fdobridge: <gfxstrand> That makes everything easy then
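
A minimal sketch of what "offset / 4" means for a 14-bit index field. The function names are made up for illustration; only the 14-bit width and the divide-by-four come from the exchange above.

```python
INDEX_BITS = 14  # width of the index field, as stated in the chat


def cb_index_from_byte_offset(byte_offset):
    """Convert a byte offset into the 14-bit index (units of 4 bytes)."""
    if byte_offset < 0 or byte_offset % 4 != 0:
        raise ValueError("offset must be a non-negative multiple of 4 bytes")
    index = byte_offset // 4
    if index >= 1 << INDEX_BITS:
        raise ValueError("offset does not fit in the 14-bit index")
    return index


def byte_offset_from_cb_index(index):
    """Inverse mapping: the index counts 4-byte units."""
    return index * 4


# Largest byte offset that the field can express.
max_offset = byte_offset_from_cb_index((1 << INDEX_BITS) - 1)
```

With 4-byte units, 14 bits span `(1 << 14) * 4` bytes, i.e. 64 KiB of addressable range.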
18:59fdobridge: <zmike.> now that I know nvk doesn't have BAR it's making fixing bugs a lot easier
19:00fdobridge: <Sid> about that
19:01fdobridge: <Sid> if you prevent nouveau from binding to the GPU at boot
19:01fdobridge: <Sid> and modprobe it later
19:01fdobridge: <Sid> it *does* show the full BAR size
19:02fdobridge: <Sid> whether or not it uses BAR is a different question, but I poked around this a while ago and found out that if nouveau is initialized before the kernel pci driver is done doing its thing, nouveau only exposes the default bar size
19:15fdobridge: <gfxstrand> @karolherbst Does the hardware have a way to provide different texture and sampler handles?
19:16fdobridge: <karolherbst🐧🦀> yes
19:16fdobridge: <karolherbst🐧🦀> you can have indirects
19:16fdobridge: <karolherbst🐧🦀> so you push one part (or both in the case of bindless) into a register
19:17fdobridge: <karolherbst🐧🦀> though I think in the indirect case you can only have the sampler be pulled from a reg
19:17fdobridge: <karolherbst🐧🦀> but there is also this independent vs not thing
19:18fdobridge: <karolherbst🐧🦀> ohh.. looks like with uniform registers you can have both indirect
19:18fdobridge: <karolherbst🐧🦀> it's using a vec2 then
19:19fdobridge: <karolherbst🐧🦀> uniform reg bindless still has an 8 bit offset encoded
19:20fdobridge: <karolherbst🐧🦀> (for the tex header)
19:20fdobridge: <karolherbst🐧🦀> the mixed bindless forms are like bindless, just that the sampler gets pulled from the bindless handle
19:21fdobridge: <karolherbst🐧🦀> @gfxstrand or did you mean if both are direct?
19:22fdobridge: <karolherbst🐧🦀> the immediate is 32 bit, and 19:0 is the tex, 31:20 the sampler
19:23fdobridge: <karolherbst🐧🦀> in mixed bindless it's 19:0 for the tex only, not sure if the immediate is also smaller then
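
The 32-bit layout karolherbst gives for the direct case (bits 19:0 for the texture, bits 31:20 for the sampler) can be written out as a pack/unpack pair. The helper names are hypothetical; the bit ranges are the only part taken from the chat.

```python
TEX_BITS = 20      # texture index lives in bits 19:0
SAMPLER_BITS = 12  # sampler index lives in bits 31:20


def pack_handle(tex, sampler):
    """Combine texture and sampler indices into one 32-bit value."""
    if not 0 <= tex < (1 << TEX_BITS):
        raise ValueError("texture index does not fit in 20 bits")
    if not 0 <= sampler < (1 << SAMPLER_BITS):
        raise ValueError("sampler index does not fit in 12 bits")
    return (sampler << TEX_BITS) | tex


def unpack_handle(value):
    """Split a 32-bit value back into (texture, sampler)."""
    tex = value & ((1 << TEX_BITS) - 1)
    sampler = (value >> TEX_BITS) & ((1 << SAMPLER_BITS) - 1)
    return tex, sampler
```

For the mixed bindless case only the low 20 bits would carry the texture index, per the message above; what happens to the upper bits there is left open in the chat and is not modelled here.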
19:27fdobridge: <redsheep> How close do you think we are to being able to run a whole session? Or is that already expected to be working?
19:29fdobridge: <zmike.> dunno
19:29fdobridge: <zmike.> try and see
19:30fdobridge: <redsheep> I tried a couple weeks ago and it hard locked my system
19:30fdobridge: <redsheep> I know there was an issue where you said you had a working gnome session
19:31fdobridge: <redsheep> I'll try again, maybe the zink updates I have now will be enough
19:35fdobridge: <airlied> I think nvk needs modifiers to run a session
19:35fdobridge: <redsheep> That was my impression, and I still don't know how to get the pieces together to test turning that on
20:01fdobridge: <redsheep> WOOO it works now! Nice!
20:02fdobridge: <redsheep> It's laggy but it works. Modifiers aren't required then? Not sure how this is functioning but it is, I can confirm my apps see zink
20:04fdobridge: <zmike.> are you sure it's zink+nvk and not zink+lavapipe?
20:04fdobridge: <redsheep> If it's lavapipe then it's impressive that my cpu can run minecraft at 90 fps
20:04fdobridge: <zmike.> I meant the display server
20:05fdobridge: <redsheep> It says zink Vulkan 1.3(AD102 (MESA_NVK)) in my testing
20:05fdobridge: <redsheep> Any way I can see what the display server is using?
20:10fdobridge: <redsheep> Looks like it is nvk to me
20:10fdobridge: <redsheep>
20:10fdobridge: <redsheep> ```Feb 22 13:00:19 jared-linux kwin_x11[1141]: OpenGL renderer string: zink Vulkan 1.3(AD102 (MESA_NVK))
20:10fdobridge: <redsheep> Feb 22 13:00:20 jared-linux kwin_x11[1141]: MESA: error: zink: display server doesn't support DRI3 modifiers and driver can't handle INVALID<->LINEAR!
20:10fdobridge: <redsheep> ```
20:12fdobridge: <redsheep> Nothing in journalctl mentions lavapipe or llvmpipe from this boot
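
The check redsheep did by hand (look for the compositor's renderer string and make sure no software rasterizer shows up) could be scripted roughly as below. It assumes the log lines have already been captured as text; the substring test for `llvmpipe`/`lavapipe` is a heuristic chosen for this sketch, not an official detection method.

```python
import re

# Matches the line that kwin prints in the log excerpt above.
RENDERER_RE = re.compile(r"OpenGL renderer string:\s*(.+)$")


def renderer_strings(log_lines):
    """Return every OpenGL renderer string found in the given log lines."""
    found = []
    for line in log_lines:
        match = RENDERER_RE.search(line)
        if match:
            found.append(match.group(1).strip())
    return found


def looks_software_rendered(renderer):
    """Heuristic: does the renderer string name a software rasterizer?"""
    lowered = renderer.lower()
    return "llvmpipe" in lowered or "lavapipe" in lowered


sample = [
    "Feb 22 13:00:19 jared-linux kwin_x11[1141]: OpenGL renderer string: "
    "zink Vulkan 1.3(AD102 (MESA_NVK))",
    "Feb 22 13:00:20 jared-linux kwin_x11[1141]: MESA: error: zink: display "
    "server doesn't support DRI3 modifiers and driver can't handle "
    "INVALID<->LINEAR!",
]
renderers = renderer_strings(sample)
```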
20:15fdobridge: <redsheep> Maybe I will try zmike/test later to see if desktop performance is better over there
20:17fdobridge: <redsheep> Hmm. Things might not be quite so peachy. There's a slowly growing corruption happening in discord when running like this
20:19fdobridge: <redsheep> https://cdn.discordapp.com/attachments/1034184951790305330/1210320061730922566/Screenshot_20240222_131922.png?ex=65ea2165&is=65d7ac65&hm=f4d453a3cb0d17a29fa3b6801c0982fef5ddcd02cd503139b8c137ba768b67cc&
20:21fdobridge: <pac85> New server logo?
20:21fdobridge: <redsheep> New alternative to cursed gears
20:27fdobridge: <gfxstrand> New emoji incoming
20:29fdobridge: <gfxstrand> Ugh... codegen is incomprehensible...
20:53fdobridge: <gfxstrand> So do non-bindless with an offset use the same opcode as bindless but without `.B`?
20:55fdobridge: <benjaminl> my recollection is yes
20:55fdobridge: <benjaminl> the encoding is different, but codegen represents them the same in the IR
20:56fdobridge: <gfxstrand> Yeah, codegen is a mess. 😩
20:56fdobridge: <gfxstrand> But emit_gv100.c always sets .B when there's an indirect
20:57fdobridge: <zmike.> I think a lot of the glcts buffer mapping tests are broken
20:58fdobridge: <zmike.> principal-skinner.meme
20:58fdobridge: <redsheep> Is that likely the cause of the corruption?
20:58fdobridge: <!DodoNVK (she) 🇱🇹> ~~Could Karol help comprehend it?~~
20:58fdobridge: <gfxstrand> I've given up on comprehending it. I just try to comprehend the ISA
21:00fdobridge: <redsheep> If zink can run the session now codegen is basically done being a part of anything properly supported with gsp, right?
21:01fdobridge: <gfxstrand> yeah
21:10fdobridge: <karolherbst🐧🦀> mixed bindless also uses .B
21:10fdobridge: <gfxstrand> Figures...
21:10fdobridge: <gfxstrand> So what bits control what mode you're in? Is it all different opcodes or does the opcode just control whether or not there's an indirect bit?
21:10fdobridge: <karolherbst🐧🦀> except you use the uniform register encoding
21:11fdobridge: <karolherbst🐧🦀> I don't know
21:11fdobridge: <gfxstrand> Oh, well that's not all that useful...
21:11fdobridge: <karolherbst🐧🦀> it also doesn't help that there is a "GL" and a "D3D" mixed bindless mode
21:11fdobridge: <gfxstrand> 🤡
21:12fdobridge: <karolherbst🐧🦀> the D3D one is weird
21:14fdobridge: <karolherbst🐧🦀> the D3D one pulls the tex desc from the encoded CB location + an offset specified in the encoding
21:15fdobridge: <karolherbst🐧🦀> but I think it has fewer bits for the location?
21:15fdobridge: <karolherbst🐧🦀> the GL one is the one with a 14 bit CB address, the D3D one uses 6 bits for the pull and 8 bits for an offset
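
To make the 6 + 8 split concrete, here is a sketch that packs the two D3D-mode fields into the same 14 bits the GL mode uses for its CB address. The field widths come from the message above; the field order (offset in the low bits, pull location above it) is purely an assumption for illustration, since the chat does not say where each field sits.

```python
PULL_BITS = 6    # bits for the CB location the descriptor is pulled from
OFFSET_BITS = 8  # bits for the additional offset


def pack_d3d_location(pull, offset):
    """Pack the two fields into 14 bits.

    ASSUMPTION: offset occupies the low 8 bits and pull the 6 bits above it.
    The real hardware ordering is not stated in the discussion.
    """
    if not 0 <= pull < (1 << PULL_BITS):
        raise ValueError("pull location does not fit in 6 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 8 bits")
    return (pull << OFFSET_BITS) | offset


def unpack_d3d_location(value):
    """Inverse of pack_d3d_location under the same assumed ordering."""
    return value >> OFFSET_BITS, value & ((1 << OFFSET_BITS) - 1)
```

Whatever the true ordering is, the two fields together take up the same 14 bits as the GL-style address.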
21:19fdobridge: <!DodoNVK (she) 🇱🇹> ~~NouveauD3D when?~~
21:36fdobridge: <redsheep> Obviously a joke, but I'm sure in the long run if extra performance is needed then special paths through dxvk and vkd3d will get the directx perf to where it should be
21:37fdobridge: <redsheep> That was being discussed for descriptors in vkd3d iirc
21:46fdobridge: <!DodoNVK (she) 🇱🇹> Will Pascal ever achieve good vkd3d-proton performance though?
21:46fdobridge: <Sid> lmao
21:49fdobridge: <redsheep> I mean no, but pascal won't ever achieve good performance period
21:56fdobridge: <redsheep> Also, with regards to vkd3d specifically it's no great loss if that never works well even if the firmware situation somehow got resolved. All of the really new dx12 only games run like garbage on pascal, even on the 1080ti
21:56fdobridge: <!DodoNVK (she) 🇱🇹> Even on Windows?
21:56fdobridge: <redsheep> Yes
21:56fdobridge: <redsheep> Alan Wake 2 is a disaster on pascal
21:59fdobridge: <rhed0x> Alan Wake 2 is a special case because it practically requires Mesh shaders
22:00fdobridge: <rhed0x> their fallback is really really slow
22:00fdobridge: <rhed0x> the game literally has a warning popup if you start it on a gpu that doesn't support mesh shaders
22:00fdobridge: <redsheep> There's that. Still, very few games are requiring dx12 and they're all crazy intensive
22:00fdobridge: <rhed0x> > very few games are requiring dx12
22:00fdobridge: <rhed0x> huh?
22:00fdobridge: <huntercz122> few
22:01fdobridge: <rhed0x> practically everything released in the last 3 years has been d3d12
22:01fdobridge: <huntercz122> every aaa release nowadays uses dx12
22:01fdobridge: <Sid> every AAA, yeah
22:01fdobridge: <redsheep> Lots have fallbacks, dx12 in the system requirements is a pretty new development
22:03fdobridge: <Sid> I'd say that was true in 2020
22:03fdobridge: <Sid> around the release of Cyberpunk
22:03fdobridge: <Sid> since then more and more AAA have been releasing with dx12 only
22:04fdobridge: <rhed0x> big releases in the last couple of years that only support d3d12
22:04fdobridge: <rhed0x> Dead Space Remake
22:04fdobridge: <rhed0x> RE4 Remake
22:04fdobridge: <rhed0x> Last of Us Remake
22:04fdobridge: <rhed0x> Jedi Survivor
22:05fdobridge: <rhed0x> Alan Wake 2
22:05fdobridge: <rhed0x> Uncharted 4
22:05fdobridge: <rhed0x> Avatar Frontiers of Pandora
22:05fdobridge: <rhed0x> Anno 1800
22:05fdobridge: <rhed0x> Hifi Rush
22:05fdobridge: <rhed0x> The Finals
22:05fdobridge: <rhed0x> Lies of P
22:05fdobridge: <rhed0x> Cyberpunk
22:05fdobridge: <rhed0x> Spiderman
22:05fdobridge: <rhed0x> Horizon Zero dawn
22:05fdobridge: <rhed0x> Halo Infinite
22:05fdobridge: <rhed0x> Diablo 4
22:05fdobridge: <rhed0x> Elden Ring
22:05fdobridge: <rhed0x> Forza Horizon 5
22:05fdobridge: <rhed0x> Hogwarts Legacy
22:05fdobridge: <rhed0x> Deathloop
22:05fdobridge: <rhed0x> Death Stranding
22:05fdobridge: <rhed0x> RE8
22:05fdobridge: <rhed0x> Plague Tale Requiem
22:05fdobridge: <rhed0x> sooo yeah
22:05fdobridge: <rhed0x> not rare :P
22:06fdobridge: <rhed0x> Callisto Protocol
22:06fdobridge: <rhed0x> Diablo 2 Resurrected (edited)
22:06fdobridge: <redsheep> There's like 90k games on steam right? Also again most of those are heavy enough to perform poorly on a 1080ti
22:06fdobridge: <rhed0x> yeah like i said, last 3 years...
22:08fdobridge: <Sid> breaking news: latest AAA releases perform poorly on ~7 year old hardware
22:08fdobridge: <Sid> e-e
22:08fdobridge: <dadschoorse> imagine not having 2012 hw that can support dx12 binding model with great performance
22:08fdobridge: <rhed0x> a lot of those run okay on Pascal on Windows
22:12fdobridge: <dadschoorse> kind of funny how amd went from basically d3d11 descriptors in hw to what is still one of the most flexible gpu descriptor models in one gen, while nvidia had a lot more change over the years
23:10fdobridge: <redsheep> Testing more now with zink on nvk and the performance is kind of bizarre. The cursor seems to update every frame, and my main performance test in the talos principle only drops from 74 fps to 65, but the frame delivery is working so poorly that I can count every actual delivered frame, and it's about 5 fps visually.
23:18fdobridge: <redsheep> To be clear: That is with further testing of zink running the entire session. I know that talos is a vulkan game.
23:29fdobridge: <redsheep> Dropping to one monitor didn't fix it, though dropping resolution kind of does
23:35fdobridge: <redsheep> Frame capping the game to try to give some headroom didn't change anything, seems to me like kwin itself is the thing running at 5 fps and nothing really changes that
23:46fdobridge: <redsheep> @zmike. Just an update on that corruption, the only application I have found to be doing that is discord, and only on the server selection bar. So far I haven't found any other issues with a zink session outside of performance, and when running a normal session discord will still corrupt on zink, so it is not unique to having it run the entire session.