02:06fdobridge: <butterflies> Tegra 3 was quite crappy even at release
02:06fdobridge: <butterflies> and I don't think Tegra 4 ever got OGLES3.0?
02:06fdobridge: <butterflies> cemetery might be the best place for those parts :/
02:39fdobridge: <orowith2os> the urge to see if nvk can get barely reasonable performance without gsp is immense
02:39fdobridge: <orowith2os> that might actually give enough of an incentive to optimize the hell out of it too :p
02:53fdobridge: <airlied> optimise what though?
02:53fdobridge: <airlied> like we'd already be optimising shaders as much as we possibly could
02:53fdobridge: <airlied> it would make no sense not to
03:13fdobridge: <orowith2os> (clueless)
03:14fdobridge: <orowith2os> what even *is* there to optimize?
03:14fdobridge: <orowith2os> once gsp is ready and the nvidia driver can be used as a reference point for performance
03:17fdobridge: <airlied> lots of things to optimize, but nothing that would remotely help if you can't reclock
03:18fdobridge: <orowith2os> ah
03:18fdobridge: <orowith2os> so in any event, without reclocking, you'd have nothing to work up to anyways
03:19fdobridge: <orowith2os> no magically getting 10 FPS in Halo
09:20soyoungface: Cryptography ways to solve those tasks are very many, I intended to demonstrate only the opening moves and base opening strategy, like imagined how much chess first amounts of moves influence the match, those moves in related programming open the door of wide range of hashed ways to execute code, their maintaining team does same amount of work as for now, cause there are so many ways to layout banks on top of vulkan, opengl, directx spec
09:59fdobridge: <!DodoNVK (she) 🇱🇹> I can see a NAK branch for Vulkan 8-bit/16-bit storage support :ferris: (edited)
10:31fdobridge: <rhed0x> 1.1 hype
15:48fdobridge: <gfxstrand> Yup. There's going to need to be some tweaking before 8/16-bit arithmetic will be competent but it should work, at least for integers. Fp16 is going to take more work but this is a nice first step.
15:50youngfaceatold: So event loop is for only input devices, there's only so few keys, you can index them however you want, and carry them in, they are part of the global state, so if SDL_KEYDOWN is accessed, it is given to be filled in by it's global state index, in other words if it really was pressed, the branch is taken too, In the future somewhere, there's a transition pass from baremetal key buffer, to focal key index, that test can always be
15:50youngfaceatold: performed cause it costs few, it's just the pass at runtime, it's all global state is indexed indirect, global state is in the hash indexed by PC in some range, and is always tested in uniform fashion.
15:52youngfaceatold: Global state has read and write passes.
15:53fdobridge: <rhed0x> @gfxstrand Bit of a random question:
15:53fdobridge: <rhed0x> NV seemingly implements descriptor buffers using indices rather than writing descriptors directly. The extra indirection is probably not great for VKD3D-Proton perf. Do you think they did it like that just because that was probably easiest with their existing code base or is there some incompatibility with the extension and NV HW?
15:55fdobridge: <gfxstrand> Oh, the extension is unimplementable on NV as is.
15:55fdobridge: <gfxstrand> That's just not how NV hardware works.
15:56fdobridge: <rhed0x> :(
15:57fdobridge: <gfxstrand> The fact that NVIDIA was forced to implement it by vkd3d is kinda sad, IMO. It's basically VK_EXT_amd_descriptors.
15:57fdobridge: <gfxstrand> Not that we don't need something but, short of a major hardware shift on the NVIDIA side, that's not it.
15:58fdobridge: <rhed0x> It's not that different from d3d12 which seems to work well for Nvidia hw
15:58fdobridge: <rhed0x> anything in particular that's especially problematic?
15:58fdobridge: <gfxstrand> And, yeah, that's frustrating. There's work ongoing to come up with something that sucks less but it's a hard problem.
15:59fdobridge: <gfxstrand> It's very different from D3D12.
16:00fdobridge: <rhed0x> its different in that the descriptor heaps are buffers instead of something more opaque, aside from that the major differences are how it interacts with the rest of vulkan from what I can tell
16:00fdobridge: <rhed0x> like the push constant bits for example
16:01fdobridge: <gfxstrand> Well, that's another problem... There are enough different feature/prop bits that the extension can take multiple very different shapes. One of those shapes looks like an awkward version of D3D12. Others look more like a console API. The shape you get very much depends on the implementation. 😫
16:02fdobridge: <pac85> Maybe we need VK_EXT_dx12_descriptor_model
16:02fdobridge: <gfxstrand> There are reasons why Microsoft made it opaque...
16:02fdobridge: <rhed0x> VK_MESA_dx12_descriptor_model 🙃
16:02fdobridge: <rhed0x> okay thanks a lot
16:03fdobridge: <rhed0x> disappointing, so we probably wont fix the performance ranging from meh to terrible in d3d12 games any time soon :(
16:04fdobridge: <pac85> On NV you mean?
16:04fdobridge: <gfxstrand> The other thing is that D3D12 doesn't have `vkCmdBlitImage()` which means a lot less places where the driver needs its own descriptors. That makes exposing internal heaps more palatable.
16:05fdobridge: <gfxstrand> Are they meh to terrible on the proprietary driver?
16:05fdobridge: <rhed0x> yup
16:05fdobridge: <rhed0x> last games I played on linux were Dead Space (2023) and Returnal
16:05fdobridge: <gfxstrand> Ugh...
16:05fdobridge: <rhed0x> Dead Space ran at 60% windows perf
16:05fdobridge: <rhed0x> returnal at 50%
16:05fdobridge: <gfxstrand> Yuck!
16:05fdobridge: <rhed0x> thats when I quit and ran back to windows :(
16:05fdobridge: <rhed0x> lack of VRR support or fractional scaling was another reason
16:05fdobridge: <pac85> And what is the speed on amd? How much of that is attributed to descriptors? (edited)
16:06fdobridge: <rhed0x> perf on AMD seems to be generally competitive
16:06fdobridge: <rhed0x> if its slower its like 90% of Windows as far as i know
16:06fdobridge: <rhed0x> also the prop NV driver shits the bed with Xid 109 in a lot of D3D12 games (edited)
16:07fdobridge: <gfxstrand> I mean, it might be solvable. I have different opinions about driver design than the Nvidia folks and may make better/different choices. Time will tell. I'm probably 6 months away from seriously looking at descriptors.
16:07fdobridge: <rhed0x> yeah I was just curious about the descriptor buffer extension
16:08fdobridge: <rhed0x> kind of annoying that khronos didnt land on a design that works on NV hardware too but I know you cant talk about that
16:09fdobridge: <gfxstrand> It's less that I can't talk about it and more that there isn't much to say. 🤷🏻♀️
16:09fdobridge: <rhed0x> d3d12 translation aside, descriptor buffers, D3D12 descriptor heaps and metal argument buffers would make for a nice common abstraction in a renderer
16:10fdobridge: <pac85> So the problem with descriptor buffer is that it allows you to bind buffers as descriptors at any time and write into them explicitly, whereas on nvidia you'd have to write both the heap and the buffer with the indices into the heap, right?
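A toy model of the two layouts pac85 describes may help here. It is only a sketch under assumptions: `Descriptor`, `read_direct`, and `IndexedHeap` are hypothetical names invented for illustration, and nothing below reflects real NVIDIA hardware or driver code. It just contrasts a buffer that holds descriptors directly with a buffer that holds indices into a separate heap.

```python
# Toy model only: every name here is hypothetical, not a driver or Vulkan API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Stand-in for an opaque hardware descriptor, e.g. a texture header."""
    resource: str


def read_direct(descriptor_buffer: dict, slot: int) -> Descriptor:
    """Direct layout: the app-visible buffer stores the descriptors themselves."""
    return descriptor_buffer[slot]  # single lookup


class IndexedHeap:
    """Indexed layout: descriptors live in a separate heap and the
    app-visible buffer only stores indices into that heap."""

    def __init__(self) -> None:
        self.heap: list = []

    def write(self, app_buffer: dict, slot: int, desc: Descriptor) -> None:
        self.heap.append(desc)                 # write 1: the heap entry
        app_buffer[slot] = len(self.heap) - 1  # write 2: the index

    def read(self, app_buffer: dict, slot: int) -> Descriptor:
        index = app_buffer[slot]               # lookup 1: fetch the index
        return self.heap[index]                # lookup 2: fetch the descriptor


direct_buffer = {0: Descriptor("albedo_texture")}
heap = IndexedHeap()
index_buffer: dict = {}
heap.write(index_buffer, 0, Descriptor("albedo_texture"))

# Both layouts resolve slot 0 to the same descriptor; the indexed one
# needs an extra write and an extra lookup to get there.
same = read_direct(direct_buffer, 0) == heap.read(index_buffer, 0)
```

In this model the extra write and the extra lookup are the indirection rhed0x was asking about; whether that cost matters on real hardware is not something the sketch can show.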
16:11fdobridge: <gfxstrand> One of the big problems we have in Khronos is that each engineer only knows a small piece of the problem and they're all kinda cagey so it's hard to find globally optimal solutions.
16:11fdobridge: <rhed0x> that makes sense
16:11fdobridge: <gfxstrand> AMD and NVIDIA both have really competent hardware designs but they're so different that it's nearly impossible to agree on an abstraction that doesn't suck for someone.
16:12fdobridge: <rhed0x> d3d12 seems to have managed
16:13fdobridge: <gfxstrand> D3D12 only cares about 2 hardware vendors
16:13fdobridge: <rhed0x> yeah fair
16:13fdobridge: <gfxstrand> (No, they don't actually care about Intel. 😂)
16:13fdobridge: <pac85> I was about to ask
16:14fdobridge: <pac85> But now they do arm laptops so maybe they need to start caring?
16:14fdobridge: <rhed0x> Qualcomm seems to do alright with d3d12 too from what I've heard
16:14fdobridge: <gfxstrand> I mean, they do. They care about Intel working. But also Intel is just knock-off NVIDIA and they can push Intel around easier than the other two.
16:15fdobridge: <gfxstrand> Descriptor buffer is basically what Qualcomm has.
16:15fdobridge: <triang3l> It hasn't, the actual shader-visible heap size limit can't be queried (except via handling the result of CreateDescriptorHeap maybe), AMD doesn't need the distinction between shader-visible and non-shader-visible at all (except for copy source usage purposes), and buffers probably have padding to texture descriptor size
16:15fdobridge: <rhed0x> d3d12 was designed for high end AAA games back then and Intels smol iGPUs werent gonna run those anyway. so I guess if Intel had to do some hardware design work anyway why cater to them?
16:15fdobridge: <gfxstrand> https://www.gfxstrand.net/faith/blog/2022/08/descriptors-are-hard/
16:16fdobridge: <rhed0x> i know that blog post, its great
16:17fdobridge: <triang3l> + 🇫 🇫 🇫 🟦 🇫 🇫 🇫 soon™️ :frog_gears: (edited)
16:17fdobridge: <gfxstrand> Yeah, the D3D12 model isn't really optimal for AMD, either. 🫤 It's okay but not amazing.
16:17fdobridge: <rhed0x> low level apis were a mistake 🙃
16:17fdobridge: <gfxstrand> Lol
16:17fdobridge: <pac85> Does it work like amd? I never went deep into it but that code in turnip looks strikingly similar to the descriptor code in radv
16:17fdobridge: <rhed0x> return to cozy slot binding
16:18fdobridge: <triang3l> (maybe F/D for SSBOs, though actually D on top of one F) (edited)
16:18fdobridge: <!DodoNVK (she) 🇱🇹> Return to OpenGL :frog_gears:
16:18fdobridge: <rhed0x> oh god no
16:18fdobridge: <gfxstrand> GPUs were a mistake.
16:19fdobridge: <gfxstrand> Or maybe trying to abstract them was? 😂
16:19fdobridge: <triang3l> Rendition Vérité and Intel Larrabee weren't
16:19fdobridge: <gfxstrand> We should just use Metal and NNAPI everywhere and everyone else should just pick one and implement it.
16:20fdobridge: <gfxstrand> (No, I'm not actually suggesting that. It would be terrible.)
16:20fdobridge: <triang3l> Who needs Vulkan and Direct3D 12 when there are DRM and D3DKMT
16:20fdobridge: <gfxstrand> But... Ugh... IDK... Job security? 🤷🏻♀️
16:20fdobridge: <pac85> Uh so according to this on adreno it's ~amd but instead of loading the descriptor from the shader the shader just points hw to it. Makes sense that the code to write descriptor is similar to radv then
16:21fdobridge: <rhed0x> whats nnapi again?
16:21fdobridge: <rhed0x> im sure you dont mean android's neural network api 🙃 (edited)
16:21fdobridge: <triang3l> Nintendo Nvidia something?
16:21fdobridge: <pac85> I mean, we have more APIs than hw so driving hw directly would lead to less fragmentation /s
16:21fdobridge: <rhed0x> thats NVN
16:21fdobridge: <gfxstrand> Yeah, Qualcomm and recent Mali are both basically VK_EXT_descriptor_buffer in hardware only without all the "make it not super suck on NVIDIA" garbage.
16:22fdobridge: <gfxstrand> I meant NVN
16:22fdobridge: <rhed0x> its so annoying that console SDKs are behind NDAs
16:22fdobridge: <triang3l> And Mantle?
16:22fdobridge: <rhed0x> the hardware is borderline identical anyway
16:22fdobridge: <rhed0x> gimme the docs, who cares
16:23fdobridge: <gfxstrand> We should just use Mantle and NVN everywhere and everyone else should just pick one and implement it. (edited)
16:23fdobridge: <gfxstrand> Yes
16:23fdobridge: <triang3l> And Metal when you want nice pixel-local storage
16:23fdobridge: <gfxstrand> Yeah....
16:23fdobridge: <rhed0x> with the buffer object very slowly going away in vulkan, we're not that far off from mantle anyway
16:24fdobridge: <mohamexiety> damn I thought poor NV vkd3d perf was something that could be fixed :o
16:24fdobridge: <mohamexiety> that's a bit sad
16:24fdobridge: <gfxstrand> Sure, if you want to run on that hardware. 🤷🏻♀️
16:24fdobridge: <rhed0x> i mean, maybe there's a different reason for it
16:24fdobridge: <gfxstrand> It can, in time.
16:24fdobridge: <triang3l> Where is it going away? 👀 You at least need one to choose the memory type
16:25fdobridge: <rhed0x> newer extensions just take in VkMemory + offset often (edited)
16:25fdobridge: <rhed0x> descriptor buffer is one of those iirc
16:25fdobridge: <triang3l> 🐧
16:25fdobridge: <gfxstrand> I think this is one area where open-source probably has a serious edge over proprietary because we can talk to each other better.
16:25fdobridge: <mohamexiety> yeah
16:25fdobridge: <rhed0x> VK_MESA_D3D12_binding_model
16:25fdobridge: <rhed0x> *slaps roof of vkd3d-proton* this guy can fit so many descriptor code paths in it 🙃
16:25fdobridge: <pac85> If only we made hw...
16:26fdobridge: <mohamexiety> I mean, you _can_
16:26fdobridge: <mohamexiety> but $$$$$$$$
16:26fdobridge: <triang3l> I hope DXVK can for UAVs…
16:27fdobridge: <pac85> But it seems it would be challenging because vulkan requires to bind stuff implicitly?
16:27fdobridge: <rhed0x> wdym?
16:27fdobridge: <triang3l> counters of all views in one binding instead of 8/64 🙃
16:28fdobridge: <rhed0x> uav counters are weird
16:36fdobridge: <dadschoorse> AMD has special hardware for them, but doesn't use it 🐸
16:36fdobridge: <rhed0x> is that hardware still there on rdna2/3?
16:36fdobridge: <rhed0x> i figured theyd get rid of it
16:38fdobridge: <dadschoorse> no GDS is still there, but there are a lot of bugs in the kernel/fw and it might get corrupted in some lower power states
16:40fdobridge: <pac85> Wait does dx expose ordered append?
16:40fdobridge: <dadschoorse> no
16:40fdobridge: <dadschoorse> but GDS can be used for global counters too
16:40fdobridge: <pac85> Mmm
16:41fdobridge: <dadschoorse> and I think at least r600 does that for opengl atomic counters
16:56fdobridge: <gfxstrand> Yeah, as far as I know the only hardware where HW atomic counters are useful is r600. On GCN, they made memory atomics fast enough that the HW atomics aren't really worth it.
16:57fdobridge: <gfxstrand> No one else even has them. They all went "atomics? Okay, those are something that happens on memory, right?"
16:57fdobridge: <gfxstrand> I so wish we could retroactively delete HW atomics from GL.
16:58fdobridge: <dadschoorse> I so wish we could retroactively delete GL.
16:58fdobridge: <gfxstrand> @karolherbst so, with 8/16-bit starting to happen in NAK, I really want a way to run CL through it...
17:00fdobridge: <karolherbst🐧🦀> mhh yeah, fair
17:00fdobridge: <!DodoNVK (she) 🇱🇹> So will you start hacky patches for rusticl support? 🤔
17:26fdobridge: <gfxstrand> I was hoping @karolherbst would write them. 😅
17:27fdobridge: <gfxstrand> Depending on how invasive it is, we could merge something that conditionally builds based on a flag (so we don't add a rust dep for GL) and is hidden behind an ENV var. Or, if it's simple enough, we can just have a patch we pass around.
17:29fdobridge: <karolherbst🐧🦀> I can take a look next week and see how much work it would be to wire up nak in gallium
17:30fdobridge: <dadschoorse> why gallium instead of nvk with zink?
17:30fdobridge: <karolherbst🐧🦀> mhhhhh
17:30fdobridge: <karolherbst🐧🦀> actually
17:30fdobridge: <karolherbst🐧🦀> good question
17:30fdobridge: <karolherbst🐧🦀> 😄
17:30fdobridge: <karolherbst🐧🦀> @gfxstrand just use zink(tm)
17:33fdobridge: <gfxstrand> Zink can't do scratch competently? 🤷🏻♀️
17:39fdobridge: <gfxstrand> But, yeah, I guess I could try Zink. I was just hoping for less noise than that
17:46fdobridge: <karolherbst🐧🦀> it can
17:46fdobridge: <karolherbst🐧🦀> I passed the CL CTS on radv
17:46fdobridge: <karolherbst🐧🦀> the scratch lowering is even not terrible
17:47fdobridge: <karolherbst🐧🦀> @gfxstrand https://gitlab.freedesktop.org/mesa/mesa/-/commit/700a2dc648a5e73c20e456b5144418a8b405f985
17:47fdobridge: <karolherbst🐧🦀> fixed scratch with that
17:49fdobridge: <karolherbst🐧🦀> tldr: as long as the scratch access is only on one bit size it's all optimal
18:07fdobridge: <gfxstrand> woo
18:56fdobridge: <karolherbst🐧🦀> @gfxstrand the only thing it really needs for conformance is that shared memory layout thing and float controls
18:57fdobridge: <karolherbst🐧🦀> but I think nvk only misses the latter atm, no?
18:57fdobridge: <karolherbst🐧🦀> but that only matters for float precision stuff...
18:57fdobridge: <karolherbst🐧🦀> and zink is broken if it comes to multithreading...
18:58fdobridge: <karolherbst🐧🦀> but ehh.. don't do that then
19:11fdobridge: <!DodoNVK (she) 🇱🇹> How many explosions did you encounter while developing NAK? :ferris:
19:18fdobridge: <gfxstrand> What do you mean by "explosions"?
19:19HdkR: "How many explosions?" Yes.
19:29fdobridge: <pac85> Let's say she meant hangs
19:34HdkR: Luckily I also label crashes as explosions so I understand the question
19:35fdobridge: <gfxstrand> @asdqueerfromeu This should make testing easier: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26359
19:35fdobridge: <gfxstrand> I mean, you get a LOT of hangs when bringing up a new compiler.
19:40fdobridge: <!DodoNVK (she) 🇱🇹> With 1.0 set in the meson file the usefulness of this is limited
19:41fdobridge: <!DodoNVK (she) 🇱🇹> Severe crashes probably (I was referencing another thing that's also called NaK that can literally explode)
20:16fdobridge: <pac85> How does nvidia handle hangs? Like on amd the driver often has to reset the whole device losing all contexts as far as I understand (softer resets do exist but they can't always be used for reasons I don't know).
20:16fdobridge: <pac85> Does nvidia have better facilities to recover?
20:19fdobridge: <airlied> it's pretty good, most things recover, but sometimes it does still take out the whole card
20:20fdobridge: <pac85> I wonder how that is achieved, I know in some GPUs the hw is robust enough that when a hang happens the kernel driver only has to retire submissions and kill the faulty context
22:00fdobridge: <gfxstrand> Oh... Hrm... Maybe we should bump that back to 1.3...
22:07fdobridge: <!DodoNVK (she) 🇱🇹> I notably encountered weird segfaults in RenderDoc and DXVK when the meson build file uses 1.0
22:10fdobridge: <gfxstrand> 🫤
22:10fdobridge: <gfxstrand> I guess we could expose an instance version of 1.3 and keep the device locked lower.
22:11fdobridge: <gfxstrand> Sucks that the version goes on the ICD json
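For reference, the instance and device versions gfxstrand mentions are separate packed 32-bit values: an application sees the first via `vkEnumerateInstanceVersion` and the second in `VkPhysicalDeviceProperties::apiVersion`. The sketch below mirrors the bit layout of Vulkan's `VK_MAKE_API_VERSION` macro. The Python helper names and the example 1.3 / 1.0 split are assumptions for illustration, not what NVK actually reports.

```python
# Sketch of Vulkan's packed version layout; helper names are hypothetical.

def make_api_version(variant: int, major: int, minor: int, patch: int) -> int:
    """Pack a version using the same bit layout as VK_MAKE_API_VERSION."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def unpack_api_version(version: int) -> tuple:
    """Return (major, minor, patch) from a packed version."""
    return ((version >> 22) & 0x7F, (version >> 12) & 0x3FF, version & 0xFFF)


# Assumed example values: an instance advertising 1.3, a device held at 1.0.
instance_version = make_api_version(0, 1, 3, 0)
device_version = make_api_version(0, 1, 0, 0)

# With variant 0, packed values compare the same way as
# (major, minor, patch) tuples do.
device_is_lower = device_version < instance_version
```

Because the two values are reported independently, an instance version of 1.3 does not by itself say anything about what a given device supports.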