00:45fdobridge: <airlied> I think something else we fixed in nil or nvk in the meantime might have fixed it
03:00fdobridge: <gfxstrand> Could be
03:03fdobridge: <gfxstrand> And... looks like fp16 is busted on Turing.
03:03fdobridge: <gfxstrand> It's probably `hfma`. I bet the modifiers encoding is different on Turing. Or maybe it doesn't support a swizzle in src2. 🤷🏻♀️
03:15fdobridge: <gfxstrand> And here I was happily assuming that Turing and Ampere were basically the same... 😩
06:19fdobridge: <!DodoNVK (she) 🇱🇹> What are the fails and crashes?
06:32fdobridge: <marysaka> I thought the same, I guess I will have to put the Turing card on the bench more often, then :vReiAgony:
09:28fdobridge: <magic_rb.> Can nouveau do wayland somewhat? At least display out on a hybrid graphics laptop. I want to test kde plasma somewhat
09:35fdobridge: <!DodoNVK (she) 🇱🇹> That's what I'm using right now
11:32fdobridge: <samantas5855> Won't this just run on your iGPU?
11:32fdobridge: <samantas5855> Also I've been running gnome wayland on a gt710 with nouveau
11:32fdobridge: <magic_rb.> my hdmi's are hooked into the nvidia gpu (edited)
11:32fdobridge: <samantas5855> it has some graphical glitches
11:32fdobridge: <magic_rb.> all of them, usbc included
11:32fdobridge: <samantas5855> but it can even run minecraft
11:33fdobridge: <magic_rb.> yeah it's working fine on X11, but I seem to recall DRI3 not being in place for nouveau
11:35fdobridge: <magic_rb.> but I've no clue, the whole graphics stack is still a mystery to me
11:38fdobridge: <rhed0x> interesting, turning on the DXVK HUD breaks renderdoc captures with NVK
11:38fdobridge: <rhed0x> if I make a capture without the HUD, it works
11:38fdobridge: <rhed0x> if I make one with the HUD, it doesn't
11:47fdobridge: <magic_rb.> I'm going to attempt to bump to mesa master again, hopefully it won't break everything
12:52fdobridge: <zmike.> does this mean your kc-cts MR is no longer needed
13:09fdobridge: <marysaka> I think Faith had issues on Ada for those tests, the Turing issue is scheduling related :blobcatnotlikethis:
13:10fdobridge: <marysaka> (there was also a bug around HFMA2 source 2 when using immediate that I fixed a bit earlier)
13:11fdobridge: <pavlo_kozlenko> Does nouveau know how to swap VRAM?
13:11fdobridge: <Sid> ..why would you want to do that
13:12fdobridge: <pavlo_kozlenko> My vram is clogged, what should I do?
13:13fdobridge: <pavlo_kozlenko> If I'm not mistaken, AMD implements this
13:13fdobridge: <triang3l> ~~bindless was a mistake~~
13:15fdobridge: <triang3l> I'm not sure about Nouveau specifically, but I think every non-meme kernel driver should be able to automatically demote allocations to system memory at least in OpenGL in non-bindless-resources scenarios
13:16fdobridge: <pavlo_kozlenko> Windows analogy
13:16fdobridge: <pavlo_kozlenko> https://cdn.discordapp.com/attachments/1034184951790305330/1230869654906470451/image.png?ex=6634e3b2&is=66226eb2&hm=12f6ec5ed4a6190173f00a6258aa656b5efb0ea68f5975c39b6b665784e6c60f&
13:17fdobridge: <pavlo_kozlenko> https://forums.developer.nvidia.com/t/non-existent-shared-vram-on-nvidia-linux-drivers/260304
13:19fdobridge: <triang3l> on Vulkan that's the responsibility of applications unfortunately, as there are many ways to reference buffers and textures without explicitly individually binding them, and often applications place many textures/buffers in one allocation (edited)
13:22fdobridge: <gfxstrand> I disabled fp16 for my runs. I think we have a swizzle bug but Mary couldn't repro so I'm going to try and look at it.
13:22fdobridge: <gfxstrand> One of the MRs definitely is needed because it fixes the build on F40.
13:23fdobridge: <zmike.> ok
13:23fdobridge: <gfxstrand> Those are all CTS bugs
13:24fdobridge: <triang3l> `nouveau_gem.c` appears to have the VRAM/GART domain placement preference logic typical of a DRM driver, I think unless something is broken, demotion should work
13:25fdobridge: <gfxstrand> Yes it can.
13:25fdobridge: <gfxstrand> Bindless has nothing to do with it.
13:26fdobridge: <gfxstrand> Except on Intel DG2. :blobcatnotlikethis:
13:28fdobridge: <gfxstrand> The kernel moves BOs back and forth and patches up the page tables accordingly. The GPU never notices.
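The move-and-patch flow described here can be sketched as a toy model (all type and field names below are hypothetical, not nouveau's actual structures): on eviction the BO's backing store moves to system RAM and the page-table entries are rewritten, but the GPU virtual address stays fixed, so descriptors referencing it never need to change.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Placement {
    Vram,
    SysRam,
}

struct Bo {
    gpu_va: u64, // fixed GPU virtual address; never changes on eviction
    pages: usize,
    placement: Placement,
}

struct Vmm {
    // Page table: GPU VA page -> current placement of the backing page.
    ptes: HashMap<u64, Placement>,
}

impl Vmm {
    fn map(&mut self, bo: &Bo) {
        for i in 0..bo.pages as u64 {
            self.ptes.insert(bo.gpu_va + i * 0x1000, bo.placement);
        }
    }

    // Eviction: move the backing store and patch the PTEs in place.
    // The GPU keeps using the same virtual address and never notices.
    fn evict(&mut self, bo: &mut Bo) {
        bo.placement = Placement::SysRam; // DMA copy VRAM -> sysmem would happen here
        self.map(bo);                     // rewrite PTEs; gpu_va is unchanged
    }
}
```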
13:29fdobridge: <triang3l> It still needs specific work from the application's side (calling glMakeTextureHandleNonResidentARB) for residency to be tracked more granularly than just "everything is needed"… or not?
13:29fdobridge: <gfxstrand> Unless you're Intel DG2 where color compression only works if it's pinned to VRAM for all of time. 🤡
13:30fdobridge: <gfxstrand> Nope. It'll just move stuff to system RAM and map that into the GPU address space. The descriptor doesn't have to change.
13:30fdobridge: <triang3l> something like textures of fully occluded objects are still a waste of usable VRAM of course though
13:31fdobridge: <triang3l> even if they're bound explicitly
13:33fdobridge: <triang3l> Yes, but the BO of a bindless texture will be assumed to be referenced in every submission using bindless textures at all unless you explicitly specify it as non-resident if I understand correctly
13:35fdobridge: <gfxstrand> The only thing that unbind/residency does is give more of a hint that it's not going to be used so the kernel should evict those first. It doesn't affect whether or not the kernel can evict it. As long as the page tables support mapping system RAM into the GPU address space (they all do), the kernel can evict anything it wants and no one will notice except that access to things in system RAM will be expensive.
13:36fdobridge: <triang3l> Yeah, just without explicit binding, and even if everything was bound explicitly, but one BO (VkDeviceMemory) was shared between resources without bothering about things like spatial/temporal locality in the game's world, eviction is less aware of what's really important, that's what I wanted to say
13:39fdobridge: <gfxstrand> On some older GPUs, there are limits on the size of the address space or on the amount of system RAM that can be mapped. With those GPUs, explicit BO tracking is necessary to ensure things stay within those limits, even if the application has more allocated. Intel GM45, for instance, has a limit of something like 512MB at a time, even if you have 4GB of RAM. On modern GPUs, it's "You have a 40 bit address space. Have fun!"
13:40fdobridge: <triang3l> I really want to make `prefersDedicatedAllocation = VK_TRUE` for everything in my bindless-incapable driver 🍩 🤰
13:40fdobridge: <gfxstrand> You're welcome to do that
13:41fdobridge: <triang3l> I'm not because that'd ruin render target aliasing possibly in many games and that's probably more important in memory-constrained situations 🙃
13:41fdobridge: <gfxstrand> And it kinda makes sense for bindless-incapable drivers.
13:42fdobridge: <gfxstrand> Especially if you have a 32-bit address space and depend on kernel patching of addresses.
13:42fdobridge: <gfxstrand> Hasvk is very much in that boat. 🚢
13:42fdobridge: <triang3l> But can disable it for render targets and storage images/buffers specifically
13:43fdobridge: <triang3l> and get one 128 MB VMA allocation with only a few megabytes actually used by some RTs :xenia_sob:
14:17fdobridge: <triang3l> On that Evergreen ship?
15:39fdobridge: <gfxstrand> Ugh... Trying to remember which one is MaxwellB...
15:41fdobridge: <marysaka> Maxwell B = GTX 9XX (SM52) / Tegra X1 (SM53)
15:41fdobridge: <marysaka> or GM2XX
15:41fdobridge: <gfxstrand> Right...
15:41fdobridge: <gfxstrand> Not 750 TI. I really need to put labels on things. 😅
15:41fdobridge: <marysaka> yeah 750 Ti is GM10X or something so Maxwell Gen 1 (edited)
15:42fdobridge: <marysaka> ~~Gen 3 is Pascal in my heart~~
15:42fdobridge: <mohamexiety> ~~mine too~~
15:42fdobridge: <mohamexiety> the changes were really so small lol
15:43fdobridge: <marysaka> not really right? There is shader interlock I believe and the new texture headers
15:44fdobridge: <marysaka> oh you mean for Pascal
15:44fdobridge: <mohamexiety> yep. maxwell 2 -> pascal
15:47fdobridge: <gfxstrand> Now the real question: Why did I buy such giant GPUs when I was back-filling my collection? This 980 TI is a beast and it can't even reclock. 😂
16:00RSpliet: gfxstrand: I think I obtained an NVA0 back when I was still working on reclocking. I had to make the case bulge and slide it through the HDD bracket to make it fit...
16:00fdobridge: <gfxstrand> Totally believable
16:01fdobridge: <gfxstrand> This 980 TI is a solid inch longer than my big 2060 which is already a full-sized card.
16:01fdobridge: <gfxstrand> It's no 4090 but still...
16:02RSpliet: If I had a banana and I knew I had the card here in my flat I'd send you a picture. Think it's in the Netherlands though...
16:07fdobridge: <redsheep> There's no such thing as a video card that's too big :p
16:08fdobridge: <zmike.> my-spoon-is-too-big.meme
16:10fdobridge: <redsheep> 4 slot cards make me unreasonably happy
16:11RSpliet: Oh no no, way too much girth for me
16:12fdobridge: <redsheep> I buy big cases fully expecting that within a few years I'll use the space, and I've always been right
16:12RSpliet: redsheep: the most powerful computer in my house is this https://dlcdnimgs.asus.com/websites/global/products/tt1jzvjitkwxzo6s/img/main/saving_s.jpg
16:13fdobridge: <triang3l> Seeing this makes me want the Xi3 Piston
16:13fdobridge: <redsheep> I bet the 5090 will be massive, and I'll probably be picking one up for testing as soon as they go up for sale
16:13fdobridge: <redsheep> "testing"
16:15fdobridge: <redsheep> Hmm. Now that we are doing GSP, the bring-up for new generations should be pretty rapid, right? Just need to make the kernel use new enough firmware
16:16fdobridge: <Sid> not that easy
16:16fdobridge: <redsheep> Though I suppose the compiler work might be a bit bigger if Blackwell makes some big changes
16:16RSpliet: and provided the firmware is released in time
16:16fdobridge: <Sid> because no stable ABI between GSP versions
16:17fdobridge: <redsheep> RSpliet: We just use the firmware from nvidia's open module, no waiting around
16:18fdobridge: <redsheep> Also, like airlied mentioned at one point, the ABI not being stable doesn't mean it actually changed. Afaict the harder problem there is solving the multiple-firmware-versions thing that it sounds like was already being discussed to address by adding 555 firmware
16:26fdobridge: <redsheep> Wanting to not fill up everyone's boot partition and all that
16:36fdobridge: <mtijanic> The hardest part of unstable ABI is that you need the infra to handle it, regardless of whether it's a 1% change or a 50% change
16:37fdobridge: <mtijanic> Actual changes in runtime communication are probably going to be 99% _API_ compatible (e.g. new fields in the structure)
16:38fdobridge: <mtijanic> (this is a consequence of our internal source control where breaking the API requires a lot of complex changes and rebasing so it's usually very rare. But ABI breaks are 'free')
16:38fdobridge: <mtijanic> The one exception to the above is the actual GSP bootup flow, which tends to change from chip family to family. (edited)
16:42fdobridge: <redsheep> I see, thanks for the explanation. That makes more sense to me now.
17:10Lyude: airlied: btw regarding your responses - yeah that makes sense, I kind of figured it might end up needing to be a special path
17:45fdobridge: <gfxstrand> That's pretty standard for windows-focused driver dev models. In the Windows world, the only ABIs that are sacrosanct are the ones provided by Microsoft and everyone ships UMD+KMD+FW together. As long as they work together, no one cares if things change.
17:46fdobridge: <gfxstrand> The linux world is the weird one here where we have a backwards compatible UAPI guarantee
17:46fdobridge: <gfxstrand> In theory, KMD+FW dev could be done in parallel but that would require us having the FW sources checked into the Linux tree and that ain't gonna happen. 😅
18:20fdobridge: <gfxstrand> @airlied @ahuillet Looks like S8 is now in good shape all the way back to MaxwellB. I'm doing one more full run on Ampere since I reworked all the PTE kinds but I'm 99.9% sure it'll be fine. (edited)
18:20fdobridge: <gfxstrand> I pulled in the hwref headers from OGK and we're using those in NIL now so no more magic hex values. 😁
18:26fdobridge: <ahuillet> _ => NV_MMU_PTE_KIND_PITCH,
18:26fdobridge: <ahuillet> still pretty sure this is wrong for at least the one format I saw it hit for
18:27fdobridge: <gfxstrand> In the Maxwell function, I think we could replace that with a panic
18:28fdobridge: <gfxstrand> On Turing, that's where all color formats end up
18:28fdobridge: <gfxstrand> Arguably, those should maybe be `GENERIC_MEMORY`
18:30fdobridge: <gfxstrand> However, the moment `pte_kind != 0`, we have to sparse bind it so I'd rather avoid that unless there's a perf reason to do so.
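For illustration, the PTE-kind selection being discussed boils down to a match like the following sketch (the `Format` enum and the `HYP_*` constant values are made up, not the real NIL code or hardware values): depth/stencil formats get dedicated kinds, and on Turing+ every color format falls through to `PITCH`. In the Maxwell version of this function the fallthrough arm would arguably be a `panic!()` instead, since color formats need real block-linear kinds there.

```rust
const NV_MMU_PTE_KIND_PITCH: u8 = 0x00; // kind 0: no sparse binding required
const HYP_PTE_KIND_Z16: u8 = 0x01;      // hypothetical value
const HYP_PTE_KIND_S8: u8 = 0x02;       // hypothetical value

#[derive(Clone, Copy, PartialEq, Debug)]
enum Format {
    Z16,
    S8,
    Color,
}

fn turing_pte_kind(fmt: Format) -> u8 {
    match fmt {
        Format::Z16 => HYP_PTE_KIND_Z16,
        Format::S8 => HYP_PTE_KIND_S8,
        // On Turing+ all color formats end up here. Arguably they could be
        // GENERIC_MEMORY instead, but any pte_kind != 0 forces sparse
        // binding, which is best avoided absent a perf reason.
        _ => NV_MMU_PTE_KIND_PITCH,
    }
}
```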
18:31fdobridge: <gfxstrand> Unless buffers can be `GENERIC_MEMORY`, too.
18:33fdobridge: <ahuillet> with the blob? they're definitely not supposed to be pitch linear though (edited)
18:33fdobridge: <ahuillet> "generic" is also what gets you the compression IIRC
18:34fdobridge: <ahuillet> if you're really using pitch linear for image data then there's an obvious perf reason but I'm sure you're aware of that
18:34fdobridge: <gfxstrand> Yeah, there's `GENERIC_MEMORY`, `GENERIC_MEMORY_COMPRESSIBLE`, and `GENERIC_MEMORY_COMPRESSIBLE_DISABLE_PLC`.
18:34fdobridge: <gfxstrand> Oh, we're very much not. It's tiled. But tiling works with `PTE_KIND_PITCH` on Turing+
18:35fdobridge: <gfxstrand> That or the kernel is smashing everything to `GENERIC_MEMORY` under the hood.
18:35fdobridge: <gfxstrand> If it is, that might actually explain some of the kepler issues....
18:35fdobridge: <gfxstrand> @airlied ^^
18:36fdobridge: <ahuillet> (we call it "block linear" btw)
18:37fdobridge: <airlied> Yeah there is something about 0xfe I think on Kepler
18:38fdobridge: <gfxstrand> But there's nothing "linear" about it. 😂
18:38fdobridge: <ahuillet> it is linear, but per block!
18:38fdobridge: <ahuillet> or the blocks are linear I don't know :)
18:39fdobridge: <gfxstrand> An image is a linear 3D array of blocks which are a linear 3D array of GOBs which are a linear 3D array of pixels.
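That nesting can be sketched as a simplified 2D address computation (the dimensions and the one-GOB-wide-block simplification below are illustrative, not the exact hardware layout; on NVIDIA a GOB is a fixed-size tile, commonly described as 64 bytes x 8 rows, and blocks are power-of-two multiples of GOBs in height):

```rust
// Byte offset of a texel in a toy block-linear layout: the image is a linear
// array of blocks, each block a linear (vertical) array of GOBs, each GOB a
// linear array of bytes.
fn block_linear_offset(
    x_bytes: u64, y: u64,
    gob_w: u64, gob_h: u64, // GOB size: bytes wide x rows tall
    block_h_gobs: u64,      // block height, in GOBs (blocks are 1 GOB wide here)
    row_blocks: u64,        // image width, in blocks
) -> u64 {
    let gob_bytes = gob_w * gob_h;
    let block_bytes = gob_bytes * block_h_gobs;

    let gx = x_bytes / gob_w;       // GOB column == block column (1 GOB wide)
    let gy = y / gob_h;             // GOB row
    let by = gy / block_h_gobs;     // block row

    let block_idx = by * row_blocks + gx;
    let gob_in_block = gy % block_h_gobs;
    let offset_in_gob = (y % gob_h) * gob_w + (x_bytes % gob_w);

    block_idx * block_bytes + gob_in_block * gob_bytes + offset_in_gob
}
```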
18:39fdobridge: <gfxstrand> It's still better than Intel's crazy Yf/Ys formats, though. 😅
18:39fdobridge: <gfxstrand> And don't get anyone started on AMD...
18:40fdobridge: <gfxstrand> NVIDIA is downright sane!
18:41fdobridge: <marysaka> *nods* I will call that GOB linear now 😕
18:41fdobridge: <gfxstrand> 🌶️
18:42fdobridge: <gfxstrand> And then there's Intel's W Tiling Format...
18:43fdobridge: <ahuillet> pre-NV50 (I think) there was another "tiled" format which wasn't called block linear, I think it was called swizzled (?)
18:44fdobridge: <gfxstrand> Oof
18:44fdobridge: <gfxstrand> Because no one will ever get "swizzled" confused with texture swizzles...
18:44fdobridge: <gfxstrand> (Though that might predate the GL texture swizzle extension.)
18:45fdobridge: <gfxstrand> I think we called them swizzles at Intel sometimes. I know another HW vendor or two calls it that.
18:45fdobridge: <ahuillet> same problem with "compression" and GL texture compression
18:45fdobridge: <gfxstrand> *sigh*
18:45fdobridge: <gfxstrand> Yes...
18:47fdobridge: <gfxstrand> Back to the topic at the top, though... I'm passing 0 to the kernel and block-linear is fine with that. IDK if that means 0 works or if that means the kernel is smashing to 6 behind my back.
18:58fdobridge: <ahuillet> understood. I'll take a look at what the blob does, see if I can advise on what is best.
18:58fdobridge: <gfxstrand> Oh, joy... nouveau.ko does all sorts of kind remapping from the look of things
19:00fdobridge: <djdeath3483> Yeah wtf
19:01fdobridge: <gfxstrand> I *think* the remapping is only there to let nouveau.ko force-disable compression. That's the way it appears from a skim.
19:02fdobridge: <gfxstrand> Why the KMD is doing that and whether or not it's safe to do that behind my back without me doing a resolve or something, I don't know.
19:02fdobridge: <gfxstrand> Maybe it's disabling compression when it evicts to system ram? That would make sense.
19:06fdobridge: <gfxstrand> Whether or not those tables are correct, though... That I'm much less sure about. 😅
19:06fdobridge: <gfxstrand> Like, this Maxwell table I'm looking at is missing a LOT of entries
19:06fdobridge: <ahuillet> if you evict to system RAM you're doing some DMA transfer to sysmem
19:06fdobridge: <ahuillet> writes to sysmem can't be compressed so this will be decompressed at that time I think (in my mental model anyway)
19:07fdobridge: <gfxstrand> Yeah
19:07fdobridge: <ahuillet> same with MSAA resolves if that's what you're talking about - there's nothing to resolve, the storage is there compression or not, it's just what travels on the bus vs. what's in L2 that changes
19:07fdobridge: <gfxstrand> So I think that part's fine and the remap table exists to ensure the PTEs we fill out referencing system RAM don't have any compression kinds set.
19:08fdobridge: <ahuillet> though I wonder if you can disable compression on pages after the fact.
19:08fdobridge: <gfxstrand> I doubt you can without a resolve. But I have seen resolve things in the method headers
19:08fdobridge: <gfxstrand> IDK what you call them at Nvidia. We called them resolves at Intel.
19:08fdobridge: <ahuillet> what do you call resolves at Intel then?
19:09fdobridge: <gfxstrand> Oh, those are also resolves. 😂
19:10fdobridge: <gfxstrand> The 3 hardest problems in computer science are naming things and counting things.
19:10fdobridge: <ahuillet> resolve for us is MSAA/VCAA resolves/downsampling, I don't think it's used for anything else (edited)
19:11fdobridge: <gfxstrand> But yeah, I think I saw some sort of method for in-place decompression
19:11fdobridge: <gfxstrand> `NVC579_DECOMPRESS_SURFACE`
19:11fdobridge: <ahuillet> let me switch vim over to a non-secret method file before I answer :D
19:12fdobridge: <gfxstrand> hehe
19:12fdobridge: <ahuillet> yeah, you can use this for color or depth (separate method)
19:13fdobridge: <gfxstrand> I'm not sure why we would want to decompress but having the option is nice.
19:21fdobridge: <rhed0x> @gfxstrand do you have 5 minutes? I'd like to talk about the subgroup rotate MR
19:22fdobridge: <redsheep> VCAA? How is there yet another antialiasing scheme I've never heard of?
19:25fdobridge: <ahuillet> that may not be a public name actually. oops. old stuff anyway
19:27fdobridge: <ahuillet> https://developer.download.nvidia.com/SDK/9.5/Samples/DEMOS/Direct3D9/src/CSAATutorial/docs/CSAATutorial.pdf CSAA I think
19:28fdobridge: <redsheep> Ah that makes sense. Not surprised the names get warped, the marketing around AA methods has often been kind of annoying
19:32fdobridge: <redsheep> I don't imagine any engineer came up with dlss, it's not in any real sense a super sampling scheme. If anything it's the opposite.
19:33fdobridge: <redsheep> *came up with the name dlss
19:35fdobridge: <mohamexiety> vcaa is among the NV profile inspector settings iirc so it's public/old but I haven't heard much about it outside of that
19:35fdobridge: <redsheep> Google turns up nothing, though that doesn't mean much anymore.
19:39RSpliet: Wait... do I understand CSAA correctly in that if one of the non-shaded sample points happens to sample (only) a primitive that's not covered by any of the shaded sample points, you... just kind of make up which of the two primitives covered by the shaded points it puts its weight behind?
19:40RSpliet: (sorry, I said two primitives because I was thinking about the 2+2 mode in my head)
19:42RSpliet: or I guess you can make that non-shaded sample point just kind of not count, which is easy with 2+2, and not so easy with 8+8
20:13fdobridge: <gfxstrand> Sure. Sorry, was doing dishes for a bit.
20:16fdobridge: <gfxstrand> I'm back now (edited)
20:21fdobridge: <rhed0x> basically I was looking into doing it in nak like you proposed in your review comment
20:21fdobridge: <rhed0x> but I don't see how
20:23fdobridge: <rhed0x> basically the mask is only used to compute the maxLane and that's just used to reject it if it goes OOB
20:23fdobridge: <rhed0x> i have an example in the MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27863#note_2380815
20:46fdobridge: <gfxstrand> Yeah, you're right. It's only taken into account the way I thought in `.idx` mode.
20:46fdobridge: <gfxstrand> Bummer
20:47fdobridge: <gfxstrand> That opcode is so dang complicated...
20:48fdobridge: <moonykay> I'm curious, are there docs (RE'd or otherwise) lying around for the architecture in question (edited)
20:48fdobridge: <moonykay> (I collect instruction sets)
20:51fdobridge: <rhed0x> I don't see how it would handle wraparound in idx mode for us either (except for doing what the lowering pass does)
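The wraparound the lowering pass has to produce boils down to index math like the following sketch (a hypothetical helper, not NAK code): for subgroup rotate with a power-of-two cluster size, the source lane wraps within its own cluster rather than across the whole subgroup, which a plain shuffle mask can't express directly.

```rust
// Source lane for a clustered subgroup rotate: lane `lane` reads the value
// from lane `base + ((lane - base + delta) % cluster_size)`, where `base` is
// the first lane of its cluster. `cluster_size` must be a power of two.
fn rotated_lane(lane: u32, delta: u32, cluster_size: u32) -> u32 {
    debug_assert!(cluster_size.is_power_of_two());
    let base = lane & !(cluster_size - 1); // first lane of this cluster
    base + ((lane - base + delta) % cluster_size)
}
```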
20:51fdobridge: <rhed0x> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-instructions
20:52fdobridge: <gfxstrand> We typically go off the PTX docs for most of this stuff.
20:52fdobridge: <moonykay> ah- the native ISA is typically pretty close to PTX?
20:52fdobridge: <gfxstrand> PTX and the hardware are typically pretty close
20:52fdobridge: <moonykay> neat
20:53fdobridge: <gfxstrand> There's some difference but when it comes to the fiddly bits like we were just discussing, the PTX behavior is usually exactly the HW behavior.
20:53fdobridge: <mtijanic> It's about as close to the hardware as the amd64 ISA is close to the actual microops on an intel CPU.
20:54fdobridge: <moonykay> okay but *that* is a rather massive gap
20:54fdobridge: <moonykay> the impression I get here is PTX's gap isn't quite *that* big. (edited)
20:55fdobridge: <gfxstrand> I mean there is a decent gap but the mapping is typically fairly direct. It's designed for the hardware after all. It'd be pretty silly to design a huge PTX/HW mismatch.
20:55fdobridge: <moonykay> yeh
20:55fdobridge: <mohamexiety> there are research papers that try to poke at things btw @moonykay
20:55fdobridge: <mohamexiety> e.g., https://arxiv.org/pdf/1804.06826.pdf
20:55fdobridge: <moonykay> (I'm basically saying AMD64 has a pretty massive ISA/HW mismatch)
20:56fdobridge: <rhed0x> does CUDA/PTX do frequent breaking changes or has Nvidias ISA just been mostly stable for ages?
20:56fdobridge: <gfxstrand> The over-all theory of operation hasn't changed massively in a while.
20:57fdobridge: <gfxstrand> The details shift from generation to generation.
20:58fdobridge: <gfxstrand> Put another way, the delta between NV ISA and AMD GCN for instance is 10 or 100x bigger than the delta between NV ISA and PTX.
21:00fdobridge: <gfxstrand> Nothing between Fermi and Ada requires a fundamentally different compiler. Some instructions are added or removed. Semantics change a bit here and there. Later HW has a "smarter" encoding scheme. But nothing super fundamental.
21:02fdobridge: <mohamexiety> yeah. the only exception/potential breakage I heard of is some weirdness with Hopper where there's a `sm_90` target and a `sm_90a` target and future `sm_9x` HW isn't guaranteed to support `sm_90a` features (Hopper satisfies both). the functionality in question is `wgmma` iirc. but this may not matter much for normal gfx cards
21:05fdobridge: <mohamexiety> > Target architectures with suffix “a”, such as sm_90a, include architecture-accelerated features that are supported on the specified architecture only, hence such targets do not follow the onion layer model. Therefore, PTX code generated for such targets cannot be run on later generation devices. Architecture-accelerated features can only be used with targets that support these features.
21:08fdobridge: <rhed0x> any more beginner issues? :)
21:11fdobridge: <valentineburley> This has the easy tag: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10981
21:12fdobridge: <valentineburley> And this can be closed now for real @gfxstrand https://gitlab.freedesktop.org/mesa/mesa/-/issues/10686
21:30fdobridge: <pavlo_kozlenko> +
21:34fdobridge: <redsheep> Appears that's probably also true of https://gitlab.freedesktop.org/mesa/mesa/-/issues/9659 and https://gitlab.freedesktop.org/mesa/mesa/-/issues/9658
21:36fdobridge: <redsheep> Are the missing features for zink real? I can't imagine they are if it passes cts now
21:36fdobridge: <redsheep> https://gitlab.freedesktop.org/mesa/mesa/-/issues/9477
21:46fdobridge: <gfxstrand> Fragment shader interlock still needs to be done if we want the relevant GL extension
21:46fdobridge: <gfxstrand> Everything else looks fine
21:46fdobridge: <redsheep> So zink doesn't need descriptor buffer?
21:47fdobridge: <triang3l> The 🐷 test for sample interlock assumes guarantees the API doesn't provide btw, be careful
21:47fdobridge: <triang3l> But maybe on Nvidia it'd work
21:48fdobridge: <triang3l> no, not sample interlock, but pixel interlock with sample shading https://gitlab.freedesktop.org/mesa/piglit/-/issues/93
21:50fdobridge: <gfxstrand> It doesn't need it, no.
21:50fdobridge: <zmike.> not correct
21:51fdobridge: <zmike.> your feature support level will be crippled if you don't have DB
21:51fdobridge: <zmike.> a lot of the precompile handling requires it
21:51fdobridge: <gfxstrand> What do you mean by "feature support level"?
21:52fdobridge: <zmike.> full feature support means you support all versions of GL and you have great perf and you don't get constant shader compile stuttering
21:52fdobridge: <zmike.> right now you don't get the last part
21:52fdobridge: <gfxstrand> Okay, fair.
21:52fdobridge: <gfxstrand> What's different with descriptor buffer, though?
21:53fdobridge: <zmike.> non-db handling is far more complex and I wasn't about to add another layer of complexity to the already-incomprehensible layering of precompilation
21:54fdobridge: <zmike.> I mean, have you looked at the non-db descriptor code in zink? it's nightmarish
21:55fdobridge: <Sid> I mean, have you looked at ~~the non-db descriptor~~ code ~~in zink~~? it's nightmarish
21:55fdobridge: <Sid> :P
21:55fdobridge: <zmike.> computers were a mistake
21:55fdobridge: <gfxstrand> Yes
21:56fdobridge: <redsheep> Somebody was saying there's hope for implementing descriptor buffer, right?
21:57fdobridge: <gfxstrand> I think @georgeouzou has been hacking on it
21:57fdobridge: <gfxstrand> Unfortunately, typed buffers are pretty much a disaster with DB on NVIDIA
21:58fdobridge: <zmike.> what is "typed buffers" in this context (edited)
21:58fdobridge: <gfxstrand> `VkBufferView`
21:59fdobridge: <gfxstrand> has too much terminology in her head
21:59fdobridge: <zmike.> ah you mean like texel buffers
22:00fdobridge: <gfxstrand> Yes and storage image buffers or whatever you want to call those things
22:00fdobridge: <gfxstrand> storage texel buffers. That's the term.
22:01fdobridge: <zmike.> nobody likes these anyway
22:02fdobridge: <gfxstrand> And then AMD has to be all weird and implement them as something that isn't an image.
22:02fdobridge: <nanokatze> we just need hw for nir_format_convert
22:03fdobridge: <zmike.> if it makes you feel better I wasted an hour today examining how to do shader variants for texel buffer swizzles on A/L/I formats
22:03fdobridge: <zmike.> and got nowhere
22:04fdobridge: <gfxstrand> Wait, A/L/I formats are allowed in texel buffers?!?
22:05fdobridge: <zmike.> compat contexts~~~~~
22:05fdobridge: <zmike.> there are even piglit tests
22:06fdobridge: <zmike.> I may go back to saying fuckit because this is excruciating to figure out without killing cpu perf
22:06fdobridge: <redsheep> So apple wasn't just being lazy when they split gl to put compat in a corner?
22:06fdobridge: <gfxstrand> VK_EXT_buffer_view_swizzle?
22:06fdobridge: <gfxstrand> (Just say no...)
22:06fdobridge: <zmike.> that's the other option, but iirc some hardware can't do it anyway
22:06fdobridge: <zmike.> I think it was even discussed at some point
22:07fdobridge: <gfxstrand> Intel, NV, and AMD should be able to
22:07fdobridge: <zmike.> and given how vanishingly rare this is (I've literally never seen it), the value proposition is...
22:07fdobridge: <gfxstrand> IDK about IMG
22:07fdobridge: <gfxstrand> Yeah
22:07fdobridge: <zmike.> outside of A8_UNORM anyway, which is native
22:07fdobridge: <gfxstrand> Yeah
22:07fdobridge: <zmike.> but having so many piglit fails from one missing feature sucks
22:07fdobridge: <zmike.> even if the feature sucks
22:07fdobridge: <gfxstrand> This doesn't seem worth having a Vulkan extension
22:08fdobridge: <zmike.> not unless I find some high performance game (ab)using it probably not
22:16fdobridge: <moonykay> what even is A/L/I
22:17fdobridge: <gfxstrand> Alpha, Luminance, and Intensity. They're grayscale formats.
22:17fdobridge: <gfxstrand> Luminance, for instance, is a single channel that samples as `(x, x, x, 1)`
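As a toy illustration of those sampling rules (not any driver's actual swizzle code), the three single-channel legacy formats broadcast their one stored value differently:

```rust
// Luminance: the one channel broadcasts to RGB, alpha is forced to 1.
fn sample_luminance(x: f32) -> [f32; 4] {
    [x, x, x, 1.0]
}

// Intensity: the one channel broadcasts to all four components.
fn sample_intensity(x: f32) -> [f32; 4] {
    [x, x, x, x]
}

// Alpha: RGB read as 0, the stored value lands in alpha.
fn sample_alpha(x: f32) -> [f32; 4] {
    [0.0, 0.0, 0.0, x]
}
```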
22:18fdobridge: <moonykay> I mean, I've seen A8_UNORM get used
22:18fdobridge: <moonykay> but thats it
22:19fdobridge: <gfxstrand> Yeah, luminance and intensity formats are pretty uncommon
22:19fdobridge: <moonykay> (and only as a source texture for text glyphs) (edited)