03:48 fdobridge: <g​fxstrand> First Ampere conformance run going. Starting with x86_64
04:14 fdobridge: <a​irlied> dakr, gfxstrand : I talked a bit with Maria today about v3d and gpuvm but also she mentioned userptr ideas
04:23 fdobridge: <b​enjaminl> okay so figuring out `SHFL.UP` was pretty simple after I actually read the PTX docs, but now I'm confused how it passes the CTS on turing
04:23 fdobridge: <b​enjaminl> the lower 5 bits are a lower-limit on the source lane in `SHFL.UP` and an upper-limit for all the other ops
04:24 fdobridge: <b​enjaminl> I'm betting they changed the behavior for turing and ptxas is transforming it
04:30 fdobridge: <b​enjaminl> nope... ptxas emits the same value on both sm50 and sm75...
04:33 fdobridge: <b​enjaminl> oh... it probably passes the CTS because we're not advertising `VK_SUBGROUP_FEATURE_SHUFFLE_BIT`, so the CTS is just skipping all those tests
04:33 fdobridge: <b​enjaminl> I had to add that to get it to run the tests on maxwell
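For readers following along: the clamp behaviour described above can be sketched from the pseudocode in the PTX `shfl` documentation. This is a minimal Python model of the PTX-level semantics only, not NAK code and not the SASS encoding; the function name `shfl_source_lane` and the string mode names are made up for illustration.

```python
def shfl_source_lane(mode, lane, b, c):
    """Return the lane a thread reads from, following the PTX shfl pseudocode."""
    bval = b & 0x1F          # shift / xor / index amount
    cval = c & 0x1F          # clamp value: lower 5 bits of c
    mask = (c >> 8) & 0x1F   # segment mask: bits 8..12 of c
    max_lane = (lane & mask) | (cval & ~mask & 0x1F)
    min_lane = lane & mask
    if mode == "up":
        j = lane - bval
        in_range = j >= max_lane   # the clamp acts as a lower limit here
    elif mode == "down":
        j = lane + bval
        in_range = j <= max_lane   # upper limit for every other mode
    elif mode == "bfly":
        j = lane ^ bval
        in_range = j <= max_lane
    elif mode == "idx":
        j = min_lane | (bval & ~mask & 0x1F)
        in_range = j <= max_lane
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Out-of-range reads fall back to the thread's own lane.
    return j if in_range else lane
```

Under this model, passing a clamp of 31 to `up` (the value that suits the other modes) makes every lane read its own value, which is the kind of mistake that only shows up once the CTS actually runs the shuffle tests.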
04:59 fdobridge: <b​enjaminl> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26202 MR against mesa/mesa to fix it, since this applies to SM75
05:00 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> This is the only extension Wine uses for WoW64 support (there's map_memory_placed support for Wine in some branch though)
05:11 fdobridge: <g​fxstrand> Sounds plausible. Yeah, I got stuck on some control-flow stuff and that's why I didn't get all of subgroups wired up.
05:12 fdobridge: <g​fxstrand> That's my next task once I get back to actually writing code and not just moving stuff around. 😅
05:23 HdkR: map_memory_placed? What's this?
05:24 HdkR: Sounds interesting to my needs and google isn't giving me anything
05:30 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> HdkR: An extension that allows mapping memory in a specific address: https://github.com/KhronosGroup/Vulkan-Docs/pull/1906
05:31 HdkR: ah neat. Fixes the problem of wanting to allocate into their allocator space without punching a hole in the mapping and introducing bugs
05:31 HdkR: Doesn't quite fit my needs, but cool to see.
06:36 fdobridge: <b​enjaminl> got vkcube running on SM50 🙂
06:37 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Is this Shader Model 5? /s
06:40 fdobridge: <b​enjaminl> maxwell
06:40 fdobridge: <b​enjaminl> no idea how things map to directx
06:42 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> NVIDIA and Microsoft both using SM makes things confusing :nouveau:
09:43 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Here's my attempt at rebasing the pipeline cache MR (it compiles but I'm not sure how well it works but at least DXVK doesn't crash) 🐸
09:43 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1174283543699013663/nvk-pipeline-cache.patch?ex=656707c5&is=655492c5&hm=0a13704556e9f137f85e4733fff284dfca257a951e59fb23d8d568b2d8d47fef&
10:29 fdobridge: <p​homes_> Thank you for the rebase. I appreciate it. The refactoring that just landed requires that the whole MR is redone a fair bit and I want to perhaps do some things in a different way. I plan to work on that next week
11:35 fdobridge: <p​rop_energy_ball> https://www.youtube.com/watch?v=npxsD7u52qk
11:36 fdobridge: <p​rop_energy_ball> Holy shit, Hat in Time running pretty well :)
11:36 fdobridge: <p​rop_energy_ball> Awesome work
11:37 fdobridge: <p​rop_energy_ball> Oh damn ♻️... I'm late. Sorry!
11:50 fdobridge: <m​ohamexiety> check this out too @prop_energy_ball :3
11:52 fdobridge: <p​rop_energy_ball> Niice
11:56 fdobridge: <m​arysaka> :hatkid:
11:58 fdobridge: <d​adschoorse> would run better with fmulz in NAK 🐸
12:01 fdobridge: <p​rop_energy_ball> DXVK should just pre-emptively enable the opencoded proper version now :demon:
12:37 fdobridge: <a​irlied> https://youtu.be/LipsVK5d_vM?t=19576 (my nouveau talk from yesterday, in case anyone is interested)
12:50 RSpliet: airlied: IIRC just sticking the firmware files in /boot is exactly what arch does? Or this is what I think I learned a few days ago when someone accidentally DD'd over the first gigabyte of his SSD and had to regenerate his partition table, boot and EFI partitions.
12:54 fdobridge: <d​adschoorse> inb4 nvidia refuses to optimize it because of the denorm issue
12:55 fdobridge: <d​adschoorse> @prop_energy_ball dxvk enabled it a while back for nvk
12:55 fdobridge: <d​adschoorse> I wrote that patch when codegen was used and we didn't know about the implied ftz yet
12:56 fdobridge: <d​adschoorse> so technically nak regressed dxvk
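For context, `fmulz` here refers to the multiply where anything times zero is zero, even infinity or NaN (this is how NIR defines its `fmulz` opcode). A minimal Python sketch of the open-coded form, assuming plain IEEE doubles; it does not model the flush-to-zero/denorm behaviour mentioned above, and the function name is only illustrative:

```python
import math

def fmulz(a, b):
    """Multiply where a zero operand always wins, even against inf or NaN."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

# A plain IEEE multiply gives NaN for 0 * inf; fmulz gives 0.
plain = 0.0 * math.inf
legacy = fmulz(0.0, math.inf)
```

The open-coded version costs extra compares and selects per multiply, which is why a native instruction is attractive.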
13:25 fdobridge: <e​sdrastarsis> Dota Underlords (low settings)
13:25 fdobridge: <e​sdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1174339298422689883/20231115_10h24m09s_grim.jpeg?ex=65673bb2&is=6554c6b2&hm=1d67b18aaa8fb41345b226ae8e966cced9908f063d39bbb55e360b9ebdd702ef&
13:25 fdobridge: <e​sdrastarsis> On Wayland
13:32 fdobridge: <k​arolherbst🐧🦀> :ferrisBongo:
14:32 fdobridge: <e​sdrastarsis> No Man's Sky... at least it boots :ferris_happy:
14:32 fdobridge: <e​sdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1174356240432902195/20231115_11h30m48s_grim.jpeg?ex=65674b7a&is=6554d67a&hm=ad76fddc52f2342fdc17066d26a6a68033c7b7ecee1be7ef714372f4bfd4b870&
14:33 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> With an old DXVK version?
14:36 fdobridge: <r​hed0x> no mans sky uses vulkan directly
14:37 fdobridge: <r​hed0x> who needs depth testing anyway 🐸
14:42 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> It's fairly rare to see Windows games using Vulkan
14:43 fdobridge: <e​sdrastarsis> Yeah, I'm using the pcgamingwiki's list
15:31 gfxstrand: dakr: I think we might have a fence roll-over issue or similar
15:35 fdobridge: <g​fxstrand> Thanks! Sorry for shifting the sand out from underneath you but I think caching will be a lot easier now.
15:38 fdobridge: <m​arysaka> @gfxstrand did something change around sample locations with the 2 MRs you merged? I have 14 failures and I'm out of ideas about what might cause them...
15:38 fdobridge: <m​arysaka> https://cdn.discordapp.com/attachments/1034184951790305330/1174372865563439166/image.png?ex=65675af5&is=6554e5f5&hm=ef27f2be81e8adff47fccdbe8df1c789f9ea155ee6554cde76cbf74c21d8371e&
15:39 fdobridge: <g​fxstrand> Quite possibly...
15:39 fdobridge: <g​fxstrand> I didn't intend to change anything around sample locations
15:39 fdobridge: <g​fxstrand> But the FS input interp rework may have had a bug.
15:40 fdobridge: <m​arysaka> Will push that somewhere and tomorrow I will try to diff my old branch vs the new one on the shader code output
15:40 fdobridge: <g​fxstrand> Also, I re-ordered the universe so that may have affected things, too.
15:40 fdobridge: <g​fxstrand> sounds good
15:40 fdobridge: <m​arysaka> I tried to remove the ordering and remove flags for both my new intrinsic and ipa_nv but that didn't change a thing sadly...
15:43 dakr: gfxstrand: What did you observe?
15:46 fdobridge: <m​arysaka> @gfxstrand did something change around sample locations with the 2 MRs you merged? I have 14 failures and I'm out of ideas about what might cause them... (msaa_interpolate_at_offset / msaa_interpolate_at_sample but only for perspective) (edited)
15:51 gfxstrand: dakr: Trying to do a full CTS run on Ampere last night. It failed twice at exactly the same test. It was a synchronization test. In both cases, the kernel thought it timed out and killed my context.
15:51 gfxstrand: Unfortunately, I can't repro with a single test. If I run dEQP-VK.synchronization.timeline_semaphore.*, they all pass.
15:52 gfxstrand: So there's some way in which we're sort of setting up for the bug.
15:53 gfxstrand: I have trouble believing that it's a userspace bug but it's hard to tell
15:53 gfxstrand: Fences wrapping is just a blind guess, BTW.
15:53 gfxstrand: A full CTS run is enough fences that it could hit a u32 overflow bug if we had one.
16:00 gfxstrand: dakr: So, yeah, kind-of a wild guess, unfortunately, but there's a bug of some sort in there.
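To illustrate the kind of bug being guessed at (and only that; nothing here is taken from the nouveau kernel code), this is a minimal Python sketch of how a 32-bit fence sequence number comparison can go wrong at wrap-around. Both function names are hypothetical.

```python
U32_MASK = 0xFFFFFFFF

def naive_signaled(current, target):
    """Plain comparison: breaks once the counter wraps past 2**32 - 1."""
    return current >= target

def wrap_safe_signaled(current, target):
    """Treat the u32 difference as signed, so wrap-around is handled."""
    return ((current - target) & U32_MASK) < 0x80000000

# The fence we wait on was emitted just before the wrap,
# and the hardware counter has since wrapped to a small value.
target = 0xFFFFFFFF
current = 2
```

With these values the naive check reports the fence as not yet signaled, which from the outside looks like a timeout, while the wrap-safe check reports it as signaled. A bug of this shape would only show up after enough submissions, which fits a full CTS run but not a single test group.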
16:01 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I guess I'm going to push this patch into my AUR package then
16:04 gfxstrand: dakr: Yeah, all of dEQP-VK.synchronization.* passes if I don't run the rest of the CTS before it. :sob:
16:38 fdobridge: <e​sdrastarsis> I think the other patches need a rebase too
16:38 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Other ones seem to be fine
16:50 fdobridge: <e​sdrastarsis> The patches are not applying here
16:51 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> It's only the pipeline caching one (let me quickly push the updated patch)
16:52 fdobridge: <p​homes_> no problem. The code is much nicer to work with now 🙂
16:55 fdobridge: <p​homes_> VK_EXT_color_write_enable is done. That is one (small) step closer to zink
17:34 olderfacebaby: What I was talking about was the simpler algorithm , you can envision that as you sideload the to values so each value compressed is sided with it's cancelation distance from where index is taken, it acts as scaffold, because those are controlling what content to eliminate, they themselves get eliminated by subtract which has their content outside of the mask buffer, in other words they always get eliminated through subtract, but the one you
17:34 olderfacebaby: canceled by stuffing an index to mask the value entirely, comes back as from result as cancelation distance plus index, so if you know that this exactly ended up as clipping the focal value, the value at interest, then you can conclude, that to recover the Vai value at interest, you need no algebra anymore, but only have the values stored without scaffold subtracted from the values where scaffolds + index was eliminated i.e cut/subtracted out
17:34 olderfacebaby: , so I understood that there's no need to take any half values like divide by two at all :( it's lot simpler.
17:35 olderfacebaby: But I have suspicion that it's similar to Elias fano.
18:08 olderfacebaby: Linear algebra is too much to say, cause delta like that is only single subtract in compiler, you eliminate values then, delta that collides with distance to clipping the values from scaffold coefficients, and just as such you eliminate the remainder of the scaffold like another delta, and the indexed value should pop up.
18:19 olderfacebaby: So easiest is to distribute the logic through libre office calc spreadsheet, cause gnome-calculator has only logs which can not be saved, I assume tomorrow I can distribute the file.
18:21 olderfacebaby: Then you can implement that logic if you want in any language, like c rust c++ or whatever
19:31 fdobridge: <g​fxstrand> Ugh.. Titan V's are expensive...
19:31 fdobridge: <g​fxstrand> @karolherbst, @airlied Do either of you have an actual Volta card?
19:32 fdobridge: <g​fxstrand> I don't really want to drop $600 on a GPU that's never going to be fast.
19:33 fdobridge: <g​fxstrand> Pascals are like $100 but Voltas are expensive
19:33 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I have a GPU with Volta NVENC 🐸
19:33 fdobridge: <a​irlied> I have a GV100 in my bag sitting beside me
19:33 fdobridge: <a​irlied> I was going to give it to Robert Foss to ship to Karol
19:33 fdobridge: <g​fxstrand> Heh
19:34 fdobridge: <g​fxstrand> Really, as long as someone has one, that's the important thing.
19:34 fdobridge: <a​irlied> I just gave ajax a volta also yesterday
19:34 fdobridge: <g​fxstrand> I can write the Volta MR but I don't have the HW to test it
19:34 fdobridge: <k​arolherbst🐧🦀> I can do it once I have the GPU 😄
19:34 fdobridge: <k​arolherbst🐧🦀> but yeah...
19:35 fdobridge: <k​arolherbst🐧🦀> on the ISA side it's mostly like Turing just without uniform registers/predicates
19:35 fdobridge: <g​fxstrand> Yup
19:35 fdobridge: <g​fxstrand> I'm reworking NAK to use the Turing encodings for Volta
19:35 fdobridge: <k​arolherbst🐧🦀> I think the 3d/compute stuff is a bit different...
19:35 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> What other niche NVIDIA cards do you have laying around?
19:35 fdobridge: <k​arolherbst🐧🦀> volta still uses the old sph?
19:35 fdobridge: <k​arolherbst🐧🦀> something like that?
19:35 fdobridge: <k​arolherbst🐧🦀> dunno
19:37 fdobridge: <a​irlied> I have all the niche nvidia cards now, just not in my bag sitting beside me
19:37 fdobridge: <a​irlied> I gave away a jetson tk1 and tx1 yesterday
19:37 fdobridge: <k​arolherbst🐧🦀> the only niche one I have is the titan one :ferrisUpsideDown:
19:38 fdobridge: <a​irlied> I think I gave ajax the Titan, this one is the quadro
19:38 fdobridge: <k​arolherbst🐧🦀> @mupuf also still has a couple of GPUs
19:39 fdobridge: <g​fxstrand> Most of them aren't too hard to get your hands on. It's just that Volta only came in like 3 really high-end GPUs so they're stupid $$ on eBay.
19:39 mupuf: A couple :D That's a gross misrepresentation :D
19:39 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> How about the NV1?
19:40 fdobridge: <a​irlied> I think I have an nv3 but the caps are blown
19:40 fdobridge: <k​arolherbst🐧🦀> yeah....
19:40 fdobridge: <k​arolherbst🐧🦀> my NV4x gpus are also half trash
19:40 fdobridge: <k​arolherbst🐧🦀> though I never know if driver or hardware issue
19:41 fdobridge: <m​arysaka> I have a ton of Maxwell GPUs around (most of them GM20B), some Kepler, some Fermi, and two Ampere
19:41 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> If they aren't SMD then they should be easy to replace 👩‍🔧
19:41 fdobridge: <m​arysaka> ~~I will not count the two Geforce 6600LE cards I have~~
19:42 fdobridge: <m​arysaka> I have a ton of Maxwell GPUs around (most of them GM20B), 3 Pascal, some Kepler, some Fermi, and two Ampere (edited)
19:43 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I had a NV96 until 2014
19:43 fdobridge: <m​arysaka> that reminds me I need to open the issue about tegra's regression and the sync issue on prior versions
19:44 fdobridge: <g​fxstrand> @karolherbst Once you get the GPU, mind giving this a test? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26212
19:44 fdobridge: <m​arysaka> It would be nice to have the X1 be a nice target as it is quite a bit faster than a regular dGPU for Maxwell testing
19:44 fdobridge: <g​fxstrand> Or anyone else who has an actual Volta.
19:44 fdobridge: <k​arolherbst🐧🦀> that's going to be fun
19:44 fdobridge: <k​arolherbst🐧🦀> 🙂
19:44 fdobridge: <g​fxstrand> Shouldn't be too spicy
19:44 fdobridge: <k​arolherbst🐧🦀> well
19:44 fdobridge: <k​arolherbst🐧🦀> I wouldn't be surprised if we have a few `TURING_A` checks too many
19:45 fdobridge: <g​fxstrand> Yeah....
19:45 fdobridge: <k​arolherbst🐧🦀> or well.. if some needs to be `VOLTA_A` instead
19:45 fdobridge: <g​fxstrand> That's going to be the tricky bit
19:45 fdobridge: <k​arolherbst🐧🦀> run the CTS and check dmesg 😄
19:45 fdobridge: <g​fxstrand> It should only take like a day to find them all
19:45 fdobridge: <k​arolherbst🐧🦀> yeah...
19:45 fdobridge: <k​arolherbst🐧🦀> shouldn't be too hard
19:46 fdobridge: <m​ohamexiety> I have 1 Ampere and 1 Pascal. _may_ get Ada in the future but really not sure
19:46 fdobridge: <k​arolherbst🐧🦀> seems like Volta doesn't use the shader heap anymore, so that's good
19:47 fdobridge: <m​arysaka> random question: does anyone have any RE documentation around MME macros used by the proprietary driver?
19:47 fdobridge: <k​arolherbst🐧🦀> @gfxstrand yeah soo... the new SPH stuff is Turing+ 🥲
19:47 fdobridge: <k​arolherbst🐧🦀> I _hope_ the alignment stuff checks out for volta
19:47 fdobridge: <k​arolherbst🐧🦀> as the header is 0x50 in size
19:48 fdobridge: <g​fxstrand> That's not a big deal. It only really impacts shader upload
19:48 fdobridge: <m​arysaka> I think I check for sm75 so should be fine (edited)
19:48 fdobridge: <g​fxstrand> For the compiler side, we just assume the full size header and it's fine.
19:48 fdobridge: <g​fxstrand> We should maybe assert that the high bits are 0 or something
19:48 fdobridge: <k​arolherbst🐧🦀> yeah.. it's the same as previous gens
19:48 fdobridge: <k​arolherbst🐧🦀> just the alignment matters I think
19:48 fdobridge: <k​arolherbst🐧🦀> _should_ be fine
19:50 fdobridge: <g​fxstrand> Really, someone needs to spend a day with it to get the CTS passing, do a conformance submission, and then we can forget Volta exists.
19:50 fdobridge: <g​fxstrand> There's so few of them...
19:50 fdobridge: <k​arolherbst🐧🦀> yeah
19:50 fdobridge: <k​arolherbst🐧🦀> mood
19:51 fdobridge: <m​arysaka> there is Tegra but yeah not even sure there are many products using the Xavier series tbh...
19:51 fdobridge: <k​arolherbst🐧🦀> @gfxstrand uhm... let me check instruction support
19:52 fdobridge: <k​arolherbst🐧🦀> huh
19:52 fdobridge: <k​arolherbst🐧🦀> volta has `FADD32I` whatever that is
19:52 fdobridge: <k​arolherbst🐧🦀> oh wait.. that's just an alias...
19:53 fdobridge: <k​arolherbst🐧🦀> Volta doesn't have `FOOTPRINT`
19:54 fdobridge: <g​fxstrand> Yeah, FOOTPRINT is a Turing feature
19:54 fdobridge: <g​fxstrand> Volta seems to be where they tried out the fancy new ISA and then Turing was when they added the features. (edited)
19:54 fdobridge: <k​arolherbst🐧🦀> volta doesn't have saturating `I2I`
19:54 fdobridge: <g​fxstrand> Oh fun....
19:54 fdobridge: <g​fxstrand> Wait, does turing?
19:54 fdobridge: <k​arolherbst🐧🦀> yes
19:54 fdobridge: <g​fxstrand> We're not using it yet
19:55 fdobridge: <g​fxstrand> Probably should
19:56 fdobridge: <k​arolherbst🐧🦀> volta doesn't have `IDE` which is uhm.. something weird
19:56 fdobridge: <k​arolherbst🐧🦀> only relevant for shader debugging
19:56 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> `SATA` is better anyway /s
19:57 fdobridge: <k​arolherbst🐧🦀> @gfxstrand volta doesn't have `IMNMX`
19:58 fdobridge: <g​fxstrand> Oh, well that's annoying.
19:59 fdobridge: <g​fxstrand> Maxwell has IMNMX
19:59 fdobridge: <g​fxstrand> WTH
19:59 fdobridge: <k​arolherbst🐧🦀> mhhh
19:59 fdobridge: <k​arolherbst🐧🦀> maybe the name is different?
19:59 fdobridge: <k​arolherbst🐧🦀> `MUFU.TANH` is turing+
20:00 fdobridge: <k​arolherbst🐧🦀> seems like we don't use `IMNMX` for turing in codegen ;')
20:00 fdobridge: <k​arolherbst🐧🦀> RIP
20:01 fdobridge: <k​arolherbst🐧🦀> `GV100LegalizeSSA::handleIMNMX` 🥲
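As a rough illustration of what lowering a missing integer min/max looks like, here is a minimal Python sketch of the generic compare-plus-select approach, plus a saturating narrow built on top of it. This is an assumption about the general technique, not a transcription of `GV100LegalizeSSA::handleIMNMX` or of anything in NAK; the function names are hypothetical.

```python
def imnmx(a, b, is_min):
    """Integer min/max written as a compare followed by a select."""
    pred = (a < b) if is_min else (a > b)   # compare step
    return a if pred else b                 # select step

def i2i_sat_s8(x):
    """Saturating narrow to signed 8 bits: clamp with max, then min."""
    clamped_low = imnmx(x, -128, is_min=False)
    return imnmx(clamped_low, 127, is_min=True)
```

Each min/max becomes two operations instead of one, and a saturating conversion built this way becomes four, which is why missing both is unfortunate.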
20:02 fdobridge: <g​fxstrand> RIP
20:03 fdobridge: <k​arolherbst🐧🦀> I think that's more or less the biggest stuff
20:03 fdobridge: <k​arolherbst🐧🦀> it's kinda a pain to compare 😄
20:04 fdobridge: <m​arysaka> that's odd, how do you debug stuff huh
20:07 fdobridge: <k​arolherbst🐧🦀> you just trust the c++ compiler
20:08 fdobridge: <d​adschoorse> I guess you just have to use a plain text editor instead? 🐸
20:09 fdobridge: <k​arolherbst🐧🦀> yeah, what else are you using on your HPC cluster anyway
20:30 fdobridge: <g​fxstrand> Okay, I've actually merged that one because it's a rename and conflicts with a bunch of the SM50 work. I've also rebased the sm50 branch on top of it.
20:32 fdobridge: <g​fxstrand> The new Volta MR lives at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26214 and I've assigned it to @karolherbst . Feel free to add patches to the nvk/volta branch to fix whatever you find and let me know when it's working. Or you can tell me about a bug and ask me to type a patch. That works, too. I just don't have the HW. I put the new branch in nouveau/mesa so it's a bit easier for us to both push to it.
20:35 fdobridge: <k​arolherbst🐧🦀> cool
20:35 fdobridge: <k​arolherbst🐧🦀> nah, I can write some code once in a while :ferrisUpsideDown:
20:46 fdobridge: <a​irlied> Well either I will find robert in next hour or I will do Volta support when I get home 🙂
20:50 fdobridge: <k​arolherbst🐧🦀> 😄
21:02 fdobridge: <g​fxstrand> Is NVK going to be the first conformant open-source Khronos API implementation on NVIDIA hardware?
21:02 fdobridge: <g​fxstrand> Did we ever do GL or GLES on nouveau GL?
21:10 fdobridge: <k​arolherbst🐧🦀> uhm...
21:10 fdobridge: <k​arolherbst🐧🦀> no
21:10 fdobridge: <k​arolherbst🐧🦀> for $reasons
21:10 fdobridge: <k​arolherbst🐧🦀> like
21:11 fdobridge: <g​fxstrand> Okay, cool. More stuff I can brag about in my upcoming blog post. 😅
21:11 fdobridge: <k​arolherbst🐧🦀> driver is dying randomly 🥲
21:11 fdobridge: <g​fxstrand> Well, yeah, that'd be a reason. 😝
21:11 fdobridge: <k​arolherbst🐧🦀> I think a normal GL CTS run passes like 99.9%
21:11 fdobridge: <k​arolherbst🐧🦀> and I had patches to fix the rest
21:11 fdobridge: <k​arolherbst🐧🦀> but doesn't help if the GPU dies randomly
21:12 fdobridge: <k​arolherbst🐧🦀> worst part is that it mostly happens after a couple of hours
21:13 fdobridge: <k​arolherbst🐧🦀> it's a pain
21:17 HdkR: the imnmx missing in Volta thing is fun. it's like, oops missed a couple of instructions in the new ISA and then they get added back the next generation :P
21:22 fdobridge: <g​fxstrand> Yeah... That's very much the way it looks.
21:23 fdobridge: <g​fxstrand> "We reworked the ISA for you."
21:23 fdobridge: <g​fxstrand> "WTF did you put IMNMX?!?"
21:23 fdobridge: <g​fxstrand> "Oh, right... we forgot about that one. Sorry!"
21:28 fdobridge: <g​fxstrand> Missing both IMNMX and saturating I2I is kinda brutal, NGL.
21:32 HdkR: Silver lining that they started throwing way more ALUs on the cards, so it can handle missing a few while they get added back
21:33 fdobridge: <g​fxstrand> Yeah
21:34 fdobridge: <k​arolherbst🐧🦀> it also misses `I2IP` but that probably doesn't matter much for now
21:35 fdobridge: <k​arolherbst🐧🦀> or well.. that's also saturating so it makes sense
21:35 fdobridge: <k​arolherbst🐧🦀> wait a sec...
21:36 fdobridge: <k​arolherbst🐧🦀> there are `I2IP` variants with `S4`/`U4` dest formats...
21:36 fdobridge: <k​arolherbst🐧🦀> on turing
21:36 fdobridge: <k​arolherbst🐧🦀> and `S2`/`U2`
21:36 fdobridge: <k​arolherbst🐧🦀> huh...
21:36 fdobridge: <k​arolherbst🐧🦀> that instruction is like super weird
21:37 fdobridge: <g​fxstrand> Wait, what?!? Why do you want S4?
21:37 fdobridge: <g​fxstrand> I mean, why not but also why?
21:37 fdobridge: <g​fxstrand> Maybe for some crazy S4 machine learning stuff?
21:38 fdobridge: <k​arolherbst🐧🦀> it's super odd.. let me say how it works
21:38 fdobridge: <k​arolherbst🐧🦀> if you do `4` you get the lower 24 bits from src2 into the upper 24 bits of the dest, with `2` 26 bits
21:39 fdobridge: <k​arolherbst🐧🦀> with `2` 28 bits
21:39 fdobridge: <k​arolherbst🐧🦀> so src0 gets placed at 0, src1 at + size and then filled with src2
21:40 fdobridge: <g​fxstrand> Okay, that might almost be useful
21:40 fdobridge: <k​arolherbst🐧🦀> `8` also allows you to select `HI`/`LO` of src2
21:40 fdobridge: <k​arolherbst🐧🦀> but yeah.. no idea why 2 and 4 😄
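The packing described above can be sketched as follows. This is a minimal Python model built only from the description in this conversation (src0 at bit 0, src1 at bit `size`, remaining upper bits filled from the low bits of src2); it is not verified against hardware, it ignores the `HI`/`LO` selection available for the 8-bit case, and the function names are hypothetical.

```python
def saturate(value, bits, signed):
    """Clamp value to the range of a signed or unsigned integer of `bits` bits."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))

def i2ip_pack(src0, src1, src2, bits, signed):
    """Pack two saturated fields into the low bits and fill the rest from src2."""
    field_mask = (1 << bits) - 1
    fill_bits = 32 - 2 * bits              # 24 for 4-bit fields, 28 for 2-bit
    low = saturate(src0, bits, signed) & field_mask
    high = saturate(src1, bits, signed) & field_mask
    fill = src2 & ((1 << fill_bits) - 1)   # lower bits of src2
    return low | (high << bits) | (fill << (2 * bits))
```

For example, with signed 4-bit fields, `i2ip_pack(100, -100, 0xABCDEF, 4, True)` saturates the inputs to 7 and -8 and yields `0xABCDEF87`. Chaining such instructions, with each result fed back in as src2, would accumulate eight 4-bit values into one 32-bit register.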
21:41 fdobridge: <k​arolherbst🐧🦀> let's see if ptx has something there
21:42 fdobridge: <k​arolherbst🐧🦀> maybe that's `e4m3x2` and `e5m2x2` stuff?
21:42 fdobridge: <k​arolherbst🐧🦀> aren't those weirdo float formats?
21:42 fdobridge: <k​arolherbst🐧🦀> no idea
21:43 fdobridge: <k​arolherbst🐧🦀> `Sub byte types (.u4/.s4 and .u2/.s2) requires sm_75 or higher.` ahh
21:43 fdobridge: <k​arolherbst🐧🦀> so yeah.. PTX has those natively as well
21:43 fdobridge: <k​arolherbst🐧🦀> `cvt.pack`
21:43 fdobridge: <k​arolherbst🐧🦀> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt-pack
21:44 fdobridge: <a​irlied> Some of those are for tensors but also I think some got removed in hopper
21:46 fdobridge: <k​arolherbst🐧🦀> I hate it when vendors remove changelogs 😄
21:47 fdobridge: <k​arolherbst🐧🦀> ahh found it
21:48 fdobridge: <k​arolherbst🐧🦀> yeah.. sounds like tensor stuff
22:12 fdobridge: <b​enjaminl> curious what other people's workflows are for testing runtime behavior when REing shader instructions
22:12 fdobridge: <b​enjaminl> normally I've been doing this by sticking the encoding in NAK and then trying to write a glsl shader that causes it to emit the instruction I'm interested in, but this can be a pain for some stuff
22:13 fdobridge: <b​enjaminl> currently want to test signed IMUL behavior for https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/255#note_2169502, and haven't found a way to generate it from glsl yet
22:20 fdobridge: <g​fxstrand> https://gitlab.freedesktop.org/gfxstrand/mesa/-/commits/nak/hw-tests
22:21 fdobridge: <g​fxstrand> It's not in great shape and I'd like to replace it with something that involves less back-doors (maybe just submit a compute shader directly?) but that's what I did to figure out IAdd3
22:22 fdobridge: <k​arolherbst🐧🦀> new wiki is live :ferrisSmug:
22:23 fdobridge: <b​enjaminl> ooh, that looks very helpful, thanks!
22:36 fdobridge: <g​fxstrand> I'm not actually fundamentally opposed to the back-door. I more struggled with the fact that I didn't have a good way to get the GPU generation or SM out.
22:47 fdobridge: <g​fxstrand> I could probably patch that through `deviceUUID`
22:53 fdobridge: <e​sdrastarsis> Strange Brigade was working on nak/main branch but it crashes on mesa upstream
22:53 fdobridge: <g​fxstrand> backtrace?
22:56 fdobridge: <e​sdrastarsis> How? I'm using proton
22:57 fdobridge: <g​fxstrand> Good question. I, unfortunately, don't know the answer. I've heard a rumor that GDB works but I've never had a good time with it.
22:57 fdobridge: <g​fxstrand> Do you see a panic?
22:58 fdobridge: <e​sdrastarsis> Yeah, an assertion failure in winevulkan
23:22 fdobridge: <g​fxstrand> 😭
23:30 fdobridge: <d​adschoorse> that should at least tell you the vk function that's failing