03:48 fdobridge: <g​fxstrand> First Ampere conformance run going. Starting with x86_64
04:14 fdobridge: <a​irlied> dakr, gfxstrand : I talked a bit with Maria today about v3d and gpuvm but also she mentioned userptr ideas
04:23 fdobridge: <b​enjaminl> okay so figuring out `SHFL.UP` was pretty simple after I actually read the PTX docs, but now I'm confused how it passes the CTS on turing
04:23 fdobridge: <b​enjaminl> the lower 5 bits are a lower-limit on the source lane in `SHFL.UP` and an upper-limit for all the other ops
04:24 fdobridge: <b​enjaminl> I'm betting they changed the behavior for turing and ptxas is transforming it
04:30 fdobridge: <b​enjaminl> nope... ptxas emits the same value on both sm50 and sm75...
04:33 fdobridge: <b​enjaminl> oh... it probably passes the CTS because we're not advertising `VK_SUBGROUP_FEATURE_SHUFFLE_BIT`, so the CTS is just skipping all those tests
04:33 fdobridge: <b​enjaminl> I had to add that to get it to run the tests on maxwell
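A minimal sketch of the clamp behaviour discussed above, modelled on the PTX description of `shfl`. The function name, the string mode names, and the field layout (clamp in the lower 5 bits of `c`, segment mask in bits 12:8) are assumptions made for illustration, not NAK or hardware code.

```python
def shfl_source_lane(mode, lane, b, c):
    """Return (source_lane, in_range) for one lane of a warp shuffle.

    Assumed layout of c: clamp value in bits 4:0, segment mask in bits 12:8.
    """
    clamp = c & 0x1F
    segmask = (c >> 8) & 0x1F
    max_lane = (lane & segmask) | (clamp & ~segmask & 0x1F)
    min_lane = lane & segmask

    if mode == "up":
        # For "up" the clamp acts as a LOWER limit on the source lane.
        j = lane - b
        ok = j >= max_lane
    elif mode == "down":
        # For every other mode the clamp acts as an UPPER limit.
        j = lane + b
        ok = j <= max_lane
    elif mode == "bfly":
        j = lane ^ b
        ok = j <= max_lane
    elif mode == "idx":
        j = min_lane | (b & 0x1F & ~segmask)
        ok = j <= max_lane
    else:
        raise ValueError(f"unknown shuffle mode: {mode}")

    # Out-of-range lanes read their own value.
    return (j if ok else lane), ok
```

With this model, passing the "upper limit" clamp `0x1f` to an `up` shuffle makes every lane fall back to itself, which is the kind of mismatch described above: `shfl_source_lane("up", 5, 1, 0x1f)` reports out of range, while a clamp of `0` gives lane 4.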
04:59 fdobridge: <b​enjaminl> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26202 MR against mesa/mesa to fix it, since this applies to SM75
05:00 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> This is the only extension Wine uses for WoW64 support (there's map_memory_placed support for Wine in some branch though)
05:11 fdobridge: <g​fxstrand> Sounds plausible. Yeah, I got stuck on some control-flow stuff and that's why I didn't get all of subgroups wired up.
05:12 fdobridge: <g​fxstrand> That's my next task once I get back to actually writing code and not just moving stuff around. 😅
05:23 HdkR: map_memory_placed? What's this?
05:24 HdkR: Sounds interesting to my needs and google isn't giving me anything
05:30 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> HdkR: An extension that allows mapping memory in a specific address: https://github.com/KhronosGroup/Vulkan-Docs/pull/1906
05:31 HdkR: ah neat. Fixes the problem of wanting to allocate in to their allocator space without punching a hole in the mapping and introducing bugs
05:31 HdkR: Doesn't quite fit my needs, but cool to see.
06:36 fdobridge: <b​enjaminl> got vkcube running on SM50 🙂
06:37 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Is this Shader Model 5? /s
06:40 fdobridge: <b​enjaminl> maxwell
06:40 fdobridge: <b​enjaminl> no idea how things map to directx
06:42 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> NVIDIA and Microsoft both using SM makes things confusing :nouveau:
09:43 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Here's my attempt at rebasing the pipeline cache MR (it compiles but I'm not sure how well it works but at least DXVK doesn't crash) 🐸
09:43 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1174283543699013663/nvk-pipeline-cache.patch?ex=656707c5&is=655492c5&hm=0a13704556e9f137f85e4733fff284dfca257a951e59fb23d8d568b2d8d47fef&
10:29 fdobridge: <p​homes_> Thank you for the rebase. I appreciate it. The refactoring that just landed requires that the whole MR is redone a fair bit and I want to perhaps do some things in a different way. I plan to work on that next week
11:35 fdobridge: <p​rop_energy_ball> https://www.youtube.com/watch?v=npxsD7u52qk
11:36 fdobridge: <p​rop_energy_ball> Holy shit, Hat in Time running pretty well :)
11:36 fdobridge: <p​rop_energy_ball> Awesome work
11:37 fdobridge: <p​rop_energy_ball> Oh damn ♻️... I'm late. Sorry!
11:50 fdobridge: <m​ohamexiety> check this out too @prop_energy_ball :3
11:52 fdobridge: <p​rop_energy_ball> Niice
11:56 fdobridge: <m​arysaka> :hatkid:
11:58 fdobridge: <d​adschoorse> would run better with fmulz in NAK 🐸
12:01 fdobridge: <p​rop_energy_ball> DXVK should just pre-emptively enable the opencoded proper version now :demon:
12:37 fdobridge: <a​irlied> https://youtu.be/LipsVK5d_vM?t=19576 (my nouveau talk from yesterday, in case anyone is interested)
12:50 RSpliet: airlied: IIRC just sticking the firmware files in /boot is exactly what arch does? Or this is what I think I learned a few days ago when someone accidentally DD'd over the first gigabyte of his SSD and had to regenerate his partition table, boot and EFI partitions.
12:54 fdobridge: <d​adschoorse> inb4 nvidia refuses to optimize it because of the denorm issue
12:55 fdobridge: <d​adschoorse> @prop_energy_ball dxvk enabled it a while back for nvk
12:55 fdobridge: <d​adschoorse> I wrote that patch when codegen was used and we didn't know about the implied ftz yet
12:56 fdobridge: <d​adschoorse> so technically nak regressed dxvk
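For readers unfamiliar with `fmulz`: a rough sketch of the "multiply where zero always wins" semantics that the open-coded version reproduces. The helper name is illustrative, and the sketch does not model the denorm flushing (implied ftz) mentioned above.

```python
import math


def fmulz(a, b):
    """Legacy D3D9-style multiply: zero times anything is zero.

    Under IEEE rules 0 * inf and 0 * NaN give NaN; here they give 0.
    Denormal flushing is deliberately NOT modelled.
    """
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b
```

Open-coding this costs a compare and a select per multiply, which is why a native instruction would be expected to run better.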
13:25 fdobridge: <e​sdrastarsis> Dota Underlords (low settings)
13:25 fdobridge: <e​sdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1174339298422689883/20231115_10h24m09s_grim.jpeg?ex=65673bb2&is=6554c6b2&hm=1d67b18aaa8fb41345b226ae8e966cced9908f063d39bbb55e360b9ebdd702ef&
13:25 fdobridge: <e​sdrastarsis> On Wayland
13:32 fdobridge: <k​arolherbst🐧🦀> :ferrisBongo:
14:32 fdobridge: <e​sdrastarsis> No Man's Sky... at least it boots :ferris_happy:
14:32 fdobridge: <e​sdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1174356240432902195/20231115_11h30m48s_grim.jpeg?ex=65674b7a&is=6554d67a&hm=ad76fddc52f2342fdc17066d26a6a68033c7b7ecee1be7ef714372f4bfd4b870&
14:33 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> With an old DXVK version?
14:36 fdobridge: <r​hed0x> no mans sky uses vulkan directly
14:37 fdobridge: <r​hed0x> who needs depth testing anyway 🐸
14:42 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> It's fairly rare to see Windows games using Vulkan
14:43 fdobridge: <e​sdrastarsis> Yeah, I'm using the pcgamingwiki's list
15:31 gfxstrand: dakr: I think we might have a fence roll-over issue or similar
15:35 fdobridge: <g​fxstrand> Thanks! Sorry for shifting the sand out from underneath you but I think caching will be a lot easier now.
15:38 fdobridge: <m​arysaka> @gfxstrand did something change around sample locations with the 2 MRs you merged? I have 14 failures and I'm out of ideas about what might cause them...
15:38 fdobridge: <m​arysaka> https://cdn.discordapp.com/attachments/1034184951790305330/1174372865563439166/image.png?ex=65675af5&is=6554e5f5&hm=ef27f2be81e8adff47fccdbe8df1c789f9ea155ee6554cde76cbf74c21d8371e&
15:39 fdobridge: <g​fxstrand> Quite possibly...
15:39 fdobridge: <g​fxstrand> I didn't intend to change anything around sample locations
15:39 fdobridge: <g​fxstrand> But the FS input interp rework may have had a bug.
15:40 fdobridge: <m​arysaka> Will push that somewhere and tomorrow I will try to diff my old branch vs the new one on the shader code output
15:40 fdobridge: <g​fxstrand> Also, I re-ordered the universe so that may have affected things, too.
15:40 fdobridge: <g​fxstrand> sounds good
15:40 fdobridge: <m​arysaka> I tried to remove the ordering and remove flags for both my new intrinsic and ipa_nv but that didn't change a thing sadly...
15:43 dakr: gfxstrand: What did you observe?
15:46 fdobridge: <m​arysaka> @gfxstrand did something change around sample locations with the 2 MRs you merged? I have 14 failures and I'm out of ideas about what might cause them... (msaa_interpolate_at_offset / msaa_interpolate_at_sample but only for perspective) (edited)
15:51 gfxstrand: dakr: Trying to do a full CTS run on Ampere last night. It failed twice at exactly the same test. It was a synchronization test. In both cases, the kernel thought it timed out and killed my context.
15:51 gfxstrand: Unfortunately, I can't repro with a single test. If I run dEQP-VK.synchronization.timeline_semaphore.*, they all pass.
15:52 gfxstrand: So there's some way in which we're sort of setting up for the bug.
15:53 gfxstrand: I have trouble believing that it's a userspace bug but it's hard to tell
15:53 gfxstrand: Fences wrapping is just a blind guess, BTW.
15:53 gfxstrand: A full CTS run is enough fences that it could hit a u32 overflow bug if we had one.
16:00 gfxstrand: dakr: So, yeah, kind-of a wild guess, unfortunately, but there's a bug of some sort in there.
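The fence wrap idea above is explicitly a blind guess, so the following only illustrates the class of bug being speculated about, not the kernel's actual fence code. It is a sketch, assuming 32-bit sequence numbers, and both function names are made up for the example.

```python
U32_MASK = 0xFFFFFFFF


def seqno_passed_naive(current, target):
    """Plain comparison: breaks once the 32-bit counter wraps around."""
    return current >= target


def seqno_passed_wrap_safe(current, target):
    """Wrap-safe comparison: treat the u32 difference as a signed distance.

    Correct as long as the two values are less than 2**31 apart.
    """
    diff = (current - target) & U32_MASK
    return diff < 0x80000000
```

For example, with a target of `0xfffffff0` and a counter that has wrapped to `5`, the naive check says the fence has not signalled (which would look like a timeout), while the wrap-safe check says it has.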
16:01 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I guess I'm going to push this patch into my AUR package then
16:04 gfxstrand: dakr: Yeah, all of dEQP-VK.synchronization.* passes if I don't run the rest of the CTS before it. :sob:
16:38 fdobridge: <e​sdrastarsis> I think the other patches need a rebase too
16:38 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Other ones seem to be fine
16:50 fdobridge: <e​sdrastarsis> The patches are not applying here
16:51 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> It's only the pipeline caching one (let me quickly push the updated patch)
16:52 fdobridge: <p​homes_> no problem. The code is much nicer to work with now 🙂
16:55 fdobridge: <p​homes_> VK_EXT_color_write_enable is done. That is one (small) step closer to zink
19:31 fdobridge: <g​fxstrand> Ugh.. Titan V's are expensive...
19:31 fdobridge: <g​fxstrand> @karolherbst, @airlied Do either of you have an actual Volta card?
19:32 fdobridge: <g​fxstrand> I don't really want to drop $600 on a GPU that's never going to be fast.
19:33 fdobridge: <g​fxstrand> Pascals are like $100 but Voltas are expensive
19:33 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I have a GPU with Volta NVENC 🐸
19:33 fdobridge: <a​irlied> I have a GV100 in my bag sitting beside me
19:33 fdobridge: <a​irlied> I was going to give it to Robert Foss to ship to Karol
19:33 fdobridge: <g​fxstrand> Heh
19:34 fdobridge: <g​fxstrand> Really, as long as someone has one, that's the important thing.
19:34 fdobridge: <a​irlied> I just gave ajax a volta also yesterday
19:34 fdobridge: <g​fxstrand> I can write the Volta MR but I don't have the HW to test it
19:34 fdobridge: <k​arolherbst🐧🦀> I can do it once I have the GPU 😄
19:34 fdobridge: <k​arolherbst🐧🦀> but yeah...
19:35 fdobridge: <k​arolherbst🐧🦀> on the ISA side it's mostly like Turing just without uniform registers/predicates
19:35 fdobridge: <g​fxstrand> Yup
19:35 fdobridge: <g​fxstrand> I'm reworking NAK to use the Turing encodings for Volta
19:35 fdobridge: <k​arolherbst🐧🦀> I think the 3d/compute stuff is a bit different...
19:35 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> What other niche NVIDIA cards do you have laying around?
19:35 fdobridge: <k​arolherbst🐧🦀> volta still uses the old sph?
19:35 fdobridge: <k​arolherbst🐧🦀> something like that?
19:35 fdobridge: <k​arolherbst🐧🦀> dunno
19:37 fdobridge: <a​irlied> I have all the niche nvidia cards now, just not in my bag sitting beside me
19:37 fdobridge: <a​irlied> I gave away a jetson tk1 and tx1 yesterday
19:37 fdobridge: <k​arolherbst🐧🦀> the only niche one I have is the titan one :ferrisUpsideDown:
19:38 fdobridge: <a​irlied> I think I gave ajax the Titan, this one is the quadro
19:38 fdobridge: <k​arolherbst🐧🦀> @mupuf also still has a couple of GPUs
19:39 fdobridge: <g​fxstrand> Most of them aren't too hard to get your hands on. It's just that Volta only came in like 3 really high-end GPUs so they're stupid $$ on eBay.
19:39 mupuf: A couple :D That's a gross misrepresentation :D
19:39 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> How about the NV1?
19:40 fdobridge: <a​irlied> I think I have an nv3 but the caps are blown
19:40 fdobridge: <k​arolherbst🐧🦀> yeah....
19:40 fdobridge: <k​arolherbst🐧🦀> my NV4x gpus are also half trash
19:40 fdobridge: <k​arolherbst🐧🦀> though I never know if driver or hardware issue
19:41 fdobridge: <m​arysaka> I have a ton of Maxwell GPUs around (most of them GM20B), some Kepler, some Fermi, and two Ampere
19:41 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> If they aren't SMD then they should be easy to replace 👩‍🔧
19:41 fdobridge: <m​arysaka> ~~I will not count the two Geforce 6600LE cards I have~~
19:42 fdobridge: <m​arysaka> I have a ton of Maxwell GPUs around (most of them GM20B), 3 Pascal, some Kepler, some Fermi, and two Ampere (edited)
19:43 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I had a NV96 until 2014
19:43 fdobridge: <m​arysaka> that reminds me I need to open the issue about tegra's regression and the sync issue on prior versions
19:44 fdobridge: <g​fxstrand> @karolherbst Once you get the GPU, mind giving this a test? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26212
19:44 fdobridge: <m​arysaka> It would be nice to have the X1 be a nice target as it is quite a bit faster than a regular dGPU for Maxwell testing
19:44 fdobridge: <g​fxstrand> Or anyone else who has an actual Volta.
19:44 fdobridge: <k​arolherbst🐧🦀> that's going to be fun
19:44 fdobridge: <k​arolherbst🐧🦀> 🙂
19:44 fdobridge: <g​fxstrand> Shouldn't be too spicy
19:44 fdobridge: <k​arolherbst🐧🦀> well
19:44 fdobridge: <k​arolherbst🐧🦀> I wouldn't be surprised if we have a few `TURING_A` checks too many
19:45 fdobridge: <g​fxstrand> Yeah....
19:45 fdobridge: <k​arolherbst🐧🦀> or well.. if some needs to be `VOLTA_A` instead
19:45 fdobridge: <g​fxstrand> That's going to be the tricky bit
19:45 fdobridge: <k​arolherbst🐧🦀> run the CTS and check dmesg 😄
19:45 fdobridge: <g​fxstrand> It should only take like a day to find them all
19:45 fdobridge: <k​arolherbst🐧🦀> yeah...
19:45 fdobridge: <k​arolherbst🐧🦀> shouldn't be too hard
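A small sketch of why a `TURING_A` check can silently exclude Volta. The class IDs are the 3D class numbers from NVIDIA's published class headers; the two predicate functions and what they gate are hypothetical and only loosely based on this conversation (Turing-style encodings on Volta, new SPH on Turing+).

```python
# 3D engine class IDs (newer generations have larger numbers).
PASCAL_A = 0xC097
VOLTA_A = 0xC397
TURING_A = 0xC597


def uses_new_isa_encoding(cls_eng3d):
    """Hypothetical check: Volta shares the Turing-style encodings,
    so the cut-off has to be VOLTA_A, not TURING_A."""
    return cls_eng3d >= VOLTA_A


def uses_new_sph(cls_eng3d):
    """Hypothetical check: the newer shader program header is Turing+,
    so here TURING_A is the right cut-off."""
    return cls_eng3d >= TURING_A
```

Any check written as `>= TURING_A` before Volta was a target needs the same question asked of it: does the feature start at Turing, or did it really start one generation earlier?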
19:46 fdobridge: <m​ohamexiety> I have 1 Ampere and 1 Pascal. _may_ get Ada in the future but really not sure
19:46 fdobridge: <k​arolherbst🐧🦀> seems like Volta doesn't use the shader heap anymore, so that's good
19:47 fdobridge: <m​arysaka> random question: does anyone have any RE documentation around MME macros used by the proprietary driver?
19:47 fdobridge: <k​arolherbst🐧🦀> @gfxstrand yeah soo... the new SPH stuff is Turing+ 🥲
19:47 fdobridge: <k​arolherbst🐧🦀> I _hope_ the alignment stuff checks out for volta
19:47 fdobridge: <k​arolherbst🐧🦀> as the header is 0x50 in size
19:48 fdobridge: <g​fxstrand> That's not a big deal. It only really impacts shader upload
19:48 fdobridge: <m​arysaka> I think I check for sm75 so should be fine (edited)
19:48 fdobridge: <g​fxstrand> For the compiler side, we just assume the full size header and it's fine.
19:48 fdobridge: <g​fxstrand> We should maybe assert that the high bits are 0 or something
19:48 fdobridge: <k​arolherbst🐧🦀> yeah.. it's the same as previous gens
19:48 fdobridge: <k​arolherbst🐧🦀> just the alignment matters I think
19:48 fdobridge: <k​arolherbst🐧🦀> _should_ be fine
19:50 fdobridge: <g​fxstrand> Really, someone needs to spend a day with it to get the CTS passing, do a conformance submission, and then we can forget Volta exists.
19:50 fdobridge: <g​fxstrand> There's so few of them...
19:50 fdobridge: <k​arolherbst🐧🦀> yeah
19:50 fdobridge: <k​arolherbst🐧🦀> mood
19:51 fdobridge: <m​arysaka> there is Tegra but yeah not even sure there are many products using the Xavier series tbh...
19:51 fdobridge: <k​arolherbst🐧🦀> @gfxstrand uhm... let me check instruction support
19:52 fdobridge: <k​arolherbst🐧🦀> huh
19:52 fdobridge: <k​arolherbst🐧🦀> volta has `FADD32I` whatever that is
19:52 fdobridge: <k​arolherbst🐧🦀> oh wait.. that's just an alias...
19:53 fdobridge: <k​arolherbst🐧🦀> Volta doesn't have `FOOTPRINT`
19:54 fdobridge: <g​fxstrand> Yeah, FOOTPRINT is a Turing feature
19:54 fdobridge: <g​fxstrand> Volta seems to be where they tried out the fancy new ISA and then Turing was when they added the features. (edited)
19:54 fdobridge: <k​arolherbst🐧🦀> volta doesn't have saturating `I2I`
19:54 fdobridge: <g​fxstrand> Oh fun....
19:54 fdobridge: <g​fxstrand> Wait, does turing?
19:54 fdobridge: <k​arolherbst🐧🦀> yes
19:54 fdobridge: <g​fxstrand> We're not using it yet
19:55 fdobridge: <g​fxstrand> Probably should
19:56 fdobridge: <k​arolherbst🐧🦀> volta doesn't have `IDE` which is uhm.. something weird
19:56 fdobridge: <k​arolherbst🐧🦀> only relevant for shader debugging
19:56 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> `SATA` is better anyway /s
19:57 fdobridge: <k​arolherbst🐧🦀> @gfxstrand volta doesn't have `IMNMX`
19:58 fdobridge: <g​fxstrand> Oh, well that's annoying.
19:59 fdobridge: <g​fxstrand> Maxwell has IMNMX
19:59 fdobridge: <g​fxstrand> WTH
19:59 fdobridge: <k​arolherbst🐧🦀> mhhh
19:59 fdobridge: <k​arolherbst🐧🦀> maybe the name is different?
19:59 fdobridge: <k​arolherbst🐧🦀> `MUFU.TANH` is turing+
20:00 fdobridge: <k​arolherbst🐧🦀> seems like we don't use `IMNMX` for turing in codegen ;')
20:00 fdobridge: <k​arolherbst🐧🦀> RIP
20:01 fdobridge: <k​arolherbst🐧🦀> `GV100LegalizeSSA::handleIMNMX` 🥲
20:02 fdobridge: <g​fxstrand> RIP
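When a target has no integer min/max instruction, one common lowering is a compare followed by a select. The sketch below shows that idea on 32-bit values; the helper names are made up, and whether it matches what codegen's `handleIMNMX` emits has not been checked here.

```python
U32_MASK = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    x &= U32_MASK
    return x - (1 << 32) if x & 0x80000000 else x


def set_lt(a, b, signed):
    """Compare step: a < b under the chosen signedness."""
    if signed:
        return to_signed32(a) < to_signed32(b)
    return (a & U32_MASK) < (b & U32_MASK)


def imnmx_lowered(a, b, signed, want_min):
    """Select step: pick a or b based on the predicate from set_lt."""
    a_lt_b = set_lt(a, b, signed)
    pick_a = a_lt_b if want_min else not a_lt_b
    return (a if pick_a else b) & U32_MASK
```

The same bit pattern gives different answers depending on signedness: `0xffffffff` is the signed minimum of `(0xffffffff, 1)` but the unsigned maximum.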
20:03 fdobridge: <k​arolherbst🐧🦀> I think that's more or less the biggest stuff
20:03 fdobridge: <k​arolherbst🐧🦀> it's kinda a pain to compare 😄
20:04 fdobridge: <m​arysaka> that's odd, how do you debug stuff huh
20:07 fdobridge: <k​arolherbst🐧🦀> you just trust the c++ compiler
20:08 fdobridge: <d​adschoorse> I guess you just have to use a plain text editor instead? 🐸
20:09 fdobridge: <k​arolherbst🐧🦀> yeah, what else are you using on your HPC cluster anyway
20:30 fdobridge: <g​fxstrand> Okay, I've actually merged that one because it's a rename and conflicts with a bunch of the SM50 work. I've also rebased the sm50 branch on top of it.
20:32 fdobridge: <g​fxstrand> The new Volta MR lives at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26214 and I've assigned it to @karolherbst . Feel free to add patches to the nvk/volta branch to fix whatever you find and let me know when it's working. Or you can tell me about a bug and ask me to type a patch. That works, too. I just don't have the HW. I put the new branch in nouveau/mesa so it's a bit easier for us to both push to it.
20:35 fdobridge: <k​arolherbst🐧🦀> cool
20:35 fdobridge: <k​arolherbst🐧🦀> nah, I can write some code once in a while :ferrisUpsideDown:
20:46 fdobridge: <a​irlied> Well either I will find robert in next hour or I will do Volta support when I get home 🙂
20:50 fdobridge: <k​arolherbst🐧🦀> 😄
21:02 fdobridge: <g​fxstrand> Is NVK going to be the first conformant open-source Khronos API implementation on NVIDIA hardware?
21:02 fdobridge: <g​fxstrand> Did we ever do GL or GLES on nouveau GL?
21:10 fdobridge: <k​arolherbst🐧🦀> uhm...
21:10 fdobridge: <k​arolherbst🐧🦀> no
21:10 fdobridge: <k​arolherbst🐧🦀> for $reasons
21:10 fdobridge: <k​arolherbst🐧🦀> like
21:11 fdobridge: <g​fxstrand> Okay, cool. More stuff I can brag about in my upcoming blog post. 😅
21:11 fdobridge: <k​arolherbst🐧🦀> driver is dying randomly 🥲
21:11 fdobridge: <g​fxstrand> Well, yeah, that'd be a reason. 😝
21:11 fdobridge: <k​arolherbst🐧🦀> I think a normal GL CTS run passes like 99.9%
21:11 fdobridge: <k​arolherbst🐧🦀> and I had patches to fix the rest
21:11 fdobridge: <k​arolherbst🐧🦀> but doesn't help if the GPU dies randomly
21:12 fdobridge: <k​arolherbst🐧🦀> worst part is that it mostly happens after a couple of hours
21:13 fdobridge: <k​arolherbst🐧🦀> it's a pain
21:17 HdkR: the imnmx missing in Volta thing is fun. it's like, oops missed a couple of instructions in the new ISA and then they get added back the next generation :P
21:22 fdobridge: <g​fxstrand> Yeah... That's very much the way it looks.
21:23 fdobridge: <g​fxstrand> "We reworked the ISA for you."
21:23 fdobridge: <g​fxstrand> "WTF did you put IMNMX?!?"
21:23 fdobridge: <g​fxstrand> "Oh, right... we forgot about that one. Sorry!"
21:28 fdobridge: <g​fxstrand> Missing both IMNMX and saturating I2I is kinda brutal, NGL.
21:32 HdkR: Silver lining that they started throwing way more ALUs on the cards, so it can handle missing a few while they get added back
21:33 fdobridge: <g​fxstrand> Yeah
21:34 fdobridge: <k​arolherbst🐧🦀> it also misses `I2IP` but that probably doesn't matter much for now
21:35 fdobridge: <k​arolherbst🐧🦀> or well.. that's also saturating so it makes sense
21:35 fdobridge: <k​arolherbst🐧🦀> wait a sec...
21:36 fdobridge: <k​arolherbst🐧🦀> there are `I2IP` variants with `S4`/`U4` dest formats...
21:36 fdobridge: <k​arolherbst🐧🦀> on turing
21:36 fdobridge: <k​arolherbst🐧🦀> and `S2`/`U2`
21:36 fdobridge: <k​arolherbst🐧🦀> huh...
21:36 fdobridge: <k​arolherbst🐧🦀> that instruction is like super weird
21:37 fdobridge: <g​fxstrand> Wait, what?!? Why do you want S4?
21:37 fdobridge: <g​fxstrand> I mean, why not but also why?
21:37 fdobridge: <g​fxstrand> Maybe for some crazy S4 machine learning stuff?
21:38 fdobridge: <k​arolherbst🐧🦀> it's super odd.. let me say how it works
21:38 fdobridge: <k​arolherbst🐧🦀> if you do `4` you get the lower 24 bits from src2 into the upper 24 bits of the dest, with `2` 28 bits
21:39 fdobridge: <k​arolherbst🐧🦀> so src0 gets placed at 0, src1 at + size and then filled with src2
21:40 fdobridge: <g​fxstrand> Okay, that might almost be useful
21:40 fdobridge: <k​arolherbst🐧🦀> `8` also allows you to select `HI`/`LO` of src2
21:40 fdobridge: <k​arolherbst🐧🦀> but yeah.. no idea why 2 and 4 😄
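A sketch of the packing as described in the messages above: src0 saturated into the lowest field, src1 into the next field, and the remaining upper bits filled from the low bits of src2. This is an illustration of the description only, not a verified model of `I2IP`; the function names are invented and the `HI`/`LO` selection for 8-bit fields is not modelled.

```python
def saturate(value, bits, signed):
    """Clamp an integer to the range of a signed or unsigned bits-wide field."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))


def i2ip_pack(src0, src1, src2, bits, signed):
    """Pack two saturated fields plus fill bits into one 32-bit word."""
    field_mask = (1 << bits) - 1
    fill_bits = 32 - 2 * bits          # 24 for 4-bit fields, 28 for 2-bit
    lo = saturate(src0, bits, signed) & field_mask
    hi = saturate(src1, bits, signed) & field_mask
    fill = src2 & ((1 << fill_bits) - 1)
    return lo | (hi << bits) | (fill << (2 * bits))
```

For `S4`, packing 100 and -100 with a src2 of `0xabcdef` yields `0xabcdef87`: 100 saturates to 7, -100 saturates to -8 (bit pattern `0x8`), and the 24 fill bits land above them.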
21:41 fdobridge: <k​arolherbst🐧🦀> let's see if ptx has something there
21:42 fdobridge: <k​arolherbst🐧🦀> maybe that's `e4m3x2` and `e5m2x2` stuff?
21:42 fdobridge: <k​arolherbst🐧🦀> aren't those weirdo float formats?
21:42 fdobridge: <k​arolherbst🐧🦀> no idea
21:43 fdobridge: <k​arolherbst🐧🦀> `Sub byte types (.u4/.s4 and .u2/.s2) requires sm_75 or higher.` ahh
21:43 fdobridge: <k​arolherbst🐧🦀> so yeah.. PTX has those natively as well
21:43 fdobridge: <k​arolherbst🐧🦀> `cvt.pack`
21:43 fdobridge: <k​arolherbst🐧🦀> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt-pack
21:44 fdobridge: <a​irlied> Some of those are for tensors but also I think some got removed in hopper
21:46 fdobridge: <k​arolherbst🐧🦀> I hate it when vendors remove changelogs 😄
21:47 fdobridge: <k​arolherbst🐧🦀> ahh found it
21:48 fdobridge: <k​arolherbst🐧🦀> yeah.. sounds like tensor stuff
22:12 fdobridge: <b​enjaminl> curious what other people's workflows are for testing runtime behavior when REing shader instructions
22:12 fdobridge: <b​enjaminl> normally I've been doing this by sticking the encoding in NAK and then trying to write a glsl shader that causes it to emit the instruction I'm interested in, but this can be a pain for some stuff
22:13 fdobridge: <b​enjaminl> currently want to test signed IMUL behavior for https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/255#note_2169502, and haven't found a way to generate it from glsl yet
22:20 fdobridge: <g​fxstrand> https://gitlab.freedesktop.org/gfxstrand/mesa/-/commits/nak/hw-tests
22:21 fdobridge: <g​fxstrand> It's not in great shape and I'd like to replace it with something that involves less back-doors (maybe just submit a compute shader directly?) but that's what I did to figure out IAdd3
22:22 fdobridge: <k​arolherbst🐧🦀> new wiki is live :ferrisSmug:
22:23 fdobridge: <b​enjaminl> ooh, that looks very helpful, thanks!
22:36 fdobridge: <g​fxstrand> I'm not actually fundamentally opposed to the back-door. I more struggled with the fact that I didn't have a good way to get the GPU generation or SM out.
22:47 fdobridge: <g​fxstrand> I could probably patch that through `deviceUUID`
22:53 fdobridge: <e​sdrastarsis> Strange Brigade was working on nak/main branch but it crashes on mesa upstream
22:53 fdobridge: <g​fxstrand> backtrace?
22:56 fdobridge: <e​sdrastarsis> How? I'm using proton
22:57 fdobridge: <g​fxstrand> Good question. I, unfortunately, don't know the answer. I've heard a rumor that GDB works but I've never had a good time with it.
22:57 fdobridge: <g​fxstrand> Do you see a panic?
22:58 fdobridge: <e​sdrastarsis> Yeah, an assertion failure in winevulkan
23:22 fdobridge: <g​fxstrand> 😭
23:30 fdobridge: <d​adschoorse> that should at least tell you the vk function that's failing