09:07 fdobridge: <k​arolherbst🐧🦀> if you thought supporting nv30 is absurd, today we got a bug against nv10 from a user still using the hardware :ferrisUpsideDown:
09:13 fdobridge: <k​arolherbst🐧🦀> but they also submitted the fix, so I guess that's good 😄
09:20 fdobridge: <p​ac85> I've got this thing, not sure what it is
09:20 fdobridge: <p​ac85> https://cdn.discordapp.com/attachments/1034184951790305330/1165942747706970122/IMG_20231023_111941.jpg?ex=6548afcd&is=65363acd&hm=0f5082e1246c8c55bd67baccdad6e60deb8efbb238bb75617428162e042a1bb1&
09:20 fdobridge: <k​arolherbst🐧🦀> probably fx 5200 or something
09:20 fdobridge: <k​arolherbst🐧🦀> that's nv30 anyway
09:21 fdobridge: <p​ac85> Uh I see thx
09:21 fdobridge: <k​arolherbst🐧🦀> I have the exact same one 😄
09:21 fdobridge: <k​arolherbst🐧🦀> it has an nv35 label on it
09:22 fdobridge: <p​ac85> Is there something interesting it can do? Seems like it can only support gl 2.1
09:22 fdobridge: <k​arolherbst🐧🦀> running gnome
09:23 fdobridge: <p​ac85> That's something
09:26 fdobridge: <k​arolherbst🐧🦀> yeah.. we fixed it this year and last year, as it had been broken since forever 😄
09:27 fdobridge: <k​arolherbst🐧🦀> https://gitlab.freedesktop.org/mesa/mesa/-/commit/7908cb895e2dea8cff91cecb53457a099dc96b07
09:28 fdobridge: <p​ac85> Oh wow, wouldn't have thought there would be some interest in such old hw
09:28 fdobridge: <k​arolherbst🐧🦀> and the most embarrassing fix https://gitlab.freedesktop.org/mesa/mesa/-/commit/1387d1d41103b3120d40f93f66a7cfe00304bfd7
09:29 fdobridge: <p​ac85> Uh lol
09:29 fdobridge: <k​arolherbst🐧🦀> it was broken since the initial code drop 😄
09:36 fdobridge: <c​onan_kudo> :facepalmsonic:
09:36 fdobridge: <p​ac85> Whoooops
09:37 fdobridge: <p​ac85> Well, nv30 users can finally take advantage of the full potential of their hw
09:48 fdobridge: <k​arolherbst🐧🦀> yeah.. the main reason it wasn't found earlier was that desktops weren't hitting that code path before
09:49 fdobridge: <p​ac85> Uhm I see
12:21 fdobridge: <d​wlsalmeida> @vdpafaor I am working on a refactor of sm50 to eliminate encode_alu() and friends, maybe wait for this if you want to contribute on top?
13:00 fdobridge: <g​fxstrand> I should probably take a look at the sm50 branch soon. It looks like we're having to add a bunch of new versions of things and I'd like to have a more unified plan for how we're going to do that.
13:01 fdobridge: <g​fxstrand> I think OpLd and OpSt should be fine but I know a bunch of the ALU is different.
13:10 fdobridge: <k​arolherbst🐧🦀> yeah.. probably a good idea
13:11 fdobridge: <k​arolherbst🐧🦀> some instructions get a new name, but sometimes you get added/removed support for some bit sizes or flags or other funky details
13:11 fdobridge: <g​fxstrand> Yeah
13:11 fdobridge: <k​arolherbst🐧🦀> it's one of the biggest pain points of codegen and I'd like to not run into the same mistakes
13:11 fdobridge: <g​fxstrand> Yup
13:11 fdobridge: <g​fxstrand> And that's why I want to apply my editorial voice a bit
13:12 fdobridge: <g​fxstrand> I think the best thing to do at the moment is for the folks working on sm50 to just liberally insert SM checks in nak_from_nir.rs and plan to figure out how to unify later.
13:13 fdobridge: <k​arolherbst🐧🦀> NAK could also work in two stages... first to do passes which work on all gens, and then lower/convert to arch specific.. but that could add a lot of code and complexity, but might be better than trying to force all ISAs into a common IR
13:14 fdobridge: <k​arolherbst🐧🦀> like most of the UBO pulls can just be moved into registers across all gens, might just require some callbacks to check if it's allowed for specific archs (or they get unpulled later) but yeah.. not sure what would be the best approach overall
13:17 fdobridge: <g​fxstrand> Yeah... I want to see it all in front of me before trying to come up with a grand unified plan that turns out to be busted.
13:18 fdobridge: <g​fxstrand> Premature unification is about as useful as premature optimization
13:18 fdobridge: <k​arolherbst🐧🦀> yeah
13:24 fdobridge: <g​fxstrand> My original plan was to "unify" on Turing but Turing is enough more clever than some of the older generations that IDK that we really want to do that, at least for integer ops.
13:24 fdobridge: <k​arolherbst🐧🦀> Turing is kinda weird, because it's like the same ISA compared to Volta, just with more stuff
13:24 fdobridge: <k​arolherbst🐧🦀> but
13:24 fdobridge: <g​fxstrand> Float should be fine. There's not a lot of clever there.
13:25 fdobridge: <k​arolherbst🐧🦀> Volta is different enough compared to the previous stuff
13:25 fdobridge: <k​arolherbst🐧🦀> it probably makes sense to have some ISA groups
13:25 fdobridge: <k​arolherbst🐧🦀> like SM1x, SM2x + SM30, SM35, SM5x + SM6x, SM7x+
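The ISA groups suggested above could be encoded as a small lookup so that per-SM checks test a group rather than scattered version numbers. This is a minimal sketch: the `IsaGroup` enum and `isa_group` function are hypothetical names, versions the chat does not mention (e.g. SM32, SM37) deliberately return `None`, and placing SM90 (Hopper) in the SM7x+ group is an assumption, since how much Hopper differs was left open.

```rust
/// ISA families as grouped in the discussion (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IsaGroup {
    Sm1x,     // SM1x (Tesla)
    Sm2xSm30, // SM2x + SM30 (Fermi, first Kepler)
    Sm35,     // SM35 (later Kepler)
    Sm5xSm6x, // SM5x + SM6x (Maxwell, Pascal)
    Sm7xPlus, // SM7x and later (Volta, Turing, ...)
}

/// Map a numeric SM version (e.g. 52 for SM52) to its ISA group.
/// Returns `None` for versions the grouping does not mention.
fn isa_group(sm: u32) -> Option<IsaGroup> {
    match sm {
        10..=19 => Some(IsaGroup::Sm1x),
        20..=29 | 30 => Some(IsaGroup::Sm2xSm30),
        35 => Some(IsaGroup::Sm35),
        50..=69 => Some(IsaGroup::Sm5xSm6x),
        // ASSUMPTION: everything from SM70 on, including SM90, lands here.
        s if s >= 70 => Some(IsaGroup::Sm7xPlus),
        _ => None,
    }
}
```

A lowering pass would then `match isa_group(sm)` once instead of repeating `sm >= 50 && sm < 70` style checks.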
13:26 fdobridge: <d​wlsalmeida> @gfxstrand I am merely removing encode_alu in favor of set_field() and friends, a-la sm75, I thought we had agreed on that?
13:26 fdobridge: <d​wlsalmeida> i.e. that it was more typing, but more clear in the end?
13:26 fdobridge: <k​arolherbst🐧🦀> not sure how much Hopper differs with its SM90
13:28 fdobridge: <g​fxstrand> Yeah, that's all fine
13:28 fdobridge: <g​fxstrand> I'm more concerned with things like where Maxwell doesn't have LOP3 and we have to do something else.
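For context on the LOP3 concern: a LOP3-style op computes any 3-input bitwise function from an 8-bit lookup table. The sketch below derives the LUT the way the PTX documentation describes for `lop3.b32` (evaluate the function on `0xF0`, `0xCC`, `0xAA` for a, b, c) and shows one naive fallback where no 3-input op is usable: OR together one AND-minterm per set LUT bit, using only 2-input AND/OR/NOT. The function names are hypothetical, and this is not how NAK or codegen actually lowers it; a real pass would pick much cheaper per-LUT sequences.

```rust
/// Compute the 8-bit LUT of a 3-input bitwise function by evaluating it on
/// the constants 0xF0 (a), 0xCC (b) and 0xAA (c).
fn lut_for(f: impl Fn(u8, u8, u8) -> u8) -> u8 {
    f(0xF0, 0xCC, 0xAA)
}

/// Evaluate a LOP3-style op using only 2-input AND/OR/NOT:
/// sum of minterms, one per set bit of the LUT.
fn lop3_fallback(a: u32, b: u32, c: u32, lut: u8) -> u32 {
    let mut result = 0u32;
    for i in 0..8u32 {
        if (lut >> i) & 1 == 0 {
            continue;
        }
        // Minterm i: bit 2 of i selects a, bit 1 selects b, bit 0 selects c.
        let ta = if i & 4 != 0 { a } else { !a };
        let tb = if i & 2 != 0 { b } else { !b };
        let tc = if i & 1 != 0 { c } else { !c };
        result |= ta & tb & tc;
    }
    result
}
```

For example, `lut_for(|a, b, c| a ^ b ^ c)` yields `0x96`, and `lop3_fallback(a, b, c, 0x96)` equals `a ^ b ^ c`.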
13:29 fdobridge: <g​fxstrand> Shifts are another strange one
13:29 fdobridge: <g​fxstrand> Really, all the integer stuff gets funky
13:29 fdobridge: <k​arolherbst🐧🦀> yeah...
13:29 fdobridge: <g​fxstrand> All the core ones, anyway
13:29 fdobridge: <k​arolherbst🐧🦀> and then you have funky exceptions like fma with an imm32 as this affects RA
13:29 fdobridge: <g​fxstrand> On Turing, they got really clever with ways to make int64 fast(ish) and they're all a bit clever.
13:31 fdobridge: <m​arysaka> Kind of tried to make all the SM5x/SM6x related changes to common code separate to the actual encoder stuffs so hopefully it's not too much of a mess for you to review 😅
13:46 fdobridge: <g​fxstrand> @karolherbst I just learned who is to blame for the global load/store helpers bit. 😂
13:46 fdobridge: <k​arolherbst🐧🦀> I hope it's me
13:48 fdobridge: <g​fxstrand> Oh, I mean who at NVIDIA added the bit.
13:48 fdobridge: <k​arolherbst🐧🦀> ahhh
13:48 fdobridge: <k​arolherbst🐧🦀> I see
13:49 fdobridge: <k​arolherbst🐧🦀> ohh that helper invoc stuff?
13:49 fdobridge: <g​fxstrand> Yup
13:49 fdobridge: <g​fxstrand> I'm at a Khronos F2F and was chatting w/ folks and someone admitted to adding that bit. 😂
13:50 fdobridge: <k​arolherbst🐧🦀> 😄
13:50 fdobridge: <k​arolherbst🐧🦀> I hope they had a good reason?
13:51 fdobridge: <g​fxstrand> Mostly that they weren't sure where stuff was going to fall with the spec so they left themselves options.
13:51 fdobridge: <k​arolherbst🐧🦀> ahh
13:51 fdobridge: <k​arolherbst🐧🦀> that's pretty reasonable actually
14:01 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> What SMs can support D3D SM6 shaders? 😅
14:02 fdobridge: <k​arolherbst🐧🦀> I have no idea
14:02 fdobridge: <k​arolherbst🐧🦀> what's D3D SM6 anyway?
14:02 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/hlsl-shader-model-6-0-features-for-direct3d-12
14:03 fdobridge: <k​arolherbst🐧🦀> SM35 I'd guess?
14:03 fdobridge: <k​arolherbst🐧🦀> SM35 added shuffle
14:03 fdobridge: <k​arolherbst🐧🦀> but no half stuff.. uhh
14:03 fdobridge: <k​arolherbst🐧🦀> mhh
14:03 fdobridge: <k​arolherbst🐧🦀> yeah dunno
14:06 fdobridge: <g​fxstrand> Ugh... No JOINAT on Turing+
14:07 fdobridge: <k​arolherbst🐧🦀> well
14:07 fdobridge: <k​arolherbst🐧🦀> there is
14:08 fdobridge: <k​arolherbst🐧🦀> @gfxstrand `BSSY` and `BSYNC`
14:08 fdobridge: <k​arolherbst🐧🦀> it uses barriers starting with Volta
14:09 fdobridge: <k​arolherbst🐧🦀> it kinda works the same even
14:16 fdobridge: <g​fxstrand> Yeah
14:16 fdobridge: <g​fxstrand> I need to figure out how that stuff works for realz
14:16 fdobridge: <k​arolherbst🐧🦀> well.. it's simple 😛
14:16 fdobridge: <g​fxstrand> It's "simple" until you realize we only have 15 barrier registers
14:16 fdobridge: <k​arolherbst🐧🦀> just do it around the outermost cfg
14:17 fdobridge: <g​fxstrand> Well, sure, I could do that
14:17 fdobridge: <k​arolherbst🐧🦀> but you can also spill barriers to regs
14:17 fdobridge: <k​arolherbst🐧🦀> `B2R` and `R2B`
14:17 fdobridge: <g​fxstrand> For that matter, I could put a WARPSYNC at any point where we re-converge back to uniform control-flow
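A rough model of the "simple" scheme discussed above: wrap each divergent `if` in a BSSY/BSYNC pair, use the nesting depth as the barrier index (structured regions nest strictly, so depth works like a stack), and stop reconverging once the 15 barrier registers mentioned above run out, so only the outer levels get barriers. The IR types are hypothetical, branches and labels are omitted, and spilling with `B2R`/`R2B` or falling back to WARPSYNC is not modeled.

```rust
/// Barrier registers available, per the discussion (15).
const NUM_BARRIERS: usize = 15;

/// Hypothetical structured IR: straight-line ops and if/else.
enum Node {
    Op(&'static str),
    If { then_body: Vec<Node>, else_body: Vec<Node> },
}

/// Hypothetical output stream (branches and labels omitted).
#[derive(Debug, PartialEq)]
enum Out {
    Bssy(u8),  // arm convergence barrier
    Bsync(u8), // wait on it at the merge point
    Op(&'static str),
}

fn emit(nodes: &[Node], depth: usize, out: &mut Vec<Out>) {
    for node in nodes {
        match node {
            Node::Op(s) => out.push(Out::Op(*s)),
            Node::If { then_body, else_body } => {
                // Use the nesting depth as the barrier index while registers last;
                // deeper regions simply don't reconverge.
                let bar = if depth < NUM_BARRIERS { Some(depth as u8) } else { None };
                if let Some(b) = bar {
                    out.push(Out::Bssy(b));
                }
                emit(then_body, depth + 1, out);
                emit(else_body, depth + 1, out);
                if let Some(b) = bar {
                    out.push(Out::Bsync(b));
                }
            }
        }
    }
}
```

Calling `emit(&program, 0, &mut out)` on a two-level nest yields `Bssy(0) … Bssy(1) … Bsync(1) … Bsync(0)`.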
14:18 fdobridge: <k​arolherbst🐧🦀> right...
14:18 fdobridge: <k​arolherbst🐧🦀> I think in codegen we've done it like two levels into the loop or something
14:18 fdobridge: <k​arolherbst🐧🦀> it kinda makes sense to converge on loop iterations so
15:34 fdobridge: <m​henning> iirc, codegen always reconverges loops before gv100 because it's fundamental to how the control flow works
15:35 fdobridge: <m​henning> it has a limit on how deep we nest reconvergence for if statements because we don't size the hardware stack correctly (also pre-gv100)
15:35 fdobridge: <k​arolherbst🐧🦀> it's broken though :ferrisUpsideDown:
15:35 fdobridge: <m​henning> post gv100, I'm pretty sure there's just no reconvergence at all
15:36 fdobridge: <k​arolherbst🐧🦀> we execute `warpsync` which kinda does the same, just in a different way or something
15:37 fdobridge: <m​henning> right, but not on control flow - only when some other operator needs it
15:37 fdobridge: <k​arolherbst🐧🦀> yeah
15:37 fdobridge: <k​arolherbst🐧🦀> doing it in control flow is mostly a perf optimization
15:37 fdobridge: <k​arolherbst🐧🦀> I have code to actually support it for volta+, just it never fixed anything
15:37 fdobridge: <k​arolherbst🐧🦀> and doing benchmarks is kinda.. uhh.. hard
15:38 fdobridge: <k​arolherbst🐧🦀> but the infra is mostly there. I've added the barrier stuff for quadop operations for 3d texgrad lowering
15:50 fdobridge: <m​henning> at any rate, I think a reasonable design here would be to have two different passes to insert control flow pre-gv100 and post-gv100
15:50 fdobridge: <m​henning> the control flow models are different enough that I think it's cleanest to convert to one or the other starting at nir's structured control flow
15:50 fdobridge: <m​henning> which would probably mean splitting the conversion from nir into two stages: 1) convert basic blocks 2) insert control flow
15:51 fdobridge: <k​arolherbst🐧🦀> yeah... potentially. Though you can map the pre volta thing pretty easily to the volta thing _if_ you make them use barriers and just throw them away pre volta
15:51 fdobridge: <k​arolherbst🐧🦀> or something
15:52 fdobridge: <m​henning> Oh, you mean insert both and discard the one you don't need?
15:52 fdobridge: <k​arolherbst🐧🦀> yeah
15:52 fdobridge: <k​arolherbst🐧🦀> that's how I kinda did it in codegen
15:52 fdobridge: <k​arolherbst🐧🦀> `popon`/`popoff` always emits a barrier and I just handle it arch specific later
15:53 fdobridge: <k​arolherbst🐧🦀> ehh
15:53 fdobridge: <k​arolherbst🐧🦀> `quadon` and `quadpop`?
15:54 fdobridge: <k​arolherbst🐧🦀> though I think I only insert the barrier for gm107+
15:54 fdobridge: <k​arolherbst🐧🦀> was kinda the easiest way to deal with it 😄
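A minimal sketch of the "insert both and discard the one you don't need" idea: the arch-neutral IR always carries a barrier index on its reconvergence markers, and a late per-arch pass either maps them to BSSY/BSYNC (SM70 and up, since barriers start with Volta) or throws the index away and emits pre-Volta stack-based ops, named here after codegen's JOINAT/JOIN. All types and names are hypothetical; this is neither codegen's nor NAK's actual code.

```rust
/// Arch-neutral IR: reconvergence always names a barrier.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    ReconvBegin { bar: u8 },
    ReconvEnd { bar: u8 },
    Alu(&'static str),
}

/// Arch-specific output.
#[derive(Debug, Clone, PartialEq)]
enum Hw {
    JoinAt,            // pre-Volta: push a reconvergence point on the hardware stack
    Join,              // pre-Volta: reconverge at it
    Bssy { bar: u8 },  // Volta+: arm convergence barrier `bar`
    Bsync { bar: u8 }, // Volta+: wait on it
    Alu(&'static str),
}

/// Late lowering: keep the barrier on Volta+, discard it before.
fn lower(ops: &[Op], sm: u32) -> Vec<Hw> {
    let volta_plus = sm >= 70;
    ops.iter()
        .map(|op| match (op, volta_plus) {
            (Op::ReconvBegin { bar }, true) => Hw::Bssy { bar: *bar },
            (Op::ReconvEnd { bar }, true) => Hw::Bsync { bar: *bar },
            (Op::ReconvBegin { .. }, false) => Hw::JoinAt,
            (Op::ReconvEnd { .. }, false) => Hw::Join,
            (Op::Alu(s), _) => Hw::Alu(*s),
        })
        .collect()
}
```

The design choice is that everything before this pass is arch-agnostic; the cost is carrying barrier indices that pre-Volta targets never use.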
16:10 benjaminl: marysaka: do you want me to MR the sm50 stuff I was working on over the weekend to your branch? I know there was some discussion about making a branch on mesa/mesa earlier, but don't remember how that landed
16:20 fdobridge: <m​arysaka> you mean on nouveau/mesa?
16:21 fdobridge: <m​arysaka> but sure you can MR that on my branch 👍
16:32 benjaminl: for commit organization, I know https://docs.mesa3d.org/submittingpatches.html says that commits should be bisectable. In this case, we're implementing a thing that didn't previously work, so I'm assuming that just means "don't have a commit that breaks the build or regresses non-sm50 stuff"?
16:42 fdobridge: <o​rowith2os> @karolherbst got something to fact check with you
16:42 fdobridge: <o​rowith2os> > fun fact: NVK actually uses repr(C) to be able to get slices of side-by-side fields
16:42 fdobridge: <o​rowith2os> > (or *used*; I dunno if that stuck around)
16:42 fdobridge: <o​rowith2os> courtesy of Refi
16:43 fdobridge: <k​arolherbst🐧🦀> yeah, that's required
16:43 fdobridge: <k​arolherbst🐧🦀> Rust's default repr can reorder fields
16:43 fdobridge: <k​arolherbst🐧🦀> I actually ran into that issue in rusticl, because I forgot it on one type
16:43 fdobridge: <k​arolherbst🐧🦀> and the offset of the first field became non-zero with a rustc update
16:44 fdobridge: <k​arolherbst🐧🦀> but these days I don't use repr(C) anymore unless for things C code interacts with
16:44 fdobridge: <k​arolherbst🐧🦀> @gfxstrand even talked about why that is in the NAK XDC talk
16:45 fdobridge: <k​arolherbst🐧🦀> tldr: being able to declare dests/src in a struct with names without using arrays, but still being able to get slices over all of them
16:46 fdobridge: <k​arolherbst🐧🦀> so e.g.
16:46 fdobridge: <k​arolherbst🐧🦀> ```rust
16:46 fdobridge: <k​arolherbst🐧🦀> #[repr(C)]
16:46 fdobridge: <k​arolherbst🐧🦀> struct R {
16:46 fdobridge: <k​arolherbst🐧🦀> a: Src,
16:46 fdobridge: <k​arolherbst🐧🦀> b: Src,
16:46 fdobridge: <k​arolherbst🐧🦀> }
16:46 fdobridge: <k​arolherbst🐧🦀> ```
16:46 fdobridge: <k​arolherbst🐧🦀> and then some proc macro magic adds a `R::srcs_as_slice` method which returns `&[Src]`
16:46 fdobridge: <k​arolherbst🐧🦀> names might be different
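A hand-written sketch of what such a generated method could look like, assuming the struct holds only `Src` fields. `Src` here is a stand-in type and `srcs_as_slice`/`srcs_as_mut_slice` are hypothetical names (as noted, the real names may differ). `#[repr(C)]` pins declaration order, and fields of one type sit back to back with no padding, so the struct has the layout of `[Src; 2]`. The pointer is derived from the whole struct, not from field `a`, so it is valid for both fields.

```rust
/// Stand-in for NAK's source operand type (hypothetical; the real one is richer).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Src(u32);

/// Without repr(C), rustc would be free to reorder `a` and `b`.
#[repr(C)]
struct R {
    a: Src,
    b: Src,
}

impl R {
    const NUM_SRCS: usize = 2;

    fn srcs_as_slice(&self) -> &[Src] {
        // SAFETY: `R` is repr(C) and contains only `Src` fields, so it has
        // the same layout as `[Src; NUM_SRCS]`.
        unsafe { std::slice::from_raw_parts(self as *const R as *const Src, Self::NUM_SRCS) }
    }

    fn srcs_as_mut_slice(&mut self) -> &mut [Src] {
        // SAFETY: same layout argument; `&mut self` guarantees exclusivity.
        unsafe { std::slice::from_raw_parts_mut(self as *mut R as *mut Src, Self::NUM_SRCS) }
    }
}

// Compile-time check that no padding sneaked in.
const _: () = assert!(std::mem::size_of::<R>() == R::NUM_SRCS * std::mem::size_of::<Src>());
```

Passes can then iterate `r.srcs_as_mut_slice()` generically while instruction definitions keep readable field names.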
16:59 benjaminl: https://gitlab.freedesktop.org/benjaminl/envyfuzz also wrote this for REing instruction encodings, possible it would be useful to other people
17:14 fdobridge: <m​arysaka> benjaminl: you might want to set SM53 instead of SM52 for fp16 stuffs
17:15 fdobridge: <m​arysaka> but nice thing!
17:21 benjaminl: thanks!
17:23 benjaminl: good catch with SM53. I forgot that I had changed the sm50 input to map to SM52 in the first place. *probably* the thing that makes the most sense here is to just allow arbitrary sm[57][0-9] on the CLI and parse it internally
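A minimal sketch of accepting arbitrary `sm[57][0-9]` strings and parsing them internally, without a regex crate. `parse_sm` is a hypothetical helper; envyfuzz's actual CLI handling is not shown in this log.

```rust
/// Parse an `sm[57][0-9]` argument (e.g. "sm53", "sm75") into a numeric SM version.
/// Returns `None` for anything else, including other families like "sm60".
fn parse_sm(arg: &str) -> Option<u8> {
    let digits = arg.strip_prefix("sm")?.as_bytes();
    if digits.len() != 2 {
        return None;
    }
    let (major, minor) = (digits[0], digits[1]);
    if !matches!(major, b'5' | b'7') || !minor.is_ascii_digit() {
        return None;
    }
    Some((major - b'0') * 10 + (minor - b'0'))
}
```

With this, `sm53` (needed for the fp16 encodings) and `sm52` are just different inputs rather than separate hard-coded cases.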
17:25 fdobridge: <g​fxstrand> Yup and the `DstsAsSlice` and `SrcsAsSlice` implementation macros assert that it's `repr(C)`
17:26 benjaminl: are sm70 and sm75 mostly the same encoding?
17:27 fdobridge: <k​arolherbst🐧🦀> yes
17:27 fdobridge: <k​arolherbst🐧🦀> sm75 added uniform regs/preds
17:27 fdobridge: <k​arolherbst🐧🦀> probably the biggest change
19:45 fdobridge: <g​fxstrand> Ugh... `gl_SubgroupId`...
19:47 fdobridge: <k​arolherbst🐧🦀> what about it?
19:47 fdobridge: <g​fxstrand> It requires us to understand the mapping from `gl_InvocationIndex` to subgroups
19:48 fdobridge: <g​fxstrand> Looking at dumps out of the blob, they seem to change the calculation entirely depending on workgroup size. I'm still trying to figure out the details
19:48 fdobridge: <k​arolherbst🐧🦀> huh?
19:48 fdobridge: <g​fxstrand> I really wish it were just a sysval
19:48 fdobridge: <k​arolherbst🐧🦀> there are sys vals for those tho
19:49 fdobridge: <g​fxstrand> for `gl_SubgroupId`? I don't think so
19:49 fdobridge: <k​arolherbst🐧🦀> SR0 bits 4:0
19:49 fdobridge: <k​arolherbst🐧🦀> ohh wait, that's thread id within a subgroup
19:49 fdobridge: <g​fxstrand> Yeah
19:49 fdobridge: <k​arolherbst🐧🦀> warp id is what you want.. right
19:50 fdobridge: <k​arolherbst🐧🦀> Sr3 bits 14:8
19:50 fdobridge: <g​fxstrand> sr3?
19:50 fdobridge: <k​arolherbst🐧🦀> yeah.. system value 3
19:50 fdobridge: <k​arolherbst🐧🦀> the r stands for register
19:50 fdobridge: <g​fxstrand> right
19:51 fdobridge: <k​arolherbst🐧🦀> but not sure how that all works out in detail, because there is some "virtual" thing going on
19:51 fdobridge: <k​arolherbst🐧🦀> so 14:8 is the virtual warp id whatever that means
19:51 fdobridge: <k​arolherbst🐧🦀> fun fact
19:51 fdobridge: <k​arolherbst🐧🦀> SR0 and SR3 both have the lane id at 4:0
19:52 fdobridge: <k​arolherbst🐧🦀> SR3 19:16 is the CTA id
19:52 fdobridge: <k​arolherbst🐧🦀> and then 28:20 has the SM id
19:53 fdobridge: <k​arolherbst🐧🦀> ohh there are two more bits for the CTA id at 30:29
19:53 fdobridge: <k​arolherbst🐧🦀> hope that helps :ferrisUpsideDown:
19:53 fdobridge: <k​arolherbst🐧🦀> there is also like the actual thread id at SR32+
19:53 fdobridge: <k​arolherbst🐧🦀> but those you should already know about
19:54 fdobridge: <k​arolherbst🐧🦀> SR3 just is very nvidia specific stuff
19:57 fdobridge: <g​fxstrand> So, SR32 is TID and SR33..36 appear to be `gl_LocalInvocationId` which isn't quite what I want
19:58 fdobridge: <k​arolherbst🐧🦀> not quite
19:58 fdobridge: <k​arolherbst🐧🦀> SR32 and SR33..35 are the same
19:58 fdobridge: <k​arolherbst🐧🦀> just
19:59 fdobridge: <k​arolherbst🐧🦀> SR32 packs all dimensions
19:59 fdobridge: <g​fxstrand> Right
19:59 fdobridge: <k​arolherbst🐧🦀> SR32..35 is just the thread id within a block
19:59 fdobridge: <k​arolherbst🐧🦀> a.k.a. CTA
19:59 fdobridge: <k​arolherbst🐧🦀> and CTA id is at SR37..39 a.k.a. grid
20:00 fdobridge: <k​arolherbst🐧🦀> Warps == subgroups is at SR3
20:00 fdobridge: <k​arolherbst🐧🦀> nvidia even has a system value for the warp size
20:00 fdobridge: <k​arolherbst🐧🦀> it's just always 32
20:00 fdobridge: <g​fxstrand> hehe
20:00 fdobridge: <k​arolherbst🐧🦀> at SR2 5:0
20:00 fdobridge: <k​arolherbst🐧🦀> SR2 15:8 is the number of warps per SM
20:00 fdobridge: <g​fxstrand> What exactly is in SR3?
20:01 fdobridge: <k​arolherbst🐧🦀> a couple of things
20:01 fdobridge: <k​arolherbst🐧🦀> 4:0 subgroup thread id (probably)
20:01 fdobridge: <k​arolherbst🐧🦀> 14:8 subgroup id (warp id)
20:02 fdobridge: <k​arolherbst🐧🦀> 30:29 + 19:16 is the virtual CTA id
20:02 fdobridge: <k​arolherbst🐧🦀> but that's not within the 3D grid and per SM
20:02 fdobridge: <k​arolherbst🐧🦀> so more like the id of each virtual CTA running within the SM
20:02 fdobridge: <k​arolherbst🐧🦀> 28:20 the SM id
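The SR2/SR3 layout described above can be decoded with plain bit extraction. This sketch uses only the bit positions given in the chat; how the two CTA-id pieces combine is an assumption (bits 30:29 treated as the high bits above 19:16), and `decode_sr3`/`decode_sr2` are hypothetical helpers. As the discussion goes on to establish, these are physical, per-SM IDs, not `gl_SubgroupId`.

```rust
/// Extract the inclusive bit range `hi:lo` of `v` (the chat's "bits 14:8" notation).
fn bits(v: u32, hi: u32, lo: u32) -> u32 {
    assert!(hi >= lo && hi < 32);
    let width = hi - lo + 1;
    (((v as u64) >> lo) & ((1u64 << width) - 1)) as u32
}

/// SR3 fields as described in the chat.
#[derive(Debug, PartialEq)]
struct Sr3 {
    lane_id: u32,        // bits 4:0 ("probably")
    warp_id: u32,        // bits 14:8, the per-SM "virtual" warp id
    virtual_cta_id: u32, // bits 19:16 plus 30:29
    sm_id: u32,          // bits 28:20
}

fn decode_sr3(v: u32) -> Sr3 {
    Sr3 {
        lane_id: bits(v, 4, 0),
        warp_id: bits(v, 14, 8),
        // ASSUMPTION: bits 30:29 are the high bits stacked above bits 19:16.
        virtual_cta_id: (bits(v, 30, 29) << 4) | bits(v, 19, 16),
        sm_id: bits(v, 28, 20),
    }
}

/// SR2: (warp size from bits 5:0, warps per SM from bits 15:8).
fn decode_sr2(v: u32) -> (u32, u32) {
    (bits(v, 5, 0), bits(v, 15, 8))
}
```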
20:02 fdobridge: <k​arolherbst🐧🦀> nvidia has an extension for that stuff...
20:02 fdobridge: <k​arolherbst🐧🦀> let me find it
20:03 fdobridge: <k​arolherbst🐧🦀> `GL_NV_shader_thread_group`?
20:03 fdobridge: <k​arolherbst🐧🦀> sounds about right..
20:04 fdobridge: <k​arolherbst🐧🦀> they had this cool demo where you can see on which SM/warp each pixel gets rendered on
20:05 fdobridge: <g​fxstrand> So warp ID is relative to the current workgroup, not a physical warpID within the hardware, right?
20:05 fdobridge: <k​arolherbst🐧🦀> ohh.. there is also `GL_NV_shader_sm_builtins`
20:05 fdobridge: <k​arolherbst🐧🦀> I think so
20:06 fdobridge: <g​fxstrand> Yeah, no.
20:06 fdobridge: <g​fxstrand> If this maps to GL_NV_shader_thread_group, it's physical IDs
20:07 fdobridge: <k​arolherbst🐧🦀> it's per SM as far as I can tell
20:07 fdobridge: <g​fxstrand> Yeah
20:07 fdobridge: <g​fxstrand> That's not `gl_SubgroupId`
20:07 fdobridge: <k​arolherbst🐧🦀> multiply it with the SM id
20:07 fdobridge: <k​arolherbst🐧🦀> ehh
20:07 fdobridge: <k​arolherbst🐧🦀> or something
20:08 fdobridge: <g​fxstrand> No
20:08 fdobridge: <k​arolherbst🐧🦀> you have the number of warps per SM
20:08 fdobridge: <g​fxstrand> `gl_SubgroupId` is a linear subgroup index within the workgroup.
20:08 fdobridge: <k​arolherbst🐧🦀> right
20:08 fdobridge: <k​arolherbst🐧🦀> mhhh
20:08 fdobridge: <g​fxstrand> It's entirely a logical thing, physical SM numbers don't help me
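A minimal sketch of the logical definition described above: `gl_SubgroupId` as a linear index within the workgroup. It assumes invocations are packed into 32-wide subgroups in `gl_LocalInvocationIndex` order (x fastest, then y, then z, as GLSL defines that index). Whether NVIDIA hardware actually packs warps this way for every workgroup shape is exactly what the blob dump leaves unclear, so treat this as the spec-level meaning, not a confirmed lowering. The function names are hypothetical.

```rust
/// Subgroup (warp) size on NVIDIA, which the SR2 warp-size field always reports as 32.
const SUBGROUP_SIZE: u32 = 32;

/// gl_LocalInvocationIndex = z * sx * sy + y * sx + x.
fn local_invocation_index(id: [u32; 3], wg: [u32; 3]) -> u32 {
    id[2] * wg[0] * wg[1] + id[1] * wg[0] + id[0]
}

/// ASSUMPTION: subgroups are filled in linear invocation-index order.
fn subgroup_id(id: [u32; 3], wg: [u32; 3]) -> u32 {
    local_invocation_index(id, wg) / SUBGROUP_SIZE
}

/// gl_NumSubgroups: workgroup size rounded up to whole subgroups.
fn num_subgroups(wg: [u32; 3]) -> u32 {
    let n = wg[0] * wg[1] * wg[2];
    (n + SUBGROUP_SIZE - 1) / SUBGROUP_SIZE
}
```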
20:10 fdobridge: <k​arolherbst🐧🦀> but for the subgroup_invocation nvidia also uses SR0/SR3 4:0, right?
20:11 fdobridge: <g​fxstrand> Nope
20:11 fdobridge: <k​arolherbst🐧🦀> what are they doing instead then?
20:11 fdobridge: <g​fxstrand> The only sysvals I see are TID.XYZ, CTAID.XYZ, and LANEID
20:12 fdobridge: <k​arolherbst🐧🦀> yeah, LANEID is SR0
20:12 fdobridge: <k​arolherbst🐧🦀> 31:5 are zero
20:12 fdobridge: <k​arolherbst🐧🦀> TID.XYZ is SR33..35
20:12 fdobridge: <g​fxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1166106953110532187/blob-shader.txt?ex=654948ba&is=6536d3ba&hm=141c0b7661fd91b4a07852099919e43d15c40ecc194f48029cf75117771cb1b4&
20:12 fdobridge: <k​arolherbst🐧🦀> CTAID.XYZ is SR37..39
20:13 fdobridge: <k​arolherbst🐧🦀> mhhhh
20:13 fdobridge: <k​arolherbst🐧🦀> what's in the uniform?
20:14 fdobridge: <g​fxstrand> The `STG` at the end writes `uvec4(gl_SubgroupSize, gl_SubgroupInvocationID, gl_NumSubgroups, gl_SubgroupID)`
20:14 fdobridge: <g​fxstrand> No clue!
20:14 fdobridge: <k​arolherbst🐧🦀> okay.. let's see...
20:14 fdobridge: <g​fxstrand> I also have no idea why there's an LEA in there
20:14 fdobridge: <k​arolherbst🐧🦀> could hack the test and only write one of those values
20:15 fdobridge: <g​fxstrand> I guess the `LEA` is a hack to avoid 64-bit arithmetic
20:15 fdobridge: <g​fxstrand> But, yeah, no idea what the magic uniforms are
20:17 fdobridge: <g​fxstrand> I suspect `c[0x0][0x20..0x2c]` is `baseGroupX/Y/Z`
20:18 fdobridge: <k​arolherbst🐧🦀> `LEA R4, P0, R3, c[0x0][0x30], 0x4 ; ` -> `R4 = R3 << 0x4 + c[0x0][0x30]` I think
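On the guess that the `LEA` avoids 64-bit arithmetic: the sketch below follows the reading above (`Rd = (Ra << shift) + Rb`, carry-out into a predicate) and shows how a 64-bit `base + index * 16` can be built from 32-bit halves. A shift of 4 presumably scales the index by 16 bytes, the size of the `uvec4` the `STG` writes. It assumes `c[0x0][0x30]` holds the low half of the buffer address and that the `P0` carry feeds a high-half op not quoted in the chat; `lea_lo` and `element_address` are hypothetical names.

```rust
/// Low-half LEA as read above: (a << shift) + b, plus the carry-out.
fn lea_lo(a: u32, b: u32, shift: u32) -> (u32, bool) {
    (a << shift).overflowing_add(b)
}

/// Compute base + (index << shift) using only 32-bit pieces.
fn element_address(base: u64, index: u32, shift: u32) -> u64 {
    assert!(shift < 32);
    let (lo, carry) = lea_lo(index, base as u32, shift);
    // Bits of `index` shifted out of the low word belong in the high word.
    let shifted_out = if shift == 0 { 0 } else { index >> (32 - shift) };
    let hi = ((base >> 32) as u32)
        .wrapping_add(shifted_out)
        .wrapping_add(carry as u32);
    ((hi as u64) << 32) | lo as u64
}
```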
20:19 fdobridge: <k​arolherbst🐧🦀> yeah.. wouldn't surprise me
20:20 fdobridge: <k​arolherbst🐧🦀> so `gl_SubgroupSize` is just constant 0x20, which totally makes sense
20:20 fdobridge: <k​arolherbst🐧🦀> `gl_SubgroupInvocationID` is `SR_LANEID` also makes sense
20:21 fdobridge: <k​arolherbst🐧🦀> `gl_NumSubgroups` is `0x4`... okay?
20:21 fdobridge: <k​arolherbst🐧🦀> `gl_SubgroupID` is weird math
20:21 fdobridge: <g​fxstrand> Yeah
20:22 fdobridge: <g​fxstrand> We need weird math
20:23 fdobridge: <k​arolherbst🐧🦀> `R3 = R0 * (c[0x0][0x0] * 0x3) + (SR_CTAID.X + c[0x0][0x20]) * 0x3 + SR_TID.X`?
20:24 fdobridge: <k​arolherbst🐧🦀> and whatever R0 is
20:24 fdobridge: <k​arolherbst🐧🦀> mhhh
20:24 fdobridge: <k​arolherbst🐧🦀> that kinda doesn't feel like `gl_SubgroupID`...
20:24 fdobridge: <k​arolherbst🐧🦀> ohhh wait
20:25 fdobridge: <k​arolherbst🐧🦀> `gl_SubgroupID` is across _all_ subgroups over _all_ blocks?
20:25 fdobridge: <k​arolherbst🐧🦀> also all grids?
20:25 fdobridge: <k​arolherbst🐧🦀> if it's across the entire thing, then it makes sense they pull in the block size/id
20:28 fdobridge: <k​arolherbst🐧🦀> ehh.. should be within the block
20:28 fdobridge: <k​arolherbst🐧🦀> weird...
20:28 fdobridge: <k​arolherbst🐧🦀> it makes no sense
21:06 fdobridge: <k​arolherbst🐧🦀> the code is clearly mapping a 3D space to 1D, but uhhh....
21:23 fdobridge: <c​aitcatdev> Okay so atm I am using EGL, and I want to know how it works internally
21:23 fdobridge: <c​aitcatdev> so I guess it's time
21:24 fdobridge: <c​aitcatdev> to peek at mesa