05:59 fdobridge: <g​fxstrand> I'm going to get IADD3 sorted tomorrow. It looks like that's the only significant regression vs. codegen for compute shaders at this point.
05:59 fdobridge: <g​fxstrand> Which kinda surprises me, TBH. There's one more for image size of cube maps but that's trivial to fix.
06:00 fdobridge: <g​fxstrand> I've got half my R/E framework written. I'll finish it up tomorrow and I should be able to get IADD3 figured out in a gross level of detail. Then I need to figure out how to do isub64 better but that should be easy enough.
15:19 fdobridge: <g​fxstrand> @karolherbst @airlied I'm going to SPDX everything. Should I do Collabora and Red Hat, just Collabora, or actually try to sort it out per-file?
15:19 fdobridge: <k​arolherbst🐧🦀> uhm....
15:20 fdobridge: <k​arolherbst🐧🦀> personally I think listing authors/companies is kinda pointless, but I guess we have to?
15:20 fdobridge: <g​fxstrand> Yeah...
15:20 fdobridge: <g​fxstrand> Same
15:20 fdobridge: <g​fxstrand> It's part of the SPDX header style
15:20 fdobridge: <k​arolherbst🐧🦀> pain
15:21 fdobridge: <g​fxstrand> I'll do Collabora and Red Hat on everything and spread the blame around
15:21 fdobridge: <k​arolherbst🐧🦀> I assume the correct way is doing it per file, but....
15:21 fdobridge: <k​arolherbst🐧🦀> yeah...
15:21 fdobridge: <k​arolherbst🐧🦀> probably the easiest solution here
15:21 fdobridge: <k​arolherbst🐧🦀> why isn't there a "create SPDX from git log" script? 😄
15:22 fdobridge: <g​fxstrand> Is it "Red Hat" or "Red Hat Inc"
15:22 fdobridge: <g​fxstrand> Or IBM? 😂
15:22 fdobridge: <k​arolherbst🐧🦀> I think the latter
15:22 fdobridge: <k​arolherbst🐧🦀> it's kinda a meme already... people blame IBM for all the silly decisions RH is doing 🙃
15:23 fdobridge: <k​arolherbst🐧🦀> we can convert to IBM once I get my IBM email
15:36 fdobridge: <g​fxstrand> nah
15:46 fdobridge: <g​fxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25085
15:46 fdobridge: <g​fxstrand> I'd like explicit ACKs from @karolherbst and @marysaka on that one if I could because it touches licenses.
15:53 fdobridge: <k​arolherbst🐧🦀> why can't we do it like the kernel though 😦
15:53 fdobridge: <k​arolherbst🐧🦀> though I guess they also list authors there just even more cursed
15:53 fdobridge: <k​arolherbst🐧🦀> pain
18:29 tertl8: whats going on o/
21:01 fdobridge: <g​fxstrand> Okay, I now know what's up with the two overflow bits
21:02 fdobridge: <g​fxstrand> It's because you can overflow at most twice on an IADD3
21:02 fdobridge: <g​fxstrand> So if only once the first bit gets set and if twice the 2nd bit also gets set
21:02 fdobridge: <g​fxstrand> Lets you implement 64-bit IADD3
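A rough Rust sketch of the two-overflow-bit idea described above. The function names `iadd3` and `iadd3_64` are made up for illustration, and the way the upper half consumes the overflow bits (each one as a 1-bit addend) is an assumption, not confirmed hardware behavior.

```rust
/// Hypothetical model of a three-input 32-bit add.
/// Returns (sum, overflow0, overflow1): overflow0 is set if the sum wrapped
/// at least once, overflow1 if it wrapped twice.
fn iadd3(a: u32, b: u32, c: u32) -> (u32, bool, bool) {
    let wide = a as u64 + b as u64 + c as u64;
    let wraps = wide >> 32; // 0, 1 or 2 for three u32 inputs
    (wide as u32, wraps >= 1, wraps >= 2)
}

/// 64-bit three-input add built from two 32-bit halves.
/// ASSUMPTION: the upper half adds each overflow bit as a 1-bit operand.
fn iadd3_64(a: u64, b: u64, c: u64) -> u64 {
    let (lo, o0, o1) = iadd3(a as u32, b as u32, c as u32);
    let hi = ((a >> 32) as u32)
        .wrapping_add((b >> 32) as u32)
        .wrapping_add((c >> 32) as u32)
        .wrapping_add(o0 as u32)
        .wrapping_add(o1 as u32);
    ((hi as u64) << 32) | lo as u64
}
```

Under this model, `iadd3(u32::MAX, u32::MAX, u32::MAX)` sets both overflow bits, and `iadd3_64` agrees with a plain wrapping 64-bit sum.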
21:09 fdobridge: <k​arolherbst🐧🦀> no matter the addition?
21:09 fdobridge: <k​arolherbst🐧🦀> guess that also makes kinda sense
21:10 fdobridge: <k​arolherbst🐧🦀> and shouldn't even matter
21:23 fdobridge: <g​fxstrand> Are there docs for all the sysvals somewhere?
21:23 fdobridge: <k​arolherbst🐧🦀> on my laptop
21:24 fdobridge: <k​arolherbst🐧🦀> It might be easier if I just write down the code for all of them or something 😄 I could also potentially ask nvidia if I can share this part of the docs
21:25 fdobridge: <k​arolherbst🐧🦀> but if you tell me what you need I can probably also just say where those are without problems
21:26 fdobridge: <g​fxstrand> gl_InvocationIndex
21:26 fdobridge: <g​fxstrand> the 1D one that contains both global and workgroup
21:28 fdobridge: <k​arolherbst🐧🦀> there is no index in hardware
21:28 fdobridge: <k​arolherbst🐧🦀> ehh
21:28 fdobridge: <k​arolherbst🐧🦀> wait
21:28 fdobridge: <k​arolherbst🐧🦀> yeah.. there is no 1d
21:29 fdobridge: <k​arolherbst🐧🦀> it's all 3d values on nvidia
21:30 fdobridge: <k​arolherbst🐧🦀> ehh wait.. that's only for compute stuff
21:30 fdobridge: <k​arolherbst🐧🦀> @gfxstrand gl_InvocationIndex is like that GS/tess/whatever stuff?
21:30 fdobridge: <g​fxstrand> no for compute
21:31 fdobridge: <g​fxstrand> I thought it was a vec4 with (1D, x, y, z)
21:31 fdobridge: <k​arolherbst🐧🦀> so it's nir_intrinsic_load_global_invocation_id?
21:31 fdobridge: <k​arolherbst🐧🦀> or local
21:32 fdobridge: <k​arolherbst🐧🦀> nvidia doesn't have a global one either, just the local + grid id
21:33 fdobridge: <k​arolherbst🐧🦀> the locals are at 33, 34 and 35, the combined one is at 32
21:33 fdobridge: <k​arolherbst🐧🦀> combined: x: 10:0, y: 25:16, z: 31:26
21:33 fdobridge: <k​arolherbst🐧🦀> the split ones have the same value range
21:34 fdobridge: <k​arolherbst🐧🦀> the grid ID is at 37-39
21:34 fdobridge: <k​arolherbst🐧🦀> 31:0 for X, 15:0 for Y/Z
21:36 fdobridge: <k​arolherbst🐧🦀> with that I mean they start at 0, but have the same length
21:36 fdobridge: <k​arolherbst🐧🦀> they are beneficial in different situations
21:36 fdobridge: <k​arolherbst🐧🦀> the split ones don't need any shifts, the combined one is faster to load
21:36 fdobridge: <k​arolherbst🐧🦀> but they are semantically identical
21:37 fdobridge: <k​arolherbst🐧🦀> block ID I mean
21:37 fdobridge: <k​arolherbst🐧🦀> funny enough.. 36 is reserved, so they might put a combined one there in the future 🙃
21:37 fdobridge: <k​arolherbst🐧🦀> anyway, hope that helps
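The combined layout quoted above (sysval 32 with x in bits 10:0, y in bits 25:16, z in bits 31:26) could be unpacked roughly as follows. This is a sketch only: `bits` and `unpack_local_id` are hypothetical helper names, and the bit ranges are taken from the chat rather than from any public documentation.

```rust
/// Extract the inclusive bit range hi:lo from a 32-bit value.
fn bits(v: u32, hi: u32, lo: u32) -> u32 {
    let width = hi - lo + 1;
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    (v >> lo) & mask
}

/// Split the combined local ID (sysval 32 per the chat) into [x, y, z].
/// The split sysvals 33, 34 and 35 are said to hold the same values
/// without needing any shifts.
fn unpack_local_id(packed: u32) -> [u32; 3] {
    [
        bits(packed, 10, 0),  // x: bits 10:0
        bits(packed, 25, 16), // y: bits 25:16
        bits(packed, 31, 26), // z: bits 31:26
    ]
}
```

This mirrors the trade-off mentioned in the chat: one load plus shifts and masks for the combined value, versus three loads and no shifts for the split ones.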
21:38 fdobridge: <k​arolherbst🐧🦀> ehh.. you might also need the size of a block actually
21:40 fdobridge: <k​arolherbst🐧🦀> there is one of live threads in the current CTA at 40, but it's getting reduced for each warp which exits
21:40 fdobridge: <k​arolherbst🐧🦀> one for launched threads per CTA at 42
21:41 fdobridge: <k​arolherbst🐧🦀> but it has weird semantics
21:42 fdobridge: <k​arolherbst🐧🦀> we lower the block size in gl
21:42 fdobridge: <k​arolherbst🐧🦀> (and pull the info from the driver constbuf)
21:42 fdobridge: <g​fxstrand> Okay, so there's no quick and easy way to do what I want
21:43 fdobridge: <g​fxstrand> That's okay
21:43 fdobridge: <g​fxstrand> I've got something that's working well enough.
21:43 fdobridge: <g​fxstrand> So.... iadd3 and carries...
21:43 fdobridge: <g​fxstrand> It looks like source modifiers tweak with them in funny ways
21:44 fdobridge: <g​fxstrand> `IADD3 -0, 0, 0 -> (0, true, false)`
21:44 fdobridge: <g​fxstrand> Why? I have no idea...
21:44 fdobridge: <k​arolherbst🐧🦀> funky
21:44 fdobridge: <g​fxstrand> is glad she wrote unit tests...
21:45 fdobridge: <k​arolherbst🐧🦀> I honestly don't see why those predicates would be not false...
21:46 fdobridge: <k​arolherbst🐧🦀> maybe there is a valid reason for it
21:46 fdobridge: <g​fxstrand> Well, that's what I get to figure out. 🙃
21:46 fdobridge: <k​arolherbst🐧🦀> I have no idea
21:46 fdobridge: <k​arolherbst🐧🦀> what's the result if you use those carries?
21:46 fdobridge: <k​arolherbst🐧🦀> with only 0
21:47 fdobridge: <k​arolherbst🐧🦀> I guess you'll have to use ~0 for one
21:47 fdobridge: <k​arolherbst🐧🦀> and I wouldn't be surprised if the result is 0
21:47 fdobridge: <g​fxstrand> It's not 0
21:47 fdobridge: <g​fxstrand> carries work exactly like you think they do
21:47 fdobridge: <k​arolherbst🐧🦀> yeah.. but see.. ~0 is 0xffffffff + carry is 0
21:49 fdobridge: <g​fxstrand> each carry is just like a 1-bit add operand
21:50 fdobridge: <k​arolherbst🐧🦀> yeah
21:50 fdobridge: <g​fxstrand> There's nothing interesting with them
21:50 fdobridge: <k​arolherbst🐧🦀> 0xffffffff + 1 is 0
21:50 fdobridge: <g​fxstrand> yup
21:50 fdobridge: <g​fxstrand> with an overflow
21:50 fdobridge: <k​arolherbst🐧🦀> yeah sure
21:50 fdobridge: <k​arolherbst🐧🦀> but that way it makes sense that -0, 0, 0 returns true/false
21:50 fdobridge: <k​arolherbst🐧🦀> because in the .X side, you get 0xffffffff for one + the carry
21:50 fdobridge: <k​arolherbst🐧🦀> making it 0
21:51 fdobridge: <k​arolherbst🐧🦀> so to me it looks like it makes all perfect sense to return a true carry in the first op
21:54 fdobridge: <g​fxstrand> `IADD3 0, 0, -(-2) -> (2, false, false)`
21:55 fdobridge: <k​arolherbst🐧🦀> IADD3.X 0, 0, ~0, false, false -> (0xffffffff) I guess?
21:55 fdobridge: <g​fxstrand> No .x
21:55 fdobridge: <g​fxstrand> Why are we talking about .x?
21:55 fdobridge: <k​arolherbst🐧🦀> I meant for the upper bits of a 64 bit subtract
21:55 fdobridge: <k​arolherbst🐧🦀> or 64 bit op
21:55 fdobridge: <k​arolherbst🐧🦀> in general
21:55 fdobridge: <g​fxstrand> Yeah, the weird behavior is clearly to make 64-bit subtract work
21:56 fdobridge: <g​fxstrand> I'm just trying to figure out what the behavior *is*
21:56 fdobridge: <k​arolherbst🐧🦀> yeah..
21:56 fdobridge: <k​arolherbst🐧🦀> ehh wait
21:56 fdobridge: <k​arolherbst🐧🦀> if you got a -2 in the lower half...
21:57 fdobridge: <k​arolherbst🐧🦀> then you have IADD3.x 0, 0, ~0xffffffff in the upper one
21:57 fdobridge: <k​arolherbst🐧🦀> making it 0
21:57 fdobridge: <k​arolherbst🐧🦀> with two false carries
21:57 fdobridge: <k​arolherbst🐧🦀> mhhh
21:57 fdobridge: <g​fxstrand> I've got an idea
21:58 fdobridge: <k​arolherbst🐧🦀> I suspect it's to deal with values like 0x00000000ffffffff
21:58 fdobridge: <k​arolherbst🐧🦀> and the carry logic needs to be adjusted to not screw that up
21:59 fdobridge: <k​arolherbst🐧🦀> positive values negated give you a true carry?
22:00 fdobridge: <k​arolherbst🐧🦀> mhh
22:00 fdobridge: <k​arolherbst🐧🦀> okay
22:00 fdobridge: <k​arolherbst🐧🦀> so my theory is, if sign extending the value makes it change the signedness there is a true carry
22:01 fdobridge: <k​arolherbst🐧🦀> or something
22:05 fdobridge: <g​fxstrand> Got it!
22:05 fdobridge: <g​fxstrand> They implement - in the most obvious stupid way: Flip the bits and add 1
22:05 fdobridge: <k​arolherbst🐧🦀> uhhh
22:05 fdobridge: <g​fxstrand> Except the add 1 is included in the larger sum and therefore contributes to overflow
22:05 fdobridge: <k​arolherbst🐧🦀> ....
22:05 fdobridge: <k​arolherbst🐧🦀> why does it make sense
22:05 fdobridge: <g​fxstrand> So `-0 = !0 + 1`
22:05 fdobridge: <g​fxstrand> Why does it make sense? Because it chains really nicely with the top half.
22:05 fdobridge: <k​arolherbst🐧🦀> yeah...
22:05 fdobridge: <g​fxstrand> The carries naturally propagate into the upper bits
22:06 fdobridge: <k​arolherbst🐧🦀> cursed
22:06 fdobridge: <g​fxstrand> Very cursed
22:06 fdobridge: <g​fxstrand> In a kind-of fantastic way
22:06 fdobridge: <k​arolherbst🐧🦀> yeah.. it makes total sense to do it like this
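The behavior worked out above (negate as "flip the bits and add 1", with the add 1 counted toward overflow) can be modeled in a few lines of Rust. `iadd3_neg` and `isub64` are hypothetical names, and the upper-half step (bit-flipped source plus the carries as 1-bit addends) is an assumption based on the `.X` discussion in the chat, not a verified description of the instruction.

```rust
/// Model of IADD3 with per-source negate modifiers.
/// A negated source contributes !src plus a separate +1, and both terms
/// take part in the overflow count.
fn iadd3_neg(srcs: [u32; 3], neg: [bool; 3]) -> (u32, bool, bool) {
    let mut wide: u64 = 0;
    for i in 0..3 {
        if neg[i] {
            wide += (!srcs[i]) as u64 + 1;
        } else {
            wide += srcs[i] as u64;
        }
    }
    let wraps = wide >> 32;
    (wide as u32, wraps >= 1, wraps >= 2)
}

/// 64-bit subtract chained from two 32-bit halves.
/// ASSUMPTION: the upper half adds the bit-flipped source and both carries.
fn isub64(a: u64, b: u64) -> u64 {
    let (lo, c0, c1) = iadd3_neg([a as u32, b as u32, 0], [false, true, false]);
    let hi = ((a >> 32) as u32)
        .wrapping_add(!((b >> 32) as u32))
        .wrapping_add(c0 as u32)
        .wrapping_add(c1 as u32);
    ((hi as u64) << 32) | lo as u64
}
```

With this model, `IADD3 -0, 0, 0` yields `(0, true, false)` and `IADD3 0, 0, -(-2)` yields `(2, false, false)`, matching the two results quoted in the chat, and `isub64` agrees with a wrapping 64-bit subtract.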
22:06 fdobridge: <k​arolherbst🐧🦀> and if you overflow the negate you don't run into the risk of needing three carries anyway
22:07 fdobridge: <k​arolherbst🐧🦀> because that only happens with 0, right?
22:08 fdobridge: <k​arolherbst🐧🦀> I wonder if anybody uses that information to optimize away a compare against 0....
22:10 fdobridge: <g​fxstrand> Yeah and because IADD3 doesn't have input carry predicates, the most you can have is `u32::MAX + u32::MAX + u32::MAX + 1 + 1`
22:10 fdobridge: <g​fxstrand> Incidentally, this is probably why they only allow at most 2 - modifiers
22:10 fdobridge: <g​fxstrand> 😂
22:10 fdobridge: <k​arolherbst🐧🦀> 😄
22:10 fdobridge: <g​fxstrand> This is so cursed
22:10 fdobridge: <g​fxstrand> I love it
22:10 fdobridge: <k​arolherbst🐧🦀> yeah.. it would totally break with 3
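The bound behind the "at most 2 negate modifiers" remark can be checked with a little arithmetic. In this sketch `max_wraps` is a made-up helper, and negation is modeled as `!src + 1`, so a negated source contributes at most `u32::MAX + 1` (when the source is 0).

```rust
/// Worst-case number of 32-bit wraps for a three-input add in which
/// `negs` of the sources carry the negate modifier.
/// Each source contributes at most u32::MAX, plus 1 per negated source.
fn max_wraps(negs: u64) -> u64 {
    let worst = 3 * u32::MAX as u64 + negs;
    worst >> 32
}
```

Here `max_wraps(2)` is still 2, which fits in two overflow bits, while `max_wraps(3)` reaches 3, which would need a third.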
22:11 fdobridge: <g​fxstrand> Okay, time to update NAK
22:25 fdobridge: <g​fxstrand> Now I'm back to debating if I want .X to be a different op or a modifier
22:25 fdobridge: <g​fxstrand> I kinda think a modifier
22:25 fdobridge: <g​fxstrand> Everything's so thoroughly cursed I don't think there's any point in splitting them anymore
22:26 fdobridge: <g​fxstrand> The one advantage is that I like having the sources just not exist when they're ignored
22:27 fdobridge: <k​arolherbst🐧🦀> yeah... I don't really like the way we've handled that stuff in codegen, where a subop magically changes the semantics of the instruction in a non obvious way
22:30 fdobridge: <k​arolherbst🐧🦀> I think it's fine to make them different opcodes as the semantics are different enough
22:30 fdobridge: <g​fxstrand> Hrm... Now I'm thinking we want OpIAdd3 to not have *any* of this nonsense and be the one that copy-prop works on and have OpIAdd3X be the one that's got all the crazy and have a high/low bit
22:30 fdobridge: <k​arolherbst🐧🦀> they are different ops in nir after all
22:30 fdobridge: <g​fxstrand> Or, given that integer source modifiers are generally non-propagatable, maybe just don't care and tell NIR to give us isub?
22:32 fdobridge: <g​fxstrand> I think I like this plan
23:22 fdobridge: <g​fxstrand> Assuming it's based on Turing or later, I pity anyone writing a Switch 2 emulator. 😅
23:22 fdobridge: <g​fxstrand> I mean, you can generate something sane whenever none of the fancy bits are used so there is that.
23:23 fdobridge: <g​fxstrand> But yeah....
23:24 fdobridge: <r​hed0x> haven't really been following but I'm curious now, what's the most cursed thing you could do on, say ampere, that would be a nightmare to emulate with Vulkan
23:55 fdobridge: <g​fxstrand> https://mastodon.gamedev.place/@gfxstrand/111020866052957274
23:55 fdobridge: <r​hed0x> thanks :happy_gears: