05:59fdobridge: <gfxstrand> I'm going to get IADD3 sorted tomorrow. It looks like that's the only significant regression vs. codegen for compute shaders at this point.
05:59fdobridge: <gfxstrand> Which kinda surprises me, TBH. There's one more for image size of cube maps but that's trivial to fix.
06:00fdobridge: <gfxstrand> I've got half my R/E framework written. I'll finish it up tomorrow and I should be able to get IADD3 figured out in a gross level of detail. Then I need to figure out how to do isub64 better but that should be easy enough.
15:19fdobridge: <gfxstrand> @karolherbst @airlied I'm going to SPDX everything. Should I do Collabora and Red Hat, just Collabora, or actually try to sort it out per-file?
15:19fdobridge: <karolherbst🐧🦀> uhm....
15:20fdobridge: <karolherbst🐧🦀> personally I think listing authors/companies is kinda pointless, but I guess we have to?
15:20fdobridge: <gfxstrand> Yeah...
15:20fdobridge: <gfxstrand> Same
15:20fdobridge: <gfxstrand> It's part of the SPDX header style
15:20fdobridge: <karolherbst🐧🦀> pain
15:21fdobridge: <gfxstrand> I'll do Collabora and Red Hat on everything and spread the blame around
15:21fdobridge: <karolherbst🐧🦀> I assume the correct way is doing it per file, but....
15:21fdobridge: <karolherbst🐧🦀> yeah...
15:21fdobridge: <karolherbst🐧🦀> probably the easiest solution here
15:21fdobridge: <karolherbst🐧🦀> why isn't there a "create SPDX from git log" script? 😄
15:22fdobridge: <gfxstrand> Is it "Red Hat" or "Red Hat Inc"
15:22fdobridge: <gfxstrand> Or IBM? 😂
15:22fdobridge: <karolherbst🐧🦀> I think the latter
15:22fdobridge: <karolherbst🐧🦀> it's kinda a meme already... people blame IBM for all the silly decisions RH is doing 🙃
15:23fdobridge: <karolherbst🐧🦀> we can convert to IBM once I get my IBM email
15:36fdobridge: <gfxstrand> nah
15:46fdobridge: <gfxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25085
15:46fdobridge: <gfxstrand> I'd like explicit ACKs from @karolherbst and @marysaka on that one if I could because it touches licenses.
15:53fdobridge: <karolherbst🐧🦀> why can't we do it like the kernel though 😦
15:53fdobridge: <karolherbst🐧🦀> though I guess they also list authors there just even more cursed
15:53fdobridge: <karolherbst🐧🦀> pain
18:29tertl8: what's going on o/
21:01fdobridge: <gfxstrand> Okay, I now know what's up with the two overflow bits
21:02fdobridge: <gfxstrand> It's because you can overflow at most twice on an IADD3
21:02fdobridge: <gfxstrand> So if only once the first bit gets set and if twice the 2nd bit also gets set
21:02fdobridge: <gfxstrand> Lets you implement 64-bit IADD3
21:09fdobridge: <karolherbst🐧🦀> no matter the addition?
21:09fdobridge: <karolherbst🐧🦀> guess that also makes kinda sense
21:10fdobridge: <karolherbst🐧🦀> and shouldn't even matter
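A minimal Rust sketch of the two-overflow-bit idea described above, assuming plain unsigned sources with no modifiers. The function name `iadd3` and the tuple layout `(result, overflow >= 1, overflow >= 2)` are illustrative only and are not taken from NAK or from hardware documentation.

```rust
/// Model of a three-input 32-bit add that also reports how many times
/// the sum wrapped past 32 bits. Hypothetical helper for illustration.
fn iadd3(a: u32, b: u32, c: u32) -> (u32, bool, bool) {
    // Do the sum in 64 bits so nothing is lost.
    let sum = a as u64 + b as u64 + c as u64;
    // Three u32 values sum to at most 3 * (2^32 - 1), so the upper
    // part is 0, 1 or 2: exactly what two bits can describe.
    let overflows = sum >> 32;
    (sum as u32, overflows >= 1, overflows >= 2)
}
```

The first bit is set when the sum wrapped at least once and the second when it wrapped twice, which is the information the upper half of a 64-bit add needs.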
21:23fdobridge: <gfxstrand> Are there docs for all the sysvals somewhere?
21:23fdobridge: <karolherbst🐧🦀> on my laptop
21:24fdobridge: <karolherbst🐧🦀> It might be easier if I just write down the code for all of them or something 😄 I could also potentially ask nvidia if I can share this part of the docs
21:25fdobridge: <karolherbst🐧🦀> but if you tell me what you need I can probably also just say where those are without problems
21:26fdobridge: <gfxstrand> gl_InvocationIndex
21:26fdobridge: <gfxstrand> the 1D one that contains both global and workgroup
21:28fdobridge: <karolherbst🐧🦀> there is no index in hardware
21:28fdobridge: <karolherbst🐧🦀> ehh
21:28fdobridge: <karolherbst🐧🦀> wait
21:28fdobridge: <karolherbst🐧🦀> yeah.. there is no 1d
21:29fdobridge: <karolherbst🐧🦀> it's all 3d values on nvidia
21:30fdobridge: <karolherbst🐧🦀> ehh wait.. that's only for compute stuff
21:30fdobridge: <karolherbst🐧🦀> @gfxstrand gl_InvocationIndex is like that GS/tess/whatever stuff?
21:30fdobridge: <gfxstrand> no for compute
21:31fdobridge: <gfxstrand> I thought it was a vec4 with (1D, x, y, z)
21:31fdobridge: <karolherbst🐧🦀> so it's nir_intrinsic_load_global_invocation_id?
21:31fdobridge: <karolherbst🐧🦀> or local
21:32fdobridge: <karolherbst🐧🦀> nvidia doesn't have a global one either, just the local + grid id
21:33fdobridge: <karolherbst🐧🦀> the locals are at 33, 34 and 35, the combined one is at 32
21:33fdobridge: <karolherbst🐧🦀> combined: x: 10:0, y: 25:16, z: 31:26
21:33fdobridge: <karolherbst🐧🦀> the split ones have the same value range
21:34fdobridge: <karolherbst🐧🦀> the grid ID is at 37-39
21:34fdobridge: <karolherbst🐧🦀> 31:0 for X, 15:0 for Y/Z
21:36fdobridge: <karolherbst🐧🦀> with that I mean they start at 0, but have the same length
21:36fdobridge: <karolherbst🐧🦀> they are beneficial in different situations
21:36fdobridge: <karolherbst🐧🦀> the split ones don't need any shifts, the combined one is faster to load
21:36fdobridge: <karolherbst🐧🦀> but they are semantically identical
21:37fdobridge: <karolherbst🐧🦀> block ID I mean
21:37fdobridge: <karolherbst🐧🦀> funny enough.. 36 is reserved, so they might put a combined one there in the future 🙃
21:37fdobridge: <karolherbst🐧🦀> anyway, hope that helps
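The combined thread-ID layout quoted above (x in bits 10:0, y in bits 25:16, z in bits 31:26 of sysval 32) can be unpacked roughly as follows. This is a sketch based only on the bit ranges given in the chat; the function names are hypothetical and the pack helper exists only to exercise the decoder.

```rust
/// Split the combined local-ID value into (x, y, z) using the bit
/// ranges quoted in the chat: x = 10:0, y = 25:16, z = 31:26.
fn unpack_combined_tid(v: u32) -> (u32, u32, u32) {
    let x = v & 0x7ff; // 11 bits
    let y = (v >> 16) & 0x3ff; // 10 bits
    let z = (v >> 26) & 0x3f; // 6 bits
    (x, y, z)
}

/// Inverse of the above, for testing only. Inputs are masked to the
/// field widths, so out-of-range values are truncated.
fn pack_combined_tid(x: u32, y: u32, z: u32) -> u32 {
    (x & 0x7ff) | ((y & 0x3ff) << 16) | ((z & 0x3f) << 26)
}
```

This is the shift-and-mask cost mentioned for the combined value; the split values at 33, 34 and 35 would need no decoding at all.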
21:38fdobridge: <karolherbst🐧🦀> ehh.. you might also need the size of a block actually
21:40fdobridge: <karolherbst🐧🦀> there is one of live threads in the current CTA at 40, but it's getting reduced for each warp which exits
21:40fdobridge: <karolherbst🐧🦀> one for launched threads per CTA at 42
21:41fdobridge: <karolherbst🐧🦀> but it has weird semantics
21:42fdobridge: <karolherbst🐧🦀> we lower the block size in gl
21:42fdobridge: <karolherbst🐧🦀> (and pull the info from the driver constbuf)
21:42fdobridge: <gfxstrand> Okay, so there's no quick and easy way to do what I want
21:43fdobridge: <gfxstrand> That's okay
21:43fdobridge: <gfxstrand> I've got something that's working well enough.
21:43fdobridge: <gfxstrand> So.... iadd3 and carries...
21:43fdobridge: <gfxstrand> It looks like source modifiers tweak with them in funny ways
21:44fdobridge: <gfxstrand> `IADD3 -0, 0, 0 -> (0, true, false)`
21:44fdobridge: <gfxstrand> Why? I have no idea...
21:44fdobridge: <karolherbst🐧🦀> funky
21:44fdobridge:<gfxstrand> is glad she wrote unit tests...
21:45fdobridge: <karolherbst🐧🦀> I honestly don't see why those predicates would be not false...
21:46fdobridge: <karolherbst🐧🦀> maybe there is a valid reason for it
21:46fdobridge: <gfxstrand> Well, that's what I get to figure out. 🙃
21:46fdobridge: <karolherbst🐧🦀> I have no idea
21:46fdobridge: <karolherbst🐧🦀> what's the result if you use those carries?
21:46fdobridge: <karolherbst🐧🦀> with only 0
21:47fdobridge: <karolherbst🐧🦀> I guess you'll have to use ~0 for one
21:47fdobridge: <karolherbst🐧🦀> and I wouldn't be surprised if the result is 0
21:47fdobridge: <gfxstrand> It's not 0
21:47fdobridge: <gfxstrand> carries work exactly like you think they do
21:47fdobridge: <karolherbst🐧🦀> yeah.. but see.. ~0 is 0xffffffff + carry is 0
21:49fdobridge: <gfxstrand> each carry is just like a 1-bit add operand
21:50fdobridge: <karolherbst🐧🦀> yeah
21:50fdobridge: <gfxstrand> There's nothing interesting with them
21:50fdobridge: <karolherbst🐧🦀> 0xffffffff + 1 is 0
21:50fdobridge: <gfxstrand> yup
21:50fdobridge: <gfxstrand> with an overflow
21:50fdobridge: <karolherbst🐧🦀> yeah sure
21:50fdobridge: <karolherbst🐧🦀> but that way it makes sense that -0, 0, 0 returns true/false
21:50fdobridge: <karolherbst🐧🦀> because in the .X side, you get 0xffffffff for one + the carry
21:50fdobridge: <karolherbst🐧🦀> making it 0
21:51fdobridge: <karolherbst🐧🦀> so to me it looks like it makes all perfect sense to return a true carry in the first op
21:54fdobridge: <gfxstrand> `IADD3 0, 0, -(-2) -> (2, false, false)`
21:55fdobridge: <karolherbst🐧🦀> IADD3.X 0, 0, ~0, false, false -> (0xffffffff) I guess?
21:55fdobridge: <gfxstrand> No .x
21:55fdobridge: <gfxstrand> Why are we talking about .x?
21:55fdobridge: <karolherbst🐧🦀> I meant for the upper bits of a 64 bit subtract
21:55fdobridge: <karolherbst🐧🦀> or 64 bit op
21:55fdobridge: <karolherbst🐧🦀> in general
21:55fdobridge: <gfxstrand> Yeah, the weird behavior is clearly to make 64-bit subtract work
21:56fdobridge: <gfxstrand> I'm just trying to figure out what the behavior *is*
21:56fdobridge: <karolherbst🐧🦀> yeah..
21:56fdobridge: <karolherbst🐧🦀> ehh wait
21:56fdobridge: <karolherbst🐧🦀> if you got a -2 in the lower half...
21:57fdobridge: <karolherbst🐧🦀> then you have IADD3.x 0, 0, ~0xffffffff in the upper one
21:57fdobridge: <karolherbst🐧🦀> making it 0
21:57fdobridge: <karolherbst🐧🦀> with two false carries
21:57fdobridge: <karolherbst🐧🦀> mhhh
21:57fdobridge: <gfxstrand> I've got an idea
21:58fdobridge: <karolherbst🐧🦀> I suspect it's to deal with values like 0x00000000ffffffff
21:58fdobridge: <karolherbst🐧🦀> and the carry logic needs to be adjusted to not screw that up
21:59fdobridge: <karolherbst🐧🦀> positive values negated give you a true carry?
22:00fdobridge: <karolherbst🐧🦀> mhh
22:00fdobridge: <karolherbst🐧🦀> okay
22:00fdobridge: <karolherbst🐧🦀> so my theory is, if sign extending the value makes it change the signedness there is a true carry
22:01fdobridge: <karolherbst🐧🦀> or something
22:05fdobridge: <gfxstrand> Got it!
22:05fdobridge: <gfxstrand> They implement - in the most obvious stupid way: Flip the bits and add 1
22:05fdobridge: <karolherbst🐧🦀> uhhh
22:05fdobridge: <gfxstrand> Except the add 1 is included in the larger sum and therefore contributes to overflow
22:05fdobridge: <karolherbst🐧🦀> ....
22:05fdobridge: <karolherbst🐧🦀> why does it make sense
22:05fdobridge: <gfxstrand> So `-0 = !0 + 1`
22:05fdobridge: <gfxstrand> Why does it make sense? Because it chains really nicely with the top half.
22:05fdobridge: <karolherbst🐧🦀> yeah...
22:05fdobridge: <gfxstrand> The carries naturally propagate into the upper bits
22:06fdobridge: <karolherbst🐧🦀> cursed
22:06fdobridge: <gfxstrand> Very cursed
22:06fdobridge: <gfxstrand> In a kind-of fantastic way
22:06fdobridge: <karolherbst🐧🦀> yeah.. it makes total sense to do it like this
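To see why "flip the bits and add 1" chains nicely into the top half, here is a hedged Rust sketch of a 64-bit subtract built from two 32-bit steps. It uses only ordinary two's-complement arithmetic. The helper names are hypothetical, and the high-half step is a guess at the shape of an `.X`-style operation based on the discussion above, not the documented hardware semantics.

```rust
/// Low half: a_lo + !b_lo + 1, keeping the carry-out. The "+ 1" of the
/// negate is part of the sum, so it can produce a carry by itself.
fn sub_lo(a_lo: u32, b_lo: u32) -> (u32, bool) {
    let sum = a_lo as u64 + (!b_lo) as u64 + 1;
    (sum as u32, (sum >> 32) != 0)
}

/// High half: a_hi + !b_hi + carry-in. No extra "+ 1" here; the low
/// half already contributed it and passes it along through the carry.
fn sub_hi(a_hi: u32, b_hi: u32, carry: bool) -> u32 {
    a_hi.wrapping_add(!b_hi).wrapping_add(carry as u32)
}

/// 64-bit a - b assembled from the two 32-bit halves.
fn isub64(a: u64, b: u64) -> u64 {
    let (lo, carry) = sub_lo(a as u32, b as u32);
    let hi = sub_hi((a >> 32) as u32, (b >> 32) as u32, carry);
    ((hi as u64) << 32) | lo as u64
}
```

Note that `sub_lo(0, 0)` yields `(0, true)`: subtracting zero still carries, and that carry is what turns `!0` in the high half back into 0.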
22:06fdobridge: <karolherbst🐧🦀> and if you overflow the negate you don't run into the risk of needing three carries anyway
22:07fdobridge: <karolherbst🐧🦀> because that only happens with 0, right?
22:08fdobridge: <karolherbst🐧🦀> I wonder if anybody uses that information to optimize away a compare against 0....
22:10fdobridge: <gfxstrand> Yeah and because IADD3 doesn't have input carry predicates, the most you can have is `u32::MAX + u32::MAX + u32::MAX + 1 + 1`
22:10fdobridge: <gfxstrand> Incidentally, this is probably why they only allow at most 2 - modifiers
22:10fdobridge: <gfxstrand> 😂
22:10fdobridge: <karolherbst🐧🦀> 😄
22:10fdobridge: <gfxstrand> This is so cursed
22:10fdobridge: <gfxstrand> I love it
22:10fdobridge: <karolherbst🐧🦀> yeah.. it would totally break with 3
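The behavior worked out above can be modeled in a few lines of Rust. This is a sketch of the rule as stated in the chat (a negated source contributes `!x` plus a separate `+ 1` to one wide sum), not a verified description of the hardware. The name `iadd3_neg` and the limit of two negated sources enforced by the assertion are assumptions taken from this conversation.

```rust
/// Model of IADD3 with per-source negate modifiers. Each negated source
/// adds !x and an extra 1 to a single wide sum, so the "+ 1" can itself
/// cause an overflow. Returns (result, overflow >= 1, overflow >= 2).
fn iadd3_neg(srcs: [u32; 3], neg: [bool; 3]) -> (u32, bool, bool) {
    // Assumption from the chat: at most two sources may be negated.
    // With three, the sum could wrap three times and two bits would
    // no longer be enough.
    assert!(neg.iter().filter(|&&n| n).count() <= 2);

    let sum: u64 = srcs
        .iter()
        .zip(neg.iter())
        .map(|(&s, &n)| if n { (!s) as u64 + 1 } else { s as u64 })
        .sum();

    let overflows = sum >> 32; // 0, 1 or 2 under the limit above
    (sum as u32, overflows >= 1, overflows >= 2)
}
```

Under this model `iadd3_neg([0, 0, 0], [true, false, false])` gives `(0, true, false)` and `iadd3_neg([0, 0, 0xFFFF_FFFE], [false, false, true])` gives `(2, false, false)`, matching the two observations quoted earlier in the log.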
22:11fdobridge: <gfxstrand> Okay, time to update NAK
22:25fdobridge: <gfxstrand> Now I'm back to debating if I want .X to be a different op or a modifier
22:25fdobridge: <gfxstrand> I kinda think a modifier
22:25fdobridge: <gfxstrand> Everything's so thoroughly cursed I don't think there's any point in splitting them anymore
22:26fdobridge: <gfxstrand> The one advantage is that I like having the sources just not exist when they're ignored
22:27fdobridge: <karolherbst🐧🦀> yeah... I don't really like the way we've handled that stuff in codegen, where a subop magically changes the semantics of the instruction in a non obvious way
22:30fdobridge: <karolherbst🐧🦀> I think it's fine to make them different opcodes as the semantics are different enough
22:30fdobridge: <gfxstrand> Hrm... Now I'm thinking we want OpIAdd3 to not have *any* of this nonsense and be the one that copy-prop works on and have OpIAdd3X be the one that's got all the crazy and have a high/low bit
22:30fdobridge: <karolherbst🐧🦀> they are different ops in nir after all
22:30fdobridge: <gfxstrand> Or, given that integer source modifiers are generally non-propagatable, maybe just don't care and tell NIR to give us isub?
22:32fdobridge: <gfxstrand> I think I like this plan
23:22fdobridge: <gfxstrand> Assuming it's based on Turing or later, I pity anyone writing a Switch 2 emulator. 😅
23:22fdobridge: <gfxstrand> I mean, you can generate something sane whenever none of the fancy bits are used so there is that.
23:23fdobridge: <gfxstrand> But yeah....
23:24fdobridge: <rhed0x> haven't really been following but I'm curious now, what's the most cursed thing you could do on, say ampere, that would be a nightmare to emulate with Vulkan
23:55fdobridge: <gfxstrand> https://mastodon.gamedev.place/@gfxstrand/111020866052957274
23:55fdobridge: <rhed0x> thanks :happy_gears: