03:55karolherbst[d]: I found the frigging UGPR SuSt encoding...
03:55karolherbst[d]: `0xf9b` and `0xf9d`
03:55karolherbst[d]: `SUST.P.1D_BUFFER.STRONG.GPU.IGN [R0], R4, UR0, 0x0 ; /* 0x2000000400007f9b */`
03:55karolherbst[d]: `SUST.D.BA.1D_BUFFER.INVALID7.STRONG.GPU.IGN [R0], R4, UR0, 0x0 ; /* 0x2000000400007f9d */`
03:56karolherbst[d]: also found the three-immediate encoding
03:56karolherbst[d]: 79b and 79d
03:56karolherbst[d]: `SUST.P.1D_BUFFER.STRONG.GPU.IGN [R0], R4, 0x0, 0x0, 0x0 ; /* 0x200000040000779b */`
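[Editor's note] The opcodes quoted above can be read straight off the hex words in the disassembly comments. The following is a minimal sketch, assuming the opcode is the low 12 bits of the dumped word (which matches the `0xf9b`/`0xf9d`/`79b` values mentioned in the chat); the dictionary labels and the `opcode` helper are hypothetical names used only for illustration.

```python
# Instruction words copied from the disassembly comments quoted in the log.
WORDS = {
    "SUST.P (UGPR handle)": 0x2000000400007F9B,
    "SUST.D.BA (UGPR handle)": 0x2000000400007F9D,
    "SUST.P (three immediates)": 0x200000040000779B,
}

# Assumption: the opcode lives in the low 12 bits of the word.
OPCODE_MASK = 0xFFF


def opcode(word: int) -> int:
    """Return the low 12 bits of an instruction word."""
    return word & OPCODE_MASK


for name, word in WORDS.items():
    print(f"{name}: opcode {opcode(word):#05x}")

# XOR of the two SUST.P words shows which bits differ between
# the UGPR form and the three-immediate form in these two samples.
diff = WORDS["SUST.P (UGPR handle)"] ^ WORDS["SUST.P (three immediates)"]
print(f"differing bits: {diff:#x}")
```

For these two samples the only differing bit falls inside the 12-bit opcode field, which is consistent with the two forms being listed as separate opcodes in the chat.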
03:57karolherbst[d]: no idea how that one is useful for vulkan tho
03:58karolherbst[d]: ohh wait.. that's the hybrid indirect thing
03:58karolherbst[d]: you specify a base + an offset into the group
03:58karolherbst[d]: where the base is just << 2 really
03:59karolherbst[d]: anyway.. I'll wire it up and then sust/suld can take handles inside UGPRs
04:03karolherbst[d]: the nice thing is the UGPR form can also take a constant offset that is applied on top
04:03karolherbst[d]: not sure we'll need it tho
04:14karolherbst[d]: also why is the range of uregs enforced to be 8 bits even though it's actually 6...
04:14karolherbst[d]: there is an actual overlap in the sust encoding
09:16asdqueerfromeu[d]: airlied[d]: That's probably `NV_PFAULT_FAULT_TYPE_UNSUPPORTED_APERTURE` (according to the OGK source code)
14:34karolherbst[d]: also what's up with the 0x3a0 suatom encoding? It doesn't seem to exist somehow
14:36karolherbst[d]: ohh that's SURED...
15:39karolherbst[d]: okay.. so `ATOM.CAS` doesn't have the UGPR form.. fun
16:34karolherbst[d]: atomics also have an overlap with the UGPR source. They moved the 32/64 bit selection of the GPR into bit UGPR + 6
18:46karolherbst[d]: why is `CCTL`... it has a GPR + const_offset and a GPR + UGPR form..
18:47marysaka[d]: karolherbst[d]: isn't that cache control at the SM level?
18:48karolherbst[d]: yeah..
18:48karolherbst[d]: but it's wild it doesn't have a GPR + UGPR + const_offset form
18:48marysaka[d]: I think this had control for tex cache and other stuffs
18:49marysaka[d]: actually did they split CCTL after SM50/SM60? looking at cuda binutils docs it has separate instructions for the texture kind it seems
18:49karolherbst[d]: yeah it has
18:49karolherbst[d]: CCTLT
18:51karolherbst[d]: and there is CCTLL for local memory...
18:51karolherbst[d]: which... I have no idea what to use it for honestly
18:53karolherbst[d]: ohhh actually...
18:53karolherbst[d]: I'm sure that's helpful for a debugger
18:55karolherbst[d]: or if you need to dump the local memory for _something_
18:55marysaka[d]: I remember seeing some being generated on some massive shaders on some Switch games but not sure what for :P
18:55karolherbst[d]: maybe left over debugging stuff? 😄
18:56marysaka[d]: probably not, those are offline generated or from the GL compiler
18:56karolherbst[d]: though threads can query the address of the local memory buffer
18:56karolherbst[d]: and access it freely
18:56marysaka[d]: so there must be reasons around those being present but well
18:57karolherbst[d]: probably some bespoke optimization
21:25karolherbst[d]: okay funky.. I have a shader in front of me that went from 693 to 1764 cycles just by ending up with a different order of things triggered through some other optimization...
21:26karolherbst[d]: it's a single block
21:26karolherbst[d]: a big one
21:26karolherbst[d]: and the delays just ended up to be pretty huge
21:28karolherbst[d]: the fun part is that... the only difference is a couple of mov rz things...
21:32karolherbst[d]: okay.. there is actually a difference.. ldg uses a UGPR instead of GPR...
21:32karolherbst[d]: I wonder if that confuses the scheduler...
22:36karolherbst[d]: anyway.. I think I finished with the "let's use UGPRs more often part" and surprisingly it's the biggest part of the benefit... Seems like UGPR + GPR aren't all that useful _yet_, but I suspect it's going to be better with smarter optimizations and range analysis: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384
22:42HdkR: Woo more uniform usage
22:51HdkR: Did the ISA ever gain a reduction op between GPRs and a UGPR? min/max/fmin/fmax/etc?
22:52HdkR: Always seemed like a good idea to reduce relying on SHFL.
23:07mhenning[d]: yes, and we have support already
23:09HdkR: \o/
23:09HdkR: Hopefully Dolphin's bounding box emulation hits that...somehow