03:55 karolherbst[d]: I found the frigging UGPR SUST encoding...
03:55 karolherbst[d]: `0xf9b` and `0xf9d`
03:55 karolherbst[d]: `SUST.P.1D_BUFFER.STRONG.GPU.IGN [R0], R4, UR0, 0x0 ; /* 0x2000000400007f9b */`
03:55 karolherbst[d]: `SUST.D.BA.1D_BUFFER.INVALID7.STRONG.GPU.IGN [R0], R4, UR0, 0x0 ; /* 0x2000000400007f9d */`
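A minimal sketch of reading the opcode out of those disassembly comments. It assumes the opcode occupies the low 12 bits of the instruction, which the hex values above suggest: `0x...7f9b` ends in `f9b` and `0x...779b` ends in `79b`. The helper name `opcode_of` is hypothetical.

```python
# Hypothetical helper: assumes the opcode sits in bits [0, 12) of the
# instruction, consistent with the encodings printed in the disassembly.
def opcode_of(encoding: int) -> int:
    return encoding & 0xFFF

# First 64-bit words copied from the disassembly comments.
SAMPLES = {
    "SUST.P (UGPR handle)": 0x2000000400007F9B,
    "SUST.D.BA (UGPR handle)": 0x2000000400007F9D,
    "SUST.P (three immediates)": 0x200000040000779B,
}

for name, enc in SAMPLES.items():
    print(f"{name}: {opcode_of(enc):#x}")
```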
03:56 karolherbst[d]: also found the three-immediate encoding
03:56 karolherbst[d]: `0x79b` and `0x79d`
03:56 karolherbst[d]: `SUST.P.1D_BUFFER.STRONG.GPU.IGN [R0], R4, 0x0, 0x0, 0x0 ; /* 0x200000040000779b */`
03:57 karolherbst[d]: no idea how that one is useful for vulkan tho
03:58 karolherbst[d]: ohh wait.. that's the hybrid indirect thing
03:58 karolherbst[d]: you specify a base + an offset into the group
03:58 karolherbst[d]: where the base is just << 2 really
03:59 karolherbst[d]: anyway.. I'll wire it up and then sust/suld can take handles inside UGPRs
04:03 karolherbst[d]: the nice thing is the UGPR form can also take a constant offset that is applied on top
04:03 karolherbst[d]: not sure we'll need it tho
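A rough model of the three surface-handle forms discussed above, to make the arithmetic concrete. This is an interpretation, not confirmed hardware behavior: it assumes the UGPR form simply adds its constant offset to the register value, and it reads "the base is just << 2" literally as shifting the immediate base left by 2 before adding the offset into the group. The third immediate of the `0x79b` form is not modeled, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class GprHandle:
    value: int  # handle read from a per-thread GPR

@dataclass
class UgprHandle:
    value: int             # handle read from a uniform register (UGPR)
    const_offset: int = 0  # immediate applied on top (assumed to be additive)

@dataclass
class ImmHandle:
    base: int          # immediate base (assumed to be scaled by << 2)
    group_offset: int  # immediate offset into the group

Handle = Union[GprHandle, UgprHandle, ImmHandle]

def effective_handle(h: Handle) -> int:
    """Compute the handle each (hypothetical) form would resolve to."""
    if isinstance(h, GprHandle):
        return h.value
    if isinstance(h, UgprHandle):
        return h.value + h.const_offset
    if isinstance(h, ImmHandle):
        return (h.base << 2) + h.group_offset
    raise TypeError(f"unknown handle form: {h!r}")
```

For example, `effective_handle(UgprHandle(0x40, 8))` gives `0x48`, and `effective_handle(ImmHandle(3, 1))` gives `13`.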
04:14 karolherbst[d]: also why is the range of uregs enforced to be 8 bits even though it's actually 6...
04:14 karolherbst[d]: there is an actual overlap in the sust encoding
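The overlap is easy to see in a toy encoder. The slot reserved for a uniform register is 8 bits wide, but only indices 0 to 63 exist (UR0 to UR62, with URZ encoded as 63), so the top two bits of the slot are never needed for the register itself. If another field lives in those bits, writing the index through an 8-bit setter silently clears it. A minimal Python sketch; the field position (bit 32) and the helper names are hypothetical.

```python
UREG_FIELD_LO = 32    # hypothetical bit position of the UGPR slot
UREG_FIELD_BITS = 8   # width reserved for the slot
UREG_INDEX_BITS = 6   # bits actually needed: UR0..UR62 plus URZ (63)

def set_field(word: int, lo: int, bits: int, value: int) -> int:
    """Write `value` into bits [lo, lo + bits) of an instruction word."""
    mask = (1 << bits) - 1
    if value & ~mask:
        raise ValueError(f"{value:#x} does not fit in {bits} bits")
    return (word & ~(mask << lo)) | (value << lo)

def set_ureg(word: int, index: int) -> int:
    if not 0 <= index < (1 << UREG_INDEX_BITS):
        raise ValueError(f"UR{index} is out of range")
    # Write only the 6 index bits so whatever another field stores in
    # the upper two bits of the 8-bit slot is left untouched.
    return set_field(word, UREG_FIELD_LO, UREG_INDEX_BITS, index)

# Some other field (hypothetically) occupies bit 6 of the slot.
word = 1 << (UREG_FIELD_LO + 6)
clobbered = set_field(word, UREG_FIELD_LO, UREG_FIELD_BITS, 5)  # bit 6 lost
preserved = set_ureg(word, 5)                                   # bit 6 kept
```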
09:16 asdqueerfromeu[d]: airlied[d]: That's probably `NV_PFAULT_FAULT_TYPE_UNSUPPORTED_APERTURE` (according to the OGK source code)
14:34 karolherbst[d]: also what's up with the 0x3a0 suatom encoding? It doesn't seem to exist somehow
14:36 karolherbst[d]: ohh that's SURED...
15:39 karolherbst[d]: okay.. so `ATOM.CAS` doesn't have the UGPR form.. fun
16:34 karolherbst[d]: atomics also have an overlap with the UGPR source. They moved the 32/64-bit selection of the GPR into bit 6 of the UGPR field (UGPR + 6)
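A sketch of packing and unpacking that shared slot: the low 6 bits hold the UGPR index and bit 6 of the slot holds the 32/64-bit selector for the GPR. The slot position (bit 32) and the function names are hypothetical; only the "UGPR + 6" layout comes from the message above.

```python
UGPR_LO = 32  # hypothetical start of the UGPR slot in the atomic encoding

def encode_atom_ugpr(ureg: int, gpr_is_64bit: bool) -> int:
    """Pack the 6-bit UGPR index and the GPR width selector at slot bit 6."""
    if not 0 <= ureg < 64:
        raise ValueError(f"UR{ureg} is out of range")
    return (ureg << UGPR_LO) | (int(gpr_is_64bit) << (UGPR_LO + 6))

def decode_atom_ugpr(word: int) -> tuple[int, bool]:
    """Split the shared slot back into (UGPR index, GPR is 64-bit)."""
    ureg = (word >> UGPR_LO) & 0x3F
    gpr_is_64bit = bool((word >> (UGPR_LO + 6)) & 1)
    return ureg, gpr_is_64bit
```

With this layout, URZ (63) plus the 64-bit flag fills the low 7 bits of the slot (`0x7F`), which is why an encoder that treats the whole 8-bit slot as the register index would clash with the selector.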
18:46 karolherbst[d]: why is `CCTL`... it has a GPR + const_offset and a GPR + UGPR form..
18:47 marysaka[d]: karolherbst[d]: isn't that cache control at the SM level?
18:48 karolherbst[d]: yeah..
18:48 karolherbst[d]: but it's wild it doesn't have a GPR + UGPR + const_offset form
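Because `CCTL` only offers GPR + immediate and GPR + UGPR addressing, an address shaped like GPR + UGPR + immediate has to be split before encoding. A hypothetical legalization sketch: when both a UGPR and a non-zero offset are present, it folds the offset into a temporary GPR with one extra add. That is only one choice (folding into the UGPR with a uniform add would also work), the emitted instruction is a schematic string rather than real SASS, and 64-bit carry handling is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Addr:
    gpr: str
    ugpr: Optional[str] = None
    offset: int = 0

def legalize_cctl_addr(addr: Addr, tmp: str) -> Tuple[List[str], Addr]:
    """Return (extra instructions, an address form CCTL can encode)."""
    if addr.ugpr is None or addr.offset == 0:
        return [], addr  # already GPR + imm or GPR + UGPR
    # No GPR + UGPR + imm form: fold the immediate into a temporary GPR.
    fixup = f"add {tmp} = {addr.gpr} + {addr.offset:#x}"  # schematic
    return [fixup], Addr(gpr=tmp, ugpr=addr.ugpr, offset=0)
```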
18:48 marysaka[d]: I think this had control for tex cache and other stuff
18:49 marysaka[d]: actually did they split CCTL after SM50/SM60? looking at cuda binutils docs it has separate instructions for the texture kind it seems
18:49 karolherbst[d]: yeah it has
18:49 karolherbst[d]: CCTLT
18:51 karolherbst[d]: and there is CCTLL for local memory...
18:51 karolherbst[d]: which... I have no idea what to use it for honestly
18:53 karolherbst[d]: ohhh actually...
18:53 karolherbst[d]: I'm sure that's helpful for a debugger
18:55 karolherbst[d]: or if you need to dump the local memory for _something_
18:55 marysaka[d]: I remember seeing some being generated on some massive shaders on some Switch games but not sure what for :P
18:55 karolherbst[d]: maybe left over debugging stuff? 😄
18:56 marysaka[d]: probably not, those are offline generated or from the GL compiler
18:56 karolherbst[d]: though threads can query the address of the local memory buffer
18:56 karolherbst[d]: and access it freely
18:56 marysaka[d]: so there must be reasons around those being present but well
18:57 karolherbst[d]: probably some bespoke optimization
21:25 karolherbst[d]: okay funky.. I have a shader in front of me that went from 693 to 1764 cycles just by ending up with a different order of things triggered through some other optimization...
21:26 karolherbst[d]: it's a single block
21:26 karolherbst[d]: a big one
21:26 karolherbst[d]: and the delays just ended up to be pretty huge
21:28 karolherbst[d]: the fun part is that... the only differences are a couple of `mov rz` things...
21:32 karolherbst[d]: okay.. there is actually a difference.. ldg uses a UGPR instead of GPR...
21:32 karolherbst[d]: I wonder if that confuses the scheduler...
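A toy model of how instruction order alone can roughly double the cycle count of a single block. With in-order issue, each instruction stalls until its sources are ready, so interleaving each load with its consumer serializes the load latencies, while issuing both loads first overlaps them. The latencies are made-up numbers and this is not NAK's scheduler or its delay model.

```python
from typing import Dict, List, Tuple

# (destination, sources, latency in cycles); latencies are illustrative only.
Instr = Tuple[str, List[str], int]

def cycles(program: List[Instr]) -> int:
    """Simulate single-issue, in-order execution and return total cycles."""
    ready: Dict[str, int] = {}  # register -> cycle its value becomes available
    t = 0
    for dst, srcs, latency in program:
        # Stall until every source operand has been produced.
        issue = max([t] + [ready.get(s, 0) for s in srcs])
        ready[dst] = issue + latency
        t = issue + 1  # the next instruction can issue one cycle later
    return max([t] + list(ready.values()))

load_a = ("a", ["pa"], 200)
use_a = ("x", ["a"], 4)
load_b = ("b", ["pb"], 200)
use_b = ("y", ["b"], 4)

interleaved = [load_a, use_a, load_b, use_b]  # load_b waits behind use_a
batched = [load_a, load_b, use_a, use_b]      # both loads in flight at once

print(cycles(interleaved))  # 405
print(cycles(batched))      # 205
```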
22:36 karolherbst[d]: anyway.. I think I finished with the "let's use UGPRs more often part" and surprisingly it's the biggest part of the benefit... Seems like UGPR + GPR aren't all that useful _yet_, but I suspect it's going to be better with smarter optimizations and range analysis: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384
22:42 HdkR: Woo more uniform usage
22:51 HdkR: Did the ISA ever gain a reduction op between GPRs and a UGPR? min/max/fmin/fmax/etc?
22:52 HdkR: Always seemed like a good idea to reduce relying on SHFL.
23:07 mhenning[d]: yes, and we have support already
23:09 HdkR: \o/
23:09 HdkR: Hopefully Dolphin's bounding box emulation hits that...somehow
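Some context on why a GPR-to-UGPR reduction op helps: without it, a warp-wide min is usually built from a SHFL butterfly with five shuffle-and-combine rounds for 32 lanes, after which every lane holds the same (uniform) result. A minimal Python simulation of that butterfly, with arbitrary lane values. A dedicated reduction instruction collapses those rounds into one operation whose result can land directly in a UGPR; at the PTX level the integer form is exposed as `redux.sync` on sm_80 and newer.

```python
from typing import Callable, List

WARP_SIZE = 32

def shfl_xor(values: List[int], lane_mask: int) -> List[int]:
    """Model of a butterfly shuffle: lane i reads the value of lane i ^ lane_mask."""
    return [values[i ^ lane_mask] for i in range(len(values))]

def butterfly_reduce(values: List[int], op: Callable[[int, int], int]) -> List[int]:
    """Warp reduction built from log2(WARP_SIZE) shuffle rounds."""
    assert len(values) == WARP_SIZE
    mask = WARP_SIZE // 2
    while mask:
        partner = shfl_xor(values, mask)
        values = [op(a, b) for a, b in zip(values, partner)]
        mask //= 2
    return values

lanes = [(i * 37) % 101 for i in range(WARP_SIZE)]
result = butterfly_reduce(lanes, min)  # every lane ends up with min(lanes)
```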