00:29 fdobridge: <g​fxstrand> @asdqueerfromeu Christmas came early! https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26242
00:35 fdobridge: <g​fxstrand> That one's getting a full CTS run before merging because it messes with buffer allocation.
00:39 fdobridge: <g​fxstrand> The other thing this gets us is that we can start building up a pipeline DB and doing comparisons between NAK and codegen.
00:40 fdobridge: <g​fxstrand> Of course, you have to be careful with shader-db type things as instruction counts usually tell a very incomplete story.
00:40 fdobridge: <g​fxstrand> But it can be pretty useful for testing for codegen quality regressions or figuring out spots where the compiler can be improved.
03:13 fdobridge: <g​fxstrand> Okay. Let's see how badly sync2 explodes
05:30 fdobridge: <g​fxstrand> I should go through and update docs/features.txt tomorrow. I keep forgetting things. 🙃 (edited)
06:06 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Because you merged it I'm assuming it passed
06:11 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I wonder how complex the Overwatch 2 rendering issue is though :triangle_nvk:
06:14 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> Should I use `NVK_DEBUG=push_sync` for this case? Getting a single frame with RenderDoc seems impossible because the driver crashes
07:48 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> I still get the assert with my traces even after disabling pipeline cache patch and wiping the mesa_shader_cache
07:52 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> The GTA fog issue is still present with the latest commit
11:42 fdobridge: <d​adschoorse> > HACK: spirv: Add a MESA_SPIRV_DUMP_PATH environment variable
11:42 fdobridge: <d​adschoorse> was that intended to be merged?
15:48 fdobridge: <e​sdrastarsis> Rage 2 on NVK/NAK: `Unsupported intrinsic instruction: reduce`
15:58 fdobridge: <g​fxstrand> Yup. Still missing a bunch of subgroup ops
15:58 fdobridge: <g​fxstrand> I was thinking of trying to knock out the rest of subgroups today
18:00 fdobridge: <g​fxstrand> Gotta get the blob going again. I have no idea how we're supposed to avoid data in invalid channels for scan/reduce.
19:06 fdobridge: <g​fxstrand> Yeah, I have no clue how this is supposed to work. 🙃
19:11 fdobridge: <g​fxstrand> Oh, I see what they're doing and... ouch. 😭
19:16 fdobridge: <g​fxstrand> NVIDIA has one subgroup op... shuffle.
19:16 fdobridge: <g​fxstrand> 🤦🏻‍♀️
19:16 fdobridge: <g​fxstrand> And the things they have to do to get scan/reduce... Well, "sins" would probably be a reasonable description.
19:18 fdobridge: <!​[NVK Whacker] Echo (she) 🇱🇹> So can NVIDIA GPUs now be considered "meme hardware"? :nouveau:
19:18 fdobridge: <a​irlied> Ouch
19:19 fdobridge: <a​irlied> Can it be lowered in nir or does it need hw specific bits?
19:23 fdobridge: <g​fxstrand> I'm going to lower in NIR
19:23 fdobridge: <g​fxstrand> I think
19:49 fdobridge: <d​adschoorse> isn't their shuffle quite powerful at least?
19:50 fdobridge: <d​adschoorse> iirc it also has a predicate dst to tell if you fetched something inactive/out of bounds, that's very useful for reduction/scans
19:52 fdobridge: <g​fxstrand> It's less useful than it looks. 😭
19:53 fdobridge: <g​fxstrand> I mean, it is useful but it's not OMG useful
19:56 fdobridge: <g​fxstrand> And modeling that predicate dest in NIR is going to be really annoying.
19:56 fdobridge: <g​fxstrand> It can be done but ugh
20:02 fdobridge: <d​adschoorse> Doesn't it mean a scan is just this?
20:02 fdobridge: <d​adschoorse> ```
20:02 fdobridge: <d​adschoorse> sum = identity; // or input for inclusive scan
20:02 fdobridge: <d​adschoorse> for (uint i = 1; i < 32; i *= 2)
20:02 fdobridge: <d​adschoorse> valid, value = shuffleUp(input, i);
20:02 fdobridge: <d​adschoorse> sum = valid ? (sum op value) : value; // iirc nv can do this in one predicated instruction
20:03 fdobridge: <d​adschoorse> ``` (edited)
20:04 fdobridge: <d​adschoorse> or wait, that's probably not enough if not all simd lanes are active
20:05 fdobridge: <d​adschoorse> amd just activates all lanes as the first step of a reduction/scan, can nv do that?
20:19 fdobridge: <g​fxstrand> Yeah, we can't do that
20:19 fdobridge: <g​fxstrand> Intel can and it's easy there too
20:20 fdobridge: <g​fxstrand> NV emits
20:20 fdobridge: <g​fxstrand> ```c++
20:20 fdobridge: <g​fxstrand> if (ballot(true) == -1) {
20:20 fdobridge: <g​fxstrand> /* that */
20:20 fdobridge: <g​fxstrand> } else {
20:20 fdobridge: <g​fxstrand> /* insanity */
20:20 fdobridge: <g​fxstrand> }
20:20 fdobridge: <g​fxstrand> ```
20:20 fdobridge: <g​fxstrand> It's the else bit that I'm trying to figure out. 😅
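The `ballot(true) == -1` test in the snippet above checks whether every lane in the subgroup is active. A minimal CPU-side sketch of that check, assuming a 32-lane subgroup; `ballot_true` and `all_lanes_active` are hypothetical names for illustration, not NVK or NAK functions:

```cpp
#include <array>
#include <cstdint>

constexpr int kSubgroupSize = 32;  // assumed subgroup width

// Hypothetical model of ballot(true): every active lane contributes a 1 bit
// at its own index, inactive lanes contribute 0.
uint32_t ballot_true(const std::array<bool, kSubgroupSize>& active) {
    uint32_t mask = 0;
    for (int lane = 0; lane < kSubgroupSize; ++lane) {
        if (active[lane]) {
            mask |= uint32_t{1} << lane;
        }
    }
    return mask;
}

// All 32 bits set is 0xFFFFFFFF, which reads as -1 when viewed as a signed
// 32-bit integer. That is the fast-path condition in the snippet above.
bool all_lanes_active(const std::array<bool, kSubgroupSize>& active) {
    return ballot_true(active) == 0xFFFFFFFFu;
}
```

With even one lane inactive the mask has a zero bit and the check fails, which is the case the `else` branch has to handle.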
20:24 fdobridge: <d​adschoorse> Doesn't it mean a scan is just this?
20:24 fdobridge: <d​adschoorse> ```
20:24 fdobridge: <d​adschoorse> sum = identity; // or input for inclusive scan
20:24 fdobridge: <d​adschoorse> for (uint i = 1; i < 32; i *= 2)
20:24 fdobridge: <d​adschoorse> valid, value = shuffleUp(input, i);
20:24 fdobridge: <d​adschoorse> sum = valid ? (sum op value) : sum; // iirc nv can do this in one predicated instruction
20:24 fdobridge: <d​adschoorse> ``` (edited)
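The loop above can be modeled on the CPU to see why the `valid` predicate matters. This is a minimal sketch, assuming all 32 lanes are active (the `ballot(true) == -1` case) and an integer add as the operation. `shuffle_up` and `inclusive_scan_add` are hypothetical helpers for illustration, not hardware or NIR intrinsics. One deliberate difference from the pseudocode: the sketch shuffles the running `sum` rather than the original `input` on each step, which is the usual log-step scan formulation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

constexpr int kLanes = 32;  // assumed subgroup width
using Warp = std::array<uint32_t, kLanes>;

// Hypothetical model of a shuffle-up with a predicate result: `lane` reads the
// value held by lane `lane - delta`. The bool is false when that source lane
// would be out of range.
std::pair<bool, uint32_t> shuffle_up(const Warp& v, int lane, int delta) {
    assert(delta > 0);
    const int src = lane - delta;
    if (src < 0) {
        return {false, 0u};
    }
    return {true, v[src]};
}

// Inclusive add-scan in log2(32) = 5 steps, assuming every lane is active.
Warp inclusive_scan_add(const Warp& input) {
    Warp sum = input;  // inclusive scan starts from each lane's own input
    for (int i = 1; i < kLanes; i *= 2) {
        Warp next = sum;  // lanes run in lockstep, so read only old values
        for (int lane = 0; lane < kLanes; ++lane) {
            auto [valid, value] = shuffle_up(sum, lane, i);
            // Lanes with no valid source keep their current sum unchanged.
            next[lane] = valid ? sum[lane] + value : sum[lane];
        }
        sum = next;
    }
    return sum;
}
```

With every lane holding 1, lane `n` ends up holding `n + 1`. The sketch does not cover partially active subgroups, which is the case raised in the next messages.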
21:52 fdobridge: <e​sdrastarsis> Second Extinction on NVK/NAK: `../mesa/src/nouveau/vulkan/nvk_cmd_draw.c:630: nvk_CmdBeginRendering: Assertion 'level->tiling.is_tiled' failed.`
23:35 fdobridge: <g​fxstrand> Yeah, no rendering to linear. How on earth you're hitting that, I don't know.
23:39 fdobridge: <g​fxstrand> ```
23:39 fdobridge: <g​fxstrand> Test run totals:
23:39 fdobridge: <g​fxstrand> Passed: 3491/37170 (9.4%)
23:39 fdobridge: <g​fxstrand> Failed: 1/37170 (0.0%)
23:39 fdobridge: <g​fxstrand> Not supported: 33678/37170 (90.6%)
23:39 fdobridge: <g​fxstrand> Warnings: 0/37170 (0.0%)
23:39 fdobridge: <g​fxstrand> Waived: 0/37170 (0.0%)
23:39 fdobridge: <g​fxstrand> ```
23:39 fdobridge: <g​fxstrand> Damn! Where's my fail...
23:39 fdobridge: <g​fxstrand> Oh, it's a known tessellation barrier problem.
23:39 fdobridge: <g​fxstrand> Okay then
23:41 fdobridge: <g​fxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26264
23:41 fdobridge: <g​fxstrand> scan/reduce are pretty horrible but I got them working in the end.
23:41 fdobridge: <g​fxstrand> And what I'm generating is pretty close to NVIDIA so I'm fairly confident that the insanity is warranted.
23:41 fdobridge: <g​fxstrand> More usage of predicates would make it very slightly more efficient but meh.