00:29fdobridge: <gfxstrand> @asdqueerfromeu Christmas came early! https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26242
00:35fdobridge: <gfxstrand> That one's getting a full CTS run before merging because it messes with buffer allocation.
00:39fdobridge: <gfxstrand> The other thing this gets us is that we can start building up a pipeline DB and doing comparisons between NAK and codegen.
00:40fdobridge: <gfxstrand> Of course, you have to be careful with shader-db type things as instruction counts usually tell a very incomplete story.
00:40fdobridge: <gfxstrand> But it can be pretty useful for testing for codegen quality regressions or figuring out spots where the compiler can be improved.
03:13fdobridge: <gfxstrand> Okay. Let's see how badly sync2 explodes
05:30fdobridge: <gfxstrand> I should go through and update docs/features.txt tomorrow. I keep forgetting things. 🙃 (edited)
06:06fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Because you merged it I'm assuming it passed
06:11fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I wonder how complex the Overwatch 2 rendering issue is, though :triangle_nvk:
06:14fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Should I use `NVK_DEBUG=push_sync` for this case? Getting a single frame with RenderDoc seems impossible because the driver crashes
07:48fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I still get the assert with my traces even after disabling the pipeline cache patch and wiping the mesa_shader_cache
07:52fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> The GTA fog issue is still present with the latest commit
11:42fdobridge: <dadschoorse> > HACK: spirv: Add a MESA_SPIRV_DUMP_PATH environment variable
11:42fdobridge: <dadschoorse> was that intended to be merged?
15:48fdobridge: <esdrastarsis> Rage 2 on NVK/NAK: `Unsupported intrinsic instruction: reduce`
15:58fdobridge: <gfxstrand> Yup. Still missing a bunch of subgroup ops
15:58fdobridge: <gfxstrand> I was thinking of trying to knock out the rest of subgroups today
18:00fdobridge: <gfxstrand> Gotta get the blob going again. I have no idea how we're supposed to avoid data in invalid channels for scan/reduce.
19:06fdobridge: <gfxstrand> Yeah, I have no clue how this is supposed to work. 🙃
19:11fdobridge: <gfxstrand> Oh, I see what they're doing and... ouch. 😭
19:16fdobridge: <gfxstrand> NVIDIA has one subgroup op... shuffle.
19:16fdobridge: <gfxstrand> 🤦🏻‍♀️
19:16fdobridge: <gfxstrand> And the things they have to do to get scan/reduce... Well, "sins" would probably be a reasonable description.
19:18fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> So can NVIDIA GPUs now be considered "meme hardware"? :nouveau:
19:18fdobridge: <airlied> Ouch
19:19fdobridge: <airlied> Can it be lowered in nir or does it need hw specific bits?
19:23fdobridge: <gfxstrand> I'm going to lower in NIR
19:23fdobridge: <gfxstrand> I think
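To make the "one subgroup op" point concrete, here is a minimal sketch of a reduction built from nothing but a shuffle. It is a CPU model of a 32-lane subgroup, not driver code: `shuffle_xor` and `subgroup_reduce_add` are hypothetical names, and the sketch assumes every lane is active, which is exactly the easy case.

```c
#include <stdint.h>

#define SUBGROUP_SIZE 32

/* Hypothetical CPU model of a butterfly shuffle: every lane reads the
 * value held by lane (lane ^ mask). Assumes all 32 lanes are active. */
static void shuffle_xor(const uint32_t in[SUBGROUP_SIZE],
                        uint32_t out[SUBGROUP_SIZE], unsigned mask)
{
    for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++)
        out[lane] = in[lane ^ mask];
}

/* Add-reduce across the subgroup in log2(32) = 5 shuffle steps.
 * On return, every lane holds the sum of all 32 inputs. */
static void subgroup_reduce_add(uint32_t vals[SUBGROUP_SIZE])
{
    uint32_t other[SUBGROUP_SIZE];

    for (unsigned mask = 1; mask < SUBGROUP_SIZE; mask *= 2) {
        shuffle_xor(vals, other, mask);
        for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++)
            vals[lane] += other[lane];
    }
}
```

Each step doubles the number of inputs folded into every lane, so five steps cover the whole subgroup. The sketch says nothing about what happens when some lanes are inactive.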
19:49fdobridge: <dadschoorse> isn't their shuffle quite powerful at least?
19:50fdobridge: <dadschoorse> iirc it also has a predicate dst to tell if you fetched something inactive/out of bounds, that's very useful for reduction/scans
19:52fdobridge: <gfxstrand> It's less useful than it looks. 😭
19:53fdobridge: <gfxstrand> I mean, it is useful but it's not OMG useful
19:56fdobridge: <gfxstrand> And modeling that predicate dest in NIR is going to be really annoying.
19:56fdobridge: <gfxstrand> It can be done but ugh
20:02fdobridge: <dadschoorse> Doesn't it mean a scan is just this?
20:02fdobridge: <dadschoorse> ```
20:02fdobridge: <dadschoorse> sum = identity; // or input for inclusive scan
20:02fdobridge: <dadschoorse> for (uint i = 1; i < 32; i *= 2) {
20:02fdobridge: <dadschoorse>     valid, value = shuffleUp(input, i);
20:02fdobridge: <dadschoorse>     sum = valid ? (sum op value) : sum; // iirc nv can do this in one predicated instruction
20:02fdobridge: <dadschoorse> }
20:02fdobridge: <dadschoorse> ``` (edited)
20:04fdobridge: <dadschoorse> or wait, that's probably not enough if not all simd lanes are active
20:05fdobridge: <dadschoorse> amd just activates all lanes as the first step of a reduction/scan, can nv do that?
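The log-step scan being discussed can be sketched as a CPU model of a 32-lane subgroup. This is an illustration under stated assumptions, not NAK or NVIDIA code: `shuffle_up` is a hypothetical helper whose `valid` output stands in for the predicate destination and only reports whether the source lane index is in range. The sketch shuffles the running partial sum at each step and assumes all lanes are active.

```c
#include <stdbool.h>
#include <stdint.h>

#define SUBGROUP_SIZE 32

/* Hypothetical model of shuffle-up with a predicate destination:
 * lane L reads from lane L - delta. valid[L] is false when that source
 * lane would be below lane 0; in that case this model hands back the
 * lane's own value (an assumption of the sketch). */
static void shuffle_up(const uint32_t in[SUBGROUP_SIZE], unsigned delta,
                       uint32_t out[SUBGROUP_SIZE], bool valid[SUBGROUP_SIZE])
{
    for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++) {
        valid[lane] = lane >= delta;
        out[lane] = valid[lane] ? in[lane - delta] : in[lane];
    }
}

/* Inclusive add-scan in 5 steps, assuming every lane is active.
 * On return, sum[L] holds the total of the inputs of lanes 0..L. */
static void subgroup_inclusive_scan_add(uint32_t sum[SUBGROUP_SIZE])
{
    uint32_t value[SUBGROUP_SIZE];
    bool valid[SUBGROUP_SIZE];

    for (unsigned i = 1; i < SUBGROUP_SIZE; i *= 2) {
        /* Snapshot the partial sums from i lanes below, then fold them in. */
        shuffle_up(sum, i, value, valid);
        for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++)
            if (valid[lane])
                sum[lane] += value[lane];
    }
}
```

The predicated update is what keeps the low lanes from folding in out-of-range data. It does not help when a source lane is in range but inactive, which is the case raised in the last two messages.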
20:19fdobridge: <gfxstrand> Yeah, we can't do that
20:19fdobridge: <gfxstrand> Intel can and it's easy there too
20:20fdobridge: <gfxstrand> NV emits
20:20fdobridge: <gfxstrand> ```c++
20:20fdobridge: <gfxstrand> if (ballot(true) == -1) {
20:20fdobridge: <gfxstrand> /* that */
20:20fdobridge: <gfxstrand> } else {
20:20fdobridge: <gfxstrand> /* insanity */
20:20fdobridge: <gfxstrand> }
20:20fdobridge: <gfxstrand> ```
20:20fdobridge: <gfxstrand> It's the else bit that I'm trying to figure out. 😅
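The shape of that branch can be sketched on a CPU model of a 32-lane subgroup. Only the `ballot(true) == -1` check comes from the log. Everything else is an assumption: the function names are hypothetical, and the slow path is a deliberately naive serial walk used as a placeholder, not what NVIDIA's compiler or the eventual NVK lowering emits, since the log does not describe it.

```c
#include <stdbool.h>
#include <stdint.h>

#define SUBGROUP_SIZE 32

/* Model of ballot(true): one bit set per active lane. */
static uint32_t ballot_true(const bool active[SUBGROUP_SIZE])
{
    uint32_t mask = 0;
    for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++)
        if (active[lane])
            mask |= UINT32_C(1) << lane;
    return mask;
}

/* Fast path: log-step inclusive add-scan, correct only when every lane
 * is active. Walking lanes downward keeps sum[lane - i] unmodified
 * until it has been read in the current step. */
static void scan_all_active(uint32_t sum[SUBGROUP_SIZE])
{
    for (unsigned i = 1; i < SUBGROUP_SIZE; i *= 2)
        for (unsigned lane = SUBGROUP_SIZE; lane-- > i; )
            sum[lane] += sum[lane - i];
}

/* Placeholder slow path (NOT the real lowering): accumulate over the
 * active lanes in order and leave inactive lanes untouched. */
static void scan_serial(uint32_t sum[SUBGROUP_SIZE],
                        const bool active[SUBGROUP_SIZE])
{
    uint32_t running = 0;
    for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++) {
        if (active[lane]) {
            running += sum[lane];
            sum[lane] = running;
        }
    }
}

/* Dispatch on the ballot: all 32 bits set means no lane is inactive. */
static void subgroup_inclusive_scan_add(uint32_t sum[SUBGROUP_SIZE],
                                        const bool active[SUBGROUP_SIZE])
{
    if (ballot_true(active) == UINT32_MAX)
        scan_all_active(sum);
    else
        scan_serial(sum, active);
}
```

The point of the split is that the cheap shuffle-based scan is only safe when no source lane can be inactive, so the full-ballot test guards it and anything else falls through to the more expensive path.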
21:52fdobridge: <esdrastarsis> Second Extinction on NVK/NAK: `../mesa/src/nouveau/vulkan/nvk_cmd_draw.c:630: nvk_CmdBeginRendering: Assertion 'level->tiling.is_tiled' failed.`
23:35fdobridge: <gfxstrand> Yeah, no rendering to linear. How on earth you're hitting that, I don't know.
23:39fdobridge: <gfxstrand> ```
23:39fdobridge: <gfxstrand> Test run totals:
23:39fdobridge: <gfxstrand> Passed: 3491/37170 (9.4%)
23:39fdobridge: <gfxstrand> Failed: 1/37170 (0.0%)
23:39fdobridge: <gfxstrand> Not supported: 33678/37170 (90.6%)
23:39fdobridge: <gfxstrand> Warnings: 0/37170 (0.0%)
23:39fdobridge: <gfxstrand> Waived: 0/37170 (0.0%)
23:39fdobridge: <gfxstrand> ```
23:39fdobridge: <gfxstrand> Damn! Where's my fail...
23:39fdobridge: <gfxstrand> Oh, it's a known tessellation barrier problem.
23:39fdobridge: <gfxstrand> Okay then
23:41fdobridge: <gfxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26264
23:41fdobridge: <gfxstrand> scan/reduce are pretty horrible but I got them working in the end.
23:41fdobridge: <gfxstrand> And what I'm generating is pretty close to NVIDIA so I'm fairly confident that the insanity is warranted.
23:41fdobridge: <gfxstrand> More usage of predicates would make it very slightly more efficient but meh.