00:31fdobridge: <mhenning> Oh, I compile with `-D_GLIBCXX_ASSERTIONS=1 -D_FORTIFY_SOURCE=2` . without that you might read out of bounds instead of asserting
03:01fdobridge: <airlied> https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/245 this fixes a bunch of gpu crashes I'm seeing in a cts run
03:36fdobridge: <gfxstrand> Sure. Looks good.
03:39fdobridge: <gfxstrand> Which hardware?
03:41fdobridge: <airlied> ampere, but I've seen the same crashes on turing
03:41fdobridge: <airlied> actually maybe I didn't test turing
03:42fdobridge: <airlied> also might be on newer CTS
03:43fdobridge: <airlied> @gfxstrand have you written down anywhere what is needed for 3d depth/stencil?
03:44fdobridge: <gfxstrand> No
03:44fdobridge: <gfxstrand> It's not just 3D depth/stencil, it's 3D rendering in general.
03:45fdobridge: <gfxstrand> Hrm... maybe it is just 3D depth/stencil.
03:45fdobridge: <gfxstrand> Yeah, I'm not sure quite what the problem is there. Worst case, we can just force `tiling.z_log2 = 0` for 3D depth/stencil and treat them as 2D arrays.
03:46fdobridge: <gfxstrand> Let me play with that quick.
03:51fdobridge: <airlied> oh is it just that clears might need a loop or something?
04:02fdobridge: <airlied> nvidia uses yet another mme macro for ds clears
04:02fdobridge: <airlied> oh no it's using a compute shader
04:03fdobridge: <airlied> okay it seems like they clear an array texture then dma it over each layer of the depth image
04:07fdobridge: <gfxstrand> Okay, I've got a patch
04:07fdobridge: <gfxstrand> If we find an app that REALLY cares about 3D depth/stencil perf, we can deal with that when the time comes.
04:08fdobridge: <gfxstrand> But, like, it's depth/stencil. If you're not binding it as a depth/stencil attachment what are you doing?
04:10fdobridge: <gfxstrand> I pushed it. If it introduces regressions, I'll fix them in the morning.
04:10fdobridge: <airlied> yeah nvidia really don't seem to care, it looks like a complete second surface is allocated
04:10fdobridge: <gfxstrand> Yeah, I just set the array-compatible flag
04:10fdobridge: <gfxstrand> Because meh
04:11fdobridge: <gfxstrand> NIL makes this shit soooo easy. 😍
04:12fdobridge: <gfxstrand> Crazy. It's been running for almost 5m and nothing has crashed. 🤯
04:14fdobridge: <gfxstrand> IDK why that took me so long to fix. 🙃
04:14fdobridge: <gfxstrand> Okay, I'm going to bed. I'll fix any regressions from my patch in the morning.
05:25fdobridge: <gfxstrand> `Pass: 368653, Fail: 1970, Crash: 14, Skip: 1616585, Flake: 153, Duration: 37:03`
09:29fdobridge: <karolherbst🐧🦀> @gfxstrand your MS patch doesn't seem to change a thing on 2nd gen Kepler at least 🙃 But it's all busted anyway and it smells like something is really broken with indirect handles and MS in general there
09:30fdobridge: <karolherbst🐧🦀> running all tests with `multisample` in the name though
09:31fdobridge: <karolherbst🐧🦀> maybe I should add `resolve` as well
09:58fdobridge: <karolherbst🐧🦀> ahh yeah, that's what I actually wanted to look at before getting distracted by everything else 🙃
09:59fdobridge: <karolherbst🐧🦀> anyway, we can't emit a 0 there
09:59fdobridge: <karolherbst🐧🦀> I think
10:00fdobridge: <karolherbst🐧🦀> anyway yeah.. that was mostly the main source of crashes also on pre turing
15:37fdobridge: <gfxstrand> Updated my CTS branch and there are more fails. 😿
15:37fdobridge: <gfxstrand> ```
15:37fdobridge: <gfxstrand> Pass: 376335, Fail: 4205, Crash: 12, Skip: 1752648, Flake: 170, Duration: 45:22
15:37fdobridge: <gfxstrand> ```
15:37fdobridge: <gfxstrand> Also, 8 more minutes. 😭
18:36fdobridge: <gfxstrand> wonders if there's something wrong with the VS in this test
19:13fdobridge: <gfxstrand> Woo! Dirty tracking bug. That should fix most of the new fails. \o/
19:50fdobridge: <airlied> You might want to pull the clip fix out of my clip enable MR. It fixes a few fails
19:53fdobridge: <gfxstrand> kk
19:54fdobridge: <gfxstrand> `Pass: 378590, Fail: 1990, Crash: 12, Skip: 1752648, Flake: 130, Duration: 38:13`
19:54fdobridge: <gfxstrand> Much better. 😁
20:00fdobridge: <karolherbst🐧🦀> I wonder how well pre-Turing does now with that ts stuff fixed
20:03fdobridge: <gfxstrand> I'm doing another Turing run now with the latest MSAA TXQ patch and @airlied's point clip fix.
20:03fdobridge: <gfxstrand> I can attempt a maxwell run once that's done
20:04fdobridge: <karolherbst🐧🦀> is there anything in vulkan which allows a shader to get the sample position? Mostly curious
20:06fdobridge: <gfxstrand> `gl_SamplePos`
20:07fdobridge: <karolherbst🐧🦀> ohh
20:07fdobridge: <karolherbst🐧🦀> mhhh
20:07fdobridge: <karolherbst🐧🦀> sample shading, right...
20:07fdobridge: <gfxstrand> which we lower to `fract(gl_FragCoord)`
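A minimal sketch of the lowering mentioned here, in Python for illustration only. It assumes per-sample shading, so that `gl_FragCoord.xy` is already evaluated at the sample location and its fractional part is the sample's offset within the pixel. The function names are hypothetical and are not NVK code.

```python
import math


def fract(x: float) -> float:
    """GLSL-style fract(): x - floor(x), so the result lies in [0, 1)."""
    return x - math.floor(x)


def sample_pos_from_frag_coord(frag_coord_xy):
    """Approximate gl_SamplePos as fract(gl_FragCoord.xy).

    Assumption: frag_coord_xy is the window-space position of the sample
    being shaded (per-sample shading), not the pixel center.
    """
    x, y = frag_coord_xy
    return (fract(x), fract(y))


# A sample at window position (10.25, 20.75) sits at (0.25, 0.75) in its pixel.
print(sample_pos_from_frag_coord((10.25, 20.75)))
```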
20:09fdobridge: <karolherbst🐧🦀> the hardware returns the position in 4.12 format
20:09fdobridge: <karolherbst🐧🦀> packed
20:12fdobridge: <gfxstrand> ?
20:12fdobridge: <karolherbst🐧🦀> two fixed point 4.12 values packed inside a 32 bit register
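As a rough illustration of "two fixed point 4.12 values packed inside a 32 bit register", the sketch below decodes such a word in Python. The chat does not say which half holds X or whether the values are signed, so this assumes unsigned values with X in the low 16 bits and Y in the high 16 bits. The function name is hypothetical.

```python
FRAC_BITS = 12  # 4.12 fixed point: 4 integer bits, 12 fractional bits


def unpack_sample_pos(raw: int):
    """Split a 32-bit word into two unsigned 4.12 fixed-point values.

    Assumptions (not confirmed by the chat): X is in bits 0..15,
    Y is in bits 16..31, and both are unsigned.
    """
    x_fixed = raw & 0xFFFF
    y_fixed = (raw >> 16) & 0xFFFF
    scale = 1 << FRAC_BITS  # 4096
    return (x_fixed / scale, y_fixed / scale)


# 0x0800 / 4096 = 0.5 and 0x0400 / 4096 = 0.25
print(unpack_sample_pos(0x04000800))
```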
20:13fdobridge: <gfxstrand> @airlied Is !242 passing everything?
20:13fdobridge: <airlied> Yeah it didn't make things worse, though there is some outstanding failure in the area
20:14fdobridge: <gfxstrand> Oh, sure. We could probably use that instead but codegen was doing something really funky with pushing the whole sample position table as a cbuf and I just gave up and said "screw it" and did something easy.
20:14fdobridge: <karolherbst🐧🦀> fair 😄
20:15fdobridge: <karolherbst🐧🦀> mhhhhhh
20:15fdobridge: <karolherbst🐧🦀> I'm actually wondering why nobody started to use TXQ_SAMPLE_POSITION for this...
20:15fdobridge: <gfxstrand> That's the thing with codegen... Like, I'm sure someone at some time thought they had a good reason for that. Maybe some older hardware requires it? But no one realized it could be done better on recent GPUs and fixed it. So even Turing was still pushing the whole damn table.
20:16fdobridge: <karolherbst🐧🦀> I think it's not a thing pre fermi...
20:16fdobridge: <gfxstrand> That's believable. Seems odd that it'd be in the texture unit but IDK where else you'd put it.
20:17fdobridge: <karolherbst🐧🦀> the fun part is.. the code emitter has supported it since fermi, for years
20:17fdobridge: <karolherbst🐧🦀> it's all there
20:17fdobridge: <karolherbst🐧🦀> but yeah...
20:17fdobridge: <karolherbst🐧🦀> my motivation on working on codegen more than the bare minimum is kinda low 🙃
20:17fdobridge: <karolherbst🐧🦀> I'd rather just wait for NAK and do things properly there
20:17fdobridge: <gfxstrand> Oh, for sure. No shame.
20:18fdobridge: <karolherbst🐧🦀> though I wouldn't be surprised if NAK stays Volta+ and before that we either do something similar or try to make it work
20:18fdobridge: <gfxstrand> That's just why, apart from instruction encodings, I'm not really pulling much inspiration from codegen at all.
20:18fdobridge: <karolherbst🐧🦀> I'll definitely experiment with that
20:20fdobridge: <karolherbst🐧🦀> quite sad that using external crates didn't really make it in meson 1.2 though 😢
20:59fdobridge: <gfxstrand> 😭
21:01fdobridge: <airlied> one other codegen low hanging fruit might be the clamp stuff that would fix a bunch of tests
21:04fdobridge: <gfxstrand> @karolherbst NVK doesn't like the new MS patch. 😭
21:06fdobridge: <gfxstrand> It's dying in RA of all places. 🙄
21:08fdobridge: <gfxstrand> I'll comment
21:09fdobridge: <karolherbst🐧🦀> uhhhhh
21:09fdobridge: <karolherbst🐧🦀> 🥲
21:12fdobridge: <karolherbst🐧🦀> could be a worse fix, like actually fixing RA
21:18fdobridge: <gfxstrand> 😂
21:31fdobridge: <gfxstrand> @airlied if the new UAPI lands soonish, what kernel version will it hit?
21:34fdobridge: <gfxstrand> If I'm reading things right, it looks like the 6.5 window just closed so we'd be targeting 6.6, right?
21:35fdobridge: <airlied> yes new uapi would be 6.6
21:35fdobridge: <airlied> did you give an official rb for the mesa side MR?
21:39fdobridge: <gfxstrand> No, I was planning to do that the next round of patches. RB both at the same time.
21:40fdobridge: <gfxstrand> Mostly because of the BO flag.
21:41fdobridge: <gfxstrand> But unless something is madly different, it'll be basically automatic.
21:43fdobridge: <gfxstrand> Okay, I've now added a docs page: https://mesa.pages.freedesktop.org/-/mesa/-/jobs/46263246/artifacts/public/drivers/nvk.html
22:02fdobridge: <mohamexiety> I love that `NVK_I_WANT_A_BROKEN_VULKAN_DRIVER` flag. it's just brilliant haha
22:13fdobridge: <gfxstrand> Okay, let's run Maxwell and see if it survives. 😅
22:57fdobridge: <karolherbst🐧🦀> I'm sure it's gotten a lot better now
22:58fdobridge: <karolherbst🐧🦀> I think most of the instability simply comes from channel recovery. If you point me towards another common problem with NVK after your run I can take a look tomorrow
23:10fdobridge: <gfxstrand> Yeah, this run has survived to the halfway point.
23:10fdobridge: <gfxstrand> I'm not gonna place any bets just yet, though. 😂
23:11fdobridge: <karolherbst🐧🦀> 😄
23:11fdobridge: <karolherbst🐧🦀> but is dmesg way more clean with your current run? I think most of the dead channels were caused by that ts thing
23:12fdobridge: <karolherbst🐧🦀> but maybe there is more
23:12fdobridge: <karolherbst🐧🦀> ehh wait
23:12fdobridge: <karolherbst🐧🦀> there was this image thing...
23:12fdobridge: <karolherbst🐧🦀> right...
23:12fdobridge: <karolherbst🐧🦀> uhhh
23:12fdobridge: <karolherbst🐧🦀> the null handle
23:14fdobridge: <karolherbst🐧🦀> and then also that SUQ lowering...
23:15fdobridge: <gfxstrand> Yeah, I just updated !24327
23:15fdobridge: <karolherbst🐧🦀> I think I'd rather make that vulkan only than messing with the nvc0 driver.... but uploading those 8 image views _might_ be simple enough
23:15fdobridge: <gfxstrand> What's the deal with SUQ?
23:15fdobridge: <karolherbst🐧🦀> so the thing is, that in opengl for 1st gen maxwell we upload 8 special image views inside the tic table
23:15fdobridge: <karolherbst🐧🦀> we don't for kepler
23:16fdobridge: <karolherbst🐧🦀> for vulkan it's all fine, because the driver isn't in such a weird hackish state
23:16fdobridge: <karolherbst🐧🦀> but what the SUQ -> TXQ lowering does on maxwell is use one of those 8 image views
23:16fdobridge: <karolherbst🐧🦀> on kepler that lowering code just pulls some data from the driver const buffer instead
23:19fdobridge: <karolherbst🐧🦀> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/nouveau/nvc0/nvc0_tex.c#L1284
23:19fdobridge: <karolherbst🐧🦀> I could do this stuff on kepler as well and call it a day
23:19fdobridge: <karolherbst🐧🦀> but the lowering would need to be disabled for fermi anyway
23:20fdobridge: <karolherbst🐧🦀> maybe...
23:20fdobridge: <karolherbst🐧🦀> fermi is weird
23:22fdobridge: <karolherbst🐧🦀> I wouldn't mind adding a flag to `nv50_ir_prog_info` to flip that behavior...