00:01fdobridge: <gfxstrand> There's no way to not specify a sampler/header
00:01fdobridge: <gfxstrand> So I just pass zero there. Seems to work.
01:03fdobridge: <karolherbst🐧🦀> weird
01:04fdobridge: <karolherbst🐧🦀> well.. in theory in non bindless mode you just use the fields in the encoding
01:04fdobridge: <karolherbst🐧🦀> but if you are in bindless mode, you have to pass along the sampler/header pointer.. but the bound stuff is wonky?
01:04fdobridge: <gfxstrand> Oh, I think I'm always in bindless mode. 😅
01:04fdobridge: <karolherbst🐧🦀> ahh yeah..
01:06fdobridge: <gfxstrand> I should maybe implement non-bindless mode one of these days. 🤔 Meh.
01:06fdobridge: <karolherbst🐧🦀> I actually have no idea what kind of difference it makes, well.. you don't have to store it in regs I guess
01:06fdobridge: <karolherbst🐧🦀> and don't have to load it
01:06fdobridge: <karolherbst🐧🦀> for the times 1% more perf is important
01:07fdobridge: <karolherbst🐧🦀> it's kinda funky that the hardware knows two bindless forms
01:07fdobridge: <karolherbst🐧🦀> ehh 3 actually
01:08fdobridge: <karolherbst🐧🦀> you can either pull the sampler, the texture or both from the handle
01:09fdobridge: <karolherbst🐧🦀> ehh or there are two ways of encoding the texture header or something
01:09fdobridge: <karolherbst🐧🦀> yeah..
01:09fdobridge: <gfxstrand> 🤷🏻♀️
01:10fdobridge: <karolherbst🐧🦀> apparently one version makes more sense for GL the other for D3D
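A rough sketch of the "three bindless forms" idea: the texture index, the sampler index, or both come out of the handle, and whatever is not in the handle comes from bound state. The packing used here (texture header index in bits 19:0, sampler index in bits 31:20) is an assumption modeled on how nouveau's GL bindless handles are believed to be laid out. The mode names are hypothetical and do not correspond to real encoding bits.

```python
from enum import Enum

# Assumed packing: texture header index in the low 20 bits, sampler index above it.
TEX_BITS = 20
TEX_MASK = (1 << TEX_BITS) - 1
SAMP_MASK = 0xFFF  # 12 bits for the sampler index (assumption)


class HandleMode(Enum):
    # Hypothetical names for "pull the sampler, the texture or both from the handle".
    TEXTURE_AND_SAMPLER = 0
    TEXTURE_ONLY = 1   # sampler comes from bound state
    SAMPLER_ONLY = 2   # texture comes from bound state


def pack_handle(tex_idx, samp_idx):
    """Pack a texture/sampler index pair into one handle (assumed layout)."""
    if not (0 <= tex_idx <= TEX_MASK and 0 <= samp_idx <= SAMP_MASK):
        raise ValueError("index out of range")
    return (samp_idx << TEX_BITS) | tex_idx


def resolve(handle, mode, bound_tex=None, bound_samp=None):
    """Return (texture index, sampler index) for the given handle mode."""
    tex = handle & TEX_MASK
    samp = (handle >> TEX_BITS) & SAMP_MASK
    if mode is HandleMode.TEXTURE_AND_SAMPLER:
        return tex, samp
    if mode is HandleMode.TEXTURE_ONLY:
        return tex, bound_samp
    return bound_tex, samp
```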
01:10fdobridge: <gfxstrand> I've noticed the blob does bind textures sometimes so I assume there's a perf benefit somewhere.
01:10fdobridge: <karolherbst🐧🦀> yeah.. you don't need to waste a reg
01:10fdobridge: <karolherbst🐧🦀> which means you might be able to run more threads
01:11fdobridge: <karolherbst🐧🦀> and the tex op loads the handle itself, which I think is also a tad quicker
01:11fdobridge: <karolherbst🐧🦀> but anyway, I expect the RA benefits to be the most significant here
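To illustrate why one freed register can matter ("you might be able to run more threads"), here is a register-limited occupancy estimate. All defaults are illustrative assumptions (64K registers per SM, 32-thread warps, per-warp allocation rounded up to 256 registers, a cap of 64 resident warps); the real limits differ between chip generations.

```python
import math


def max_warps_per_sm(regs_per_thread, regs_per_sm=65536, threads_per_warp=32,
                     alloc_granularity=256, max_warps=64):
    """Resident warps per SM when registers are the only limiting resource.

    Defaults are illustrative, not taken from any specific NVIDIA chip.
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    regs_per_warp = regs_per_thread * threads_per_warp
    # Registers are handed out in chunks, so round the per-warp cost up.
    regs_per_warp = math.ceil(regs_per_warp / alloc_granularity) * alloc_granularity
    return min(max_warps, regs_per_sm // regs_per_warp)
```

Under these assumptions, going from 33 registers per thread down to 32 raises the limit from 51 to 64 resident warps. That cliff is the kind of win that saving the handle register could buy.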
01:11fdobridge: <gfxstrand> 🤷🏻♀️
01:13fdobridge: <gfxstrand> It also avoids fetching from the descriptor set or at least moves the fetch to something on the CPU or MME and happens once per draw or pipeline bind instead of per pixel.
01:14fdobridge: <karolherbst🐧🦀> I'm sure it's somewhat worth it
01:16fdobridge: <gfxstrand> Yeah, probably. But also probably not the first thing I need to care about
01:16fdobridge: <gfxstrand> So meh
03:23fdobridge: <gfxstrand> `Pass: 405125, Fail: 137, Crash: 133, Skip: 3195085, Timeout: 2, Flake: 401, Duration: 1:47:57`
04:15fdobridge: <gfxstrand> I think I'm going to have to write some more targeted tests to figure out how MSAA interpolation works on this hardware but that's okay.
04:15fdobridge: <gfxstrand> I'm starting to fix tests in NAK that are currently busted in codegen.
04:15fdobridge: <gfxstrand> So the copy+paste method no longer works. 😅
04:20fdobridge: <gfxstrand> NVIDIA interpolation is so weird...
04:21fdobridge: <gfxstrand> There are these seemingly independent flags but only a very limited set of combinations is actually allowed. 🙃
04:21fdobridge: <gfxstrand> And I'm still working through what they do
04:21fdobridge: <gfxstrand> And then there's an API bit that says whether centroid is per-pass or not and IDK what to do with that, either.
05:43fdobridge: <esdrastarsis> oh, there are no WIP commits on the GSP branch, Ben fixed everything :happy_gears:
07:40fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Where's my hwmon though? :nope_gears:
08:47fdobridge: <karolherbst🐧🦀> yeah.... I just made it kinda work in codegen, because it's so special. It's a big hack in codegen 🙃
13:29fdobridge: <gfxstrand> Sleep, or maybe just relaxed brain time I think answered that question.
15:46fdobridge: <gfxstrand> `Pass: 405206, Fail: 103, Crash: 77, Skip: 3195084, Timeout: 2, Flake: 411, Duration: 1:47:17`
15:47fdobridge: <gfxstrand> It's just very different from Intel. Like, that's fine I think, it's just not at all the same.
15:51fdobridge: <gfxstrand> That might be the best NVK CTS run ever. With just CS, it has a few fewer crashes but more fails. 🥳
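Comparing two runs like the summaries pasted above is easier with the deltas spelled out. This is a small sketch that assumes only the `Key: value, ...` format seen in this log, with `Duration` as `H:MM:SS`. It is not tied to any particular tool's API.

```python
import re

# Matches "Pass: 405125" and "Duration: 1:47:57" style pairs.
SUMMARY_RE = re.compile(r"(\w+): ([\d:]+)")


def parse_summary(line):
    """Parse a 'Pass: N, Fail: N, ..., Duration: H:MM:SS' line into a dict.

    Durations are converted to seconds so they can be subtracted.
    """
    out = {}
    for key, value in SUMMARY_RE.findall(line):
        if ":" in value:
            h, m, s = (int(x) for x in value.split(":"))
            out[key] = h * 3600 + m * 60 + s
        else:
            out[key] = int(value)
    return out


def diff_runs(before, after):
    """Per-field change from one summary line to the next."""
    a, b = parse_summary(before), parse_summary(after)
    return {k: b[k] - a[k] for k in a if k in b}
```

For the 03:23 and 15:46 runs, this gives +81 passes, 34 fewer fails, 56 fewer crashes, 10 more flakes, and a run that is 40 seconds shorter.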
15:53fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> EXCEPTION_LOW_CRASH_COUNT (NOT_ENOUGH_NVK_USE_NAK)
15:54fdobridge: <gfxstrand> I should probably push the FS stuff to nak/main soon
15:56fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I mean you should try enabling NAK for VS too
16:02fdobridge: <gfxstrand> Yeah, we'll get there.
16:03fdobridge: <gfxstrand> I need to try and merge Mary's SPH stuff today. VS/GS should be the easy stages.
16:03fdobridge: <gfxstrand> The big pain there will be XFB
16:21fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> That is one of @zmike's biggest headaches 🤕
16:21fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Why is VS easier than FS (or PS for 🪟)?
16:22fdobridge: <gfxstrand> Yeah, the problem there is that the way codegen maps it is horrible. The hardware is reasonable from what I remember. The bits we get out of NIR are okay. Codegen then does its thing of trying to TGSI-ify and then un-TGSI-ify it all.
16:23fdobridge: <gfxstrand> I/O is stupid easy there
16:24fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Is TGSI the original shader IR in :gears:?
16:24fdobridge: <gfxstrand> FS/PS, on the other hand, have interpolation and all these annoying things like helper pixels, discard, multisampling headaches, etc.
16:25fdobridge: <gfxstrand> No. It's the original gallium thing. Several IRs predate it but it's the really annoying one all the drivers had in the middle for a while.
16:25fdobridge: <gfxstrand> It's mostly gone from Mesa at this point. It's pretty much only used by virgl and old-school swrast.
16:26fdobridge: <gfxstrand> But there are a lot of remnants of the history in codegen
16:26fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Does swrast still exist in Mesa?
16:26fdobridge: <gfxstrand> yeah
16:28fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> It's weird that virgl still uses TGSI though
16:38fdobridge: <gfxstrand> It passes TGSI across the VM boundary. 😬
16:38fdobridge: <gfxstrand> It's awesome like that
16:39fdobridge: <gfxstrand> Or maybe it doesn't? I thought it did but that seems kinda dumb...
16:41fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Why can't it pass NIR? :nope_gears:
16:44fdobridge: <gfxstrand> NIR isn't nearly as stable so having two components on different sides of the boundary gets tricky fast.
16:44fdobridge: <gfxstrand> IDK that passing TGSI is a *good* idea but it's a much better idea than passing NIR
16:46fdobridge: <gfxstrand> @karolherbst Any idea what we're supposed to do with `gl_SampleMask` and fractional shading rates? As far as I can tell `PIXLD.COVMASK` gives you the coverage of the entire pixel, not the coverage of the current pass
16:46fdobridge: <gfxstrand> Maybe `InnerCoverage`?
16:51fdobridge: <gfxstrand> No, that's a different feature.
16:55fdobridge: <gfxstrand> Okay, all that's left is really stupid MSAA stuff. I think it's time for me to pull in SPH. Lunch first, though.
16:56fdobridge: <gfxstrand> And I know what both stupid MSAA things are, just not how to solve them yet.
16:56fdobridge: <gfxstrand> I need to poke at the blob for that.
16:56fdobridge: <gfxstrand> Woo blob
16:57fdobridge: <gfxstrand> Fortunately(ish), ESO actually gives us state here so I can stuff stuff in a shader key if I need.
17:06fdobridge: <karolherbst🐧🦀> there is a system value for that
17:07fdobridge: <karolherbst🐧🦀> at 132
17:07fdobridge: <karolherbst🐧🦀> it has three fields: bits 7:0 pass count, bits 15:8 pixels in X, bits 23:16 pixels in Y
17:08fdobridge: <karolherbst🐧🦀> I think that's what you need?
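Decoding that system value is simple bit slicing. The field layout comes from the description above. Reading "pixels in X/Y" as the coarse-pixel footprint is an interpretation, and bits 31:24 are not described here, so the sketch ignores them. The function name is hypothetical.

```python
def decode_pass_info(value):
    """Split the system value at slot 132 into its three 8-bit fields.

    Layout as described in chat; bits 31:24 are ignored because they were
    not described.
    """
    return {
        "pass_count": value & 0xFF,          # bits 7:0
        "pixels_x": (value >> 8) & 0xFF,     # bits 15:8
        "pixels_y": (value >> 16) & 0xFF,    # bits 23:16
    }
```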
17:11fdobridge: <gfxstrand> Yeah, probably
17:18fdobridge: <karolherbst🐧🦀> I don't see anything else, so I guess that must be it. Also looks like it's newish
18:57fdobridge: <gfxstrand> I'll monkey about with it later. Passcount is definitely useful. IDK if the others give me quite enough to re-construct an accurate mask, though.
18:59fdobridge: <karolherbst🐧🦀> it's per thread in case that helps
19:00fdobridge: <karolherbst🐧🦀> but yeah...
20:28fdobridge: <karolherbst🐧🦀> https://patchwork.freedesktop.org/series/123876/ :ferrisBongo:
20:33fdobridge: <gfxstrand> :ferris_happy:
20:36fdobridge: <marysaka> Right when I was thinking about poking at GSP stuff on my laptop tonight, the timing is perfect 😄
20:49fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Hopefully it has hwmon support
20:53fdobridge: <karolherbst🐧🦀> not yet, but it can be added.. probably
20:59fdobridge: <mohamexiety> wait so this is native GSP support in new kernels?
20:59fdobridge: <mohamexiety> no need to compile custom kernels and all that anymore? (when this passes)
20:59fdobridge: <karolherbst🐧🦀> yeah
20:59fdobridge: <karolherbst🐧🦀> that's the idea
20:59fdobridge: <mohamexiety> nice!
21:00fdobridge: <mohamexiety> can't wait to finally yeet off the 1030
21:00fdobridge: <mohamexiety> (turns out 2 SM Pascal without reclocking can barely drive a 1440p desktop :blobcatnotlikethis:)
21:01fdobridge: <karolherbst🐧🦀> 😄
21:01fdobridge: <karolherbst🐧🦀> yeah, not surprised by that
21:11fdobridge: <airlied> patches welcome, I don't think hwmon is high on anybody's priority list
21:14fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> Imagine running a GPU at full speed without any stats monitoring :blobcatnotlikethis:
21:15fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> I wonder what register/command we can poke for getting the GPU temperature with GSP
21:16fdobridge: <karolherbst🐧🦀> the same ones probably
21:16fdobridge: <karolherbst🐧🦀> it's really just reading out some registers
22:29fdobridge: <airlied> yeah not sure if we can just read out regs on gsp for thermals, I expect you might need to ask GSP, or set up something but who knows, tracing nvidia-smi might be an option