00:00fdobridge: <gfxstrand> Is it constant latency?
00:03fdobridge: <karolherbst🐧🦀> it's not 🙂
01:16fdobridge: <gfxstrand> Of course...
01:16fdobridge: <gfxstrand> Making it variable latency doesn't fix my GPU hangs, though. 🤷🏻♀️
01:50fdobridge: <karolherbst🐧🦀> mhhh
01:50fdobridge: <karolherbst🐧🦀> given how those uniform regs work, I'd be surprised if it's scoreboard related
01:51fdobridge: <karolherbst🐧🦀> or huh
01:51fdobridge: <karolherbst🐧🦀> @gfxstrand what latency are you assuming for U* instructions?
01:52fdobridge: <gfxstrand> The only uniform instruction I'm emitting right now is R2UR
01:52fdobridge: <karolherbst🐧🦀> I see
01:54fdobridge: <karolherbst🐧🦀> are you sure that the register contains the same across all lanes?
01:56fdobridge: <karolherbst🐧🦀> oh uh
01:56fdobridge: <karolherbst🐧🦀> @gfxstrand what kind of GPU hang are you seeing though?
01:56fdobridge: <gfxstrand> A fault
01:57fdobridge: <karolherbst🐧🦀> could be that something needs to be set up before those regs can be used... though... dunno, maybe not
01:57fdobridge: <karolherbst🐧🦀> are you using the result of the R2UR?
01:57fdobridge: <gfxstrand> All active lanes, yes.
01:57fdobridge: <gfxstrand> Yes, I'm using it for a CBuf load
01:57fdobridge: <karolherbst🐧🦀> mhhhh
01:58fdobridge: <gfxstrand> I don't think there's anything in SPH.
01:58fdobridge: <karolherbst🐧🦀> active as in control flow, or instruction predicate?
01:59fdobridge: <gfxstrand> I can plug in my Turing and disable GSP
01:59fdobridge: <gfxstrand> And get actual fault info
01:59fdobridge: <gfxstrand> There's no control flow or predication in this shader
01:59fdobridge: <karolherbst🐧🦀> okay
02:00fdobridge: <karolherbst🐧🦀> mind reading the output predicate and only doing the load if it's false?
02:01fdobridge: <karolherbst🐧🦀> though that shouldn't matter at all...
02:02fdobridge: <karolherbst🐧🦀> well.. maybe you should double check the encoding or so, but I'm also going to bed now (it's 4 am 🥲 )
02:03fdobridge: <gfxstrand> Yeah, I'm going to look at it more tomorrow
02:03fdobridge: <gfxstrand> It passes the first test and faults on the second.
02:06fdobridge: <gfxstrand> I think if I reboot with Turing + no GSP and get actual fault addresses, I'll be able to figure it out.
02:07fdobridge: <karolherbst🐧🦀> probably something wrong fetching the memory or so
06:54fdobridge: <dadschoorse> I mean, before gfx11.5, amd didn't support floating point operations on the SALU either, but the SALU was still very useful. The other restrictions sound pretty bad tho
09:11fdobridge: <phomes_> I just got back from travel and my NVK merch package has arrived. The mug is really beautiful 🙂
10:12fdobridge: <magic_rb.> I'm gonna get an NVK T-shirt when I have some more cash
10:12fdobridge: <magic_rb.> (It takes up less space than a mug)
12:13pandaaaaa: Encountered some issues during system hibernation. Here's the log output: nouveau 0000:01:00.0: DRM: failed to idle channel 1 [DRM]; 0000:01:00.0: PM: pci_pm_freeze(): nouveau_pmops_freeze+0x0/0x20 [nouveau] returns -1; nouveau 0000:01:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xc0 returns -16; nouveau 0000:01:00.0: PM: failed to freeze async: error -16
12:14pandaaaaa: System info: Linux DESKTOP 6.6.13+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.13-1~bpo12+1 (2024-02-15) x86_64 GNU/Linux
12:56fdobridge: <dadschoorse> does nvidia have unordered comparisons and ordered not equal?
13:16fdobridge: <dadschoorse> ah, I found nak's FloatCmpOp, so that's a yes
13:19fdobridge: <dadschoorse> what do IsNum and IsNan do?
13:20fdobridge: <dadschoorse> do they take one operand or two like amd's v_cmp_u_f32/v_cmp_o_f32?
13:33fdobridge: <dadschoorse> ah, looks like they do the same things as amd's according to PTX
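A small sketch of the comparison semantics discussed above, written in Python for brevity. It assumes the PTX `setp` convention: `num` is true when neither operand is NaN and `nan` is true when either one is (the two-operand behavior of AMD's `v_cmp_o_f32`/`v_cmp_u_f32`). Ordered comparisons are false whenever a NaN is involved, while unordered ones are true. The function names are illustrative and are not NAK's `FloatCmpOp` variants.

```python
import math


def is_nan(a: float, b: float) -> bool:
    """Two-operand "unordered" test: true if either operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def is_num(a: float, b: float) -> bool:
    """Two-operand "ordered" test: true only if both operands are numbers."""
    return not is_nan(a, b)


def ordered_ne(a: float, b: float) -> bool:
    # Ordered not-equal: false if either input is NaN (plain `!=` would be true).
    return is_num(a, b) and a != b


def unordered_eq(a: float, b: float) -> bool:
    # Unordered equal: true if either input is NaN, otherwise a normal compare.
    return is_nan(a, b) or a == b


def unordered_lt(a: float, b: float) -> bool:
    # Unordered less-than: true if either input is NaN, otherwise a < b.
    return is_nan(a, b) or a < b
```

Having `is_num`/`is_nan` as explicit two-operand tests lets an optimizer rewrite something like `a == a && b == b` into a single ordered check, which is the kind of pattern that can move from backend-specific passes into NIR.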
13:33fdobridge: <dadschoorse> I think I will move some aco opts to nir then, and it will also be useful for nak
14:22fdobridge: <gfxstrand> Yeah, I think we definitely want to use it and probably more than Nvidia. I just need to think about how it all works and come up with a strategy. Honestly, treating them like preambles might not be a bad plan.
14:40cwabbott: aco has a whole pre-isel pass that tries to figure out if uniform NIR ALU ops have sources that can be put in scalar registers "for free" and demotes it if not
14:40cwabbott: and it's a dataflow analysis that runs until fixed point to handle phi nodes in loops
14:41cwabbott: AMD had pretty similar restrictions around not handling float ops in the scalar ALU until recently
14:42cwabbott: in ir3 I took a slightly different approach and had a pass to demote scalar ALU + scalar-to-vector moves into vector ALU
14:42cwabbott: and that also demotes phi nodes
14:42cwabbott: qualcomm lets you use the scalar ALU for more things, so it made sense
15:07fdobridge: <gfxstrand> There's a part of me that's tempted to just emit scalar for all `!divergent && in_uniform_cf` and then have a legalization pass in NAK which demotes things as needed.
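A minimal sketch of the "emit scalar, then legalize" idea, combined with the fixed-point iteration cwabbott describes for aco. Everything here is hypothetical: the `Value` type, the op names, and `SCALAR_OPS` (modeled on the "no float ops on the scalar ALU" restriction mentioned above) are illustrative, not NAK's or aco's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    name: str
    op: str                                   # e.g. "const", "iadd", "fadd", "phi"
    srcs: list = field(default_factory=list)  # names of source values
    divergent: bool = False                   # result may differ between lanes
    in_uniform_cf: bool = True                # defined outside divergent control flow


# Hypothetical: ops the scalar ALU can execute (no float math here).
SCALAR_OPS = {"const", "iadd", "imul", "phi"}


def legalize_scalar(values):
    """Return the names of values that may stay on the scalar ALU."""
    # Step 1: optimistic selection, mirroring `!divergent && in_uniform_cf`.
    scalar = {v.name for v in values
              if not v.divergent and v.in_uniform_cf and v.op in SCALAR_OPS}
    # Step 2: demote any value that reads a non-scalar source, repeating
    # until nothing changes so demotions propagate through loop phis.
    changed = True
    while changed:
        changed = False
        for v in values:
            if v.name in scalar and any(s not in scalar for s in v.srcs):
                scalar.discard(v.name)
                changed = True
    return scalar
```

Starting optimistic is what lets a loop counter phi stay scalar: its back-edge source is assumed scalar until proven otherwise. A pessimistic start would demote the phi before its back-edge source had been examined.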
15:29fdobridge: <dadschoorse> Not being able to use the scalar alu in control flow sucks, especially since some games have started to use scalarization loops, which work well for amd.
15:32fdobridge: <dadschoorse> cwabbott: Can Qualcomm's scalar alu run in parallel to the vector units?
15:32fdobridge: <dadschoorse> Or is it just for saving regs?
15:36cwabbott: it can run in parallel, I think
15:37cwabbott: we have to ignore it when inserting nops for vector instructions
16:24fdobridge: <gfxstrand> It's shifted. My fault address has an extra 0
16:59fdobridge: <karolherbst🐧🦀> :blobcatnotlikethis:
17:00fdobridge: <karolherbst🐧🦀> guess truncated means not encoded then
17:06fdobridge: <gfxstrand> 🤷🏻♀️
18:17fdobridge: <gfxstrand> With that fixed, I'm making progress. Looks like something's going funky with helpers, though.
18:18fdobridge: <gfxstrand> Or at least something isn't 100% deterministic
18:18fdobridge: <gfxstrand> Ugh... it's synchronization
18:21fdobridge: <gfxstrand> Yeah, my dep tracker wasn't taking bindless cbufs into account.
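A minimal sketch of the kind of bug described above, assuming a simple read-after-write dependency tracker. The `Instr`/`CBuf` types, the `ldc` op name as used here, and register names like `ur4` are illustrative assumptions, not NAK's dependency tracker. The point is that a bindless cbuf reference also reads the register holding the handle, so the load must wait on whatever wrote that register (here, an R2UR).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CBuf:
    index: Optional[int] = None   # bound cbuf slot (no register read)
    handle: Optional[str] = None  # register holding a bindless cbuf handle


@dataclass
class Instr:
    op: str
    dsts: list
    srcs: list
    cbuf: Optional[CBuf] = None


def reads(instr):
    regs = set(instr.srcs)
    # The easy-to-miss case: a bindless cbuf operand reads its handle register.
    if instr.cbuf is not None and instr.cbuf.handle is not None:
        regs.add(instr.cbuf.handle)
    return regs


def dependencies(instrs):
    """Map each instruction index to the earlier instructions it must wait on.

    Only read-after-write hazards are tracked, to keep the sketch short.
    """
    last_writer = {}
    deps = {}
    for i, instr in enumerate(instrs):
        deps[i] = {last_writer[r] for r in reads(instr) if r in last_writer}
        for d in instr.dsts:
            last_writer[d] = i
    return deps
```

Dropping the `handle` check from `reads` makes the bindless load look independent of the R2UR, so it could issue before the handle is written. Whether that actually fails can depend on timing, which would fit the nondeterminism seen above.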
19:17fdobridge: <zmike.> @gfxstrand you probably have seen this: does vkcts have sampler swizzle tests for every swizzle combination of every format?
19:46fdobridge: <gfxstrand> It has a lot but not for the whole matrix
20:39fdobridge: <gfxstrand> I'm honestly a little surprised Larabel hasn't been benchmarking NVK like once a month...
20:44fdobridge: <zmike.> usually he does it after releases
20:48fdobridge: <gfxstrand> Yeah
21:34fdobridge: <redsheep> I bet there will be more interest in benchmarks once people realize how many apps and games are working with reasonable performance now
21:39fdobridge: <redsheep> Is this bindless-ubo branch testable at the moment? I'm very curious how things are looking so far
21:50fdobridge: <gfxstrand> Not yet
21:50fdobridge: <gfxstrand> It's still hacks upon hacks.
21:51fdobridge: <gfxstrand> I don't even have `dEQP-VK.ubo.*` passing yet.
21:51fdobridge: <gfxstrand> Getting close, though.
21:52fdobridge: <gfxstrand> It's probably going to take most of next week before we can put all the pieces together.
21:53fdobridge: <gfxstrand> I also need to do more NVK reworks so that bindless UBOs are the default and therefore cheap.
21:53fdobridge: <redsheep> Yeah not trying to rush, one more week is a very quick turnaround for something this big, I think
23:07fdobridge: <gfxstrand> Okay, time to double-check that I didn't totally destroy the compiler refactoring. 😅
23:37fdobridge: <gfxstrand> Yup! Totally hosed shifts. Oh, well. Easy fix.