00:00 fdobridge: <g​fxstrand> Is it constant latency?
00:03 fdobridge: <k​arolherbst🐧🦀> it's not 🙂
01:16 fdobridge: <g​fxstrand> Of course...
01:16 fdobridge: <g​fxstrand> Making it variable latency doesn't fix my GPU hangs, though. 🤷🏻‍♀️
01:50 fdobridge: <k​arolherbst🐧🦀> mhhh
01:50 fdobridge: <k​arolherbst🐧🦀> given how those uniform regs work, I'd be surprised if it's scoreboard related
01:51 fdobridge: <k​arolherbst🐧🦀> or huh
01:51 fdobridge: <k​arolherbst🐧🦀> @gfxstrand what latency are you assuming for U* instructions?
01:52 fdobridge: <g​fxstrand> The only uniform instruction I'm emitting right now is R2UR
01:52 fdobridge: <k​arolherbst🐧🦀> I see
01:54 fdobridge: <k​arolherbst🐧🦀> are you sure that the register contains the same value across all lanes?
01:56 fdobridge: <k​arolherbst🐧🦀> oh uh
01:56 fdobridge: <k​arolherbst🐧🦀> @gfxstrand what kind of GPU hang are you seeing though?
01:56 fdobridge: <g​fxstrand> A fault
01:57 fdobridge: <k​arolherbst🐧🦀> could be that something needs to be set up before those regs can be used... though... dunno, maybe not
01:57 fdobridge: <k​arolherbst🐧🦀> are you using the result of the R2UR?
01:57 fdobridge: <g​fxstrand> All active lanes, yes.
01:57 fdobridge: <g​fxstrand> Yes, I'm using it for a CBuf load
01:57 fdobridge: <k​arolherbst🐧🦀> mhhhh
01:58 fdobridge: <g​fxstrand> I don't think there's anything in SPH.
01:58 fdobridge: <k​arolherbst🐧🦀> active as in control flow, or instruction predicate?
01:59 fdobridge: <g​fxstrand> I can plug in my Turing and disable GSP
01:59 fdobridge: <g​fxstrand> And get actual fault into
01:59 fdobridge: <g​fxstrand> *info
01:59 fdobridge: <g​fxstrand> There's no control flow or predication in this shader
01:59 fdobridge: <k​arolherbst🐧🦀> okay
02:00 fdobridge: <k​arolherbst🐧🦀> mind reading the output predicate and only do the load if it's false?
02:01 fdobridge: <k​arolherbst🐧🦀> though that shouldn't matter at all...
02:02 fdobridge: <k​arolherbst🐧🦀> well.. maybe you should double check the encoding or so, but I'm also going to bed now (it's 4 am 🥲 )
02:03 fdobridge: <g​fxstrand> Yeah, I'm going to look at it more tomorrow
02:03 fdobridge: <g​fxstrand> It passes the first test and faults on the second.
02:06 fdobridge: <g​fxstrand> I think if I reboot with Turing + no GSP and get actual fault addresses, I'll be able to figure it out.
02:07 fdobridge: <k​arolherbst🐧🦀> probably something wrong fetching the memory or so
06:54 fdobridge: <d​adschoorse> I mean, before gfx11.5, amd didn't support floating point operations on the SALU either, but the SALU was still very useful. The other restrictions sound pretty bad tho
09:11 fdobridge: <p​homes_> I just got back from travel and my NVK merch package has arrived. The mug is really beautiful 🙂
10:12 fdobridge: <m​agic_rb.> I'm gonna get an NVK t-shirt when I have some more cash
10:12 fdobridge: <m​agic_rb.> (It takes up less space than a mug)
12:13 pandaaaaa: encountered some issues during system hibernation. Here's the log output: nouveau 0000:01:00.0: DRM: failed to idle channel 1 [DRM], 0000:01:00.0: PM: pci_pm_freeze(): nouveau_pmops_freeze+0x0/0x20 [nouveau] returns -1, nouveau 0000:01:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xc0 returns -16, nouveau 0000:01:00.0: PM: failed to freeze async: error -16
12:14 pandaaaaa: system info: Linux DESKTOP 6.6.13+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.13-1~bpo12+1 (2024-02-15) x86_64 GNU/Linux
12:56 fdobridge: <d​adschoorse> does nvidia have unordered comparisons and ordered not equal?
13:16 fdobridge: <d​adschoorse> ah, I found nak's FloatCmpOp, so that's a yes
13:19 fdobridge: <d​adschoorse> what do IsNum and IsNan do?
13:20 fdobridge: <d​adschoorse> do they take one operand or two like amd's v_cmp_u_f32/v_cmp_o_f32?
13:33 fdobridge: <d​adschoorse> ah, looks like they do the same things as amd's according to PTX
13:33 fdobridge: <d​adschoorse> I think I will move some aco opts to nir then, and it will also be useful for nak
14:22 fdobridge: <g​fxstrand> Yeah, I think we definitely want to use it and probably more than Nvidia. I just need to think about how it all works and come up with a strategy. Honestly, treating them like preambles might not be a bad plan.
14:40 cwabbott: aco has a whole pre-isel pass that tries to figure out if uniform NIR ALU ops have sources that can be put in scalar registers "for free" and demotes it if not
14:40 cwabbott: and it's a dataflow analysis that runs until fixed point to handle phi nodes in loops
14:41 cwabbott: AMD had pretty similar restrictions around not handling float ops in the scalar ALU until recently
14:42 cwabbott: in ir3 I took a slightly different approach and had a pass to demote scalar ALU + scalar to vector into vector ALU
14:42 cwabbott: and that also demotes phi nodes
14:42 cwabbott: qualcomm lets you use the scalar ALU for more things, so it made sense
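[Editor's sketch] The fixed-point demotion cwabbott describes can be illustrated roughly as below. This is a toy model, not aco's or ir3's actual pass: the `Instr` type, the `SCALAR_OPS` set and the op names are all hypothetical, and "free" is simplified to "every source is already scalar". The analysis starts optimistically (every uniform value is a scalar candidate) and keeps demoting until nothing changes, which is what lets a phi in a loop see a demotion that happens later in the loop body.

```python
from dataclasses import dataclass

# Hypothetical set of ops the scalar ALU can execute; float ops are left out
# on purpose to mimic the restriction discussed above.
SCALAR_OPS = {"load_const", "mov", "iadd", "imul", "phi"}


@dataclass(frozen=True)
class Instr:
    dest: str                # name of the SSA value this instruction defines
    op: str                  # opcode name (hypothetical)
    srcs: tuple = ()         # names of the SSA values it reads
    divergent: bool = False  # result of a divergence analysis


def scalar_candidates(instrs):
    """Return the set of values that can stay in scalar registers."""
    # Optimistic start: every uniform value is assumed to be scalar.
    scalar = {i.dest for i in instrs if not i.divergent}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for i in instrs:
            if i.dest not in scalar:
                continue
            ok = i.op in SCALAR_OPS and all(s in scalar for s in i.srcs)
            if not ok:
                scalar.discard(i.dest)  # demote to a vector register
                changed = True
    return scalar


# A loop-carried phi whose back-edge value depends on a float op.
example = [
    Instr("c", "load_const"),
    Instr("i", "phi", ("c", "i_next")),
    Instr("f", "fmul", ("c", "c")),       # uniform, but not a scalar op
    Instr("i_next", "iadd", ("i", "f")),
    Instr("j", "iadd", ("c", "c")),
]
print(sorted(scalar_candidates(example)))
```

In this example `f` is demoted first, which demotes `i_next` in the same sweep; the phi `i` is only demoted on the next sweep, since it appears before `i_next` in program order. A single forward pass would have left it scalar.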
15:07 fdobridge: <g​fxstrand> There's a part of me that's tempted to just emit scalar for all `!divergent && in_uniform_cf` and then have a legalization pass in NAK which demotes things as needed.
15:29 fdobridge: <d​adschoorse> Not being able to use the scalar alu in control flow sucks, especially since some games have started to use scalarization loops. Which work well for amd.
15:32 fdobridge: <d​adschoorse> cwabbott: Can Qualcomm's scalar alu run in parallel to the vector units?
15:32 fdobridge: <d​adschoorse> Or is it just for saving regs?
15:36 cwabbott: it can run in parallel, I think
15:37 cwabbott: we have to ignore it when inserting nops for vector instructions
16:24 fdobridge: <g​fxstrand> It's shifted. My fault address has an extra 0
16:59 fdobridge: <k​arolherbst🐧🦀> :blobcatnotlikethis:
17:00 fdobridge: <k​arolherbst🐧🦀> guess truncated means not encoded then
17:06 fdobridge: <g​fxstrand> 🤷🏻‍♀️
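[Editor's sketch] Assuming the "extra 0" is a trailing hexadecimal digit, the symptom matches an address field whose low 4 bits are dropped by the encoding and which therefore expects a pre-shifted value. The log does not say which field or instruction was involved, so the names and the shift amount below are hypothetical and only illustrate the arithmetic.

```python
FIELD_SHIFT = 4  # hypothetical: the encoding drops the low 4 address bits


def encode_addr(addr):
    """Convert a byte address to the field value the encoding expects."""
    if addr % (1 << FIELD_SHIFT) != 0:
        raise ValueError("address must be 16-byte aligned")
    return addr >> FIELD_SHIFT


def decode_addr(field):
    """What the hardware would reconstruct from the field value."""
    return field << FIELD_SHIFT


addr = 0x1234500
buggy = decode_addr(addr)               # raw address stored without shifting
fixed = decode_addr(encode_addr(addr))  # address shifted before encoding

print(hex(buggy))  # one extra trailing hex 0 compared to addr
print(hex(fixed))  # matches addr
```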
18:17 fdobridge: <g​fxstrand> With that fixed, I'm making progress. Looks like something's going funky with helpers, though.
18:18 fdobridge: <g​fxstrand> Or at least something isn't 100% deterministic
18:18 fdobridge: <g​fxstrand> Ugh... it's synchronization
18:21 fdobridge: <g​fxstrand> Yeah, my dep tracker wasn't taking bindless cbufs into account.
19:17 fdobridge: <z​mike.> @gfxstrand you probably have seen this: does vkcts have sampler swizzle tests for every swizzle combination of every format?
19:46 fdobridge: <g​fxstrand> It has a lot but not for the whole matrix
20:39 fdobridge: <g​fxstrand> I'm honestly a little surprised Larabel hasn't been benchmarking NVK like once a month...
20:44 fdobridge: <z​mike.> usually he does it after releases
20:48 fdobridge: <g​fxstrand> Yeah
21:34 fdobridge: <r​edsheep> I bet there will be more interest in benchmarks once people realize how many apps and games are working with reasonable performance now
21:39 fdobridge: <r​edsheep> Is this bindless-ubo branch testable at the moment? I'm very curious how things are looking so far
21:50 fdobridge: <g​fxstrand> Not yet
21:50 fdobridge: <g​fxstrand> It's still hacks upon hacks.
21:51 fdobridge: <g​fxstrand> I don't even have `dEQP-VK.ubo.*` passing yet.
21:51 fdobridge: <g​fxstrand> Getting close, though.
21:52 fdobridge: <g​fxstrand> It's probably going to take most of next week before we can put all the pieces together.
21:53 fdobridge: <g​fxstrand> I also need to do more NVK reworks so that bindless UBOs are the default and therefore cheap.
21:53 fdobridge: <r​edsheep> Yeah not trying to rush, one more week is a very quick turnaround for something this big, I think
23:07 fdobridge: <g​fxstrand> Okay, time to double-check that I didn't totally destroy the compiler refactoring. 😅
23:37 fdobridge: <g​fxstrand> Yup! Totally hosed shifts. Oh, well. Easy fix.