01:07 fdobridge: <g​fxstrand> 32-bit NVK userspace works fine.
01:08 HdkR: Would be very sad if it didn't. It's needed for all those old D3D8/9/10/11 games :)
01:53 fdobridge: <g​fxstrand> Ugh... breaks...
01:55 fdobridge: <g​fxstrand> This part of the hardware really does hate SSA
02:04 fdobridge: <k​arolherbst🐧🦀> yes
02:04 fdobridge: <k​arolherbst🐧🦀> 😄
02:06 fdobridge: <k​arolherbst🐧🦀> ~~could just handle them as intrinsics~~
02:06 fdobridge: <k​arolherbst🐧🦀> ~~and the barrier file as memory~~
02:14 fdobridge: <g​fxstrand> Yeah, the problem with that is that all my spilling assumes SSA.
02:15 fdobridge: <g​fxstrand> Maybe I should just RE `bmov.32` and figure out what the value is
02:16 fdobridge: <g​fxstrand> If they really are just a mask (I still don't know if I believe that), then `bssy` is just `|=` and `break` is just `&= ~ACTIVE`
02:19 fdobridge: <k​arolherbst🐧🦀> see what I wrote earlier
02:20 fdobridge: <k​arolherbst🐧🦀> there are those `TS_THREAD_STATE_ENUM0` to `TS_THREAD_STATE_ENUM4` things and those instructions do something funky with that
02:20 fdobridge: <k​arolherbst🐧🦀> so if you want to RE it, you probably also have to query those 5 values
02:22 fdobridge: <k​arolherbst🐧🦀> it's apparently unsafe to modify those 5 values if interrupts are enabled
02:25 fdobridge: <g​fxstrand> Yeah, I've got to think about this more. 😕
02:25 fdobridge: <g​fxstrand> I think I may be able to play some tricks and make it work.
02:25 fdobridge: <g​fxstrand> It's just going to be annoying.
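A toy model of the "barrier is just a lane mask" idea floated above, for readers following along. This only restates the hypothesis (`bssy` as `|=`, `break` as `&= ~ACTIVE`); it is not confirmed hardware behaviour, and `WARP_SIZE`, `bssy()` and `brk()` are illustrative names assumed for the sketch. The `TS_THREAD_STATE_ENUM*` values mentioned above are deliberately not modelled.

```python
# Toy model of the unconfirmed "barrier register == lane mask" hypothesis.
# WARP_SIZE and both function names are assumptions made for illustration.
WARP_SIZE = 32
FULL_MASK = (1 << WARP_SIZE) - 1


def bssy(barrier: int, active: int) -> int:
    """Hypothesised bssy: barrier |= ACTIVE."""
    return (barrier | active) & FULL_MASK


def brk(barrier: int, active: int) -> int:
    """Hypothesised break: barrier &= ~ACTIVE."""
    return barrier & ~active & FULL_MASK


if __name__ == "__main__":
    b = bssy(0, 0b1111)   # four lanes enter the region
    b = brk(b, 0b0101)    # lanes 0 and 2 break out
    print(bin(b))         # remaining lanes under this model
```

If the real value turns out to carry more than a mask, this model simply does not apply.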
03:21 fdobridge: <g​fxstrand> It's break that's the really annoying part
03:29 fdobridge: <k​arolherbst🐧🦀> @gfxstrand could just implement `B1 = BREAK B0` as `BMOV B1 B0 + BREAK B1` and if RA makes `BMOV B0 B0` + `BREAK B0` out of it so be it :ferrisUpsideDown:
03:38 fdobridge: <g​fxstrand> Yeah...
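A minimal sketch of the lowering suggested above: rewrite the SSA-style `B1 = BREAK B0` as a copy followed by an in-place `BREAK`, then let register allocation pick registers. The tuple-based instruction encoding, `lower_break()` and `assign_registers()` are hypothetical helpers for illustration, not NAK's actual IR or API.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction encoding: (opcode, operand, ...)
Instr = Tuple[str, ...]


def lower_break(dst: str, src: str) -> List[Instr]:
    """Rewrite SSA-style `dst = BREAK src` as `BMOV dst src` + `BREAK dst`."""
    return [("BMOV", dst, src), ("BREAK", dst)]


def assign_registers(instrs: List[Instr], ra: Dict[str, str]) -> List[Instr]:
    """Replace each SSA name with the register a (hypothetical) RA chose."""
    return [(op, *(ra[name] for name in args)) for op, *args in instrs]


if __name__ == "__main__":
    lowered = lower_break("b1", "b0")
    # If RA happens to put both values in B0, the copy becomes BMOV B0 B0.
    for instr in assign_registers(lowered, {"b0": "B0", "b1": "B0"}):
        print(" ".join(instr))
```

When the allocator assigns both SSA values to the same barrier register, the output is the `BMOV B0 B0` + `BREAK B0` pair mentioned above; the sketch leaves the redundant copy in place rather than assuming it is safe to drop.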
06:13 fdobridge: <!​DodoNVK (she) 🇱🇹> Implementing EXT_map_memory_placed also works
21:39 fdobridge: <g​fxstrand> This should help with perf a decent bit: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26408
21:41 fdobridge: <g​fxstrand> CTS run looks mostly good but it looks like there's a small handful of fails yet.
21:49 fdobridge: <g​fxstrand> This is entertaining...
21:50 fdobridge: <g​fxstrand> ```
21:50 fdobridge: <g​fxstrand> fifo: PBDMA0: 01000000 [] ch 3 [007fb10000 deqp-vk[275722]] subc 0 mthd 0000 data 00000000
21:50 fdobridge: <g​fxstrand> ```
21:51 fdobridge: <g​fxstrand> It looks like it maybe shaves a few minutes off a CTS run. That's fun.
21:54 fdobridge: <m​arysaka> Can't wait to have the mesh shader tests double it :vReiAgony:
21:55 fdobridge: <k​arolherbst🐧🦀> are there that many mesh shader tests?
21:57 fdobridge: <p​ixelcluster> just wait for raytracing :frog_demon:
21:59 fdobridge: <a​irlied> @gfxstrand that's a lot of 0s
22:00 fdobridge: <m​arysaka> the txt file with the test cases is 4MB soo
22:01 fdobridge: <k​arolherbst🐧🦀> pain
22:01 fdobridge: <m​arysaka> I will try to do a run tomorrow and see how things go
22:54 Lyude: you know
22:54 Lyude: having openrm's source kind of fucks
23:12 karolherbst: Lyude: that reminds me.. we should scan the repo for all those PCI workarounds as well
23:13 Lyude: oh yeah good point
23:13 karolherbst: there are some laptops where runpm is still a bit broken
23:13 Lyude: (for those wondering: I found a Big list of displayport workarounds in nvidia's codebase :DD)
23:13 karolherbst: but I also wanted to wait until they officially support runpm
23:13 Lyude: fair
23:14 Lyude: I'm still trying to figure out link training btw, been going through nvidia's link training code and taking notes on the stuff they do. at least I've got a list of stuff to start trying now
23:35 fdobridge: <g​fxstrand> `Pass: 556678, Fail: 108, Crash: 84, Warn: 4, Skip: 3431474, Flake: 20, Duration: 52:02`
23:35 fdobridge: <g​fxstrand> So, it needs some fixing but not bad. Also, that's 10-15 min faster than a run without it.
23:46 fdobridge: <g​fxstrand> Okay, why does flushing compute caches at the end of `EndTransformFeedback()` do something?!? If I replace it with a 3D flush, even WFI, it doesn't work.
23:46 fdobridge: <g​fxstrand> WTH?!?
23:47 fdobridge: <g​fxstrand> It was even a NO_WFI invalidate. Somehow switching to compute flushes *something*
23:54 fdobridge: <k​arolherbst🐧🦀> 3D and compute state aliases
23:54 fdobridge: <k​arolherbst🐧🦀> well.. some parts of those
23:54 fdobridge: <k​arolherbst🐧🦀> and a subchannel switch causes a full WFI afaik
23:56 fdobridge: <p​ixelcluster> ye that was what I heard too (that subchannel switch causes WFI)