00:00 phomes_[d]: I will do that. Thanks. I was considering some of the llama.cpp tests too
00:00 phomes_[d]: The general theme of the tests was "how does gaming on nvk compare to prop". But I am happy to include other tests
00:00 karolherbst[d]: mhh yeah.. those are quite reliable as well
00:00 karolherbst[d]: though those are just compute tests, and.. it seems like shader improvements have a bigger impact on llama.cpp than on games
00:01 karolherbst[d]: and it's very very specialized in what it does
00:01 karolherbst[d]: memory load/stores, some arithmetic and then more load/stores and some MMA
00:02 karolherbst[d]: so I doubt llama.cpp numbers would reflect gaming performance in any reliable way
00:02 karolherbst[d]: but at least it can be automated 🙃
00:03 karolherbst[d]: ~~or just run the pts stuff~~
00:04 karolherbst[d]: I wonder how difficult it would be to have steam account integration in pts to run certain games that have benchmark modes from the CLI...
00:08 karolherbst[d]: anyway, MR is here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40897
00:48 karolherbst[d]: mhhh
00:48 karolherbst[d]: r80 = fmul.ftz r80 r79 // delay=1 wt=000001
00:48 karolherbst[d]: r79 = fadd.ftz -r80 0xbf800000 // delay=1
00:48 karolherbst[d]: that really should be a `ffma`
00:50 karolherbst[d]: con 32 %59 = fmul %15 (2.000000), %55
00:50 karolherbst[d]: con 32 %64 = fneg %59
00:50 karolherbst[d]: con 32 %66 = fadd %65 (-1.700000), %64
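The missed fusion above comes down to rewriting `fadd(fneg(fmul(a, b)), c)` into `ffma(fneg(a), b, c)`. Here is a minimal Python sketch of that pattern match on a toy expression tree. The `Var`/`Const`/`Op` classes, `fuse_fneg_fmul_fadd` and `evaluate` are hypothetical illustrations, not NIR's data structures or its rule syntax in nir_opt_algebraic.py.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


# Toy expression nodes (illustrative only, not NIR's data structures).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Op:
    name: str                 # "fmul", "fneg", "fadd" or "ffma"
    args: Tuple["Expr", ...]


Expr = Union[Var, Const, Op]


def fuse_fneg_fmul_fadd(e: Expr) -> Expr:
    """Rewrite fadd(fneg(fmul(a, b)), c) into ffma(fneg(a), b, c).

    Both operand orders of the fadd are checked, because the fneg
    may sit on either side of it.
    """
    if isinstance(e, Op) and e.name == "fadd":
        for x, c in (e.args, e.args[::-1]):
            if (isinstance(x, Op) and x.name == "fneg"
                    and isinstance(x.args[0], Op) and x.args[0].name == "fmul"):
                a, b = x.args[0].args
                return Op("ffma", (Op("fneg", (a,)), b, c))
    return e


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    """Evaluate a toy expression; this ffma rounds twice, unlike a hardware FMA."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Const):
        return e.value
    v = [evaluate(a, env) for a in e.args]
    if e.name == "fneg":
        return -v[0]
    if e.name == "fmul":
        return v[0] * v[1]
    if e.name == "fadd":
        return v[0] + v[1]
    if e.name == "ffma":
        return v[0] * v[1] + v[2]
    raise ValueError(f"unknown op {e.name}")


# The NIR snippet above: %66 = fadd(-1.7, fneg(fmul(2.0, %55)))
x = Var("%55")
expr = Op("fadd", (Const(-1.7), Op("fneg", (Op("fmul", (Const(2.0), x)),))))
fused = fuse_fneg_fmul_fadd(expr)
print(fused)
```

A real FMA rounds once instead of twice, so the fused result can differ in the last bit. That is why this kind of fusion is normally only applied when the math is not required to be exact. The toy evaluator does not model that difference.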
00:50 karolherbst[d]: yeah.. soo...
00:51 karolherbst[d]: ahhh I see
00:52 karolherbst[d]: nir does this for fabs properly, but not for fneg.. simple solution..
00:53 karolherbst[d]: ohhhhh
00:53 karolherbst[d]: it only does it on the add side...
00:55 mhenning[d]: I think it's supposed to be normalized to only use fadd during opt_algebraic and then fneg is created in algebraic_late
00:56 karolherbst[d]: mhhh
00:56 mhenning[d]: or was it fsub maybe?
00:56 karolherbst[d]: yeah.. possibly
00:56 karolherbst[d]: late algebraic has the opts to create ffma with fabs/fneg at least
00:56 karolherbst[d]: but only if it's on the add
00:57 karolherbst[d]: should be easy to add some nvk specific opts, but I'm wondering if other hw could make use of it as well
00:57 karolherbst[d]: ehh wait...
00:59 karolherbst[d]: there are rules for that
00:59 karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/compiler/nir/nir_opt_algebraic.py?ref_type=heads#L3701
00:59 karolherbst[d]: that should trigger here, no?
01:00 karolherbst[d]: I don't see that fneg used anywhere else...
01:01 karolherbst[d]: mhh...
01:01 karolherbst[d]: maybe something something DCE..
01:05 karolherbst[d]: ohh wait..
01:05 karolherbst[d]: it's not not contract I think...
01:06 karolherbst[d]: mhhhhhh
01:07 karolherbst[d]: mhh it's not that...
01:08 karolherbst[d]: `fmul = ('fmulz' if mulz else 'fmul') + '(is_only_used_by_fadd)'` 🙃
01:08 karolherbst[d]: yes.....
01:09 karolherbst[d]: yeah it's not used by an fadd at all
01:15 karolherbst[d]: mhh...
01:16 karolherbst[d]: seems like the fmul was used elsewhere..
01:16 karolherbst[d]: oh well
01:17 karolherbst[d]: and `is_only_used_by_fadd` searches through `fneg` and `fabs`
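The guard that blocked the fusion can be pictured as a walk over the users of the `fmul` result. The walk looks through `fneg`/`fabs` and fails on any user that is not an `fadd`. Below is a toy Python sketch over a hand-built use map. The `UseMap` layout and the extra user `%70` are hypothetical. The real NIR helper works on SSA defs, and this sketch only mirrors the behaviour described in the log.

```python
from typing import Dict, List, Tuple

# Hypothetical use map: value name -> list of (opcode, result name) for each user.
UseMap = Dict[str, List[Tuple[str, str]]]


def is_only_used_by_fadd(value: str, uses: UseMap) -> bool:
    """True if every use of `value` ends in an fadd, looking through fneg/fabs."""
    for opcode, result in uses.get(value, []):
        if opcode in ("fneg", "fabs"):
            # Modifiers are transparent: check the users of the fneg/fabs instead.
            if not is_only_used_by_fadd(result, uses):
                return False
        elif opcode != "fadd":
            return False
    return True


# %59 = fmul, %64 = fneg %59, %66 = fadd %65, %64  (from the NIR dump above)
only_fadd = {"%59": [("fneg", "%64")], "%64": [("fadd", "%66")]}
# Same, but the fmul result also has a (hypothetical) non-fadd user.
used_elsewhere = {"%59": [("fneg", "%64"), ("fmul", "%70")], "%64": [("fadd", "%66")]}

print(is_only_used_by_fadd("%59", only_fadd))       # True
print(is_only_used_by_fadd("%59", used_elsewhere))  # False
```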
01:21 karolherbst[d]: mhhh...
01:21 karolherbst[d]: `P2R` and `R2P` might help this shader...
01:31 karolherbst[d]: mhh TEX.SCR would also help..
01:32 karolherbst[d]: yeah.. seeing a couple of pointless movs to align the 2D coords
01:33 karolherbst[d]: and with .SCR it would be two scalar sources
01:38 karolherbst[d]: okay.. this part looks odd
01:39 karolherbst[d]: r112 = mufu.log2.f32 |r102|
01:39 karolherbst[d]: r112 = fmul.ftz r112 0x41800000
01:39 karolherbst[d]: r112 = mufu.exp2.f32 r112
01:42 karolherbst[d]: with an `fadd` I'd know what to do about it, but with an `fmul`? probably has to stay that way
01:48 karolherbst[d]: ohh wait...
01:48 karolherbst[d]: `|r102|^16`, no?
01:48 karolherbst[d]: not sure that's much faster tho 🙃
01:49 karolherbst[d]: ohh maybe it's lowered `fpow` already...
01:49 karolherbst[d]: that would make sense...
01:49 karolherbst[d]: ah yeah it is
01:50 karolherbst[d]: yeah okay, I don't think there is any easy thing besides the `TEX.SCR` part.. that should help a bunch of shaders
02:25 karolherbst[d]: heh apparently `HMMA` can do negation on the two first sources...
03:05 mhenning[d]: karolherbst[d]: 4 fmuls probably actually are faster than two mufus and an fmul
03:06 karolherbst[d]: mhhh good idea, will try that if I don't forget
03:07 karolherbst[d]: I think I've looked into that for codegen at some point
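For the `fpow(|x|, 16)` case above, the lowered sequence computes `exp2(16 * log2(|x|))`, which is two MUFU ops plus an fmul. mhenning's suggestion computes the same value with four squarings. This small Python sketch compares the two forms numerically on the CPU. It says nothing about GPU throughput, and real MUFU results are approximations rather than correctly rounded values.

```python
import math


def pow16_exp2_log2(x: float) -> float:
    """|x|^16 the way the lowered fpow does it: exp2(16 * log2(|x|))."""
    ax = abs(x)
    if ax == 0.0:
        # Assume log2(0) = -inf and exp2(-inf) = 0, as IEEE math would give;
        # Python's math.log2 raises on 0 instead, so handle it explicitly.
        return 0.0
    return 2.0 ** (16.0 * math.log2(ax))


def pow16_squaring(x: float) -> float:
    """|x|^16 as four multiplies: x^2, x^4, x^8, x^16."""
    y = abs(x)
    for _ in range(4):
        y = y * y
    return y


for v in (1.5, -2.0, 0.3):
    print(v, pow16_exp2_log2(v), pow16_squaring(v))
```

Because 16 is a power of two, the squaring chain needs exactly log2(16) = 4 multiplies. Other constant exponents would need a longer multiply chain.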
05:16 eueumesmo: Hi!
06:17 esdrastarsis[d]: phomes_[d]: I think there's a good reason for doing this: https://bugzilla.mozilla.org/show_bug.cgi?id=2021722
12:04 karolherbst[d]: mhenning[d]: yeah.. it's like.. 0.5% faster
12:05 karolherbst[d]: Not sure it makes sense going above, but it might. I was seeing that it happened twice in the shader, and one of those even allowed fusing with an fadd into an ffma
12:06 karolherbst[d]: mhh hard to tell it's really 0.5% faster tho because of thermals...
12:06 karolherbst[d]: but it seems to be a bit faster
12:12 karolherbst[d]: mhhhh
12:12 karolherbst[d]: I don't have high enough confidence there 🙃
12:12 karolherbst[d]: I really need to be able to disable boosting for that one
13:05 chikuwad[d]: time to start daily driving nvk
13:09 chikuwad[d]: still have prop installed but I'll only switch to it if/when needed
13:15 chikuwad[d]: not being able to screenshot the steam client sure is a fun bug
13:16 chikuwad[d]: even more fun than the banner rendering seemingly being broken in the steam client on nvk
13:30 chikuwad[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1492517104501657741/image.png?ex=69db9e5f&is=69da4cdf&hm=e5c4fbcd5153f62ede462fcaaf938751568c8bee0a3a9ff2e1edccbd13608575&
13:30 chikuwad[d]: oh apparently the screenshot works, just the preview is broken
13:30 chikuwad[d]: but yeah, the banner rendering in steam is also broken
13:46 blisto[d]: chikuwad[d]: Use kernel 7.0!!!!
13:49 chikuwad[d]: chikuwad[d]: this might be a false alarm, I'm accidentally on nvc0
13:49 chikuwad[d]: blisto[d]: yup, already am
13:49 chikuwad[d]: `7.0.0-rc7-1-mainline`
13:50 blisto[d]: Yay
13:50 blisto[d]: Compression
13:50 blisto[d]: Can you feel the claustrophobia
13:51 chikuwad[d]: I can't actually
13:51 chikuwad[d]: preview in spectacle and steam banner rendering are both not broken on zink, phew
14:09 chikuwad[d]: https://tenor.com/view/penguinz0-moistcritikal-gif-27692002
14:10 chikuwad[d]: loading into wolfenstein new order took down my gpu and the whole system with it
14:11 chikuwad[d]: though that might've been my graphic settings
14:15 chikuwad[d]: yeah it was my graphic settings
14:17 chikuwad[d]: turns out cranking everything to max might fly on nvprop but not on nvk, who would've thought
15:54 karolherbst[d]: Okay.. I found something _very_ concerning, https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900/diffs?commit_id=7bd07f71d39bd7758e9a4a12636bb970d6bad5a5
15:54 karolherbst[d]: and I wonder why we never ran into this, or did we paper over this somehow?
15:54 karolherbst[d]: gfxstrand[d]: any idea why you pushed the MS index that early in the tex lowering?
15:55 karolherbst[d]: well my assumption is that LOD and offsets usually don't exist with MS, soo maybe this does nothing in the end
15:55 karolherbst[d]: just wondering if we have any MS related issues we were running into
15:56 mhenning[d]: maybe https://gitlab.freedesktop.org/mesa/mesa/-/work_items/14108 is related?
15:56 karolherbst[d]: yeah.. gonna CTS it
15:57 karolherbst[d]: but dunno
15:57 mhenning[d]: If you want to try that one, you'll want to re-enable VK_SAMPLE_COUNT_1_BIT in VkPhysicalDeviceSampleLocationsPropertiesEXT.sampleLocationSampleCounts
15:57 karolherbst[d]: would be funny if nvidia has the same bug 🙃
15:59 karolherbst[d]: but I don't know if the order is wrong/right prior to Volta
16:04 karolherbst[d]: mhenning[d]: okay that test doesn't even use any tex instructions
16:04 karolherbst[d]: just `pixld.my_index`
16:05 karolherbst[d]: okay sooo
16:05 karolherbst[d]: pixld writes to a predicate to signal if it's a MS texture or not
16:05 karolherbst[d]: ehh
16:05 karolherbst[d]: SSAA mode or not
16:06 karolherbst[d]: so if not SSAA it returns 0 and sets the pred to false
16:09 mhenning[d]: oh, right I was misremembering that one
16:09 karolherbst[d]: at least that "fix" doesn't have any changes in the fossils I have here...
16:09 karolherbst[d]: I do think it's a purely theoretical issue
16:10 karolherbst[d]: ohhh
16:10 karolherbst[d]: max payne 3 hits it...
16:11 karolherbst[d]: which is a pain to start because I kinda can't use my license/key for that one 🥲
16:13 karolherbst[d]: mhhhh
16:14 karolherbst[d]: sample_id, 0x1 -> 0x1, sample_id..
16:14 karolherbst[d]: `backend_flags=0x226`
16:14 gfxstrand[d]: karolherbst[d]: Do you have a test case that’s failing?
16:14 karolherbst[d]: gfxstrand[d]: no, I go by docs here...
16:15 karolherbst[d]: but it seems that a single game in my fossils hits this...
16:15 karolherbst[d]: running the CTS atm and no change so far
16:16 gfxstrand[d]: Then we should probably file a bug for more tests or write something in piglit
16:16 karolherbst[d]: I say no change so far, but: `Pass: 134965, ExpectedFail: 14, Skip: 137021, Duration: 8:57, Remaining: 1:26:47` there is still some time for that to change 😄
16:17 karolherbst[d]: but I also don't see why any of the tests in my baseline would hit this...
16:17 karolherbst[d]: *failing
16:17 karolherbst[d]: somebody with actual access to `Max Payne 3` might want to verify this...
16:19 gfxstrand[d]: We’re conformant so there’s nothing in the Vulkan or GL CTS that hits it. We need to write a test.
16:19 karolherbst[d]: could also be simply wrong in the doc
16:20 gfxstrand[d]: Could be
16:21 gfxstrand[d]: I know the old compiler was wrong in a couple cases so I ended up just looking at blob shaders
16:21 karolherbst[d]: yeah...
16:21 karolherbst[d]: it's the only thing I found that seems different
16:21 karolherbst[d]: or well
16:21 karolherbst[d]: the only thing that would matter that is different
16:22 karolherbst[d]: TEX, TLD and TLD4 have slightly different args, but the overall order seems okay except for the MS index
16:22 marysaka[d]: pretty certain that MS was always the "last" arg of the second source on Maxwell
16:23 marysaka[d]: it at least always was after lod for sure
16:23 marysaka[d]: but yeah we probably need some test here...
16:42 karolherbst[d]: yeah.. for the entire MR it still looks good: `Pass: 540872, ExpectedFail: 89, Skip: 548039, Duration: 35:09, Remaining: 58:40`
16:48 orowith2os[d]: snowycoder: were you still working on Kepler shenanigans?
16:49 orowith2os[d]: Haven't seen you in a second and I'm finally in a place to boot up a PC with a Kepler GPU.
16:54 orowith2os[d]: (on that note, did the changes for compression go back to Kepler on 7.0, or no?)
16:57 mhenning[d]: no, compression is turing+ for now
17:14 karolherbst[d]: phomes_[d]: if you are in benchmarking mood, can also run with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900 😄
18:06 karolherbst[d]: Any NVK/NAK MRs I should review?
19:13 karolherbst[d]: yeah.. looks like that MS index reordering commit didn't cause any regressions nor fixes..
19:22 mhenning[d]: probably needs a test then
19:46 karolherbst[d]: I should have CTSed also my other MR...
20:01 phomes_[d]: karolherbst[d]: done 🙂
20:04 karolherbst[d]: oh wow, seems like it helps Serious Sam quite a bit
20:05 phomes_[d]: yes I was surprised as well. I did jump back to retest on main and it seems correct
20:06 karolherbst[d]: well more perf is more perf, I won't complain 😄
20:07 karolherbst[d]: maybe I should take a look at some of the slower games and check which of those shaders are causing perf issues..
20:08 karolherbst[d]: what was the issue with X4 again?
20:09 karolherbst[d]: ohh I do actually have x-com 2.. maybe I look into this one
20:12 karolherbst[d]: but also.. we really should land this one: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40897 🙃
20:27 orowith2os[d]: karolherbst[d]: FYI, you can simplify that match by having all the cases with the same result in the same line
20:27 karolherbst[d]: orowith2os[d]: I can't because they are all different types
20:28 karolherbst[d]: though
20:28 karolherbst[d]: I guess I could match a struct field name?
20:28 karolherbst[d]: not sure that's going to be simpler
20:28 orowith2os[d]: Not A | B | C => z?
20:28 karolherbst[d]: the inner type is a different type
20:29 karolherbst[d]: and it's even boxed
20:29 karolherbst[d]: so I don't think I could even match the field name this way
20:31 orowith2os[d]: Huh
20:31 orowith2os[d]: How often does this situation come up?
20:31 orowith2os[d]: Not often?
20:31 karolherbst[d]: not really
20:31 karolherbst[d]: it's pretty rare we need general code over all instruction types
20:32 orowith2os[d]: Mm, okey
20:33 orowith2os[d]: I was thinking of putting together a smol macro, but eh
20:33 karolherbst[d]: yeah... I was considering a macro as well, but...
20:33 karolherbst[d]: it's not like we add new instructions every week 🙃
20:33 karolherbst[d]: it happens maybe like a couple times a year
20:33 karolherbst[d]: if at all
20:40 orowith2os[d]: Though, it WOULD make it easier to work on in the future
20:41 orowith2os[d]: It's a 4-line macro at most, and it makes life easier, soooooo....
22:01 chikuwad[d]: mel I addressed your comments on my MR again btw
22:39 mhenning[d]: okay, I'll probably take another look on monday
23:03 chikuwad[d]: no rush :)
23:14 karolherbst[d]: phomes_[d]: do you have some time to test this MR? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40903 I'm curious if any of the games you are tracking would benefit from that