04:58 kar1m0[d]: karolherbst[d]: Compared to nouveau?
08:55 karolherbst[d]: yeah
08:55 karolherbst[d]: could be we do sub-optimal things that just end up with the GPU not being really busy
09:33 karolherbst[d]: ohhhh.. I know what we should do.. move more things out of loops
09:33 karolherbst[d]: there seem to be instructions within the loop which operate on values never touched by the loop
09:34 magic_rb[d]: We learned this in uni: loop-invariant code motion
09:37 karolherbst[d]: in theory a good idea, in practice... not always
09:39 magic_rb[d]: How come? We were taught it's always a good idea
09:41 karolherbst[d]: it might increase register pressure
09:41 karolherbst[d]: on the CPU that might mean you load/store values from memory more often
09:41 karolherbst[d]: on the GPU it might reduce the number of threads that can run in parallel
09:41 karolherbst[d]: or cause spilling
09:42 karolherbst[d]: sometimes it's just cheaper to recalculate the value over and over again if that lets you run more stuff at the same time
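The register-pressure trade-off described above can be sketched with a toy occupancy model. The register file size, thread cap, and register counts below are illustrative assumptions, not numbers from this conversation or from any particular GPU; real hardware also allocates registers with coarser granularity.

```python
# Toy occupancy model. All hardware numbers are illustrative assumptions.
REGISTER_FILE = 65536   # assumed 32-bit registers per SM
MAX_THREADS = 2048      # assumed cap on resident threads per SM
WARP_SIZE = 32


def resident_threads(regs_per_thread: int) -> int:
    """Threads that fit in the register file, rounded down to whole warps."""
    by_regs = REGISTER_FILE // regs_per_thread
    by_regs -= by_regs % WARP_SIZE
    return min(MAX_THREADS, by_regs)


# Hoisting four loop-invariant values keeps four more registers live
# across the loop (hypothetical register counts).
before = resident_threads(32)  # values recomputed inside the loop
after = resident_threads(36)   # values hoisted out of the loop
print(before, after)
```

In this sketch the hoisted version loses 256 resident threads, which is the kind of cost that can outweigh the instructions saved inside the loop.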
09:50 kar1m0[d]: karolherbst[d]: I think it's about the fact that Nvidia has higher power consumption compared to nouveau
09:50 karolherbst[d]: well nvidia is also faster
09:50 kar1m0[d]: I know
10:21 karolherbst[d]: ohhhh.. `nir_opt_sink` is nice 🙂
10:25 karolherbst[d]: that gives a 2.5% speed up or so
10:27 karolherbst[d]: getting better: https://gist.github.com/karolherbst/9e41eb70b3f5cf7f3644e595e846126f
10:28 karolherbst[d]: hey.. why isn't it doing it for the if 😄
10:28 karolherbst[d]: ohh depends on the prev block, nvm
10:29 karolherbst[d]: `div 32 %211 = ishl %50, %210 (0xc)` could be moved...
10:46 karolherbst[d]: up0 = isetp.ge.u32 ur6 0x1000 // delay=6
10:46 karolherbst[d]: p0 null = plop3 pT pT up0 LUT[0xaa] LUT[0x0] // delay=13
10:46 karolherbst[d]: @!p0 bra L13 // delay=1 wt=111111
10:46 karolherbst[d]: uhm...
10:46 karolherbst[d]: that's like 20 cycles 🙃
10:46 karolherbst[d]: that's almost a third of the entire loop
10:46 karolherbst[d]: and it happens like 3 times
10:48 karolherbst[d]: maybe I should wire up the BRA taking a UP source
10:54 karolherbst[d]: this should be:
10:54 karolherbst[d]: up0 = isetp.ge.u32 ur6 0x1000 // delay=6
10:54 karolherbst[d]: bra.u !up0 L13 // delay=1 wt=111111
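A small sketch of why the two sequences above are equivalent and where the "20 cycles" comes from. It assumes the usual 3-input LUT convention (bit index built from the three sources, so `0xaa` selects the third source) and simply sums the `delay=` annotations quoted in the listing; the helper name `plop3` is only borrowed for illustration.

```python
def plop3(a: bool, b: bool, c: bool, lut: int) -> bool:
    """Evaluate a 3-input LUT: the bit index is (a << 2) | (b << 1) | c."""
    return bool((lut >> ((a << 2) | (b << 1) | c)) & 1)


# With LUT 0xaa the result only depends on the third source, so
# `p0 = plop3 pT pT up0 LUT[0xaa]` is a plain copy of up0 into p0.
copies_third_source = all(
    plop3(True, True, up0, 0xAA) == up0 for up0 in (False, True)
)

# Issue delays taken from the `delay=` annotations in the listings above.
old_sequence = [6, 13, 1]  # isetp, plop3, predicated bra
new_sequence = [6, 1]      # isetp, bra.u on the uniform predicate

print(copies_third_source, sum(old_sequence), sum(new_sequence))
```

Since the copy adds nothing logically, branching on `!up0` directly keeps the same condition while dropping the 13-cycle `plop3` step.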
10:59 karolherbst[d]: but mhh.. how to go about it...
10:59 karolherbst[d]: mhenning[d]: you don't have by any chance a WIP patch to deal with the above?
12:20 karolherbst[d]: what a pain to implement this 🙃
12:42 karolherbst[d]: something nukes my very beautiful bras...
12:50 mohamexiety[d]: plop is a funny instruction name tbh
12:55 karolherbst[d]: uhh.. I see what's going on now.. opt_jump_threads does uhm.. weird things
12:55 karolherbst[d]: `instr.op = clone_branch(replacement);` is breaking everything now
12:58 karolherbst[d]: nice...
12:59 karolherbst[d]: now I get latency crashes, progress..
13:18 karolherbst[d]: I uhm.. I think I've done it entirely correctly: `TILE_M=128 TILE_N=128, TILE_K=32 BColMajor=1 workgroupSize=256 824.963706 TFlops`
13:19 karolherbst[d]: wait
13:19 karolherbst[d]: wait wait wait wait
13:19 karolherbst[d]: it works?
13:19 karolherbst[d]: well sometimes
13:19 mohamexiety[d]: 1 pflops nvk
13:20 mohamexiety[d]: omw to tell phoronix rn
13:20 karolherbst[d]: it only works for some types
13:20 karolherbst[d]: but when it works it's like... a massive perf increase
13:20 karolherbst[d]: like massive
13:20 karolherbst[d]: 45 -> 80 TFlops
13:21 mohamexiety[d]: wait is the reporting wrong then? this says 824 tflops :KEKW:
13:21 karolherbst[d]: that's a test where it produces the incorrect values 😄
13:21 mohamexiety[d]: ahh LOL
13:22 mohamexiety[d]: a little extra randomness is fine for ML stuff though, makes sure the model is interesting
13:23 karolherbst[d]: you might notice a pattern: https://gist.githubusercontent.com/karolherbst/e3f9dcd865d5d740cf3b07063e220f66/raw/3f5cc710d4b9190269b3823234e34a28fe480971/gistfile1.txt
13:24 karolherbst[d]: anyway.. speed speed speed
13:25 karolherbst[d]: ohhhhh....
13:25 karolherbst[d]: I see what's going on...
13:29 karolherbst[d]: anyway.. speeed
13:54 karolherbst[d]: something not right... pain
14:03 karolherbst[d]: mhhhhh
14:05 kar1m0[d]: karolherbst[d]: https://tenor.com/view/ninja-ninjaishyper-low-taper-fade-massive-crazy-gif-13735679314385451792
14:15 karolherbst[d]: uhhh.. I'm dum dum
14:17 karolherbst[d]: `@!%up531 bra pT L9` -> `bra.u %up531 L9`
14:17 karolherbst[d]: 🙃
14:24 karolherbst[d]: noooo.. my perf gains are all gone
14:24 karolherbst[d]: it does get rid of three instructions, but sadly the overall cycles spent are still the same.. mhh
14:29 karolherbst[d]: ohh maybe I can do a thing...
14:37 karolherbst[d]: mhh but those cycles matter a lot there... oof
14:37 karolherbst[d]: nuked the nop wait 2 thingies: 40 -> 42 tflops
15:05 karolherbst[d]: I'm also confused about this: https://gist.github.com/karolherbst/b77d0a6fe502c2db525691950b57ad9f
15:05 karolherbst[d]: `ur7 = lea ur1 19 ur6 // delay=11` specifically
15:05 karolherbst[d]: why the delay of 11?
15:24 karolherbst[d]: mhh guess one would have to take into account across the branches where it's used
15:59 mhenning[d]: Yeah, we don't track that across branches yet. It's on my TODO list
15:59 mhenning[d]: karolherbst[d]: and no, I don't have a branch for this, but it's also on my TODO list
16:00 karolherbst[d]: got rid of pointless prmt: `Instruction count: 736 -> 712`
16:03 karolherbst[d]: mhenning[d]: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/de2929aba281346adfbc745cfb4ae0fa836e3503
16:04 karolherbst[d]: going to submit this little change: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/049c81021d1301fa8bd93340d4531eab1f7d7e5f
16:05 karolherbst[d]: going to have impressive shader-db stats
16:17 karolherbst[d]: anyway. that gets rid of all the prmts in the coop shaders 🙂
16:21 karolherbst[d]: well.. sadly not _that_ impressive
16:57 karolherbst[d]: any good ideas what could be improved here? https://gist.githubusercontent.com/karolherbst/60a03e25d8e275704e46b491153730be/raw/3cb62892b04f692fe4b816516122caef12cd586b/gistfile1.txt
16:58 karolherbst[d]: this is the loop of the coop mat shaders
16:58 karolherbst[d]: but I'm running out of ideas
17:01 karolherbst[d]: anyway.. should upstream my stuff 🥲
17:18 mohamexiety[d]: syn-2-rs| ERROR: Subproject unicode-ident is buildable: NO
17:18 mohamexiety[d]: ../subprojects/syn-2.0.87/meson.build:35:8: ERROR: Neither a subproject directory nor a unicode-ident.wrap file was found.
17:18 mohamexiety[d]: did anyone see this before?
17:20 karolherbst[d]: yeah, somebody did something annoying and now it's broken, unbreak it with: `meson subprojects update --reset`
17:20 mohamexiety[d]: thanks!
18:54 mangodev[d]: been noticing [this issue](https://bugs.kde.org/show_bug.cgi?id=447797#c10) on nvk and it's been driving me insane
18:54 mangodev[d]: trying to find where the cursor plane is managed, but the code is all over the place
18:57 mangodev[d]: reportedly, radv/radeonsi *used* to have this issue, but it was either silently or accidentally fixed some time between 2022 and 2023?
19:27 mangodev[d]: hmmmm
19:27 mangodev[d]: big part of my issues may be `-D legacy-x11=dri2`
20:45 gfxstrand[d]: We just deleted DRI2
21:16 illwieckz[d]: Impressive!
21:17 mangodev[d]: gfxstrand[d]: in what timespan is "just"
21:18 mangodev[d]: i thought there was work to *start* doing so a couple weeks ago or so
21:18 mangodev[d]: was it actually yanked out of mesa?
21:20 ristovski[d]: mangodev[d]: iirc this is the commit? https://gitlab.freedesktop.org/mesa/mesa/-/commit/08c6ba223bb70acf23e87ea50c9f8e66038ceec0
21:20 ristovski[d]: so ~2w ago apparently
21:21 mangodev[d]: ah
21:21 mangodev[d]: probably still worth pulling that old code out then
21:23 ristovski[d]: afaik `-Dlegacy-x11` remains, but is now deprecated and unused
21:25 mangodev[d]: still need to fix my crate imports though
21:26 mangodev[d]: currently seeing if doing some bash jank could help make it work with the new subproject name formatting
21:27 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1399503633242128384/image.png?ex=68893cd3&is=6887eb53&hm=be91ac15565e59b3c53b06bc6522c497bf257ce70f0797f940cb1fa2c0bfe0fe&
21:27 mangodev[d]: i guess maybe i shouldn't have tried redownloading everything :(
21:33 gfxstrand[d]: mangodev[d]: Yup. It's gone since right before the 25.2 branchpoint
21:36 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1399505840469315604/image.png?ex=68893ee1&is=6887ed61&hm=3bd88545d2106515ea4a48f73951e59a55f17a79ae1651a6ee474cdcb61aa9c9&
21:36 mangodev[d]: mangodev[d]: i'm now confused at the fact that this directory exists
21:36 mangodev[d]: but it says it's an empty path??
21:41 mangodev[d]: why is this in arch `mesa-git` 🫠
21:41 mangodev[d]: ```bash
21:41 mangodev[d]: # although removing _build folder in build() function feels more natural,
21:41 mangodev[d]: # that interferes with the spirit of makepkg --noextract
21:41 mangodev[d]: if [ -d _build ]; then
21:41 mangodev[d]:     rm -rf _build
21:41 mangodev[d]: fi
21:41 mangodev[d]: ```
21:42 ristovski[d]: average aur PKGBUILD be like
21:43 ristovski[d]: i've seen some cursed and very questionable things over the years
21:44 mangodev[d]: i guess that wasn't the culprit?
21:44 mangodev[d]: it also says at the very start that `.` isn't a meson source directory 🤔
21:44 mangodev[d]: -# even though i didn't change any filepaths
21:45 mangodev[d]: i wonder if something here is set up wrong (somehow)
21:45 mangodev[d]: ```bash
21:45 mangodev[d]: # Build only minimal debug info to reduce size
21:45 mangodev[d]: # CFLAGS+=' -g1'
21:45 mangodev[d]: # CXXFLAGS+=' -g1'
21:45 mangodev[d]: meson setup mesa _build "${meson_options[@]}"
21:45 mangodev[d]: meson configure --no-pager _build
21:45 mangodev[d]: ninja "$NINJAFLAGS" -C _build
21:45 mangodev[d]: ```
21:45 mangodev[d]: it was working before, but i think it's too janky to be able to stand up straight anymore
21:49 mangodev[d]: i *did* have to change this part to work with the new subproject format
21:49 mangodev[d]: but idk if that broke things or not
21:49 mangodev[d]: ```bash
21:49 mangodev[d]: # Rust crates for NVK, used as Meson subprojects
21:49 mangodev[d]: declare -A _crates=(
21:49 mangodev[d]: ["proc-macro2"]="1.0.70"
21:49 mangodev[d]: ["quote"]="1.0.33"
21:49 mangodev[d]: ["syn"]="2.0.39"
21:49 mangodev[d]: ["unicode-ident"]="1.0.12"
21:49 mangodev[d]: ["rustc-hash"]="2.1.1"
21:49 mangodev[d]: )
21:49 mangodev[d]: for _crate in "${!_crates[@]}"; do
21:49 mangodev[d]: source+=("${_crate}-${_crates[$_crate]:0:1}-rs.tar.gz::https://crates.io/api/v1/crates/${_crate}/${_crates[$_crate]}/download")
21:49 mangodev[d]: done
21:49 mangodev[d]: ```
21:49 mangodev[d]: the files are created and unzipped and everything
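To make the naming in the PKGBUILD loop above easier to check, here is a minimal Python sketch of the same string construction. The bash expansion `${_crates[$_crate]:0:1}` takes the first character of the version string, so the archive names carry only that one character. The helper names `source_name` and `source_entry` are hypothetical.

```python
# Crate versions copied from the PKGBUILD snippet above.
crates = {
    "proc-macro2": "1.0.70",
    "quote": "1.0.33",
    "syn": "2.0.39",
    "unicode-ident": "1.0.12",
    "rustc-hash": "2.1.1",
}


def source_name(crate: str, version: str) -> str:
    """Mirror "${_crate}-${_crates[$_crate]:0:1}-rs.tar.gz"."""
    return f"{crate}-{version[:1]}-rs.tar.gz"


def source_entry(crate: str, version: str) -> str:
    """Mirror the full makepkg source entry (name::url)."""
    url = f"https://crates.io/api/v1/crates/{crate}/{version}/download"
    return f"{source_name(crate, version)}::{url}"


for crate, version in crates.items():
    print(source_entry(crate, version))
```

One thing this makes visible: taking a single character would truncate a two-digit major version (a version such as `10.1.0` would yield `1`), so the approach only holds for the single-digit versions listed here.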
21:54 karolherbst[d]: okay uhhh...
21:54 karolherbst[d]: this might sound like a stupid question...
21:55 karolherbst[d]: but why are we emitting NAK_SV_TID_Y and NAK_SV_TID_Z when workgroup_size_variable is false and workgroup_size is [ N, 1, 1]? 😄
22:00 karolherbst[d]: and why didn't I notice before 🙃
22:08 mhenning[d]: we already have some special cases like that
22:08 mhenning[d]: see nak_nir_workgroup_has_one_subgroup in nak_nir.c
22:09 mhenning[d]: but it's possible we're missing a case
22:09 karolherbst[d]: yeah...
22:09 karolherbst[d]: it doesn't do anything for a 256x1x1 workgroup 🙂
22:09 karolherbst[d]: I'll just make nak_nir_load_sysval smarter
22:12 airlied[d]: you'd also have to ask wtf a N,1,1 is reading TID Y or Z for 🙂
22:12 karolherbst[d]: might be us
22:12 karolherbst[d]: nir_intrinsic_load_local_invocation_id e.g.
22:12 karolherbst[d]: or nir_intrinsic_load_invocation_id
22:12 karolherbst[d]: well not the last one
22:13 karolherbst[d]: anyway.. trivial patch, getting rid of quite a bit of alu
22:13 karolherbst[d]: 0 constant folding goes brrrrt
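The idea discussed above can be sketched as follows: when the workgroup size is fixed at compile time and a dimension has size 1, the invocation ID in that dimension is always 0, so it can be emitted as a constant instead of a system-value read. This is a toy model in Python, not the actual `nak_nir_load_sysval` code; the function name, the tuple representation, and `NAK_SV_TID_X` (by analogy with the `NAK_SV_TID_Y`/`NAK_SV_TID_Z` names mentioned in the chat) are assumptions.

```python
# Hypothetical per-dimension sysval names; only the Y and Z names
# appear in the chat, X is assumed by analogy.
SYSVALS = ("NAK_SV_TID_X", "NAK_SV_TID_Y", "NAK_SV_TID_Z")


def lower_local_invocation_id(workgroup_size, workgroup_size_variable):
    """Return one (op, value) pair per dimension.

    A dimension of size 1 in a fixed-size workgroup can only ever
    have invocation ID 0, so it becomes a constant.
    """
    components = []
    for dim, sysval in zip(workgroup_size, SYSVALS):
        if not workgroup_size_variable and dim == 1:
            components.append(("const", 0))
        else:
            components.append(("load_sysval", sysval))
    return components


print(lower_local_invocation_id((256, 1, 1), False))
```

Once Y and Z are literal zeros, ordinary constant folding can remove the arithmetic that consumed them, which matches the "0 constant folding" remark above.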
22:20 karolherbst[d]: there are like 8 shaders in total in shader-db benefiting here 😄
22:28 karolherbst[d]: anyway... perf numbers are looking good
22:33 gfxstrand[d]: karolherbst[d]: Because no one has bothered to force zeros based on individual dimensions. :shrug_anim:
22:34 karolherbst[d]: hey.. it impacts a total of 8 shaders in shader-db! 😄
22:46 bornecrantz[d]: gfxstrand[d]: Don't you mean `__we __just __deleted __DRI2`?
23:06 gfxstrand[d]: bornecrantz[d]: Yeah, more or less.
23:17 x512[m]: Somebody cares about DRI2?
23:17 x512[m]: It looked like a bad design from the beginning.
23:42 gfxstrand[d]: It was better than DRI1. :shrug_anim:
23:47 HdkR: DRI4 will solve all the new problems.
23:48 karolherbst[d]: okay, enough value range for today 😄
23:48 x512[m]: DRI1 is about each application globally locking the GPU?
23:50 karolherbst[d]: instructions: 748 -> 704 with value range analysis stuff
23:50 karolherbst[d]: should probably clean it up, because I'm also not 100% sure if it's all sound, but I think it is
23:51 karolherbst[d]: it's not doing much.. just tagging enough as nsw and nuw and letting optimizations kick in
23:51 karolherbst[d]: can get rid of a loooot of shifts if the ishl is nsw/nuw
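A minimal sketch of why a no-unsigned-wrap (`nuw`) tag on `ishl` lets later shifts be dropped: if the left shift is known not to lose bits, shifting right by the same amount gives back the original value, so the pair can be replaced by the input. The helpers model 32-bit arithmetic in Python; the value bound and the shift amount are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def ishl(x: int, k: int) -> int:
    """32-bit left shift with wraparound."""
    return (x << k) & MASK32


def ushr(x: int, k: int) -> int:
    """32-bit logical right shift."""
    return (x & MASK32) >> k


def shl_is_nuw(x: int, k: int) -> bool:
    """True if x << k loses no bits in 32-bit unsigned arithmetic."""
    return (x << k) <= MASK32


# If range analysis proves x < 0x1000, then x << 12 never wraps, and
# ushr(ishl(x, 12), 12) can be folded to x.
foldable = all(
    shl_is_nuw(x, 12) and ushr(ishl(x, 12), 12) == x for x in range(0x1000)
)

# Without that bound the fold is wrong: the top bits are lost.
x_big = 0x00100001
print(foldable, shl_is_nuw(x_big, 12), hex(ushr(ishl(x_big, 12), 12)))
```

The counterexample shows why the tag has to be proven first; applying the fold to an untagged shift would change results.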
23:53 gfxstrand[d]: x512[m]: Yup. And passing damage regions around because everyone's drawing to the front buffer. It's awesome.
23:58 karolherbst[d]: also.. anybody want to review ldsm stuff, so we can at least merge that one
23:58 x512[m]: Even my Haiku RADV port with a custom GPU server experiment supported multiple clients.
23:59 x512[m]: Haiku still has an API for directly accessing the front buffer. The system passes applications a region of the screen where drawing to the framebuffer is allowed.