01:07mohamexiety[d]: phomes_[d]: got distracted but the main mr should be fixed now. thanks a lot for your help! ❤️
13:20zmike[d]: it seems like descriptor heap doesn't improve perf all that much vs descriptor buffer, which leaves me even less motivated to consider a zink implementation :/
14:33karolherbst[d]: ohh right.. I wanted to look into the imnmx.u32 r8 ur6 pT + r2ur pattern...
14:48karolherbst[d]: seeing patterns like this in a few shaders:
14:48karolherbst[d]: /*0000*/ BMOV.32.CLEAR RZ, B0 ;
14:48karolherbst[d]: /*0010*/ BSSY B0, 0x30 ;
14:48karolherbst[d]: /*0020*/ BSYNC B0 ;
14:48karolherbst[d]: /*0030*/ NOP ;
15:46mhenning[d]: karolherbst[d]: how do we end up with that?
15:46karolherbst[d]: I wish I knew
15:47karolherbst[d]: the shader is massive, with an impressive CFG
15:47karolherbst[d]: but
15:47karolherbst[d]: the label NAK referenced doesn't exist
15:48mhenning[d]: hmm weird
15:48mhenning[d]: zmike[d]: "doesn't improve perf" on what workload, out of curiosity?
15:53karolherbst[d]: mhenning[d]: https://gist.githubusercontent.com/karolherbst/b43a50ef16ec7fcf0322a9b809e726b3/raw/cfd673d96fdda25a2e0f854e72b5499423f1d5cb/gistfile1.txt
15:53karolherbst[d]: it's just straight up there after from_nir
15:53karolherbst[d]: and I have no idea where it's coming from
15:57mhenning[d]: oh, that's the shared memory initialization thing
15:58mhenning[d]: src/nouveau/compiler/nak/from_nir.rs line 4086
15:59karolherbst[d]: ohh the stuff inside `parse_block`? My line numbers are too far off
15:59karolherbst[d]: mhhhhhh
15:59mhenning[d]: yes
15:59karolherbst[d]: weird...
16:17karolherbst[d]: I should probably try to figure this out and see what's going on there.. but also check how much of a perf problem that is
16:18karolherbst[d]: but at least it's only in the first block..
16:19karolherbst[d]: anyway.. seems super unimportant 🙃
16:29zmike[d]: mhenning[d]: from vkd3d-proton apparently
20:57karolherbst[d]: mhenning[d]: any further high-level comments on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392 ? I kinda want to start CTSing it, but that also includes checking on older GPUs and I'd rather do it once than again after more review 😄
21:04karolherbst[d]: mhh
21:04karolherbst[d]: r80 = ld.global.a64.constant.u8 [r80+ur8..10], pT // delay=1 wr:0
21:04karolherbst[d]: r80 = prmt r80 [0x4440] rZ // delay=1 wt=000001
21:04karolherbst[d]: r80 = shf.r.w.u32.hi rZ r80 r84 // delay=1
21:04karolherbst[d]: I _wonder_ if we should assume that the high bits of load results are properly set...
21:06karolherbst[d]: like not really sure why we are so aggressive with prmt here..
21:07karolherbst[d]: ohh because of the `u2u32` mhh
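For reference, here is a small Rust model of PRMT in its default mode. It shows why `prmt r80 [0x4440] rZ` amounts to a u8-to-u32 zero-extension: selector nibble 0 keeps byte 0 of the first source, and nibbles 1 to 3 pick byte 0 of `rZ`, which is zero. The model follows the PTX `prmt.b32` default-mode definition. Whether NAK's `ld.*.u8` always leaves the upper 24 bits zero is exactly the open question here, so treat that part as an assumption. If it holds, the prmt is an identity on the loaded value.

```rust
/// Model of PRMT default mode: nibble i of `sel` chooses result byte i from
/// the 8-byte pool {b, a}, where `a` supplies bytes 0..3 and `b` bytes 4..7.
/// If bit 3 of a nibble is set, the chosen byte's sign bit is replicated.
pub fn prmt(a: u32, sel: u32, b: u32) -> u32 {
    let pool = ((b as u64) << 32) | a as u64;
    let mut out = 0u32;
    for i in 0..4 {
        let nib = (sel >> (4 * i)) & 0xf;
        let byte = ((pool >> (8 * (nib & 0x7))) & 0xff) as u32;
        let byte = if nib & 0x8 != 0 {
            // Sign-replicate mode: 0xff if the picked byte is negative, else 0.
            if byte & 0x80 != 0 { 0xff } else { 0x00 }
        } else {
            byte
        };
        out |= byte << (8 * i);
    }
    out
}

/// ASSUMPTION (hypothetical helper): a u8 load zero-extends into the
/// 32-bit destination register.
pub fn ld_u8_zext(mem_byte: u8) -> u32 {
    mem_byte as u32
}

/// True if `prmt r, 0x4440, rZ` leaves every zero-extended u8 value unchanged.
pub fn zext_prmt_is_redundant() -> bool {
    (0..=255u8).all(|v| {
        let r = ld_u8_zext(v);
        prmt(r, 0x4440, 0) == r
    })
}
```

Under that assumption, `zext_prmt_is_redundant()` returns true. The sign-replicate path (selector `0x8880`) is the matching s8-to-s32 case.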
21:07karolherbst[d]: mhhhhhh
21:09karolherbst[d]: I wanted to play around with the "dest_type" idea of loads, right? mhhh
21:36karolherbst[d]: or maybe I should do that in copy_prop? mhh
21:48karolherbst[d]: okay.. those prmts are annoying me 😄
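Doing this in copy_prop, or in a small pass just before it, might look roughly like the sketch below. It is hypothetical and uses a made-up mini-IR: `Op`, `LdU8`, `PrmtZ`, `Mov`, `Other` and `fold_redundant_zext` are invented here and are not NAK's API. It also assumes SSA values within a single block, and that u8 loads zero-extend their destination, which is still the unresolved point above. The pass tracks which values are known to be zero-extended from 8 bits and turns a `0x4440`/`rZ` prmt of such a value into a plain move.

```rust
use std::collections::HashSet;

/// Hypothetical mini-IR for illustration only; these are NOT NAK's `Op` types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    /// u8 load; its 32-bit destination is ASSUMED to be zero-extended.
    LdU8 { dst: u32 },
    /// dst = prmt(src, sel, rZ)
    PrmtZ { dst: u32, src: u32, sel: u32 },
    /// dst = src (plain move)
    Mov { dst: u32, src: u32 },
    /// Any other instruction; its result has unknown high bits.
    Other { dst: u32 },
}

/// PRMT selector that keeps byte 0 of the source and fills bytes 1..3
/// with byte 0 of the second source (rZ, so zero).
pub const ZEXT_U8_SEL: u32 = 0x4440;

/// One forward pass over an SSA basic block (assumption): remember values
/// known to be zero-extended from 8 bits and rewrite a zero-extending
/// `prmt ... rZ` of such a value into a move that copy-prop can then fold.
pub fn fold_redundant_zext(ops: &mut [Op]) {
    let mut zext8: HashSet<u32> = HashSet::new();
    for op in ops.iter_mut() {
        let cur = *op;
        match cur {
            Op::LdU8 { dst } => {
                zext8.insert(dst);
            }
            Op::PrmtZ { dst, src, sel } if sel == ZEXT_U8_SEL && zext8.contains(&src) => {
                // High bytes are already zero, so the permute is an identity.
                *op = Op::Mov { dst, src };
                zext8.insert(dst);
            }
            Op::PrmtZ { dst, sel, .. } if sel == ZEXT_U8_SEL => {
                // The prmt itself produces a value zero-extended from 8 bits.
                zext8.insert(dst);
            }
            Op::Mov { dst, src } if zext8.contains(&src) => {
                zext8.insert(dst);
            }
            _ => {}
        }
    }
}
```

Ordinary copy propagation would then remove the resulting `Mov`. Recording the property on the load's destination type, as in the "dest_type" idea, could make the same information available without a separate tracking set.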