00:01karolherbst[d]: anyway, I think it's fine to land without `nir_reassociate_cse_heuristic`, because I think it helps more with the UGPR + GPR encodings
00:01karolherbst[d]: probably
00:02karolherbst[d]: okay.. `nir_opt_reassociate` does nothing to the shader here 😄
00:02mhenning[d]: I could clean up what's already there if it, probably won't prioritize nir_reassociate_cse_heuristic
00:02karolherbst[d]: I _think_ reassociate moves convergent ones up, no?
00:03mhenning[d]: I think it tries to group convergent ones yes
00:03karolherbst[d]: mhhh
00:03karolherbst[d]: well I need them in the final iadd consumed by memory ops, so I think my opt is doing something differently there, no?
00:05karolherbst[d]: but mhh...
00:06karolherbst[d]: I don't see the ugpr form used, but I think that's because of some other opts I'm not doing yet.. mhh
00:07karolherbst[d]: ehh.. `r12..16 = ld.global.a64.strong.gpu.b128 [r8..10+ur4..6] // delay=2 wt=000011 rd:0 wr:1`
00:07karolherbst[d]: how did I miss that one earlier...
00:09karolherbst[d]: ohhh...
00:09karolherbst[d]: I uhm..
00:09karolherbst[d]: that opt didn't run 🙃
00:09karolherbst[d]: `Static cycle count: 11767` there we go
00:10karolherbst[d]: I accidentally ran the nak_nir_opt_algebraic_address opts earlier
00:10karolherbst[d]: and it actually was just a regression
00:13karolherbst[d]: instructions/cycles/warps: 774/14984/20 (original) => 774/14984/20 (gpr+ugpr encoding) => 774/14984/20 (reassociate) => 730/11767/20 (my opt expression from above)
00:14karolherbst[d]: though that seems to help even without the gpr+ugpr stuff almost as good as with
00:16karolherbst[d]: I have to play around with that and see if it's generally a benefit for us...
00:17karolherbst[d]: soo.. let's get rid of the `r2ur` and `ru2r`
00:20karolherbst[d]: yeah.. that's gonna help a lot more
00:20karolherbst[d]: but that's for tomorrow me
00:21karolherbst[d]: ohh I forgot to scan for u2u64 oops
00:24karolherbst[d]: yeah.. I'll need some uub stuff to properly identify if I can do certain ops, that will help a lot more, but that's for tomorrow tomorrow me
01:38karolherbst[d]: okay.. without the lea opts from `nak_nir_lower_algebraic_late` the gpr+ugpr is a strict improvement, with those ops it's not, maybe I need to extend something there...
01:40karolherbst[d]: uhm..I meant using uldc
01:41airlied[d]: I usually dropped the lea opt when I was doing this stuff, as they seemed to interfere, and mostly would have gone into address calcs
01:41karolherbst[d]: yeah.. but all the address opts are happening before lea now
01:42karolherbst[d]: now it's just something going weird when using uldc
01:42karolherbst[d]: seeing fewer leas and more iadds and generally more instructions which is weird, because with uldc I also see the annoying r2ur dropped..
01:42karolherbst[d]: I think it's the `iadd(is_used_by_non_ldc_nv)` rule that's kinda weird...
01:43karolherbst[d]: dunno
01:44karolherbst[d]: anyway... to get rid of a bit of 64 bit alu I'll need uub
01:44karolherbst[d]: seeing a bunch of iadd(i2i64(X >> a), b_const) << a( + 1) patterns...
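(Editorial sketch, not from the conversation: one way an unsigned upper bound could license narrowing a pattern like the one above from 64-bit to 32-bit ALU. The function names are hypothetical, and it assumes X is an unsigned 32-bit value, the shift is a logical right shift, and `b_const` is non-negative; the actual NIR/NAK rule may look quite different.)

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def can_narrow_to_32bit(x_uub: int, a: int, b_const: int) -> bool:
    """Hypothetical check: is ((x >> a) + b_const) << a safe to do in 32 bits
    for every x with 0 <= x <= x_uub?"""
    if not (0 <= a < 32 and b_const >= 0 and 0 <= x_uub <= MASK32):
        return False
    shifted_max = x_uub >> a
    # Sign extension (i2i64) only matches zero extension if the top bit is clear.
    if shifted_max >= 1 << 31:
        return False
    # The expression is monotonic in x, so the upper bound is the worst case.
    return ((shifted_max + b_const) << a) <= MASK32

def wide(x: int, a: int, b_const: int) -> int:
    """64-bit form: extend first, then add and shift."""
    return (((x >> a) + b_const) << a) & MASK64

def narrow(x: int, a: int, b_const: int) -> int:
    """32-bit form: add and shift with 32-bit wraparound, then zero-extend."""
    return ((((x >> a) + b_const) & MASK32) << a) & MASK32
```

(When `can_narrow_to_32bit` holds, `wide` and `narrow` agree for every x up to the bound; when it does not, the 32-bit form can wrap and the 64-bit ops have to stay.)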
01:45karolherbst[d]: yeah dunno what's up with uldc, I think it's RA being weird
01:46karolherbst[d]: or I need better expressions with uldc
01:47karolherbst[d]: airlied[d]: https://gist.github.com/karolherbst/0fd1a6344df9ea96b643eab630faca64 like it's totally doing the thing I want it to do
01:47karolherbst[d]: but the stats...
01:48karolherbst[d]: I'm sure it's RA, because I have like 30 movs more in the "optimized" shader 😢
01:49karolherbst[d]: `r5 = mov ur1` uhh yeah uhm... why.. 🙃
01:49karolherbst[d]: I'm sure it's related to the vector RA stuff that's a total disaster...
01:50karolherbst[d]: and I just get unlucky there
01:51karolherbst[d]: ohhh yeah
01:53karolherbst[d]: with my PHI vec RA patch: https://gist.github.com/karolherbst/8c27bec1801d797d454b17432a7cf103
01:53karolherbst[d]: much better
01:53karolherbst[d]: yeah so it's just RA being weird
01:55karolherbst[d]: I need to rethink that patch, because it was a disaster in most other shaders
01:55karolherbst[d]: but it removes a lot of movs here
01:56karolherbst[d]: I had issues with that in vertex shaders that only update parts of vectors and ended up with higher GPR usages, because without my ra fix, RA just accidentally works better...
17:54_lyude[d]: Hmm. Now that we've got suspend/resume working again, I've been hitting an issue where gnome-shell seems to occasionally crash upon coming out of resume with this:
17:54_lyude[d]: Feb 06 12:46:23 GoldenWind kernel: nouveau 0000:c1:00.0: gsp: mmu fault queued
17:54_lyude[d]: Feb 06 12:46:23 GoldenWind kernel: r8169 0000:65:00.0 enp101s0: Link is Down
17:54_lyude[d]: Feb 06 12:46:23 GoldenWind kernel: Bluetooth: MGMT ver 1.23
17:54_lyude[d]: Feb 06 12:46:23 GoldenWind kernel: nouveau 0000:c1:00.0: gsp: rc engn:00000001 chid:6 gfid:0 level:2 type:31 scope:1 part:233 fault_addr:0000003ff52ef000 fault_type:0000000a
17:54_lyude[d]: Feb 06 12:46:23 GoldenWind kernel: nouveau 0000:c1:00.0: fifo:c00000:0006:0006:[gnome-shell[7240]] errored - disabling channel
17:54_lyude[d]: Feb 06 12:46:23 GoldenWind kernel: nouveau 0000:c1:00.0: gnome-shell[7240]: channel 6 killed!
17:55_lyude[d]: I'm probably going to add a few of the other missing parameters we came up with before for suspend/resume to see if that actually helps at all
19:49airlied[d]: might be worth tracking down what fault_type:0xa is
20:18karolherbst[d]: okay.. I know how to get rid of `legalize_ext_instr`, I just convert it to explicit calls on the sources, because instructions vary so much and a generic helper won't ever be useful unless with a huge amount of pain, e.g. tagging each source, but then also dealing with exceptions, like "source can only be uniform if this modifier is set to this value".
20:18karolherbst[d]: and I'll just wire up UGPR encodings while I work through it, e.g. for AST and others as well