01:21 anarsoul[m]: Nope. We do not transform the program in regalloc (besides spilling registers)
02:32 enunes: anarsoul: in addition to these, I had a simple copy propagation pass which removed a few more movs too; it could also do modifier folding along the way. I had it in an MR a few months ago but dropped it from there to unblock the rest, and never actually pushed another MR for it. I can push it again
02:32 anarsoul[m]: sure, sounds good
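For readers unfamiliar with the idea, here is a minimal sketch of copy propagation followed by dead-mov removal on a toy single-block IR. It is not the lima ppir pass discussed above: `Instr`, `copy_propagate` and `live_out` are hypothetical names invented for illustration, only one basic block is handled, and modifier folding is left out.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    # Hypothetical toy instruction: dest = op(srcs...)
    dest: str
    op: str
    srcs: tuple


def copy_propagate(block, live_out):
    """Forward copy propagation over one basic block, then drop dead movs."""
    copies = {}  # mov dest -> original source, for copies that are still valid
    rewritten = []
    for ins in block:
        # Read through known copies: use the original value instead of the copy.
        srcs = tuple(copies.get(s, s) for s in ins.srcs)
        # Writing ins.dest invalidates every copy that defines or reads it.
        copies = {d: s for d, s in copies.items()
                  if d != ins.dest and s != ins.dest}
        if ins.op == "mov" and srcs[0] != ins.dest:
            copies[ins.dest] = srcs[0]
        rewritten.append(Instr(ins.dest, ins.op, srcs))

    # Backward sweep: a mov whose dest is not read later (and is not live out
    # of the block) is dead. Other ops are always kept (assumption: they may
    # have side effects).
    live = set(live_out)
    kept = []
    for ins in reversed(rewritten):
        if ins.op == "mov" and ins.dest not in live:
            continue
        live.discard(ins.dest)
        live.update(ins.srcs)
        kept.append(ins)
    kept.reverse()
    return kept
```

For a block like `t0 = mul a, b; t1 = mov t0; t2 = add t1, c` with only `t2` live out, the add is rewritten to read `t0` directly and the mov disappears. A mov survives when its destination is live out of the block or when its source is overwritten before the use.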
02:32 enunes: anarsoul: for 3) I think the challenge was blocks with discard
02:33 anarsoul[m]: well, it produces an extra mov whenever the store_output source is not an SSA value
02:35 anarsoul[m]: btw, I also implemented using the combiner for vector multiplication when one of the sources is a scalar. It improves instruction count for some shaders, but regresses others due to increased register pressure
02:35 anarsoul[m]: shader-db still says that instructions are helped
05:39 anarsoul[m]: I ran glmark2-es on 24.3.4 and git main to compare the difference, but it looks like glmark shaders didn't get much improvement. It's still noticeable though
05:39 anarsoul[m]: 24.3.4: https://gist.github.com/anarsoul/a1578faee9c6dd2928a69f53280bc9d2
05:40 anarsoul[m]: git main: https://gist.github.com/anarsoul/a19b3ae3ab0d21f60cc856535f4b5a1b
06:27 anarsoul[m]: we should probably start using fp16 for varyings. It would halve the memory bandwidth needed for varyings in geometry-heavy workloads and would likely give a noticeable performance improvement
07:49 anarsoul[m]: tried it, it's a really small change but unfortunately no measurable performance difference in glmark2
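The bandwidth argument for fp16 varyings is plain arithmetic, sketched below. The function names, the vec4 layout and the counts are assumptions made up for illustration; they do not describe how lima actually lays out varyings.

```python
import struct


def varying_bytes_per_vertex(num_vec4, fmt):
    # fmt is a struct format character: "f" = 32-bit float, "e" = 16-bit float.
    # "<" selects standard sizes with no padding.
    return num_vec4 * struct.calcsize("<4" + fmt)


def varying_traffic(num_vertices, num_vec4, fmt):
    # Bytes of varying data written for a draw (hypothetical, ignores caching).
    return num_vertices * varying_bytes_per_vertex(num_vec4, fmt)
```

With three vec4 varyings this gives 48 bytes per vertex in fp32 against 24 in fp16, so the varying traffic is halved regardless of vertex count. Whether that shows up in a benchmark depends on whether varyings are the bottleneck.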
19:03 anarsoul: enunes: so for 3) discard is indeed the challenge, and we cannot get rid of the mov there. However, we also create a mov for ppir_op_dummy, which is usually a register load
19:05 anarsoul: this one can be optimized if the only instruction in the "end" block is this mov