21:21 karolherbst: imirkin: was there more to mov vs ld on constant memory than indirects?
21:21 karolherbst: trying to implement uniform registers, but it's quite a mess overall sadly :/
21:22 karolherbst: ohh uhhh
21:22 karolherbst: that's annoying
21:22 karolherbst: so mov can load from cb directly
21:22 karolherbst: ldc can only load indirectly
21:22 karolherbst: but
21:23 karolherbst: uldc can load directly and indirectly :/
21:24 imirkin: ldc can load directly too
21:24 imirkin: we just don't avail ourselves of that functionality :)
21:24 imirkin: coz it's the dumbest thing ever
21:24 karolherbst: ohh right.. RZ...
21:24 imirkin: LDC has to deal with barriers
21:25 imirkin: MOV does not (SM50+)
21:25 karolherbst: ahhh
21:25 imirkin: on earlier gens i expect it's ~the same
21:25 karolherbst: well... if I want to emit "mov u32 $ur0 c2[0xbc0]" I have to use LDC anyway
21:25 imirkin: but we'll use LDC for wide loads
21:25 karolherbst: umov is only valid for ur and imms
21:25 imirkin: right
21:25 imirkin: but ...
21:25 imirkin: the IR doesn't have to match the ops 100%
21:25 karolherbst: right
21:25 imirkin: just has to be a 1:1 correlation
21:25 imirkin: so you can just emit that mov as LDC or whatever
21:25 karolherbst: I was just wondering what the difference was
21:26 imirkin: even though in the nv50 ir, it'll be a mov
21:26 imirkin: although... i guess barriers
21:26 imirkin: so yeah. best make it a "ld"
21:27 karolherbst: mhhh
21:28 karolherbst: that kind of doesn't fit my approach
21:28 karolherbst: load propagation is failing right now anyway :/
21:28 imirkin: it may be built with certain assumptions
21:29 imirkin: tbh i don't quite remember
21:29 karolherbst: so.. my plan was that I mark all uniform values as uniform from the very beginning
21:29 karolherbst: and just reject right before RA
21:29 imirkin: not sure what you're saying, but i also haven't thought about it at all
21:30 karolherbst: well, we have a nir pass which can tell which ssa value is uniform and which is not
21:30 karolherbst: so I just emit UGPRs instead of GPRs
21:30 karolherbst: and keep them
21:30 karolherbst: and right before RA after all opts and lowering, I check which values can stay ugprs and convert them to gprs if not
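The plan karolherbst describes (tag values as uniform up front, then demote the ones that cannot stay uniform just before register allocation) might look roughly like the sketch below. This is not nouveau code: `Value`, `Instr`, `demote_before_ra` and the `UNIFORM_CAPABLE` op set are hypothetical names invented for illustration, and the rule used here (a result stays in a UGPR only if its op has a uniform form and every register source is itself a UGPR) is an assumption about what "can stay ugprs" means.

```python
from dataclasses import dataclass, field

# Hypothetical: ops assumed to have a uniform-datapath encoding.
UNIFORM_CAPABLE = {"mov", "ld_const", "add", "shl"}

@dataclass
class Value:
    name: str
    file: str  # "UGPR" or "GPR"

@dataclass
class Instr:
    op: str
    dst: Value
    srcs: list = field(default_factory=list)  # register sources only

def demote_before_ra(instrs):
    """Demote UGPR results to GPRs when they cannot stay uniform."""
    changed = True
    while changed:  # iterate to a fixed point so demotions propagate
        changed = False
        for ins in instrs:
            if ins.dst.file != "UGPR":
                continue
            stays_uniform = (ins.op in UNIFORM_CAPABLE
                             and all(s.file == "UGPR" for s in ins.srcs))
            if not stays_uniform:
                ins.dst.file = "GPR"
                changed = True
    return instrs

# Usage: every value starts out optimistically marked as uniform.
a, b, c, d = (Value(n, "UGPR") for n in "abcd")
prog = [
    Instr("ld_const", a),      # constant load, stays uniform
    Instr("sin", b, [a]),      # op has no uniform form here, demoted
    Instr("add", c, [a, b]),   # consumes a demoted value, demoted
    Instr("shl", d, [a]),      # all sources uniform, stays uniform
]
demote_before_ra(prog)
```

Starting optimistic and only demoting keeps the earlier passes unaware of encoding limits, which matches the "keep them" part of the plan; the cost is that this late pass has to know every restriction.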
21:34 karolherbst: ehh.. I can just fix it up in the lowering pass
21:38 karolherbst: imirkin: the annoying part will be that indirects can also be uniform regs
21:39 karolherbst: global mem loads are even more annoying
21:40 karolherbst: g[$r5 + $ur3 + 0x344] is valid
21:40 karolherbst: but only for 32 bit addresses afaik
21:40 karolherbst: ohh nope
21:40 karolherbst: 64 bit as well
21:40 karolherbst: so
21:41 karolherbst: g[$r4d + $ur3 + 0x344] :)
21:42 imirkin: delightful :)
21:42 karolherbst: the ISA has a lot of goodies
21:42 imirkin: just coz it exists doesn't mean you must support it
21:42 karolherbst: DMUL with a 32 bit constant :)
21:42 karolherbst: right.. but would be nice if we could
21:43 imirkin: sure
21:43 imirkin: getting the immediate offsets was a pretty big win
21:44 karolherbst: yeah
21:44 karolherbst: I am actually interested in how much the uniform regs help
21:44 karolherbst: I guess they can reduce gpr usage
21:44 karolherbst: and maybe it's faster overall
21:47 imirkin: i guess it can also avoid annoying math if you're smart
21:47 imirkin: e.g. some uniform offset from a dynamic index
21:48 karolherbst: yeah
21:48 karolherbst: for compute
21:49 karolherbst: the work_group_id stuff is probably considered uniform
21:49 karolherbst: and then you can do local_id + uniform_part
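A small sketch of the "local_id + uniform_part" idea, assuming (as guessed above, not confirmed) that everything derived from the work-group id is uniform across the invocations involved. The function names and parameters are hypothetical; the point is only that the group-dependent term can be computed once and each invocation adds its own small offset, which is the same shape as the `g[$r4d + $ur3 + 0x344]` addressing form.

```python
def addresses_naive(base, group_id, group_size, stride):
    # Every invocation redoes the full computation in per-thread registers.
    return [base + (group_id * group_size + local_id) * stride
            for local_id in range(group_size)]

def addresses_split(base, group_id, group_size, stride):
    # Uniform part: identical for the whole group, computed once.
    uniform_part = base + group_id * group_size * stride
    # Per-invocation part: only the local id contributes.
    return [uniform_part + local_id * stride
            for local_id in range(group_size)]
```

Both functions produce the same addresses; the split version just makes explicit which term could live in a uniform register.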