04:18 anholt: which NVISA_*_CHIPSET is nvc0?
04:18 imirkin: GF100
04:18 imirkin: or whichever one is #define'd to 0xc0 :)
04:19 anholt: ha, obvious now
04:23 imirkin: anholt: https://nouveau.freedesktop.org/CodeNames.html#NVC0
04:24 imirkin: the chip has a register at address 0 which has the chip id in it, that's the nvxx thing (in hex)
04:24 imirkin: but then there is also G(family)(iteration) naming too
04:24 imirkin: e.g. GF100
04:24 imirkin: usually xx7/8 are the smaller chips
04:24 imirkin: and xx0/4 are the bigger ones
04:25 imirkin: but within a family, we never differentiate anything in nouveau
04:25 imirkin: (except tesla, which has DX10.0 and DX10.1 chips - the latter gained some functionality)
04:27 imirkin: (specifically texgather, cube arrays, texquerylod, and some non-shader functionality around xfb, seamless cubemaps)
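A minimal sketch of the naming convention described above, assuming the chip id has already been read out of the register: the nvxx name is just that id printed in hex. The helper names and the one-entry lookup table are hypothetical; only the 0xc0 / GF100 pairing comes from the conversation.

```python
# Hypothetical helper: the "nvxx" name is the chip id formatted in hex.
def nv_name(chip_id: int) -> str:
    return f"nv{chip_id:02x}"


# Only this pairing is stated in the log; a real table would have more entries.
CODENAMES = {0xC0: "GF100"}


def describe(chip_id: int) -> str:
    # Fall back to "unknown" for ids this illustrative table does not cover.
    codename = CODENAMES.get(chip_id, "unknown")
    return f"{nv_name(chip_id)} ({codename})"


print(describe(0xC0))
```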
04:28 anholt: ooh, interesting. nir fuse-ffma ends up hurting overall.
04:28 imirkin: yeah, definitely
04:28 imirkin: we try to be somewhat careful about fusing, i think?
04:28 anholt: are there some restrictions on modifiers or something?
04:28 imirkin: yes
04:29 anholt: ok, makes sense then.
04:29 imirkin: depending on the ISA you're targeting
04:29 imirkin: like the nv50 isa, src3 == dst
04:29 anholt: so extra movs if you need src3 again
04:30 imirkin: yeah, or if there's any funny business
04:30 imirkin: also limited modifiers (like neg, abs, etc)
04:31 imirkin: and also no constbufs i think?
04:31 imirkin: or less?
04:31 imirkin: (not to mention immediates)
04:31 imirkin: in general const/imm can only go into the second source
04:31 imirkin: nv50 has 4- and 8-byte instruction encodings
04:32 imirkin: the 4-byte ops are paired, pretty sure there's dual-dispatch
04:32 imirkin: but you can't encode as much stuff into 4 bytes, so more restrictions
04:32 imirkin: also if you go over 64 regs, you have to use the long encoding
04:32 imirkin: (out of 128 on nv50)
04:33 imirkin: (it's 256 as far as RA is concerned - everything is allocated as half-regs, multiplies want half-regs for the 16-bit variants)
04:38 mhenning: is the half-reg thing only for nv50? I thought we were doing RA in terms of 32-bit regs on nvc0 but I might be confused
04:39 imirkin: correct
04:39 imirkin: yes, on nvc0, RA is in 32-bit units
04:40 imirkin: on nv50, RA is in 16-bit units
04:40 imirkin: the target has a "getFileUnit()" which indicates the unit of RA
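The register counts above follow from the allocation unit. A rough sketch of that arithmetic, assuming the unit is given in bytes; ra_units() is a hypothetical helper and does not reflect what getFileUnit() actually returns:

```python
def ra_units(num_regs_32bit: int, unit_bytes: int) -> int:
    """Number of allocatable units when RA works in unit_bytes-sized pieces.

    Assumption: unit_bytes is 2 (half-regs, as on nv50) or 4 (full 32-bit
    regs, as on nvc0).
    """
    if unit_bytes not in (2, 4):
        raise ValueError("this sketch only models 16-bit and 32-bit units")
    # Each 32-bit register holds 4 // unit_bytes allocation units.
    return num_regs_32bit * (4 // unit_bytes)


# nv50 case from the log: 128 registers seen by RA as 256 half-regs.
print(ra_units(128, 2))
```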
04:40 anholt: was there 16-bit support that was dropped?
04:40 imirkin: anholt: no. integer mul takes half-regs on nv50
04:40 imirkin: there are 24- and 16-bit integer mul modes
04:40 imirkin: the 16-bit ones take half-regs
04:41 imirkin: (and are combined to build up a 32-bit mul)
04:41 anholt: yeah, ok
04:42 imirkin: there might be other situations where those are used, but that's the only one i can think of
04:42 imirkin: oh, also i made extensive use of 16-bit regs for image stuff
04:42 imirkin: for the address calculations and format conversions
04:42 imirkin: (compute-only)
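The idea of combining 16-bit multiplies into a 32-bit one, mentioned above, can be sketched as plain arithmetic. This is the generic decomposition, not necessarily the exact instruction sequence nouveau emits; the function name is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_from_16(a: int, b: int) -> int:
    """Low 32 bits of a * b, built only from 16x16-bit multiplies."""
    # Split each 32-bit operand into its two 16-bit halves ("half-regs").
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # lo * lo contributes to all 32 result bits.
    low = a_lo * b_lo
    # The cross terms are shifted left by 16, so only their low 16 bits
    # survive; a_hi * b_hi is shifted by 32 and drops out entirely.
    cross = (a_hi * b_lo + a_lo * b_hi) & MASK16

    return (low + (cross << 16)) & MASK32


print(hex(mul32_from_16(0x12345678, 0x9ABCDEF0)))
```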