00:01pmoreau: I think I’ll go to bed rather than RE those bits right now.
00:03karolherbst: :D good choice
00:03pmoreau: The Volta ones do seem to be more sparse than previous gens’.
00:03karolherbst: I think skeggsb wanted to look into this already
00:03karolherbst: so better wait and see what he decides to do
00:03karolherbst: pmoreau: yeah, that's why I think they have less forms/layouts now
00:05pmoreau: I’ll go back to my memory management then, tomorrow. Need to clean up things and figure out why some tests are generating some OOR_ADDR errors (and failing).
00:05pmoreau: (Whole error in dmesg:
00:05pmoreau: [ +0.286677] nouveau 0000:01:00.0: gr: TRAP ch 3 [007f905000 g_arg_w_2char.t]
00:05pmoreau: [ +0.000005] nouveau 0000:01:00.0: gr: M2MF 80000002 [PUSH_NOT_ENOUGH_DATA]
00:05pmoreau: [ +0.000007] nouveau 0000:01:00.0: gr: GPC0/TPC1/MP trap: global 00000000  warp 000e [OOR_ADDR]
00:06pmoreau: I’m a bit surprised about that “PUSH_NOT_ENOUGH_DATA”.
01:38imirkin: pmoreau: mem alignment has to be to the size of the load
01:39imirkin: i.e. a 1-byte load is 1-byte-aligned, a 16-byte-load is 16-bytes aligned
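The alignment rule imirkin describes (a load of N bytes must sit at an N-byte-aligned address) can be sketched as a small check. This is only an illustration: `isNaturallyAligned` is a hypothetical helper name, not a nouveau function, and it assumes the load size is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of nouveau: returns true when a load of
// `size` bytes at byte address `addr` is aligned to its own size.
// Assumes `size` is a non-zero power of two (1, 2, 4, 8, 16).
inline bool isNaturallyAligned(uint64_t addr, uint32_t size)
{
   assert(size != 0 && (size & (size - 1)) == 0);
   // For a power-of-two size, the low bits of the address must all be zero.
   return (addr & (size - 1)) == 0;
}
```

Under this rule a 1-byte load at address 0x1 is fine, while a 4-byte load at the same address is not, which would be one way to end up with an OOR_ADDR-style trap.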
01:40imirkin: karolherbst: i think you've misunderstood some stuff regarding handleManualTXD. also note that cube array sampling *is* 3+1, i.e. a vec3 coord + float array index. texture(samplerCubeArray) takes a vec4 "P" argument.
12:55pmoreau: imirkin: Makes sense!
13:27pmoreau: So, after playing a bit with the ISA, it seems that at least ADD and CVT (and I would assume every instruction that accepts cmem operands but isn’t ld or st) require those cmem operands to be at least 4-byte aligned.
13:29pmoreau: I need to add a check in the folding pass to not fold `ld.u8 %r1 c[0x0][0x1]; cvt.u32.u8 %r2 %r1` into `cvt.u32.u8 %r2 c[0x0][0x1]` for example, as it will not work (the produced instruction means `cvt.u32.u8 %r2 c[0x10][0x0]`)
13:52imirkin: pmoreau: yeah, what i said only applies to loads. regular instructions take 32-bit args.
13:54pmoreau: imirkin: Something like this should work, right? https://hastebin.com/afelijaxen.lua It worked on my small example at least.
13:55pmoreau: Maybe the check for const memory is useless as it won’t work for any of the memory types (except FILE_GPR)?
13:58imirkin: gmem should work too for smaller loads
13:58imirkin: also technically we represent conversions from predicate -> gpr as a "cvt u8"
13:58imirkin: i'd say sf == FILE_MEMORY_*
13:59imirkin: (maybe add a helper)
13:59pmoreau: Ah, we don’t use the set? operation for predicate -> gpr?
14:00pmoreau: Having a helper function `isMemory()` or similar sounds good.
14:01pmoreau: Oh wait, there already is `isMemoryFile()` :-D
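A minimal sketch of the folding guard discussed above, under stated assumptions: the `File` enum and `isMemoryFileStandIn()` are hypothetical stand-ins (the real `isMemoryFile()` signature and the hastebin patch are not shown in the log), and the rule that a folded operand must be at least 4 bytes wide and 4-byte aligned is taken from the observations about ADD and CVT, not from documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the compiler's register files.
enum class File { Gpr, Predicate, MemoryConst, MemoryGlobal, MemoryShared };

// Stand-in for the isMemoryFile() helper mentioned in the log.
inline bool isMemoryFileStandIn(File f)
{
   return f == File::MemoryConst || f == File::MemoryGlobal ||
          f == File::MemoryShared;
}

// Returns true when a load of `loadSize` bytes from (`file`, `offset`) may be
// folded into a non-load instruction as a direct source operand.
// Assumption: such operands are read as 32-bit values, so narrower or
// misaligned memory sources must stay as explicit loads.
inline bool canFoldLoadIntoOperand(File file, uint32_t offset, uint32_t loadSize)
{
   assert(loadSize != 0);
   if (!isMemoryFileStandIn(file))
      return true; // non-memory sources are not restricted by this rule
   return loadSize >= 4 && (offset & 3u) == 0;
}
```

With this guard, the `ld.u8 %r1 c[0x0][0x1]` example is rejected for folding because the load is 1 byte wide and the offset is not 4-byte aligned.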
14:26pmoreau: Time to play with the I2I ISA now, to try to figure out what is valid. Cause the compiler will never use that instruction for cvt.u32.u8 for example, but rather some moves and BFEs.
15:39pmoreau: nvdisasm does print out things like I2I.U64.U64, so it should handle 64-bit values, but when running code with `I2I.S64.S32 R2, R2;`, I get “ILLEGAL_INSTR_ENCODING”.
15:40pmoreau: Maybe it needs some “extended” flag to read from/write to a 64-bit value.
16:44imirkin_: pmoreau: there's stuff that's "valid" in nvdisasm but isn't for real
16:45imirkin_: pmoreau: there's some current I2F fanciness that we already use in nouveau
16:45imirkin_: for the "take byte N of value and convert it into float" stuff
16:46pmoreau: Fancy stuff like the .B1/B2/B3 on the source operand?
16:46imirkin_: or H1 :)
16:46imirkin_: iirc you still have to divide yourself
16:46pmoreau: Haven’t found out that one yet, but maybe it doesn’t exist on the I2I
16:46imirkin_: for the unpackUnorm() stuff
16:46imirkin_: yeah, i think that's just I2F
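The "take byte N of value and convert it into float, then divide yourself" step can be modelled on the CPU as follows. This is a sketch of the arithmetic only: the function names are hypothetical, and the mapping of `byteToFloat` to an I2F with a `.B<n>` source selector is an assumption based on the conversation. The division by 255 follows the GLSL definition of `unpackUnorm4x8`.

```cpp
#include <cassert>
#include <cstdint>

// Models the byte-select-and-convert step: pick byte `n` (0 = lowest) of
// `value` and convert it to float. Hypothetical name, not a nouveau function.
inline float byteToFloat(uint32_t value, unsigned n)
{
   assert(n < 4);
   return static_cast<float>((value >> (8 * n)) & 0xffu);
}

// The normalisation is a separate step: divide by 255 to map the byte
// range [0, 255] onto [0.0, 1.0], as unpackUnorm4x8 does per component.
inline float unpackUnormComponent(uint32_t packed, unsigned n)
{
   return byteToFloat(packed, n) / 255.0f;
}
```

Keeping the two steps separate mirrors the point made in the log that the conversion does not do the division for you.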
16:47pmoreau: (I’m currently re-RE’ing the different bits of the I2I instruction.
16:47imirkin_: yeah, i think that some stuff like I2I.S64.U64 and so on aren't valid.
16:48pmoreau: If nvdisasm does print some U64/S64 as acceptable src/dest, I would assume that there’s at least one valid combination with a U64/S64 as src or dest.
16:48pmoreau: Just have to find out which one
16:51imirkin_: not necessarily
16:51imirkin_: shared bitfield lookups
16:55pmoreau: Ah, maybe
17:43karolherbst: Now I have to deal with dvec4 inputs in nir, which are just single variables. Now either change the assignSlot code to deal with 64-bit slots or lower that in nir directly...
17:44karolherbst: I guess I have to write a nir pass then
22:19imirkin_: grrr. skeggsb, can you do /mng chanserv set #nouveau mlock +nts
22:19imirkin_: msg, of course.