00:28pmoreau: \o/ vstore_private is now passing for all types, except short4 and ushort4 😆
00:29pmoreau: Main fix was to make sure address registers were used for addressing private memory as well
00:39pmoreau: And vload_private is completely fixed! Getting quite close to passing basic now: mainly some more 8-/16-bit support for conversions, supporting SV_WORK_DIM, and debugging one or two additional tests.
01:58karolherbst: pmoreau: for the conversion stuff we have that in nir
01:58karolherbst: pmoreau: just need to convert over to use those nir convert thingies
01:58karolherbst: that adds the rounding stuff and everything as well
01:59karolherbst: and then we can add lowering based on the intrinsics
01:59karolherbst: which.. we also have
01:59karolherbst: ask jenatali, they had to deal with it on their end as well
02:00karolherbst: anyway.. I want to ditch those u2u and whatever alu op handling
02:00karolherbst: those are just soooooo painful
02:00karolherbst: but we can only convert after optimizations or so
02:00karolherbst: do the opt
02:01karolherbst: lower shit (like alu conversions to intrinsics), do lowering of unsupported conversion combinations with a private and whatever MS is using and the opt shit again
02:01karolherbst: but the int8/int16 lowering also handles some of that...
02:01karolherbst: it's ... strange
02:02karolherbst: I lost the overview at some point
05:51mangix: what's with this channel and nick changes?
05:52mangix: apparently I have to leave it and then do so
05:57HdkR: Likely +M
05:58HdkR: If you're leaving to change alias and identify with nickserv
15:55pmoreau: karolherbst: I’ll look at that NIR conversion stuff then, thanks! I will look at fixing the inmath_long(2|4) multiplication tests first.
16:10pmoreau: Ah right, the spilling issue with two levels of compound values…
16:12pmoreau: Allow max amount of registers: “type: 5, local: 0, shared: 0, gpr: 13, inst: 84, bytes: 592”
16:12pmoreau: Limit to 16 registers due to thread count: “type: 5, local: 12, shared: 0, gpr: 14, inst: 104, bytes: 776” and the kernel now fails due to running into that spilling issue. 🙃
19:24pmoreau: Oh great, the `ld $r0 s[0xb4]; add $r0 $r0 $r1 -> add $r0 s[0xb4] $r1` optimiser does not check that the offset isn’t too large for the instruction encoding, so I actually end up with `add $r0 s[0x34] $r1` because of that… :sigh:
19:26pmoreau: I think there is a function somewhere that says whether that folding is valid or not, let’s see if I can tweak it.
19:36karolherbst: pmoreau: uhhh...
19:37karolherbst: pmoreau: I hope it's not a signed vs unsigned thing
19:37karolherbst: or are there indeed just 7 bits?
19:37pmoreau: It seems to only be 7 bits
19:38pmoreau: 0x7c seems to be the biggest offset that can fit in that form.
19:38pmoreau: Any higher and I start modifying the destination of the instruction.
19:41karolherbst: so it's... 4 bits?
19:42karolherbst: pmoreau: are you checking with nvdisasm?
19:42karolherbst: or maybe there are always 7 bits, because for 8 bit ops? :D
19:42pmoreau: envydis, but I could check with nvdisasm
19:43karolherbst: yeah.. don't trust envydis :D
19:45pmoreau: Yup, 5 bits
19:45pmoreau: /*0000*/ IADD R0, g [0x1f], R0; /* 0x042007802000fe01 */
19:45pmoreau: /*0008*/ IADD R64, g [0x1f], R0; /* 0x042007802000ff01 */
19:45pmoreau: /*0010*/ IADD R0, g [0x1f].U8, R0; /* 0x0420078020003e01 */
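(Editor's aside: one way to read the three nvdisasm encodings above is to XOR them and list which bits differ. This is only a sketch. The helper name `differing_bits` is made up for illustration, and the field positions it suggests are inferred from these three samples alone, not taken from any ISA documentation.)

```python
# Encodings copied verbatim from the nvdisasm output above.
BASE = 0x042007802000fe01     # IADD R0,  g [0x1f],    R0
DST_R64 = 0x042007802000ff01  # IADD R64, g [0x1f],    R0
SRC_U8 = 0x0420078020003e01   # IADD R0,  g [0x1f].U8, R0


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    x = a ^ b
    return [i for i in range(x.bit_length()) if (x >> i) & 1]


print(differing_bits(BASE, DST_R64))  # bits toggled by R0 -> R64
print(differing_bits(BASE, SRC_U8))   # bits toggled by adding .U8
print(hex((BASE >> 9) & 0x1f))        # candidate 5-bit offset field (assumed position)
```

The first pair differs only in bit 8 and the second only in bits 14 and 15, which would be consistent with a 5-bit offset field in bits 9 to 13, with the destination register field starting right below it.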
19:45karolherbst: and how do you get to 0x34?
19:46karolherbst: ohh wait
19:46karolherbst: it might be just indexing things
19:46karolherbst: but I remember nv50 being that strange
19:48pmoreau: With the size shift I guess, since 0xb4 will actually be stored as 0x2d if I’m not mistaken. So if you do `& 0x1f` on that and then remultiply by the size (4 bytes), that would get you 0x34.
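(Editor's aside: the arithmetic described above can be sketched as follows. The 5-bit field width and the 4-byte unit come from the discussion. The function names `encoded_offset` and `fits` are hypothetical, and this is not the actual check in the Nouveau code generator.)

```python
def encoded_offset(byte_offset, size=4, bits=5):
    """Byte offset the hardware would see if the index is silently truncated.

    The offset is stored in units of `size` bytes in a field `bits` wide
    (both values assumed from the discussion above).
    """
    index = byte_offset // size
    return (index & ((1 << bits) - 1)) * size


def fits(byte_offset, size=4, bits=5):
    """True if the offset is aligned and its index fits in the field (assumed rule)."""
    return byte_offset % size == 0 and byte_offset // size < (1 << bits)


print(hex(encoded_offset(0xb4)))  # the truncated offset for s[0xb4]
print(fits(0x7c), fits(0x80), fits(0xb4))
```

With these assumptions, 0xb4 wraps around to 0x34, and 0x7c is the largest 4-byte offset that still fits, matching what was observed earlier. A folding pass would need a check along the lines of `fits` before merging the load into the `add`.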
20:14pmoreau: Hihihi, and it strikes again for cvt :-D
20:15pmoreau: Everything was going fine, converting those s8 to floats, and then…
20:15pmoreau: cvt rn f32 $r12 s8 u8 s[0x1f]
20:15pmoreau: cvt rn f32 $r13 s8 u16 s[0x0]
20:57karolherbst: yeah well...
21:19pmoreau: karolherbst: Re !10711: did you have time to check that RA patch? And is there anything more you would like to check or ask for, or would it be ready for merging? All patches should have at least one review.
21:20karolherbst: will take a look tomorrow
21:20pmoreau: Thank you!
21:23pmoreau: I think I’m starting to get enough patches to start a second series pretty soon with more Nouveau patches. I will try to fix most remaining issues for basic in it + the conversion rework, and then I might switch to the clover side of things with the mem alloc size restrictions, and possibly reworking things like compilation and propagating back errors occurring during grid launch.
21:42raket: karolherbst: GM200: 'vbetool vbestate restore' after unloading nouveau/unbinding... but there's just a blinking underscore, any ideas? Doing int 13h works fine (320x200x256).
21:43karolherbst: yeah no idea...
21:44raket: so i rather run bash in int 13h? :-)
21:45karolherbst: no idea
21:45raket: ok ty
21:51pmoreau: karolherbst: Do you know what happens if Nouveau tries to emit `mov b16 $r2 $r63` (or whatever the “zero-register” is there) on Fermi+? Does it actually emit that, or does one end up with `mov b16 $r1l $r31h` like I just did on Tesla, I’m guessing due to Tesla having those 16-bit registers? I’m wondering if I should just change 16-bit `loadImm()` for everyone and force it to 32-bit values, or just do it for Tesla.