22:26 Triang3l: Hi frogs! gerddie: agd5f: I'm having some weird issues with hand-written Evergreen shader programs, I wonder if you could have some ideas of what I may be doing wrong
22:27 Triang3l: (aside from the fact that I'm messing with that being horribly wrong probably :D)
22:28 Triang3l: But the issue is that I can't really do anything on the ALU that's more complex than single GPR/GPR or GPR/inline op2 without hanging the GPU (Barts)
22:30 Triang3l: Overall I'm trying to unpack two int16s or uint16s from vertex indices and convert them to float, to use them as vertex positions
22:30 Triang3l: from 32-bit vertex indices
22:31 Triang3l: but even unpacking is a challenge in itself
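For reference, the unpacking described above can be sketched on the CPU in C. This is only an illustration of the intended arithmetic, not the shader itself. The names `vec2f`, `unpack_u16x2` and `unpack_s16x2` are hypothetical, and the layout (X in the low 16 bits, Y in the high 16 bits) is an assumption that the log does not confirm.

```c
#include <stdint.h>

/* Hypothetical result type: two float position components. */
typedef struct {
    float x;
    float y;
} vec2f;

/* Sign-extend the low 16 bits of v to a 32-bit integer without relying on
 * implementation-defined narrowing conversions. */
static int32_t sign_extend_16(uint32_t v)
{
    return (int32_t)((v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Unpack two uint16 values from a 32-bit vertex index.
 * Assumed layout: X in bits 0..15, Y in bits 16..31. */
static vec2f unpack_u16x2(uint32_t index)
{
    vec2f v;
    v.x = (float)(index & 0xFFFFu); /* AND with a 0xFFFF literal */
    v.y = (float)(index >> 16);     /* logical shift right by 16 */
    return v;
}

/* Unpack two int16 values from a 32-bit vertex index (same assumed layout). */
static vec2f unpack_s16x2(uint32_t index)
{
    vec2f v;
    v.x = (float)sign_extend_16(index);       /* signed extract, offset 0, width 16 */
    v.y = (float)sign_extend_16(index >> 16); /* signed extract, offset 16, width 16 */
    return v;
}
```

The AND in `unpack_u16x2` corresponds to the AND_INT with a 0xFFFF literal in the log, and the signed variant corresponds to what a signed bitfield extract would compute.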
22:31 Triang3l: What I'm trying to do at first is:
22:32 Triang3l: 0: ALU @ 3, slot upper bound 2 - 1, barrier
22:33 Triang3l: 1: Export_done R0.0001 to position 60, barrier
22:33 Triang3l: 2: Export_done R0.0000 to parameter 0, end
22:42 Triang3l: 3: (Last) R0.y (write) = AND_INT R0.x, literal.x
22:42 Triang3l: 4: Literal: x = 0xFFFF, y = frog
22:43 Triang3l: And that's all of the shader, but it still hangs…
22:44 Triang3l: If I remove the literal and use just GPRs, or GPR as src0 and inline constant 0 as src1, it works (or at least doesn't hang)
22:45 Triang3l: But with a 2-component literal (src1 component X) or a 4-component one (src1 component W), oof
22:48 Triang3l: The literal is included in the ALU clause slot count minus one, and is placed after the last instruction in the group where it's used, the group also contains just 1 instruction
22:49 Triang3l: so probably that's the simplest literal usage possible, but it still doesn't work for some reason
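The slot accounting described above can be sketched in C as follows. This is a hedged illustration, assuming that each ALU instruction takes one 64-bit slot and that literal dwords after a group are padded to pairs so that each pair takes one slot. The function names are hypothetical and are not taken from r600_asm.c.

```c
#include <stdint.h>

/* Number of 64-bit slots one instruction group occupies.
 * Assumption: one slot per ALU instruction, plus the group's literal dwords
 * rounded up to an even count (two 32-bit literals per 64-bit slot). */
static uint32_t alu_group_slots(uint32_t num_instructions,
                                uint32_t num_literal_dwords)
{
    return num_instructions + (num_literal_dwords + 1u) / 2u;
}

/* Value written to the clause's count field, which stores the total slot
 * count minus one. total_slots must be at least 1. */
static uint32_t alu_clause_count_field(uint32_t total_slots)
{
    return total_slots - 1u;
}
```

Under these assumptions, the single AND_INT with a 2-component literal occupies 2 slots, so the count field is 1, which matches the "2 - 1" in the clause listing. A 4-component literal would make the group 3 slots.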
22:50 Triang3l: And looks like no read port or load cycle issues there
22:50 Triang3l: with just one instruction in a group especially
22:52 Triang3l: If I understand correctly, a srcN in different instructions in a group can reference up to 4 constants, here I'm using just one for one src1 (it also worked with inline 0)
22:54 Triang3l: and just one component of one GPR for one src0
23:04 Triang3l: I also tried unambiguously vector-only op3 R0.y = BFE_INT R0.x, L.x, L.x, getting the same hang
23:06 Triang3l: Another issue is that I can't co-issue a vector instruction and a trans instruction without getting a hang
23:06 Triang3l: Specifically I'm trying to:
23:08 Triang3l: - R127.y (no write) = MOV PV.y, (unused) PV.y
23:09 Triang3l: - (Last, trans-only) R0.x = INT_TO_FLT PV.x, (unused) PV.x
23:10 Triang3l: Again, doesn't look like that's a port/cycle issue
23:11 Triang3l: If I understand correctly, PV and PS usage doesn't count towards read port, GPR load cycle or constant load cycle limits
23:13 Triang3l: But even if those were GPRs, I'm reading different components for each srcN at each cycle
23:14 Triang3l: and trans instructions have somewhat relaxed cycle usage overall
23:15 Triang3l: Yet such simple code still hangs for me, and I have absolutely no idea why
23:17 Triang3l: The full shader I'm trying to execute is https://gitlab.freedesktop.org/Triang3l/mesa/-/blob/Terakan/src/amd/terascale/vulkan/terakan_device.c#L43
23:18 Triang3l: But the only part in the ALU logic that doesn't cause a hang is that lone INT_TO_FLT in the end
23:19 airlied: you sure it's the shader hanging and not a later stage?
23:23 Triang3l: airlied: I was thinking about the possibility of that, but for now for testing, I'm exporting 0001 rather than XY01 as the position, so there shouldn't be any covered pixels, I think
23:24 Triang3l: With the same 0001 export, whether the hang happens depends on that ALU clause
23:27 Triang3l: The SQ config/context registers are mostly the same as in R600g, with the exception that I'm dividing (number of threads - number of pixel shader threads) by 5 rather than 6 to calculate LS/HS/ES/GS/VS thread counts
23:28 Triang3l: 1 GPR in VS resources, 4 clause temps
23:32 Triang3l: (likely R600g r600_asm.c would have been helpful here, but I haven't started separating it from Gallium itself yet)