16:32 kolera83: the best way is for the alus to do double refinements, that means linked lists in nested lookup tables, all permuted together
16:33 kolera83: for example: you put all possible results into one refinement, and subtract that from level two
16:40 kolera83: that is formed in a really easy way, it's in a format where 1024 indexes and offsets are linked on an arbitrary basis, so that it appears like a lookup table inside a lookup table
16:41 kolera83: the compiler for it is very easy, cause the flattened
16:41 kolera83: so to speak alu codegen is just adding magic numbers together in a case switch
16:43 kolera83: but it has such an execution mode, that it executes 1024 alus together
16:43 kolera83: in only 5-6 operations it does 1024 arithmetical or logical ops
16:43 kolera83: any kind
18:50 tonyk: mareko: I tried to read CP_HQD_IB_RPTR, but it's always zero on gfx 10.1. mmCP_HQD_IB_CONTROL is zero as well
18:51 tonyk: but mmCP_SCRATCH_INDEX seems to be related to it: it always has the same value when I print it during a reset, and the value is an offset that's really close to the command that hung the GPU
19:30 agd5f: tonyk, HQD registers are instanced. there is a copy per hardware queue slot. You need to use grbm_gfx_cntl to select the instance you want to query
19:32 tonyk: aha, thank you
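[editor's note] agd5f's point about instanced registers can be illustrated with a toy model. This is only a sketch of the select-then-read pattern, not driver code: `ToyGpu`, `QueueSlot`, `write_select`, `read_hqd`, `poke` and `read_hqd_for_slot` are all hypothetical names, and nothing here reflects the real GRBM_GFX_CNTL bit layout or the amdgpu API. It assumes one selector register chooses which per-queue copy of the HQD registers a read lands on, and that a read without selecting sees whatever slot happens to be selected (here slot 0, which is idle and reads as zero).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSlot:
    """Hypothetical identifier for one hardware queue slot."""
    me: int
    pipe: int
    queue: int


class ToyGpu:
    """Toy model: one selector register, one bank of HQD registers per slot."""

    def __init__(self):
        self.selected = QueueSlot(0, 0, 0)  # stands in for the selector register
        self.banks = {}                     # QueueSlot -> {register name: value}

    def write_select(self, slot):
        # Models writing the selector: later HQD reads hit this slot's copy.
        self.selected = slot

    def read_hqd(self, name):
        # Reads the copy belonging to the currently selected slot only.
        return self.banks.get(self.selected, {}).get(name, 0)

    def poke(self, slot, name, value):
        # Helper for the example: pretend the hardware updated this slot.
        self.banks.setdefault(slot, {})[name] = value


def read_hqd_for_slot(gpu, slot, name):
    """Select a slot, read one HQD register, then restore the old selection."""
    previous = gpu.selected
    gpu.write_select(slot)
    try:
        return gpu.read_hqd(name)
    finally:
        gpu.write_select(previous)


gpu = ToyGpu()
busy = QueueSlot(me=1, pipe=0, queue=2)
gpu.poke(busy, "CP_HQD_IB_RPTR", 0x40)

naive = gpu.read_hqd("CP_HQD_IB_RPTR")                   # default slot is idle
selected = read_hqd_for_slot(gpu, busy, "CP_HQD_IB_RPTR")
print(hex(naive), hex(selected))
```

In this model the naive read returns zero because it lands on the default slot's copy, which matches the symptom tonyk described; selecting the slot first returns the value that slot actually holds. Restoring the previous selection afterwards keeps other readers from seeing an unexpected instance.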