00:06karolherbst[d]: mhhhhhh
00:17karolherbst[d]: it seems like the GPR.32 form doesn't work properly.. maybe it still needs 64-bit alignment..
00:27karolherbst[d]: mhenning[d]: okay.. not sure if it was lcssa that was turning convergent things into divergent ones, but I hit one with nak_lower_cf where a phi turns divergent after it: https://gist.github.com/karolherbst/8f0aa56169434a5c6d57db321d0aaf58 (two shaders one before the lowering and one after)
00:27karolherbst[d]: any suggestions?
00:28karolherbst[d]: I'm checking `nak_block_is_divergent`, but maybe I do it incorrectly?
00:28karolherbst[d]: I check the block of the `store_shared_nv`, but uhm.. the block is already divergent..
00:29karolherbst[d]: I guess it wasn't before lcssa..
00:29karolherbst[d]: ohh yeah..
00:29karolherbst[d]: I think that was the issue
00:31karolherbst[d]: `dEQP-VK.compute.shader_object_binary.cooperative_matrix.khr_a.subgroupscope.mul.sint8_sint8.workgroup.colmajor.linear`
00:34karolherbst[d]: but something is very odd here..
00:34karolherbst[d]: ohh... 🥲
00:34karolherbst[d]: I'm dum dum
00:34karolherbst[d]: I have to add more checks, sigh
00:35karolherbst[d]: mhhh
00:38karolherbst[d]: mhhhhh
00:39karolherbst[d]: div block b2: // preds: b1 b5
00:39karolherbst[d]: con 32 %104 = phi b1: %14 (0x0), b5: %486
00:39karolherbst[d]: div 32 %105 = iadd %40, %104
00:39karolherbst[d]: @store_shared_nv (%119, %105, %120 (0x0)) (base=0, offset_shift_nv=0, access=none, align_mul=1, align_offset=0)
00:39karolherbst[d]: so `%40` is divergent and `%104` is convergent and I fold the `iadd` in
00:39karolherbst[d]: and I wonder if that is a special thing about phis?
00:40karolherbst[d]: `con 32 %486 = iadd %104, %11 (0x1)` is the other phi source
01:18mhenning[d]: karolherbst[d]: nak_block_is_divergent is only designed to work before lower_cf
01:18mhenning[d]: although I also haven't looked at this too closely yet
01:18karolherbst[d]: yeah, I only use it on the load/store when it's a global one, because of the vector constraints
01:18karolherbst[d]: but it's a different issue
01:19karolherbst[d]: karolherbst[d]: basically this ^^
01:19karolherbst[d]: the phi turns divergent after lower_cf
01:19karolherbst[d]: maybe I should dig into lower_cf and figure out why that happens so I can predict it better..
01:20mhenning[d]: It's possible that lower_cf is setting the divergence incorrectly
01:20karolherbst[d]: mhhhh
01:20karolherbst[d]: maybe
01:20mhenning[d]: divergence_analysis can only run before lower_cf so we have this weird setup where lower_cf tries to preserve it
01:20karolherbst[d]: it's a value that gets increased each loop iteration
01:21mhenning[d]: okay, I'll need to think about this more tomorrow
01:22karolherbst[d]: yeah.. my gut feeling says that it would be considered divergent based on how NAK treats it.. unless nak converges each iteration?
01:24mhenning[d]: we do converge at the top of each iteration (that is, we reconverge divergent continues) but also nak is unnecessarily strict about what is allowed in a ureg
01:25karolherbst[d]: right
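A toy C model of why a loop-counter phi can look convergent inside a loop yet produce divergent values once it is used after the loop, which is what LCSSA makes explicit. It assumes a counting loop where the only source of divergence is a per-lane trip count and the counter is incremented unconditionally on the back edge, like `%486 = iadd %104, 1` above. `simulate_loop` is a hypothetical name; this is not NAK's divergence analysis or `nak_lower_cf`.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LANES 32

/* Hypothetical model: lane l runs `for (i = 0; i < trip[l]; i++)`.
 * Returns true if, on every pass, all lanes still inside the loop hold the
 * same counter value (the loop-header phi). The value each lane carries out
 * of the loop (what an LCSSA phi would see) is written to out[l].
 * Assumes trip[l] >= 0 for every lane. */
static bool simulate_loop(const int *trip, int lanes, int *out)
{
    assert(lanes > 0 && lanes <= MAX_LANES);
    bool active[MAX_LANES];
    int counter[MAX_LANES];
    bool uniform_inside = true;

    for (int l = 0; l < lanes; l++) {
        active[l] = true;
        counter[l] = 0;              /* phi source from the preheader: constant 0 */
    }

    for (;;) {
        int seen = -1;               /* counters are never negative here */
        bool any_active = false;
        for (int l = 0; l < lanes; l++) {
            if (!active[l])
                continue;
            /* compare the phi value across the lanes that are still active */
            if (seen < 0)
                seen = counter[l];
            else if (counter[l] != seen)
                uniform_inside = false;

            if (counter[l] >= trip[l]) {
                active[l] = false;   /* divergent exit: this lane leaves now */
                out[l] = counter[l];
            } else {
                counter[l] += 1;     /* back-edge source: counter + 1 */
                any_active = true;
            }
        }
        if (!any_active)
            break;
    }
    return uniform_inside;
}
```

With trip counts {1, 3, 2, 3}, every pass sees one counter value across the active lanes, while the values left behind after exit are 1, 3, 2 and 3, so a use after the loop is divergent. A divergent `continue` or a conditional increment would break the in-loop uniformity this model relies on.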
02:25karolherbst[d]: mhhh apparently loads do sign-extend and I wonder if it's worth the effort to _also_ add sign extension support to our loads 😄
02:26karolherbst[d]: there is a `SIGN_EXTEND` index mhhh
02:27karolherbst[d]: but apparently that means something else
02:28karolherbst[d]: we have dest_type mhh
02:53mhenning[d]: oh yeah, that's the difference between loading an s8 and a u8
02:55mhenning[d]: does sign-extension make sense for a u8 load in nir? nir has proper 8-bit types so there's nothing to extend. it only gets put in a 32-bit register once we translate to nak ir
02:56karolherbst[d]: yeah.. I was wondering how we could model it with the intrinsics we have now
02:56karolherbst[d]: load_global_nv + i2i32 -> load_global_nv(dest_type=s8)?
02:57karolherbst[d]: and then the load is 32 bits
02:57karolherbst[d]: could save us some prmts
02:58mhenning[d]: or we could try doing that opt in nak. might not be too out of place in opt_prmt
02:58karolherbst[d]: possibly
02:59karolherbst[d]: no idea how often that happens tho and I already have too many MRs pending that do show benefits 🙃 Haven't seen the i2i32 pattern yet, or at least I can't remember
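A minimal C sketch of the arithmetic behind the `load_global_nv + i2i32 -> load_global_nv(dest_type=s8)` idea: an 8-bit load that zero-fills a 32-bit register, followed by a separate sign-extension step, gives the same result as a load that sign-extends directly. Both function names are hypothetical; they model values only, not NIR intrinsics, NAK IR, or the actual prmt sequence NAK emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of today's lowering: the 8-bit load leaves the byte in
 * bits [7:0] with the upper bits zero, then a fix-up instruction (i2i32)
 * sign-extends it. */
static int32_t load_u8_then_i2i32(const uint8_t *mem, unsigned idx)
{
    int32_t v = mem[idx];      /* zero-extended: 0..255 */
    if (v & 0x80)
        v -= 0x100;            /* replicate bit 7 into bits [31:8] */
    return v;
}

/* Hypothetical model of the folded form: one load whose dest_type is s8
 * sign-extends straight into the 32-bit destination. */
static int32_t load_s8_to_32(const uint8_t *mem, unsigned idx)
{
    int8_t s;
    memcpy(&s, &mem[idx], 1);  /* reinterpret the byte as two's complement */
    return s;                  /* implicit sign extension to 32 bits */
}
```

The two agree for every byte value, so the fold only removes the extra fix-up instruction; whether it pays off depends on how often the pattern shows up, which is the open question in the log.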
13:34karolherbst[d]: fun
13:34karolherbst[d]: MUFU uses different swizzling 🥲
13:34karolherbst[d]: 0 -> .H0
13:34karolherbst[d]: 1 -> .H1
13:35karolherbst[d]: maybe that's normal for fp16 ops that only operate on scalars
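A small C sketch of the scalar F16 source selection described above, mapping component 0 to `.H0` and component 1 to `.H1`. It assumes `.H0` names bits [15:0] and `.H1` bits [31:16] of the 32-bit register holding two packed halves; `select_f16_half` is a hypothetical helper, not part of NAK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the 16-bit half a scalar F16 source selects.
 * Assumption: comp 0 -> .H0 = bits [15:0], comp 1 -> .H1 = bits [31:16]. */
static uint16_t select_f16_half(uint32_t reg, unsigned comp)
{
    assert(comp < 2);
    return comp == 0 ? (uint16_t)(reg & 0xFFFFu)  /* .H0 */
                     : (uint16_t)(reg >> 16);     /* .H1 */
}
```

For a register holding 0x3C004000, component 0 yields 0x4000 and component 1 yields 0x3C00 under this assumed layout.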
14:43marysaka[d]: zmike[d]: Just rebased and it's passing GLCTS 4.6.8.0
14:44zmike[d]: nice
14:44zmike[d]: ...but also you did actually check to make sure the mesh tests ran, yes?
14:45marysaka[d]: I ran `KHR-Single-GL46.meshShader.*` and the whole set is passing so I guess?
14:46marysaka[d]: will queue a full CTS run now, probably need to set up some DE first on the test bench tho
14:51zmike[d]: sounds good
14:51zmike[d]: if that glob passed then everything is passing
18:54karolherbst[d]: MUFU.F16 is ready for a final review, and now we also got proper swizzles for scalar F16 operations that even helped a few shaders https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40392
20:13_lyude[d]: Well that's interesting. airlied[d], looks like the cursor can break on that flashing thing as well:
20:13_lyude[d]: `[15682.192319] nouveau 0000:c1:00.0: [drm] *ERROR* failed, fb_id=0 handle=2 size=256x256 modifier=0 offset=0 format=34325241`
20:16_lyude[d]: I'm getting the feeling it might be race condition related too, I've noticed every now and then when I'm dragging a window with my cursor across to another display it will freeze up, I happened to do that right before I got the flashing again
20:18_lyude[d]: actually, I think I can reproduce this reliably using gtk4-demo's cursor thing
20:23_lyude[d]: or not, hm.
20:24_lyude[d]: nope i got it