01:14 karolherbst[d]: okay.. I indeed found a shader where we don't vectorize as well as we should have:
01:14 karolherbst[d]: div 16 %339 = @load_global_nv (%331, %270, %271 (true)) (base=2, access=readonly|reorderable, align_mul=8, align_offset=2)
01:14 karolherbst[d]: div 16 %353 = @load_global_nv (%331, %270, %271 (true)) (base=0, access=readonly|reorderable, align_mul=8, align_offset=0)
01:14 karolherbst[d]: will try to figure out tomorrow where it comes from
01:15 karolherbst[d]: the concerning part is that in NAK this writes to two 32-bit regs, so fixing it could even help with GPR usage
01:16 karolherbst[d]: (the solution is probably to run `nir_opt_load_store_vectorize` after `nir_opt_offsets`...)
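
A minimal sketch of why the two loads above look mergeable, assuming a simplified model: both are 16-bit loads from the same address sources, at `base=0` and `base=2`, with `align_mul=8`. The struct `load_desc`, the `addr_id` field, and `can_merge_loads` are hypothetical names for illustration; this is not how `nir_opt_load_store_vectorize` is implemented, and it assumes `align_offset` already accounts for `base`, as the dump suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one load; not a NIR data structure. */
struct load_desc {
    uint32_t addr_id;      /* stand-in for "same address sources" */
    uint32_t base;         /* constant byte offset (base=...) */
    uint32_t bytes;        /* size of the loaded value in bytes */
    uint32_t align_mul;    /* power of two */
    uint32_t align_offset; /* assumed: address % align_mul, base included */
};

/* Returns true if the two loads could be replaced by one wider load. */
static bool
can_merge_loads(const struct load_desc *a, const struct load_desc *b)
{
    const struct load_desc *lo = a->base <= b->base ? a : b;
    const struct load_desc *hi = lo == a ? b : a;
    uint32_t total = lo->bytes + hi->bytes;

    if (lo->addr_id != hi->addr_id)
        return false;
    /* The second load must start exactly where the first one ends. */
    if (lo->base + lo->bytes != hi->base)
        return false;
    /* The merged size must be a power of two. */
    if (total & (total - 1))
        return false;
    /* The merged load must be naturally aligned. */
    return lo->align_mul >= total && lo->align_offset % total == 0;
}
```

With the values from the dump (`base=0` and `base=2`, 2 bytes each, `align_mul=8`), the check passes and the pair would fit a single 32-bit load.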
07:17 phomes_[d]: if anyone has time for review then I would like to land 40686
07:24 marysaka[d]: will take a look
09:00 marysaka[d]: gfxstrand[d]: mhenning[d] I was doing some testing on barriers with the blob, and NVIDIA emits commands for `VK_QUEUE_FAMILY_FOREIGN_EXT` while we don't... shouldn't we do something about that?
09:00 marysaka[d]: To be precise they do (in this order):
09:00 marysaka[d]: - if any of the queue idx is `VK_QUEUE_FAMILY_FOREIGN_EXT`, they emit a `SET_REFERENCE` (on Blackwell this is a `NVC86F_WFI`) followed by a `MEMBAR`
09:00 marysaka[d]: - if the source is `VK_QUEUE_FAMILY_FOREIGN_EXT`, they emit a `L2_FLUSH_DIRTY`
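
The two observations above can be sketched as a small decision function. This is only an illustration of the described order, not code from the blob or from NVK: `foreign_barrier_cmds` and the `barrier_cmd` enum are hypothetical, and `QUEUE_FAMILY_FOREIGN` is redefined locally with the same value as `VK_QUEUE_FAMILY_FOREIGN_EXT` so the sketch compiles without `vulkan.h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as VK_QUEUE_FAMILY_FOREIGN_EXT in the Vulkan headers. */
#define QUEUE_FAMILY_FOREIGN (~2u)

/* Hypothetical labels for the commands named in the observations. */
enum barrier_cmd {
    CMD_SET_REFERENCE,
    CMD_MEMBAR,
    CMD_L2_FLUSH_DIRTY,
};

/* Fills `out` with the commands in emission order and returns the count. */
static size_t
foreign_barrier_cmds(uint32_t src_qf, uint32_t dst_qf, enum barrier_cmd out[3])
{
    size_t n = 0;

    /* Either side foreign: SET_REFERENCE followed by MEMBAR. */
    if (src_qf == QUEUE_FAMILY_FOREIGN || dst_qf == QUEUE_FAMILY_FOREIGN) {
        out[n++] = CMD_SET_REFERENCE;
        out[n++] = CMD_MEMBAR;
    }

    /* Source foreign: additionally flush dirty L2 lines. */
    if (src_qf == QUEUE_FAMILY_FOREIGN)
        out[n++] = CMD_L2_FLUSH_DIRTY;

    return n;
}
```

Under this model, an acquire from a foreign queue yields all three commands, a release to a foreign queue yields only the first two, and a barrier between regular queue families yields none.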
09:03 gfxstrand[d]: I think the kernel already does that when it switches contexts
09:05 marysaka[d]: hmm if it's done at context switch it could be a problem for TSG support then I guess?
09:06 marysaka[d]: need to grep on the kernel side a bit, I know we do have a MEMBAR on the kernel side for some stuff but I'm unsure about the L2_FLUSH_DIRTY
09:09 marysaka[d]: right, we do have a membar when we emit fences to ensure the values are visible before the interrupt, but I don't think it's related anyway
12:33 karolherbst[d]: mhh I found some interesting shader: do a bunch of texture loads, put it all in an array, access it indirectly in a loop...
12:36 karolherbst[d]: anyway, load/store vectorization has "interesting" impact on scratch accesses
15:54 mhenning[d]: marysaka[d]: That sounds plausible to me, although I'm not super familiar with that part of the spec
21:41 _lyude[d]: boy this screen flashing bug is mean ._.
21:41 _lyude[d]: I think I'm more or less back to where I started trying to fix this
21:53 karolherbst[d]: look what I just found: `NV902D_MME_DMA_READ_FIFOED 0x0560` ?!?
21:53 karolherbst[d]: I wonder if that's available pre-Turing..