11:30snowycoder[d]: Now kepler image read finally always works, but image write fails with `OOR_ADDR` in dmesg
13:18gfxstrand[d]: Still. Sounds like progress.
13:24gfxstrand[d]: OOR_ADDR is a weird one. Does it give any more detail?
13:27snowycoder[d]: Nope,
13:27snowycoder[d]: gr: TRAP ch 3 [007fbb0000 deqp-vk[38446]]
13:27snowycoder[d]: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000e [OOR_ADDR]
13:28snowycoder[d]: And I'm passing the image address directly, so it should be the "base case" of writing to pixel (0, 0, 0)
13:30snowycoder[d]: Oh never mind, I hardcoded the out_of_bounds predicate to true -.-.
13:30snowycoder[d]: It's always the stupidest mistakes
14:17snowycoder[d]: gfxstrand[d]: Can I query the number of bytes loaded/stored for an image access? My current code works only if I know the number of bytes I load/store at compile time, but I kind of worry that we don't have that info with bindless images.
14:17snowycoder[d]: src_type and dest_type almost do that, but for u8 access they always remain u32
14:25snowycoder[d]: The `image_deref_load` intrinsic does not have any info that differentiates a u32 load from a u8 load (destination and dest_type are the same and format is empty).
14:56gfxstrand[d]: Does the suldga/sustga do format conversion?
14:58snowycoder[d]: gfxstrand[d]: sustga can, but suldga cannot.
14:58snowycoder[d]: I can use sustga.p (formatted) with suldga.b (unformatted) but it fails when there are multiple components
14:58snowycoder[d]: Do we need to convert formats at runtime?
14:59gfxstrand[d]: Okay, so for loads we should already be getting format conversion. I did that a couple months ago.
14:59gfxstrand[d]: And the only thing we see in the shader is UINT formats.
15:00gfxstrand[d]: For stores, I'm not sure. `sustga.p` should be able to handle however many components we throw at it. But it probably also expects us to pass the format in somehow. Are we doing that?
15:01snowycoder[d]: Yep, it gets the formats as a source (taken from the descriptor)
15:01snowycoder[d]: gfxstrand[d]: You mean the `load_raw_nv` intrinsic?
15:02gfxstrand[d]: Yes? I don't remember what I did but that sounds like something I would have typed. 😂
15:03snowycoder[d]: Ok, I need to explore that part a bit more, I've just been lowering `image_deref_load` with `format=none` and I have seen no `load_raw_nv`
15:04gfxstrand[d]: Oh, I may have put the lowering in the wrong spot.
15:04gfxstrand[d]: I did that before I understood all the address stuff
15:05gfxstrand[d]: snowycoder[d]: Back to your original question, for loads you can get it from the format attached to the load instruction. For stores, we don't have that information at compile time.
15:06gfxstrand[d]: We can tuck it in the descriptor, though.
15:09snowycoder[d]: I always see `format=none` in all `image_deref_load`, having it at runtime won't help (it's an instruction modifier, we would need to emit all possible instructions and jump to the correct instruction)
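For readers following along, the "emit all possible instructions and jump to the correct one" option can be sketched in plain C. This is only a CPU-side illustration, not NAK or NIR code: `store_by_runtime_size` is a hypothetical name, and each `case` stands in for one fixed-size store instruction variant.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: the access size is only known at run time,
 * so every fixed-size store variant is emitted and a branch selects one.
 * Returns false for a size that has no matching variant. */
static bool store_by_runtime_size(uint8_t *dst, uint64_t value, unsigned bytes)
{
    switch (bytes) {
    case 1: { uint8_t  v = (uint8_t)value;  memcpy(dst, &v, sizeof v); return true; }
    case 2: { uint16_t v = (uint16_t)value; memcpy(dst, &v, sizeof v); return true; }
    case 4: { uint32_t v = (uint32_t)value; memcpy(dst, &v, sizeof v); return true; }
    case 8: { uint64_t v = value;           memcpy(dst, &v, sizeof v); return true; }
    default:
        return false; /* no store variant for this size */
    }
}
```

The cost is visible even in this toy form: one code path per size, plus a branch on every access.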
15:13snowycoder[d]: Unless we always round the address, load 128bits and then shuffle things around, but I don't see codegen doing any of that
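The "round the address, load 128 bits, shuffle" idea can be sketched on the CPU side as follows. This is a minimal sketch under stated assumptions, not what codegen or NAK does: `load_via_aligned_128` is a hypothetical helper, memory is treated as a flat little-endian byte array, and accesses are assumed to be naturally aligned so they never straddle a 16-byte block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read `size` bytes (1, 2, 4 or 8) at byte offset
 * `addr` by loading the enclosing 16-byte aligned block once and then
 * picking the wanted bytes out of it. */
static uint64_t load_via_aligned_128(const uint8_t *mem, size_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert(addr % size == 0); /* naturally aligned, so it stays in one block */

    size_t base = addr & ~(size_t)15;       /* round the address down */
    unsigned offset = (unsigned)(addr & 15);

    uint8_t block[16];
    memcpy(block, mem + base, sizeof block); /* the single 128-bit load */

    /* "Shuffle": assemble the value byte by byte, little-endian. */
    uint64_t value = 0;
    for (unsigned i = 0; i < size; i++)
        value |= (uint64_t)block[offset + i] << (8 * i);
    return value;
}
```

The load itself no longer depends on the access size; only the extraction step does, and that step is ordinary arithmetic on a runtime value.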
15:20snowycoder[d]: ohhh, format only gets populated after `nvk_nir_lower_descriptors` that translates the `deref_load` into `bindless_load`.
15:20snowycoder[d]: I think we can just postpone `lower_image_addr` and everything should work fine
15:39tiredchiku[d]: https://www.phoronix.com/news/NVIDIA-CUDA-Upgrade-Post-Volta
15:39gfxstrand[d]: snowycoder[d]: Right... In that case, we can chase it out of the variable. `nir_deref_get_variable()` should always work on image derefs.
16:06gfxstrand[d]: But it may still be PIPE_FORMAT_NONE for stores.
16:47snowycoder[d]: gfxstrand[d]: That's ok, if I postpone `lower_image_address` after `lower_descriptors` everything should work, we don't need format for stores and image load conversion is already done by you (thanks)!
16:49gfxstrand[d]: But don't we need the descriptor to fetch the extra data for address lowering from?
16:56snowycoder[d]: huh, right.
16:56snowycoder[d]: so we either lower format before descriptor (splitting `nir_lower_tex`), or we make `lower_formatted_image_load` work with `suldga` and drag format down the chain a bit.
18:24gfxstrand[d]: Or we pull the format code into the Kepler lowering and lower everything all at once.
18:48snowycoder[d]: Nice, that's even better