00:05Nopic: Yes, native gitlab accounts don't work there. I'll try, thanks!
00:08Nopic: BTW, a GK104 (IIRC) does work on the same kernel, so it's probably not a general issue with the system, kernel or setup.
00:17Nopic: OK, recaptcha is failing just as hopelessly as the kernel, maybe I'll retry another day. :( Thanks anyway!
05:15jimcarrey: I never said that you are stupid, i said you are making a mistake thinking that i do not know in life
05:15jimcarrey: i doubt that someone getting as far as berkeley university can be stupid, what i say that i am also thoroughly experienced
05:16HdkR: Ace Ventura, Pet Detective.
05:27simanogray: HdkR: i am pretty clear guy in the head, tough as hell, smart as heck, maybe not as fully motivated ever since you cracked up my systems, and implanted a chip on me, but you ain't gonna harm me either, my support is big
05:28HdkR: What is it, the Got Talent series? I don't watch much western content.
10:58karolherbst[d]: mhenning[d]: any chance you'll find time to review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998 again? I think the version is good enough for the time being, and I'll try to come up with nicer code by figuring out how I can use MOVM for transposing matrices, which should allow me to drop a lot of the col/row major handling code...
16:52mhenning[d]: Sure, I'll take another look
17:11karolherbst[d]: mhh I think I'm running into an issue where the base alignment of a buffer is incorrect...
17:12karolherbst[d]: *base address isn't properly aligned
17:12karolherbst[d]: only hit it in a deqp-runner run...
17:13mhenning[d]: for ssbos? I think we require them to be 128-bit aligned
17:13karolherbst[d]: yeah...
17:13karolherbst[d]: 32x2 load or stores are faulting
17:14karolherbst[d]: the offset is clearly at least 0x8 aligned
17:14karolherbst[d]: (the shader shifts by a lot)
17:14mhenning[d]: I thought misaligned loads just loaded from a lower address rather than faulting, but I'm not totally sure
17:16karolherbst[d]: mhhh
17:16karolherbst[d]: `nouveau 0000:52:00.0: gsp: Xid:13 Graphics SM Warp Exception on (GPC 0, TPC 0, SM 0): Misaligned Address`
17:20karolherbst[d]: parts of the shader: https://gist.github.com/karolherbst/d201cbef8446796bf0b161133ee2bb81
17:20karolherbst[d]: the two loads and stores at the end are faulting if vectorized
17:20karolherbst[d]: probably
17:20karolherbst[d]: only difference in the shader
17:20karolherbst[d]: I've added comments for the alignment of the values (if I haven't messed up)
17:21karolherbst[d]: so ignoring the base address it _looks_ like they are properly aligned
17:21karolherbst[d]: and matrix load/store require row/col or 16b alignment minimum
17:21karolherbst[d]: whatever is lower
17:22karolherbst[d]: this is a moment where I want to have printf support
17:27karolherbst[d]: Anyway, not really sure where to dig to figure out what's what here, and if we might be missing something somewhere
17:29karolherbst[d]: there is a bit of indirection on the ssbo, so not quite sure
17:30mhenning[d]: are the registers properly aligned in the final assembly?
17:34karolherbst[d]: That would be a different error, no?
17:34mhenning[d]: I don't know
17:34karolherbst[d]: but anyway, they are
17:35karolherbst[d]: could also be a subtle CTS bug
17:36karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998/diffs?commit_id=dc87e61798c3151dbea598d4518860ac41b2fd6c has the details on the alignment requirements here
17:36karolherbst[d]: but also running the crashing test alone works just fine
17:37karolherbst[d]: so I'm pretty confident it's a messed up base pointer
17:37mhenning[d]: maybe try with NVK_DEBUG=trash_memory ?
17:37mhenning[d]: In case it's assuming zeroed memory
17:38karolherbst[d]: no difference in behavior
17:39karolherbst[d]: maybe there is a property I have to update...
17:39karolherbst[d]: but the validation string doesn't really seem to care about that..
17:40karolherbst[d]: maybe I should debug the test and see what's happening there
17:41mhenning[d]: Could also try with validation on
17:41karolherbst[d]: no error
17:41karolherbst[d]: but also not sure if it validates base ptr alignments of matrix load/store instructions...
17:42karolherbst[d]: maybe I also misunderstood the alignment guarantees, but it sounds like merging two 32-bit loads should always work out (if the offset is properly aligned)
17:43karolherbst[d]: anyway.. didn't anybody want to wire up printf? Otherwise I might just do it for this one
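The condition being discussed here can be written down as a small check. This is only a sketch, assuming the hardware requires natural (8-byte) alignment of the final address for a 32x2 access; the helper names `is_aligned` and `can_vectorize_32x2` are made up for illustration and are not Mesa functions. It shows why an aligned offset alone is not sufficient when the base pointer is in doubt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Hypothetical check: a merged 32x2 access touches 8 bytes, so the
 * final address (base + offset) has to be 8-byte aligned, not just
 * the offset. */
static bool can_vectorize_32x2(uint64_t base, uint64_t offset)
{
    return is_aligned(base + offset, 8);
}
```

With a base of `0x1004`, an offset of `0x8` is 8-byte aligned on its own, yet the final address `0x100c` is not, which is the "messed up base pointer" scenario suspected above.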
17:44mhenning[d]: iirc snowycoder[d] had printf working at some point
17:45snowycoder[d]: mhenning[d]: Oh, hello, I do have it working, even if it's a bit hacky (the print buffer offset is passed to NAK through fragment shader structs)
17:46karolherbst[d]: mhh does that work in compute shaders?
17:47snowycoder[d]: Probably?
17:47snowycoder[d]: I mean, as long as it populates `nak_fs_key`
17:49snowycoder[d]: karolherbst[d]: If you want to take a look: https://gitlab.freedesktop.org/SnowyCoder/mesa/-/commit/158ab2d085b465fcf11bfc29b68dc505ad6068dd
17:49snowycoder[d]: (I was in the middle of rebasing so I may have introduced some bugs)
17:52karolherbst[d]: re: `Should we expose printf to SPIR-V???` it's part of the opencl extended instruction set already
17:52karolherbst[d]: `vtn_opencl.c` handles it
17:54karolherbst[d]: which is also the main reason why all the printf code exists in the first place
17:59snowycoder[d]: OpenCL can only be used in NVK through ZINK, right? I thought ZINK handled the printf lowering and BO alloc-dealloc
18:00snowycoder[d]: (I know next to nothing about how OpenCL interacts with NVK, sorry)
18:16karolherbst[d]: yeah, but in theory nothing prevents a vulkan driver from supporting the extended instruction set
18:22snowycoder[d]: True, I wanted to avoid allocating memory when the print isn't used, although I guess it's not that slow
18:25karolherbst[d]: well it's just for debugging, so whatever
19:57karolherbst[d]: snowycoder[d]: well compute shaders don't set the fs_key thing ever, and I doubt other stages do so either
19:58karolherbst[d]: probably want to make the offset part of `nak_compiler`?
19:59karolherbst[d]: mhhh
20:00karolherbst[d]: I'll figure something out tomorrow
20:02karolherbst[d]: there is a compute specific root buffer thing somewhere
20:02karolherbst[d]: *root desc
20:03snowycoder[d]: The buffer is global to the device so we could also hardwire the buffer somewhere, other drivers do that but nvkmd doesn't allow it yet.
20:03karolherbst[d]: yeah.. but it's a bit weird
20:03karolherbst[d]: like the issue is that it's not designed to be used cross-stage afaik
20:03karolherbst[d]: but again, it's just for debugging, so might not matter
20:13snowycoder[d]: If we refactor the header files a little bit we might be able to find the offset directly from nak, avoiding all the "pass the offset as an option" weirdness
20:14karolherbst[d]: or just lower nir_intrinsic_load_printf_buffer_address to a constant
20:15snowycoder[d]: Yeah, but that would mean that the buffer address has a fixed virtual address
20:15karolherbst[d]: yeah...
20:15karolherbst[d]: and?
20:15karolherbst[d]: ohh waait
20:15karolherbst[d]: shader cache...
20:16snowycoder[d]: Yep, it needs a bit of tweaking in nvkmd to allow the same address every time
20:16karolherbst[d]: you probably want to have per stage addresses anyway
20:17snowycoder[d]: Why? it always uses atomic operations and it all ends up in the same place anyways
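The "atomic operations, all ends up in the same place" point can be illustrated with a host-side sketch using C11 atomics. The struct and function below (`example_printf_buf`, `example_append`) are hypothetical and are not the layout or code used by Mesa's printf support; they only show how concurrent writers can reserve disjoint ranges of one shared buffer with an atomic add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared print buffer: a write cursor followed by data. */
struct example_printf_buf {
    _Atomic uint32_t write_offset; /* next free byte in data[] */
    uint32_t size;                 /* usable bytes in data[]   */
    uint8_t data[256];
};

/* Reserve len bytes with an atomic add, then copy the record in.
 * Writers never overlap because each one gets a distinct start.
 * On overflow the cursor has still advanced, but nothing is written. */
static bool example_append(struct example_printf_buf *b,
                           const void *rec, uint32_t len)
{
    assert(b->size <= sizeof(b->data));
    uint32_t start = atomic_fetch_add(&b->write_offset, len);
    if (start > b->size || len > b->size - start)
        return false;
    memcpy(b->data + start, rec, len);
    return true;
}
```

Because every writer only needs the cursor and the data area, a single buffer can in principle serve every stage; what the records refer to (ids or hashes) is a separate question.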
20:18karolherbst[d]: ahh compute only has nvk_descriptor_state
20:18karolherbst[d]: snowycoder[d]: strings are referenced by id
20:18karolherbst[d]: or rather format table entries
20:18karolherbst[d]: they aren't part of the buffer
20:19karolherbst[d]: so if you use the same buffer across different shaders you'll run into id conflicts
20:19karolherbst[d]: ehh.. compute uses `nvk_root_descriptor_table`
20:21karolherbst[d]: ohh.. maybe I just do offset of...
20:21karolherbst[d]: that should work
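The "offset of" idea can be sketched as follows. The struct here is a made-up stand-in and does NOT reflect the real `nvk_root_descriptor_table` layout; the field `printf_buffer_addr` and both helper functions are assumptions for illustration only. The point is that `offsetof` gives the byte offset the shader side would need in order to load the pointer from the root descriptor data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical root table; field names and order are invented. */
struct example_root_table {
    uint32_t group_count[3];
    uint32_t pad;
    uint64_t printf_buffer_addr;
};

/* Load a 64-bit value from raw root-table bytes at a byte offset,
 * which is roughly what a lowered shader load would do. */
static uint64_t load_root_u64(const void *root, size_t root_size,
                              size_t offset)
{
    uint64_t v;
    assert(offset + sizeof(v) <= root_size);
    memcpy(&v, (const uint8_t *)root + offset, sizeof(v));
    return v;
}

/* The driver side knows the layout, so it can compute the offset
 * and hand it to whoever emits the load. */
static uint64_t example_printf_buffer_addr(const struct example_root_table *t)
{
    return load_root_u64(t, sizeof(*t),
                         offsetof(struct example_root_table,
                                  printf_buffer_addr));
}
```

The open question in the chat is not the `offsetof` itself but which layer gets to know the layout, since the compiler side is not supposed to see the driver's binding model.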
20:23karolherbst[d]: mhhh
20:23karolherbst[d]: but I need to pass it in 🥲
20:24snowycoder[d]: karolherbst[d]: We (and most other recent drivers) actually use hashes instead of ids, those don't have conflicts but don't work with caches (only useful for internal debugging?).
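A small sketch of the id-versus-hash distinction raised here. FNV-1a is swapped in purely for illustration; this is not a claim about which hash Mesa uses, and `example_format_index` and `example_format_hash` are hypothetical names. Per-shader table indices restart at 0 in every shader, so two shaders writing into one buffer can emit the same id for different strings, while a hash of the string content identifies the format regardless of which shader printed it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-shader id: position of fmt in that shader's own format table.
 * Returns -1 if the string is not in the table. */
static int example_format_index(const char *const *table, size_t count,
                                const char *fmt)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i], fmt) == 0)
            return (int)i;
    }
    return -1;
}

/* Content-derived id: 32-bit FNV-1a over the format string. */
static uint32_t example_format_hash(const char *fmt)
{
    assert(fmt != NULL);
    uint32_t h = 0x811c9dc5u;            /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)fmt; *p; p++) {
        h ^= *p;
        h *= 0x01000193u;                /* FNV prime */
    }
    return h;
}
```

The trade-off mentioned in the chat still applies: whoever decodes the buffer needs a way to map the hash back to the string.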
20:24snowycoder[d]: Also, all the other drivers seem to have a per-device printf buffer
20:25mhenning[d]: Maybe I'm missing something but shouldn't it just be a pointer in the root descriptor?
20:26snowycoder[d]: karolherbst[d]: We should refactor `nvk_root_descriptor_table` in a common header so we can share it with nak
20:26karolherbst[d]: snowycoder[d]: ahh
20:26karolherbst[d]: mhenning[d]: yeah but you also have to know where to pull from the root desc
20:26mhenning[d]: There's no need for printf_buffer_offset just load it like all the root desc stuff
20:27karolherbst[d]: yeah, but how does nak know of the offset
20:27mhenning[d]: the same way that nvk_nir_lower_descriptors does it
20:27karolherbst[d]: ohh...
20:27karolherbst[d]: good idea
20:27mhenning[d]: snowycoder[d]: No, that's deliberately outside of nak
20:27mhenning[d]: stuff related to the binding model goes in nvk, not nak
20:28mhenning[d]: so this might need to be an nvk lowering rather than a nak lowering
20:28snowycoder[d]: If we lower it in nvk we aren't able to use `printf` in the nak lowerings
20:28karolherbst[d]: should just use `lower_sysval_to_root_table` instead
20:29mhenning[d]: snowycoder[d]: oh, true. hmm
20:29karolherbst[d]: ohh nvk_nir_lower_descriptors is so early?
20:29karolherbst[d]: mhh yeah.. it is
20:30mhenning[d]: iirc it's in the middle of nak stuff. So between preprocess and postprocess
20:30karolherbst[d]: though it is late enough for my use case
20:30karolherbst[d]: but yeah...
20:30karolherbst[d]: kinda want the printf stuff to be lowered as late as possible
20:31snowycoder[d]: I lowered printf just before `lower_cf`, it was useful to debug interlock lowering (that's in nak right now)
20:31karolherbst[d]: yeah...
20:31karolherbst[d]: that's kinda where we want it to be
20:31snowycoder[d]: We might miss some optimizations, but that doesn't really matter, it's just for debug
20:34snowycoder[d]: Also, we're already passing offset to nak with `sample_locations_offset` and we'll be doing that again with interlock.
20:35mhenning[d]: Okay, yeah I see why you did that
20:37karolherbst[d]: anyway, let's see if this works...
20:38karolherbst[d]: mhhh
20:39karolherbst[d]: https://gist.githubusercontent.com/karolherbst/a3459c04e3deb0261e6123daf8ebd53c/raw/c66b0f820dc1b3d9b8afec1bc5310422ba20c66e/gistfile1.txt
20:39karolherbst[d]: seems to work
20:40karolherbst[d]: snowycoder[d]: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/df38f9785155f92ad64c9577c8d0190d7d05d6a0
20:40karolherbst[d]: or something like that
20:42snowycoder[d]: Do we want to create a struct for generic (non-fs) offsets to pass around?
20:43snowycoder[d]: I think that the solution is either that or sharing the header.
20:43karolherbst[d]: okay...
20:43karolherbst[d]: soooooo
20:43karolherbst[d]: I have a problem
20:43karolherbst[d]: the shader that's causing issues is like nuking the context and the code doesn't print the printf buffer in that case
20:44mhenning[d]: can you print and then skip the problematic store
20:44karolherbst[d]: I think we should read it out also on context losses
20:44karolherbst[d]: because.. it's atomic stuff
20:44karolherbst[d]: it's probably fine
20:44karolherbst[d]: maybe
20:45mhenning[d]: snowycoder[d]: possibly yeah. I think it shouldn't go in the shader key since it doesn't really depend on that though.
20:46mhenning[d]: karolherbst[d]: are our memory mappings even guaranteed to be valid in that case?
20:46karolherbst[d]: it's host memory
20:46karolherbst[d]: I think...
20:46karolherbst[d]: or normally you do allocate it on the host
20:46karolherbst[d]: yeah...
20:46karolherbst[d]: `NVKMD_MEM_GART | NVKMD_MEM_COHERENT`
20:47mhenning[d]: but does the kernel tear that down on context loss? or does userspace have to dealloc still?
20:47mhenning[d]: not sure how the error handling works tbh
20:47karolherbst[d]: well.. it's still mapped
20:47karolherbst[d]: let me check
20:49karolherbst[d]: uhhhh...
20:49karolherbst[d]: why is deqp
20:52karolherbst[d]: now...
20:52karolherbst[d]: if I knew how to print the buffer..
20:54karolherbst[d]: ohh it's part of vk_check_printf_status
20:54karolherbst[d]: mhhhh
20:55karolherbst[d]: it's empty, so I guess it's not the store that's problematic but the loads
20:55karolherbst[d]: wait.. I print on loads...
20:55karolherbst[d]: maybe it's not flushed out yet 🥲
20:55karolherbst[d]: maybe I just noop the load/stores ...
20:57karolherbst[d]: are you kidding me
20:59karolherbst[d]: the addresses are all aligned
21:00karolherbst[d]: ohh mhh, I should print the addresses of the stores, maybe those are wrong?
21:01snowycoder[d]: I just print everything, that's why I allocated 40KiB of buffer
21:01karolherbst[d]: I allocate 1MiB in rusticl
21:01karolherbst[d]: you never know
21:01karolherbst[d]: like
21:01karolherbst[d]: each thread prints something
21:01karolherbst[d]: that's a lot of stuff to print
21:13karolherbst[d]: man....
21:13karolherbst[d]: what's going on here
21:22karolherbst[d]: mhhhh
21:22airlied: pretty sure most of my alignment problems have been trying to load a vector from a non-vector aligned address
21:23karolherbst[d]: yeah, but like
21:23karolherbst[d]: it's all aligned on paper
21:24karolherbst[d]: like vulkan/spir-v even guarantees it
21:25karolherbst[d]: and the addresses I'm seeing being used are also aligned...
21:33karolherbst[d]: AHHHHHH
21:42karolherbst[d]: I even added in shader detection for unaligned load/stores and... it still faults...
21:52sonicadvance1[d]: Sounds like when I get SIGBUS on aligned addresses :headempty:
21:52karolherbst[d]: something weird is going on and I have no idea what
21:53karolherbst[d]: https://gist.githubusercontent.com/karolherbst/09b4facadabe0bda418da986b161baf0/raw/fff41eee4199ae2221ca965ed133d103c728f104/gistfile1.txt
21:53karolherbst[d]: like wtf...
21:53karolherbst[d]: it still faults
21:55karolherbst[d]: and this one doesn't https://gist.githubusercontent.com/karolherbst/4e4283d4a286d72c91ac7290716fbc96/raw/02e29e4f0924fd5fb80099dedb76915f0a450e6d/gistfile1.txt
21:56karolherbst[d]: uhhh.. what's going on
21:56karolherbst[d]: I'm sure it's something super silly somewhere
23:51karolherbst[d]: mhhhhhh
23:51karolherbst[d]: It makes no sense...
23:55airlied: you sure those 32x2 aren't ending up on 32-bit alignment instead of 64-bit?
23:56airlied: like I see align_mul = 16 and 8, but maybe the base addr or offset? not sure just guessing
23:57airlied: esp. the load global with lea address calc
23:57karolherbst[d]: yes
23:57karolherbst[d]: 100% sure
23:58karolherbst[d]: I dump the values
23:58karolherbst[d]: like
23:58karolherbst[d]: printf like
23:58karolherbst[d]: and each thread has 0x8 aligned addresses
23:58karolherbst[d]: something really really odd is going on