01:42 esdrastarsis[d]: chikuwad[d]: Rebase and fix `VK_EXT_device_address_binding_report` and `VK_NV_shader_atomic_float16_vector` :happy_gears:
04:06 karolherbst[d]: `@P0 SULD.P.1D.EF.INVALID0.IGN P0, R0, [R0], UR95, UR31 ;` mhh...
04:09 karolherbst[d]: okay, so UGPRs are 8 bits on SM100+, but they were for sure only 6 bits previously, because some instructions use the two bits for other things...
04:10 karolherbst[d]: now that makes everything kinda ugly 🙂
04:12 mhenning[d]: yeah, they extended the fields even though there aren't enough ugprs to use all the new bits
04:18 karolherbst[d]: well even if they had one ugpr more it does cause conflict in the encoding
04:18 karolherbst[d]: the madness is pretty visible with SULD/SUST and SUATOM
04:19 karolherbst[d]: 40..46 is the UGPR bindless handle, 46..54 is an immediate. They kept the 40..48 in SM100 but ditched the immediate. In SM120 another UGPR emerged at 48..54
04:21 karolherbst[d]: I think the bindless handle is still the UGPR at 40 and the immediate got reintroduced as the new UGPR source at 48..
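The field ranges quoted above can be checked for collisions with a few lines of Rust. This is only a sketch: the ranges are copied from the messages, treated as half-open bit ranges (an assumption about the notation), and `overlaps` is a hypothetical helper, not something from Mesa.

```rust
use std::ops::Range;

/// True if two half-open bit ranges share at least one bit.
fn overlaps(a: &Range<u32>, b: &Range<u32>) -> bool {
    a.start < b.end && b.start < a.end
}

fn main() {
    // Ranges as quoted in the chat; the names are illustrative only.
    let handle_6bit = 40..46; // UGPR bindless handle before SM100
    let immediate = 46..54; // immediate before SM100
    let handle_8bit = 40..48; // UGPR bindless handle on SM100+
    let second_ugpr = 48..54; // extra UGPR source seen on SM120

    // The 6-bit handle and the immediate sit side by side.
    println!("6-bit handle vs imm: {}", overlaps(&handle_6bit, &immediate));
    // Widening the handle to 8 bits runs into the old immediate.
    println!("8-bit handle vs imm: {}", overlaps(&handle_8bit, &immediate));
    // The SM120 field starts right after the widened handle.
    println!("8-bit handle vs new UGPR: {}", overlaps(&handle_8bit, &second_ugpr));
}
```

Under that reading, widening the handle is what forces the immediate out of the encoding, and the SM120 field fits into the space that is left.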
04:21 karolherbst[d]: don't have hardware to test it...
04:21 karolherbst[d]: so I guess I'll need to update my MR and ask somebody to run the CTS on it
04:21 karolherbst[d]: I know that fossils for sure end up using the UGPR bindless handle at least
04:34 mhenning[d]: I have an sm120 plugged in. I can give it a go
04:37 mhenning[d]: I assume it's https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384 ?
04:38 mhenning[d]: That branch falls over right away under cts here
04:38 mhenning[d]: ERROR - Test dEQP-VK.pipeline.monolithic.sampler.border_swizzle.r32g32_uint.rgia.transparent_black.gather_0.with_swizzle_hint: Fail: See "results/2026.02.08.nak_all.0/c11.r1.log"
04:39 mhenning[d]: along with
04:39 mhenning[d]: ERROR - dEQP error: thread '<unnamed>' (82750) panicked at ../src/nouveau/rust/bitview/lib.rs:313:1:
04:39 mhenning[d]: ERROR - dEQP error: assertion failed: (val & u64_mask_for_bits(bits)) == val
04:40 karolherbst[d]: yeah...
04:40 karolherbst[d]: that's why I'm looking into it...
04:40 karolherbst[d]: ran the nvdisasm things
04:41 karolherbst[d]: but RZ is R255 on SM100+ 🙂
04:41 karolherbst[d]: uhm.. URZ is UR255
04:41 karolherbst[d]: so hence the assert
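A minimal sketch of why the assert fires, assuming the panic comes from writing register index 255 into a field that is still 6 bits wide. `u64_mask_for_bits` here is a stand-in written for this example, named after the helper in the panic message; `fits_in_field` and `set_field` are hypothetical and not the bitview crate's API.

```rust
/// Mask with the low `bits` bits set (stand-in for the helper in the panic message).
fn u64_mask_for_bits(bits: u32) -> u64 {
    assert!(bits > 0 && bits <= 64);
    u64::MAX >> (64 - bits)
}

/// True if `val` can be stored in a field that is `bits` wide.
fn fits_in_field(val: u64, bits: u32) -> bool {
    (val & u64_mask_for_bits(bits)) == val
}

/// Write `val` into bits `lo..hi` of a 128-bit instruction word,
/// panicking if the value does not fit (same check as the assert above).
fn set_field(inst: &mut u128, lo: u32, hi: u32, val: u64) {
    assert!(lo < hi && hi <= 128);
    let bits = hi - lo;
    assert!(fits_in_field(val, bits));
    let mask = (u64_mask_for_bits(bits) as u128) << lo;
    *inst = (*inst & !mask) | ((val as u128) << lo);
}

fn main() {
    const URZ_SM100: u64 = 255; // "URZ is UR255" on SM100+, per the chat

    // A 6-bit field cannot hold 255, an 8-bit field can.
    println!("fits in 6 bits: {}", fits_in_field(URZ_SM100, 6));
    println!("fits in 8 bits: {}", fits_in_field(URZ_SM100, 8));

    // With the widened 40..48 field the write goes through.
    let mut inst = 0u128;
    set_field(&mut inst, 40, 48, URZ_SM100);
    println!("{:#034x}", inst);
}
```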
04:46 karolherbst[d]: mhenning[d]: should be fixed now. At least nvdisasm_tests doesn't scream at me anymore
04:46 karolherbst[d]: _but_ I'm fairly certain that the UGPR source at 48 is something else, because it also exists in the non uniform encoding...
04:47 karolherbst[d]: or does it...
04:47 karolherbst[d]: uhm...
04:47 karolherbst[d]: there was another encoding
04:47 karolherbst[d]: 79b and 79d mhh let's see..
04:49 karolherbst[d]: looks like it's gone with SM100
04:52 mhenning[d]: oh, I have a table of the encoding numbers if it helps
04:53 karolherbst[d]: probably yes
04:53 mhenning[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1470281449574760682/table.tar.xz?ex=698ab9ce&is=6989684e&hm=f1eac0e79b8a7734fd760b6bdfe59f8c13ac7af75f140023b6a67c1458b88f48&
04:53 mhenning[d]: I keep meaning to put it on the web somewhere and keep putting it off
04:54 karolherbst[d]: well
04:54 karolherbst[d]: 0x79b does exist on ampere
04:54 karolherbst[d]: despite your table saying no
04:54 karolherbst[d]: or is it hw verified?
04:54 mhenning[d]: It's just based on nvdisasm error codes
04:54 karolherbst[d]: `0000079b 00000000 00000000 0001c200` disassembles to `@P0 SUST.P.1D.EF.INVALID0.IGN [R0], R0, 0x0, 0x0, 0x0 ;`
04:54 karolherbst[d]: on SM86
04:55 karolherbst[d]: the issue is
04:55 karolherbst[d]: sometimes it matters what the sched stuff is
04:55 karolherbst[d]: `0000079b 00000000 00000000 00000000` disassembles to nothing
04:56 mhenning[d]: Bit 90 and 91 sometimes matter, and those are labeled in the table
04:56 karolherbst[d]: yeah.. 91 is the "UGPR" flag
04:56 mhenning[d]: I haven't seen other bits matter like that before
04:56 mhenning[d]: karolherbst[d]: more or less, yeah
04:56 karolherbst[d]: but yeah.. some opcodes have restrictions on the sched part
04:57 karolherbst[d]: and nvdisasm just fails to disassemble if it's incorrect
04:57 karolherbst[d]: not sure what exactly is doing that, but I usually see it with memory ops
04:57 mhenning[d]: hmm not sure
04:58 karolherbst[d]: anyway.. `0000079b 00000000 00000000 0001c200` also doesn't disassemble on SM100, so... no idea 🙂
04:58 mhenning[d]: New branch also falls over with
04:58 mhenning[d]: ERROR - Test dEQP-VK.compute.shader_object_spirv.workgroup_memory_explicit_layout.alias.u16_array_to_i64_default_func_read_barrier: Crash: See "results/2026.02.08.nak_all.1/c2.r2.log"
04:58 mhenning[d]: ERROR - dEQP error: MESA: warning: ../src/nouveau/vulkan/nvkmd/nouveau/nvkmd_nouveau_ctx.c:153: DRM_NOUVEAU_EXEC failed: No such device (VK_ERROR_DEVICE_LOST)
04:59 karolherbst[d]: mhhh
04:59 karolherbst[d]: yeah, will have to run the CTS in full on my GPU again after I made a bunch of changes
05:03 karolherbst[d]: anyway.. that's a problem for tomorrow me
05:16 karolherbst[d]: ohhhhhh... uhh
05:19 karolherbst[d]: okay.. it's not failing because of the UGPR sust thing 😄
05:19 karolherbst[d]: mhenning[d]: is it better on cd597d9d797bbf99b150e88f2e97614d0823df1b?
05:21 karolherbst[d]: also.. why is this shader using 24 gprs and not like ... 16
05:36 karolherbst[d]: ahh yeah uhm... not using `NVK_DEBUG=trash_memory` masked the issue for me 🥲
05:44 karolherbst[d]: okay.. something up with suld, but sust with UGPR seems to work alright
05:49 karolherbst[d]: I wonder if something silly is going on like the handle is 32 bit, but the register needs to be 64 bit aligned...
05:59 karolherbst[d]: I don't believe this...
05:59 karolherbst[d]: it's really that
06:00 karolherbst[d]: ....
06:00 karolherbst[d]: yikes
06:05 karolherbst[d]: seems like the upper bits can be undef..
06:06 karolherbst[d]: but like RA sucks..
06:07 karolherbst[d]: I know that the UGPRs of tex instructions are 64 bits..
06:07 karolherbst[d]: and they take the sampler ptr out of the high bits
06:07 karolherbst[d]: I wonder if that's somehow related
06:09 karolherbst[d]: anyway, mystery solved or something...
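The alignment constraint discussed above could be expressed roughly like this. It is a sketch under two assumptions: each UGPR is 32 bits wide, and "64 bit aligned" means the register index has to be even. Both function names are hypothetical and unrelated to NAK's register allocator.

```rust
/// True if `reg` could start a 64-bit (two-register) aligned pair.
fn is_pair_aligned(reg: u32) -> bool {
    reg % 2 == 0
}

/// Round a register index up to the next even index.
fn align_up_to_pair(reg: u32) -> u32 {
    (reg + 1) & !1
}

fn main() {
    for reg in [4u32, 5] {
        println!(
            "ur{}: aligned = {}, next aligned index = ur{}",
            reg,
            is_pair_aligned(reg),
            align_up_to_pair(reg)
        );
    }
}
```

If the upper half really can be left undefined, the cost is not the extra data but the extra constraint on where the 32-bit handle may live, which is what makes this awkward for register allocation.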
08:16 chikuwad[d]: esdrastarsis[d]: yes I am aware of my unmerged MRs
11:39 mohamexiety[d]: karolherbst[d]: did you do fp8 support as part of your compute improvements? I don't remember
13:24 karolherbst[d]: nope
13:29 karolherbst[d]: mohamexiety[d]: do we need it for anything serious tho? 😄
13:30 karolherbst[d]: not sure anything in the ISA supports it besides `F2FP`
13:31 mohamexiety[d]: ada and blackwell should have wide support. anyways was thinking about what could make the newer dlss versions crash/get a black screen and one of the things is that the newer presets use fp8 now on ada and blackwell
13:32 mohamexiety[d]: but after thinking about it more it's probably not that since we just run the cuda binaries as is
13:32 karolherbst[d]: mohamexiety[d]: yeah.. but I think nvidia just upconverts it in the shader
13:32 karolherbst[d]: crashes are probably some weird runtime bits
13:35 karolherbst[d]: okay, it's tomorrow me, and how do I deal with the UGPR alignment stuff without making shader stats worse lol
13:48 karolherbst[d]: okay doing an undef ain't the solution either, because allocation of vectors is quite sub-optimal 😢
13:48 karolherbst[d]: maybe I leave out surface instructions for now..
17:49 mhenning[d]: karolherbst[d]: Yes, that one passes at least the first two minutes
17:49 karolherbst[d]: mhenning[d]: yeah but there are more issues...
17:49 karolherbst[d]: see my messages following
17:50 karolherbst[d]: I've CTSed the MR as it is and there are a handful of regressions but nothing substantial
17:50 karolherbst[d]: also
17:50 karolherbst[d]: I've disabled UGPR for surface ops because it's a massive PITA
17:50 karolherbst[d]: though I do wonder if it's less broken on Blackwell...
17:51 karolherbst[d]: it would be kinda funny if Hopper/Blackwell doesn't use 64 bit aligned UGPRs for the bindless handle...
17:52 karolherbst[d]: I tried to legalize the 32 bit handle to 64 bit with undef and such, but the shaders just became way worse than just doing ur2r
17:52 mhenning[d]: yeah, for something like that we probably want to add a new type of register constraint in RA
17:52 karolherbst[d]: yeah..
17:53 karolherbst[d]: but that's kinda out of scope of the MR, so I just legalized it to GPR anyway and left TODOs
17:53 karolherbst[d]: but the encoding seems to be doing what I expected it to do
17:53 mhenning[d]: sure, makes sense
17:54 karolherbst[d]: the other issue with my MR was that I got a UGPR shared handle with a stride applied and legalized `[ur4.x8+urz]` to `[rz.x8 + ur4]` so that's also extra fun
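A small model of why that legalization is a problem, assuming `[a.x8 + b]` means `a * 8 + b`. The types and both legalization functions are hypothetical illustrations, not NAK code; the rule that the strided slot must hold a GPR is also an assumption made for the sketch.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Reg {
    Gpr(u64),  // value held in a per-thread register
    Ugpr(u64), // value held in a uniform register
}

impl Reg {
    fn value(self) -> u64 {
        match self {
            Reg::Gpr(v) | Reg::Ugpr(v) => v,
        }
    }
}

/// Address of the form `[scaled.x<stride> + base]`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Addr {
    scaled: Reg,
    stride: u64,
    base: Reg,
}

impl Addr {
    fn eval(self) -> u64 {
        self.scaled.value() * self.stride + self.base.value()
    }
}

/// Swaps the operands so the strided slot holds a GPR. The stride now
/// applies to the wrong value, as in `[ur4.x8+urz]` -> `[rz.x8 + ur4]`.
fn swap_operands(a: Addr) -> Addr {
    Addr { scaled: Reg::Gpr(a.base.value()), stride: a.stride, base: a.scaled }
}

/// Copies the uniform value into a GPR but keeps it in the strided slot.
fn copy_to_gpr(a: Addr) -> Addr {
    Addr { scaled: Reg::Gpr(a.scaled.value()), ..a }
}

fn main() {
    // ur4 = 3 and urz = 0 are made-up values for the demonstration.
    let addr = Addr { scaled: Reg::Ugpr(3), stride: 8, base: Reg::Ugpr(0) };
    println!("original:     {}", addr.eval());
    println!("swapped:      {}", swap_operands(addr).eval());
    println!("copied to GPR: {}", copy_to_gpr(addr).eval());
}
```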
17:56 karolherbst[d]: legalizing to GPR leads to a better shader than keeping the UGPR, but I suspect we want to move the decision of whether something is a UGPR or GPR way earlier, because dealing with it that late ain't fun...
17:58 karolherbst[d]: but also not quite sure why that value didn't get promoted to the uniform_addr slot on the nir side already...
17:58 karolherbst[d]: ohh wait, because I removed it from my MR.. right...