00:11gfxstrand[d]: Thanks. I saw. I'll take a look tomorrow if I don't get lost in the `.ndv` rabbit hole.
00:12gfxstrand[d]: https://tenor.com/view/alice-in-wonderland-wonderland-alice-curiosity-trouble-gif-7492639
00:40airlied[d]: gfxstrand[d]: I split the spearate ds out into an MR, https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35780 if it helps
00:41gfxstrand[d]: Honestly, at this point I'm okay with picking patches out of your branch in whatever order I get to looking at them and MRing them myself if you don't want to bother.
00:42gfxstrand[d]: I mean, feel free, but I'm looking at stuff in the branch anyway so I can play with and tweak it and actually run tests on my experiments.
00:42airlied[d]: I think that is the only real standalone piece left, everything else is texture fun 🙂
01:22karolherbst[d]: uhhh.. the phi situation is annoying... I think I'll have to figure out what optimization is triggering only on scalar phis, because apparently it's the relevant difference here...
11:06karolherbst[d]: maybe it makes sense to run shader-db regularly to find regressions: https://gist.github.com/karolherbst/b006948fcd944995d1cbbc3a03e65dd7
11:18mohamexiety[d]: KHR_maint9 allows 2d view of 3D to apply to sparse too. there’s something I don’t understand though, so for us, if we’re making a 2D view we set the Z tiling to be 0.. does this break sparse? I am thinking that no it doesn’t because well why would you have 3D tiling for a 2D view but I am genuinely not sure. to be clear the tiling layout doesn’t change it’s just the 3D part gets removed
11:32karolherbst[d]: the 3D image doesn't have z tiling already, no?
11:34mohamexiety[d]: well that's the thing I am not sure about
11:35karolherbst[d]: there is VK_EXT_image_2d_view_of_3d and VK_IMAGE_CREATE_2D_VIEW_COMPATIBLE_BIT_EXT required for the non sparse path. I am not sure what changes with sparse, but that flag causes NIL_IMAGE_USAGE_2D_VIEW_BIT to be set
11:38karolherbst[d]: "Add a property to enable sparse support with VK_EXT_image_2d_view_of_3d" well..
11:38karolherbst[d]: it _sounds_ like it should be fine then
11:39karolherbst[d]: as like the 3D image itself is already only 2D tiled
11:39karolherbst[d]: this 2d view of 3d is such a disaster in GL, because at image creation time you don't know that the application wants to do that and.... 🥲
11:40mohamexiety[d]: from what I can tell it was mainly made for GL emulation
11:41karolherbst[d]: zink always sets that flag
11:41karolherbst[d]: which is one solution to this problem
11:42karolherbst[d]: I think nvidia detiled the image at runtime once it detected this use pattern
11:42karolherbst[d]: the nouveau gl driver just emits an if/else 2d/3d read thing 🙃
11:42karolherbst[d]: anyway, it's a disaster in GL
11:45karolherbst[d]: so the good thing about VK_EXT_image_2d_view_of_3d is, that you know at 3d image creation time that 2D image views will be created, so you just disable any z tiling and are done
11:46mohamexiety[d]: oh huh
11:46mohamexiety[d]: if we know ahead of time then there's no issues nor tiling differences
11:46mohamexiety[d]: I didn't know that, derp
11:46karolherbst[d]: yeah, that's why I pointed that one 😄
11:47karolherbst[d]: so yeah, I think it should just work unless sparse is weird
11:50mohamexiety[d]: yeah that part was unclear, thanks!
11:51karolherbst[d]: I'm curious if the way zink does things here is causing performance problems elsewhere...
11:51karolherbst[d]: because like.. it's gonna get hit with every 3d image created
11:51karolherbst[d]: regardless of 2d image views ever being created
11:52karolherbst[d]: but probably also doesn't matter much
11:52karolherbst[d]: but optimizing for the common case sounds like a real pita here
11:55pac85[d]: I don't think 3d images are used that often by applications are they?
11:56karolherbst[d]: yeah.... not saying it matters in avarage, but applications that do might be hit by a perf penalty, but who knows
11:56karolherbst[d]: I don't even know what besides the CTS is hitting this case...
11:56karolherbst[d]: but 3D images are more common than creating 2D views of them 😄
11:58pac85[d]: Perhaps the 2d view thing could be emulated by carefully sampling things
11:58pac85[d]: Dunno
11:59karolherbst[d]: that's gonna be annoying from a pipeline pov
11:59karolherbst[d]: like the same shader can be used on 2D images
12:00pac85[d]: I see
12:00karolherbst[d]: the main use case here is that applications won't have to write shaders with a uniform depth coord
12:00pac85[d]: Anyeya perhaps some game shipped with that voxel cone GI thing that was popular some years afo
12:01karolherbst[d]: just feels weird to hurt the common path for a niche feature nobody? is even using besides the GL CTS
12:02karolherbst[d]: but for sanity reasons it's the easiest way to make the CTS path
12:49gfxstrand[d]: mohamexiety[d]: Yeah, it breaks sparse, at least for now.
12:50mohamexiety[d]: according to Karol it looks fine though?
12:50mohamexiety[d]: since we dont use z tiling with that bit :thonk:
12:51gfxstrand[d]: But sparse requires Z tiling
12:51gfxstrand[d]: We're just not asserting it right now.
12:51mohamexiety[d]: ugh
12:52mohamexiety[d]: do I just report the property as unsupported or is there anything I could do to work around?
12:52zmike[d]: Just leave it false for now
12:52gfxstrand[d]: Initially, yeah, that's the simple answer.
12:53zmike[d]: Afaik this is just cts
12:53gfxstrand[d]: We might be able to make it work. There's a Z offset thing. I just haven't figured it out yet.
12:53zmike[d]: Nothing uses GL sparse in the wild
12:53zmike[d]: Except for some demo apps I've seen
12:54zmike[d]: If you don't set the property then zink will just explode on the CTS cases the same as it does now because there's no way to detect this support
12:54mohamexiety[d]: ohhh
12:54zmike[d]: And if you do set it then there is no functional difference for how zink will use the images anyway
12:54zmike[d]: It's just a fix for validation
12:55mohamexiety[d]: yeah that makes a lot of sense. thanks!
12:56zmike[d]: Half the stuff you see in glemulation is just cutting convenient holes in validation for stuff drivers already do
12:56zmike[d]: Or don't do, as the case may be
13:41karolherbst[d]: gfxstrand[d]: in the class headers?
13:42karolherbst[d]: ehh.. tex hader more likely
13:42gfxstrand[d]: yeah
13:42karolherbst[d]: `TEXHEAD_BL_GOB_DEPTH_OFFSET` mhh
13:43karolherbst[d]: yeah it's kinda weird it's just for depth as if it's intended to solve this issue...
13:43karolherbst[d]: though just one bit?
13:43karolherbst[d]: ehh 2
13:44gfxstrand[d]: Yeah, so it's restricted to `z_log2 <= 2`, which I think is fine
13:44karolherbst[d]: mhhh
13:49karolherbst[d]: I wonder if I should upstream my idea of `NIR_DEBUG=dce` which runs dce after every pass...
14:29mohamexiety[d]: where is vertex io stuff in our compiler?
14:30mohamexiety[d]: `nak_nir_lower_vtg_io`?
14:38mohamexiety[d]: also bitfield ops, if they're in one place
15:15gfxstrand[d]: Okay, I think I got .ndv sorted. I'm gonna CTS this: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35795
15:15gfxstrand[d]: mohamexiety[d]: Bitfield ops aren't in one place
15:16gfxstrand[d]: mohamexiety[d]: Yes, more or less.
15:20gfxstrand[d]: gfxstrand[d]: And I need to go to the office so I can plug in ALL THE GPUs and make sure I plumbed the right bits through. There's a bunch of ops where we didn't mark where NDV goes and IDK if they don't have it or if we just failed at RE.
15:21gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1388177452211114136/9ypfgu.png?ex=6860087f&is=685eb6ff&hm=cf903c4eb5bd25e4fadbf2d4f22cc6539b7144d2d24e4f706411e06deb34c3a5&
15:22gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1388177568708038758/9ypfj5.png?ex=6860089b&is=685eb71b&hm=2bb3b2ac31b34c4fc48cffce38a9ee2a345167eb562223baf782c386de70ee84&
15:23gfxstrand[d]: I guess I could probably do that with DRM shim....
15:23gfxstrand[d]: 🤔
15:51mohamexiety[d]: gfxstrand[d]: so if I want to add support for `bitfield_extract8/16`, `bitfield_reverse8/16/64`, where should I start? I notice there's a table for `nir_options` in api.rs
15:52mohamexiety[d]: is it as simple as adding them there and then we get them via the common nir impl? or is there more to do in our side of things?
16:30gfxstrand[d]: for bitfield_reverse64, there should be a new nir_lower_int64 flag. We set those in nak/api.rs
16:30gfxstrand[d]: For the 16-bit ones, it's probably lower_bit_sizes and that's controlled in nak_nir.c
16:30mohamexiety[d]: thanks!
16:31mohamexiety[d]: gfxstrand[d]: how do these flags work btw? like are they some common impl that all drivers call into?
16:32gfxstrand[d]: Yeah.
16:33gfxstrand[d]: set `nir_lower_bitfield_extract64 | nir_lower_bitfield_reverse64` and the pass we call in `nak_nir.c` will do the right thing
16:33mohamexiety[d]: there isnt an extract64 interestingly but yeah
16:34gfxstrand[d]: And it's not like 64-bit bitfield reverse is a problem for us. It's just two 32-bit bitfield reverses and flipping which is the top vs. bottom.
16:34gfxstrand[d]: mohamexiety[d]: Yeah but the lowering got added for something else so you might as well turn it on
16:35gfxstrand[d]: 8 and 16-bit are a bit more work because that's a callback not flags
17:11gfxstrand[d]: Okay, let's try Blackwell on vulkan-cts-1.4.3 and see how it goes
17:24gfxstrand[d]: airlied[d]: Do we know what's up with those instruction dependency hacks?
17:25gfxstrand[d]: Oh, right. It's just that they moved stuff around
17:33gfxstrand[d]: Really? `.kind_bpp == 3` for z/s? *sigh*
17:50mohamexiety[d]: gfxstrand[d]: but what to do there? I am a bit confused
17:51mohamexiety[d]: ```c
17:51mohamexiety[d]: if (bit_size >= 32)
17:51mohamexiety[d]: return 0;
17:51mohamexiety[d]: if (bit_size & (8 | 16))
17:51mohamexiety[d]: return 32;
17:51mohamexiety[d]: there's these which I guess I should remove, but.. is that it?
18:03gfxstrand[d]: You probably just need to add cases to the switch, not adjust anything there
18:04gfxstrand[d]: Or maybe the default case is good enough?
18:05mohamexiety[d]: default does nothing, it's just break :thonk:
18:05mohamexiety[d]: but like I am confused, what does adding cases there do/lead to?
18:06mhenning[d]: Adding cases there causes more lowering
18:07mhenning[d]: returning 0 requests no lowering. Returning a different value requests lowering to that bit size
18:08mohamexiety[d]: hm, so in this case I guess we want something like this for reverse8/16?
18:08mohamexiety[d]: ```c
18:08mohamexiety[d]: if ((bit_size == 16 || bit_size == 8) && nak->sm >= 70)
18:08mohamexiety[d]: return 0;
18:08mohamexiety[d]: break;
18:08mohamexiety[d]: given volta+ should have native int8/16
18:09mohamexiety[d]: or do we need to treat them as 32?
18:10mhenning[d]: yeah, if you're adding native support for 8/16-bit reverse then you'd want to do something along those lines
18:11mohamexiety[d]: what else would adding support involve other than this?
18:14mhenning[d]: The new bit sizes need to be handled in nak/from_nir.rs and the any new instruction flags need to be encoded in sm70_encode
18:15mhenning[d]: wait, sorry are you trying to add native versions of those or just get the common lowering?
18:15mhenning[d]: A glance at https://kuterdinel.com/nv_isa/BREV.html suggests we don't have a native version
18:15mohamexiety[d]: oh huh
18:15mohamexiety[d]: interesting, thought we would have that. so I guess the only option is the common lowering then
18:16mohamexiety[d]: (sorry btw, this is like my first time actually delving in compiler stuff beyond simple texturing things)
18:17mohamexiety[d]: for the common lowering I guess we would just add them in there and request 32
18:20mhenning[d]: I think we already use the common lowering for those. `bitfield_reverse` is an alu op so it hits the common code:
18:20mhenning[d]: if (bit_size & (8 | 16))
18:20mhenning[d]: return 32;
18:21mhenning[d]: so we should already lower it fine.
18:25mohamexiety[d]: ahh thought it needed explicit listing in the case. nice
18:34mohamexiety[d]: now the only thing left is this:
18:34mohamexiety[d]: > Add a property to indicate the implementation will return (0,0,0,0) or (0,0,0,1) to vertex shaders that read unassigned attributes.
18:34mohamexiety[d]: where do we do vertex attribute read?
18:35mohamexiety[d]: looking in `nak_nir_lower_vtg_io.c` there's not really much there
18:36mohamexiety[d]: or.. nah it's there, derp
18:39mohamexiety[d]: unassigned attributes is this case, right?
18:39mohamexiety[d]: ```c
18:39mohamexiety[d]: if (vtx == NULL)
18:39mohamexiety[d]: vtx = nir_imm_int(b, 0);
19:01mhenning[d]: Uh, that might mean unbound vertex attribute buffers? in which case it's probably a hardware thing
19:23mohamexiety[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35804 definitely needs a sanity check for the compiler bits but yeah gfxstrand[d]
20:05airlied[d]: gfxstrand[d]: there is a whole new set of magic in the disasm of those fields that we used to put .yld in
20:09mhenning[d]: airlied[d]: when I looked at the way the disassembler handles this, I wasn't sure that the field meaning actually changed - it's possible that nvdisasm is just printing them now where it previously didn't
20:11airlied[d]: indeed yes it prints them for 120 where it hasn't before
20:12gfxstrand[d]: airlied[d]: Does interleaved z24s8 not work? Or did you just not want to bother?
20:13airlied[d]: I went with not bother, it didn't work straight away so I moved on, maybe worth revisiting later if anyone cares
20:13gfxstrand[d]: Okay
20:14airlied[d]: since we have to keep a temp stencil around anyways, it felt nicer to just use the memory for good
20:15gfxstrand[d]: We don't need temp stencil for z24, just z32
20:16airlied[d]: ah indeed, anyways I was just being lazy 🙂
20:23airlied[d]: in theory you should just drop the bit and fixup the kind_bpp to the appropriate value (not sure if that is 32bpp)
20:25karolherbst[d]: mhenning[d]: ever ran unigine heaven on nvc0? Ambient occlusion is broken and I kinda thought it worked in the past.. but it's also broken on 25.0....
20:26mhenning[d]: I can't remember running heaven, no
20:27karolherbst[d]: mhhh.. maybe I should check an older GPU first...
20:28karolherbst[d]: anyway.. I wrote patches for the util_framebuffer_init stuff
20:39karolherbst[d]: okay.. at least it's broken all the way down to kepler
20:39karolherbst[d]: which means it's probably also an issue on fermi 😢
20:39airlied[d]: btw if anyone is interested in an nvc0 hacky workaround https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35221
20:39airlied[d]: we are already shipping it in RHEL 8
20:42karolherbst[d]: I arleady reviewed that one
20:42karolherbst[d]: so just assign to marge or so?
20:43marysaka[d]: mohamexiety[d]: actually the default is 0,0,0,1 no?
20:44marysaka[d]: I think NV blob remove AST with 0,0,0,1 from what I recall
20:47mohamexiety[d]: hm
20:47mohamexiety[d]: I did find upstream tests for this so I'll check it out
20:48marysaka[d]: might be wrong then :aki_thonk:
20:49mohamexiety[d]: no no I didn't test it yet
20:49mohamexiety[d]: I went with 0 0 0 0 as I read something as 0 but might have misunderstood
20:52gfxstrand[d]: mohamexiety[d]: Not really. It needs a reverse and then a prmt because reverse will move the bottom 8 to the top 8
20:52gfxstrand[d]: Which would actually probably be better done in NAK than in NIR
20:52gfxstrand[d]: Because we can just do a `prmt`
20:53marysaka[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1388261011567411251/image.png?ex=68605651&is=685f04d1&hm=049831cd1bd310742a07d7e8f1d8c5bc8142d6387f37832f8ffd8c58d8ed081f&
20:53marysaka[d]: mohamexiety[d]:
20:53gfxstrand[d]: mohamexiety[d]: Yeah, that's a HW thing. I'm not actually sure what nvidia does
20:53marysaka[d]: only store is o_color.z = 0 here
20:54gfxstrand[d]: Oh, wait... It's the same as for null descriptors or OOB
20:54gfxstrand[d]: So I think 0, 0, 0, 0
20:55gfxstrand[d]: There really need to be tests for this
20:55mohamexiety[d]: there is, just not in a release yet
20:55mohamexiety[d]: will pull upstream cts tomorrow and try
20:56gfxstrand[d]: You may need to pull CTS from garrit
20:56mohamexiety[d]: > dEQP-VK.pipeline.*.vertex_input.misc.unbound_input*
20:56mohamexiety[d]: I dont get why it's not prefixed with `maint9.*` or so but whatever
21:06airlied[d]: karolherbst[d]: wow I missed it, thanks!
21:15gfxstrand[d]: Okay, I think I have the Blackwell copy thing figured out more for real.
21:17airlied[d]: why it needs the bpp?
21:17gfxstrand[d]: More specifically what information it's communicating
21:17gfxstrand[d]: It's not a bpp
21:17gfxstrand[d]: It's what NIL calls `GOBType`
21:17airlied[d]: well it's the layout the 3D engine uses for that format behind the scenes
21:17gfxstrand[d]: Which it can no longer determine from the PTE kind
21:17airlied[d]: just no way to infer it in the copy engine
21:18gfxstrand[d]: Yup
21:18gfxstrand[d]: But now it's all properly modeled in NIL
21:18airlied[d]: nice!
21:18gfxstrand[d]: And the copy code is just a `nil_to_nvc9b6_gob_type()`
21:19gfxstrand[d]: Also my GOB dumper I wrote in crucible can dump anything, including Z/S 8-bit and Z/S GOBs
21:19gfxstrand[d]: So if someone feels like implementing Z/S host image copy, it should be pretty straightforward
21:19gfxstrand[d]: Not that I actually recommend anyone do that...
21:20gfxstrand[d]: But yeah, doing a CTS run now. I'll marge it tonight if CTS is happy
21:20gfxstrand[d]: Then I need to read the actual Z/S patch
21:20gfxstrand[d]: But that's a problem for Monday
21:26gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35807
21:27gfxstrand[d]: I might wait to merge until I've read the actual Z/S patch. Seems like a big "depth/stencil on Blackwell" MR makes more sense than just copies. But whatever...
21:27gfxstrand[d]: I'm shuffling patches
21:28mohamexiety[d]: ahh nice find!
21:36gfxstrand[d]: I think once I've reviewed the Z/S patch and figured out something less hacky for `.yld`, I can do an actual for real conformance run. :transtada128x128:
21:37gfxstrand[d]: I wonder if 32-bit builds on Fedora work again. 🤔
21:38mohamexiety[d]: nicee. that'd be really cool.. pretty quick bringup to conformance tbh given the scope of the changes
21:39airlied[d]: looks like llvm-devel is paralell installable again
21:41airlied[d]: mohamexiety[d]: don't jinx it, the maximal reconvergence fix might take 6 months 😛
21:41mohamexiety[d]: :nervous:
21:41gfxstrand[d]: The maximal reconvergence test doesn't fail for me
21:41mohamexiety[d]:phew
21:41gfxstrand[d]: Maybe airlied[d] just needs a bigger GPU. 😛
21:42mohamexiety[d]: 😝
21:42airlied[d]: the 5080 eats enough power 😛
21:42gfxstrand[d]: But you could be eating 2x that much!
21:42gfxstrand[d]: *nom nom*
21:43mohamexiety[d]: it should be winter in australia now, right? it doubles up as heating! :KEKW:
21:43airlied[d]: I'm more worried about leaving overnight CTS runs and ending up with no house 😛
21:43mohamexiety[d]: yeah :bleakekw:
21:43gfxstrand[d]: The Collabora office hasn't burned down yet. <a:shrug_anim:1096500513106841673>
21:44airlied[d]: I just need to get freon suppression for my office
21:45gfxstrand[d]: When I'm in on Monday, I need to re-route some cables in my box, though. They're concerningly close to the fans and that thing really doesn't leave much extra room in the case for, well, anything.
21:46airlied[d]: we also got offered a PCIE server only blackwell, needs external cooling, I could just run that open on the desk 😛
21:46karolherbst[d]: that moment you try to bisect something and you notice it randomly uses your system GL impl for whatever reason 🙃
21:51gfxstrand[d]: Unexpected results:
21:51gfxstrand[d]: dEQP-VK.api.driver_properties.conformance_version,Fail
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.compute_comp_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.graphics_comp_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.graphics_frag_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_binding_buffer.multiple.graphics_vert_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.compute_comp_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.graphics_comp_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.graphics_frag_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.sparse_residency_buffer.multiple.graphics_vert_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.compute_comp_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.graphics_comp_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.graphics_frag_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.binding_model.descriptor_buffer.traditional_buffer.multiple.graphics_vert_buffers32_sets1,Timeout
21:51gfxstrand[d]: dEQP-VK.info.device_extensions,Fail
21:52airlied[d]: nice!
21:53karolherbst[d]: okay! ambient occlusion works on 24.2..
21:55gfxstrand[d]: Also, ugh... the IO offset patch. I was sure I'd already fixed that. :blobcatnotlikethis:
22:00airlied[d]: did I get is_signed backwards there?
22:00airlied[d]: yup I think is_signed should be is_unsigned 🙂
22:13gfxstrand[d]: Oops