02:17mhenning[d]: sigh. if SSARef sometimes contains a Box then it can't be Copy, which means Src and Dst can't be Copy, which means sprinkling .clone() just about everywhere...
02:37gfxstrand[d]: Ugh...
02:38gfxstrand[d]: *grumble*
02:38gfxstrand[d]: Maybe this is a good excuse to be better about our movement semantics various places?
02:39gfxstrand[d]: But yeah, copies happen probably more often than they should.
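For readers following the Copy/Box point above, here is a minimal sketch of why a single boxed variant forces explicit clones. The types `InlineRef`, `BoxedRef` and `Src` and the helper `num_comps` are hypothetical stand-ins made up for illustration; they are not NAK's actual definitions.

```rust
// Hypothetical stand-ins for illustration; NOT NAK's actual definitions.

/// Plain data: every field is Copy, so the struct can derive Copy too.
#[derive(Clone, Copy, Debug, PartialEq)]
struct InlineRef {
    vals: [u32; 4],
    len: u8,
}

/// Once one variant owns a heap allocation, deriving Copy is rejected by
/// the compiler (error E0204), so only Clone is available.
#[derive(Clone, Debug, PartialEq)]
enum BoxedRef {
    Inline([u32; 4], u8),
    Large(Box<[u32]>),
}

/// Any struct embedding a non-Copy field loses Copy as well.
#[derive(Clone, Debug, PartialEq)]
struct Src {
    reference: BoxedRef,
}

/// Borrowing avoids both the move and the clone.
fn num_comps(src: &Src) -> usize {
    match &src.reference {
        BoxedRef::Inline(_, len) => *len as usize,
        BoxedRef::Large(vals) => vals.len(),
    }
}

fn main() {
    let a = InlineRef { vals: [1, 2, 0, 0], len: 2 };
    let b = a; // implicit bitwise copy
    assert_eq!(a, b); // `a` is still usable afterwards

    let small = Src { reference: BoxedRef::Inline([7, 8, 0, 0], 2) };
    assert_eq!(num_comps(&small), 2);

    let s = Src {
        reference: BoxedRef::Large(vec![1, 2, 3, 4, 5].into_boxed_slice()),
    };
    let t = s.clone(); // explicit; `let t = s;` would move `s` instead
    assert_eq!(num_comps(&s), 5);
    assert_eq!(s, t);
}
```

Passing `&Src` where ownership is not needed is one way to keep the number of `.clone()` calls down, which fits the remark about being more deliberate with move semantics.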
02:40karolherbst[d]: yeah.....
02:40karolherbst[d]: in the hmma tests I was seeing a looooot of movs
02:40karolherbst[d]: even in the final binary
02:40karolherbst[d]: I think Dave even has a hacky patch for it
02:42karolherbst[d]: https://gitlab.freedesktop.org/airlied/mesa/-/commit/1ecdee24d628df77af03badc5421ef00fe1f2544 maybe?
02:45airlied[d]: I think that was part of it, even after all my opts the last shader still had a lot of unnecessary movs compared to nvidia
02:49karolherbst[d]: RA is hard (tm)
02:50karolherbst[d]: codegen's RA was vec4 based and could deal with vector registers reasonably well, but it also broke in very funny ways due to it
02:50karolherbst[d]: like not being able to RA 20 regs and just fail due to reasons (tm)
02:52karolherbst[d]: well.. gonna fix up ampere first and then hopefully next week will be able to do perf tuning
02:53airlied[d]: I got the coopmat bench from 9TFlops to 19 across my branch on the turing I was using, and nvidia were at 27
02:56airlied[d]: I think proper use of uregs is quite valuable, and getting address calcs sorted
02:56airlied[d]: which also seems like something non-coopmat would benefit from
02:58airlied[d]: https://gitlab.freedesktop.org/airlied/mesa/-/commits/nak-coop-matrix-hacks?ref_type=heads actually had some more horrors
03:16HdkR: uregs are good since it frees up regular register ALUs :)
03:20HdkR: Alternative take, any time you're not using ureg, you're burning your vector ALU for no reason, which could be better served by the "real" work :D
08:24Lynne: the ffv1 decoder in ffmpeg is about 100x slower (5fps vs 500fps) on nvk than the binary blob, just in case anyone's interested
09:46karolherbst[d]: how much do we care about volta? 🙃 I was considering if I want to test the MMA stuff there, but it only has the old HMMA.884 stuff (and 444 even)
10:03gfxstrand[d]: We don't care about ML on Volta
10:04gfxstrand[d]: We care about glxgears
10:11gfxstrand[d]: And we don't care how fast they spin.
10:19mohamexiety[d]: to be fair doing matmul stuff at locked idle clocks is not exactly going to go well :KEKW:
10:20mohamexiety[d]: and it's not like volta can do DLSS or whatever either, you have to be deliberately using it for AI stuff which.. yeah
10:21marysaka[d]: I mean AGX Xavier exists but we need to figure out Tegra support first anyway
10:23mohamexiety[d]: oh right I keep forgetting Xavier is Volta and not Maxwell
11:15karolherbst[d]: yeah... was mostly curious about the tegra side of things..
11:16karolherbst[d]: but I can always look into it once there is a pressing need or so
11:25karolherbst[d]: on my desktop it says like 3 hours instead of 16...
11:26karolherbst[d]: uhh.. something broken with sparse or do I need a new kernel?
11:26karolherbst[d]: still on.. *checks notes* 6.13.5
13:54jja2000[d]: karolherbst[d]: From what I gathered, it loads, but somewhere programs still report no memory amount.
13:54jja2000[d]: Using the MR Faith and Mary made
13:56jja2000[d]: I tested this a bit ago on TX2
14:37gfxstrand[d]: It should advertise memory unless we broke something.
16:13karolherbst[d]: gfxstrand[d]: do I need to do something to get the `dEQP-VK.sparse_resources.image_sparse_residency.mutable` tests to pass or are they maybe new?
16:13karolherbst[d]: seeing a lot of fails there
16:13gfxstrand[d]: They should pass
16:14gfxstrand[d]: If they don't, there's a regression somewhere
16:14gfxstrand[d]: We passed conformance like a month ago
16:14karolherbst[d]: I kinda don't think it's my MR but who knows
16:14karolherbst[d]: but I can check
16:14karolherbst[d]: if there is a regression I mean
16:18cwabbott: they're new tests
16:20karolherbst[d]: okay 🙂 they also fail on 25.1
16:20karolherbst[d]: sounds like new work for Faith 🙃
16:21karolherbst[d]: in either case, not my MR
16:22cwabbott: I asked for them in order to force qcom to fix their driver
16:22cwabbott: sorry, you're collateral damage
16:22karolherbst[d]: ahhh
16:22karolherbst[d]: don't worry
16:22karolherbst[d]: a bug is a bug
16:23karolherbst[d]: just hope it's not too bad to fix it in NVK
16:23karolherbst[d]: what's the point of those tests tho?
16:23karolherbst[d]: or rather, what are they testing
16:23cwabbott: that you can reinterpret a sparse image as a different format like normal images
16:23cwabbott: the spec doesn't allow the driver to disallow that so it has to be supported
16:24karolherbst[d]: I see
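A rough sketch of the rule being described, assuming the usual Vulkan behaviour that a mutable-format image may be viewed through any format in the same compatibility class, with sparse images getting no exemption. The `Format` enum, `texel_bits` and `view_format_ok` are hypothetical and cover only a handful of formats; bits per texel stands in for the compatibility class here.

```rust
// Hypothetical, tiny subset of formats; a real driver has full format tables.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Format {
    R8G8B8A8Unorm,
    R8G8B8A8Srgb,
    R32Uint,
    R16G16B16A16Sfloat,
}

/// Bits per texel, used here as a stand-in for the compatibility class.
fn texel_bits(f: Format) -> u32 {
    match f {
        Format::R8G8B8A8Unorm | Format::R8G8B8A8Srgb | Format::R32Uint => 32,
        Format::R16G16B16A16Sfloat => 64,
    }
}

/// Whether a view format is acceptable for an image. Note that nothing in
/// here depends on the image being sparse: the same rule applies either way.
fn view_format_ok(image: Format, view: Format, mutable_format: bool) -> bool {
    if image == view {
        return true;
    }
    mutable_format && texel_bits(image) == texel_bits(view)
}

fn main() {
    let img = Format::R8G8B8A8Unorm;
    println!("{:?}", view_format_ok(img, Format::R32Uint, true));
    println!("{:?}", view_format_ok(img, Format::R8G8B8A8Srgb, false));
    println!("{:?}", view_format_ok(img, Format::R16G16B16A16Sfloat, true));
}
```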
16:34mohamexiety[d]: hm I don't remember why/what made me block this off while doing sparse. will have to look later then :thonk:
16:45gfxstrand[d]: karolherbst[d]: Go ahead and file an issue
16:46gfxstrand[d]: mohamexiety[d]: I don't remember, either. Maybe that's a leftover from before we supported standard block sizes?
17:46gfxstrand[d]: In any case, that shouldn't matter since that's how image views work anyway.
18:11gfxstrand[d]: And the whole reason we do sparse for everything is so that we can make aliasing work.
18:45jja2000[d]: gfxstrand[d]: It does not when using Mesa compiled with the MR and using the devenv to open vkcube/glxgears, but also vkinfo/glxinfo
18:45jja2000[d]: I think I posted the error here somewhere
18:47jja2000[d]: Ah dammit the long link disappeared
18:49jja2000[d]: Copy pasted from a couple months ago:
18:49jja2000[d]: You: vkcube exits with:
18:49jja2000[d]: $ NVK_I_WANT_A_BROKEN_VULKAN_DRIVER=1 DISPLAY=:1 meson devenv vkcube
18:49jja2000[d]: Selected WSI platform: xcb
18:49jja2000[d]: WARNING: NVK is not a conformant Vulkan implementation, testing use only.
18:49jja2000[d]: Selected GPU 0: NVK GP10B, type: IntegratedGpu
18:49jja2000[d]: vkcube: ../src/vulkan/wsi/wsi_common.c:1772: wsi_select_memory_type: Assertion `!"" "No memory type found"' failed.
18:49jja2000[d]: You: I retried forcing glxgears to use zink (since it normally uses `tegra`, not nouveau) and got the following:
18:49jja2000[d]: $ NVK_I_WANT_A_BROKEN_VULKAN_DRIVER=1 MESA_LOADER_DRIVER_OVERRIDE=zink DISPLAY=:1 meson devenv glxgears
18:49jja2000[d]: WARNING: NVK is not a conformant Vulkan implementation, testing use only.
18:49jja2000[d]: glxgears: ../src/gallium/drivers/zink/zink_screen.c:3350: zink_internal_create_screen: Assertion `i == ZINK_HEAP_HOST_VISIBLE_COHERENT_CACHED || i == ZINK_HEAP_DEVICE_LOCAL_LAZY || i == ZINK_HEAP_DEVICE_LOCAL_VISIBLE' failed
18:49jja2000[d]: Ah, memtype
18:50jja2000[d]: I think vkinfo will list 0MB memory as well
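To illustrate how a "No memory type found" assertion can fire, here is a sketch of the common Vulkan idiom for picking a memory type: walk the advertised types and take the first one that is allowed by the type mask and has all required property flags. This is not the code in `wsi_common.c`; `MemoryType` and `select_memory_type` are hypothetical, and only the flag values mirror the Vulkan `VK_MEMORY_PROPERTY_*` bits. If the driver advertises no usable types, the search finds nothing, which would be consistent with the 0MB report, though the actual Tegra cause is not established here.

```rust
// Hypothetical model of a driver's advertised memory types.
#[derive(Clone, Copy, Debug)]
struct MemoryType {
    property_flags: u32,
}

// Values match the Vulkan VK_MEMORY_PROPERTY_* bits.
const DEVICE_LOCAL: u32 = 0x1;
const HOST_VISIBLE: u32 = 0x2;
const HOST_COHERENT: u32 = 0x4;

/// Return the index of the first memory type that is allowed by `type_bits`
/// and has every flag in `required`, or None if there is no such type.
fn select_memory_type(types: &[MemoryType], type_bits: u32, required: u32) -> Option<u32> {
    types
        .iter()
        .enumerate()
        .take(32) // Vulkan exposes at most 32 memory types
        .find(|&(i, t)| {
            (type_bits & (1u32 << i)) != 0 && (t.property_flags & required) == required
        })
        .map(|(i, _)| i as u32)
}

fn main() {
    let types = [
        MemoryType { property_flags: DEVICE_LOCAL },
        MemoryType { property_flags: HOST_VISIBLE | HOST_COHERENT },
    ];
    println!("{:?}", select_memory_type(&types, 0b11, HOST_VISIBLE));

    // A device that advertises no memory types can never satisfy the search.
    println!("{:?}", select_memory_type(&[], 0b11, DEVICE_LOCAL));
}
```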
18:58gfxstrand[d]: Okay, more stuff to fix on Tegra. I think Mary might have had a patch for that.
19:11jja2000[d]: Alright, I'll sub to the MR and I'll see if there's activity
19:12jja2000[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33959 I'm assuming it'll be added to this
22:35mohamexiety[d]: airlied[d]: if you want to toy with it a bit, I added in the RE'd qmd stuff here, as well as a fake header https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34331/commits
22:35mohamexiety[d]: it doesn't compile because it's missing a few fields I'm not sure how to get. I'm tempted to place these fields in random locations or zero them to see how it works, but maybe it could be useful for you while I continue looking into it next week
22:35mohamexiety[d]: these are the missing fields:
22:35mohamexiety[d]: error[E0432]: unresolved import `QMDV05_00_QMD_MINOR_VERSION`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_QMD_MAJOR_VERSION` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_API_VISIBLE_CALL_LIMIT` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_API_VISIBLE_CALL_LIMIT_NO_CHECK` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SAMPLER_INDEX` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SAMPLER_INDEX_INDEPENDENTLY` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_BARRIER_COUNT` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SHADER_LOCAL_MEMORY_HIGH_SIZE` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SHADER_LOCAL_MEMORY_LOW_SIZE` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find function, tuple struct or tuple variant `QMDV05_00_CONSTANT_BUFFER_SIZE_SHIFTED4` in module `clcdc0`
22:35mohamexiety[d]: error[E0425]: cannot find function, tuple struct or tuple variant `QMDV05_00_CONSTANT_BUFFER_VALID` in module `clcdc0`
22:36mohamexiety[d]: major/minor version is actually easy, I just forgot about it. but the rest I guess we could just zero or place in random locations
22:51mohamexiety[d]: also added in the QMD RE in a comment