05:55 fdobridge_: <g​fxstrand> @karolherbst Have you run rusticl on nvk recently?
08:45 fdobridge_: <g​fxstrand> @karolherbst @airlied The modifiers situation is worse than I thought. See point 2 in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27672
09:00 fdobridge_: <a​irlied> yup that agrees with what I worked out last week
12:21 fdobridge_: <k​arolherbst🐧🦀> not yet, but I should 😄
12:45 fdobridge_: <!​DodoNVK (she) 🇱🇹> Didn't you try getting rusticl working back in 2022?
13:13 fdobridge_: <k​arolherbst🐧🦀> it kinda worked on nouveau, but the problem was the broken compiler
13:13 fdobridge_: <k​arolherbst🐧🦀> so before NAK NVK had the same problem
13:13 fdobridge_: <k​arolherbst🐧🦀> it's really pointless to run rusticl on any vulkan impl not supporting `int16` or `int8` properly
15:28 fdobridge_: <r​hed0x> @gfxstrand can i bother you for a few minutes?
15:29 fdobridge_: <r​hed0x> I'm trying to get started working on NVK
15:30 fdobridge_: <r​hed0x> and I picked conservative rasterization as something to potentially look into because it's
15:30 fdobridge_: <r​hed0x> A: not yet implemented
15:30 fdobridge_: <r​hed0x> B: hopefully just a bunch of registers
15:31 fdobridge_: <r​hed0x> conservative rasterization happens to be missing from the nvidia headers, so I dumped the pushbufs from the proprietary driver
15:31 fdobridge_: <r​hed0x> ```
15:31 fdobridge_: <r​hed0x> [0x0000001f] HDR 2001008c subch 0 NINC | [0x0000001f] HDR 80000452 subch 0 IMMD
15:31 fdobridge_: <r​hed0x> mthd 0230 unknown method <
15:31 fdobridge_: <r​hed0x> .VALUE = 0x31103 <
15:31 fdobridge_: <r​hed0x> <
15:31 fdobridge_: <r​hed0x> [0x00000021] HDR 80010452 subch 0 IMMD <
15:31 fdobridge_: <r​hed0x> mthd 1148 unknown method mthd 1148 unknown method
15:31 fdobridge_: <r​hed0x> .VALUE = 0x1 | .VALUE = 0x0
15:31 fdobridge_: <r​hed0x>
15:31 fdobridge_: <r​hed0x>
15:31 fdobridge_: <r​hed0x> 0x31113 for underestimate
15:31 fdobridge_: <r​hed0x> 0x31100 for overestimate with 0.0
15:31 fdobridge_: <r​hed0x> 0x31102 for overestimate with 0.5
15:31 fdobridge_: <r​hed0x> 0x31103 for overestimate with 0.75 (max)
15:31 fdobridge_: <r​hed0x> ```
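The header words in the dump above can be decoded by hand. Below is a minimal C sketch. It assumes the Fermi-and-later method header layout that nouveau's `NVC0_FIFO_PKHDR_*` macros produce: opcode in bits 31:29, count or immediate value in bits 28:16, subchannel in bits 15:13, and the method's dword address in bits 11:0. The struct and function names are hypothetical. Only the `NINC` and `IMMD` labels come from the dump itself; the other two names follow NVIDIA's `NON_INC`/`ONE_INC` naming.

```c
#include <stdint.h>

/* Decoded fields of one pushbuffer method header (layout assumed, see above). */
struct push_hdr {
   unsigned opcode;         /* bits 31:29 */
   unsigned count_or_value; /* bits 28:16: dword count, or the value for IMMD */
   unsigned subc;           /* bits 15:13 */
   unsigned mthd;           /* bits 11:0 hold a dword address; stored as a byte offset */
};

static struct push_hdr decode_push_hdr(uint32_t hdr)
{
   struct push_hdr h;
   h.opcode = hdr >> 29;
   h.count_or_value = (hdr >> 16) & 0x1fff;
   h.subc = (hdr >> 13) & 0x7;
   h.mthd = (hdr & 0xfff) << 2; /* the dump prints byte offsets, e.g. "mthd 0230" */
   return h;
}

static const char *push_op_name(unsigned opcode)
{
   switch (opcode) {
   case 1: return "NINC";    /* method increments after every data dword */
   case 3: return "NON_INC"; /* every data dword goes to the same method */
   case 4: return "IMMD";    /* value lives in the header, no data dword */
   case 5: return "ONE_INC"; /* increments after the first data dword only */
   default: return "other";
   }
}
```

Decoding 0x2001008c gives opcode 1 (NINC), count 1, subchannel 0 and method 0x230. 0x80010452 and 0x80000452 are IMMD writes of 1 and 0 to method 0x1148, which matches the two sides of the dump.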
15:31 fdobridge_: <r​hed0x> 0x1148 also shows up in the GL driver for conservative rasterization
15:32 fdobridge_: <r​hed0x> one thing that irritates me a bit is that first call to method 0x0230
15:33 fdobridge_: <r​hed0x> pretty much everything NVK does goes through `P_IMMD => NVC0_FIFO_PKHDR_IL(int subc, int mthd, uint16_t data)`
15:33 fdobridge_: <r​hed0x> which results in the HDRs starting with 0x80000000
15:33 fdobridge_: <r​hed0x> this one starts with 0x200000
15:34 fdobridge_: <m​arysaka> There is an MR for conservative rasterization btw https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25668
15:34 fdobridge_: <r​hed0x> damnit :c
15:36 fdobridge_: <m​arysaka> for 0x200000 I'm pretty sure you want P_MTHD @rhed0x
15:37 fdobridge_: <m​arysaka> so for example
15:37 fdobridge_: <m​arysaka>
15:37 fdobridge_: <m​arysaka> ```c
15:37 fdobridge_: <m​arysaka> P_MTHD(p, NVC597, SET_MME_MEM_ADDRESS_A);
15:37 fdobridge_: <m​arysaka> P_NVC597_SET_MME_MEM_ADDRESS_A(p, high32(data_addr));
15:37 fdobridge_: <m​arysaka> P_NVC597_SET_MME_MEM_ADDRESS_B(p, low32(data_addr));
15:37 fdobridge_: <m​arysaka> /* Start 3 dwords into MME RAM */
15:37 fdobridge_: <m​arysaka> P_NVC597_SET_MME_DATA_RAM_ADDRESS(p, 3);
15:37 fdobridge_: <m​arysaka> P_IMMD(p, NVC597, MME_DMA_WRITE, 20);
15:37 fdobridge_: <m​arysaka> ```
15:38 fdobridge_: <m​arysaka> the 3 `P_NVC597_` calls will be in increment mode (so 0x200000 if I'm not mistaken)
15:38 fdobridge_: <m​arysaka> and the P_IMMD will cause the sequence to end
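Going the other way, the two header forms can be built with a pair of helpers shaped like nouveau's `NVC0_FIFO_PKHDR_SQ` (incrementing) and `NVC0_FIFO_PKHDR_IL` (immediate) macros. This is a sketch under the same assumed bit layout, and the helper names are hypothetical. It also points at one plausible reason the proprietary driver used an incrementing header for method 0x230: the immediate field is only 13 bits wide, and a value like 0x31103 does not fit in it.

```c
#include <assert.h>
#include <stdint.h>

/* Incrementing ("NINC") header: `count` data dwords follow and land in
 * mthd, mthd + 4, mthd + 8, ... (same shape as NVC0_FIFO_PKHDR_SQ). */
static uint32_t push_hdr_inc(unsigned subc, unsigned mthd, unsigned count)
{
   assert(subc < 8 && (mthd & 3) == 0 && mthd < 0x4000 && count < 0x2000);
   return 0x20000000u | (count << 16) | (subc << 13) | (mthd >> 2);
}

/* Immediate header: a value of at most 13 bits travels inside the header
 * and no data dword follows (same shape as NVC0_FIFO_PKHDR_IL). */
static uint32_t push_hdr_immd(unsigned subc, unsigned mthd, unsigned value)
{
   assert(subc < 8 && (mthd & 3) == 0 && mthd < 0x4000 && value < 0x2000);
   return 0x80000000u | (value << 16) | (subc << 13) | (mthd >> 2);
}

/* True if `value` can be sent with an immediate header at all. */
static int fits_immd(uint32_t value)
{
   return value < 0x2000;
}
```

With these helpers, `push_hdr_inc(0, 0x230, 1)` reproduces the dump's 0x2001008c, and `push_hdr_immd(0, 0x1148, 1)` reproduces 0x80010452.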
15:38 fdobridge_: <r​hed0x> okay different question then: any idea why that MR sets up a macro just to set the extra overestimate?
15:39 fdobridge_: <k​arolherbst🐧🦀> `P_IMMD` should be immediate or hdr+ value afaik
15:39 fdobridge_: <k​arolherbst🐧🦀> not sure we have any smartness in place to append it to the last one
15:39 fdobridge_: <m​arysaka> yes but here it will end the last sequence right?
15:39 fdobridge_: <r​hed0x> what does HDR mean here?
15:39 fdobridge_: <k​arolherbst🐧🦀> ohh sure
15:39 fdobridge_: <k​arolherbst🐧🦀> header
15:39 fdobridge_: <r​hed0x> right, that makes sense
15:39 fdobridge_: <m​arysaka> see here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25668
15:40 fdobridge_: <k​arolherbst🐧🦀> maybe we should write header instead of HDR because that means something else these days 😄
15:40 fdobridge_: <r​hed0x> yeah I'm already staring at that
15:40 fdobridge_: <r​hed0x> ```c
15:40 fdobridge_: <r​hed0x> void
15:40 fdobridge_: <r​hed0x> nvk_mme_set_conservative_raster_state(struct mme_builder *b) {
15:40 fdobridge_: <r​hed0x>    /* requested state (macro parameter) vs. last state written (MME scratch) */
15:40 fdobridge_: <r​hed0x>    struct mme_value new_state = mme_load(b);
15:40 fdobridge_: <r​hed0x>    struct mme_value old_state = nvk_mme_load_scratch(b, CONSERVATIVE_RASTER_STATE);
15:40 fdobridge_: <r​hed0x>
15:40 fdobridge_: <r​hed0x>    /* only take the slow priv-reg path when the state actually changed */
15:40 fdobridge_: <r​hed0x>    mme_if(b, ine, new_state, old_state) {
15:41 fdobridge_: <r​hed0x>       nvk_mme_store_scratch(b, CONSERVATIVE_RASTER_STATE, new_state);
15:41 fdobridge_: <r​hed0x>       mme_set_priv_reg(b, new_state, mme_imm(BITFIELD_RANGE(3, 23)), mme_imm(0x418800));
15:41 fdobridge_: <r​hed0x>    }
15:41 fdobridge_: <r​hed0x> }
15:41 fdobridge_: <r​hed0x> ```
15:41 fdobridge_: <r​hed0x> I just don't really understand why that macro exists
15:41 fdobridge_: <m​arysaka> that reminds me we should also make use of the "prefetch" bit someday if we start writing commands on the GPU directly...
15:42 fdobridge_: <m​arysaka> (my knowledge of this is quite rusty and based on the original research we did for the Switch; I should update the names in my brain someday 😅 )
15:43 fdobridge_: <m​arysaka> to only set the private reg if the state changed
15:43 fdobridge_: <m​arysaka> because it is costly: it needs to wait for the firmware side to reply
15:43 fdobridge_: <r​hed0x> is changing that particularly expensive?
15:43 fdobridge_: <r​hed0x> ah
15:43 fdobridge_: <m​arysaka> yeah it's the "falcon" methods
15:44 fdobridge_: <m​arysaka> not sure how they route that with GSP
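The macro's change check can be modeled on the CPU in a few lines. In this hypothetical model (not NVK code), `scratch` stands in for the MME scratch slot `CONSERVATIVE_RASTER_STATE`, and `priv_reg_writes` counts how often the slow, firmware-mediated write to priv reg 0x418800 would be issued. It assumes the scratch slot starts out matching the hardware's current state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct raster_shadow {
   uint32_t scratch;         /* models the CONSERVATIVE_RASTER_STATE scratch slot */
   unsigned priv_reg_writes; /* how many slow priv-reg writes were issued */
};

static void set_conservative_raster_state(struct raster_shadow *s, uint32_t new_state)
{
   assert(s != NULL);
   if (new_state == s->scratch)
      return;                /* unchanged: skip the expensive write entirely */
   s->scratch = new_state;   /* like nvk_mme_store_scratch() */
   s->priv_reg_writes++;     /* stands in for mme_set_priv_reg(..., 0x418800) */
}
```

Setting the same state twice in a row costs one priv-reg write, not two.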
15:44 fdobridge_: <r​hed0x> prop doesn't seem to call a macro, or am I overlooking that?
15:46 fdobridge_: <m​arysaka> prop? `mme_set_priv_reg` here is the implementation of the `NVK_MME_SET_PRIV_REG` macro if that's what you are talking about :aki_thonk:
15:46 fdobridge_: <r​hed0x> ```
15:46 fdobridge_: <r​hed0x> [0x0000001f] HDR 2001008c subch 0 NINC
15:46 fdobridge_: <r​hed0x> mthd 0230 unknown method
15:46 fdobridge_: <r​hed0x> .VALUE = 0x31103
15:46 fdobridge_: <r​hed0x> ```
15:47 fdobridge_: <r​hed0x> that's what I got out of the prop driver
15:47 fdobridge_: <r​hed0x> with the value differing depending on over/underestimation and the extra size
15:50 fdobridge_: <m​arysaka> hmm, maybe @vdpafaor knows?
15:57 fdobridge_: <m​arysaka> @rhed0x on what gen are you testing?
15:57 fdobridge_: <r​hed0x> ampere
15:57 fdobridge_: <m​arysaka> hmm
15:57 fdobridge_: <m​arysaka> I know that on Maxwell/Pascal it goes via 0x418800 priv reg but maybe they changed that on Turing/Ampere?
15:59 fdobridge_: <k​arolherbst🐧🦀> lemme check something...
15:59 fdobridge_: <m​arysaka> 0x418800 being `gr_pri_gpcs_setup_debug` as per nvgpu (<https://github.com/alliedvision/linux_nvidia_jetson/blob/4609206e6594f1eb21e43e69afa8974cf20cc096/kernel/nvgpu/drivers/gpu/nvgpu/hal/gr/init/gr_init_gv11b.c#L61>)
15:59 fdobridge_: <k​arolherbst🐧🦀> we actually poke that reg from GL :ferrisUpsideDown:
15:59 fdobridge_: <k​arolherbst🐧🦀> and we never knew or something
15:59 fdobridge_: <k​arolherbst🐧🦀> check the `_conservative_raster_state` mme things in the gl driver
16:00 fdobridge_: <k​arolherbst🐧🦀> pre volta we have a `send (extrinsrt 0x0 $r2 0 12 11) /* sends 0x418800 */`
16:00 fdobridge_: <k​arolherbst🐧🦀> ehh pre turing
16:00 fdobridge_: <k​arolherbst🐧🦀> but in turing we also write to the same: `send(0x00418800);`
16:07 fdobridge_: <m​arysaka> it's possible that moved on ampere :aki_thonk:
16:16 fdobridge_: <r​hed0x> thanks btw :ferris_happy:
17:13 fdobridge_: <b​enjaminl> I only have maxwell to test with, and got an MME call when I dumped the proprietary driver that is pretty similar to `nvk_mme_set_conservative_raster_state` from that MR
17:13 fdobridge_: <b​enjaminl> the 0230 method you're getting is definitely new
17:15 fdobridge_: <b​enjaminl> that 0x31103 value also looks like it's packed differently, so you'll probably have to test it with a bunch of different parameters and figure out what the bit ranges are
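A first step toward the bit ranges is to see which bits actually move across the four method 0x230 samples from the dump. This C sketch XORs each sample against the first and ORs the results; for these four values only bits 0, 1 and 4 (mask 0x13) vary. The `extra_size_hypothesis` reading (bits 1:0 as the extra overestimation size in 0.25-pixel steps) is only a guess that happens to fit 0.0, 0.5 and 0.75. It still needs more samples, such as 0.25, to confirm. What bit 4 means beyond being set in the underestimate sample is unknown.

```c
#include <stddef.h>
#include <stdint.h>

/* Method 0x230 values observed in the proprietary driver's pushbuffer. */
static const uint32_t samples[] = {
   0x31113, /* underestimate */
   0x31100, /* overestimate, extra size 0.0 */
   0x31102, /* overestimate, extra size 0.5 */
   0x31103, /* overestimate, extra size 0.75 (max) */
};

/* Bits that differ from the first sample in at least one other sample. */
static uint32_t varying_bits(void)
{
   uint32_t mask = 0;
   for (size_t i = 1; i < sizeof(samples) / sizeof(samples[0]); i++)
      mask |= samples[i] ^ samples[0];
   return mask;
}

/* HYPOTHESIS ONLY: bits 1:0 = extra overestimation size / 0.25 px. */
static float extra_size_hypothesis(uint32_t value)
{
   return (float)(value & 0x3) * 0.25f;
}
```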
17:16 fdobridge_: <g​fxstrand> They may have fixed it on Turing
17:17 fdobridge_: <g​fxstrand> That wouldn't surprise me at all
17:17 fdobridge_: <g​fxstrand> going through FALCON is perf death so they'd want to fix that up
17:17 fdobridge_: <b​enjaminl> a bit more context on why the macro is checking the previous value is that mesa's dynamic state tracking puts over/underestimation and enable/disable in the same field, but in the hardware toggling enable/disable is much cheaper than toggling over/under
17:18 fdobridge_: <b​enjaminl> so without the check, we would be pessimistically assuming that `mode = OVER` means that the previous value was `enabled = false; mode = UNDER;`, and setting both
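The point about the packed field can be illustrated with a small model. This sketch is hypothetical: the names and the initial hardware state are assumptions. It splits the packed dynamic state into the cheap enable bit and the expensive over/under mode, and it tracks what the hardware last saw. An expensive write is counted only when the mode itself changes, so toggling conservative rasterization on and off with the same mode costs a single expensive write instead of one per toggle.

```c
#include <stdbool.h>

/* Packed state as the dynamic-state tracker sees it: enable + mode in one field. */
enum cr_state { CR_OFF, CR_OVER, CR_UNDER };

/* What the hardware last saw, split by cost. */
struct cr_hw {
   bool enabled;          /* cheap to toggle */
   enum cr_state mode;    /* CR_OVER or CR_UNDER; expensive to change */
   unsigned cheap_writes;
   unsigned expensive_writes;
};

static void cr_emit(struct cr_hw *hw, enum cr_state state)
{
   bool enable = state != CR_OFF;

   if (enable != hw->enabled) {
      hw->enabled = enable;
      hw->cheap_writes++;
   }
   /* Leave the mode alone while disabled, so re-enabling with the same
    * mode does not take the slow path again. */
   if (enable && state != hw->mode) {
      hw->mode = state;
      hw->expensive_writes++;
   }
}
```

Starting from `enabled = false, mode = CR_UNDER`, the sequence OVER, OFF, OVER, OFF, OVER costs five cheap writes but only one expensive write.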
17:35 fdobridge_: <r​hed0x> maybe i should try that MR later
18:58 fdobridge_: <p​homes_> I just tested on turing (tu104):
18:58 fdobridge_: <p​homes_> Test run totals:
18:58 fdobridge_: <p​homes_> Passed: 48/343 (14.0%)
18:58 fdobridge_: <p​homes_> Failed: 100/343 (29.2%)
18:58 fdobridge_: <p​homes_> Not supported: 195/343 (56.9%)
18:58 fdobridge_: <p​homes_> Warnings: 0/343 (0.0%)
18:58 fdobridge_: <p​homes_> Waived: 0/343 (0.0%)
20:00 phodius: hi, when will EGL_EXT_image_dma_buf_import be functional in nvk?
22:04 phodius: can i get an invite link to the discord #nouveau?
22:23 fdobridge_: <!​DodoNVK (she) 🇱🇹> phodius: https://discord.gg/ZAzuXNZw4k
22:23 fdobridge_: <e​nergetic_parrot_03598> thanks
22:48 fdobridge_: <g​fxstrand> @vdpafaor Where are we at with NAK on Maxwell? Do you have a good sense for how much is left?
23:05 fdobridge_: <b​enjaminl> haven't had time to work on it in a while, but the big missing pieces currently are:
23:05 fdobridge_: <b​enjaminl>
23:05 fdobridge_: <b​enjaminl> - scheduling (I've been testing with `NAK_DEBUG=serial`, I suspect some of the instruction latencies are different, but haven't looked into it)
23:05 fdobridge_: <b​enjaminl> - atomics _mostly_ don't pass the CTS yet
23:05 fdobridge_: <b​enjaminl> - there are a fair number of 3d-related instructions that haven't been implemented yet
23:06 fdobridge_: <b​enjaminl> this reminds me... I have an MR from a month ago that's like 90% done to fix a bunch of texture op test failures
23:10 fdobridge_: <g​fxstrand> Yeah, we need to get scheduling figured out
23:15 fdobridge_: <b​enjaminl> does it work on turing?
23:17 fdobridge_: <e​nergetic_parrot_03598> what would it take to get Wayland compositors running on NVK? It looks like it just needs EGL_EXT_image_dma_buf_import. Where would the code be located if it were implemented?
23:19 fdobridge_: <g​fxstrand> Yeah, scheduling on Turing is fine
23:20 fdobridge_: <g​fxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795
23:21 soreau: gfxstrand: does that need extra kernel bits?
23:21 fdobridge_: <g​fxstrand> Nope
23:21 fdobridge_: <g​fxstrand> I think it's mostly working (modulo nouveau GL's modifiers being bullshit)
23:22 soreau: gfxstrand: does it work on all hw?
23:22 fdobridge_: <g​fxstrand> Should
23:22 soreau: well, for nvk supported hw
23:23 fdobridge_: <g​fxstrand> Oh, and someone needs to implement linear render hacks.
23:24 soreau: for wsi?
23:25 fdobridge_: <g​fxstrand> For WSI, we can use the PRIME blit path.
23:26 fdobridge_: <g​fxstrand> But in order to support modifiers we need to be able to render to things with `DRM_FORMAT_MOD_LINEAR` and that's not something NVIDIA hardware wants to do.
23:28 soreau: hm, needs a bigger hammer?
23:30 fdobridge_: <g​fxstrand> Yeah, we need something. There are a few options.
23:57 fdobridge_: <e​nergetic_parrot_03598> i get WARNING: NVK is not a conformant Vulkan implementation, testing use only.
23:57 fdobridge_: <e​nergetic_parrot_03598> Selected GPU 0: TU106, type: DiscreteGpu
23:57 fdobridge_: <e​nergetic_parrot_03598> vkcube: ../src/vulkan/wsi/wsi_common_drm.c:441: wsi_configure_native_image: Assertion `!"Failed to find a supported modifier! This should never " "happen because LINEAR should always be available"' failed.
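Here is a simplified model of one way that assertion can fire. The WSI code looks for a modifier that the consumer accepts and that the driver supports for the image, and it assumes `DRM_FORMAT_MOD_LINEAR` will always be in that overlap. If the driver does not advertise rendering to LINEAR, a consumer that only offers LINEAR leaves nothing to pick. This is a sketch, not the actual `wsi_common_drm.c` logic. The function name is hypothetical; the two constants carry the values of `DRM_FORMAT_MOD_LINEAR` and `DRM_FORMAT_MOD_INVALID` from `drm_fourcc.h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as DRM_FORMAT_MOD_LINEAR / DRM_FORMAT_MOD_INVALID in drm_fourcc.h. */
#define MOD_LINEAR  0x0000000000000000ull
#define MOD_INVALID 0x00ffffffffffffffull

/* Return the first consumer-preferred modifier the driver can render to,
 * or MOD_INVALID if the two lists do not overlap. */
static uint64_t pick_modifier(const uint64_t *consumer, size_t n_consumer,
                              const uint64_t *driver, size_t n_driver)
{
   for (size_t i = 0; i < n_consumer; i++) {
      for (size_t j = 0; j < n_driver; j++) {
         if (consumer[i] == driver[j])
            return consumer[i];
      }
   }
   return MOD_INVALID;
}
```

Suppose the consumer list holds only `MOD_LINEAR` and the driver list holds only a tiled modifier. In that case `pick_modifier` returns `MOD_INVALID`, which is exactly the case the WSI assertion treats as impossible.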