IRC Logs of #nouveau on irc.freenode.net for 2023-07-25

00:10 fdobridge: <karolherbst🐧🦀> Passed: 1/1 (100.0%) :3
00:12 fdobridge: <karolherbst🐧🦀> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24312
00:13 fdobridge: <karolherbst🐧🦀> the volta+ code emitter got that part right 🙃
00:13 fdobridge: <karolherbst🐧🦀> tomorrow I'll try to see how much that fixes
01:48 fdobridge: <gfxstrand> Feel free to make an NVK MR as well
02:10 dakr: airlied, gfxstrand: I just sent out a quite lengthy mail regarding the new uAPI and performance. However, we can also keep discussing this in IRC.
02:12 dakr: https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next-vm-resv should remove the impact of the amount of mappings from EXEC latency entirely. However, as mentioned in the mail the GPUVA manager parts are WIP and pretty hacked up still.
02:26 fdobridge: <airlied> sounds good, I'll let @Faith see if they have any concerns that it will end up faster
02:26 fdobridge: <airlied> @gfxstrand just created https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/239 to dump nvidia driver cmd buffers from a layer
02:26 fdobridge: <airlied> it's based on your shader dumper
02:28 fdobridge: <gfxstrand> Cool
02:31 fdobridge: <airlied> dumping cond render shows I've even less clue how it should work
02:38 fdobridge: <gfxstrand> 🙃
03:06 fdobridge: <airlied> it does appear nvidia does a copy from local to host memory
03:08 fdobridge: <gfxstrand> Woohoo? 🙄
03:09 fdobridge: <gfxstrand> I guess that tells us we're not crazy
03:10 fdobridge: <airlied> it also does some MME on 3D class,
03:10 fdobridge: <airlied> which I can't dump
03:16 fdobridge: <gfxstrand> I mean if we have to do a tiny DMA to GART, I guess that's probably okay. The annoying bit will be having to allocate GART-only memory. Most other stuff is fine with it being either VRAM or system.
03:17 fdobridge: <airlied> the command buffer upload area is GART
03:17 fdobridge: <airlied> nvidia also inline the qmds
03:24 fdobridge: <gfxstrand> They are now but don't we want VRAM with a WC map long-term?
03:24 fdobridge: <gfxstrand> Why would we want to force anything to be system RAM?
03:25 fdobridge: <gfxstrand> I mean, I guess the GPU pulling the command buffer across PCI once as it streams it through probably isn't any more PCI traffic than building it with a WC map.
03:26 fdobridge: <airlied> but yeah we'd probably want to leave it optional, in which case we do need a special allocation
03:28 fdobridge: <gfxstrand> It wouldn't be too horrible to add a third stream. A bit of a pain but not too horrible.
03:36 fdobridge: <airlied> yeah I might just put that together for the cond render path
03:36 fdobridge: <airlied> just to get it out of the way
03:37 fdobridge: <gfxstrand> IIRC, there's a bit of infra to build for it.
03:39 fdobridge: <airlied> yeah have to write it through cmd pools as well
03:47 fdobridge: <gfxstrand> Yeah. Annoying
04:00 fdobridge: <airlied> https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/240 is new cond render MR with the changes in it
10:18 fdobridge: <phomes> cond render cts passes here, but Sascha Willems conditionalrender test flickers and randomly disables draws
10:21 fdobridge: <airlied> Oh good find, I should test with it unless you can spot what's going wrong
10:22 fdobridge: <phomes> I commented on the MR
11:56 fdobridge: <karolherbst🐧🦀> okay.. let's try that pascal CTS run with my MR and see how much stuff is left to fix 🙃 I kinda want to make it dmesg clean enough, so new issues are more apparent
13:02 fdobridge: <karolherbst🐧🦀> uhm.. did somebody trash old gen support? `Pass: 58009, Fail: 50442, Crash: 2422, Skip: 483146, Flake: 981, Duration: 1:00:04, Remaining: 2:22:31` 🙃
13:03 fdobridge: <karolherbst🐧🦀> I hope it's not all that `ILLEGAL_INSTR_ENCODING` error I was fixing
13:08 fdobridge: <karolherbst🐧🦀> mhhh.. a lot of `VK_ERROR_DEVICE_LOST` stuff...
13:09 fdobridge: <karolherbst🐧🦀> mhhhh
13:09 fdobridge: <karolherbst🐧🦀> @gfxstrand I think we want to crash the CTS on `VK_ERROR_DEVICE_LOST` at least for CTS runs, so deqp-runner doesn't mark a lot of things as failed....
13:21 fdobridge: <karolherbst🐧🦀> mhh.. but I should also figure out what's broken there
13:23 fdobridge: <gfxstrand> `VK_MESA_ABORT_ON_DEVICE_LOSS=1`
13:24 fdobridge: <karolherbst🐧🦀> ahh, cool. Let me add this to my script
13:53 fdobridge: <gfxstrand> Grep for it first to make sure I didn't phone type it wrong. 🙃
13:54 fdobridge: <karolherbst🐧🦀> `MESA_VK_ABORT_ON_DEVICE_LOSS=1`
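Karol's reasoning at 13:09 (abort on device loss so that deqp-runner does not mark a pile of tests as failed) can be illustrated with a toy Python model. Assumptions: the runner executes many tests in one process and starts a fresh process for whatever is left after a crash; the test names and the set of device-losing tests are hypothetical. This is not deqp-runner's or Mesa's code; only the variable name MESA_VK_ABORT_ON_DEVICE_LOSS comes from the log.

```python
import os


def cts_env():
    """Environment for a CTS run with the abort-on-device-loss switch set."""
    return dict(os.environ, MESA_VK_ABORT_ON_DEVICE_LOSS="1")


def run_process(pending, bad, abort_on_loss):
    """Toy model of one test process running `pending` tests in order.

    `bad` holds the (hypothetical) tests that lose the device. Without the
    abort, every later test in the same process also fails.
    """
    results = {}
    lost = False
    for name in pending:
        if lost:
            results[name] = "Fail"  # device is already lost in this process
        elif name in bad:
            if abort_on_loss:
                results[name] = "Crash"  # process aborts right here
                return results
            results[name] = "Fail"
            lost = True
        else:
            results[name] = "Pass"
    return results


def run_all(tests, bad, abort_on_loss):
    """Start a fresh process for whatever a crashed process left behind."""
    results = {}
    pending = list(tests)
    while pending:
        results.update(run_process(pending, bad, abort_on_loss))
        pending = [t for t in pending if t not in results]
    return results
```

With ten tests where the third one loses the device, the model reports eight failures without the abort, versus one crash and nine passes with it.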
13:56 fdobridge: <karolherbst🐧🦀> let's see if that's better
13:58 fdobridge: <karolherbst🐧🦀> `Pass: 501, Fail: 23, Crash: 25, Skip: 2450, Flake: 1, Duration: 26, Remaining: 4:51:45` yep...
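The progress lines pasted above are easier to compare as a pass rate. A small Python sketch that parses the counters from a deqp-runner progress line in the format shown in this log and computes passes over tests that reached Pass, Fail or Crash; that choice of denominator is an assumption made here, not something deqp-runner reports.

```python
import re


def parse_status(line):
    """Parse the integer counters out of a deqp-runner progress line."""
    counts = {}
    for key, value in re.findall(r"(\w+): ([^,]+)", line):
        if key in ("Duration", "Remaining"):
            continue  # time fields, not test counts
        counts[key] = int(value)
    return counts


def pass_rate(counts):
    """Passes as a fraction of tests that ran to Pass, Fail or Crash."""
    ran = sum(counts.get(k, 0) for k in ("Pass", "Fail", "Crash"))
    return counts.get("Pass", 0) / ran if ran else 0.0
```

For the 13:02 line this gives about 52%, and for the 13:58 line (with the abort enabled) about 91%, although the second run had only been going for 26 seconds, so the two are not directly comparable.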
13:59 fdobridge: <karolherbst🐧🦀> that `dEQP error: deqp-vk: ../src/nouveau/vulkan/nvk_query_pool.c:757: nvk_CmdCopyQueryPoolResults: Assertion 'nvk_cmd_buffer_device(cmd)->pdev->info.cls_eng3d >= TURING_A' failed.` comes up quite often though
13:59 fdobridge: <mohamexiety> last time I ran a full CTS run on my GP108 it was this bad yeah
13:59 fdobridge: <mohamexiety> anything < Turing is really slow..
14:00 fdobridge: <karolherbst🐧🦀> nah.. it's going to be faster, but I have to run only 2 threads
14:00 fdobridge: <karolherbst🐧🦀> or I'll risk instabilities
14:03 fdobridge: <karolherbst🐧🦀> @gfxstrand what was the problem with the NVK_MME_COPY_QUERIES macro on pre turing?
14:03 fdobridge: <karolherbst🐧🦀> out of registers?
14:03 fdobridge: <karolherbst🐧🦀> or missing `mme_tu104_read_fifoed`?
14:04 fdobridge: <gfxstrand> Both
14:04 fdobridge: <karolherbst🐧🦀> mhh
14:04 fdobridge: <gfxstrand> I barely got it to fit in 23 regs on Turing. No way in hell it's gonna nicely fit in 7.
14:05 fdobridge: <karolherbst🐧🦀> ahh yeah...
14:05 fdobridge: <gfxstrand> I mean, we could maybe do something where we feed it in as a push pre-Turing but it really should just be a compute shader.
14:06 fdobridge: <gfxstrand> I built it in MME just because I could. 🙃
14:06 fdobridge: <karolherbst🐧🦀> valid reason
14:07 fdobridge: <karolherbst🐧🦀> but yeah.. I think this should go into a compute shader instead 😄
14:09 fdobridge: <karolherbst🐧🦀> or mhhh
14:10 fdobridge: <karolherbst🐧🦀> I wonder if one could also just use the copy engine here instead...
14:11 fdobridge: <karolherbst🐧🦀> I mean.. still do it inside the macro, but the strided copy should be fine to do in the copy engine, no?
14:11 fdobridge: <karolherbst🐧🦀> or is there something I'm missing here
14:41 fdobridge: <gfxstrand> No, there's stuff you have to do
14:41 fdobridge: <gfxstrand> Like subtract
14:50 fdobridge: <karolherbst🐧🦀> mhhh
14:53 fdobridge: <karolherbst🐧🦀> but that's just for timestamps, right?
14:53 fdobridge: <karolherbst🐧🦀> but yeah.. looks annoying
14:54 fdobridge: <gfxstrand> Everything but timestamps
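The "subtract" step follows from vkCmdCopyQueryPoolResults semantics: counter-style queries such as occlusion report end minus begin, while a timestamp query reports a single value. Below is a minimal Python sketch of the per-query loop. It assumes a hypothetical slot layout of (available, begin, end), not NVK's actual pool layout. It handles one value per query and the 64-bit and with-availability flags, and it ignores the WAIT and PARTIAL flags. Results that overflow 32 bits are wrapped here; the spec allows either wrapping or saturating in that case.

```python
import struct
from dataclasses import dataclass


@dataclass
class QuerySlot:
    # Hypothetical per-query layout; the real NVK query pool layout may differ.
    available: bool
    begin: int = 0  # counter snapshot at begin (non-timestamp queries)
    end: int = 0    # counter snapshot at end, or the timestamp itself


def copy_query_results(slots, first, count, dst, dst_offset, stride,
                       is_timestamp, use_64bit, with_availability):
    """Write query results into the bytearray `dst`, one entry per `stride`."""
    fmt = "<Q" if use_64bit else "<I"
    size = struct.calcsize(fmt)
    mask = (1 << (8 * size)) - 1
    for i in range(count):
        q = slots[first + i]
        off = dst_offset + i * stride
        if q.available:
            # Timestamps are copied as-is; counter queries need end - begin.
            value = q.end if is_timestamp else q.end - q.begin
            struct.pack_into(fmt, dst, off, value & mask)
        if with_availability:
            # The availability word follows the result in the same entry.
            struct.pack_into(fmt, dst, off + size, 1 if q.available else 0)
```

The strided writes alone would fit a plain copy, which is Karol's copy-engine idea; the per-query subtraction is what needs something programmable, such as the MME macro or a compute shader.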
14:54 fdobridge: <gfxstrand> I mean, maybe there's a trick we can do with the semaphores but IDK.
15:01 fdobridge: <gfxstrand> Two questions: 1) Does this affect Turing? 2) What all did it fix?
15:03 fdobridge: <gfxstrand> Does that branch also handle `push_count = 0` properly?
15:03 fdobridge: <gfxstrand> Ah, yes, I see Dave's patch in there.
15:04 fdobridge: <gfxstrand> Building now
15:13 fdobridge: <karolherbst🐧🦀> 1. no, 2. image loads with `MESA_FORMAT_NONE`
15:14 fdobridge: <gfxstrand> So what hardware does it cover? Maxwell by chance?
15:14 fdobridge: <karolherbst🐧🦀> but I'm also confused on why we do the surface format conversion inside the shader if the format is known...
15:14 fdobridge: <karolherbst🐧🦀> yeah.. Maxwell and Pascal
15:14 fdobridge: <karolherbst🐧🦀> I only tested on pascal though
15:14 fdobridge: <gfxstrand> Really old hardware needs format conversion in the shader and I think the code just got carried forwards.
15:15 fdobridge: <karolherbst🐧🦀> it's at least some of those `ILLEGAL_INSTR_ENCODING` errors
15:15 fdobridge: <karolherbst🐧🦀> yeah... Pre Maxwell we need it
15:15 fdobridge: <gfxstrand> There's absolutely no reason to do that on Maxwell+
15:15 fdobridge: <karolherbst🐧🦀> at least for loads...
15:15 fdobridge: <karolherbst🐧🦀> stores seem to be capable
15:15 fdobridge: <gfxstrand> I had to fix Turing too
15:16 fdobridge: <gfxstrand> I don't remember what I fixed but I fixed something. 🙃
15:16 fdobridge: <karolherbst🐧🦀> you fixed it for `NONE`, but we still do it for formatted ones, no?
15:16 fdobridge: <gfxstrand> Maybe?
15:16 fdobridge: <gfxstrand> We should probably stop
15:16 fdobridge: <karolherbst🐧🦀> yeah...
15:16 fdobridge: <karolherbst🐧🦀> maybe
15:16 fdobridge: <karolherbst🐧🦀> I don't know
15:16 fdobridge: <gfxstrand> We should stop
15:16 fdobridge: <gfxstrand> I don't think there's any perf benefit to it
15:16 fdobridge: <karolherbst🐧🦀> I think it depends on how everything is set up
15:16 fdobridge: <gfxstrand> I mean, maybe delete some channels
15:17 fdobridge: <karolherbst🐧🦀> maybe the shader format can overwrite the image view's one?
15:17 fdobridge: <karolherbst🐧🦀> or something?
15:17 fdobridge: <gfxstrand> No
15:17 fdobridge: <karolherbst🐧🦀> I didn't dig deep enough yet
15:17 fdobridge: <gfxstrand> I mean yes but that's irrelevant
15:17 fdobridge: <gfxstrand> There's no case in which the format in the shader mismatching the view is allowed or useful.
15:17 fdobridge: <karolherbst🐧🦀> I see
15:17 fdobridge: <karolherbst🐧🦀> yeah then I guess it can probably go
15:31 fdobridge: <gfxstrand> Running the CTS on your branch now. I've verified a 64 ms submit time so thanks for the fixes!
15:44 fdobridge: <karolherbst🐧🦀> all those " subc 0 mthd 0000 data 00000000" errors are kinda concerning me...
18:32 fdobridge: <karolherbst🐧🦀> @gfxstrand could we disable `CmdCopyQueryPoolResults` for pre turing or something? I think the crashes make the CTS run quite slow or something
18:33 fdobridge: <karolherbst🐧🦀> but it kinda looks like a 1.0 core feature 😢
18:35 fdobridge: <karolherbst🐧🦀> though those channel crashes also slow things down a lot 😄
18:36 fdobridge: <karolherbst🐧🦀> ` fifo: fault 01 [WRITE] at 0000000000000000 engine 00 [gr] client 01 [GPC0/T1_0] reason 02 [PTE] on channel 5 [007fae2000 deqp-vk[729863]]`
18:36 fdobridge: <karolherbst🐧🦀> something something tex
18:36 fdobridge: <gfxstrand> Not really, no. It's required for 1.0
18:36 fdobridge: <gfxstrand> So there's no feature we can shut off
18:37 fdobridge: <gfxstrand> I've got a denylist in my runner script precisely for that reason
18:37 fdobridge: <karolherbst🐧🦀> yeah.. though a dead channel is more expensive than a normal crash :/ I think I'll try to fix that null pointer on tex ops thing, whatever that might be
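The fifo fault line quoted at 18:36 is easier to triage once it is split into fields; the faulting address of 0 is what points at the null-pointer write Karol mentions. Below is a Python sketch of a parser for exactly the format in that line. The regex is built from this single sample, so fault messages from other GPU generations or kernel versions may not match it. Reading reason [PTE] as "no valid page-table entry" is an interpretation, not something stated in the log.

```python
import re

# Built from the single sample line in this log; other formats may differ.
FAULT_RE = re.compile(
    r"fault (?P<fault>[0-9a-f]+) \[(?P<access>\w+)\] "
    r"at (?P<addr>[0-9a-f]+) "
    r"engine [0-9a-f]+ \[(?P<engine>[^\]]+)\] "
    r"client [0-9a-f]+ \[(?P<client>[^\]]+)\] "
    r"reason [0-9a-f]+ \[(?P<reason>[^\]]+)\] "
    r"on channel (?P<channel>\d+) "
    r"\[(?P<inst>[0-9a-f]+) (?P<proc>.+)\]\s*$"
)


def parse_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["addr"] = int(info["addr"], 16)
    info["channel"] = int(info["channel"])
    return info
```

Grouping many such lines by client and address would show whether the dead channels in a run share one cause.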
18:38 fdobridge: <karolherbst🐧🦀> I just want to have a proper baseline 🥲
18:58 fdobridge: <gfxstrand> yeah
19:09 fdobridge: <karolherbst🐧🦀> I switched to three threads, but still: `Pass: 284020, Fail: 8679, Crash: 15811, Warn: 4, Skip: 1346696, Flake: 290, Duration: 4:35:13, Remaining: 58:22`
19:14 fdobridge: <gfxstrand> Which HW is this?
19:15 fdobridge: <karolherbst🐧🦀> GP108
19:15 fdobridge: <karolherbst🐧🦀> maybe... I should have chosen a faster GPU
19:22 fdobridge: <mohamexiety> it's not influenced too much by that
19:23 fdobridge: <mohamexiety> I tried it on a 780 which is a bit faster than the 1030 and it still took hours
19:24 fdobridge: <karolherbst🐧🦀> I should check how stable a 20 threads run is... though I suspect the kernel will crash
19:24 fdobridge: <karolherbst🐧🦀> maybe I should figure out why it crashes
19:24 fdobridge: <karolherbst🐧🦀> would speed up everybody
20:59 fdobridge: <karolherbst🐧🦀> impressive `Pass: 343881, Fail: 10460, Crash: 19121, Warn: 4, Skip: 1632826, Timeout: 2, Flake: 360, Duration: 5:58:02, Remaining: 0`
21:04 fdobridge: <karolherbst🐧🦀> at least my MR is going well so far: `Pass: 2011, Fail: 2, Crash: 2, UnexpectedPass: 32, ExpectedFail: 161, Skip: 9792, Duration: 1:51, Remaining: 5:08:07`
21:09 fdobridge: <mohamexiety> 5 hours :c
21:09 fdobridge: <mohamexiety> a good start though
21:17 fdobridge: <airlied> thats about how long my turing laptop takes
21:19 fdobridge: <karolherbst🐧🦀> what...
21:19 fdobridge: <karolherbst🐧🦀> on turing it takes like 1 hour or something for me.. mhh
21:45 fdobridge: <airlied> Yeah the cpu seems to matter a lot
22:02 fdobridge: <karolherbst🐧🦀> yeah... though dead channels really limit throughput here...
22:03 fdobridge: <karolherbst🐧🦀> I'm sure we are missing something silly in pre Turing...
23:23 fdobridge: <gfxstrand> dakr: How long do you think it will take to go back and forth with people on the memory manager changes? IDK how responsive they've been.
23:24 fdobridge: <gfxstrand> @airlied I think I'm going to go ahead and make a Mesa MR. We'll leave it `Draft:` for now but that way people know it's coming. I think we're good to pull NVK into mesa/main as soon as the UAPI lands in drm-next.
23:24 dakr: gfxstrand: You mean the GPUVA manager changes for common dma-resv, extobj and evicted obj tracking?
23:26 fdobridge: <airlied> @faith we can always land the no_share flag in the uapi even if we don't wire it up just yet
23:27 gfxstrand: dakr: yeah
23:27 gfxstrand: airlied: Yeah, we can. That's kind-of what I'm thinking. As long as the flag works and we reject exports when it's set, that gives us the back-door we need.
23:31 dakr: gfxstrand: I already discussed the patch with the Intel folks, Matt even proposed it. I think I remember Boris liked it as well. I'm not sure about amdgpu though, if it also suits its needs.
23:32 dakr: I will get it in shape and push it on the mailing list soon. Usually folks are pretty responsive, hence I'd be optimistic.
23:33 gfxstrand: Okay, cool.
23:34 gfxstrand: Obviously, it'd be nice to have that when we land initial support but, like airlied said, we can land without as long as we have the NO_SHARE flag in the UAPI.
23:34 gfxstrand: dakr: Uh... I think your branch broke the old UAPI. That or I badly broke something on rebase. (-:
23:35 gfxstrand: Wait... that can't be right...
23:35 gfxstrand: I just did a full NAK run and that uses the old uapi.
23:35 gfxstrand: Something must have broken in the rebase
23:35 gfxstrand: crazy
23:39 airlied: seems unlikely to be kernel side, since the paths are pretty separated
23:40 dakr: There shouldn't be much difference other than the ktime tracking; I tend to think something has gone wrong with the rebase.
23:40 fdobridge: <karolherbst🐧🦀> @gfxstrand I've already merged https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24312 so you might want to rebase again
23:40 fdobridge: <karolherbst🐧🦀> didn't want to wait until my run is complete, but it's promising: `Pass: 147272, Fail: 19, Crash: 215, UnexpectedPass: 1865, ExpectedFail: 10567, Skip: 697981, Timeout: 3, Flake: 78, Duration: 2:38:11, Remaining: 3:31:46`
23:45 fdobridge: <gfxstrand> I'm rebasing now
23:47 gfxstrand: Okay, seems to be running fine now. No idea what happened there.
23:48 fdobridge: <gfxstrand> Also, even though it looks like I lost it, I still have the patch from Thomas to fix D32S8. I just squashed it into Dave's patch to add support.