00:36fdobridge: <airlied> don't they just work?
00:36fdobridge: <airlied> (if you don't unplug them 😛)
01:56fdobridge: <airlied> oh it explodes on probe bleh
02:20fdobridge: <airlied> @gfxstrand what compositor are you testing with?
02:20fdobridge: <airlied> is it using nouveau GL or zink?
02:20fdobridge: <gfxstrand> Gnome shell. Nouveau GL. Standard F40 setup except for the kernel
03:12fdobridge: <airlied> okay think I have it reproduced at least
03:15fdobridge: <gfxstrand> There's probably an easier way to repro than two GPUs but it would probably require something targeted.
05:50fdobridge: <airlied> @gfxstrand https://paste.centos.org/view/raw/b7704acb helps avoid crashing, but gives me garbage tiled rendering
05:51fdobridge: <airlied> it aligns with what radv does, since it assumes VRAM will probably get pulled into GART, so it tells the kernel that in advance
06:28fdobridge: <!DodoNVK (she) 🇱🇹> https://gitlab.freedesktop.org/mesa/mesa/-/commit/8a0afd127602023ee74c0d901303f3366b62ae06 💸
08:37fdobridge: <Sid> I'm so smart
08:37fdobridge: <Sid> I built the kernel without the nouveau kernel module
08:42fdobridge: <Sid> thankfully my pkgbuild does dirty builds too 😌
09:06fdobridge: <!DodoNVK (she) 🇱🇹> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29080 :triangle_nvk:
12:58fdobridge: <gfxstrand> Right... I'm a bit surprised the kernel didn't protect us a bit there. Still... Makes sense, I guess.
13:36fdobridge: <karolherbst🐧🦀> anybody here still doing nouveau GL vs zink benchmarking?
13:37fdobridge: <Sid> I was not, but I can if you want me to
13:38fdobridge: <karolherbst🐧🦀> I'd be interested how much this MR closes the gap in games: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28641
13:38fdobridge: <karolherbst🐧🦀> this MR is needed for silly gl driver internal reasons (command submission is a mess)
13:38fdobridge: <karolherbst🐧🦀> but it should improve performance in most games
13:41fdobridge: <Sid> hm, mind if I test it out later? I'll have to re-build the entirety of my mesa package since it's a nouveau gallium patch
13:42fdobridge: <karolherbst🐧🦀> yeah, it's not urgent
13:42fdobridge: <karolherbst🐧🦀> *no
15:54fdobridge: <gfxstrand> Okay, I've got a better (IMO) version of that patch in my tree now and I'm CTSing it to see if I blew anything up too badly. Then I'll look into the tiling stuff.
15:59fdobridge: <gfxstrand> @karolherbst What's the difference between `NOUVEAU_GEM_DOMAIN_CPU` and `NOUVEAU_GEM_DOMAIN_GART`?
16:07fdobridge: <gfxstrand> Oh, maybe that's used for ancient cache tracking. That would make sense.
16:08fdobridge: <karolherbst🐧🦀> yeah, honestly no idea 🙂
16:08fdobridge: <karolherbst🐧🦀> but yeah.. sounds like old GPU stuff
16:08fdobridge: <karolherbst🐧🦀> we don't use `NOUVEAU_GEM_DOMAIN_CPU` at all in mesa though
16:25fdobridge: <karolherbst🐧🦀> @gfxstrand maybe we should remove `NOUVEAU_GEM_DOMAIN_CPU` and see who complains?
16:25fdobridge: <karolherbst🐧🦀> because I'm sure _nothing_ uses it
16:26fdobridge: <karolherbst🐧🦀> mhhh
16:26fdobridge: <karolherbst🐧🦀> though the kernel sets NOUVEAU_GEM_DOMAIN_CPU if neither VRAM nor GART are set
16:27fdobridge: <karolherbst🐧🦀> anyway.. seems to be `TTM_PL_SYSTEM` vs `TTM_PL_TT`
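The fallback karolherbst describes (the kernel adds `NOUVEAU_GEM_DOMAIN_CPU` when neither VRAM nor GART is requested) and the `TTM_PL_SYSTEM` vs `TTM_PL_TT` distinction can be modelled roughly as below. This is only an illustrative sketch, not kernel code: the bit values are assumed to match the nouveau uAPI header, the VRAM to `TTM_PL_VRAM` mapping is an assumption added for completeness, and `effective_domain` and `ttm_placements` are hypothetical helper names.

```python
from enum import IntFlag


class GemDomain(IntFlag):
    # Bit positions assumed to mirror NOUVEAU_GEM_DOMAIN_* in the uAPI header.
    CPU = 1 << 0
    VRAM = 1 << 1
    GART = 1 << 2


def effective_domain(requested):
    """Model the fallback from the chat: CPU is added if neither VRAM nor GART is set."""
    if not requested & (GemDomain.VRAM | GemDomain.GART):
        requested |= GemDomain.CPU
    return requested


def ttm_placements(domain):
    """Map domain bits to TTM placement names (CPU -> SYSTEM, GART -> TT)."""
    names = []
    if domain & GemDomain.VRAM:
        names.append("TTM_PL_VRAM")  # assumed mapping, not stated in the chat
    if domain & GemDomain.GART:
        names.append("TTM_PL_TT")
    if domain & GemDomain.CPU:
        names.append("TTM_PL_SYSTEM")
    return names
```

Under this model, a request with no domain bits ends up as CPU only, which corresponds to plain system memory rather than GART-mapped memory.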
16:31fdobridge: <gfxstrand> Ugh... This is all deeply cursed.
16:39fdobridge: <gfxstrand> The kernel query for BO info tells me what domain it's currently in, not what domains it was created to support.
16:39fdobridge: <gfxstrand> Useless.
16:46fdobridge: <redsheep> I'm curious about the motivation for work like this. Do you think there's actual hope for the nouveau GL driver getting improved to the point of being the driver of choice?
16:49fdobridge: <karolherbst🐧🦀> well.. the author cared enough
16:50fdobridge: <karolherbst🐧🦀> but maybe it's significant enough also with toolkits
16:50fdobridge: <karolherbst🐧🦀> also
16:50fdobridge: <karolherbst🐧🦀> not all GPUs will support vulkan
16:50fdobridge: <redsheep> Yeah fair, I was just curious about your perspective
16:50fdobridge: <karolherbst🐧🦀> and it's just moving code around
16:51fdobridge: <karolherbst🐧🦀> however, if one would like to start refactoring big chunks of the driver, I'd simply suggest starting a new one, as that's probably less time consuming to get proper performance
16:51fdobridge: <mohamexiety> from an earlier MR from the author:
16:51fdobridge: <mohamexiety> > I have been working on a new gallium driver on the side for nouveau [...]
16:51fdobridge: <mohamexiety> so he's working on something new entirely for later
16:53fdobridge: <redsheep> Oh I missed that line, that's exciting. It does seem more ideal in the long run than having good GL be limited to vulkan capable cards, and always going through more abstraction
18:36fdobridge: <airlied> Yeah it's likely the bo pin on the import side should explode, but not sure taking out the compositor is a win
18:36fdobridge: <airlied> The export side has no real idea
18:46fdobridge: <!DodoNVK (she) 🇱🇹> I'm not sure if the gallium driver even works with GSP (2D engine stuff had to be removed from NVK to fix the weird GR class errors)
18:54fdobridge: <mhenning> The gallium driver is intended to work with gsp, and does work on my machine for at least basic stuff
19:09fdobridge: <gfxstrand> Yeah... I didn't think having the import fail if the BO doesn't support GART is actually that bad. That shouldn't cause the compositor to die. It'll probably pass the failure to the client.
19:15fdobridge: <airlied> Actually I expect the pin works, but the client isn't prepared to render to system ram properly
19:15fdobridge: <airlied> Hence faults
19:16fdobridge: <airlied> Though I've no idea about kinds and sysmem, and there might be bugs in the bo migration
19:38fdobridge: <!DodoNVK (she) 🇱🇹> I definitely know some relatively old modifier changes basically broke it on my PRIME setup (I discovered those errors with only the NVIDIA GPU enabled in the compositor)
22:28fdobridge: <gfxstrand> I don't think it's the fault of the HW that tiling isn't working. I've got a theory I'm going to test quick...
22:35fdobridge: <gfxstrand> Yeah, so the tiling issues have nothing to do with this, actually.
22:36fdobridge: <gfxstrand> The tiling issues are because the old GL driver ignores modifiers and tries to pull bits out of the BO. If the BO is shared across GPUs, it appears to be a non-nouveau BO and therefore doesn't have a PTE kind or tile mode. If NVK renders and passes to NVK, it's fine.
22:38fdobridge: <gfxstrand> If, on the other hand, I use weston with NVK+Zink on one GPU (running with the X11 back-end on top of nouveau GL + GNOME) and gears on the other GPU, I get one correct frame and then a timeout.
22:38fdobridge: <gfxstrand> Which is cool because it gives us a reproducer for the timeout issue. 😄
22:38fdobridge: <gfxstrand> We've got a crucible test I can probably modify to make a nicer reproducer
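The tiling theory gfxstrand gives at 22:36 (the old GL driver ignores modifiers and reads layout bits from the BO, which are missing on a cross-GPU import) can be illustrated with a toy model. This is a sketch under stated assumptions, not the driver's actual logic: `layout_from_bo_metadata` and `layout_from_modifier` are hypothetical names, `EXAMPLE_TILED_MODIFIER` is a placeholder rather than a real NVIDIA block-linear encoding, and the theory itself is questioned later in the log.

```python
# Values as defined in drm_fourcc.h.
DRM_FORMAT_MOD_LINEAR = 0
DRM_FORMAT_MOD_INVALID = 0x00FFFFFFFFFFFFFF

# Placeholder for "some vendor tiled modifier"; not a real encoding.
EXAMPLE_TILED_MODIFIER = 0x0300000000000001


def layout_from_bo_metadata(tile_mode):
    """Modifier-unaware path: trust whatever tiling info the BO carries."""
    return "tiled" if tile_mode else "linear"


def layout_from_modifier(modifier, tile_mode):
    """Modifier-aware path: fall back to BO metadata only without a modifier."""
    if modifier == DRM_FORMAT_MOD_INVALID:
        return layout_from_bo_metadata(tile_mode)
    return "linear" if modifier == DRM_FORMAT_MOD_LINEAR else "tiled"


# An imported BO with no tile mode attached, but exported with a tiled modifier:
imported_tile_mode = 0
print(layout_from_bo_metadata(imported_tile_mode))
print(layout_from_modifier(EXAMPLE_TILED_MODIFIER, imported_tile_mode))
```

In this model the metadata-only path reads the imported buffer as linear while the modifier-aware path reads it as tiled, which is the kind of mismatch that would show up as garbage tiled rendering.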
23:02fdobridge: <airlied> @gfxstrand I'm surprised shared across GPUs gives a non-nouveau bo
23:02fdobridge: <airlied> I thought we just checked some function pointer which I was pretty sure would always be the same
23:03fdobridge: <gfxstrand> hrm... Maybe?
23:04fdobridge: <gfxstrand> My conjecture there may be wrong
23:04fdobridge: <gfxstrand> It looks very much like nouveau GL treating it as linear, though. Not just a misc tiling problem.
23:14fdobridge: <airlied> my thoughts were tiling/kind not being used after sysmem migration in some place, but need to add a bit of debug printfs to work that out
23:16fdobridge: <gfxstrand> But Turing+ ignores kind for color surfaces AFAIK
23:19fdobridge: <gfxstrand> IDK why running the experiment on Weston+Zink+NVK causes timeouts
23:22fdobridge: <gfxstrand> Weston+OGL causes timeouts, too.
23:22fdobridge: <gfxstrand> The timeout is in gears
23:29fdobridge: <airlied> not device lost but timeouts?
23:37fdobridge: <gfxstrand> Well, the device gets lost because of a timeout
23:37fdobridge: <gfxstrand> But there's no other message in dmesg
23:37fdobridge: <gfxstrand> It smells very much like the EGL timeouts I've been seeing