IRC Logs of #nouveau on irc.freenode.net for 2023-08-28

00:02 fdobridge: <gfxstrand> I think you underestimate just how much only caring about a single rectangle that never resizes and doesn't have to be decorated simplifies things. It gets rid of like 90% of the hard problems.
00:02 fdobridge: <prop_energy_ball> I meant that the compromises are not only beneficial for simple cases.
00:03 fdobridge: <prop_energy_ball> They also make doing things much easier for any app
00:03 fdobridge: <gfxstrand> Yes. Easier and wrong.
00:03 fdobridge: <prop_energy_ball> What do you consider wrong?
00:04 fdobridge: <gfxstrand> Things getting out-of-sync.
00:05 fdobridge: <prop_energy_ball> Like swapchain size and window size?
00:05 fdobridge: <prop_energy_ball> Or something else
00:05 fdobridge: <gfxstrand> One of Wayland's core principles was "every frame is perfect". It achieved that at the cost of simplicity. Try too hard and fast to reintroduce that simplicity and you get cracks, tearing, widgets updating at different times. Basically you get X11 again.
00:06 fdobridge: <gfxstrand> One possible answer is to say, "No one cares. That was a dumb goal. Make things easy again."
00:07 fdobridge: <gfxstrand> But if you think you can achieve that goal while making everything massively simpler, I'm very curious to know how.
00:07 fdobridge: <gfxstrand> Then Vulkan made everything worse by trying to layer on 6 different window systems. 🤡
00:09 fdobridge: <prop_energy_ball> My solution would be that you allocate your own images away from the swapchain, they can be any size and you just QueueSubmit the ones you want. No OUT_OF_DATE mismatching or recreation to deal with.
00:10 fdobridge: <prop_energy_ball> Size of the window == size of dmabuf
00:10 fdobridge: <prop_energy_ball> Even simpler than what we have right now and would still solve the problem imo
00:10 fdobridge: <prop_energy_ball> Gamescope already does this really because *handwaving* gamescope wsi layer magic needs it
00:14 fdobridge: <gfxstrand> Yeah, too much got encapsulated. I'll grant that. Not as much too much as some are tempted to think but too much. In particular, the way you have to throw away the whole swap chain when you get out-of-date or suboptimal instead of just allocating some new images and phasing out the old ones.
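A minimal Python sketch of the "allocate some new images and phase out the old ones" alternative from the 00:14 message. It is a toy model under stated assumptions, with no real Vulkan or WSI calls: `Image`, `ImagePool`, `resize()`, `release()` and the `in_use` flag (standing in for the compositor still holding a buffer) are all hypothetical names, not part of any real API.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    # Hypothetical presentable image; in_use means the compositor still holds it.
    id: int
    width: int
    height: int
    in_use: bool = False


@dataclass
class ImagePool:
    # Hypothetical pool that replaces images on resize instead of being recreated.
    width: int
    height: int
    count: int = 3
    images: list = field(default_factory=list)
    next_id: int = 0

    def __post_init__(self):
        for _ in range(self.count):
            self._alloc()

    def _alloc(self):
        img = Image(self.next_id, self.width, self.height)
        self.next_id += 1
        self.images.append(img)
        return img

    def _current(self, img):
        return (img.width, img.height) == (self.width, self.height)

    def resize(self, width, height):
        # Allocate replacements at the new size; old images are not destroyed
        # wholesale, only dropped once nobody holds them.
        self.width, self.height = width, height
        for _ in range(self.count):
            self._alloc()
        self.collect()

    def collect(self):
        # Keep current-size images plus stale ones still held by the compositor.
        self.images = [i for i in self.images if self._current(i) or i.in_use]

    def acquire(self):
        # Only hand out idle images that match the current size.
        for img in self.images:
            if not img.in_use and self._current(img):
                img.in_use = True
                return img
        return None

    def release(self, img):
        img.in_use = False
        self.collect()


pool = ImagePool(800, 600)
frame = pool.acquire()                  # image 0, still held by the compositor
pool.resize(1024, 768)                  # images 3-5 allocated at the new size
print(len(pool.images))                 # 4: image 0 survives until released
pool.release(frame)
print([img.id for img in pool.images])  # [3, 4, 5]
```

Resizing only allocates replacements; a stale image that is still on screen lives until it is released, so nothing forces the whole pool to be torn down at once.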
00:14 fdobridge: <prop_energy_ball> Yeah...
00:15 fdobridge: <gfxstrand> That one is Android's fault.
00:15 fdobridge: <prop_energy_ball> Something something gralloc cruft?
00:16 fdobridge: <gfxstrand> Android swap chains
00:16 fdobridge: <prop_energy_ball> Will admit, never actually done low level Android stuff :)
00:17 fdobridge: <prop_energy_ball> But if it's as shortsighted as what I have experienced with gralloc in the past...
00:17 fdobridge: <prop_energy_ball> That is not to say I'm giving GBM a pass 🐸🐸🐸🐸🐸🐸
00:21 fdobridge: <gfxstrand> It's not so much shortsighted. They had reasons for everything they did. But also they have a very different design from Linux which is very different from Windows at the time which is very different from Windows now.
00:26 fdobridge: <gfxstrand> And making big changes to window systems is on the same order of magnitude as making changes to hardware. "Just make Android less dumb" isn't as easy as it sounds, even if it is all "just software".
01:46 raket: nouveau driver with gtx 780ti and nvidia driver with gtx 1080. optimus works... but crashes if toggling fullscreen :D
01:58 raket: gtx 780ti in pci-e 16x slot 2 and gtx 1080 in pci-e 16x slot 1. nouveau is blacklisted. The nvidia driver happily loads for the 1080 card but refuses to do so for the 780ti because it's not supported anymore. Loading the nouveau driver after the nvidia driver will enable the 780ti and start outputting on the dp-port... using DRI_PRIME=1 will use the gtx 1080 to render. To be honest, this is just silly. I think I shouldn't use this setup.
16:57 fdobridge: <gfxstrand> Well... NAK seems to be spilling. 🤯
20:03 fdobridge: <gfxstrand> `Pass: 400546, Fail: 935, Crash: 1943, Skip: 1729042, Timeout: 2, Missing: 495, Flake: 407, Duration: 1:42:28`
20:03 fdobridge: <gfxstrand> That's with 16 registers.
20:04 fdobridge: <gfxstrand> There's still bugs. I haven't hooked up mem<->mem copies yet and that's a lot of the crashes.
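The 20:03 summary line can be turned into a pass rate with simple arithmetic. A small Python sketch: the counts are copied from the log, and treating every non-skipped bucket (including Missing and Flake) as an executed test is an assumption about how the runner counts them.

```python
# Counts copied from the 20:03 summary line (NAK restricted to 16 registers).
results = {
    "Pass": 400546, "Fail": 935, "Crash": 1943, "Skip": 1729042,
    "Timeout": 2, "Missing": 495, "Flake": 407,
}

total = sum(results.values())
# Assumption: every bucket except Skip counts as an executed test.
executed = total - results["Skip"]
pass_rate = results["Pass"] / executed

print(total)               # 2133370
print(executed)            # 404328
print(f"{pass_rate:.2%}")  # 99.06%
```

Under that assumption, 3782 executed tests did not cleanly pass, and the Crash bucket alone is about half of them, which fits the remark that the missing mem<->mem copies account for many of the crashes.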
20:07 fdobridge: <mohamexiety> _nice!_ \o/
20:09 fdobridge: <mohamexiety> spilling 16 registers?
20:10 fdobridge: <gfxstrand> I need to do a bunch of refactoring and there's still some bugs but I'm really happy with it now.
20:11 fdobridge: <gfxstrand> I restricted everything down to 16 registers and spill to get everything to fit in the bottom 16.
20:18 fdobridge: <mohamexiety> aha, I see
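A toy Python illustration of what "restrict to 16 registers and spill" means: run a straight-line SSA program with at most `k` registers, and when a value must be evicted, pick the one whose next use is furthest away (the Belady-style furthest-next-use heuristic). This is not NAK's spiller, whose algorithm is not described here; the IR format, `next_use` and `spill_to_fit` are hypothetical, and it assumes `k` is at least the largest number of sources of any instruction.

```python
import math

# Hypothetical toy IR: each instruction is (dst, [srcs]) in SSA form,
# so every value is defined exactly once and never changes.
Program = list[tuple[str, list[str]]]


def next_use(prog: Program, val: str, start: int) -> float:
    """Index of the first instruction at or after `start` that reads `val`."""
    for j in range(start, len(prog)):
        if val in prog[j][1]:
            return j
    return math.inf


def spill_to_fit(prog: Program, k: int) -> tuple[int, int]:
    """Return (stores, loads) needed to run `prog` with at most k registers."""
    in_regs: set[str] = set()
    spilled: set[str] = set()  # values that already have a copy in memory
    stores = loads = 0

    def make_room(pos: int, protected: set[str]) -> None:
        nonlocal stores
        # Values with no remaining use simply die; no store needed.
        for v in [v for v in in_regs
                  if v not in protected and next_use(prog, v, pos) == math.inf]:
            in_regs.discard(v)
        # Evict the value used furthest in the future (ties broken by name).
        while len(in_regs) >= k:
            victim = max((v for v in in_regs if v not in protected),
                         key=lambda v: (next_use(prog, v, pos), v))
            in_regs.remove(victim)
            if victim not in spilled:  # SSA: one store per value is enough
                spilled.add(victim)
                stores += 1

    for i, (dst, srcs) in enumerate(prog):
        for s in srcs:
            if s not in in_regs:  # reload a previously spilled source
                make_room(i, set(srcs))
                in_regs.add(s)
                loads += 1
        for s in srcs:  # sources whose last use is here die now
            if next_use(prog, s, i + 1) == math.inf:
                in_regs.discard(s)
        make_room(i + 1, set())
        in_regs.add(dst)
    return stores, loads


prog = [("x", []), ("y", []), ("z", []), ("s", ["x", "y"]), ("t", ["s", "z"])]
print(spill_to_fit(prog, 3))  # (0, 0)
print(spill_to_fit(prog, 2))  # (2, 2)
```

With three registers the example fits; with two, `x`, `y` and `z` are all live at once, so two values have to be stored and later reloaded.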
20:19 fdobridge: <karolherbst🐧🦀> @gfxstrand up for some pre turing test later this week?
20:19 fdobridge: <karolherbst🐧🦀> I might have a kernel patch soon you wanna try out
20:19 fdobridge: <gfxstrand> Okay. What does it do?
20:19 fdobridge: <karolherbst🐧🦀> makes channel recovery less broken
20:20 fdobridge: <karolherbst🐧🦀> I've bisected the kernel and kinda figured out since when it became super unstable pre turing
20:20 fdobridge: <karolherbst🐧🦀> not sure _when_ I'll have a patch, but I don't think it will take too much as it's basically just messed up locking
20:24 fdobridge: <gfxstrand> Cool!
20:24 fdobridge: <gfxstrand> Yeah, let me know when you have the patch. I'm happy to give it a go.
20:25 fdobridge: <karolherbst🐧🦀> the painful part is, it's like three regressions in one kernel release 🥲
20:26 fdobridge: <karolherbst🐧🦀> and I also have to check how that behaves on other GPUs, because atm I'm only checking with fermi, because that's what a user reported the bug on... but at least it seems a bigger rework landed in 6.2 touching most of the channel recovery stuff, so that gave me quite some good leads.
20:31 airlied: so the rework that stabilises turing+ made the older ones flakier?
20:31 karolherbst: not sure if that's a turing+ rework, but it was part of ampere enablement
20:33 karolherbst: anyway, it's kinda between c358f53871605a1a8d7ed6e544a05ea00e9c80cb and b084fff210bfd00de5cdef1802291272c77f581d
20:34 karolherbst: with the first commit it just runs into a locking assert
20:35 karolherbst: the latter one kinda introduced the current behavior, at least on fermi
20:38 karolherbst: but at least that assert should also be hit on other gens...
20:41 fdobridge: <karolherbst🐧🦀> if you have some spare cycles you can also check if running on top of 6.1 is way more stable on older gens, so we can be kinda sure it's the same issue. Not sure if you have the setup to just loop running the CTS while also continuing to work on NAK 😄
20:42 fdobridge: <karolherbst🐧🦀> but I think I'll have a patch ready by tomorrow
20:42 fdobridge: <gfxstrand> I don't have to loop the CTS. It pretty reliably blows up. 🤡
20:42 fdobridge: <karolherbst🐧🦀> heh
20:42 fdobridge: <karolherbst🐧🦀> yeah, but my point is, on 6.1 it shouldn't blow up
20:43 fdobridge: <karolherbst🐧🦀> or maybe less so...
20:43 fdobridge: <karolherbst🐧🦀> there might be more issues which were already there
20:45 fdobridge: <gfxstrand> Yeah, Maxwell has definitely gotten worse, I think.
20:46 fdobridge: <karolherbst🐧🦀> yeah... I'll have to test with other GPUs as well, but luckily a user reported a problem which like crashes contexts left and right immediately in epiphany and on 6.1 there is not even a freeze in the GUI, while on 6.2 it just brings down the GPU and nouveau 🙃
20:47 fdobridge: <karolherbst🐧🦀> so hopefully other GPUs are also affected, but I'll test that tomorrow
20:48 fdobridge: <karolherbst🐧🦀> all the gens at least use more or less the same code with just a few differences
20:49 karolherbst: it's either https://gitlab.freedesktop.org/drm/nouveau/-/commit/4d60100a23ec5b98e43277d82e5de53c359cf02c or https://gitlab.freedesktop.org/drm/nouveau/-/commit/b084fff210bfd00de5cdef1802291272c77f581d
20:50 karolherbst: the first commit just doesn't outright work on fermi and it doesn't even boot up properly
21:00 fdobridge: <karolherbst🐧🦀> yeah... it's also way more stable on kepler before those changes
21:00 fdobridge: <karolherbst🐧🦀> checking pascal, just to be sure
21:01 fdobridge: <karolherbst🐧🦀> it's less broken on kepler than on fermi however
21:14 fdobridge: <karolherbst🐧🦀> yeah.. so it seems to be less bad on other GPUs, but the fault recovery feels way more smooth with the older code
21:14 fdobridge: <karolherbst🐧🦀> but fermi is just plain broken now
21:24 fdobridge: <airlied> So those patches are the ones that made Turing so stable I'm pretty sure
21:51 fdobridge: <karolherbst🐧🦀> yeah, quite likely
21:51 fdobridge: <karolherbst🐧🦀> I'll have to take a deeper look and see what's going on with them, unless somebody else wants to do that 😄
23:18 fdobridge: <airlied> @gfxstrand not sure how much you are looking after your nvk kernel branch but you might want to pull in the uapi change patch just in case
23:26 fdobridge: <gfxstrand> Right... Yeah, I should. I'm running drm-misc-fixes right now. I should just push that.