00:11 gfxstrand[d]: snowycoder[d]: We need to because NVK needs to lower the surface info reads to descriptor reads. That or we're building fake descriptor heaps.
01:53 damo22: i am trying to reverse engineer nvidia optimus on old laptops, which, as i understand, is an ACPI mechanism to power down the dGPU. Can someone point me to the code in nouveau where the _DSM method is checked for Optimus support, or where _PS3 is called instead? I have almost implemented it in coreboot, but the nouveau driver crashes when i resume from S3
01:56 damo22: as far as i can tell, the optimus acpi _DSM usually toggles an acpi flag called OMPR to tell the _PR3 method whether to power down the root port or not, but i think it can just always do this on models known to have optimus.
02:00 DodoGTA: damo22: This could be a start: https://github.com/torvalds/linux/blob/v6.14/drivers/gpu/drm/nouveau/nouveau_acpi.c
02:06 damo22: thanks
02:25 damo22: i can't seem to find anything that resumes the card from S3
02:28 damo22: doesn't it need to repower the dGPU?
02:30 damo22: _PS0
02:41 damo22: or is it the BIOS's job to call _PS0 on the PEG port when resuming?
03:20 mangodev[d]: mhenning[d]: may've found the issue and i feel really stupid if it's what i think it was
03:20 mangodev[d]: not trying to celebrate too early and mess it up though
04:57 airlied[d]: _PS0 gets called by the PCI core
04:57 airlied[d]: And _PS3
04:58 airlied[d]: The _DSM call chooses between D3hot and D3cold
04:58 airlied[d]: I can't remember all the details; older Optimus didn't use _PR0/_PR3, it was all _DSM based, and then later ones used the _PR methods with _DSM switching to cold
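For context, a minimal sketch of probing that Optimus _DSM with the Linux ACPI helpers; the GUID and revision are the ones nouveau_acpi.c appears to use, so verify them against your tree before relying on this:

```c
#include <linux/acpi.h>

/* Optimus _DSM GUID as nouveau_acpi.c appears to define it; verify
 * against the kernel tree you are targeting. */
static const guid_t optimus_dsm_guid =
	GUID_INIT(0xA486D8F8, 0x0BDA, 0x471B,
		  0xA7, 0x2B, 0x60, 0x42, 0xA6, 0xB5, 0xBE, 0xE0);

/* Probe whether the firmware implements any Optimus _DSM functions.
 * Bit 0 of the mask is the standard "functions supported" query;
 * revision 0x100 is an assumption borrowed from nouveau. */
static bool optimus_dsm_present(acpi_handle handle)
{
	return acpi_check_dsm(handle, &optimus_dsm_guid, 0x00000100,
			      1 << 0);
}
```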
04:59 damo22: airlied[d]: i got it to call _PS3 and discovered the PM7H register bit to toggle the dGPU on and off, but something is happening when it tries to resume; i think it hangs before completing the S3 suspend, and then the resume hangs
05:02 airlied[d]: https://lists.freedesktop.org/archives/nouveau/attachments/20111217/cec063c2/attachment.patch
05:02 airlied[d]: Is as good as my memory of the times
05:04 damo22: someone else was working on this in coreboot and i tried to resurrect his work, but it's difficult to debug
05:06 damo22: will it completely break if i don't implement _DSM at all and only implement _PR0/_PR3?
05:17 damo22: _PR0 is not defined in the vendor ACPI
05:17 damo22: for the PEG port
06:25 mangodev[d]: mhenning[d]: damn, the gitlab is down again
06:25 mangodev[d]: has there been some maintenance or something as of late?
06:32 airlied[d]: does look dead
06:35 mangodev[d]: why does it keep going down when i want to actually use it :(
06:35 mangodev[d]: what's worse is i think someone has had a similar issue before, going off a google search
06:35 mangodev[d]: there was an issue for nvk+librewolf crashing, but i couldn't look deeper into it because… the gitlab is down…
06:36 mangodev[d]: wasn't it down yesterday too? is it down every night for maintenance or something?
06:39 airlied[d]: was up like 5-6 hrs ago when I pushed drm-fixes branch, but broke after that
06:40 swee: The alpine linux gitlab works fine for me
06:53 airlied[d]: seems to be coming back up
08:22 falconm[m]: <mhenning[d]> "mangodev[d]: please file a bug..." <- is there a discord? if so, may someone send me the link? I would greatly appreciate it, since I am more used to discord than matrix
08:23 mangodev[d]: falconm[m]: https://discord.gg/h8WadAXd
08:23 mangodev[d]: it's "unofficial"
08:23 mangodev[d]: """unofficial"""
08:27 snowycoder[d]: gfxstrand[d]: Makes sense, then I'll need to expose a lot of sucalc internals to NIR.
08:27 snowycoder[d]: With a bit of rework I figured out `image_deref_samples` and passed a test, so the descriptor magic and nil bindings are working.
08:27 snowycoder[d]: We "just" need to write the whole sucalc dance, which is the hard part
08:28 mangodev[d]: snowycoder[d]: i'm curious
08:28 mangodev[d]: why does nvk seem to have a lot of `VK_DEVICE_LOST` errors relative to other mesa drivers? i've noticed it a lot more in issues compared to anv or radv
08:30 mangodev[d]: most of the issues of programs crashing are from the device being "lost" on nvk, rather than from bugs in the spec implementation or in how the hardware is used
08:30 mangodev[d]: how does a device get "lost" *while running the program?*
08:30 mangodev[d]: how does the device handle get lost while running a program using that device handle?
08:31 mangodev[d]: or is this a byproduct of classic nvidijank™
08:32 x512[m]: Proper software should handle VK_DEVICE_LOST and recreate the Vulkan context.
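A minimal sketch of that recovery pattern; `destroy_all_device_objects()` and `recreate_device()` are hypothetical application helpers, not Vulkan API:

```c
#include <vulkan/vulkan.h>

/* Hypothetical application helpers (declarations only). */
void destroy_all_device_objects(void);
VkDevice recreate_device(VkPhysicalDevice phys);

/* Sketch: submit work and, on device loss, rebuild from the physical
 * device instead of treating the error as fatal. */
static void submit_with_recovery(VkQueue queue, const VkSubmitInfo *submit,
                                 VkFence fence, VkDevice *device,
                                 VkPhysicalDevice phys)
{
	if (vkQueueSubmit(queue, 1, submit, fence) == VK_ERROR_DEVICE_LOST) {
		/* Every child object of the lost VkDevice is now invalid. */
		destroy_all_device_objects();
		vkDestroyDevice(*device, NULL);
		*device = recreate_device(phys); /* vkCreateDevice + re-init */
	}
}
```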
08:32 mangodev[d]: what's weird is that i've even had the issue propagate through zink before and give a `GL_DEVICE_LOST`
08:32 mangodev[d]: nothing can keep track of nvk if you exist hard enough
08:33 mangodev[d]: is it something to do with the nouveau GSP that nvk runs on top of?
08:36 x512[m]: Never experienced device lost yet on Haiku Zink NVRM.
08:37 x512[m]: Zink -> NVK
08:37 x512[m]: Maybe Nouveau KMD bug.
08:40 mangodev[d]: maybe, i've had it on quite a few applications
08:40 mangodev[d]: from proton to firefox to minecraft (through zink, haven't had it on native vulkan yet) to steam
08:41 mangodev[d]: the "proper software should handle it" comment implies that there's software proper enough to handle it 🙃
08:42 x512[m]: Firefox seems to handle it.
08:42 snowycoder[d]: mangodev[d]: `VK_DEVICE_LOST` happens when there's any crash on the graphics card.
08:42 snowycoder[d]: That means either something takes too long and is killed for DoS prevention, or the GPU crashes on some commands.
08:42 snowycoder[d]: Do you have some kernel logs?
08:43 mangodev[d]: snowycoder[d]: varies on the issue, but i'd have to go dig them up when i encounter one again
08:43 mangodev[d]: mhenning[d]: wait a second
08:43 mangodev[d]: there's already an issue for this
08:43 mangodev[d]: that you even replied to before 🫠
08:43 mangodev[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12014
12:17 karolherbst[d]: `cmat_convert` is annoying 😢
13:58 gfxstrand[d]: mangodev[d]: I don't know how much we have relative to other drivers. NVK is still a relatively new driver, though, so there are still bugs.
14:00 gfxstrand[d]: I suspect, though, that what you're observing (if anything) is less about how many bugs we have (though we do have bugs) and more about how those bugs manifest. Nvidia hardware has a lot of validation built in and whenever you program something wrong, it detects it and kicks your context. On Intel or AMD, invalid programming tends to result in a GPU hang where you noticeably see everything lock up
14:00 gfxstrand[d]: and then the kernel driver tries to reset the GPU.
14:02 gfxstrand[d]: But we do have bugs. I haven't had time or energy in a while to try and burn down the issue tracker.
14:07 gfxstrand[d]: gfxstrand[d]: One of the annoying things about working on GPUs is that they all tend to have some sort of sinkhole where all the bugs end up. On Intel, it's GPU hang. On Nvidia, it's `VK_ERROR_DEVICE_LOST`. This causes a lot of issues with bug reporters because they'll see one issue with a GPU hang or lost device and jump in saying, "I'm seeing the same bug on <insert app name here>!" when they're
14:07 gfxstrand[d]: two completely unrelated bugs. That just happens to be that driver/hardware's sinkhole.
14:13 x512[m]: Hardware error detection is much better than a global GPU hang.
14:13 x512[m]: gfxstrand[d]: Can't Intel/AMD preempt a hung GPU context?
14:14 gfxstrand[d]: x512[m]: Oh, for sure. And Nvidia is often nice enough to even leave us little breadcrumbs to tell us where the issue is (though not always).
14:16 gfxstrand[d]: x512[m]: Sort of? Depends on the hardware generation and the nature of the hang. On Intel, they actually categorize hangs based on whether or not they can preempt. If they try to preempt and it fails, that's one kind of hang. If they can preempt but then it still doesn't complete for too long, that's another. I don't remember the naming off hand.
14:17 gfxstrand[d]: Then there's also context vs. full-chip hangs.
14:18 x512[m]: I believe that a full-chip hang is a hardware or KMD bug. Userland should not be able to hang the whole system. No matter what the Mesa driver does, it should never happen.
14:19 gfxstrand[d]: On Intel, the typical "you hit a hardware assert" kind of hang is, I believe, typically a non-preemptable context hang. So it won't affect other contexts but it'll cause the machine to lock up a bit.
14:20 gfxstrand[d]: x512[m]: In an ideal world, yes. In the world we live in, full-chip hangs are all too common.
14:21 gfxstrand[d]: Fortunately, with an iGPU (most of Intel), you don't have VRAM so no data is lost, even with a full-chip hang.
14:21 x512[m]: What about GPU context state, registers etc.?
14:21 gfxstrand[d]: But AMD was really bad for a long time. Their newer cards have a lot fewer full-chip hangs but they're still not perfect.
14:23 gfxstrand[d]: x512[m]: If it's a context hang, you lose all that state for the hanging context but not the other contexts. For a full-chip hang, anything that was running is effectively toast. And with more and more hardware scheduling, "anything that was running" can be a lot of things.
14:25 kar1m0[d]: gfxstrand[d]: my only real issue with nvidia drivers is that they cap laptop gpus to only 80w and I can't change that
14:26 gfxstrand[d]: Back in the days when the scheduler was all on the CPU, full-chip hangs weren't really that different from context hangs, at least for iGPUs. Only one or maybe two things were in flight at any given time (one running and one queued) so the kernel knew exactly what was running and could isolate the hang. These days with firmware scheduling, though, basically any process that has outstanding work
14:26 gfxstrand[d]: should be considered running.
14:26 kar1m0[d]: it isn't much of an issue, but still, they could have at least done something about it... then again, it's nvidia
14:26 gfxstrand[d]: Laptop thermals are hard
14:27 kar1m0[d]: gfxstrand[d]: I managed to make mine work
14:27 kar1m0[d]: with nbfc configs
14:27 kar1m0[d]: but they do not work as they worked on windows obviously
14:27 kar1m0[d]: but they also sometimes work through bios settings but I do not touch those
14:39 x512[m]: Thermal control is the first thing that should be handled by hardware/firmware. For safety reasons.
14:48 gfxstrand[d]: Yeah... CPUs controlling thermals doesn't typically result in things catching fire but you really want that as close to the hardware as possible.
14:49 gfxstrand[d]: At one point, I had a ThinkPad where the best known method for controlling thermals was a userspace daemon that smashed sysfs entries to adjust fans. Sketchy as hell...
14:50 kar1m0[d]: it is kind of scary to use such software, but I at least hope that it is enough; Mission Center showing me gpu and cpu temperatures is the only real measure
14:51 kar1m0[d]: although it usually jumps from one temperature to another
14:52 kar1m0[d]: so I can't say it's accurate but I can feel if my laptop overheats when the laptop frame is hot
14:52 kar1m0[d]: but so far my hardware is fine, dunno if it will be in the long term tho
14:56 gfxstrand[d]: Yeah, hard to say. Running too hot can slowly damage solder connections. See also all the dead PS3 problems.
14:58 kar1m0[d]: I doubt it runs hotter than it used to on windows, considering I had 90 degrees celsius on my cpu when my fans worked at max speed with hp's proprietary software on windows, which is not the case on linux, and nbfc configs don't make my fans spin like crazy
14:58 kar1m0[d]: while gaming
15:42 x512[m]: Is it allowed to render to multiple non-intersecting parts of the same buffer from multiple contexts?
16:00 snowycoder[d]: I understood block linear 🤯
16:00 snowycoder[d]: God bless envytools
17:12 snowycoder[d]: Oh god, suldga doesn't take a normal 64-bit offset; it takes a 32-bit shifted high part, and the lower part is extracted from a bit-field.
17:12 snowycoder[d]: Sucalc is slowly making sense, and it's cursed.
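For illustration only, the general shape of that split in C; the shift width below is a placeholder, not the actual suldga encoding:

```c
#include <stdint.h>

#define ADDR_LO_BITS 8 /* placeholder width, not the real field size */

/* A 64-bit surface address split into a pre-shifted 32-bit high part
 * plus low bits packed into an instruction bit-field. */
struct split_addr {
	uint32_t hi; /* addr >> ADDR_LO_BITS, carried as a plain value */
	uint32_t lo; /* low bits, destined for the instruction bit-field */
};

static struct split_addr split_surface_addr(uint64_t addr)
{
	return (struct split_addr){
		.hi = (uint32_t)(addr >> ADDR_LO_BITS),
		.lo = (uint32_t)(addr & ((1u << ADDR_LO_BITS) - 1)),
	};
}
```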
18:26 behindpcibars: [abusive spam removed]
19:51 f_: karol, I guess that's why you wanted to turn on +M... :/
19:52 karolherbst: yeah... probably...
19:52 karolherbst: though I doubt it will help _that_ much
19:52 karolherbst: well.. let's see
19:53 karolherbst: uhh.. how did this work again..
19:54 karolherbst: mhhh
19:54 karolherbst: though
19:54 karolherbst: yeah.. that kills the bridge
19:55 f_: yeah
19:55 f_: you could autovoice *[d]*!*@*
19:57 karolherbst: f_: how can I grant autovoice tho?
19:57 dwfreed: karolherbst: +M doesn't help
19:57 karolherbst: yeah....
19:58 dwfreed: (bridge aside)
19:58 f_: ah well, turns out that .... person uses registered nickserv accounts
19:58 f_: Ugh
19:58 karolherbst: dri-devel is +M and it doesn't help there.. yeah...
19:58 karolherbst: yeah.. if you really really really really want to, you can get around any ban
19:59 f_: /mode +iI *[d]*!*@* :p
19:59 f_: (no that's not a good idea)
20:00 karolherbst: the bridge has a static IPv6
20:00 f_: yeah I know but you get it
20:00 karolherbst: and each puppet gets its own address
20:00 karolherbst: yeah..
20:00 f_: something that comes to mind is maybe +m and manually adding people to chanserv for autovoice, but that ought to be tedious :-(
20:01 f_: so probably not an option
20:02 dwfreed: On my todo list is a bot that'll do a better job
20:02 f_: Bringing back AntiSpamBot? :o
20:02 dwfreed: Bit more powerful than that
20:02 f_: nice
20:03 karolherbst: I'm curious what would even help
20:03 f_: I'm wondering what VPN(s) they're using
20:04 f_: but I guess if it were as easy as blocking vpns it perhaps would already have been done...
20:04 tiredchiku[d]: there's protonVPN in there for sure
20:04 tiredchiku[d]: and afaik karol was against blocking tor and VPNs because of the valid use cases
20:04 f_: > #nouveau has *!*@*.tor-irc.dnsbl.oftc.net banned by reticulum.oftc.net on July 16 2024 at 23:21
20:05 f_: but anyway this can be evaded by simply setting a cloak
20:05 dwfreed: there's some things I haven't been able to check in my current setup
20:07 dwfreed: I can usually spot him if I'm babysitting the channel
20:08 f_: You mean by how he speaks or by some connection info users can't see?
20:08 f_: (or can see)
20:08 Sid127: the nick is probably always a dead giveaway
20:08 dwfreed: I will say I can spot him before he speaks
20:09 dwfreed: I don't want to give too much away, as I suspect he has a permanent connection in these channels
20:10 f_: well the channels are usually publicly logged anyway
20:16 HdkR: Yea, they watch the logs, which isn't new.
22:51 mangodev[d]: quick question
22:51 mangodev[d]: what flag(s) do i need to add to build mesa with debug symbols?
22:52 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1365822859611476029/image.png?ex=680eb529&is=680d63a9&hm=68ce6f9154f8f50a903f232ad6f40f724029e3f51aa33134bae564066530a737&
22:52 mangodev[d]: i'm asking because of this issue comment
22:53 HdkR: mangodev[d]: -Dbuildtype=debugoptimized should
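For reference, the usual invocation of that flag (the build directory name is just convention):

```
meson setup build/ -Dbuildtype=debugoptimized
ninja -C build/
```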
22:54 mangodev[d]: thank you :D
22:58 mangodev[d]: can't wait to be able to use hw-accel firefox again
22:58 mangodev[d]: it's been painfully slow for the past day because i now have it in software mode
22:58 mangodev[d]: it sucks because without this crash, the experience would be flawless
22:58 mangodev[d]: smoother web browsing than my phone
22:59 mangodev[d]: i have a feeling a similar crash is happening in discord as well, because it's also a SIGSEGV
22:59 mangodev[d]: although at least discord/electron silently recovers from the crash
23:01 mangodev[d]: any other libraries i should update with mesa-git?
23:10 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1365827292668100648/image.png?ex=680eb94a&is=680d67ca&hm=f812b0d2352cdc0267562ed158a3958e5364fd0f29d759c6f0919bbfba3058a7&
23:10 mangodev[d]: i would check the wiki, but uhhhh…
23:11 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1365827646034018427/image.png?ex=680eb99e&is=680d681e&hm=87d7af699383e85e7cfddb49d73c16cd61ec062d06bff616d57d6148a9fe3e15&
23:11 mangodev[d]: wait
23:11 mangodev[d]: is freedesktop down AGAIN??
23:23 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1365830791711686706/image.png?ex=680ebc8c&is=680d6b0c&hm=0106c9d95b90f5a8615c1df1a4f4991527b6b4bcea5867a1585fc63e17838365&
23:23 mangodev[d]: yeah i think freedesktop is having an outage… *again*
23:24 mangodev[d]: admittedly only partially though