01:13 fdobridge: <!DodoNVK (she) 🇱🇹> Why did I get this error after my display got unplugged (due to a power outage)?: `Apr 27 06:17:19 RenoirBeast kernel: nouveau 0000:01:00.0: gsp: cli:0xc1d00002 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x0000ffff`
01:15 fdobridge: <!DodoNVK (she) 🇱🇹> And this mess of kernel errors after waking up the GPU to make the display work again
01:15 fdobridge: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1233949796046340106/message.txt?ex=662ef4cd&is=662da34d&hm=81ab539db2097080ee1f4f3f8bc722a383cbe46b232f0bcb9e1da008aa00d82b&
02:10 fdobridge: <rinlovesyou> Did you experience 1 second outages too?
02:10 fdobridge: <rinlovesyou> Happened twice for me earlier today
02:11 fdobridge: <rinlovesyou> Only displays rebooted (and my ceiling light). the psu managed to keep my pc on during that
02:22 fdobridge: <!DodoNVK (she) 🇱🇹> This was actually 1 hour according to dmesg
02:23 fdobridge: <rinlovesyou> What
02:23 fdobridge: <rinlovesyou> How does the power go out for an hour and only your displays get disconnected
02:24 fdobridge: <rinlovesyou> Or was this an error that happened in the 1 second or so your psu keeps your pc alive before dying due to no power
02:25 fdobridge: <!DodoNVK (she) 🇱🇹> Laptop momento
02:25 fdobridge: <rinlovesyou> Oh
02:25 fdobridge: <rinlovesyou> I was experiencing "mini outages" here in Germany earlier
02:25 fdobridge: <rinlovesyou> Where the power went out for just a second and only my displays went off
02:26 fdobridge: <rinlovesyou> Impressed that my psu kept the pc running
02:26 fdobridge: <rinlovesyou> Even my router powered through
06:11 fdobridge: <!DodoNVK (she) 🇱🇹> I think I had these too (but this wasn't one of them)
06:11 fdobridge: <!DodoNVK (she) 🇱🇹> In that case the oven clock would still be correct
07:37 fdobridge: <ahuillet> that's an aux transaction error thing that Lyude fixed I think (?) you need a git kernel
07:47 fdobridge: <!DodoNVK (she) 🇱🇹> I see the fix is already in 6.9-rc5
11:48 fdobridge: <pavlo_kozlenko> Do you have a plan for automatic reclocking on GM10x Maxwell, Kepler and Tesla G94-GT218? Have you decided to just focus on Turing and newer architectures and not raise this issue?
11:49 fdobridge: <Sid> ...not this again
11:50 fdobridge: <pavlo_kozlenko> Oh, I see, everything will stay as it is?
11:52 fdobridge: <Sid> unless nvidia releases redistributable firmware for those cards, yes
11:57 fdobridge: <pavlo_kozlenko> We need signed firmware for automatic reclocking?!
12:06 fdobridge: <magic_rb.> @pavlo_kozlenko since the others have explained this a million times, I'll do it this time. I'm not a dev, just a user, but I've asked this exact question in the past.
12:09 fdobridge: <magic_rb.> Maxwell down is way too old to properly support Vulkan, and the focus is now on Vulkan. It would be possible but would require a lot of work and the performance would be horrible. Therefore those generations will probably not get much development. I assume they'll be kept around as much as "getting display out" requires.
12:09 fdobridge: <magic_rb.> Pascal and Maxwell, while possible to implement Vulkan on reasonably well, do not have the GSP, which is a riscv/arm SoC inside the GPU. It takes GSP firmware and takes over many of the tasks the driver would otherwise have to do. Nvidia releases the GSP firmware under a proprietary but redistributable license, which means nouveau can make use of said firmware for full support.
12:11 fdobridge: <magic_rb.> You might then ask why not use the firmware Nvidia used for Maxwell and Pascal? Well, because we can't, it's not redistributable. OK, so then why not write our own? Well, that's what we do, but due to reasons which I'll explain later, Nvidia requires the firmware to be signed from Maxwell up, and we don't have the signing keys. One of the things we do not get without the correct signature is reclocking
12:12 fdobridge: <magic_rb.> And as for why the signature check, I suspect (purely my speculation) that it's because of scammers who would get a 1060, flash it with a modified BIOS reporting itself as a 1080ti, and sell it online (which is impossible with signed firmware)
12:13 fdobridge: <magic_rb.> Hopefully I didn't miss anything
12:15 fdobridge: <magic_rb.> And I've no clue why Nvidia won't release redistributable firmware for Pascal/Maxwell, but they do have a good reason. Licensing can be very complicated
12:17 fdobridge: <Sid> yeah, it's likely legal things for nv
12:17 fdobridge: <Sid> and those cards are *old*
12:17 fdobridge: <Sid> turing is already 6 years old too
12:18 RSpliet: ultimately it'll cost them time (legal, but also just packaging and distribution) and thus money. And I don't think they'll see a return from that investment as they won't be selling more pascal/maxwell cards as a result. Just a tiny bit of goodwill from the community if they do.
12:19 fdobridge: <magic_rb.> I'd like to see Pascal as much as the next guy since I've a 1060 in my desktop, but yeah, probably not happening
12:19 fdobridge: <magic_rb.> By the time it does I'll buy a new GSP GPU anyway, the 1060 is already old af
12:20 fdobridge: <magic_rb.> Oh my, it's 8 years old holy shit, time flies
12:20 fdobridge: <Sid> I believe NV's focus when it comes to linux is enterprise users, and GeForce cards get the benefits as a side effect due to shared architecture
12:21 fdobridge: <magic_rb.> Indeed, the linux gaming numbers are so small, especially on nvidia, that it makes no sense for them to care
12:21 fdobridge: <Sid> no, they do care
12:21 RSpliet: Sid: and potentially automotive https://corp.mediatek.com/news-events/press-releases/mediatek-brings-advanced-ai-capabilities-to-vehicles-with-new-dimensity-auto-cockpit-chipsets-enabled-by-nvidia-technology
12:21 fdobridge: <Sid> nv linux drivers do get game specific fixes
12:22 fdobridge: <Sid> just that gamers are not the biggest stakeholders when it comes to linux
12:22 fdobridge: <Sid> mhm
12:22 fdobridge: <magic_rb.> Isn't that because they trickle down from the windows side? Or do we also get linux specific game specific fixes?
12:22 Sid127: RSpliet: yeah, enterprise/servers/AI solution
12:22 Sid127: @magic_rb. no we do get linux specific fixes
12:23 Sid127: I remember seeing stuff for Spider-Man Remastered and Starfield in the changelogs
12:23 fdobridge: <magic_rb.> Huh, right the doom eternal situation as an example
12:23 Sid127: there's also changelog lines explicitly mentioning VKD3D-Proton sometimes
12:23 fdobridge: <magic_rb.> Cool!
12:24 RSpliet: Heh interesting... maybe they're after a slice of the Steam deck pie, and know that they won't get it unless it's OSS...
12:24 fdobridge: <Sid> ```
12:24 fdobridge: <Sid> September 28th, 2023 - Windows 537.54, Linux 535.43.10
12:24 fdobridge: <Sid>
12:24 fdobridge: <Sid> New:
12:24 fdobridge: <Sid> VK_EXT_map_memory_placed [Linux]
12:24 fdobridge: <Sid> Fixes:
12:24 fdobridge: <Sid> Updates to provisional VK_NV_displacement_micromap implementation, now with glslang support
12:24 fdobridge: <Sid> Fix driver crash with Starfield running under vkd3d related to VK_NV_device_generated_commands and VK_EXT_device_generated_commands_compute
12:24 fdobridge: <Sid> Fix issue with vkCmdFillBuffer when the base address is not 16B aligned
12:24 fdobridge: <Sid> Fix vkCmdCopyQueryPoolResult with VK_QUERY_TYPE_TIMESTAMP and the last entry in the query pool
12:24 fdobridge: <Sid> Fix vkResetCommandPool issue when used on a command buffer in the recording state
12:24 fdobridge: <Sid> ```
12:25 Sid127: apologies for that bit of spam, RSpliet :p
12:25 Sid127: but yeah, starfield running under vkd3d-proton is an explicit linux oriented fix
12:25 fdobridge: <magic_rb.> Well good to know, nice to be wrong in the best way possible
12:26 Sid127: afaik the last vulkan beta driver also has a fix for Dragon's Dogma 2 and its nvidia reflex interaction
12:26 Sid127: by last, I mean the one that released 4 days ago
12:27 Sid127: as well as a fix for vkd3d-proton regarding frame IDs
12:27 Sid127: also hi ahuillet :wave:
12:27 fdobridge: <ahuillet> the NVIDIA proprietary driver shares code between Linux and Windows, so fixes on one OS appear in the other automatically
12:27 fdobridge: <ahuillet> hey Sid127. IRC now? :)
12:28 Sid127: right, but there exist loads on linux gaming that don't on windows
12:28 Sid127: and hence, issues
12:28 fdobridge: <Sid> and yeah I keep flip flopping between the two c:
12:30 fdobridge: <magic_rb.> Since I'm bridging through Matrix I don't see Discord nicknames, so sid127 and tiredchiku#0 are the same person?
12:30 fdobridge: <Sid> correct
12:31 fdobridge: <magic_rb.> Good to know, having to keep mapping tables in my head is annoying lol
12:31 fdobridge: <Sid> I *could* change my IRC nick to be tiredchiku as well :p
12:31 fdobridge: <magic_rb.> Nah
12:31 fdobridge: <magic_rb.> I'll manage
12:31 fdobridge: <Sid> but Sid127 is more consistent with my github/gitlab/codeberg usernames
12:32 fdobridge: <magic_rb.> It's completely a me skill issue for not using the native Discord client
12:32 fdobridge: <Sid> nah, don't worry about it
12:32 fdobridge: <Sid> I rocked the matrix bridge set up for quite a while too
12:32 fdobridge: <magic_rb.> It's very nice, until it isn't lol
12:32 fdobridge: <Sid> because I liked what beeper was doing and wanted it for myself
12:33 fdobridge: <Sid> yeah, considering matrix is self hosted, the admin tools are *really* lacking
12:37 fdobridge: <pavlo_kozlenko> I did not mean the Pascal/Maxwell firmware.
12:37 fdobridge: <pavlo_kozlenko>
12:37 fdobridge: <pavlo_kozlenko> ```
12:37 fdobridge: <pavlo_kozlenko> For example, to check the available power states and the current setting for the first card in your system, run:
12:37 fdobridge: <pavlo_kozlenko>
12:37 fdobridge: <pavlo_kozlenko> # cat /sys/kernel/debug/dri/0/pstate
12:37 fdobridge: <pavlo_kozlenko>
12:37 fdobridge: <pavlo_kozlenko> It is also possible to manually set/force a certain power state by writing to said interface:
12:38 fdobridge: <pavlo_kozlenko>
12:38 fdobridge: <pavlo_kozlenko> # echo pstate > /sys/kernel/debug/dri/0/pstate
12:38 fdobridge: <pavlo_kozlenko> ```
12:38 fdobridge: <pavlo_kozlenko> Will it be possible to do this automatically, or will it require another signed firmware?
12:39 fdobridge: <Sid> no, because to have automatic reclocking you need the firmware
12:40 fdobridge: <magic_rb.> reclocking != powerstates as far as i know
12:41 fdobridge: <magic_rb.> generally they coincide but not necessarily
12:41 fdobridge: <magic_rb.> but that question is beyond me
12:41 fdobridge: <Sid> the manual reclocking support is basically manually setting pstates
12:41 fdobridge: <karolherbst🐧🦀> the problem is, that none of this is simple
12:42 RSpliet: yeah the pstate controlled through this debugfs node _is_ reclocking.
12:42 fdobridge: <karolherbst🐧🦀> for doing it automatically you'd need to know the load on the GPU
12:42 fdobridge: <karolherbst🐧🦀> and then come up with heuristics on when to increase/decrease clocks
12:42 RSpliet: For doing it automatically we need to come up with a way to make the screen not flicker ;x
12:42 karolherbst: ahh yeah.... I've written the code but only tested on a laptop :D
12:42 fdobridge: <Sid> :myy_TinyGiggle:
12:43 fdobridge: <pavlo_kozlenko> I tried to write the code and ran into this too 🤣
12:44 karolherbst: the other problem is to write the code in a way it doesn't waste CPU cycles constantly
12:44 fdobridge: <magic_rb.> so we could do it from the kernel side?
12:44 fdobridge: <magic_rb.> huh interesting
12:44 karolherbst: nothing requires doing it in firmware, however you want to do that for power efficiency
12:44 karolherbst: and then the GPU just deciding when to clock up/down. But due to how the code is working it also requires telling the kernel and all that
12:44 fdobridge: <magic_rb.> well TIL
12:44 fdobridge: <!DodoNVK (she) 🇱🇹> https://trello.com/c/0WqnRuER/126-dynamic-reclocking
12:44 RSpliet: well...
12:44 karolherbst: it's a giant pita to fix it all up
12:45 RSpliet: The problem with doing it from the CPU is the latency on the PCIe bus
12:45 karolherbst: I don't mean the reclocking
12:45 RSpliet: oh you mean the load testing
12:45 RSpliet: Yeah that makes more sense :-)
12:46 karolherbst: yeah
12:46 karolherbst: and to initiate the reclocking
12:46 karolherbst: I basically wrote code to poll the engine idle counters on the PMU and it interrupts the CPU to tell it what to do
12:46 karolherbst: and that kinda worked
12:46 karolherbst: and the kernel side could just configure thresholds
12:48 RSpliet: yes, this would be the way
12:48 RSpliet: at least for a first iteration. I'm sure the decision making for when to adjust the clocks can be tweaked endlessly
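The threshold idea described above (sample how busy the engines are, let the kernel side configure thresholds, step clocks up or down) can be sketched roughly as follows. This is only an illustration of the heuristic, not nouveau or PMU code: the `Governor` class, its field names, and the 80%/30% thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Governor:
    """Toy load-based clock governor (hypothetical, not nouveau code)."""
    num_states: int          # number of available performance states
    up_threshold: float      # busy % above which we step one state up
    down_threshold: float    # busy % below which we step one state down
    state: int = 0           # current state index, 0 = lowest clocks

    def on_sample(self, busy_percent: float) -> int:
        """Feed one load sample and return the state to run at next."""
        if busy_percent > self.up_threshold and self.state < self.num_states - 1:
            self.state += 1
        elif busy_percent < self.down_threshold and self.state > 0:
            self.state -= 1
        # Between the two thresholds the state is held (hysteresis),
        # which avoids reclocking on every small change in load.
        return self.state


gov = Governor(num_states=4, up_threshold=80.0, down_threshold=30.0)
trace = [gov.on_sample(load) for load in [95, 90, 50, 85, 10, 10, 10, 10]]
print(trace)
```

The gap between the two thresholds is the part that can be "tweaked endlessly": a wider gap means fewer reclocks at the cost of reacting more slowly to load changes.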
12:48 karolherbst: the issue as you pointed out is the flickering :)
12:49 karolherbst: but yeah.. I've tested it with doing 1000 reclocks per second
12:49 karolherbst: and it worked fine :D
12:49 karolherbst: or rather the interval was very very short
12:49 fdobridge: <magic_rb.> except for the flickering?
12:49 karolherbst: but anyway.... not a high prio these days
12:50 RSpliet: the flickering shouldn't be the hardest thing in the world to solve, just nobody got round to it
12:50 RSpliet: the "secret" is a HW scanout buffer that needs configuring to hold enough pixels for each display such that you don't interrupt scanout when you take DRAM offline for reclocking.
12:50 fdobridge: <magic_rb.> if anyone feels bored after Turing and up is done, then maybe Pascal lol, but by that point, no one will care anyway
12:50 RSpliet: on fermi- that thing's called the NISO buffer afaik
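As a back-of-the-envelope illustration of "enough pixels": while DRAM is offline the display keeps consuming pixels at up to the pixel clock rate, so the buffer has to cover roughly pixel clock × offline time. The sketch below assumes a 1080p60 mode with the standard 148.5 MHz pixel clock and 32-bit pixels; the 10 µs offline window is a made-up number, and `scanout_buffer_bytes` is a hypothetical helper, not a hardware or nouveau interface.

```python
def scanout_buffer_bytes(pixel_clock_hz: int, bytes_per_pixel: int,
                         dram_offline_ns: int) -> int:
    """Upper bound on bytes scanned out while DRAM is unavailable.

    Uses integer ceiling division so the result never rounds down.
    """
    numerator = pixel_clock_hz * dram_offline_ns * bytes_per_pixel
    return -(-numerator // 1_000_000_000)


# 1080p60 (148.5 MHz pixel clock), 4 bytes per pixel,
# hypothetical 10 microsecond DRAM offline window.
needed = scanout_buffer_bytes(148_500_000, 4, 10_000)
print(needed)
```

This is a conservative estimate for a single display, since no pixels are consumed during blanking; with several displays the per-display figures would add up.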
12:54 RSpliet: Also, on fermi- for single-monitor nouveau plays a different trick; we just delay DRAM reclocking to the VBLANK period, when no scanout takes place
12:54 RSpliet: Doesn't work for two monitors because the VBLANK periods of both monitors don't necessarily overlap
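Why the VBLANK trick breaks down with two monitors can be shown with a small timing model. Each display is reduced to a frame period, a VBLANK length, and a phase offset, all in microseconds; the helper names and every number here are hypothetical and only meant to illustrate the overlap problem.

```python
def vblank_windows(period_us, vblank_us, offset_us, horizon_us):
    """List (start, end) VBLANK intervals that begin before the horizon."""
    windows = []
    start = offset_us
    while start < horizon_us:
        windows.append((start, start + vblank_us))
        start += period_us
    return windows


def common_window(a, b, need_us):
    """First interval where both displays are in VBLANK for need_us, or None."""
    for a_start, a_end in a:
        for b_start, b_end in b:
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if end - start >= need_us:
                return (start, end)
    return None


# Two 60 Hz displays with a 500 us VBLANK each; the reclock needs 250 us.
first = vblank_windows(16667, 500, 0, 100000)
nearly_in_phase = vblank_windows(16667, 500, 200, 100000)
out_of_phase = vblank_windows(16667, 500, 8000, 100000)

print(common_window(first, nearly_in_phase, 250))
print(common_window(first, out_of_phase, 250))
```

With identical refresh rates the phase difference never changes, so displays that start out of phase never offer a shared window, which is where a scanout buffer becomes necessary.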
12:55 RSpliet: For kepler and newer they started supporting more than 2 monitors, so that NISO buffer got more complex and they changed how to sync to VBLANK too in a way that I don't think we fixed in nouveau
12:55 RSpliet: </memory dump>
12:55 karolherbst: luckily we don't reclock memory too often, because it doesn't have like 40 states like the graph engine
12:57 RSpliet: very true
12:58 RSpliet: anyway, this problem isn't unsolvable, it just requires someone with big brains to get really angry at the problem for a few weeks, and that event hasn't happened :-P
12:58 RSpliet: the flicker problem that is
12:58 RSpliet: the broader reclocking problem has received a lot of angry braincycles already, but can always do with more, there's gaps in existing nouveau code
12:58 fdobridge: <magic_rb.> 🤣 i like how you put the event