01:13fdobridge: <!DodoNVK (she) 🇱🇹> Why did I get this error after my display got unplugged (due to a power outage)?: `Apr 27 06:17:19 RenoirBeast kernel: nouveau 0000:01:00.0: gsp: cli:0xc1d00002 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x0000ffff`
01:15fdobridge: <!DodoNVK (she) 🇱🇹> And this mess of kernel errors after waking up the GPU to make the display work again
01:15fdobridge: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1233949796046340106/message.txt?ex=662ef4cd&is=662da34d&hm=81ab539db2097080ee1f4f3f8bc722a383cbe46b232f0bcb9e1da008aa00d82b&
02:10fdobridge: <rinlovesyou> Did you experience 1 second outages too?
02:10fdobridge: <rinlovesyou> Happened twice for me earlier today
02:11fdobridge: <rinlovesyou> Only displays rebooted (and my ceiling light). The psu managed to keep my pc on during that
02:22fdobridge: <!DodoNVK (she) 🇱🇹> This was actually 1 hour according to dmesg
02:23fdobridge: <rinlovesyou> What
02:23fdobridge: <rinlovesyou> How does the power go out for an hour and only your displays get disconnected
02:24fdobridge: <rinlovesyou> Or was this an error that happened in the 1 second or so your psu keeps your pc alive before dying due to no power
02:25fdobridge: <!DodoNVK (she) 🇱🇹> Laptop momento
02:25fdobridge: <rinlovesyou> O h
02:25fdobridge: <rinlovesyou> I was experiencing "mini outages" here in Germany earlier
02:25fdobridge: <rinlovesyou> Where the power went out for just a second and only my displays went off
02:26fdobridge: <rinlovesyou> Impressed that my psu kept the pc running
02:26fdobridge: <rinlovesyou> Even my router powered through
06:11fdobridge: <!DodoNVK (she) 🇱🇹> I think I had these too (but this wasn't one of them)
06:11fdobridge: <!DodoNVK (she) 🇱🇹>
06:11fdobridge: <!DodoNVK (she) 🇱🇹> In that case the oven clock would still be correct
07:37fdobridge: <ahuillet> that's an aux transaction error thing that Lyude fixed I think (?) you need a git kernel
07:47fdobridge: <!DodoNVK (she) 🇱🇹> I see the fix is already in 6.9-rc5
11:48fdobridge: <pavlo_kozlenko> Do you have a plan for automatic reclocking on GM10x Maxwell, Kepler and Tesla G94-GT218? Or have you decided to just focus on Turing and newer architectures and not raise this issue?
11:49fdobridge: <Sid> ...not this again
11:50fdobridge: <pavlo_kozlenko> Oh, I see, everything will stay as it is?
11:52fdobridge: <Sid> unless nvidia releases redistributable firmware for those cards, yes
11:57fdobridge: <pavlo_kozlenko> We need signed firmware for automatic reclocking?!
12:06fdobridge: <magic_rb.> @pavlo_kozlenko since the others have explained this a million times, I'll do it this time. I'm not a dev, just a user, but I've asked this exact question in the past.
12:09fdobridge: <magic_rb.> Maxwell down is way too old to properly support vulkan and the focus is now on vulkan. It would be possible but would require a lot of work and the performance would be horrible. Therefore those generations will probably not get much development. I assume they'll be kept around as much as "getting display out" requires.
12:09fdobridge: <magic_rb.> Pascal and maxwell, while possible to implement vulkan on reasonably well, do not have the GSP, which is a riscv/arm SoC inside the gpu. It takes GSP firmware and takes over many of the tasks the driver would have to do. Nvidia releases the GSP firmware under a proprietary but redistributable license, which means nouveau can make use of said firmware for full support.
12:11fdobridge: <magic_rb.> You might then ask why not use the firmware nvidia used for maxwell, pascal? Well, because we can't, it's not redistributable. Ok, so then why not write our own? Well, that's what we do, but due to reasons which I'll explain later, nvidia requires the firmware to be signed from maxwell up. And we don't have the signing keys. One of the things we do not get without the correct signature is reclocking
12:12fdobridge: <magic_rb.> And for why the signature check, I suspect (purely my speculation) that it's because of scammers that would get a 1060, flash it with a modified bios reporting itself as a 1080ti and sell it online. (Which is impossible with signed firmware)
12:13fdobridge: <magic_rb.> Hopefully i didnt miss anything
12:15fdobridge: <magic_rb.> And I've no clue why nvidia won't release redistributable firmware for pascal/maxwell, but they do have a good reason. Licensing can be very complicated
12:17fdobridge: <Sid> yeah, it's likely legal things for nv
12:17fdobridge: <Sid> and those cards are *old*
12:17fdobridge: <Sid> turing is already 6 years old too
12:18RSpliet: ultimately it'll cost them time (legal, but also just packaging and distribution) and thus money. And I don't think they'll see a return from that investment as they won't be selling more pascal/maxwell cards as a result. Just a tiny bit of goodwill from the community if they do.
12:19fdobridge: <magic_rb.> Id like to see pascal as much as the next guy since ive a 1060 in my desktop, but yeah, probably not happening
12:19fdobridge: <magic_rb.> By the time it does ill buy a new gsp gpu anyway, the 1060 is already old af
12:20fdobridge: <magic_rb.> Oh my, its 8 years old holy shit, time flies
12:20fdobridge: <Sid> I believe NV's focus when it comes to linux is enterprise users, and GeForce cards get the benefits as a side effect due to shared architecture
12:21fdobridge: <magic_rb.> Indeed, the linux gaming numbers are so small, especially on nvidia, that it makes no sense for them to care
12:21fdobridge: <Sid> no, they do care
12:21RSpliet: Sid: and potentially automotive https://corp.mediatek.com/news-events/press-releases/mediatek-brings-advanced-ai-capabilities-to-vehicles-with-new-dimensity-auto-cockpit-chipsets-enabled-by-nvidia-technology
12:21fdobridge: <Sid> nv linux drivers do get game specific fixes
12:22fdobridge: <Sid> just that gamers are not the biggest stakeholders when it comes to linux
12:22fdobridge: <Sid> mhm
12:22fdobridge: <magic_rb.> Isnt that because they trickle down from the windows side? Or do we also get linux specific game specific fixes?
12:22Sid127: RSpliet: yeah, enterprise/servers/AI solution
12:22Sid127: @magic_rb. no we do get linux specific fixes
12:23Sid127: I remember seeing stuff for Spider-Man Remastered and Starfield in the changelogs
12:23fdobridge: <magic_rb.> Huh, right the doom eternal situation as an example
12:23Sid127: there's also changelog lines explicitly mentioning VKD3D-Proton sometimes
12:23fdobridge: <magic_rb.> Cool!
12:24RSpliet: Heh interesting... maybe they're after a slice of the Steam deck pie, and know that they won't get it unless it's OSS...
12:24fdobridge: <Sid> ```
12:24fdobridge: <Sid> September 28th, 2023 - Windows 537.54, Linux 535.43.10
12:24fdobridge: <Sid>
12:24fdobridge: <Sid> New:
12:24fdobridge: <Sid> VK_EXT_map_memory_placed [Linux]
12:24fdobridge: <Sid> Fixes:
12:24fdobridge: <Sid> Updates to provisional VK_NV_displacement_micromap implementation, now with glslang support
12:24fdobridge: <Sid> Fix driver crash with Starfield running under vkd3d related to VK_NV_device_generated_commands and VK_EXT_device_generated_commands_compute
12:24fdobridge: <Sid> Fix issue with vkCmdFillBuffer when the base address is not 16B aligned
12:24fdobridge: <Sid> Fix vkCmdCopyQueryPoolResult with VK_QUERY_TYPE_TIMESTAMP and the last entry in the query pool
12:24fdobridge: <Sid> Fix vkResetCommandPool issue when used on a command buffer in the recording state
12:24fdobridge: <Sid> ```
12:25Sid127: apologies for that bit of spam, RSpliet :p
12:25Sid127: but yeah, starfield running under vkd3d-proton is an explicit linux oriented fix
12:25fdobridge: <magic_rb.> Well good to know, nice to be wrong in the best way possible
12:26Sid127: afaik the last vulkan beta driver also has a fix for Dragon's Dogma 2 and its nvidia reflex interaction
12:26Sid127: by last, I mean the one that released 4 days ago
12:27Sid127: as well as a fix for vkd3d-proton regarding frame IDs
12:27Sid127: also hi ahuillet :wave:
12:27fdobridge: <ahuillet> the NVIDIA proprietary driver shares code between Linux and Windows, so fixes on one OS appear in the other automatically
12:27fdobridge: <ahuillet> hey Sid127. IRC now? :)
12:28Sid127: right, but there exist workloads in linux gaming that don't exist on windows
12:28Sid127: and hence, issues
12:28fdobridge: <Sid> and yeah I keep flip flopping between the two c:
12:30fdobridge: <magic_rb.> Since I'm bridging through matrix I don't see Discord nicknames, so sid127 and tiredchiku#0 are the same person?
12:30fdobridge: <Sid> correct
12:31fdobridge: <magic_rb.> Good to know, having to keep mapping tables in my head is annoying lol
12:31fdobridge: <Sid> I *could* change my IRC nick to be tiredchiku as well :p
12:31fdobridge: <magic_rb.> Nah
12:31fdobridge: <magic_rb.> Ill manage
12:31fdobridge: <Sid> but Sid127 is more consistent with my github/gitlab/codeberg usernames
12:32fdobridge: <magic_rb.> It's completely a me skill issue for not using the native Discord client
12:32fdobridge: <Sid> nah, don't worry about it
12:32fdobridge: <Sid> I rocked the matrix bridge set up for quite a while too
12:32fdobridge: <magic_rb.> Its very nice, until it isnt lol
12:32fdobridge: <Sid> because I liked what beeper was doing and wanted it for myself
12:33fdobridge: <Sid> yeah, considering matrix is self hosted, the admin tools are *really* lacking
12:37fdobridge: <pavlo_kozlenko> I did not mean firmware for pascal/maxwell.
12:37fdobridge: <pavlo_kozlenko>
12:37fdobridge: <pavlo_kozlenko> ```
12:37fdobridge: <pavlo_kozlenko> For example, to check the available power states and the current setting for the first card in your system, run:
12:37fdobridge: <pavlo_kozlenko>
12:37fdobridge: <pavlo_kozlenko> # cat /sys/kernel/debug/dri/0/pstate
12:37fdobridge: <pavlo_kozlenko>
12:37fdobridge: <pavlo_kozlenko> It is also possible to manually set/force a certain power state by writing to said interface:
12:38fdobridge: <pavlo_kozlenko>
12:38fdobridge: <pavlo_kozlenko> # echo pstate > /sys/kernel/debug/dri/0/pstate
12:38fdobridge: <pavlo_kozlenko> ```
12:38fdobridge: <pavlo_kozlenko> Will it be possible to do this automatically, or will it require another signed firmware?
12:39fdobridge: <Sid> no, because to have automatic reclocking you need the firmware
12:40fdobridge: <magic_rb.> reclocking != powerstates as far as i know
12:41fdobridge: <magic_rb.> generally they coincide but not necessarily
12:41fdobridge: <magic_rb.> but that question is beyond me
12:41fdobridge: <Sid> the manual reclocking support is basically manually setting pstates
12:41fdobridge: <karolherbst🐧🦀> the problem is, that none of this is simple
12:42RSpliet: yeah the pstate controlled through this debugfs node _is_ reclocking.
12:42fdobridge: <karolherbst🐧🦀> for doing it automatically you'd need to know the load on the GPU
12:42fdobridge: <karolherbst🐧🦀> and then come up with heuristics on when to increase/decrease clocks
12:42RSpliet: For doing it automatically we need to come up with a way to make the screen not flicker ;x
12:42karolherbst: ahh yeah.... I've written the code but only tested on a laptop :D
12:42fdobridge: <Sid> :myy_TinyGiggle:
12:43fdobridge: <pavlo_kozlenko> I tried to write the code and ran into this too 🤣
12:44karolherbst: the other problem is to write the code in a way it doesn't waste CPU cycles constantly
12:44fdobridge: <magic_rb.> so we could do it from the kernel side?
12:44fdobridge: <magic_rb.> huh interesting
12:44karolherbst: nothing requires doing it in firmware, however you want to do that for power efficiency
12:44karolherbst: and then the GPU just deciding when to clock up/down. But due to how the code is working it also requires telling the kernel and all that
12:44fdobridge: <magic_rb.> well TIL
12:44fdobridge: <!DodoNVK (she) 🇱🇹> https://trello.com/c/0WqnRuER/126-dynamic-reclocking
12:44RSpliet: well...
12:44karolherbst: it's a giant pita to fix it all up
12:45RSpliet: The problem with doing it from the CPU is the latency on the PCIe bus
12:45karolherbst: I don't mean the reclocking
12:45RSpliet: oh you mean the load testing
12:45RSpliet: Yeah that makes more sense :-)
12:46karolherbst: yeah
12:46karolherbst: and to initiate the reclocking
12:46karolherbst: I basically wrote code to poll the engine idle counters on the PMU and it interrupts the CPU to tell it what to do
12:46karolherbst: and that kinda worked
12:46karolherbst: and the kernel side could just configure thresholds
12:48RSpliet: yes, this would be the way
12:48RSpliet: at least for a first iteration. I'm sure the decision making for when to adjust the clocks can be tweaked endlessly
12:48karolherbst: the issue as you pointed out is the flickering :)
12:49karolherbst: but yeah.. I've tested it with doing 1000 reclocks per second
12:49karolherbst: and it worked fine :D
12:49karolherbst: or rather the interval was very very short
12:49fdobridge: <magic_rb.> except for the flickering?
12:49karolherbst: but anyway.... not a high prio these days
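The threshold idea karolherbst describes at 12:46 (poll the engine idle counters, compare against thresholds the kernel configures, then step the clocks) can be sketched roughly as below. This is only a host-side model of the decision logic in Python, not nouveau or PMU code: `Thresholds`, `next_pstate`, `simulate`, the 80%/30% values and the four performance levels are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical load thresholds a kernel side might configure."""
    up: float = 0.80    # step clocks up when the engine is at least this busy
    down: float = 0.30  # step clocks down when the engine is at most this busy


def next_pstate(current: int, busy: float, num_states: int, t: Thresholds) -> int:
    """Pick the next performance level from one busy-fraction sample.

    The gap between t.down and t.up acts as hysteresis, so a load that
    hovers around one value does not trigger a reclock on every sample.
    """
    if not 0.0 <= busy <= 1.0:
        raise ValueError("busy must be a fraction between 0 and 1")
    if busy >= t.up and current < num_states - 1:
        return current + 1
    if busy <= t.down and current > 0:
        return current - 1
    return current


def simulate(samples, num_states: int = 4, t: Thresholds = Thresholds()):
    """Feed a series of busy fractions through the policy, starting at level 0."""
    state = 0
    trace = []
    for busy in samples:
        state = next_pstate(state, busy, num_states, t)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Two busy samples, one middling sample, then an idle period.
    print(simulate([0.9, 0.9, 0.5, 0.1, 0.1, 0.1]))
```

Stepping one level at a time keeps the sketch simple. As RSpliet notes at 12:48, the decision making for when to adjust the clocks can be tweaked endlessly.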
12:50RSpliet: the flickering shouldn't be the hardest thing in the world to solve, just nobody got round to it
12:50RSpliet: the "secret" is a HW scanout buffer that needs configuring to hold enough pixels for each display such that you don't interrupt scanout when you take DRAM offline for reclocking.
12:50fdobridge: <magic_rb.> if anyone feels bored after turing up is done, then maybe pascal lol, but by that point, no one will care anyway
12:50RSpliet: on fermi- that thing's called the NISO buffer afaik
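A back-of-the-envelope version of the buffer sizing RSpliet describes at 12:50: the buffer has to hold at least as many pixels as the display consumes while DRAM is offline. The sketch below is a rough upper-bound estimate under stated assumptions, not how the hardware or nouveau computes it. The function name, the 4 bytes per pixel and the 20 µs offline time are made up for illustration. The 148.5 MHz figure is the standard 1080p60 pixel clock, which includes blanking, so the result overestimates.

```python
def scanout_buffer_bytes(pixel_clock_hz: int, bytes_per_pixel: int,
                         dram_offline_us: int) -> int:
    """Bytes a display consumes while DRAM is unavailable (rounded up).

    Integer arithmetic only, so the result is exact.
    """
    total = pixel_clock_hz * bytes_per_pixel * dram_offline_us
    return -(-total // 1_000_000)  # ceiling division, microseconds to seconds


if __name__ == "__main__":
    # One 1080p60 display, hypothetical 20 microsecond DRAM offline window.
    one_head = scanout_buffer_bytes(148_500_000, 4, 20)
    print(one_head)
    # Each attached display needs its own share of the buffer.
    print(2 * one_head)
```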
12:54RSpliet: Also, on fermi- for single-monitor nouveau plays a different trick; we just delay DRAM reclocking to the VBLANK period, when no scanout takes place
12:54RSpliet: Doesn't work for two monitors because the VBLANK periods of both monitors don't necessarily overlap
12:55RSpliet: For kepler and newer they started supporting more than 2 monitors, so that NISO buffer got more complex and they changed how to sync to VBLANK too in a way that I don't think we fixed in nouveau
12:55RSpliet: </memory dump>
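The two-monitor problem from 12:54 can be illustrated with a small timing model: DRAM reclocking can only hide in VBLANK if both displays are blanking at the same time for long enough. The sketch below is a toy model, not nouveau code. The function names, the placement of VBLANK at the end of each frame, and all timing numbers (16667 µs frames, 500 µs VBLANK, 300 µs needed for the reclock) are assumptions for illustration.

```python
def vblank_windows(period_us: int, vblank_us: int, offset_us: int,
                   horizon_us: int):
    """List (start, end) VBLANK intervals for frames starting before horizon_us.

    Assumes VBLANK occupies the last vblank_us of every frame.
    """
    windows = []
    frame_start = offset_us
    while frame_start < horizon_us:
        frame_end = frame_start + period_us
        windows.append((frame_end - vblank_us, frame_end))
        frame_start = frame_end
    return windows


def common_windows(a, b, min_len_us: int):
    """Intervals where both displays blank together for at least min_len_us."""
    shared = []
    for start_a, end_a in a:
        for start_b, end_b in b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if end - start >= min_len_us:
                shared.append((start, end))
    return shared


if __name__ == "__main__":
    head0 = vblank_windows(16667, 500, 0, 50000)
    in_phase = vblank_windows(16667, 500, 0, 50000)
    shifted = vblank_windows(16667, 500, 8000, 50000)
    print(len(common_windows(head0, in_phase, 300)))  # blanking lines up
    print(common_windows(head0, shifted, 300))        # no usable shared window
```

With identical timings and phase every VBLANK is usable. Shift one display by half a frame and no shared window is left, which is the situation RSpliet describes for two monitors.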
12:55karolherbst: luckily we don't reclock memory too often, because it doesn't have like 40 states like the graph engine
12:57RSpliet: very true
12:58RSpliet: anyway, this problem isn't unsolvable, it just requires someone with big brains to get really angry at the problem for a few weeks, and that event hasn't happened :-P
12:58RSpliet: the flicker problem that is
12:58RSpliet: the broader reclocking problem has received a lot of angry braincycles already, but can always do with more, there's gaps in existing nouveau code
12:58fdobridge: <magic_rb.> 🤣 i like how you put the event