IRC Logs of #nouveau on irc.freenode.net for 2024-02-14

05:07 fdobridge_: <gfxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795
05:07 fdobridge_: <gfxstrand> This works on kepler and modifiers "work" if we set the PTE bits on the BO
05:13 fdobridge_: <Sid> hmm, Quake Champions on NVK does something weird
05:13 fdobridge_: <Sid> it seems to be breaking wine? wine isn't able to put up the window for the game
05:13 fdobridge_: <Sid> dxvk log doesn't have much info
05:14 fdobridge_: <gfxstrand> @airlied ^^
05:17 fdobridge_: <Sid> https://cdn.discordapp.com/attachments/1034184951790305330/1207193780801765416/steam-611500.log?ex=65dec1d1&is=65cc4cd1&hm=6bee7ef2343a3e16c845117a6d352e51cb5740366d4ef0f0bd8e5670d92f0dae&
05:28 fdobridge_: <Sid> Sea of Thieves still hits an Xid13
05:40 fdobridge_: <Sid> Control on dx12 fails to render the menu, throws `[Wed Feb 14 11:06:08 2024] nouveau 0000:01:00.0: Control_DX12.ex[12010]: nv50cal_space: -16` in the dmesg
06:01 fdobridge_: <prop_energy_ball> I don't have the best setup to test this until I get home on the 24th
06:09 fdobridge_: <Sid> any specific way you'd like for me to test it?
06:09 fdobridge_: <Sid> or should I just see if I can get gamescope running, nested
06:12 fdobridge_: <prop_energy_ball> You can test if nested works
06:12 fdobridge_: <prop_energy_ball> If vkcube looks like vkcube and not a garbled mess, it should work
06:12 fdobridge_: <prop_energy_ball> might also need that "expose more vram types" MR if that isn't already merged
06:12 fdobridge_: <prop_energy_ball> I am not up on the NVK lore atm, maybe it was
06:14 fdobridge_: <Sid> hm
06:16 fdobridge_: <Sid> gamescope --prefer-vk-device here I come
06:26 fdobridge_: <Sid> hm, which patch was that again?
06:27 fdobridge_: <Sid> found it
06:35 fdobridge_: <Sid> https://cdn.discordapp.com/attachments/1034184951790305330/1207213584338124840/gamescope.log?ex=65ded443&is=65cc5f43&hm=89f71631752f2e99d34861c1932549691f16d1622a53833804b7472f5b686062&
06:40 fdobridge_: <Sid> funny
06:40 fdobridge_: <Sid> `vulkan: physical device does not support DRM format modifiers`
06:40 fdobridge_: <Sid> oh
06:42 fdobridge_: <Sid> heh
06:42 fdobridge_: <Sid> the MR doesn't seem to be advertising the extension
06:43 fdobridge_: <Sid> ```
06:43 fdobridge_: <Sid> vulkaninfo: ../mesa/src/nouveau/nil/nil_image.c:426: nil_image_init: Assertion `nil_drm_format_mod_is_supported(dev, info->modifier)' failed.
06:43 fdobridge_: <Sid> ```
06:44 fdobridge_: <Sid> weird
06:46 fdobridge_: <Sid> I guess it isn't guaranteed after all e-e
06:46 fdobridge_: <Sid> `https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24795/diffs#086d0c6e07669aafbb1e7f0bedcb1f69a9e6387f_412_425`
06:47 fdobridge_: <Sid> it's just current git with !26622 and !24795 on top, in that order
06:49 fdobridge_: <mohamexiety> Keep in mind the MR is missing quite a bit still, it was mainly to fix something related to render on Kepler
06:51 fdobridge_: <Sid> yeah
06:51 fdobridge_: <mohamexiety> You could probably enable that yourself and see what happens
06:51 fdobridge_: <Sid> did see PTE bits are required to be manually set on the BO
06:51 fdobridge_: <Sid> or, at least that's what I understood
06:53 fdobridge_: <Sid> right, ok, src/nouveau/nil/nil_drm_format_mod.c
06:55 fdobridge_: <gfxstrand> @airlied The other thing we need to figure out is our plan for backwards compatibility. Because of no VM_BIND, nouveau GL's modifiers implementation is effectively broken. But it's also been shipping for years. If NVK implements modifiers correctly, they'll be incompatible with nouveau GL. I see a few options here:
06:55 fdobridge_: <gfxstrand> 1. Disable modifiers in nouveau GL and back port the patch. Unfortunately, it won't get back ported far enough to really ensure nothing blows up.
06:55 fdobridge_: <gfxstrand> 2. Say that the nouveau GL definition is the de facto modifier spec and define the current set of modifiers as requiring the BO flags. Then add a new set of modifiers that require VM_BIND and implement those in NVK. This will mean NVK and nouveau GL will always fail to negotiate with each other and fall back to linear.
06:55 fdobridge_: <gfxstrand> 3. Set the BO flags from NVK for a while and then hope to drop that after a couple of years once everyone is shipping Zink.
06:55 fdobridge_: <gfxstrand> 4. Don't care, call it a nouveau GL bug, and tell people that the fix for the bug is to use Zink.
06:55 fdobridge_: <gfxstrand>
06:55 fdobridge_: <gfxstrand> I REALLY don't like options 3 and 4. Option 2 feels the safest, though it sucks to redefine the NVIDIA modifiers a third time.
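A minimal sketch of why option 2 would leave NVK and nouveau GL on linear, assuming a simple "first producer-preferred modifier the consumer also accepts" negotiation. Only `DRM_FORMAT_MOD_LINEAR` is a real constant; the two NVIDIA modifier values and the `negotiate` helper are hypothetical placeholders for illustration, not real encodings or Mesa code.

```python
DRM_FORMAT_MOD_LINEAR = 0  # linear layout, as defined in drm_fourcc.h

# Hypothetical placeholder values, NOT real NVIDIA modifier encodings.
MOD_NV_LEGACY_BO_FLAGS = 0x1001  # "current" modifiers that rely on BO flags
MOD_NV_VM_BIND = 0x2001          # "new" modifiers that rely on VM_BIND


def negotiate(producer_mods, consumer_mods):
    """Pick the first producer-preferred modifier the consumer also accepts."""
    accepted = set(consumer_mods)
    for mod in producer_mods:
        if mod in accepted:
            return mod
    return None  # no common layout; the buffer cannot be shared


# Assumed advertisement under option 2: each driver lists its own tiled
# modifier first and linear as the fallback.
nvk = [MOD_NV_VM_BIND, DRM_FORMAT_MOD_LINEAR]
nouveau_gl = [MOD_NV_LEGACY_BO_FLAGS, DRM_FORMAT_MOD_LINEAR]
```

With these assumed lists, `negotiate(nvk, nouveau_gl)` only finds linear in common, while `negotiate(nvk, nvk)` keeps the tiled VM_BIND layout.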
06:55 fdobridge_: <Sid> `uint32_t pte_kind = nil_choose_pte_kind(dev, format, 0, compression);`
06:55 fdobridge_: <Sid> what would I have to set here?
06:55 fdobridge_: <gfxstrand> Okay, with that I'm off to bed. Noodle it and let me know what you think. @karolherbst, too. (edited)
06:58 fdobridge_: <airlied> I'm pretty set on option 4 being the nicest 🙂
06:59 fdobridge_: <airlied> the modifiers are also used by nvidia so I'm not sure option 2 is on the table at all
07:04 fdobridge_: <gfxstrand> Option 4 is going to be REALLY user visible.
07:05 fdobridge_: <gfxstrand> Like, WSI is just fucked if you get nouveau GL for your compositor.
07:06 fdobridge_: <gfxstrand> I'd rather 1 than 4. 1 is basically 4 but we protect ourselves from the breakage a bit.
07:08 fdobridge_: <gfxstrand> As for NVIDIA... They're the ones who implemented it wrong in nouveau GL and didn't test well enough to figure that out. I'm having trouble finding sympathy. And it's not like the old modifiers will stop working in KMS. The behavior of old and new for KMS will be the same.
07:10 fdobridge_: <airlied> 3 is a bit ugly, but not the most horrible thing, it's just annoying to have to pollute nvk and the uapi checks in the kernel
07:10 fdobridge_: <airlied> 1. is probably fine, most users will get this stuff from a distro which will have nouveau and nvk from the same mesa release
07:10 fdobridge_: <airlied> maybe some containers will be broken
10:03 fdobridge_: <karolherbst🐧🦀> like flatpaks 😛
10:09 fdobridge_: <karolherbst🐧🦀> but anyway, I think we can safely assume that VK and GL come from the same mesa release
13:39 fdobridge_: <zmike.> fwiw there are zink bug reports for this mismatch with intel/amd all over the place now where users have iris/radeonsi running their compositor and try to run apps with zink and get broken modifiers
13:39 fdobridge_: <zmike.> so you'd just be joining the popular crowd
13:44 fdobridge_: <Sid> @airlied genuine question: what kinda power management are we doing for gpus on prime systems
13:44 fdobridge_: <karolherbst🐧🦀> we turn the GPU off if it's not in active use
13:44 fdobridge_: <Sid> iirc amdgpu puts the gpu into d3cold if there's no load on it
13:44 fdobridge_: <karolherbst🐧🦀> nouveau does the same
13:44 fdobridge_: <karolherbst🐧🦀> it's relaly not a GPU driver feature anyway
13:44 fdobridge_: <karolherbst🐧🦀> *really
13:44 fdobridge_: <Sid> even with gsp? 🙃
13:44 fdobridge_: <karolherbst🐧🦀> good question
13:45 fdobridge_: <karolherbst🐧🦀> it's all firmware level stuff
13:45 fdobridge_: <Sid> I took a quick look and my gpu was d0 with gsp
13:45 fdobridge_: <karolherbst🐧🦀> so I would be surprised if GSP really could do much here
13:45 fdobridge_: <Sid> didn't try non-gsp, hang on
13:45 fdobridge_: <karolherbst🐧🦀> maybe the runpm thing isn't enabled?
13:45 fdobridge_: <Sid> with no load except whatever drm node sway makes to handle external outputs
13:46 fdobridge_: <karolherbst🐧🦀> check with `grep . /sys/bus/pci/devices/*/power/control`
13:46 fdobridge_: <karolherbst🐧🦀> uhh.. and why do we still have drivers default to `on` :ferrisUpsideDown:
13:46 fdobridge_: <Sid> I was doing `/sys/class/drm/card*/device/power_state` but will do
13:47 fdobridge_: <Sid> gimme 2 mins to reboot to nouveau
13:47 fdobridge_: <Sid> should I check with gsp or without first
13:48 fdobridge_: <karolherbst🐧🦀> with gsp
13:48 fdobridge_: <Sid> yeah, even without gsp, my dGPU is on d0
13:49 fdobridge_: <Sid> [sidpr@strogg ~]$ cat /sys/class/drm/card*/device/power_state
13:49 fdobridge_: <Sid> D0
13:49 fdobridge_: <Sid> D0
13:49 fdobridge_: <karolherbst🐧🦀> power_state doesn't help to debug this
13:49 fdobridge_: <karolherbst🐧🦀> and I'm not even sure it actually gives meaningful information anyway
13:49 fdobridge_: <Sid> ```
13:49 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.0/power/control:auto
13:49 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.1/power/control:auto
13:49 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.2/power/control:auto
13:49 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.3/power/control:auto
13:49 fdobridge_: <Sid> ```
13:49 fdobridge_: <Sid> the 4 nvidia devices
13:50 fdobridge_: <karolherbst🐧🦀> nah.. I need it for all
13:50 fdobridge_: <Sid> vga controller, audio device, usb controller, serial bus controller
13:50 fdobridge_: <Sid> well, just imagine the same output but for every pci device on my laptop 😅
13:50 fdobridge_: <karolherbst🐧🦀> so everything is set to `auto`?
13:50 fdobridge_: <karolherbst🐧🦀> mhhh
13:50 fdobridge_: <Sid> yup
13:51 fdobridge_: <Sid> also this is non-gsp
13:51 fdobridge_: <karolherbst🐧🦀> what about `grep . /sys/bus/pci/devices/*/power/runtime_status`?
13:51 fdobridge_: <Sid> ```
13:51 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status:active
13:51 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.1/power/runtime_status:active
13:51 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.2/power/runtime_status:suspended
13:51 fdobridge_: <Sid> /sys/bus/pci/devices/0000:01:00.3/power/runtime_status:suspended```
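A minimal sketch of the two `grep` checks above combined into one pass, assuming the standard sysfs layout (`power/control` and `power/runtime_status` under each PCI device directory). The function names and the `sysfs_root` parameter are made up for this example; the parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def read_runtime_pm(sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: (control, runtime_status)} for every PCI device."""
    result = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        power = dev / "power"
        values = []
        for name in ("control", "runtime_status"):
            try:
                values.append((power / name).read_text().strip())
            except OSError:
                # Attribute missing or unreadable; record that rather than fail.
                values.append("unknown")
        result[dev.name] = tuple(values)
    return result


def still_active(pm_state):
    """Devices allowed to runtime-suspend ("auto") that are nevertheless awake."""
    return [addr for addr, (control, status) in pm_state.items()
            if control == "auto" and status == "active"]
```

Calling `still_active(read_runtime_pm())` on the machine above would be expected to list `0000:01:00.0` and `0000:01:00.1`, the two functions reported as `active` despite `auto`.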
13:52 fdobridge_: <karolherbst🐧🦀> yeah sooo..
13:52 fdobridge_: <karolherbst🐧🦀> something uses the GPU
13:52 fdobridge_: <Sid> there's nothing on the dGPU right now, sway is being driven by the iGPU
13:52 fdobridge_: <Sid> well
13:52 fdobridge_: <karolherbst🐧🦀> well.. the kernel doesn't lie here
13:52 fdobridge_: <karolherbst🐧🦀> the GPU is in use
13:52 fdobridge_: <karolherbst🐧🦀> the question is: by what
13:52 fdobridge_: <Sid> I know sway also keeps a "node" open on the dGPU, I see it in nvtop when on proprietary
13:52 fdobridge_: <Sid> to handle external monitors being hotplugged
13:52 fdobridge_: <karolherbst🐧🦀> sure
13:52 fdobridge_: <Sid> says 4mb vram use, 0% everything else
13:53 fdobridge_: <karolherbst🐧🦀> it can also be some random other app being silly
13:53 fdobridge_: <karolherbst🐧🦀> like spotify :ferrisUpsideDown:
13:53 fdobridge_: <karolherbst🐧🦀> or vscode
13:53 fdobridge_: <Sid> there's nothing else open except firefox
13:53 fdobridge_: <karolherbst🐧🦀> electron based apps like to use the discrete GPU for whatever reason
13:53 fdobridge_: <Sid> I even use discord in firefox, so
13:53 fdobridge_: <karolherbst🐧🦀> `lsof /dev/dri/render*`
13:54 fdobridge_: <Sid> ```
13:54 fdobridge_: <Sid> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
13:54 fdobridge_: <Sid> sway 600 sidpr mem CHR 226,129 774 /dev/dri/renderD129
13:54 fdobridge_: <Sid> sway 600 sidpr 14u CHR 226,129 0t0 774 /dev/dri/renderD129
13:54 fdobridge_: <Sid> sway 600 sidpr 15u CHR 226,129 0t0 774 /dev/dri/renderD129
13:54 fdobridge_: <Sid> sway 600 sidpr 16u CHR 226,129 0t0 774 /dev/dri/renderD129
13:54 fdobridge_: <Sid> sway 600 sidpr 20u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> sway 600 sidpr 21u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> sway 600 sidpr 22u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> sway 600 sidpr 23u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> sway 600 sidpr 24u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> kitty 650 sidpr 8u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> kitty 650 sidpr 9u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> kitty 650 sidpr 10u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> kitty 650 sidpr 11u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> xdg-deskt 725 sidpr 14u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> xdg-deskt 725 sidpr 15u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> xdg-deskt 725 sidpr 16u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> xdg-deskt 725 sidpr 17u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> firefox 795 sidpr 15u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> firefox 795 sidpr 38u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> firefox 795 sidpr 40u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> firefox 795 sidpr 41u CHR 226,128 0t0 768 /dev/dri/renderD128
13:54 fdobridge_: <Sid> firefox 795 sidpr 42u CHR 226,128 0t0 768 /dev/dri/renderD128```
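A rough equivalent of the `lsof /dev/dri/render*` check, sketched by walking `/proc/<pid>/fd` and grouping PIDs per render node. The function name and its parameters are assumptions for this example. Unlike `lsof`, it only sees open file descriptors, not `mem` mappings, and it needs enough privileges to read other users' `fd` directories.

```python
import os
from collections import defaultdict


def render_node_users(proc_root="/proc", prefix="/dev/dri/renderD"):
    """Map each open render node to the set of PIDs holding a descriptor on it."""
    users = defaultdict(set)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            if target.startswith(prefix):
                users[target].add(int(entry))
    return dict(users)
```

For the listing above, the result would map `/dev/dri/renderD129` to sway's PID only, which is what singles sway out as the process holding the dGPU's node.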
13:54 fdobridge_: <Sid> sways node
13:54 fdobridge_: <Sid> 😅
13:54 fdobridge_: <karolherbst🐧🦀> yeah soo.. does the GPU suspend if you exit sway and wait a couple of seconds?
13:55 fdobridge_: <Sid> good question, gimme a bit
13:55 fdobridge_: <karolherbst🐧🦀> sway might just do random commands every now and then or second or soo...
13:55 fdobridge_: <karolherbst🐧🦀> but keeping a render node open for display hotplugging is the wrong approach
13:55 fdobridge_: <karolherbst🐧🦀> mutter doesn't do this even
13:56 fdobridge_: <Sid> it does not
13:57 fdobridge_: <Sid> I'm in a tty
13:57 fdobridge_: <karolherbst🐧🦀> `runtime_status` still remains `active`?
13:57 fdobridge_: <Sid> runtime_status says active for gpu and audio controller
13:57 fdobridge_: <karolherbst🐧🦀> mhhhh
13:57 fdobridge_: <Sid> lsof for /dev/dri/render* outputs nothing
13:57 fdobridge_: <karolherbst🐧🦀> kill all audio servers/daemons 😄
13:58 fdobridge_: <karolherbst🐧🦀> ohh wait yeah..
13:58 fdobridge_: <karolherbst🐧🦀> the GPU stays active if the audio thing is in use
13:58 fdobridge_: <karolherbst🐧🦀> so it's probably some audio thing...
13:58 fdobridge_: <karolherbst🐧🦀> are you using pipewire?
13:58 fdobridge_: <Sid> yup
13:58 fdobridge_: <karolherbst🐧🦀> mhhh
13:58 fdobridge_: <karolherbst🐧🦀> try to set the nvidia audio thing to disabled via `pavucontrol`
13:58 fdobridge_: <Sid> I'm guessing there's no way to work around that on the kernel side?
13:59 fdobridge_: <karolherbst🐧🦀> of course not
13:59 fdobridge_: <karolherbst🐧🦀> it's one PCIe device anyway
13:59 fdobridge_: <Sid> I'm still in the tty :D
13:59 fdobridge_: <karolherbst🐧🦀> if something uses the audio part, it uses the audio part
13:59 fdobridge_: <Sid> :\
13:59 fdobridge_: <karolherbst🐧🦀> this is almost entirely userspace doing silly things
13:59 fdobridge_: <karolherbst🐧🦀> like.. most of such bugs
13:59 fdobridge_: <Sid> funny thing
13:59 fdobridge_: <Sid> pavucontrol does not list that audio device
13:59 fdobridge_: <karolherbst🐧🦀> oh no....
14:00 fdobridge_: <karolherbst🐧🦀> noooooooooooooooooooooooo
14:00 fdobridge_: <karolherbst🐧🦀> I know this bug
14:00 fdobridge_: <Sid> for the record it does on proprietary
14:00 fdobridge_: <karolherbst🐧🦀> gimme your `dmesg` 😄
14:00 fdobridge_: <Sid> wait
14:00 fdobridge_: <karolherbst🐧🦀> I bet it's this silly alsa bug...
14:00 fdobridge_: <Sid> let me reboot to gsp, then try again
14:00 fdobridge_: <Sid> gsp is running 6.8-rc4
14:00 fdobridge_: <Sid> this is 6.7.4 rn
14:02 fdobridge_: <karolherbst🐧🦀> nah.. the bug I'm thinking about was never fixed, because $reasons
14:02 fdobridge_: <Sid> *sweat*
14:02 fdobridge_: <Sid> https://cdn.discordapp.com/attachments/1034184951790305330/1207326033582497843/dmesg.log?ex=65df3cfd&is=65ccc7fd&hm=0f2775530b60045151f0932897211ce677519edd6c02d6a681d4efb72c4e9662&
14:02 RSpliet: Is this the return of the silly bug where the NVIDIA device has an audio device but no codecs because there are no displays hooked to the NVIDIA GPU so a codec would just be a waste of money?
14:02 fdobridge_: <karolherbst🐧🦀> anyway.. alsa probably fails to load and remains in limbo state
14:03 fdobridge_: <karolherbst🐧🦀> `[ 4.865854] snd_hda_intel 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible` :ferrisUpsideDown:
14:03 fdobridge_: <Sid> huh
14:03 fdobridge_: <karolherbst🐧🦀> is this a new bug...
14:03 fdobridge_: <karolherbst🐧🦀> nah that's fine
14:04 fdobridge_: <karolherbst🐧🦀> anyway
14:04 fdobridge_: <karolherbst🐧🦀> `[ 5.033546] snd_hda_intel 0000:01:00.1: no codecs initialized`
14:04 fdobridge_: <karolherbst🐧🦀> 😄
14:04 karolherbst: RSpliet: it is...
14:04 Sid127: sooo
14:04 Sid127: hm
14:04 fdobridge_: <Sid> wait
14:04 Sid127: heck
14:04 Sid127: wait
14:04 Sid127: I have a dummy edid plug
14:05 RSpliet: Sid127: nah that would be connected to the Intel integrated GPU.
14:05 karolherbst: Sid127: the bug is simple: hda postpones loading for real to a later async point in time because $reasons. But at this point it can't unload itself anymore, so the driver gets loaded, but essentially never hooks up the runtime suspend stuff
14:05 karolherbst: soooo...
14:05 karolherbst: it's all broken
14:05 Sid127: no I know for a fact my laptop's hdmi port is wired to the nv card
14:05 karolherbst: it doesn't matter
14:05 karolherbst: the audio driver fails to load
14:05 karolherbst: it's a bug in the audio driver
14:05 Sid127: huh
14:06 karolherbst: actually two bugs
14:06 Sid127: ..then how do I have hdmi audio when I plug in a real display
14:06 karolherbst: 1. it doesn't initialize correctly 2. it doesn't gracefully fall back to a state where runpm would work
14:06 Sid127: on proprietary, at least
14:06 karolherbst: good question
14:06 karolherbst: _maybe_ nouveau needs to do something?
14:06 Sid127: because even there the audio controller is run by snd_hda_intel
14:07 karolherbst: yeah.. the 1. thing could be caused by nouveau but the 2. bug is a driver bug
14:07 karolherbst: in snd_hda_intel
14:07 RSpliet: IIRC there's some workaround code hidden in the PCI to tinker with some registers on NVIDIA cards to enable the audio device
14:07 Sid127: edid plug found!
14:07 karolherbst: there is a bit of audio stuff nouveau needs to do to make it all work proper... I suspect nvidia does something which we don't
14:08 karolherbst: yeah....
14:08 Sid127: brb
14:08 karolherbst: there are a few details to that
14:08 RSpliet: that workaround code may need a refresh for TU116? Dk, at this point that's a bit of a guess
14:09 Sid127: so karol
14:09 Sid127: guess what
14:09 Sid127: I rebooted with the edid plug plugged in
14:09 Sid127: and pavucontrol shows the audio controller now
14:10 Sid127: and sure enough
14:10 Sid127: now the audio controller is suspended, after I disabled it in pavucontrol
14:10 Sid127: however way is still keeping the dgpu active
14:10 Sid127: sway*
14:11 karolherbst: F
14:11 karolherbst: if you quit sway, does the GPU suspend then?
14:12 RSpliet: karolherbst: https://elixir.bootlin.com/linux/latest/source/drivers/pci/quirks.c#L5693 there's the quirk. Maybe this needs revisiting for newer cards?
14:12 Sid127: it does
14:13 Sid127: the gpu suspends
14:13 Sid127: power_state also reports D3cold
14:13 Sid127: I'll ask if one of my amd-laptop friends can reproduce the gpu not suspending issue, and file a bug report to sway
14:14 Sid127: but the audio thing, we need to handle on the kernel side
14:14 RSpliet: Sid127: do you have a dmesg with the EDID plug plugged in?
14:14 karolherbst: Sid127: congrats, you just hit 5 bugs
14:14 Sid127: gotta make sure audio initialization doesn't fail if no display is plugged in
14:14 Sid127: karolherbst: hey I'm glad I'm able to be of help :D
14:14 Sid127: RSpliet: yup!
14:15 Sid127: karolherbst: though I wanna know what 5 bugs those are so I can brag to another friend ^^'
14:15 Sid127: sway thing is one
14:16 fdobridge_: <Sid> https://cdn.discordapp.com/attachments/1034184951790305330/1207329409124667433/dmesg-edid.log?ex=65df4022&is=65cccb22&hm=871aea3cda5e13d789ce3ad2183dfcf6e634c058a5de6484d086ef7b4c562a69&
14:16 Sid127: RSpliet ^ that's the log with edid plugged in
14:16 RSpliet: Sid127: yep. The difference between the two logs is that without the EDID plug plugged in, there's a log line saying
14:16 RSpliet: [ 0.412460] pci 0000:01:00.0: Enabling HDA controller
14:16 karolherbst: nah
14:17 karolherbst: you have that in the other as well
14:17 RSpliet: And with it plugged in, that line is absent, meaning that the BIOS, VBIOS or GSP did the initialisation
14:17 karolherbst: ehh wait
14:17 RSpliet: karolherbst: surely gedit wouldn't lie to me? :D
14:17 karolherbst: it's not there in the EDID huh
14:18 karolherbst: I misread that
14:18 karolherbst: _anyway_
14:18 karolherbst: it's of no significance
14:18 karolherbst: just means that the quirk might be faulty
14:19 karolherbst: I wonder if the firmware does things...
14:19 karolherbst: anyway, that line being printed basically means that the audio function was turned off when booting
14:20 RSpliet: Yeah. There's a chance that Turing needs *more* initialisation to expose the codecs too even if there's nothing plugged in to the HDMI port, so that snd_hda_intel can continue initialisation?
14:21 karolherbst: something like that
14:21 karolherbst: the quirk is run either way, just that `Enabling HDA controller` only gets printed if the check fails
14:21 karolherbst: so in the working case, that bit is enabled from the start
14:22 Sid127: fun
14:22 karolherbst: should be easy to search for it in the open driver tho
14:22 Sid127: so: 1. kernel is lying to me 2. sway is keeping my gpu from going into d3cold
14:22 Sid127: what else
14:22 Sid127: oh, do you want me to test gpu suspend with plasma?
14:22 karolherbst: 3. snd_hda_intel prevents runpm from working after failing to initialize
14:23 RSpliet: karolherbst: perhaps the init scripts in the VBIOS can shed some light onto this. Just an idea.
14:23 karolherbst: 4. nouveau/kernel don't enable HDA device properly
14:25 karolherbst: maybe...
14:25 Sid127: isn't the vbios signed though, how can we poke into the init scripts
14:26 RSpliet: Sid127: not suggesting to poke, just peek :D
14:26 Sid127: tomato potato, how do we see the vbios' innards :D
14:26 RSpliet: By grabbing it and then running it through envybios of course
14:26 RSpliet: Would be easy to see if there's a script that touches that register at 0x488, and see if it touches any other important registers in the same block of code
14:26 karolherbst: mhhh
14:27 Sid127: despite it being signed/encrypted?
14:27 Sid127: huh
14:27 karolherbst: would have to go through 0x88488
14:27 RSpliet: don't think it's encrypted, just signed
14:27 RSpliet: but... I've been out of the loop for a while :D
14:27 karolherbst: though
14:27 karolherbst: I doubt it's something in the vbios...
14:28 karolherbst: devinit isn't run by the firmware for secondary GPUs
14:29 RSpliet: There's a hint in something that runs before we run the PCI quirk code, which is before loading nouveau or the intel hda driver. VBIOS would be the logical place, but maybe the laptop vendor hard-wired something in the UEFI instead...
14:29 Sid127: about that
14:29 Sid127: my laptop is running a modded bios
14:29 RSpliet: uh oh
14:30 Sid127: I have access to a lot more than acer allows
14:30 Sid127: meaning I can poke around and look for things
14:30 Sid127: I haven't changed any default settings except enable aspm on the GPU and GPU resource bus, and enable above 4G decoding, and... patch my bios to force ReBAR on my gpu
14:30 Sid127: BUT
14:31 Sid127: I should be able to poke around to look for stuff
14:31 RSpliet: I need to tap out again, don't think my boss is too excited about me brainstorming on nouveau things. I think I've said all that I know anyway :-P
14:31 Sid127: also, karol, what would 5. be :D
14:31 RSpliet: Sid127: 5. it's an Acer laptop.
14:31 Sid127: rude
14:31 RSpliet: yeah, and unwarranted :D
14:32 Sid127: it's much more than that it's cursed and beautiful because I beat the firmware into submission
14:32 RSpliet: sorry, only heard bad things about their handling of warranty repairs (unwarranted was a pun indeed), I'm sure the hardware is okay
14:32 Sid127: I have ReBAR on a GTX Turing card
14:32 Sid127: nvidia hates me rn
14:32 RSpliet: it's nothing personal
14:32 RSpliet: they hate everyone equally
14:33 Sid127: no they hate me for getting a feature they officially enable only on the 30 series and 40 series on my dingy little 1660ti :>
14:33 Sid127: which has been a pcie feature since pcie 2.0
14:33 Sid127: but anyway, we digress
14:35 Sid127: let me install plasma and see if my gpu doesn't suspend there either
14:38 Sid127: I know it suspends fine on x11, iirc
14:39 Sid127: plasma wayland.. no idea
14:45 Sid127: I bring news
14:45 Sid127: gpu suspends just fine on plasma wayland
14:45 karolherbst: yeah
14:45 karolherbst: it's a sway bug for sure
14:45 Sid127: I have a feeling it's more of a wlroots thing
14:45 karolherbst: or that
14:45 Sid127: but I'm in no mood to confirm that right now
14:45 Sid127: hyprland yucky
14:45 karolherbst: it might also be that the access pattern does not prevent runpm on other drivers
14:45 karolherbst: but...
14:46 karolherbst: anyway.. render node being open and not suspending? guess you submit some work then
14:46 Sid127: ...hmm?
14:46 karolherbst: well.. if an application keeps a render node open and the GPU doesn't suspend, I automatically assume it's the app's fault
14:46 karolherbst: until proven otherwise
14:47 karolherbst: there is a timeout happening anyway, and we hit it on certain UAPI calls
14:47 karolherbst: soo.. sway directly or indirectly does something to call nouveau UAPI which refreshes the timeout
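A toy model of the timeout karolherbst describes, assuming an autosuspend-style countdown that restarts whenever a UAPI call marks the device busy. This is an illustration only, not the kernel's or nouveau's code; the class, the helper, and the 5-second delay used in the usage note are all hypothetical.

```python
class AutosuspendTimer:
    """Toy model of a runtime-PM autosuspend countdown (not kernel code)."""

    def __init__(self, delay_s):
        self.delay_s = delay_s  # idle time required before suspending
        self.last_busy = 0.0    # timestamp of the most recent activity

    def mark_busy(self, now):
        """Record activity, e.g. a driver UAPI call; this restarts the countdown."""
        self.last_busy = now

    def is_suspended(self, now):
        """The device may suspend only after a full idle period has elapsed."""
        return now - self.last_busy >= self.delay_s


def ever_suspends(timer, call_times, horizon):
    """True if the device gets a chance to suspend at any point up to `horizon`."""
    for t in sorted(call_times):
        if timer.is_suspended(t):
            return True  # a full idle period passed before this call arrived
        timer.mark_busy(t)
    return timer.is_suspended(horizon)
```

With a hypothetical 5-second delay, a client that pokes the device once per second (`ever_suspends(AutosuspendTimer(5.0), range(1, 31), 30)`) never lets it suspend, while one that stops after two calls does.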
15:00 Sid127: also, bug #5: if gsp is enabled, vga controller does not suspend even if there's no load on it
15:12 fdobridge_: <gfxstrand> That sounds... bad. 😬
15:13 fdobridge_: <gfxstrand> Maybe NVK can be the first non-broken modifiers implementation. 😂
15:15 Sid127: I think I figured out why proprietary driver does not suspend the gpu on wayland
15:15 Sid127: I see a single xwayland listing on lsof /dev/dri/render* on the dGPU
15:29 fdobridge_: <gfxstrand> That sounds plausible. IIRC, the prop driver does suspend properly, but you can't have ANY open instances of `/dev/dri/foo` or it won't.
15:29 fdobridge_: <gfxstrand> At one point there was also some config flag required
15:33 fdobridge_: <Sid> I know for a fact it can suspend on x11 if the conditions are "right"
15:34 fdobridge_: <Sid> the readme has a whole page dedicated to rtd3 on x11
15:35 fdobridge_: <Sid> currently trying to see how I can disable xwayland
15:35 fdobridge_: <Sid> to see if doing that makes it suspend on wl
15:36 fdobridge_: <rinlovesyou> Suspending Nvidia proprietary is so awesome because torch will become oblivious that the gpu exists or something
15:52 fdobridge_: <Sid> ok, snd_hda_intel fails to initialize without the edid plug even on proprietary driver
15:53 fdobridge_: <Sid> meaning we can rule out bug #4
15:55 fdobridge_: <Sid> OH MY GOD IT IS XWAYLAND CAUSING PROPRIETARY TO NOT BE ABLE TO SUSPEND ON WAYLAND
15:57 fdobridge_: <marysaka> Why am I not surprised
15:57 fdobridge_: <Sid> it's not even nvidia's fault 😭
15:58 fdobridge_: <Sid> as soon as I killed xwayland (5 times before plasma stopped auto-restarting it)
15:58 fdobridge_: <Sid> my gpu suspended
15:59 fdobridge_: <Sid> why does xwayland keep a "node" open on the proprietary driver
15:59 fdobridge_: <tom3026> thats odd, ive had it working for quite some time on wayland too
15:59 fdobridge_: <Sid> I must investigate this
15:59 fdobridge_: <Sid> are you sure
16:00 fdobridge_: <tom3026> yeah /proc/driver/nvidia/gpus/0000:01:00.0/power video memory turns off when not in use
16:00 fdobridge_: <tom3026> /sys/class/drm/card1/device/power/control is also set to auto on boot
16:00 fdobridge_: <Sid> ```
16:00 fdobridge_: <Sid> [sidpr@strogg ~]$ cat /proc/driver/nvidia/gpus/0000\:01\:00.0/power
16:00 fdobridge_: <Sid> Runtime D3 status: Enabled (fine-grained)
16:00 fdobridge_: <Sid> Video Memory: Off
16:00 fdobridge_: <Sid>
16:00 fdobridge_: <Sid> GPU Hardware Support:
16:01 fdobridge_: <Sid> Video Memory Self Refresh: Supported
16:01 fdobridge_: <Sid> Video Memory Off: Supported
16:01 fdobridge_: <Sid> [sidpr@strogg ~]$ env | grep -iE type
16:01 fdobridge_: <Sid> XDG_SESSION_TYPE=wayland
16:01 fdobridge_: <Sid> ```
16:01 fdobridge_: <tom3026> "Video Memory: off"
16:01 fdobridge_: <Sid> what does `lsof /dev/dri/render*` say
16:01 fdobridge_: <Sid> and what de are you on
16:01 fdobridge_: <tom3026> kde, well right now i got an external monitor hooked up so its on :p
16:02 fdobridge_: <Sid> weird
16:02 fdobridge_: <tom3026> calling nvidia-smi wakes it also if you use any widgets in kde's system monitor
16:03 fdobridge_: <Sid> that I know
16:03 fdobridge_: <Sid> and I don't use kde's system monitor at all
16:03 fdobridge_: <Sid> but I did see an xwayland node on the gpu for some reason, and killing it allowed the gpu to suspend
16:04 fdobridge_: <tom3026> you could have vram leaked, it has some threshold setting
16:04 fdobridge_: <Sid> right off a fresh boot?
16:04 fdobridge_: <tom3026> hm that i doubt :p
16:04 fdobridge_: <Sid> I'm not even using sddm
16:04 fdobridge_: <Sid> I'm using ly
16:05 fdobridge_: <Sid> so not like my display manager is running on x11
16:05 fdobridge_: <tom3026> mine is
16:05 fdobridge_: <Sid> so weird
16:09 fdobridge_: <Sid> ```
16:09 fdobridge_: <Sid> Xwayland 758 sidpr 9u CHR 226,129 0t0 837 /dev/dri/renderD129
16:09 fdobridge_: <Sid> Xwayland 758 sidpr 11u CHR 226,128 0t0 792 /dev/dri/renderD128```
16:09 fdobridge_: <Sid> see
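The `lsof /dev/dri/render*` check above can be approximated by walking `/proc/<pid>/fd` and resolving each descriptor's symlink. A rough Python sketch under that assumption; `render_node_holders` is a hypothetical helper name, and without root it will only see processes owned by the current user:

```python
import os
from pathlib import Path


def render_node_holders(proc_root="/proc", prefix="/dev/dri/renderD"):
    """List (pid, command, node) for every open fd pointing at a render node."""
    holders = []
    for proc in Path(proc_root).iterdir():
        if not proc.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            fds = list((proc / "fd").iterdir())
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue
            if target.startswith(prefix):
                try:
                    comm = (proc / "comm").read_text().strip()
                except OSError:
                    comm = "?"
                holders.append((int(proc.name), comm, target))
    return sorted(holders)


if __name__ == "__main__":
    for pid, comm, node in render_node_holders():
        print(pid, comm, node)
```

Which `renderD` number belongs to the dGPU differs between machines; in the output above it is 129.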
16:09 fdobridge_: <tom3026> ```
16:09 fdobridge_: <tom3026> #nvidia sucks
16:09 fdobridge_: <tom3026> ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/usr/bin/nvidia-modprobe -c0 -u"
16:09 fdobridge_: <tom3026>
16:09 fdobridge_: <tom3026> # Enable Wi-Fi power saving.
16:09 fdobridge_: <tom3026> ACTION=="add", SUBSYSTEM=="net", KERNEL=="wl*", RUN+="/usr/bin/iw dev $name set power_save on"
16:09 fdobridge_: <tom3026>
16:09 fdobridge_: <tom3026> # Enable runtime PM for all pci devices
16:09 fdobridge_: <tom3026> SUBSYSTEM=="pci", ATTR{power/control}="auto"
16:09 fdobridge_: <tom3026>
16:09 fdobridge_: <tom3026> # Remove NVIDIA USB xHCI Host Controller devices, if present
16:09 fdobridge_: <tom3026> ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{remove}="1"
16:09 fdobridge_: <tom3026>
16:09 fdobridge_: <tom3026> # Remove NVIDIA USB Type-C UCSI devices, if present
16:09 fdobridge_: <tom3026> ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{remove}="1"
16:09 fdobridge_: <tom3026>
16:09 fdobridge_: <tom3026> # Remove NVIDIA Audio devices, if present
16:09 fdobridge_: <tom3026> ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{remove}="1"
16:09 fdobridge_: <tom3026>
16:09 fdobridge_: <tom3026> # Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
16:09 fdobridge_: <tom3026> ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
16:09 fdobridge_: <tom3026> ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
16:09 fdobridge_: <tom3026>
16:09 fdobridge_: <tom3026> # Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
16:09 fdobridge_: <tom3026> ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
16:09 fdobridge_: <tom3026> ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
16:09 fdobridge_: <tom3026> ```
16:09 fdobridge_: <Sid> node on dGPU (129)
16:10 fdobridge_: <tom3026> thats my convoluted PM udev rule :p
16:10 fdobridge_: <Sid> I don't remove any devices
16:10 fdobridge_: <Sid> but I do enable runtime pm for them all
16:10 fdobridge_: <tom3026> you might need/want to, as far as i understand they could keep it awake
16:10 fdobridge_: <Sid> they don't
16:11 fdobridge_: <Sid> only the audio controller does and I discovered why thanks to karol today
16:11 fdobridge_: <Sid> might try to dive into the code for it and fix it myself
16:11 fdobridge_: <tom3026> i also set a bunch of env vars to really force things to the igpu,
16:11 fdobridge_: <tom3026> ```
16:11 fdobridge_: <tom3026> export DXVK_FILTER_DEVICE_NAME="Intel"
16:11 fdobridge_: <tom3026> export VKD3D_FILTER_DEVICE_NAME="Intel"
16:11 fdobridge_: <tom3026> export MESA_VK_DEVICE_SELECT="8086:a788"
16:11 fdobridge_: <tom3026> export __GLX_VENDOR_LIBRARY_NAME="mesa"
16:11 fdobridge_: <tom3026> export __EGL_VENDOR_LIBRARY_FILENAMES="/usr/share/glvnd/egl_vendor.d/50_mesa.json"
16:11 fdobridge_: <tom3026> export __NV_PRIME_RENDER_OFFLOAD="0"
16:11 fdobridge_: <tom3026> export __VK_LAYER_NV_optimus="non_NVIDIA_only"
16:11 fdobridge_: <tom3026> export LIBVA_DRIVER_NAME="iHD"
16:11 fdobridge_: <tom3026> export VDPAU_DRIVER="va_gl"
16:12 fdobridge_: <tom3026> export WLR_RENDER_DRM_DEVICE="/dev/dri/renderD128"
16:12 fdobridge_: <tom3026> ```
16:12 fdobridge_: <tom3026> and then "prime-run" script pretty much reverses it
16:12 fdobridge_: <Sid> and the workaround to the audio controller keeping it awake for me is to just add in a dummy EDID plug
16:12 fdobridge_: <Sid> oh, how do you have external display working with __EGL_VENDOR_LIBRARY_FILENAMES set
16:12 fdobridge_: <Sid> on wayland
16:12 fdobridge_: <tom3026> yeah
16:13 fdobridge_: <Sid> weird answer to "how" but okay 😅
16:13 fdobridge_: <tom3026> well i guess technically its being set by kwin so it gets set after the fact the compositor has launched :p
16:13 fdobridge_: <Sid> also for plasma it'll be KWIN_DRM_DEVICES
16:13 fdobridge_: <tom3026> ~/.config/plasma-workspace/env/env.sh
16:13 fdobridge_: <Sid> WLR_DRM_DEVICES is for wlroots-based compositors
16:13 fdobridge_: <tom3026> yeh
16:15 fdobridge_: <tom3026> https://i.imgur.com/HNf2MWH.jpeg dont ask me "how" it just does
16:15 fdobridge_: <tom3026> 😄
16:17 fdobridge_: <Sid> weird
16:18 fdobridge_: <Sid> oh wait I know why
16:18 fdobridge_: <Sid> your external display port is wired to the iGPU
16:18 fdobridge_: <tom3026> nope
16:18 fdobridge_: <Sid> while mine is to the dGPU
16:19 fdobridge_: <tom3026> hdmi and usb-c alt mode or whatever its called both go to the dgpu
16:19 fdobridge_: <Sid> if dGPU was driving the second monitor, there'd be more than just xorg node on nvidia-smi
16:22 fdobridge_: <Sid> ok yeah
16:22 fdobridge_: <Sid> __EGL_VENDOR_LIBRARY_FILENAMES stopped the xwayland node from being created on the dGPU
16:23 fdobridge_: <Sid> and now even I have runtime d3 on wayland
16:23 fdobridge_: <Sid> wtf
16:25 fdobridge_: <Sid> wow
16:26 fdobridge_: <tom3026> https://admin.pci-ids.ucw.cz/read/PC/10de https://i.imgur.com/kk9QXVf.png
16:26 fdobridge_: <tom3026> so yeah, no idea how. it just does
16:29 klf434: Hi everyone. Is there any way to know the temperature of my gpu? I use nouveau & GSP support. lm_sensors doesn't seem to be working, I see a lot of "nvidia-gpu 0000:01:00.3: i2c timeout error e0000000" in my dmesg
16:35 Sid127: klf434: unfortunately, not yet
16:36 klf434: :(
16:36 klf434: Thank you
17:04 fdobridge_: <Sid> @karolherbst so, the 5 issues
17:04 fdobridge_: <Sid> 1. kernel is lying to me when it says "enabled hda controller" for the one on the gpu, because the hda controller is actually failing to initialize if there's no display connected to the gpu.
17:04 fdobridge_: <Sid> 2. sway is keeping my gpu from suspending to save power
17:04 fdobridge_: <Sid> 3. snd_hda_intel prevents nouveau runpm from working after failing to initialize
17:04 fdobridge_: <Sid> 4. nouveau/kernel doesn't enable HDA device properly
17:04 fdobridge_: <Sid> 5. when gsp is enabled, the display controller does not suspend even if there's no load on the gpu
17:04 fdobridge_: <karolherbst🐧🦀> 1. and 4. are the same thing
17:04 fdobridge_: <Sid> #4 is not a nouveau issue, because even proprietary driver has issue #1
17:05 fdobridge_: <karolherbst🐧🦀> but can you runpm with the prop driver?
17:05 fdobridge_: <Sid> yes
17:05 fdobridge_: <karolherbst🐧🦀> and audio works?
17:05 fdobridge_: <Sid> needs edid plug on boot for controller to show up in pavucontrol
17:06 fdobridge_: <karolherbst🐧🦀> okay, so the same issue as with nouveau?
17:06 fdobridge_: <Sid> yes
17:06 fdobridge_: <Sid> 100% same
17:06 fdobridge_: <karolherbst🐧🦀> pain
17:06 fdobridge_: <Sid> issue #2, however, is a bit of a funny one
17:06 fdobridge_: <Sid> in that it's partly a sway/wlroots issue, and partly a mesa issue
17:06 fdobridge_: <karolherbst🐧🦀> btw, the "5" was a guess, it just felt like 5 :ferrisUpsideDown:
17:06 fdobridge_: <Sid> I know :P
17:07 fdobridge_: <Sid> I set this to 50_mesa.json
17:07 fdobridge_: <Sid> `__EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json`
17:07 fdobridge_: <Sid> and now
17:07 fdobridge_: <Sid> on proprietary driver, sway does not create the node on the dGPU
17:07 fdobridge_: <Sid> because proprietary driver's EGL vendor file is 10_nvidia.json
17:08 fdobridge_: <Sid> but since every mesa driver uses the same single egl library, sway is creating nodes on all gpus that library is for
17:08 fdobridge_: <Sid> which, on nouveau, includes both my iGPU and my dGPU
17:09 fdobridge_: <Sid> and in that case, this would also be an issue on intel + amd laptops
17:09 fdobridge_: <Sid> or amd + nv laptops
17:09 fdobridge_: <Sid> or amd + amd laptops
17:09 fdobridge_: <Sid> or multi-gpu mesa-powered desktops
17:09 fdobridge_: <Sid> *also*, the sway "node" is not actually sway, but xwayland
17:10 fdobridge_: <Sid> or, at least that's what it is on plasma wayland
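For context on the `10_nvidia.json` / `50_mesa.json` distinction: libglvnd picks EGL vendor libraries from small JSON files in `egl_vendor.d`, and `__EGL_VENDOR_LIBRARY_FILENAMES` restricts it to the listed files. The Python sketch below lists those files and the library each one points at. It assumes the usual layout where the library name is stored under `ICD.library_path`; the function name `egl_vendors` is hypothetical.

```python
import json
from pathlib import Path


def egl_vendors(vendor_dir="/usr/share/glvnd/egl_vendor.d"):
    """Return (file name, library_path) pairs for glvnd EGL vendor files.

    Assumes each JSON file has the shape
    {"file_format_version": "...", "ICD": {"library_path": "..."}}.
    Unreadable or malformed files are skipped.
    """
    vendors = []
    for path in sorted(Path(vendor_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except (OSError, ValueError):
            continue
        vendors.append((path.name, data.get("ICD", {}).get("library_path")))
    return vendors


if __name__ == "__main__":
    for name, lib in egl_vendors():
        print(name, "->", lib)
```

On a system with both the proprietary driver and Mesa installed, this would be expected to show one entry per vendor, with all Mesa drivers sharing the single Mesa entry.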
17:15 fdobridge_: <Sid> ....wait I'm dumb
17:15 fdobridge_: <Sid> I keep forgetting I'm using a sway fork
17:16 fdobridge_: <Sid> ok, can confirm, problem does not exist in upstream sway
17:17 fdobridge_: <Sid> sorry e-e
17:19 fdobridge_: <Sid> so,
17:19 fdobridge_: <Sid> - we need to figure out how to make snd_hda_intel not fail to initialize if there's no display
17:19 fdobridge_: <Sid> - nouveau runpm needs to be able to run even if snd_hda_intel fails, though it's unlikely this'll happen since it's the same pci device
17:19 fdobridge_: <Sid> - runpm is not working on the vga controller if gsp is enabled
17:28 fdobridge_: <karolherbst🐧🦀> pain
17:30 fdobridge_: <karolherbst🐧🦀> Soo.. I'd restate the first two issues as: 1. _if_ `snd_hda_intel` fails to initialize or doesn't find any codecs, it shouldn't prevent runpm from working, and 2. we need to figure out why it fails on your end in the first place. The first one doesn't necessarily have to be caused by a bug anywhere, it's just that if the driver figures the sound device to be of no use, it bails in a way that leaves the device in limbo
17:31 fdobridge_: <karolherbst🐧🦀> if you `rmmod snd_hda_intel` then your GPU would runpm again
17:31 fdobridge_: <karolherbst🐧🦀> or well..
17:31 fdobridge_: <karolherbst🐧🦀> if you unbind the audio device from it
17:32 fdobridge_: <karolherbst🐧🦀> this should also make it work: `echo "0000:01:00.1" > /sys/bus/pci/drivers/snd_hda_intel/unbind`
17:32 fdobridge_: <karolherbst🐧🦀> maybe the solution is for `snd_hda_intel` to simply unbind in a cursed way, because how it initializes devices is just broken tbh
17:33 fdobridge_: <karolherbst🐧🦀> and people generally don't report that as most people don't really notice
17:37 fdobridge_: <Sid> mhm
17:37 fdobridge_: <Sid> except then if I plug in an external display
17:37 fdobridge_: <Sid> hdmi audio won't work
17:37 fdobridge_: <karolherbst🐧🦀> correct
17:37 fdobridge_: <karolherbst🐧🦀> but
17:37 fdobridge_: <karolherbst🐧🦀> mhhh
17:37 fdobridge_: <karolherbst🐧🦀> soo uhm...
17:37 fdobridge_: <Sid> :P
17:38 fdobridge_: <karolherbst🐧🦀> what happens if you unbind
17:38 fdobridge_: <karolherbst🐧🦀> plug in your HDMI thing
17:38 fdobridge_: <karolherbst🐧🦀> and then bind
17:38 fdobridge_: <Sid> how do I rebind
17:38 fdobridge_: <karolherbst🐧🦀> echo into `bind` instead of `unbind`
17:39 fdobridge_: <Sid> controller shows up in pavucontrol
17:39 fdobridge_: <karolherbst🐧🦀> funky
17:40 fdobridge_: <karolherbst🐧🦀> I have no idea what's the issue here then 😄
17:40 fdobridge_: <Sid> and is properly suspended too
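The unbind / plug in / bind sequence tried above boils down to writing the PCI address into the driver's `unbind` and `bind` sysfs files. A small Python sketch of the same steps, assuming the audio function is at `0000:01:00.1` as in the `echo` command above; the helper names are made up, and on a real system this needs root:

```python
from pathlib import Path

# Driver directory taken from the echo command quoted in the log.
SND_HDA_INTEL = "/sys/bus/pci/drivers/snd_hda_intel"


def unbind(pci_addr, driver_dir=SND_HDA_INTEL):
    """Detach the device at pci_addr from the driver."""
    (Path(driver_dir) / "unbind").write_text(pci_addr)


def bind(pci_addr, driver_dir=SND_HDA_INTEL):
    """Attach the device at pci_addr to the driver again."""
    (Path(driver_dir) / "bind").write_text(pci_addr)


if __name__ == "__main__":
    addr = "0000:01:00.1"  # assumed address of the GPU's HDA function
    unbind(addr)
    input("Plug in the external display, then press Enter... ")
    bind(addr)
```

This only reproduces the manual workaround; it does not address why the driver ends up holding the device in the first place.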
17:40 fdobridge_: <karolherbst🐧🦀> it can be some weirdo interactions between nouveau and `snd_hda_intel` at play here
17:40 fdobridge_: <karolherbst🐧🦀> or the audio device just being extra weird
17:41 fdobridge_: <Sid> well, the issue is that if there's no edid info, binding fails
17:41 fdobridge_: <Sid> and, this is a thing on proprietary drivers too
17:42 fdobridge_: <Sid> so I think it's just more of a kernel and `snd_hda_intel` thing than nouveau
17:43 fdobridge_: <karolherbst🐧🦀> I see...
17:43 fdobridge_: <karolherbst🐧🦀> but well..
17:43 fdobridge_: <karolherbst🐧🦀> the audio driver doesn't really know about the edid stuff, does it?
17:44 fdobridge_: <karolherbst🐧🦀> I don't really know how those interactions work exactly
17:44 fdobridge_: <Sid> ..right
17:44 fdobridge_: <karolherbst🐧🦀> but the video driver is supposed to communicate some stuff to the audio one
17:44 fdobridge_: <Sid> yeah
17:44 fdobridge_: <Sid> I wonder if amdgpu has the same problem
17:44 fdobridge_: <karolherbst🐧🦀> dunno 🙂
17:46 fdobridge_: <Sid> it is! https://bugzilla.kernel.org/show_bug.cgi?id=213291
17:46 fdobridge_: <karolherbst🐧🦀> yeah...
17:47 fdobridge_: <karolherbst🐧🦀> fundamentally `snd_hda_intel` has to stop preventing runpm in that case 😛
17:47 fdobridge_: <karolherbst🐧🦀> fixing audio is just uhhh.. the nice oo have here 😄
17:47 fdobridge_: <Sid> I have a stupid thought
17:47 fdobridge_: <karolherbst🐧🦀> *to
17:47 fdobridge_: <Sid> what if this is just a race condition
17:47 fdobridge_: <karolherbst🐧🦀> it's not
17:47 fdobridge_: <Sid> and snd_hda_intel is loading sooner
17:47 fdobridge_: <karolherbst🐧🦀> nah
17:47 fdobridge_: <karolherbst🐧🦀> the driver is cursed
17:47 fdobridge_: <karolherbst🐧🦀> it postpones loading until a later point
17:48 fdobridge_: <karolherbst🐧🦀> so the device gets assigned to the driver and it just hand waves on it and says "it loaded fine" and then later breaks in a kworker thread where it can't actually tell the kernel it failed to load
17:50 fdobridge_: <Sid> hm
17:51 fdobridge_: <karolherbst🐧🦀> not sure I'm able to write a patch, but maybe the driver could just unbind the device in that case? dunno what's the kernel API for that, but might be possible
17:51 fdobridge_: <karolherbst🐧🦀> and if they refuse the patch, we just land it anyway through drm :ferrisUpsideDown:. This issue is kinda old (like 5+ years) and nothing seems to happen
17:53 fdobridge_: <Sid> would it not be possible to bind only if/when a display is connected?
17:53 fdobridge_: <Sid> kinda like how the system loads a kernel module when it is required
17:53 fdobridge_: <karolherbst🐧🦀> no
17:54 fdobridge_: <karolherbst🐧🦀> because the driver doesn't really know
17:54 fdobridge_: <karolherbst🐧🦀> it's really just to fix the error case for now
17:54 fdobridge_: <karolherbst🐧🦀> like if something bad happens, it should at least not break runpm for us
17:54 fdobridge_: <karolherbst🐧🦀> it's not even specific to your issue
17:54 fdobridge_: <karolherbst🐧🦀> if it fails for _any_ reasons you will end up in that state
17:56 fdobridge_: <Sid> yeah
17:56 fdobridge_: <Sid> that's fair
17:57 fdobridge_: <karolherbst🐧🦀> I mean.. generally it's fine to not work around a failing driver, but audio on DP/HDMI is often such a niche use case on laptops, that breaking runpm is kinda a big enough deal to care here
17:59 fdobridge_: <Sid> yeah
19:48 fdobridge_: <airlied> there is also some bullshit where Windows got confused with dp/hdmi audio on laptops and lots of laptops turned it off
22:39 fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1207456162413420595/Screenshot_20240215_003605.png?ex=65dfb62e&is=65cd412e&hm=63e8adfe19b11e33d0de36cf13091843d985d783fa5ed290e472503235e37ad4&