00:02 fdobridge: <c​langcat> What is simple drm?
00:07 killmlana: "simpledrm provides a minimal drm driver running on the bios provided framebuffer (EFI/VGA) to have a fallback driver for wayland in case no gpu specific drm driver is available". Ahh nvm the simpledrm error is a consequence of gpu not being detected
00:20 dwfreed: does it show in lspci?
00:22 fdobridge: <c​langcat> Yea even then that shouldn't have an effect on your driver. What happens if you call `modprobe nouveau` inside Linux as it's strange no logs show up
03:16 yusisamerican: I noticed nvk_CmdCopyBuffer2 does GPU copies of buffers up to (1 << 17) bytes and then wraps around until it's done with the whole set of buffers. This is in line with how it's done for NVC0 in the gallium driver but not for NVE4 in the gallium driver: there, there is no wrap at all and the buffer is copied wholesale. Is the wrap necessary on NVE4+? I modified it to remove the wrap on NVK and it at least seems to copy the whole b
13:23 pabs: is there a way to reset hung nvidia GPUs without rebooting?
13:32 fdobridge: <S​id> in theory you can do a PCIe reset on the GPU, but you need to be able to ssh into the machine for that
14:16 pabs: I have SSH into the machine in this case
14:19 pabs: is there a command-line tool or something in sysfs I can toggle to trigger a reset?
14:21 fdobridge: <m​agic_rb.> I think it's `/sys/bus/pci/.../unbind` and then `/sys/bus/pci/rescan`
14:22 fdobridge: <S​id> correct
14:22 fdobridge: <S​id> echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
14:22 fdobridge: <S​id> and echo 1 > /sys/bus/pci/rescan
14:22 fdobridge: <S​id> of course, replace the PCIe ID with that of your GPU
14:23 fdobridge: <m​agic_rb.> Yay! I was close
14:23 pabs: remove rather than unbind?
14:23 fdobridge: <m​agic_rb.> I'm not at a computer to verify (edited)
14:23 fdobridge: <S​id> unbind only unbinds it from the driver
14:23 fdobridge: <S​id> remove drops it from the pcie bus
14:24 fdobridge: <S​id> s/driver/kernel module/
14:24 fdobridge: <m​agic_rb.> (Remove should also reset the firmware) (edited)
14:26 pabs: thanks, will try that next time it happens. hopefully better than a reboot.
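The remove-then-rescan sequence Sid gives above can be wrapped in a small shell function. This is only a sketch under assumptions: the function name `pci_remove_rescan` and the `SYSFS_ROOT` override are hypothetical (not from the chat) and exist so the function can be dry-run against a scratch directory, and the one-second pause is a guess rather than a documented requirement. On a real system it must run as root, and nothing here guarantees that a hung GPU actually recovers.

```shell
#!/bin/sh
# Sketch: drop a PCI device from the bus, then ask the kernel to rescan.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.

pci_remove_rescan() {
    addr="$1"                       # e.g. 0000:01:00.0 (see `lspci -D`)
    root="${SYSFS_ROOT:-/sys}"
    dev="$root/bus/pci/devices/$addr"

    # Refuse to continue if the device node is not present.
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi

    # Remove the device from the PCI bus (unbind would only detach the driver).
    echo 1 > "$dev/remove" || return 1

    # Assumed pause before re-enumerating; the chat does not mention one.
    sleep 1

    # Rescan the bus so the device is rediscovered and a driver can bind again.
    echo 1 > "$root/bus/pci/rescan"
}
```

Called as `pci_remove_rescan 0000:01:00.0`, with the address replaced by that of your GPU. Any process still holding the device open (a compositor, for example) will see it disappear underneath it.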
14:27 pabs: wonders if desktops (like GNOME) will get killed or continue hanging
14:27 fdobridge: <m​agic_rb.> I would expect it to crash? The device should disappear from underneath it
14:28 fdobridge: <m​agic_rb.> If not `kill -9` it :)
14:29 pabs: with the GPU hang I can ssh in and reboot, but then reboot doesn't complete. so presumably GNOME/gdm/something is hanging too
14:29 fdobridge: <m​agic_rb.> If userspace hangs it'll eventually time out; if it doesn't, it's not userspace but kernel space
14:30 fdobridge: <m​agic_rb.> And also, if you have a watchdog available, even a firmware one, enable it; it'll hard reset the system in case reboot hangs
14:30 fdobridge: <m​agic_rb.> Ofc a BMC watchdog is better but you get that only on servers
14:31 pabs: so not sure if kill -9 actually works, guess I should try it properly