17:31 Onorick: Is there an easy way to trigger a GPU reset? I sometimes get a full graphical freeze, and the only way to get my system back is usually Alt+SysRq+e, but I wonder if triggering a GPU reset could also save it.
17:40 _ds_: Onorick, defaults for the GPU reset time-outs are 10s for non-compute and 60s for compute, but it'd help to mention what GPU. Also desktop (X or Wayland, versions) & kernel as that can affect things too.
17:41 _ds_: Used to be that I'd have to restart X manually after a hang & reset, whereas these days it automatically exits.
17:46 Onorick: _ds_: Well, I believe troubleshooting why my problem happens would be harder than figuring out a way to minimize the impact. I suspect a hardware problem.
17:46 Onorick: It's a 6700 XT. I use wayland (1.22.0) with wlroots (0.16.2) and sway (1.8.1) on linux-zen (6.4.2). It has been happening since I got that GPU, around 3-4 months ago.
17:46 Onorick: Whenever I'm playing, I can randomly get a full graphical freeze with audio and everything else still working. The only kernel message is "RT throttling activated".
17:47 _ds_: That doesn't sound like a reset is happening or even needed to me.
17:47 _ds_: RT – real-time?
17:47 Onorick: Well, no reset is happening, but I would like to trigger one to see if that will unfreeze everything.
17:48 _ds_: I think that you'll need to write to /sys/class/drm/card0/device/ (for appropriate values of 0).
17:48 Onorick: Yes, the kernel message is about real time, but I'm not sure whether it's a cause or a consequence.
17:49 Onorick: And with the graphical part being the only thing not working, I'm guessing it's more GPU related.
17:49 _ds_: 6600XT here. I've seen GPU resets happen, and your RT message looks to me like a symptom of something else.
17:50 _ds_: I'd check if the freeze still happens with RT disabled.
17:50 Onorick: Oh yes, I also had GPU resets but that's another issue. I just want to trigger one to see if that would unstick the graphical freeze.
17:51 _ds_: … hmm, incomplete path – should be, I think, /sys/class/drm/card0/device/reset
17:51 _ds_: (probably best to ssh in to write to that)
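A minimal sketch of the write _ds_ describes, meant to be run as root from an ssh session. The chat does not say what value to write; "1" here is an assumption based on the usual PCI sysfs convention, and whether this kind of reset recovers a frozen amdgpu is untested. The helper names reset_path and trigger_reset are hypothetical.

```python
from pathlib import Path


def reset_path(card_index: int, sysfs_root: str = "/sys") -> Path:
    """Build the reset node path for cardN ("for appropriate values of 0")."""
    return Path(sysfs_root) / "class" / "drm" / f"card{card_index}" / "device" / "reset"


def trigger_reset(path: Path) -> bool:
    """Write "1" to the reset node (assumed value, not stated in the chat).

    Returns False when the node is missing or cannot be written,
    for example when not running as root.
    """
    if not path.exists():
        return False
    try:
        path.write_text("1")
    except OSError:  # PermissionError is a subclass of OSError
        return False
    return True
```

Calling trigger_reset(reset_path(0)) would target card0; on a machine with several cards the index may differ.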
17:52 Onorick: I already tried to reduce allowed RT, but even then, the things that use RT, like audio, are still working while the graphics side is frozen.
17:52 Onorick: Also I can't even change TTY, only sysrq seems to work.
18:01 _ds_: You could do a test reset now then write a script which watches the kernel log for that RT message and does the GPU reset automatically.
18:04 _ds_: (I'd probably pipe “dmesg -w” output into expect.)
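A rough sketch of the watcher idea, in Python rather than the expect or awk approaches mentioned in the chat. It follows "dmesg -w" and calls a handler whenever the quoted RT message shows up. The function names follow_dmesg and watch are hypothetical, and the handler is left to the caller (killing the game, attempting a reset, and so on).

```python
import subprocess
from typing import Callable, Iterable, Iterator

PATTERN = "RT throttling activated"  # message text as quoted in the chat


def follow_dmesg() -> Iterator[str]:
    """Yield kernel log lines as dmesg prints them (may need root)."""
    proc = subprocess.Popen(["dmesg", "-w"], stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.terminate()


def watch(lines: Iterable[str],
          on_match: Callable[[str], None],
          pattern: str = PATTERN) -> int:
    """Call on_match for each line containing pattern; return the match count."""
    hits = 0
    for line in lines:
        if pattern in line:
            hits += 1
            on_match(line)
    return hits
```

Something like watch(follow_dmesg(), handler) would run until interrupted. Note that "dmesg -w" replays the existing buffer first, so a real script would probably want to ignore messages logged before it started.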
18:06 _ds_: … also, I'd normally expect Alt-SysRq-K to be a better choice than Alt-SysRq-E…
18:12 Onorick: _ds_: Sorry, I was not at my desktop for a bit. Yes, that's my plan. Well, it uses awk to detect the RT throttling line and right now only tries to kill wine. But I didn't check if that worked yet, GPU reset is my next test.
18:13 Onorick: Well Alt-SysRq-E was because it is part of REISUB.
18:16 Onorick: I also tried Alt-SysRq-F multiple times. It seemed to only work after killing the game, but then I got spammed with these kernel messages:
18:16 Onorick: amdgpu 0000:03:00.0: amdgpu: failed to get a new IB (-512)
18:16 Onorick: amdgpu 0000:03:00.0: amdgpu: failed to clear page tables on GEM object close (-512)