10:58Quibus: Hi all. I've been experiencing random spontaneous resets on my PC for a few weeks. Is this a known issue with the AMDGPU drivers?
10:58Quibus: It mostly seems to happen while I'm watching video, but it's pretty rare, only happens about once a day at most
14:56superkuh: Not really a "known issue". What do your syslog logs say about it?
15:47Quibus: superkuh: nothing, there is no trace of this in any log file. The PC just suddenly shows a black screen and then the POST
15:49superkuh: I wouldn't immediately assume it's a GPU thing then.
15:50superkuh: Try doing some Prime95/mprime stress tests and see if you can trigger the reset.
15:51superkuh: YouTube's JavaScript web application is very CPU intensive.
15:51Quibus: Well, it's a big CPU :-)
15:52Quibus: I regularly compile C++ code with 32 parallel processes, but it has never happened then.
15:59Ristovski: Quibus: Are you running any overclock and/or undervolt?
15:59Quibus: Ristovski: I've been using the same settings for over a year
15:59Ristovski: Quibus: What CPU and GPU?
16:00Quibus: model name : AMD Ryzen 9 7950X 16-Core Processor
16:00Quibus: No extra GPU, just the AMDGPU built in
16:01Ristovski: I have a 5700G and I get the exact same thing as you when I push the curve optimizer undervolt a bit too low, which is why I'm asking
16:01Quibus: Strange that it started only a few weeks ago then
16:02Ristovski: Does that imply that you indeed have set the curve optimizer offsets?
16:02Quibus: I don't remember exactly what I set up...
16:03Ristovski: In any case, it _should_ log a Machine Check Exception (MCE) in the kernel log on the next boot if you let it boot directly into Linux (as opposed to entering the BIOS after the reset)
16:04Quibus: I'll check that
16:05Ristovski: It should be logged quite early as "mce: [Hardware Error]:" in the kernel log on the next boot after the reset
16:05Quibus: The offsets are on about -15, but I'll have to check the details
16:06Ristovski: If you do end up seeing an MCE, there is a way to decode which core caused the reset. You can then switch from an all-core curve optimizer offset to a per-core one and dial that "bad" core down a bit
16:06Quibus: Should be in journalctl output, right?
16:07Ristovski: Yes
16:07Quibus: sep 06 22:48:00 creator kernel: MCE: In-kernel MCE decoding enabled.
16:07Quibus: that's the only thing about MCE
16:09Ristovski: You're checking case-insensitively, right? (the kernel message would use lower-case "mce")
16:09Quibus: Nothing with lower case mce
16:10Quibus: sudo journalctl | grep -i mce
16:10Quibus: that's what I used
16:11Ristovski: Can you confirm from the logged date that it's indeed the boot session that's right after the reset?
16:12Ristovski: Actually, I can't recall if journalctl with no extra params shows _all_ boot sessions by default
16:26Quibus: it does log all boot sessions
16:35Quibus: with -b 1 only the last one
16:35Quibus: sorry -b -1
16:37Ristovski: Hmm, out of ideas then
19:00Quibus: Ristovski: thanks for trying
20:50Quibus: That was another reset :(
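
The log check discussed above (grepping the kernel log for "mce") can be narrowed so that the informational "MCE: In-kernel MCE decoding enabled." line no longer shows up as a hit. The following is only a rough sketch, assuming the "mce: [Hardware Error]" prefix quoted by Ristovski is what a real machine check report looks like on this system. The function name `find_mce_errors` is made up for illustration, and the input is assumed to be kernel log text, for example piped from `journalctl -k -b 0` on the first boot after a reset.

```python
import re

# Prefix quoted in the chat for real machine check reports.
# Matched case-insensitively, mirroring the "grep -i" used above.
MCE_ERROR = re.compile(r"mce: \[Hardware Error\]", re.IGNORECASE)


def find_mce_errors(lines):
    """Return the log lines that contain the MCE hardware error prefix."""
    return [line.rstrip("\n") for line in lines if MCE_ERROR.search(line)]


# The only "MCE" line Quibus found is informational, so it is not a hit:
boot_banner = "sep 06 22:48:00 creator kernel: MCE: In-kernel MCE decoding enabled."
print(find_mce_errors([boot_banner]))  # prints []
```

An empty result here only means that no line with that prefix was present in the text that was searched. It does not rule out an unstable undervolt, since a hard reset may happen before anything is recorded.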