10:58 Quibus: Hi all. I'm experiencing random spontaneous resets on my PC since a few weeks. Is this a known issue with the AMDGPU drivers?
10:58 Quibus: It mostly seems to happen while I'm watching video, but it's pretty rare, only happens about once a day at most
14:56 superkuh: Not really a "known issue". What do your syslog logs say about it?
15:47 Quibus: superkuh: nothing, there is no trace of this in any log file. The PC just suddenly shows a black screen and then the POST
15:49 superkuh: I wouldn't immediately assume it's a GPU thing then.
15:50 superkuh: Try doing some Prime95/mprime stress tests and see if you can trigger the reset.
15:51 superkuh: youtube's javascript web application is very CPU intensive.
15:51 Quibus: Well, it's a big CPU :-)
15:52 Quibus: I regularly compile C++ code with 32 parallel processes, but it has never happened then.
15:59 Ristovski: Quibus: Are you running any overclock and/or undervolt?
15:59 Quibus: Ristovski: I've been using the same settings for over a year
15:59 Ristovski: Quibus: What CPU and GPU?
16:00 Quibus: model name : AMD Ryzen 9 7950X 16-Core Processor
16:00 Quibus: No extra GPU, just the AMDGPU built in
16:01 Ristovski: I have a 5700G and I get the exact same thing as you when I push the corve optimizer undervolt a bit too low, hence why I am asking
16:01 Quibus: Strange that it started only a few weeks ago then
16:02 Ristovski: Does that imply that you indeed have set the curve optimizer offsets?
16:02 Quibus: I don't remember exactly what I set up...
16:03 Ristovski: In any case, it _should_ log a Machine Check Exception (MCE) in the kernel log on the next boot if you let it boot directly into Linux (as opposed to entering bios after the reset)
16:04 Quibus: I'll check that
16:05 Ristovski: It should be logged quite early as "mce: [Hardware Error]:" in the kernel log on the next boot after the reset
16:05 Quibus: The offsets are on about -15, but I'll have to check the details
16:06 Ristovski: If you do end up seeing an MCE, there is a way to decode which core caused the reset, you can then switch from an all-core curve optimizer offset to a per-core one and dial that "bad" core down a bit
16:06 Quibus: Should be in journalctl output, right?
16:07 Ristovski: Yes
16:07 Quibus: sep 06 22:48:00 creator kernel: MCE: In-kernel MCE decoding enabled.
16:07 Quibus: that's the only thing about MCE
16:09 Ristovski: Youre checking case insensitive, right? (kernel message would lower case "mce")
16:09 Quibus: Nothing with lower case mce
16:10 Quibus: sudo journalctl | grep -i mce
16:10 Quibus: that's what I used
16:11 Ristovski: Can you confirm from the logged date that its indeed the boot session thats right after the reset?
16:12 Ristovski: actually can't recall if journalctl with no extra params logs _all_ boot sessions by default
16:26 Quibus: it does log all boot sessions
16:35 Quibus: with -b 1 only the last one
16:35 Quibus: sorry -b -1
16:37 Ristovski: Hmm, out of ideas then
19:00 Quibus: Ristovski: thanks for trying
20:50 Quibus: That was another reset :(