Hi,
I experience random reboots (about once every few days) running Fedora 31 on my Ryzen 3700X.
None of the components are overclocked, the system survives an overnight multi-core memtest86 run without errors, and thermals are not a problem either.
However, although the system is perfectly stable under Windows 10, I get reboots under Linux every now and then.
After a crash/reboot, I see the following report about a machine check exception:
[ 0.105003] x86: Booting SMP configuration:
[ 0.105003] .... node #0, CPUs: #1 #2
[ 0.107022] mce: [Hardware Error]: Machine check events logged
[ 0.107023] mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
[ 0.107092] mce: [Hardware Error]: TSC 0 ADDR 7f80a0c0181a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
[ 0.107167] mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1580717835 SOCKET 0 APIC 4 microcode 8701013
[ 0.107241] #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15
[ 0.123023] smp: Brought up 1 node, 16 CPUs
AMD support was not helpful at all: first they tried to deny Linux support ("we provide drivers for Windows 10 only"), and now, after I asked them to decode the MCE, they are stalling.
Never mind, I decoded the values myself, and the result is:
[ce@localhost MCE-Ryzen-Decoder]$ ./run.py 5 bea0000000000108
Bank: Execution Unit (EX)
Error: Watchdog Timeout error (WDT 0x0)
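For anyone who wants to cross-check the decoder output by hand, here is a minimal sketch that splits the status word from the log into its flag bits and error code fields. The bit positions for VAL through PCC are the architectural x86 MCA ones; treating bits 21:16 as the extended error code follows how the Linux kernel handles AMD scalable MCA banks, and I am assuming that applies to this CPU. The function name decode_mca_status is my own and is not part of MCE-Ryzen-Decoder.

```python
# Architectural MCA status flag bits (bit position -> name).
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mca_status(status):
    """Split a 64-bit MCA status value into flags and error codes."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,            # bits 15:0
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16 (assumed AMD layout)
    }


if __name__ == "__main__":
    result = decode_mca_status(0xBEA0000000000108)
    print("flags:         ", " ".join(result["flags"]))
    print("error code:     0x%04x" % result["error_code"])
    print("ext error code: 0x%x" % result["ext_error_code"])
```

The extended error code comes out as 0x0, which matches the "WDT 0x0" in the decoder output, and the UC and PCC flags are both set, which would fit an error that ends in a reset rather than a logged, corrected event.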
Since this seems to be power-management related, I tried setting processor.max_cstate=5. However, it didn't help; the same MCE happened a few hours later.
I also tried idle=nomwait, although this is said to be required for Zen 1 only (and after all, shouldn't this have been fixed in silicon long ago?).
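When testing boot parameters like these, it is worth confirming that they actually reached the running kernel. A small sketch that reads /proc/cmdline and reports the two parameters mentioned above; the helper names parse_cmdline and read_cmdline are made up for illustration, and the simple whitespace split assumes no quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Turn a kernel command line into a {name: value} dict.

    Parameters without '=' (e.g. 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel."""
    with open(path) as f:
        return parse_cmdline(f.read())


if __name__ == "__main__":
    try:
        active = read_cmdline()
    except OSError:
        # Not on Linux, or /proc is not mounted.
        active = {}
    for name in ("processor.max_cstate", "idle"):
        print(name, "=", active.get(name, "<not set>"))
```

This only shows what was passed at boot; it does not tell you whether the kernel's idle driver actually honors the setting.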
Is anybody experiencing similar or related issues?
Any ideas on how to fix this issue are very welcome.
Best regards, Clemens