Ryzen 3700x - spontaneous reboots under lightly loaded conditions

  • Ryzen 3700x - spontaneous reboots under lightly loaded conditions

    Hi,

    I experience random reboots (about once every few days) running Fedora 31 on my Ryzen 3700X.
    No components are overclocked, the system survives an overnight multicore memtest86 run without errors, and thermals are not a problem either.

    However, although the system is perfectly stable under Windows 10, I get reboots under Linux every now and then.

    After a crash/reboot, I see the following report about a machine check exception:

    [ 0.105003] x86: Booting SMP configuration:
    [ 0.105003] .... node #0, CPUs: #1 #2
    [ 0.107022] mce: [Hardware Error]: Machine check events logged
    [ 0.107023] mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
    [ 0.107092] mce: [Hardware Error]: TSC 0 ADDR 7f80a0c0181a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
    [ 0.107167] mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1580717835 SOCKET 0 APIC 4 microcode 8701013
    [ 0.107241] #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15
    [ 0.123023] smp: Brought up 1 node, 16 CPUs

    AMD support was not helpful at all: first they tried to deny Linux support ("we provide drivers for Windows 10 only"), and now, after I asked them to decode the MCE, they are stalling.

    Never mind, I decoded the values myself and the result is:

    [[email protected] MCE-Ryzen-Decoder]$ ./run.py 5 bea0000000000108
    Bank: Execution Unit (EX)
    Error: Watchdog Timeout error (WDT 0x0)
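
For readers who want to check the status word by hand, here is a minimal sketch that pulls the CPU, bank, and status out of the log line above and splits the status into its flag bits. It is not the MCE-Ryzen-Decoder script used in this post. The bit positions are assumed to follow the definitions in the Linux kernel's asm/mce.h, and the extended error code is assumed to sit in bits 21:16, as on SMCA (Zen) systems. The function names `decode_mca_status` and `parse_mce_line` are made up for illustration, and the mapping from bank and extended error code to a text such as "Watchdog Timeout" is not reproduced here.

```python
import re

# (bit, name) pairs; positions assumed from the Linux kernel's asm/mce.h.
STATUS_FLAGS = [
    (63, "VAL"),       # a valid error is logged
    (62, "OVER"),      # earlier errors were overwritten
    (61, "UC"),        # uncorrected error
    (60, "EN"),        # error reporting enabled
    (59, "MISCV"),     # MISC register contents valid
    (58, "ADDRV"),     # ADDR register contents valid
    (57, "PCC"),       # processor context corrupt
    (55, "TCC"),       # task context corrupt (AMD-specific)
    (53, "SYNDV"),     # SYND register contents valid (AMD-specific)
    (44, "DEFERRED"),  # deferred error (AMD-specific)
    (43, "POISON"),    # poisoned data consumed (AMD-specific)
]

# Matches the "CPU n: Machine Check: n Bank n: <hex status>" part of the log line.
LOG_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]+)")


def decode_mca_status(status):
    """Split a 64-bit MCA status value into set flags and error code fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code_ext": (status >> 16) & 0x3F,  # bits 21:16 (assumed SMCA layout)
        "error_code": status & 0xFFFF,            # bits 15:0
    }


def parse_mce_line(line):
    """Return (cpu, bank, status) from a kernel mce log line, or None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    cpu, bank, status = m.groups()
    return int(cpu), int(bank), int(status, 16)


line = "[ 0.107023] mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108"
cpu, bank, status = parse_mce_line(line)
print(cpu, bank, decode_mca_status(status))
```

Under these assumptions, the status from the log has VAL, UC, EN, MISCV, ADDRV, PCC, TCC and SYNDV set and an extended error code of 0, which is consistent with the "WDT 0x0" reported by the decoder.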


    Since this seems to be power-management related, I tried setting processor.max_cstate=5; however, it didn't help, and the same MCE happened a few hours later.
    I also tried idle=nomwait, although this is said to be required for Zen 1 only (and after all, shouldn't this have been fixed in silicon long ago?).
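
When testing boot flags like these, it is worth confirming that they actually reached the running kernel. The following is a rough sketch that parses a kernel command line into a dictionary. The helper names `parse_cmdline` and `read_running_cmdline` and the example command line are hypothetical. The parser splits on whitespace only, so quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Parse a kernel command line into {name: value}; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_running_cmdline(path="/proc/cmdline"):
    """Read and parse the command line of the currently running kernel (Linux only)."""
    with open(path) as f:
        return parse_cmdline(f.read())


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro processor.max_cstate=5 idle=nomwait"
params = parse_cmdline(example)
print(params.get("processor.max_cstate"), params.get("idle"))
```

On the affected machine, `read_running_cmdline()` would be called instead of parsing the example string. A missing key means the flag was never passed to the kernel.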

    Does anybody experience similar or related issues?
    Any ideas on how to fix this are very welcome.

    Best regards, Clemens

  • #2
    I've run Debian, Ubuntu, Arch and CentOS on Ryzen 1700, 2700X, 3900X and Threadripper 1950X systems and never experienced an error like that. I did have a random hard lock when idle on the 1700 system, but the workaround was to disable the low-power C-states (C6?), and it was later fixed with an updated BIOS, as the board was dropping voltages too low at idle.

    Would you be willing to try a different base distro, see if it still occurs? First, try disabling all the power saving functions, or see if there is an updated BIOS.

    edit: I've never yet had a manufacturer interested in trying to solve a problem exhibited in Linux, so AMD not showing interest doesn't really surprise me in the slightest.
    Last edited by Paradigm Shifter; 02-06-2020, 03:05 AM.

    • #3
      Hi PS,

      Originally posted by Paradigm Shifter
      Would you be willing to try a different base distro, see if it still occurs? First, try disabling all the power saving functions, or see if there is an updated BIOS.
      I didn't spot p-state settings in the BIOS, so I gave processor.max_cstate=5 a try, but it didn't help. The BIOS is already updated to the latest stable version.
      Currently I am running with idle=nomwait; however, I guess the silicon bug in Zen 1 devices that this workaround is recommended for shouldn't affect my 3700X.
      The next few days will tell.

      However, it seems I am not the only one with such issues: https://www.reddit.com/r/archlinux/c...h_ryzen_3600x/
      Same mobo, different distro. Somehow the suggested workaround (replacing R600-based graphics cards with GCN+ stuff) leaves me puzzled, since my system is already equipped with an RX 570.

      I've never yet had a manufacturer interested in trying to solve a problem exhibited in Linux, so AMD not showing interest doesn't really surprise me in the slightest.
      You are right. I just heard back from ASRock, and this is what I got:

      So far, we have tested WIN10 x64 only with this mainboard. We cannot proof is working well under other OS or offer drivers or support.
      Last edited by Linuxhippy; 02-07-2020, 05:11 PM.

      • #4
        Sorry for the delay in responding, I've been rather busy of late.

        How has it gone? Improved, or still looking for the cause and a solution?

        I generally distrust kernel flags for enabling/disabling power state management, as a lot of stuff goes on in the board that the OS never sees. This stems from a Z97 board I had which would always hard reboot when I tried compiling anything on every thread. There were never any errors in the logs; it was the overcurrent protection tripping because it was set too low for the 4790K I had in the board, and it cut in so fast that the OS didn't even know anything had happened.

        I'd try disabling stuff in the BIOS, just to identify the issue if it is still a problem.
