Some FreeBSD Users Are Still Running Into Random Lock-Ups With Ryzen


  • Shinobi
    replied
    I am trying to make myself uncomfortable by thinking that this bug has to be fixed, so that we don't have to fiddle around with the BIOS settings.

    Those who are interested can also take a look at the Ryzen errata (50+ pages!).

    In the errata, AMD lists a number of "Suggested Workaround" entries that system software is supposed to implement.
    The revision history says that revision 1.12 is the one that was made public, in June.
    So I hope a lot of kernel fixes and workarounds are yet to come.


  • Shinobi
    replied
    Originally posted by partizann:

    I haven't had issues on later kernel releases, but I had lockups on older ones, while 4.15 was still in the rc stage.
    That is nice to hear. Could you copy-paste the exact full kernel version that works for you?


  • partizann
    replied
    Originally posted by Shinobi:
    Is the "Ryzen lock-up issue" (the one that gets fixed by disabling C-states) still widespread?
    Is the same problem also present in the Ryzen 2 + Vega combo series of chips?
    I haven't had issues on later kernel releases, but I had lockups on older ones, while 4.15 was still in the rc stage.


  • Shinobi
    replied

    Is the "Ryzen lock-up issue" (the one that gets fixed by disabling C-states) still widespread?
    Is the same problem also present in the Ryzen 2 + Vega combo series of chips?


  • Chewi
    replied
    Originally posted by Apteryx:

    Are you also running with C-State disabled in UEFI? I also still had crashes with the CONFIG_RCU_NOCB_CPU workaround (thanks for discovering it). But it doesn't mean that C-State alone is the cure: I also had crashes with C-State off but *without* the CONFIG_RCU_NOCB_CPU hack. At this point I'm throwing everything I can at the problem, so ASLR is off, C-State is off, opcache is off and I'm using CONFIG_RCU_NOCB_CPU. It seems to be holding that way -- touching wood.
    I did run with C-State disabled instead for a while, but I heard that this might disable boost. I'm still not sure about that, but it also uses more power than CONFIG_RCU_NOCB_CPU; I haven't measured it, but others have. I think only these two fixes make any difference (ASLR is almost certainly not related to this), but I don't think either fix is perfect. It may be that this affects some users more than others. I boot in legacy mode rather than UEFI; I doubt that makes any difference, but you never know.
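For anyone unsure whether their running kernel was actually built with the CONFIG_RCU_NOCB_CPU option discussed above, here is a minimal Python sketch that looks it up. The two config locations (/proc/config.gz and /boot/config-<release>) are common conventions, not guarantees; your distribution may ship neither. The function names are made up for this example.

```python
import gzip
import os
import platform
import re


def config_option_value(config_text, option):
    """Return the value of a kernel config option (e.g. 'y' or 'm'), or None if it is not set."""
    # Lines such as "# CONFIG_FOO is not set" start with '#', so they never match.
    pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


def read_running_kernel_config():
    """Try two common (assumed) locations for the running kernel's config; return text or None."""
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt") as handle:
                return handle.read()
        except OSError:
            continue  # missing or unreadable, try the next location
    return None


if __name__ == "__main__":
    text = read_running_kernel_config()
    if text is None:
        print("kernel config not found in the usual places")
    else:
        print("CONFIG_RCU_NOCB_CPU =", config_option_value(text, "CONFIG_RCU_NOCB_CPU"))
```

This only reports the build-time option. It says nothing about whether the workaround actually prevents the lock-ups on a given machine.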


  • shmerl
    replied
    Originally posted by Apteryx:

    Probably true, but having a UPS doesn't fix the Ryzen soft lock problem. I tested 3 different systems equipped with 1800X and 1700X CPUs, two of which are plugged into a commercial-grade UPS, and they crash all the same.
    I guess. I've been bugged by random freezes and MCE errors on Linux despite RMA-ing the CPU, motherboard and even RAM, and I suspect it's power related.


  • keantoken
    replied
    I get the random lockup issue more often with the Nouveau driver than with the NVIDIA drivers. I think Nouveau does something that tends to provoke it.


  • JPFSanders
    replied
    Originally posted by typerrrrrrrr:
    Forcing off the C6 states (https://github.com/r4m0n/ZenStates-Linux) with a script in /etc/rc.local has changed my setup from an average uptime of 1 to 1.5 days to 50+ days (generally I've rebooted for other reasons, so I haven't gone beyond that number). Every other attempted fix (RCU, randomizing VA space, etc.) had nearly zero effect.
    That is very interesting indeed, many thanks for sharing it.
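As a read-only companion to the above, here is a small Python sketch that lists the idle states the Linux cpuidle subsystem exposes for one CPU and whether each is disabled. It only reads the standard sysfs cpuidle directory; it does not change anything, and it is not the same mechanism as the ZenStates script linked above (which, as far as I know, works through MSRs). The state names you see depend on your idle driver and may not be labelled "C6" at all. The function name is made up for this example.

```python
import glob
import os


def read_cpuidle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a list of (state_dir, name, disabled) tuples for one CPU's idle states."""
    state_dirs = glob.glob(os.path.join(cpu_dir, "state[0-9]*"))
    # Sort numerically so that state10 comes after state2.
    state_dirs.sort(key=lambda path: int(os.path.basename(path)[len("state"):]))
    states = []
    for state_dir in state_dirs:
        with open(os.path.join(state_dir, "name")) as handle:
            name = handle.read().strip()
        with open(os.path.join(state_dir, "disable")) as handle:
            disabled = handle.read().strip() != "0"
        states.append((os.path.basename(state_dir), name, disabled))
    return states


if __name__ == "__main__":
    try:
        for state_dir, name, disabled in read_cpuidle_states():
            print(state_dir, name, "disabled" if disabled else "enabled")
    except OSError as error:
        print("could not read cpuidle states:", error)
```

If the directory does not exist (for example in a VM or container), the function simply returns an empty list.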


  • Apteryx
    replied
    Originally posted by monraaf:

    This was fixed in 4.13rc-something. These soft lockups have been seen on large SPARC machines as well. This issue is not specific to Ryzen/Threadripper.

    My office has one Threadripper machine which had this issue as well. With the updated kernel, the problem went away.

    The machine is running 24/7 and is used for heavy number crunching.
    The 4.13rc-something fix might apply to Threadripper, but it doesn't apply to the Ryzen 1700X and 1800X: they still have the CPU soft lock issue on 4.14 (on all 3 different systems).


  • Apteryx
    replied
    Originally posted by shmerl:
    Ryzen is very sensitive to power stability. In short, you need a UPS to avoid problems.
    Probably true, but having a UPS doesn't fix the Ryzen soft lock problem. I tested 3 different systems equipped with 1800X and 1700X CPUs, two of which are plugged into a commercial-grade UPS, and they crash all the same.
