Some FreeBSD Users Are Still Running Into Random Lock-Ups With Ryzen


  • #51
    Originally posted by shmerl
    Ryzen is very sensitive to power stability. In short, you need a UPS to avoid problems.
    Probably true, but having a UPS doesn't fix the Ryzen soft lockup problem. I tested three different systems with 1800X and 1700X CPUs, two of which are plugged into a commercial-grade UPS, and they all crash just the same.



    • #52
      Originally posted by monraaf

      This was fixed in 4.13rc-something. These soft-lockups have been visible on large SPARC machines as well. This issue is not specific to Ryzen/Threadripper.

      My office has one Threadripper machine which had this issue as well. With the updated kernel, the problem went away.

      The machine is running 24/7 and is used for heavy number crunching.
      The 4.13rc-something fix might apply to Threadripper, but it doesn't apply to the Ryzen 1700X and 1800X: all three of my systems still hit the CPU soft lockup on 4.14.



      • #53
        Originally posted by typerrrrrrrr
        Forcing off the C6 states (https://github.com/r4m0n/ZenStates-Linux) with a script in /etc/rc.local has taken my setup from an average uptime of 1-1.5 days to 50+ days (I've generally rebooted for other reasons, so I haven't gone beyond that). Every other fix I tried (RCU, randomizing VA space, etc.) had almost no effect.
        That is very interesting indeed, many thanks for sharing it.
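For anyone curious what forcing off C6 actually involves, here is a minimal Python sketch in the spirit of the ZenStates-Linux script linked above. The MSR addresses (0xC0010292, 0xC0010296) and bit positions are my reading of that script and should be treated as assumptions; check them against the repository before writing anything to your CPU. The function names are hypothetical. It needs root and the msr kernel module loaded (modprobe msr).

```python
import os
import struct

# MSR addresses and bits as (I believe) used by ZenStates-Linux; assumptions.
MSR_PMGT_MISC = 0xC0010292      # bit 32: package C6 enable (assumed)
MSR_CSTATE_CONFIG = 0xC0010296  # bits 6, 14, 22: core C6 enable (assumed)
PKG_C6_BIT = 1 << 32
CORE_C6_BITS = (1 << 22) | (1 << 14) | (1 << 6)


def read_msr(cpu: int, msr: int) -> int:
    """Read a 64-bit MSR via the Linux msr driver (the offset is the MSR address)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def write_msr(cpu: int, msr: int, value: int) -> None:
    """Write a 64-bit MSR via the Linux msr driver."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)


def clear_c6_bits(pmgt_misc: int, cstate_config: int) -> tuple[int, int]:
    """Return both register values with the (assumed) C6 enable bits cleared."""
    return pmgt_misc & ~PKG_C6_BIT, cstate_config & ~CORE_C6_BITS


def disable_c6_all_cpus() -> None:
    # The msr driver exposes one device per logical CPU; update every one of them.
    cpus = sorted(int(d) for d in os.listdir("/dev/cpu") if d.isdigit())
    for cpu in cpus:
        pkg, core = clear_c6_bits(read_msr(cpu, MSR_PMGT_MISC),
                                  read_msr(cpu, MSR_CSTATE_CONFIG))
        write_msr(cpu, MSR_PMGT_MISC, pkg)
        write_msr(cpu, MSR_CSTATE_CONFIG, core)
```

Calling disable_c6_all_cpus() as root from a boot script such as /etc/rc.local matches the setup described in the quoted post. The register writes don't persist across reboots, which is why it has to run on every boot.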



        • #54
          I get the random lockup issue more often with the Nouveau driver than with the NVIDIA driver. I think Nouveau does something that tends to provoke it.



          • #55
            Originally posted by Apteryx

            Probably true, but having a UPS doesn't fix the Ryzen soft lockup problem. I tested three different systems with 1800X and 1700X CPUs, two of which are plugged into a commercial-grade UPS, and they all crash just the same.
            I guess so. I've been plagued by random freezes and MCE errors on Linux despite RMA-ing the CPU, the motherboard and even the RAM, and I suspect it's power-related.



            • #56
              Originally posted by Apteryx

              Are you also running with C-State disabled in UEFI? I also still had crashes with the CONFIG_RCU_NOCB_CPU workaround (thanks for discovering it). But that doesn't mean C-State alone is the cure: I also had crashes with C-State off but *without* the CONFIG_RCU_NOCB_CPU hack. At this point I'm throwing everything I can at the problem, so ASLR is off, C-State is off, opcache is off and I'm using CONFIG_RCU_NOCB_CPU. It seems to be holding that way, touch wood.
              I ran with C-State disabled instead for a while, but I heard that this might disable boost. I'm still not sure about that, but it also uses more power than CONFIG_RCU_NOCB_CPU; I haven't measured it, but others have. I think only these two fixes make any difference, and ASLR is almost certainly unrelated, but I don't think either fix is perfect. It may be that this affects some users more than others. I boot in legacy mode rather than UEFI; I doubt that makes any difference, but you never know.
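If you want to check which of these workarounds are actually active on a machine, here is a minimal Python sketch. The file locations are standard Linux interfaces, but there are assumptions: /proc/config.gz only exists on kernels built with CONFIG_IKCONFIG_PROC (/boot/config-<release> is just a common distro convention), and CONFIG_RCU_NOCB_CPU by itself typically only makes callback offloading available, with the offloaded CPUs selected through the rcu_nocbs= boot parameter. All function names are hypothetical.

```python
import gzip
import os
import platform
from pathlib import Path


def kernel_config_lines() -> list[str]:
    """Return the running kernel's build config as lines, or [] if unavailable."""
    proc_cfg = Path("/proc/config.gz")  # needs CONFIG_IKCONFIG_PROC
    if proc_cfg.exists():
        with gzip.open(proc_cfg, "rt") as f:
            return f.read().splitlines()
    boot_cfg = Path("/boot") / f"config-{platform.release()}"  # distro convention
    if boot_cfg.exists():
        return boot_cfg.read_text().splitlines()
    return []  # config not available; nothing can be concluded


def has_rcu_nocb(config: list[str]) -> bool:
    """True if the kernel was built with RCU callback offloading support."""
    return "CONFIG_RCU_NOCB_CPU=y" in config


def rcu_nocbs_param(ncpus: int) -> str:
    """Boot parameter (assumed usage) offloading RCU callbacks from all CPUs."""
    return f"rcu_nocbs=0-{ncpus - 1}"


def cmdline_has_rcu_nocbs(path: str = "/proc/cmdline") -> bool:
    """True if the current boot command line already sets rcu_nocbs=."""
    return "rcu_nocbs=" in Path(path).read_text()


def aslr_mode(path: str = "/proc/sys/kernel/randomize_va_space") -> int:
    """0 = ASLR off, 1 = partial randomization, 2 = full randomization."""
    return int(Path(path).read_text().strip())


def suggested_rcu_nocbs() -> str:
    # os.cpu_count() counts logical CPUs (SMT threads included) and may be None.
    return rcu_nocbs_param(os.cpu_count() or 1)
```

For example, on a 16-thread 1700X, rcu_nocbs_param(16) gives "rcu_nocbs=0-15", which would go on the kernel command line, while aslr_mode() returning 0 means ASLR is off, as in the setup quoted above.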



              • #57

                Is the "Ryzen lock-up issue" (the one that gets fixed by disabling C-states) still widespread?
                Is the same problem still present in the Ryzen 2000 series with integrated Vega graphics?



                • #58
                  Originally posted by Shinobi
                  Is the "Ryzen lock-up issue" (the one that gets fixed by disabling C-states) still widespread?
                  Is the same problem still present in the Ryzen 2000 series with integrated Vega graphics?
                  I haven't had issues on later kernel releases, though I did get lockups on older ones while 4.15 was still in the RC stage.



                  • #59
                    Originally posted by partizann

                    I haven't had issues on later kernel releases, though I did get lockups on older ones while 4.15 was still in the RC stage.
                    That's nice to hear. Could you copy and paste the exact, full kernel version that works for you?



                    • #60
                      I'd like to think that this bug
                      https://bugzilla.kernel.org/show_bug.cgi?id=196683
                      has to be fixed, so that we don't have to fiddle around with BIOS settings.

                      Those who are interested can also take a look at the Ryzen errata (50+ pages!):
                      https://developer.amd.com/wp-content...55449_1.12.pdf

                      In the errata, AMD lists a number of "Suggested Workaround" items that system software is supposed to implement.
                      According to the revision history, 1.12 is the revision that was made public, in June.
                      So I hope many more kernel fixes and workarounds are yet to come.
