Ryzen-Test & Stress-Run Make It Easy To Cause Segmentation Faults On Zen CPUs


  • #11
    Originally posted by Michael

    I will have more information out later today or tomorrow as well, running several hour-long tests in different workload configurations... Being able to reproduce this super-easily via phoronix-test-suite stress-run encourages me to run more tests, since it's all PTS-automated, and it lets me show off PTS stress-run capabilities, which I don't often get to talk about otherwise.
    One more for you to test: in the BIOS, set the CPU voltage offset to +25 mV. I've managed to crash my 1800X fairly consistently within 30 minutes when running large x264 encoding jobs, but this small voltage bump seems to have fixed it - it's been almost 40 hours of encoding with no issues. If you're running memory at speeds faster than 2666, make sure your SOC voltage is 1.1 V (some BIOSes adjust that automatically, some don't).



    • #12
      I didn't actually look yet, but just from reading this article I get the impression that all the Ryzen test script really does is run a few gcc compile jobs side by side. If that's right, then that's pretty easy to duplicate. Plus, Michael was able to get PTS to show this issue by running multiple benchmarks side by side, and that's also pretty easy to duplicate. I just can't imagine that AMD didn't find out about this by the time of the first tape-out samples. It seems like it's just too easy to hit. They must have known.

      It's not at all like the BD bug that came up shortly after launch; that bug literally affected nobody, ever. It could only be triggered in a very specific scenario. This Zen bug seems much more obvious than that one was.
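
For anyone who wants to try the "jobs side by side" idea, here is a rough sketch in Python. It is not the actual Ryzen test script or anything PTS does internally; the function names `run_parallel` and `count_segfaults` are made up for illustration, and the workload below is a harmless placeholder that you would swap for a real compile job.

```python
import signal
import subprocess
import sys


def run_parallel(cmd, jobs):
    """Start `jobs` copies of `cmd` at the same time and return their exit codes."""
    procs = [subprocess.Popen(cmd) for _ in range(jobs)]
    return [p.wait() for p in procs]


def count_segfaults(return_codes):
    """Count processes that were killed by SIGSEGV.

    On POSIX systems, subprocess reports death by signal N as return code -N.
    """
    return sum(1 for rc in return_codes if rc == -signal.SIGSEGV)


if __name__ == "__main__":
    # Placeholder workload (assumption): replace with a real compile command.
    cmd = [sys.executable, "-c", "pass"]
    codes = run_parallel(cmd, jobs=4)
    print(f"{count_segfaults(codes)} of {len(codes)} jobs died with SIGSEGV")
```

One caveat: when a compiler crashes inside a build, the top-level command usually just exits with a non-zero status and an error message in its output, so a real harness would probably also need to check exit codes and scan the logs rather than rely on the signal alone.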



      • #13
        Michael, did you not try disabling ASLR, which is what fixes this for most people? I haven't had a segfault or freeze in ages.
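
For reference, a small sketch of how one might check the current ASLR mode on Linux before and after trying this. It assumes the standard `/proc/sys/kernel/randomize_va_space` file; the helper name `describe_aslr` and the wording of the descriptions are my own.

```python
from pathlib import Path

# Standard Linux sysctl file for address space layout randomization.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Short descriptions of the values the kernel documents for this setting.
ASLR_MODES = {
    0: "disabled",
    1: "partial (stack, mmap base and VDSO randomized)",
    2: "full (partial mode plus heap/brk randomization)",
}


def describe_aslr(raw):
    """Turn the raw file contents (e.g. "2\\n") into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")


if __name__ == "__main__":
    if ASLR_SYSCTL.exists():
        print("ASLR:", describe_aslr(ASLR_SYSCTL.read_text()))
    else:
        print("No randomize_va_space file found (probably not Linux)")
```

Writing 0 to that file as root, or running `sysctl kernel.randomize_va_space=0`, turns ASLR off until the next reboot.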



        • #14
          Originally posted by Chewi
          Michael, did you not try disabling ASLR, which is what fixes this for most people? I haven't had a segfault or freeze in ages.
          I've got a better idea: instead of requiring Linux users to turn off an important security feature that apparently works with Intel chips, ARM chips, POWER chips, MIPS chips, all AMD chips other than Ryzen, etc., why doesn't AMD figure out what's going on and fix their product?



          • #15
            I think there are also reports saying that turning off ASLR doesn't work around the issue completely.



            • #16
              Looks like FreeBSD and DragonflyBSD developers have debugged and fixed it in their kernels: https://svnweb.freebsd.org/base?view...evision=321899

              Might be worth mentioning in the newspost



              • #17
                Originally posted by chuckula

                Oh and turn off 7 of the cores!
                That was uncalled for. When trying to find a root cause, it's not unusual to take steps you'd never take under normal circumstances.



                • #18
                  Originally posted by DanielG
                  Looks like FreeBSD and DragonflyBSD developers have debugged and fixed it in their kernels: https://svnweb.freebsd.org/base?view...evision=321899

                  Might be worth mentioning in the newspost
                  That's a great find! Thanks for posting that here.



                  • #19
                    Originally posted by DanielG
                    Looks like FreeBSD and DragonflyBSD developers have debugged and fixed it in their kernels: https://svnweb.freebsd.org/base?view...evision=321899

                    Might be worth mentioning in the newspost
                    Well, as can be read in the corresponding comment, they didn't really fix it, they only made it less likely to happen, and whether a similar fix makes sense in Linux remains to be seen. In any case, it seems to be a problem in the microcode, which comes as a binary blob ...



                    • #20
                      Thank you for this test, Michael!

