DragonFlyBSD Finalizes Its Ryzen Workaround

  • DragonFlyBSD Finalizes Its Ryzen Workaround

    Phoronix: DragonFlyBSD Finalizes Its Ryzen Workaround

    Separate from the AMD Ryzen performance marginality problem affecting Linux users, BSD users have been working on a workaround for their kernels to address problems with how their user stacks are mapped...

    http://www.phoronix.com/scan.php?pag...zen-Workaround

  • #2
    Everybody keeps saying it's unrelated, but I'm not convinced yet. I think we are seeing two different symptoms of the same problem. We see different symptoms on Linux and BSD because they address memory differently, but I'd be willing to bet that the underlying physical cause is identical.


    • #3
      I posted a big news item about Ryzen on a Chinese World of Warcraft forum.
      So far 8 users have tested their Ryzen boxes, and all 8 processors have segfault problems.


      • #4
        I'm running 3 Ryzen boxes, which would freeze about every 2-3 days. After switching from a generic kernel to a low latency kernel, no problems... So I am convinced that there is something wrong. AMD won't be able to downplay this forever... Datacenters won't want to deal with instability, so once Epyc starts hitting datacenters I'm sure AMD will have to deal with this. That is, unless workarounds like this one are put in place across the board...


        • #5
          Originally posted by plasmasnake
          I'm running 3 Ryzen boxes, which would freeze about every 2-3 days. After switching from a generic kernel to a low latency kernel, no problems... So I am convinced that there is something wrong. AMD won't be able to downplay this forever... Datacenters won't want to deal with instability, so once Epyc starts hitting datacenters I'm sure AMD will have to deal with this. That is, unless workarounds like this one are put in place across the board...
          Epyc is not affected by it, only early batches of Ryzen. If you have such a chip and it fails in this way you can RMA it and get a known good one.


          • #6
            Originally posted by plasmasnake
            I'm running 3 Ryzen boxes, which would freeze about every 2-3 days. After switching from a generic kernel to a low latency kernel, no problems... So I am convinced that there is something wrong. AMD won't be able to downplay this forever... Datacenters won't want to deal with instability, so once Epyc starts hitting datacenters I'm sure AMD will have to deal with this. That is, unless workarounds like this one are put in place across the board...
            There is no sense in running anything other than a low latency kernel. Do you drive your car with the handbrake on too? You can have an unstable kernel with any CPU; it is very easy to configure one.


            • #7
              "performance marginality problem" I still get a laugh every time I see that. I wonder what marketing idiot came up with it.


              • #8
                Originally posted by debianxfce

                There is no sense in running anything other than a low latency kernel. Do you drive your car with the handbrake on too? You can have an unstable kernel with any CPU; it is very easy to configure one.
                And here we go with the usual BS from one of the "know nothing" forum trolls.
                There are plenty of reasons not to use a low latency kernel.

                It's not one-size-fits-all.

                The comparison with a car is also dreadfully wrong.
                A comparison between an automatic and a manual transmission would have been more apt, and even that one would still have been inaccurate.


                • #9
                  Originally posted by duby229

                  Epyc is not affected by it, only early batches of Ryzen. If you have such a chip and it fails in this way you can RMA it and get a known good one.

                  They said ThreadRipper isn't affected either, but that's also running the B1 stepping of the Zeppelin dies (unlike Epyc, which is B2). If they are still trying to figure out what's going on on the Linux side, how are they able to claim that it's fixed in TR?

                  If it was microcode, it should have been ported to the Ryzen BIOS. If it was fixed in manufacturing, they should be able to tell what the manufacturing batch cutover was. If it's improved QA in the binning process (e.g. top 5% for TR), will the Ryzen PRO chips get that treatment as well? Or will the consumer line end up playing an ongoing SEGV silicon lottery?

                  I'm still waiting to get my RMA number. The people who've been getting CPUs via RMA in the AMD community forums are getting UA1725XXX chips (25th week of 2017), which don't seem to be running into the SEGV issue anymore. Nevertheless, one user who received a UA1725XXX chip on their first RMA started running into MCE errors. He RMA'd that replacement and got a UA1725XXX the second time as well, and that CPU looks like it's working correctly. The ThreadRippers the reviewers got are UA1727XXX (27th week of 2017).

                  Seems like an RMA silicon lottery is involved.
                  Last edited by Funks; 08-13-2017, 05:25 AM.


                  • #10
                    Originally posted by qsmcomp
                    I posted a big news item about Ryzen on a Chinese World of Warcraft forum.
                    So far 8 users have tested their Ryzen boxes, and all 8 processors have segfault problems.
                    I believe a large majority of Ryzen chips with UA build dates earlier than UA 1725XXX are affected. Note that the 25th week of 2017 is late June, so if you buy one off the shelf right now, it would be surprising to get a UA 1725+ chip (you'll be playing the SEGV silicon lottery).

                    Unless you have a ThreadRipper chip, that is (most of the reviewers got UA 1727XXX chips).

                    Reported Good RMA CPU Build Dates
                    UA1725SUS (mcl00@amdcommunityforums, fujii@amdcommunityforums, sat@amdcommunityforums)

                    Reported Instances of Bad CPU Build Dates (SEGV)
                    UA1706SUT (sat@amdcommunityforums)
                    UA1707SUT (apache14@amdcommunityforums)
                    UA1707PGT (fujii@amdcommunityforums)
                    UA1707SUS (xtronom@amdcommunityforums)
                    UA1714SUS (xtronom@amdcommunityforums, runningman@amdcommunityforums)
                    UA1716PGT (fujii@amdcommunityforums)
                    UA1717SUT (sat@amdcommunityforums)
                    UA1725SUS (sat@amdcommunityforums) - Note: the first RMA chip had MCEs and weird reboots, so it went back for a second RMA.
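
If you want to compare your own chip against the list above, here is a rough Python sketch for reading the build code. It assumes the layout the codes in this thread appear to follow: two letters, a two-digit year, a two-digit week, then a three-letter suffix. The function names, the regex, and the use of ISO week numbering are my own assumptions, not anything AMD has documented.

```python
import re
from datetime import date

# Assumed layout, inferred only from the codes quoted in this thread:
# two letters, two-digit year, two-digit week, three-letter suffix.
# An optional space after the prefix is tolerated ("UA 1725XXX").
CODE_RE = re.compile(r"^([A-Z]{2})\s?(\d{2})(\d{2})([A-Z]{3})$")


def parse_build_code(code):
    """Return (year, week) from a build code such as 'UA1725SUS'."""
    match = CODE_RE.match(code.strip().upper())
    if match is None:
        raise ValueError("unrecognized build code: %r" % code)
    _prefix, yy, ww, _suffix = match.groups()
    return 2000 + int(yy), int(ww)


def week_start(year, week):
    """Monday of the given week, ASSUMING ISO week numbering (Python 3.8+)."""
    return date.fromisocalendar(year, week, 1)


def at_or_after(code, cutoff=(2017, 25)):
    """True if the code's (year, week) is at or past the cutoff week."""
    return parse_build_code(code) >= cutoff


if __name__ == "__main__":
    for code in ("UA1707PGT", "UA1725SUS", "UA 1727XXX"):
        year, week = parse_build_code(code)
        print(code, year, week, week_start(year, week), at_or_after(code))
```

A code at or past week 25 is not proof of a good chip; as the last entry in the list shows, one UA1725SUS replacement still had to go back. Treat the check as a hint about which batch you have, nothing more.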
                    Last edited by Funks; 08-13-2017, 05:27 AM.
