AMD Confirms Linux Performance Marginality Problem Affecting Some, Doesn't Affect Epyc / TR

  • Originally posted by sdack
    So it's not a bug with Ryzen per se, as it's not happening on Windows, but an issue particular to Linux. Who would have thought ...

    Sure, the problem has been identified in the hardware and AMD is replacing the broken CPUs because the problem is on... Linux. LOL

    Sure, and Linux is also the reason the failure shows up under Windows and BSD.

    • A problem that is reproduced on BSD, Windows (WSL) and Linux is not a Linux bug.
      And when load causes corruption in the uOp cache, it's surely a defect in the CPU.

      • I'm looking forward to knowing what's so special about these compilers that they trigger a "performance marginality" not seen in any other workload! All those Windows users, all those gamers … should they be jealous of the performance of clang/gcc or what?

        I guess AMD must be wondering the same by now, and is probably trying to break this down. Someone with enough time could help them by iteratively removing half the code of gcc or clang (binary search) until we have a reduced test case that segfaults instantly.

        • Originally posted by andreano
          I guess AMD must be wondering the same by now, and is probably trying to break this down. Someone with enough time could help them by iteratively removing half the code of gcc or clang (binary search) until we have a reduced test case that segfaults instantly.
          The problem is, it's not deterministic, so it's hard to narrow down. No one has reliably reproduced it yet, as far as I know.
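For illustration only, here is a rough sketch of the binary-search reduction quoted above, adjusted for the fact that the failure is not deterministic: each candidate is run several times and counts as "still failing" if any run fails. The function names, the `fails_once` callback (which would, say, build a cut-down input and report whether it segfaulted), and the default of five attempts are all hypothetical choices, not anything AMD or the posters described.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def still_fails(pieces: Sequence[T], fails_once: Callable[[Sequence[T]], bool],
                attempts: int) -> bool:
    """Count a candidate as failing if any of `attempts` runs fails.

    The crash is not deterministic, so one clean run proves little; repeating
    the run lowers, but never removes, the chance of a false "pass".
    """
    return any(fails_once(pieces) for _ in range(attempts))


def halve_until_minimal(pieces: Sequence[T],
                        fails_once: Callable[[Sequence[T]], bool],
                        attempts: int = 5) -> List[T]:
    """Greedy binary-search reduction: keep whichever half still fails."""
    current = list(pieces)
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if still_fails(first, fails_once, attempts):
            current = first
        elif still_fails(second, fails_once, attempts):
            current = second
        else:
            # Neither half fails on its own: the trigger needs pieces from both
            # halves, so plain halving stops here (full delta debugging would go on).
            break
    return current
```

With a deterministic toy predicate, `halve_until_minimal(list(range(100)), lambda p: 37 in p)` narrows 100 pieces down to `[37]`. With a real, flaky crash, a half that happened to pass all of its runs gets thrown away, which is exactly why narrowing this bug down is hard.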

          • Originally posted by pjssilva

            Actually, you can do this kind of thing by using a configuration file in /etc/sensors or /etc/sensors.d, but I do not remember the details. In this case the formula is so easy that I do it in my head. Note that the -20 C is only necessary for the X processors, like the 1700X and 1800X. For a regular processor, like the 1700, the temperature will already be correct. I forgot to mention that. Sorry.
            OK, this makes sense now. I have a 1700, so I don't have the 20 C offset in my case.
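The mental arithmetic described above is a single subtraction. Here is a minimal Python sketch, assuming (as the post states) a fixed 20 C offset on X-series parts such as the 1700X and 1800X and no offset on non-X parts like the 1700. The function name and the "model name ends in X" rule are illustrative assumptions.

```python
# Offset per the thread: X-series parts (e.g. 1700X, 1800X) report a temperature
# 20 C above the real one; non-X parts such as the 1700 report it directly.
# Treating every model name that ends in "X" as X-series is an assumption.
X_SERIES_OFFSET_C = 20.0


def die_temp_c(reported_c: float, model: str) -> float:
    """Convert the sensor's reported temperature to the actual die temperature."""
    if model.strip().upper().endswith("X"):
        return reported_c - X_SERIES_OFFSET_C
    return reported_c
```

For example, a reading of 88 C on an 1800X would correspond to 68 C, while 68 C on a plain 1700 is taken as-is.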

            To provide a couple more updates:

            1. I ran the stress test overnight at stock settings for my 1700 (3.0 GHz) with SMT *ON*: no hard lockups or resets, but I did get the segfaults.
            2. I've tried running a combination of tests with SMT on/off and ASLR on/off to see which is more stable.
            3. My max temp seems to top out around 68 C when overclocked to 3.7 GHz at stock voltage, so I don't think I'm running into a max temp issue.

            I found this really detailed article that goes into the problem: http://fujii.github.io/2017/06/23/ho...luts-on-ryzen/ It also explains how to disable ASLR for Linux.
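On Linux, ASLR is controlled by the `kernel.randomize_va_space` sysctl, exposed as /proc/sys/kernel/randomize_va_space (0 = off, 1 = conservative, 2 = full); `sysctl -w kernel.randomize_va_space=0` run as root turns it off until the next reboot. Below is a small Python sketch for checking and, as root, changing it. The helper names are hypothetical, and this is not taken from the linked article.

```python
from pathlib import Path

ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of each value, per the Linux kernel's sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "conservative (stack, mmap base, VDSO, shared libraries randomized)",
    2: "full (as 1, plus heap/brk randomized)",
}


def describe_aslr(raw: str) -> str:
    """Turn the raw contents of the sysctl file into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")


def set_aslr(value: int) -> None:
    """Write a new mode. Needs root, and the change does not survive a reboot."""
    if value not in ASLR_MODES:
        raise ValueError(f"unsupported ASLR mode: {value}")
    ASLR_SYSCTL.write_text(f"{value}\n")


if __name__ == "__main__":
    if ASLR_SYSCTL.exists():
        print("ASLR:", describe_aslr(ASLR_SYSCTL.read_text()))
```

Calling `set_aslr(0)` from a root shell is the scripted equivalent of the sysctl command above. Setting it back to 2 restores the usual default.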

            My best results so far seem to come from disabling SMT *and* ASLR. I'm planning on running another overnight test @ 3.0 GHz and then another with a 3.7 GHz overclock using these two settings. I do not have the beta BIOS, so I'm unable to disable the op cache, which is something else people are recommending.

            I've been running my Ryzen for several months now, doing a variety of compilation, running VMs, playing games, etc., and I've never had a problem. It's good to try to get to the bottom of the issue, but day to day it appears very unlikely that most users will encounter it. I'm hoping I can figure out a way to run the stress test overnight without hard lockups and segfaults, and at that point I'll keep that stable configuration.

            • Originally posted by ermo

              You do realize that Intel has done the same thing multiple times in the past? I seem to recall that they sat on the Skylake HT crash bug for over a year.

              This kind of behaviour is par for the course in this particular game I'm afraid.
              Hi ermo,
              I have heard of such things, but as I never had any problems with Intel, I did not read much into it.

              I was looking to replace my computer and read the thread on AMD Ryzen. I got 39 pages in and gave up with a sore head.

              What really shocked me was how the customers were doing all the work when it should have been AMD. There's not much point in buying a product and wasting weeks trying to get it to work when it should have worked out of the box. Is AMD going to pay all those people for their time?

              I am sceptical. Look at how AMD timed this, and look at the psychology behind naming the problem "marginal", as if it is no big deal. It was spat out just in time to pretend they care, with no details of what exactly the problem is. So they found a problem they can't describe, and they tell us it does not affect any other chips even though they don't know what it is, and we're told it's OK.

              I don't think this is over.

              • Originally posted by efikkan
                A problem that is reproduced on BSD, Windows (WSL) and Linux is not a Linux bug.
                And when load causes corruption in the uOp cache, it's surely a defect in the CPU.
                You wish this were true, don't you? But the bigger you make the issue, the pettier you become.

                AMD is offering you a replacement to shut you up. They're doing damage control, because a tiny minority is having a very specific and rare issue and is using it to damage the reputation of their new CPU. Any FUD and fake news about their CPU, no matter how small and irrelevant, will cause them to lose buyers in the gaming market, a market where reputation matters, as does the number of LEDs inside your PC. Yet even on Linux it is an uncommon issue, because many Linux users these days don't actually compile that much software, making it a fraction of a fraction of users. AMD wouldn't give two cents for a problem as marginal as this if it weren't for the much bigger fish they're trying to fry with the new Ryzen.

                Now AMD will have to replace Ryzen CPUs from everywhere, even those of Windows users who will use the offer to try their luck and score a CPU with better overclocking potential than their old one.

                I bet you still only think of their offer as some sort of admission of guilt to the Linux community, don't you?

                • Sdack, you sound like my paranoid schizophrenic cousin. I hope you get the help you need some day.

                  • BRAVO... BRAVO... BRAVO!! FANTASTIC WORK, MICHAEL!!! THIS is why I subscribed to you! If you are not a subscriber yet... please do so NOW!! Michael's work here on Phoronix and his Phoronix Test Suite is ESSENTIAL!! At the very least, TIP the son of a gun!! We NEED Michael!!

                    • Originally posted by Scorched
                      Sdack, you speak similar to my paranoid schizophrenic cousin. I hope you get the help you need some day.
                      It's your first post on Phoronix and all you're using it for is to spite somebody. What's that called in psychiatric terms?
