Continuing To Stress Ryzen

  • OK, the tone in this thread seems to be a bit heated. Please keep it civil, everyone. This is unbecoming of a Linux forum.

    Since my premature post yesterday, two of my computers, a 1700X and a 1600X, both with ECC RAM (one running Ubuntu, the other Debian testing), have been running a 26-hour test cycle. One ran the Kill Ryzen script and the other the ryzen_segv_test-master script from the BSD bug discussion; the latter ran on the 1600X with: run.sh 12 250000. So far I have a 350 MB log and no segfaults. I have not yet tested my 1700, which runs with non-ECC RAM, but I will try that as well. Since the only segfaults I previously saw on any of my computers came from conftest, I can understand why AMD has trouble replicating this error.

    Kind regards
    Brut.
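For anyone repeating this kind of long-running test, here is a minimal Python sketch of how one might tally segfaults from a saved kernel log. It assumes the log contains the standard Linux kernel messages of the form `name[pid]: segfault at ...` (as shown by `dmesg` or `journalctl -k`); the helper names `count_segfaults` and `segfaults_per_hour` and the script name `count_segv.py` are hypothetical. Excluding `conftest` by default reflects the point made in this thread that conftest segfaults are expected and are not evidence of a CPU fault.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the kernel's user-space segfault message, e.g.
# "cc1[1234]: segfault at 0 ip ... sp ... error 4 in cc1[...]".
# Group "comm" captures the process name, "pid" its process ID.
SEGFAULT_RE = re.compile(r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at ")


def count_segfaults(lines: Iterable[str]) -> Counter:
    """Count kernel segfault messages per process name (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


def segfaults_per_hour(counts: Counter, hours: float,
                       ignore: Iterable[str] = ("conftest",)) -> float:
    """Segfault rate over a test window, skipping processes that crash by design."""
    skipped = set(ignore)
    total = sum(n for name, n in counts.items() if name not in skipped)
    return total / hours


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python count_segv.py kernel.log 26
    path, hours = sys.argv[1], float(sys.argv[2])
    with open(path, errors="replace") as log:
        counts = count_segfaults(log)
    print(dict(counts))
    print(f"{segfaults_per_hour(counts, hours):.2f} segfaults/hour (excluding conftest)")
```

Reporting counts per process name, rather than a single total, makes it easy to see whether anything other than conftest crashed during the run.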



    • Originally posted by RyzenNewbie

      Don't worry, I haven't forgotten you; my answer is currently in the moderation queue because I was bold enough to include a URL...
      I did not expect a serious answer to a sarcastic comment... Anyway, why not? Because that's the most idiotic theory one could come up with. AMD's revenue was about 5 billion; Microsoft's revenue for the same period was about 85 billion. To suggest that AMD would conspire with Microsoft without even a logical chain of reasoning, let alone any evidence, is insane. Anyway, the fact remains that I did not see anything related to Windows, and I'm still having a hard time understanding what the actual problem is. As other posters have already mentioned, those conftest segfaults are normal and happen on Intel machines just the same. As for the system crashes some people are having, the cause could be anything, from an unstable overclock to bad software and anywhere in between.

      Unless you can replicate that on Windows, I can't see any valid argument that it is a hardware bug.



      • Originally posted by Brutalix
        OK, the tone in this thread seems to be a bit heated. Please keep it civil, everyone. This is unbecoming of a Linux forum.

        Since my premature post yesterday, two of my computers, a 1700X and a 1600X, both with ECC RAM (one running Ubuntu, the other Debian testing), have been running a 26-hour test cycle. One ran the Kill Ryzen script and the other the ryzen_segv_test-master script from the BSD bug discussion; the latter ran on the 1600X with: run.sh 12 250000. So far I have a 350 MB log and no segfaults. I have not yet tested my 1700, which runs with non-ECC RAM, but I will try that as well. Since the only segfaults I previously saw on any of my computers came from conftest, I can understand why AMD has trouble replicating this error.

        Kind regards
        Brut.
        Just to get more data points: which motherboards (and BIOS versions)? Any voltage tweaks, or just defaults?
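On Linux, the board and BIOS details asked for here can be read without rebooting into setup, from the DMI attributes the kernel exposes under `/sys/class/dmi/id`. Below is a minimal sketch, assuming those sysfs files are present (they usually are on x86 boards, but not on every system); `read_board_info` and `format_report` are hypothetical helper names, and voltages are not covered since DMI does not report them.

```python
import platform
from pathlib import Path
from typing import Dict

# DMI attributes exported by the Linux kernel; these four are normally world-readable.
DMI_DIR = Path("/sys/class/dmi/id")
FIELDS = ("board_vendor", "board_name", "bios_version", "bios_date")


def read_board_info(base: Path = DMI_DIR) -> Dict[str, str]:
    """Read board/BIOS identification, falling back to 'unknown' if a file is missing."""
    info: Dict[str, str] = {}
    for field in FIELDS:
        try:
            info[field] = (base / field).read_text().strip()
        except OSError:
            info[field] = "unknown"
    return info


def format_report(info: Dict[str, str], kernel: str) -> str:
    """One-line summary in the style of the reports posted in this thread."""
    return (f"{info['board_vendor']} {info['board_name']}, "
            f"BIOS {info['bios_version']} ({info['bios_date']}), kernel {kernel}")


if __name__ == "__main__":
    # platform.release() returns the running kernel version, e.g. "4.11.0".
    print(format_report(read_board_info(), platform.release()))
```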



        • Originally posted by ermo
          What strikes me is how eager people are to blame the messenger. Why is that?
          If you like tabloid-style headlines, then nothing.

          Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

          It is shoddy journalism to run with a "story" without having your facts straight, and then issue a correction that basically says it happens on other (non-Ryzen) systems as well.

          As for Ryzen itself, there does appear to be some corner case that some people are hitting, and it is incredibly difficult to replicate.
          You can bet that AMD has hundreds of machines with EPYC, Threadripper and Ryzen all running tons of different workloads trying to replicate the issue, and nothing has come up yet as far as we know.

          This isn't anything new: Intel also has errata for its CPUs. Nothing is perfect out of the gate; it takes time to find all the bugs.
          Look at https://www.intel.com/content/dam/ww...ion-update.pdf and note all the "no fix" items listed; most fixes are delivered via BIOS updates, and some need a new stepping.
          AMD hasn't issued an errata guide for Ryzen yet; the latest one I can find is http://support.amd.com/TechDocs/5537...Processors.pdf, which also has "no fix" items and other fixes done via BIOS updates.


          If someone does come up with a repeatable workload that shows Ryzen failing, then, cool, that person should get a bug-hunting award from AMD.



          • Originally posted by satai

            Just to get more data points: which motherboards (and BIOS versions)? Any voltage tweaks, or just defaults?
            1600X:
            PRIME X370-PRO, BIOS 0610 (05/05/2017), Kingston 32 GB ECC 2400 RAM. Standard voltage on RAM and CPU (the RAM runs at 1.2 V). Ubuntu 17.04, custom kernel 4.11.0.

            1700X:
            GA-AX-370-K5, BIOS F3, Kingston 32 GB ECC 2400 RAM. Standard voltage (meaning auto on RAM and CPU; the RAM here also runs at 1.2 V). Debian testing, custom kernel 4.11.0.

            Kind regards
            B.





            • Originally posted by vortex
              (...) If someone does come up with a repeatable workload that can show Ryzen failing, then, cool, that person should get a bug hunting award from AMD.
              Erm, this one perhaps:

              https://bugs.freebsd.org/bugzilla/sh...id=219399#c204

              as mentioned twice before?



              • Originally posted by Brutalix
                1600X:
                (default BIOS) Ubuntu 17.04 Custom kernel 4.11.0.

                1700X:
                (default BIOS) Debian testing. Custom kernel 4.11.0.
                By "custom kernel" you mean "a bit slimmer and trimmed for Debian", right? That means HZ=1000, correct? And what else?



                • Originally posted by RyzenNewbie

                  Erm, this one perhaps:

                  https://bugs.freebsd.org/bugzilla/sh...id=219399#c204

                  as mentioned twice before?
                  That isn't showing a bug in Ryzen. That is showing a repeatable crash.
                  What people are failing to grasp here are the fundamentals of rigorous root-cause analysis.
                  The BSD lot have a better handle on this than the Linux side, where right now a Linux news outlet has an irresponsible headline misrepresenting the situation, in an article that is now linked from many a site...

                  The BSD folks have reduced the footprint of the testcase to narrow it down.
                  Linux hasn't even got past the brute-force method.
                  Windows... the only report I have seen (a BSOD) appears to be related to RAM timings.

                  The BSD lot are still working on their testcase to narrow it down further. It may turn out to be a BSD-specific case (has the testcase been tried on or ported to Linux?), or it may point towards Ryzen.

                  Personally, I would go over the BSD case and replicate the testcase on Linux to increase the number of machines trialling this method. IF it doesn't cause the same fault on Linux, then it is possibly a BSD-specific issue. IF it does cause it on Linux, then the testcase needs to be narrowed down further.



                  • Originally posted by vortex
                    If you like tabloid-style headlines, then nothing.

                    Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

                    It is shoddy journalism to run with a "story" without having your facts straight, and then issue a correction that basically says it happens on other (non-Ryzen) systems as well.
                    I tend to agree. I really like most of Michael's work, and I think I understand how difficult it is to run a Linux news site, doing it all by himself and managing to stay afloat. But like me, he thought the conftest segfaults were the real deal and ran with the story. When it was cleared up that this was not the case, he should have changed the title and the story.

                    In Norway, where I come from, the press has its own regulatory system. All press organisations have voluntarily accepted a code of ethics and created a regulatory body that oversees how the press operates and how it handles mistakes or erroneous publications. This has reduced litigation costs for the press and increased its credibility in Norway.

                    http://presse.no/pfu/etiske-regler/v.../vvpl-engelsk/

                    Kind regards.

                    B.





                    • Originally posted by RyzenNewbie

                      By "custom kernel" you mean "a bit slimmer and trimmed for Debian", right? That means HZ=1000, correct? And what else?
                      That, and removing a lot of unneeded drivers, like radio etc.
                      Also, I ran the script for the BSD bug on my Ubuntu computer: no errors here for 26 hours. If you want, I can send you the log file; it is only 350 MB.

                      Kind regards

                      B.
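Whether a custom kernel really uses HZ=1000 can be checked from its build configuration rather than from memory. Below is a minimal sketch, assuming the config is available either as `/proc/config.gz` (only when the kernel was built with `CONFIG_IKCONFIG_PROC`) or as `/boot/config-<release>`, where Debian and Ubuntu kernel packages normally install it; a hand-built kernel may not have copied its config there. `parse_kconfig` and `load_running_config` are hypothetical helper names.

```python
import gzip
import platform
from pathlib import Path
from typing import Dict, Optional


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options: Dict[str, str] = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value.strip('"')  # string options are quoted in .config
    return options


def load_running_config() -> Optional[str]:
    """Return the running kernel's config text, or None if it cannot be found."""
    candidates = [Path("/proc/config.gz"), Path(f"/boot/config-{platform.release()}")]
    for path in candidates:
        try:
            if path.name.endswith(".gz"):
                return gzip.decompress(path.read_bytes()).decode()
            return path.read_text()
        except OSError:
            continue  # not present or not readable; try the next location
    return None


if __name__ == "__main__":
    text = load_running_config()
    if text is None:
        print("kernel config not found")
    else:
        opts = parse_kconfig(text)
        print("CONFIG_HZ =", opts.get("CONFIG_HZ", "unset"))
        print("CONFIG_PREEMPT =", opts.get("CONFIG_PREEMPT", "unset"))
```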

