Continuing To Stress Ryzen


  • Originally posted by RyzenNewbie

    don't worry, I haven't forgotten you - my answer is currently in the moderation queue because I was so bold as to include a URL...
    I did not expect a serious answer to a sarcastic comment, but anyway, why not? Because that's the most idiotic theory one could come up with. AMD's revenue was about 5 billion and Microsoft's revenue for the same period was about 85 billion; to suggest that AMD would enter into some conspiracy with Microsoft, without even a logical sequence, let alone any evidence, is insane. Anyway, the fact remains that I did not see anything related to Windows, and I'm still having a hard time understanding what the actual problem is. As other posters already mentioned, those conftest segfaults are normal and happen on Intel machines just the same. As for the system crashes some people do have, they can be caused by anything, from an unstable overclock to bad software and anything in between.

    Unless you can replicate that on Windows, I can't see any valid argument that it is a hardware bug.



    • Originally posted by Brutalix
      Ok.. the tone in this thread seems to be a bit heated. Please keep the tone civil, everyone. This is not becoming of a Linux forum.

      Since my premature post yesterday, two of my computers, a 1700X and a 1600X, both with ECC RAM, one running Ubuntu and one Debian testing, have each been running a 26-hour test cycle. One was running the Kill Ryzen script, and the other the ryzen_segv_test-master script from the BSD error discussion. The latter ran on the 1600X with: run.sh 12 250000. So far I have a 350 MB log and no segfault errors. I have not tested my 1700 that runs with non-ECC RAM, but I will try that as well. Since I previously only got the conftest segfault errors on all computers, I can understand that AMD has trouble replicating this error.

      Kind regards
      Brut.
      Just to get more data points - which motherboards (and BIOS versions)? Any voltage tweaks or just defaults?



      • Originally posted by ermo
        What strikes me is that people are so eager to blame the messengers? Why is that?
        If you like tabloid style headlines, then nothing.

        Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

        It is shoddy journalism to run with a "story" and not have your facts straight, then issue a correction saying basically, it happens on other (non-Ryzen) systems as well.

        As for Ryzen itself, there does appear to be some corner case that some people are hitting, and it is incredibly difficult to replicate.
        You can bet that AMD has hundreds of machines with Epyc, Threadripper and Ryzen all running tons of different workloads trying to replicate the issue, and nothing has come up yet as far as we know.

        This isn't anything new. Intel also has errata on its CPUs; nothing is perfect out of the gate, and it takes time to find all the bugs.
        Look at https://www.intel.com/content/dam/ww...ion-update.pdf and look at all the "no fix" items listed. Most of the fixes are done via BIOS updates; some need a new stepping.
        AMD hasn't issued an errata guide for Ryzen yet. The last one I can find is http://support.amd.com/TechDocs/5537...Processors.pdf which also has "no fix" items and other fixes done via BIOS updates.


        If someone does come up with a repeatable workload that can show Ryzen failing, then, cool, that person should get a bug hunting award from AMD.



        • Originally posted by satai

          Just to get more data points - which motherboards (and BIOS versions)? Any voltage tweaks or just defaults?
          1600X:
          PRIME X370-PRO, BIOS 0610 05/05/2017, Kingston 32 GB ECC 2400 RAM. Standard voltage on RAM and CPU. (RAM runs at 1.2 V.) Ubuntu 17.04, custom kernel 4.11.0.

          1700X:
          GA-AX-370-K5, BIOS F3. Kingston 32 GB ECC 2400 RAM. Standard voltage (meaning auto on RAM and CPU). (RAM runs at 1.2 V here as well.) Debian testing, custom kernel 4.11.0.

          Kind regards
          B.





          • Originally posted by vortex
            (...) If someone does come up with a repeatable workload that can show Ryzen failing, then, cool, that person should get a bug hunting award from AMD.
            errm, that here perhaps:



            as mentioned two times before?



            • Originally posted by Brutalix
              1600x:
              (default BIOS) Ubuntu 17.04 Custom kernel 4.11.0.

              1700x:
              (default BIOS) Debian testing. Custom kernel 4.11.0.
              custom kernel means "which is a bit slimmer and trimmed for debian", right? That means HZ=1000, correct? And what else?
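
One way to answer the HZ question without guessing is to read `CONFIG_HZ` from the kernel's build config. The following is only a minimal sketch: it assumes the config was installed as `/boot/config-<release>`, which is common on Debian and Ubuntu but is not guaranteed for a custom kernel, and the function names are made up for illustration.

```python
import platform
import re
from pathlib import Path
from typing import Optional


def parse_config_hz(config_text: str) -> Optional[int]:
    """Return the CONFIG_HZ value found in kernel config text, or None if absent."""
    # Matches "CONFIG_HZ=1000" but not "CONFIG_HZ_1000=y".
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def running_kernel_hz() -> Optional[int]:
    """Look up CONFIG_HZ for the running kernel, if its config file is available."""
    # Assumption: the config lives at /boot/config-<release>.
    path = Path("/boot") / f"config-{platform.release()}"
    if not path.is_file():
        return None
    return parse_config_hz(path.read_text(errors="replace"))


if __name__ == "__main__":
    print(running_kernel_hz())
```

If this prints `None`, the config file was not found at the assumed path, and the value has to come from wherever the custom kernel's `.config` was kept.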



              • That isn't showing a bug in Ryzen. That is showing a repeatable crash.
                What people are failing to grasp here is the fundamentals of rigorous root cause analysis.
                The BSD lot have a better handle on this than the Linux side, where right now a Linux news outlet has an irresponsible headline misrepresenting the situation, in an article that is now linked from many sites...

                The BSD lot have reduced the footprint of the testcase to narrow it down.
                Linux hasn't even got past the brute-force method.
                Windows... the only report I have seen (a BSOD) appears to be RAM timing related.

                The BSD lot are still looking into their testcase to narrow it down further. It may turn out to be a BSD-specific case (has the testcase been tried on or ported to Linux?), or it may point towards Ryzen.

                Personally, I would go over the BSD case and replicate the testcase on Linux to increase the number of machines trialling this method. If it doesn't cause the same fault on Linux, then it is possibly a BSD-specific issue. If it does cause it on Linux, then the testcase needs to be narrowed down further.



                • Originally posted by vortex
                  If you like tabloid style headlines, then nothing.

                  Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

                  It is shoddy journalism to run with a "story" and not have your facts straight, then issue a correction saying basically, it happens on other (non-Ryzen) systems as well.
                  I tend to agree. I really like most of Michael's work. I think I understand how difficult it is to run a Linux news site, doing it all by himself and managing to stay afloat. But like me, he thought the conftest segfaults were the real deal, and ran with the story. When it was cleared up that this was not the case, he should have changed the title and the story.

                  In Norway, where I come from, the press has its own regulatory system. All press organisations have voluntarily accepted a code of ethics for the press and created a regulatory body that oversees how the press operates and how it handles mistakes or erroneous publications. This has reduced litigation costs for the press and increased the credibility of the press in Norway.

                  Each editor and editorial staff member is required to be familiar with these ethical standards of the press, and to base their practice on this code. The ethical practice comprehends the complete journalistic process from research to publication. 1. The Role of the Press in Society 1.1. Freedom of Speech, Freedom of Information and Freedom…


                  Kind regards.

                  B.





                  • Originally posted by RyzenNewbie

                    custom kernel means "which is a bit slimmer and trimmed for debian", right? That means HZ=1000, correct? And what else?
                    That, and removing a lot of drivers that are not needed, like radio etc.
                    Also, I ran the script for the BSD bug on my Ubuntu computer: no errors here for 26 hours. If you want, I can send you the log file. It is only 350 MB.
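
For a log of that size, it is easier to count fault lines than to read it. A minimal sketch follows, streaming the file line by line. The patterns are assumptions about what the test scripts and the kernel write (for example "Segmentation fault" or "general protection"), not the scripts' documented output format, so adjust them to the actual log.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed patterns; the real log format of the test scripts may differ.
FAULT_PATTERNS = {
    "segfault": re.compile(r"segmentation fault|segfault", re.IGNORECASE),
    "general protection": re.compile(r"general protection", re.IGNORECASE),
}


def count_faults(lines: Iterable[str]) -> Counter:
    """Count the lines that match each fault pattern."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in FAULT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Iterating over the file object keeps memory use flat for large logs.
    with open(sys.argv[1], errors="replace") as log:
        print(dict(count_faults(log)))
```

An empty result only means none of the assumed patterns appeared in the log; it does not prove the run was clean.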

                    Kind regards

                    B.




                    • Write something ... OK, if you insist.

                      This morning I reran kill-ryzen and got even more build failures until I stopped the job. I discovered that Ubuntu had recently provided another kernel approved by Mint, so I installed that: 4.11.0-13. All voltages and timings are as previously reported. I am running kill-ryzen now. It has generated a lot less output after build time zero. I have one build segfault on loop 15 at 52s, and a general protection fault on loop zero at 56s. No further errors have been reported yet. This counts as silence compared with the previous runs.

                      System monitor's Resources plot looks like the world's worst eye diagram, but all threads are busy. I now understand that kill-ryzen does not assign a build to a particular thread, but instead the system scheduler moves the tasks around. We will see how it works out over the next several hours.
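
If one wanted each build to stay on a fixed logical CPU instead of being moved around by the scheduler, the affinity mask could be set before the build starts. The sketch below is hypothetical and is not how kill-ryzen works: `run_pinned` is a made-up helper, it relies on `os.sched_setaffinity`, which is available on Linux only, and it assumes the build command is passed as an argument list.

```python
import os
import subprocess
import sys


def run_pinned(cmd, cpu, **kwargs):
    """Run cmd with its CPU affinity restricted to one logical CPU (Linux only)."""

    def pin():
        # Runs in the child just before exec; pid 0 means "the calling process".
        os.sched_setaffinity(0, {cpu})

    # The affinity mask is inherited by any processes the command spawns.
    return subprocess.run(cmd, preexec_fn=pin, **kwargs)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: script.py <cpu> <command> [args...]
    result = run_pinned(sys.argv[2:], int(sys.argv[1]))
    sys.exit(result.returncode)
```

Pinning one build per logical CPU would make it possible to see whether failures cluster on particular cores, which the free-floating scheduling described above cannot show.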

                      And whoever it was several messages back that commented about the large size of this thread should go to Overclock.net and view the ROG Crosshair VI overclocking thread which is past 25000 messages and still going.

                      Ryzen 7 1800X @ 3.9 GHz on Asus C6H with BIOS 9920 and Trident Z 3200C14 @ 3333 MT/s as described on page 9 of this thread.
