Continuing To Stress Ryzen


  • Originally posted by ermo View Post
    What strikes me is that people are so eager to blame the messengers? Why is that?
    If you like tabloid style headlines, then nothing.

    Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

    It is shoddy journalism to run with a "story" and not have your facts straight, then issue a correction saying basically, it happens on other (non-Ryzen) systems as well.

    As for Ryzen itself, there does appear to be some corner case that some people are hitting, and it is incredibly difficult to replicate.
    You can bet that AMD has hundreds of machines with Epyc, Threadripper, and Ryzen all running tons of different workloads trying to replicate the issue, and nothing has come up yet as far as we know.

    This isn't anything new. Intel also has errata for its CPUs; nothing is perfect out of the gate, and it takes time to find all the bugs.
    Look at https://www.intel.com/content/dam/ww...ion-update.pdf and note all the "no fix" items listed. Most of the fixes are done via BIOS updates; some need a new stepping.
    AMD hasn't issued an errata guide for Ryzen yet. The last one I can find is http://support.amd.com/TechDocs/5537...Processors.pdf, which also has "no fix" items and other fixes done via BIOS updates.


    If someone does come up with a repeatable workload that can show Ryzen failing, then, cool, that person should get a bug hunting award from AMD.


    • Originally posted by satai View Post

      Just to get more data points - which motherboards (and BIOS versions)? Any voltage tweaks or just defaults?
      1600x:
      PRIME X370-PRO, BIOS 0610 05/05/2017, Kingston 32 GB ECC 2400 RAM. Standard voltage on RAM and CPU. (RAM runs at 1.2 V.) Ubuntu 17.04 Custom kernel 4.11.0.

      1700x:
      GA-AX-370-K5, BIOS F3. Kingston 32 GB ECC 2400 RAM. Standard voltage (meaning auto on RAM and CPU). (RAM also runs at 1.2 V here.) Debian testing. Custom kernel 4.11.0.

      Kind regards
      B.




      • Originally posted by vortex View Post
        (...) If someone does come up with a repeatable workload that can show Ryzen failing, then, cool, that person should get a bug hunting award from AMD.
        errm, that here perhaps:

        https://bugs.freebsd.org/bugzilla/sh...id=219399#c204

        as mentioned two times before?


        • Originally posted by Brutalix View Post
          1600x:
          (default BIOS) Ubuntu 17.04 Custom kernel 4.11.0.

          1700x:
          (default BIOS) Debian testing. Custom kernel 4.11.0.
          custom kernel means "which is a bit slimmer and trimmed for debian", right? That means HZ=1000, correct? And what else?


          • Originally posted by RyzenNewbie View Post

            errm, that here perhaps:

            https://bugs.freebsd.org/bugzilla/sh...id=219399#c204

            as mentioned two times before?
            That isn't showing a bug in Ryzen. That is showing a repeatable crash.
            What people are failing to grasp here is the fundamentals of rigorous root cause analysis.
            The BSD lot have a better handle on this than Linux, where right now a Linux news outlet has an irresponsible headline misrepresenting the situation, in an article that is now linked from many a site...

            The BSD lot have reduced the footprint of the testcase to narrow it down.
            Linux hasn't even got past the brute force method.
            Windows... the only report I have seen (BSOD) appears to be RAM timing related.

            The BSD lot are still looking into their testcase to narrow it down further. It may turn out to be a BSD-specific case (has the testcase been tried/ported to Linux?), or it may point towards Ryzen.

            Personally I would go over the BSD case and replicate the testcase on Linux to increase the number of machines trialling this method. If it doesn't cause the same fault on Linux, then it is possibly a BSD-specific issue. If it does cause it on Linux, then the testcase needs to be narrowed down further.


            • Originally posted by vortex View Post
              If you like tabloid style headlines, then nothing.

              Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

              It is shoddy journalism to run with a "story" and not have your facts straight, then issue a correction saying basically, it happens on other (non-Ryzen) systems as well.
              I tend to agree. I really like most of Michael's work. I think I understand how difficult it is to run a Linux news site, doing it all by himself and managing to stay afloat. But like me, he thought the conf. seg. were the real deal, and ran with the story. Once it was cleared up that this was not the case, he should have changed the title and the story.

              In Norway, where I come from, the press has its own regulatory system. All press organisations have voluntarily accepted a code of ethics for the press and created a regulatory body that oversees how the press operates and how it handles mistakes or erroneous publications. This has reduced litigation costs for the press and increased the credibility of the press in Norway.

              http://presse.no/pfu/etiske-regler/v.../vvpl-engelsk/

              Kind regards.

              B.




              • Originally posted by RyzenNewbie View Post

                custom kernel means "which is a bit slimmer and trimmed for debian", right? That means HZ=1000, correct? And what else?
                That, and removing a lot of drivers that aren't needed, like radio etc.
                Also, I ran the script for the BSD bug on my Ubuntu computer: no errors here in 26 hours. If you want, I can send you the log file; it is only 350 MB.

                Kind regards

                B.



                • Write something ... OK, if you insist.

                  This morning I reran kill-ryzen and got even more build failures until I stopped the job. I discovered that Ubuntu had recently provided another kernel approved by Mint, so installed that: 4.11.0-13. All voltages and timings are as previously reported. I am running kill-ryzen now. It has generated a lot less output after build time zero. I have one build seg fault on loop 15 at 52s, and a general prot. fault on loop zero at 56s. No further errors are yet reported. This counts as silence compared with the previous runs.
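
The kind of tally described above (how many build segfaults versus general protection faults a run produced) could be automated with a small log scan. This is only a minimal sketch: the output format of kill-ryzen is not shown in this thread, so the regular expressions, the function names `tally_failures` and `tally_log_file`, and the sample lines are all assumptions to adapt to your own log.

```python
import re
from collections import Counter

# Assumed failure signatures. The real log format is not shown in this
# thread, so adjust these patterns to whatever your run actually prints.
PATTERNS = {
    "segfault": re.compile(r"segmentation fault|segfault", re.IGNORECASE),
    "gpf": re.compile(r"general protection", re.IGNORECASE),
}


def tally_failures(lines):
    """Count the lines that match each assumed failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def tally_log_file(path):
    """Stream a log file line by line so large logs are not loaded at once."""
    with open(path, errors="replace") as handle:
        return tally_failures(handle)


if __name__ == "__main__":
    # Hypothetical sample lines, made up for illustration only.
    sample = [
        "loop 15, 52s: Segmentation fault",
        "loop 0, 56s: general protection fault",
        "loop 3: build finished",
    ]
    print(dict(tally_failures(sample)))
```

Because the file is read one line at a time, the same function should cope with logs in the hundreds of megabytes without holding them in memory.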

                  System monitor's Resources plot looks like the world's worst eye diagram, but all threads are busy. I now understand that kill-ryzen does not assign a build to a particular thread, but instead the system scheduler moves the tasks around. We will see how it works out over the next several hours.

                  And whoever it was several messages back that commented about the large size of this thread should go to Overclock.net and view the ROG Crosshair VI overclocking thread which is past 25000 messages and still going.

                  Ryzen 7 1800X @ 3.9 GHz on Asus C6H with BIOS 9920 and Trident Z 3200C14 @ 3333 MT/s as described on page 9 of this thread.


                  • Originally posted by Naib View Post
                    That isn't showing a bug in Ryzen. That is showing a repeatable crash.
                    yep, a freeze that I cannot force on my Xeon system at the moment.


                    What people are failing to grasp here is the fundamentals of rigorous root cause analysis.
                    thank you; but you do understand that I am the reporter of the FreeBSD reports, don't you?


                    The BSD lot have a better handle on this than Linux, where right now a Linux news outlet has an irresponsible headline misrepresenting the situation, in an article that is now linked from many a site...
                    I do not care about any headlines; but I do care to get more people trying what I am trying - even if that means to test an unknown operating system.


                    (...) (has the testcase been tried/ported to linux? )
                    I don't think so, do you volunteer? I mean that seriously. I'd even offer a bootable image with FreeBSD test cases so that you don't have to install it on your SSD or a new one.


                    Personally I would go over the BSD case and replicate the testcase on Linux to increase the number of machines trialling this method. If it doesn't cause the same fault on Linux, then it is possibly a BSD-specific issue. If it does cause it on Linux, then the testcase needs to be narrowed down further.
                    agreed...


                    • For people who want a thread with *many* people confirming the bug when running kill-ryzen.sh, just search Google for

                      Threadripper Early Adopters and Ryzen gamers, can you spare us some time?

                      (People in that thread don't have Threadripper yet, but many have Ryzen and ran the test on their computers.)

                      As I said before, I have the feeling that distributions using gcc 7.1.1 seem to be much more stable. But I am still working on some tests on my computer. Since you need to run the test for at least 24 hours to get good certainty of stability, it takes a while. I have also found a potential bug in kill-ryzen.sh when running on Antergos: it seems that you need to increase the virtual RAM drive size for the compilation to finish.

                      If I can confirm that my system, which always shows the bug under Ubuntu (gcc 6.3 based), is stable under Antergos (gcc 7.1.1 based), it may give AMD engineers a hint that gcc 6.3 is generating some code that triggers the bug even though the code is valid. From what I have heard this is not that unusual, and a workaround may be to identify the code path and avoid it when compiling. But let us see; it is still too soon to tell. kill-ryzen.sh has only been running for 3 hours since I fixed the RAM disk size. It has not even completed the first round of compilations.

                      Edit: Brutalix, sorry, I missed your post where you say that you are running Ubuntu and Debian. Good to see more good processors in the wild.
                      Last edited by pjssilva; 08-06-2017, 03:44 PM.
