Continuing To Stress Ryzen


  • Originally posted by Naib
    That isn't showing a bug in Ryzen. That is showing a repeatable crash.
    yep, a freeze that I cannot force on my Xeon system at the moment.


    What people are failing to grasp here is the fundamentals of rigorous root cause analysis.
    thank you; but you do understand that I am the reporter of the FreeBSD reports, do you?


    The BSD lot have a better handle on this than linux where right now a linux news outlet has an irresponsible headline misrepresenting the situation, an article that is now linked to many a site...
    I do not care about any headlines; but I do care to get more people trying what I am trying - even if that means to test an unknown operating system.


    (...) (has the testcase been tried/ported to linux? )
    I don't think so, do you volunteer? I mean that seriously. I'd even offer a bootable image with FreeBSD test cases so that you don't have to install it on your SSD or a new one.


    Personally I would go over the BSD case to replicate the testcase in linux to increase the number of machines trialling this method. IF it doesn't cause the same fault on linux then it is possibly a BSD specific issue. IF it causes it on linux then the testcase needs to be further narrowed down
    agreed...



    • For people who want a thread with *many* people confirming the bug when running kill-ryzen.sh just look in google for

      [Discussion] Threadripper Early Adopters and Ryzen gamers, can you spare us some time?

      (People in that thread don't have Threadripper yet, but many have Ryzen and did the test in their computers).

      As I said before, I have the feeling that distributions using gcc 7.1.1 seem to be much more stable. But I am still working on some tests on my computer. Since you need to run the test for at least 24 hours to be reasonably certain of stability, it takes a while. I have also found a potential bug in kill-ryzen.sh when running on Antergos: it seems that you need to increase the virtual RAM drive size for the compilation to finish.

      If I can confirm that my system, which always shows the bug under Ubuntu (gcc 6.3 based), is stable under Antergos (gcc 7.1.1 based), it may give AMD engineers a hint that gcc 6.3 may be generating some code that triggers the bug even if the code is valid. This is not that unusual from what I have heard, and a workaround may be to identify the code path and avoid it when compiling. But let us see; it is still too soon to tell. kill-ryzen.sh has only been running for 3 hours since I fixed the RAM disk size. It has not even completed the first round of compilations.

      Edit: Brutalix, sorry, I missed your post where you say that you are running Ubuntu and Debian. Good to see more good processors in the wild.
      Last edited by pjssilva; 06 August 2017, 03:44 PM.



      • Originally posted by Brutalix
        That and removing a lot of drivers not needed. Like radio etc..
        kern.hz=1000 sounds interesting; I'll try that tomorrow at work.


        Also I ran the bug script for the bug on bsd on my ubuntu computer,
        what bug script exactly? The only scripts posted in the reports are for buildworld/buildkernel - this doesn't work on Linux - and the "ryzen_segv_test" program/script combo.


        no errors here for 26 hours. If you want, I can send you the log file, only a 350 MB log file.
        Perhaps you can put it somewhere on a public server so that everyone can see it?
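Before sharing a log of that size, it may help to post a short summary alongside it. The following is a minimal sketch under stated assumptions: the helper names `count_matching_lines` and `faults_per_hour` are made up for illustration, and the "Segmentation fault" marker is only a guess at what a failing run would write, since the log format of the test script is not shown in this thread.

```python
import io


def count_matching_lines(lines, needle="Segmentation fault"):
    """Count lines containing needle, reading one line at a time so a
    multi-hundred-megabyte log never has to fit in memory.

    needle is an assumed marker string, not a documented log format.
    """
    return sum(1 for line in lines if needle in line)


def faults_per_hour(fault_count, hours):
    """Average fault rate over the whole run; hours must be positive."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return fault_count / hours


if __name__ == "__main__":
    # Made-up sample lines standing in for a real log file.
    sample = io.StringIO("round 1 ok\nround 2 ok\n")
    n = count_matching_lines(sample)
    print(n, faults_per_hour(n, 26))
```

For a real log, an open file object can be passed instead of the sample, for example `open(path, errors="replace")`, where `path` is wherever the log was saved.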



        • Originally posted by RyzenNewbie

          what bug script exactly? The only scripts posted in the reports are for buildworld/buildkernel - this doesn't work on Linux - and the "ryzen_segv_test" program/script combo.

          Perhaps you can put it somewhere on a public server so that everyone can see it?
          The Ryzen_segv_Test script; it at least works on Ubuntu. Running all 12 cores at 100% for 26 hours. No errors.

          Any suggestions for a public server? I mostly use Dropbox, and that's not really public, and I don't want my email to be public.

          Kind regards

          B.




          • Originally posted by vortex
            If you like tabloid style headlines, then nothing.

            Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

            It is shoddy journalism to run with a "story" and not have your facts straight, then issue a correction saying basically, it happens on other (non-Ryzen) systems as well.

            As for Ryzen itself, there does appear to be some corner case that some people are hitting, and it is incredibly difficult to replicate.
            You can bet that AMD has hundreds of machines with Epyc & threadripper & Ryzen all doing tons of different workloads trying to replicate the issue, and nothing has come up yet as far as we know.

            This isn't anything new. Intel also has errata on their CPUs; nothing is perfect out of the gate, and it takes time to find all the bugs.
            Look at https://www.intel.com/content/dam/ww...ion-update.pdf and look at all the "no fix" items listed; most of the fixes are done via BIOS updates, some need a new stepping.
            AMD hasn't issued an errata guide for Ryzen yet; the last one I can find is http://support.amd.com/TechDocs/5537...Processors.pdf which also has "no fix" items and other fixes done via BIOS updates.


            If someone does come up with a repeatable workload that can show Ryzen failing, then, cool, that person should get a bug hunting award from AMD.
            The difficulty is less in reproducing it - that's easy, just compile mesa in a loop and wait several hours for a segfault to show up - but in finding a way to reproduce it on demand, not in such a semi-random fashion.
            As I've already said, I've now tested three different Ryzen machines (two of which have all UEFI settings at default values), and could get each of them to fail a mesa build due to a segfault in gcc within 8 hours. That's fine for confirming that these machines show the issue, it's just a pain to debug the problem if you have to wait several hours for new test-data...
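The compile-in-a-loop approach described above could be scripted roughly as follows. This is a minimal sketch, not the script anyone in this thread actually used: the function name `run_until_failure`, the example build command, and the "Segmentation fault" marker string are all assumptions for illustration.

```python
import subprocess
import sys
import time


def run_until_failure(cmd, max_rounds, marker="Segmentation fault"):
    """Run cmd repeatedly and stop at the first failing round.

    Returns (round_number, elapsed_seconds, saw_marker) for the first round
    that exits non-zero, or None if every round succeeds.

    cmd is a hypothetical build command such as ["make", "-j16"]; marker is
    an assumed substring of the compiler's crash message.
    """
    start = time.monotonic()
    for round_no in range(1, max_rounds + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so one scan covers both
            text=True,
            errors="replace",  # tolerate odd bytes in build output
        )
        if result.returncode != 0:
            elapsed = time.monotonic() - start
            return round_no, elapsed, marker in result.stdout
    return None


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in a real
    # clean-and-build command to use it as a stress loop.
    demo = [sys.executable, "-c", "print('build ok')"]
    print(run_until_failure(demo, max_rounds=3))
```

A real loop would also need to clean the build tree between rounds so that each round recompiles everything. Note that the sketch only records when a failure occurs; it does nothing to make the failure reproducible on demand, which is the hard part described above.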



            • Originally posted by soulsource

              The difficulty is less in reproducing it - that's easy, just compile mesa in a loop and wait several hours for a segfault to show up - but in finding a way to reproduce it on demand, not in such a semi-random fashion.
              As I've already said, I've now tested three different Ryzen machines (two of which have all UEFI settings at default values), and could get each of them to fail a mesa build due to a segfault in gcc within 8 hours. That's fine for confirming that these machines show the issue, it's just a pain to debug the problem if you have to wait several hours for new test-data...
              Well, for me it's a problem to reproduce; I don't get the error, at least not within 26 hours. What I would like is some feedback from AMD on whether they are able or unable to reproduce the problem.

              Kind regards.
              B.



              • Originally posted by Brutalix

                1700x:
                GA-AX-370-K5, BIOS F3. Kingston 32gb ECC 2400 ram. Standard voltage (meaning auto on ram and CPU). (Ram also here runs on 1.2v) Debian testing. Custom kernel 4.11.0.
                This motherboard shouldn't support ECC, as far as I know.



                • Any suggestions for public server?
                  ufile.io - seems very anonymous to me.

                  example: ufile.io/npl9m



                  • Originally posted by Brutalix

                    Well, for me it's a problem to reproduce; I don't get the error, at least not within 26 hours. What I would like is some feedback from AMD on whether they are able or unable to reproduce the problem.

                    Kind regards.
                    B.
                    I'm starting to get the impression that there might be good/bad batches of Ryzens out there. Some people seem to have absolutely no problems with their chips, while others reported that even getting a replacement from RMA didn't help. I'm just hoping that the people at AMD sort out the problem soon, and tell their customers affected by the issue if they should RMA their chips or wait for a software/microcode fix.



                    • What comes to my mind is, did the finder test llvm 5.0 on bsd and the gcc 7 on ubuntu? If they did it and it made a difference, there is no problem. File a bug for backporting zenver1 support on ubuntu 17.04 & bsd compilers.
