Ryzen-Test & Stress-Run Make It Easy To Cause Segmentation Faults On Zen CPUs


  • #91
    Guys, artificial torture tests like these can make any CPU fail. Let's not panic.
    The "situation" can also change over time, depending on updates to the BIOS, kernel, gcc, libc, etc.
    By the way, different Linux distributions give different results; could it also be a scheduler issue? Are people posting which scheduler they use?
    How about other compilers and interpreters? Python, Java, ...?
    You may try compiling the ns3 network simulator (http://www.nsnam.org/), which is one of the toughest compile jobs I have ever seen!
    I would love to see a "compilation of responses" from AMD and from motherboard manufacturers. Also from the kernel guys!

    All the best...


    • #92
      Maybe it is now finally fixed in DragonFly BSD? http://lists.dragonflybsd.org/piperm...st/626190.html


      • #93
        Originally posted by kemalihsan
        Guys, artificial torture tests like these can make any CPU fail. Let's not panic.
        The "situation" can also change over time, depending on updates to the BIOS, kernel, gcc, libc, etc.
        By the way, different Linux distributions give different results; could it also be a scheduler issue? Are people posting which scheduler they use?
        ...
        No, an artificial torture test will not make just any CPU fail. A stable CPU should be "bug-free" enough that no matter how long you throw math at it, it will calculate correctly.


        • #94
          Originally posted by rene
          Maybe it is now finally fixed in DragonFly BSD? http://lists.dragonflybsd.org/piperm...st/626190.html
          See https://www.phoronix.com/forums/foru...206#post969206


          • #95
            See https://www.phoronix.com/forums/foru...244#post969244


            • #96
              Does anybody know whether the bug is present only on the fully configured models (1700, 1700X and 1800X), or whether it is also present on the six-core models and on models without SMT, like the four-core 1300X and 1200?


              • #97
                FWIW, I have just gotten a replacement 1700 from AMD (replaced due to MCE issues and segfaults). The new one "seems" fine and was made in week 30 of this year (UA 1730).

                According to information on the AMD forums, anything newer than 1725 appears to be "fixed".


                • #98
                  I have had my 1700X since launch, or rather a few days later; I got it on the 7th of March. I ran that Ryzen kill script for an hour with no issues, and I have never seen the freeze bug either.
                  The motherboard is an ASUS Prime X370 Pro. My CPU is overclocked to 3.9 GHz with a voltage offset of +0.1750V (an offset is used so that the CPU drops to idle clocks when not needed).


                  • #99
                    I have a Ryzen 1700 at stock speed, never overclocked, bought on the 21st of July. I won't remove the cooler to see the production date.
                    64 GB RAM, 2933 MHz.
                    I use Debian, latest version (9.1), with the KDE Plasma desktop.

                    I've run the kill-ryzen.sh script, and I can reproduce the segfault error with SMT on. But if I switch SMT off, it no longer appears.
                    As I don't use SMT, and this is my main computer, I'll keep the processor for now. I'll probably send it for replacement eventually, if no permanent solution is found.

                    The results of my tests, run on 25th of August, are:
                    -run 1h 37m, no error, SMT off
                    -run 22 minutes, SMT on, 1 thread (loop-4) quickly gave an error "TIME TO FAIL: 263 s" (no additional error message shown)
                    -run 6 minutes, SMT on, 1 thread (loop-0) quickly gave an error "TIME TO FAIL: 104 s" (also segfault error message appeared)
                    -run 16 minutes, no error, SMT off

                    For the next tests I kept SMT off, but I increased the number of threads by editing the script:
                    -run 40 minutes, no error, SMT off, 32 threads
                    -run 1h 32m, no error, SMT off, 64 threads
                    -run 39 minutes, no error, SMT off, 16 threads
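
For anyone who wants to compare totals, the runs listed above can be tallied with a few lines of Python. This is only a sketch: the tuple layout and the `summarize` helper are my own naming, and the durations are simply copied from the list (1h 37m = 97 minutes, 1h 32m = 92 minutes).

```python
# Runs as reported above: (duration in minutes, SMT enabled?, run showed an error?)
runs = [
    (97, False, False),  # 1h 37m, SMT off
    (22, True, True),    # SMT on, loop-4, "TIME TO FAIL: 263 s"
    (6, True, True),     # SMT on, loop-0, "TIME TO FAIL: 104 s"
    (16, False, False),  # SMT off
    (40, False, False),  # SMT off, 32 threads
    (92, False, False),  # 1h 32m, SMT off, 64 threads
    (39, False, False),  # SMT off, 16 threads
]


def summarize(runs, smt):
    """Return (total minutes, number of runs with an error) for one SMT setting."""
    selected = [r for r in runs if r[1] == smt]
    return sum(r[0] for r in selected), sum(1 for r in selected if r[2])


for smt in (False, True):
    minutes, failures = summarize(runs, smt)
    label = "on" if smt else "off"
    print(f"SMT {label}: {minutes} min total, {failures} run(s) with an error")
```

A few hours without an error is still a small sample, so the SMT-off total should be read as "not reproduced so far" rather than as proof.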


                    • I ran more tests and was able to reproduce the error with SMT off as well, after a 35-minute run.
                      [KERN] Aug 26 09:44:06 host kernel: as[31164]: segfault at 7c00000077 ip 0000007c00000077 sp 00007ffca8334bf1 error 14 in x86_64-linux-gnu-as[55de2995f000+5b000]
                      The loop didn't appear to die though (no "TIME TO FAIL: " message).
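
The kernel line above can be picked apart with a short Python sketch. Two things here are assumptions on my part and not something stated in the log: that the number after "error" is printed in hexadecimal (as far as I know the kernel uses a hex format for it), and that its bits follow the usual x86 page-fault error code layout. The function name and the dictionary keys are made up for illustration.

```python
import re

# Pattern for the kernel's user-space segfault message, as seen in the log line above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)


def parse_segfault(line):
    """Extract the fields of a segfault log line; return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    error = int(m.group("error"), 16)  # assumption: the error code is printed in hex
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "error": error,
        # assumption: standard x86 page-fault error code bits
        "page_present": bool(error & 0x01),
        "write": bool(error & 0x02),
        "user_mode": bool(error & 0x04),
        "instruction_fetch": bool(error & 0x10),
    }


LINE = ("[KERN] Aug 26 09:44:06 host kernel: as[31164]: segfault at 7c00000077 "
        "ip 0000007c00000077 sp 00007ffca8334bf1 error 14 in "
        "x86_64-linux-gnu-as[55de2995f000+5b000]")

info = parse_segfault(LINE)
print(info)
```

Under those assumptions, the line reads as a user-mode instruction fetch from a page that is not mapped, with the fault address equal to ip. In other words, `as` appears to have jumped to the bogus address 0x7c00000077 rather than having read or written bad data.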
