
Continuing To Stress Ryzen


  • Originally posted by Brutalix View Post
    That and removing a lot of drivers not needed. Like radio etc..
    kern.hz=1000 sounds interesting; I'll try that tomorrow at work.


    Also I ran the bug script for the bug on bsd on my ubuntu computer,
    what bug script exactly? The only scripts posted in the reports are for buildworld/buildkernel - this doesn't work on Linux - and the "ryzen_segv_test" program/script combo.


    no errors here for 26 hours. If you want, I can send you the log file; it is only 350 MB.
    Perhaps you can put it somewhere on a public server so that everyone can see it?

    Comment


    • Originally posted by RyzenNewbie View Post

      what bug script exactly? The only scripts posted in the reports are for buildworld/buildkernel - this doesn't work on Linux - and the "ryzen_segv_test" program/script combo.

      Perhaps you can put it somewhere on a public server so that everyone can see it?
      The ryzen_segv_test script; it at least works on Ubuntu. I ran it with all 12 cores at 100% for 26 hours. No errors.

      Any suggestions for a public server? I mostly use Dropbox, and that's not really public, and I don't want my email to be public.

      Kind regards

      B.


      Comment


      • Originally posted by vortex View Post
        If you like tabloid style headlines, then nothing.

        Even after his update, the title still remains "50+ Segmentation Faults Per Hour: Continuing To Stress Ryzen" just to get hits.

        It is shoddy journalism to run with a "story" and not have your facts straight, then issue a correction saying basically, it happens on other (non-Ryzen) systems as well.

        As for Ryzen itself, there does appear to be some corner case that some people are hitting, and it is incredibly difficult to replicate.
        You can bet that AMD has hundreds of machines with Epyc & threadripper & Ryzen all doing tons of different workloads trying to replicate the issue, and nothing has come up yet as far as we know.

        This isn't anything new. Intel also has errata on its CPUs; nothing is perfect out of the gate, and it takes time to find all the bugs.
        Look at https://www.intel.com/content/dam/ww...ion-update.pdf and look at all the "no fix" items listed. Most of the fixes are done via BIOS updates; some need a new stepping.
        AMD hasn't issued an errata guide for Ryzen yet. The last one I can find is http://support.amd.com/TechDocs/5537...Processors.pdf, which also has "no fix" items and other fixes done via BIOS updates.


        If someone does come up with a repeatable workload that can show Ryzen failing, then, cool, that person should get a bug hunting award from AMD.
        The difficulty is less in reproducing it - that's easy, just compile mesa in a loop and wait several hours for a segfault to show up - but in finding a way to reproduce it on demand, not in such a semi-random fashion.
        As I've already said, I've now tested three different Ryzen machines (two of which have all UEFI settings at default values), and could get each of them to fail a mesa build due to a segfault in gcc within 8 hours. That's fine for confirming that these machines show the issue, it's just a pain to debug the problem if you have to wait several hours for new test-data...
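For anyone who wants to leave such a loop running unattended, here is a minimal sketch of the "build in a loop and wait" approach described above. It is not the script anyone in this thread actually used: the function names, the crash marker strings, and the `make` commands in the example are assumptions for illustration, and the build and clean commands need to be adapted to the source tree being compiled.

```python
import subprocess
import time

# Strings that commonly appear in build output when the compiler itself crashes.
# Assumed markers; extend the tuple if your toolchain words this differently.
CRASH_MARKERS = ("Segmentation fault", "internal compiler error")


def find_crash_lines(output):
    """Return the lines of build output that contain one of the crash markers."""
    return [line for line in output.splitlines()
            if any(marker in line for marker in CRASH_MARKERS)]


def build_until_failure(build_cmd, clean_cmd=None, max_runs=None):
    """Run build_cmd repeatedly and stop at the first non-zero exit status.

    Returns (run_number, elapsed_seconds, crash_lines) for the failing run,
    or None if max_runs builds finished without a failure.
    """
    start = time.monotonic()
    run = 0
    while max_runs is None or run < max_runs:
        run += 1
        if clean_cmd:
            # Start every iteration from a clean tree so the full build is repeated.
            subprocess.run(clean_cmd, check=False)
        result = subprocess.run(build_cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True,
                                errors="replace")
        if result.returncode != 0:
            return run, time.monotonic() - start, find_crash_lines(result.stdout)
    return None


if __name__ == "__main__":
    # Hypothetical commands; run this from the build directory.
    outcome = build_until_failure(["make", "-j16"], clean_cmd=["make", "clean"])
    if outcome is not None:
        run, elapsed, crash_lines = outcome
        print(f"build {run} failed after {elapsed / 3600:.1f} h")
        for line in crash_lines:
            print(line)
```

A failing run with no crash lines is more likely an ordinary build error than the segfault discussed here, so it is worth checking the output before counting it as a hit.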

        Comment


        • Originally posted by soulsource View Post

          The difficulty is less in reproducing it - that's easy, just compile mesa in a loop and wait several hours for a segfault to show up - but in finding a way to reproduce it on demand, not in such a semi-random fashion.
          As I've already said, I've now tested three different Ryzen machines (two of which have all UEFI settings at default values), and could get each of them to fail a mesa build due to a segfault in gcc within 8 hours. That's fine for confirming that these machines show the issue, it's just a pain to debug the problem if you have to wait several hours for new test-data...
          Well, for me it's a problem to reproduce: I don't get the error, at least not within 26 hours. What I would like is some feedback from AMD on whether they are able or unable to reproduce the problem.

          Kind regards.
          B.

          Comment


          • Originally posted by Brutalix View Post

            1700x:
            GA-AX-370-K5, BIOS F3. Kingston 32gb ECC 2400 ram. Standard voltage (meaning auto on ram and CPU). (Ram also here runs on 1.2v) Debian testing. Custom kernel 4.11.0.
            This motherboard shouldn't support ECC, as far as I know.

            Comment


            • Any suggestions for public server?
              ufile.io - seems very anonymous to me.

              example: ufile.io/npl9m

              Comment


              • Originally posted by Brutalix View Post

                Well, for me it's a problem to reproduce: I don't get the error, at least not within 26 hours. What I would like is some feedback from AMD on whether they are able or unable to reproduce the problem.

                Kind regards.
                B.
                I'm starting to get the impression that there might be good/bad batches of Ryzens out there. Some people seem to have absolutely no problems with their chips, while others reported that even getting a replacement from RMA didn't help. I'm just hoping that the people at AMD sort out the problem soon, and tell their customers affected by the issue if they should RMA their chips or wait for a software/microcode fix.

                Comment


                • What comes to my mind is: did the finders test LLVM 5.0 on BSD and GCC 7 on Ubuntu? If they did and it made a difference, there is no problem. File a bug asking for zenver1 support to be backported to the compilers in Ubuntu 17.04 and BSD.

                  Comment


                  • Originally posted by satai View Post

                    This motherboard shouldn't support ECC, as far as I know.
                    Well, at least it seems to work on my motherboard. I will not claim to be a Linux expert; I'm mostly an enthusiast. I'm a radiologist and use the systems for Freesurfer and FSL in radiological research. That's why I use ECC and 32 GB of RAM.
                    This is what I get from dmesg |grep -i edac:

                    [ 13.063025] EDAC MC: Ver: 3.0.0
                    [ 13.070564] EDAC amd64: Node 0: DRAM ECC enabled.
                    [ 13.070566] EDAC amd64: F17h detected (node 0).
                    [ 13.070602] EDAC MC: UMC0 chip selects:
                    [ 13.070602] EDAC amd64: MC: 0: 0MB 1: 0MB
                    [ 13.070603] EDAC amd64: MC: 2: 0MB 3: 0MB
                    [ 13.070604] EDAC amd64: MC: 4: 0MB 5: 0MB
                    [ 13.070604] EDAC amd64: MC: 6: 0MB 7: 0MB
                    [ 13.070606] EDAC MC: UMC1 chip selects:
                    [ 13.070607] EDAC amd64: MC: 0: 0MB 1: 0MB
                    [ 13.070607] EDAC amd64: MC: 2: 0MB 3: 0MB
                    [ 13.070608] EDAC amd64: MC: 4: 0MB 5: 0MB
                    [ 13.070608] EDAC amd64: MC: 6: 0MB 7: 0MB
                    [ 13.070609] EDAC amd64: using x8 syndromes.
                    [ 13.070609] EDAC amd64: MCT channel count: 2
                    [ 13.070653] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
                    [ 13.070664] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
                    [ 13.070664] AMD64 EDAC driver v3.5.0

                    and edac-util --v gives me:

                    mc0: 0 Uncorrected Errors with no DIMM info
                    mc0: 0 Corrected Errors with no DIMM info
                    edac-util: No errors to report.

                    I thought that meant that it works. I tried the same on my GA-AB350M Gaming 3 motherboard, and there the dmesg output was different, if I remember correctly. But it still booted with the RAM and ran it at the correct speed, 2400.
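The error counters that edac-util prints can also be read directly from the kernel's EDAC sysfs tree, which is handy for logging them during a long stress run. The following is a minimal sketch, assuming the EDAC driver is loaded and exposes the standard `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/`; the function name `read_edac_counts` is made up for this example. Note that zero counts only show that no errors have been logged so far, not that ECC correction has been exercised.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} for each mc* directory.

    Controllers whose counter files are missing or unreadable are skipped.
    """
    counts = {}
    for mc_dir in sorted(Path(edac_root).glob("mc[0-9]*")):
        try:
            corrected = int((mc_dir / "ce_count").read_text().strip())
            uncorrected = int((mc_dir / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue
        counts[mc_dir.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (driver not loaded?)")
    for name, (corrected, uncorrected) in counts.items():
        print(f"{name}: {corrected} corrected, {uncorrected} uncorrected")
```

The `edac_root` parameter exists only so the function can be pointed at a different directory, for example a copy of the sysfs files taken from another machine.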

                    Kind regards

                    B.

                    Comment


                    • Originally posted by _ONH_ View Post
                      What comes to my mind is, did the finder test llvm 5.0 on bsd and the gcc 7 on ubuntu?
                      no, but one of the finders(!) did a test in an i386 jail (compilations fixed to 32bit) and reported that there are no unexpected compilation failures at all:

                      https://bugs.freebsd.org/bugzilla/sh...?id=219399#c70

                      Comment
