Continuing To Stress Ryzen


  • What comes to mind is: did the finder test LLVM 5.0 on BSD and GCC 7 on Ubuntu? If they did and it made a difference, there is no problem: file a bug asking for zenver1 support to be backported to the Ubuntu 17.04 and BSD compilers.



    • Originally posted by satai

      This motherboard shouldn't support ECC, as far as I know.
      Well, at least it seems to work on my motherboard. I won't claim to be a Linux expert; I'm mostly an enthusiast. I'm a radiologist and use these systems for FreeSurfer and FSL in radiological research. That's why I use ECC and 32 GB of RAM.
      This is what I get from dmesg | grep -i edac:

      [ 13.063025] EDAC MC: Ver: 3.0.0
      [ 13.070564] EDAC amd64: Node 0: DRAM ECC enabled.
      [ 13.070566] EDAC amd64: F17h detected (node 0).
      [ 13.070602] EDAC MC: UMC0 chip selects:
      [ 13.070602] EDAC amd64: MC: 0: 0MB 1: 0MB
      [ 13.070603] EDAC amd64: MC: 2: 0MB 3: 0MB
      [ 13.070604] EDAC amd64: MC: 4: 0MB 5: 0MB
      [ 13.070604] EDAC amd64: MC: 6: 0MB 7: 0MB
      [ 13.070606] EDAC MC: UMC1 chip selects:
      [ 13.070607] EDAC amd64: MC: 0: 0MB 1: 0MB
      [ 13.070607] EDAC amd64: MC: 2: 0MB 3: 0MB
      [ 13.070608] EDAC amd64: MC: 4: 0MB 5: 0MB
      [ 13.070608] EDAC amd64: MC: 6: 0MB 7: 0MB
      [ 13.070609] EDAC amd64: using x8 syndromes.
      [ 13.070609] EDAC amd64: MCT channel count: 2
      [ 13.070653] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
      [ 13.070664] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
      [ 13.070664] AMD64 EDAC driver v3.5.0

      and edac-util --v gives me:

      mc0: 0 Uncorrected Errors with no DIMM info
      mc0: 0 Corrected Errors with no DIMM info
      edac-util: No errors to report.

      I took that to mean that ECC works. I also tried this on my GA-AB350M Gaming 3 motherboard, and there the dmesg output was different, if I remember correctly. It still booted with the RAM, though, and ran it at the correct speed of 2400.
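
For anyone who wants to cross-check the counters without edac-util, here is a minimal shell sketch that sums the corrected and uncorrected error counts the EDAC driver exposes in sysfs. The function name `edac_counts` and its optional directory argument are made up for illustration. The default path assumes the usual sysfs layout, with one `mcN` directory per memory controller holding `ce_count` and `ue_count`.

```shell
# Hypothetical helper (not part of edac-utils): sum the EDAC error counters.
# $1 is the EDAC sysfs directory; it defaults to the usual kernel location.
edac_counts() {
    base="${1:-/sys/devices/system/edac/mc}"
    ce=0
    ue=0
    for mc in "$base"/mc*; do
        # Skip entries without readable counters (e.g. EDAC driver not loaded).
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        ce=$((ce + $(cat "$mc/ce_count")))
        ue=$((ue + $(cat "$mc/ue_count")))
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

Calling `edac_counts` with no argument reads the live counters. Zero counts only say that no errors have been logged so far; they do not by themselves prove that error reporting works end to end.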

      Kind regards

      B.



      • Originally posted by _ONH_
        What comes to mind is: did the finder test LLVM 5.0 on BSD and GCC 7 on Ubuntu?
        No, but one of the finders(!) ran a test in an i386 jail (compilation fixed to 32-bit) and reported no unexpected compilation failures at all:

        https://bugs.freebsd.org/bugzilla/sh...?id=219399#c70



        • Originally posted by RyzenNewbie

          ufile.io - seems very anonymous to me.

          example: ufile.io/npl9m
          https://ufile.io/mhy56

          Kind regards

          B.



          • Anecdotal evidence here: initially I had these segfaults as well, BUT they went away completely after I updated to the latest BIOS, set the memory clock to 2133 MHz (64 GB), and reseated the heatsink with better thermal grease. I did the latter because I noticed that the CPU temperature would almost instantly shoot up to 92.5°C under load. The machine has been building llvm-svn and linux-git in a loop (make -j17 on both) in a ramdisk for over 2 days now without any issues, and the reported temperature hovers around 90°C; the heatsink is very warm.
            Last edited by mlau; 06 August 2017, 04:44 PM.
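
A build loop like the one described can be sketched in a few lines of shell. This is only an illustration, not the script mlau used: the function name `stress_loop`, the iteration cap, and the OK/NG output format are assumptions.

```shell
# Hypothetical helper: run a command up to N times and stop at the first
# failure, so a segfaulting build is noticed instead of scrolling past.
# Usage: stress_loop <max-iterations> <command> [args...]
stress_loop() {
    max="$1"
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        if ! "$@"; then
            echo "iteration $i: NG"
            return 1
        fi
        echo "iteration $i: OK"
    done
    return 0
}
```

For example, `stress_loop 100 ./rebuild.sh` would run a (hypothetical) clean-and-rebuild script up to 100 times and stop as soon as one run exits with a non-zero status.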



            • Originally posted by Brutalix

              https://ufile.io/mhy56

              Kind regards

              B.
              Thank you for uploading it. I haven't downloaded it completely yet (ufile throttles), but I've already found three segfaults in your log file:
              Code:
              grep -v OK log.txt.part  | grep -v CPU
              Segmentation fault (core dumped)
              12226: sø. 06. aug. 01:07:16 +0200 2017: NG
              Segmentation fault (core dumped)
              48519: sø. 06. aug. 02:55:06 +0200 2017: NG
              Segmentation fault (core dumped)
              55162: sø. 06. aug. 03:15:25 +0200 2017: NG
              I would assume that your system is not stable. That said, the "ryzen_segv_test" program is not a good test case, as it does some fancy xcode modifications without fencing and the like. I don't understand that completely, but the FreeBSD dev says that it is probably not relevant:

              https://bugs.freebsd.org/bugzilla/sh...id=219399#c124



              final grep:
              Code:
              grep -v OK log.txt  | grep -v CPU
              Segmentation fault (core dumped)
              12226: sø. 06. aug. 01:07:16 +0200 2017: NG
              Segmentation fault (core dumped)
              48519: sø. 06. aug. 02:55:06 +0200 2017: NG
              Segmentation fault (core dumped)
              55162: sø. 06. aug. 03:15:25 +0200 2017: NG
              Segmentation fault (core dumped)
              74911: sø. 06. aug. 04:14:36 +0200 2017: NG
              Segmentation fault (core dumped)
              105442: sø. 06. aug. 05:46:31 +0200 2017: NG
              Segmentation fault (core dumped)
              170666: sø. 06. aug. 09:05:05 +0200 2017: NG
              Segmentation fault (core dumped)
              228297: sø. 06. aug. 12:02:47 +0200 2017: NG
              Segmentation fault (core dumped)
              233530: sø. 06. aug. 12:16:56 +0200 2017: NG
              Segmentation fault (core dumped)
              253936: sø. 06. aug. 13:20:04 +0200 2017: NG
              Segmentation fault (core dumped)
              340422: sø. 06. aug. 17:46:12 +0200 2017: NG
              Segmentation fault (core dumped)
              378517: sø. 06. aug. 19:43:42 +0200 2017: NG
              Segmentation fault (core dumped)
              384428: sø. 06. aug. 20:00:23 +0200 2017: NG
              Segmentation fault (core dumped)
              384191: sø. 06. aug. 20:00:23 +0200 2017: NG
              Last edited by RyzenNewbie; 06 August 2017, 04:52 PM. Reason: added complete and final grep result
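
To get a quick summary instead of the raw grep output, something like the following sketch could be used. It assumes that every result line in the log has the form `<iteration>: <date>: OK` or `<iteration>: <date>: NG`, as in the output above. The function name `summarise_ng` is made up for illustration.

```shell
# Hypothetical helper: summarise the failed iterations in a test log.
# Assumes result lines look like "<iteration>: <date>: OK" or "...: NG".
summarise_ng() {
    log="$1"
    # Count the result lines that end in ": NG" (grep -c prints 0 if none).
    count=$(grep -c ': NG$' "$log" || true)
    echo "failed iterations: $count"
    # Print only the iteration numbers, i.e. the text before the first colon.
    grep ': NG$' "$log" | cut -d: -f1
}
```

Running `summarise_ng log.txt` prints the number of NG lines followed by one iteration number per line.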



              • Originally posted by mlau
                (...) for over 2 days now without any issues, and the reported temperature hovers around 90°C; the heatsink is very warm.
                Damn, which Ryzen, frequency and cooling do you use? I only get around 54°C with my stock 1700...



                • Originally posted by soulsource

                  I'm starting to get the impression that there might be good/bad batches of Ryzens out there. Some people seem to have absolutely no problems with their chips, while others reported that even getting a replacement from RMA didn't help. I'm just hoping that the people at AMD sort out the problem soon, and tell their customers affected by the issue if they should RMA their chips or wait for a software/microcode fix.
                  Possibly stating the obvious here, but if the RMA didn't cover motherboard/RAM/etc., how can you conclude that the CPU might be from a bad batch?

                  A little more context for your thoughts on this might be in order?



                  • Originally posted by mlau
                    Anecdotal evidence here: initially I had these segfaults as well, BUT they went away completely after I updated to the latest BIOS, set the memory clock to 2133 MHz (64 GB), and reseated the heatsink with better thermal grease. I did the latter because I noticed that the CPU temperature would almost instantly shoot up to 92.5°C under load. The machine has been building llvm-svn and linux-git in a loop (make -j17 on both) in a ramdisk for over 2 days now without any issues, and the reported temperature hovers around 90°C; the heatsink is very warm.
                    Note that Ryzen X models report CPU temperatures 20°C higher than the real ones.

                    See https://community.amd.com/community/...mmunity-update

                    Still, 72.5°C (92.5°C minus the 20°C offset) is pretty high for these chips. You probably need a better cooler.
                    Last edited by shmerl; 06 August 2017, 04:52 PM.
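
The offset arithmetic can be sketched as a one-line helper. The function name `real_temp` is made up for illustration, and the 20°C default only applies if the chip is one of the X models mentioned above; for other parts, pass 0 as the second argument.

```shell
# Hypothetical helper: convert a reported temperature to the estimated real
# one by subtracting the reporting offset (default: 20 degrees, the value
# quoted above for Ryzen X models; use 0 for models without an offset).
real_temp() {
    awk -v t="$1" -v off="${2:-20}" 'BEGIN { printf "%.1f\n", t - off }'
}
```

With the numbers from the quoted post, `real_temp 92.5` prints 72.5 and `real_temp 90` prints 70.0.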



                    • Originally posted by Chewi

                      This sounds exactly like what happened to me on Gentoo, but not on Fedora, until I enabled CONFIG_RCU_NOCB_CPU_ALL in my kernel. I checked Debian and they do not enable this, so I tried it, and sure enough it froze just the same. My system has been stable for months now, but I haven't had anyone else confirm this fix/workaround yet.
                      Originally posted by scorpio810

                      Code:
                      grep CONFIG_RCU_NOCB_CPU= /boot/config-4.11.12-vanilla
                      CONFIG_RCU_NOCB_CPU=y
                      
                      grep CONFIG_RCU_NOCB_CPU_ALL= /boot/config-4.11.12-vanilla
                      CONFIG_RCU_NOCB_CPU_ALL=y
                      These options in the kernel, plus disabling core C6, can help when you have idle freezes.
                      I have never seen idle freezes since I added these options to my custom kernel.
                      Thank you very much for the information! I have now built my own kernel with CONFIG_RCU_NOCB_CPU and CONFIG_RCU_NOCB_CPU_ALL enabled. Just as I was about to start building the kernel, I got one of those freezes again. After I completed the build and rebooted, everything somehow felt more stable, especially while using Firefox, so I have high hopes for this fix. I think I will try this on the Intel system I have access to as well. The kernel I had before came from kernel-ppa/mainline and was labelled version 4.12.1. I built my current kernel from the same source with a similar config, but at version 4.12.4 instead. Now I just need to wait and see whether I get more freezes. It turns out that Ubuntu removed this option from their builds starting with Xenial (16.04). I'm not 100% sure when I first encountered this problem, but it might have been with Xenial.

                      I tried running ryzen-test/kill-ryzen.sh again and got a new segmentation fault after 48 minutes so enabling the RCU options has no effect on the gcc bug.
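
For anyone wanting to check their own kernel for the two options, here is a minimal sketch. It assumes the distribution installs the kernel config as /boot/config-<version>, as in the grep commands quoted above, and that the kernel still has these options (as the 4.11 and 4.12 kernels in this thread do). The function name `check_rcu_nocb` is made up for illustration.

```shell
# Hypothetical helper: report whether the RCU no-callback options discussed
# above are set to "y" in a kernel config file.
# $1 defaults to the running kernel's config, if the distro installs one.
check_rcu_nocb() {
    cfg="${1:-/boot/config-$(uname -r)}"
    status=0
    for opt in CONFIG_RCU_NOCB_CPU CONFIG_RCU_NOCB_CPU_ALL; do
        if grep -q "^$opt=y\$" "$cfg"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
            status=1
        fi
    done
    # Non-zero exit status if at least one option is missing.
    return "$status"
}
```

Calling `check_rcu_nocb` with no argument inspects the running kernel; passing a path such as /boot/config-4.11.12-vanilla checks a specific build.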

