Continuing To Stress Ryzen


  • Originally posted by RyzenNewbie View Post

    ufile.io - seems very anonymous to me.

    example: ufile.io/npl9m
    https://ufile.io/mhy56

    Kind regards

    B.

    Comment


    • Anecdotal evidence here: Initially I had these segfaults as well, BUT they went away completely with the latest BIOS, setting the memory clock to 2133MHz (64GB), and me reseating the heatsink and applying better thermal grease, because I noticed that the CPU temp would almost instantly shoot up to 92.5°C under load. It's been building llvm-svn and linux-git in a loop (make -j17 on both) in a ramdisk for over 2 days now without any issues, and the reported temp hovers around 90°C; the heatsink is very warm.
      Last edited by mlau; 08-06-2017, 04:44 PM.
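
A build loop like the one described above can be sketched as a small shell helper. This is only an illustration, not the script mlau used: the function name `loop_until_fail` and its arguments are hypothetical, and it simply reruns a command until the first non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and stop at the first failure.
# Usage: loop_until_fail MAX_RUNS command [args...]
# Prints the number of successful runs; returns 1 if a run failed.
loop_until_fail() {
    max_runs=$1
    shift
    ok=0
    while [ "$ok" -lt "$max_runs" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            # First failing run: report how many runs succeeded before it.
            echo "$ok"
            return 1
        fi
    done
    echo "$ok"
    return 0
}
```

Inside a source tree it could be called as `loop_until_fail 100 sh -c 'make clean && make -j17'`, assuming the tree has a `clean` target. A compiler segfault makes `make` exit non-zero, so the loop stops there.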

      Comment


      • Originally posted by Brutalix View Post

        https://ufile.io/mhy56

        Kind regards

        B.
        thank you for uploading it. I haven't downloaded it completely yet (ufile throttles) but I've already found three segfaults in your log file:
        Code:
        grep -v OK log.txt.part  | grep -v CPU
        Segmentation fault (core dumped)
        12226: sø. 06. aug. 01:07:16 +0200 2017: NG
        Segmentation fault (core dumped)
        48519: sø. 06. aug. 02:55:06 +0200 2017: NG
        Segmentation fault (core dumped)
        55162: sø. 06. aug. 03:15:25 +0200 2017: NG
        I would assume that your runs are not stable, but that "ryzen_segv_test" program is not a good example, as it uses some fancy xcode modifications without fencing and stuff. I don't understand that completely, but the FreeBSD dev says that it probably is not relevant:

        https://bugs.freebsd.org/bugzilla/sh...id=219399#c124



        final grep:
        Code:
        grep -v OK log.txt  | grep -v CPU
        Segmentation fault (core dumped)
        12226: sø. 06. aug. 01:07:16 +0200 2017: NG
        Segmentation fault (core dumped)
        48519: sø. 06. aug. 02:55:06 +0200 2017: NG
        Segmentation fault (core dumped)
        55162: sø. 06. aug. 03:15:25 +0200 2017: NG
        Segmentation fault (core dumped)
        74911: sø. 06. aug. 04:14:36 +0200 2017: NG
        Segmentation fault (core dumped)
        105442: sø. 06. aug. 05:46:31 +0200 2017: NG
        Segmentation fault (core dumped)
        170666: sø. 06. aug. 09:05:05 +0200 2017: NG
        Segmentation fault (core dumped)
        228297: sø. 06. aug. 12:02:47 +0200 2017: NG
        Segmentation fault (core dumped)
        233530: sø. 06. aug. 12:16:56 +0200 2017: NG
        Segmentation fault (core dumped)
        253936: sø. 06. aug. 13:20:04 +0200 2017: NG
        Segmentation fault (core dumped)
        340422: sø. 06. aug. 17:46:12 +0200 2017: NG
        Segmentation fault (core dumped)
        378517: sø. 06. aug. 19:43:42 +0200 2017: NG
        Segmentation fault (core dumped)
        384428: sø. 06. aug. 20:00:23 +0200 2017: NG
        Segmentation fault (core dumped)
        384191: sø. 06. aug. 20:00:23 +0200 2017: NG
        Last edited by RyzenNewbie; 08-06-2017, 04:52 PM. Reason: added complete and final grep result
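
The filtering above can also be done by matching the failed lines directly. A minimal sketch, assuming every failed run is logged as a line ending in `: NG` as in the output shown; the function names `count_ng` and `list_ng` are made up for this example.

```shell
#!/bin/sh
# Count failed runs: lines that end in ": NG".
count_ng() {
    grep -c ': NG$' "$1"
}

# Print the leading number and the time-of-day field of each failed run.
# Assumes the field layout seen in the log: "<number>: <day> <dd> <mon> <time> <tz> <year>: NG".
list_ng() {
    awk '/: NG$/ { sub(/:$/, "", $1); print $1, $5 }' "$1"
}
```

For example, `count_ng log.txt` gives the number of NG lines, and `list_ng log.txt` gives one "number time" pair per failure, which makes the spacing between failures easier to see.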

        Comment


        • Originally posted by mlau View Post
          (...)for over 2 days now without any issues, and reported temp hovers around 90°C, heatsink is very warm.
          damn, what Ryzen, frequency and cooling do you use? I only got around 54°C with my stock 1700...

          Comment


          • Originally posted by soulsource View Post

            I'm starting to get the impression that there might be good/bad batches of Ryzens out there. Some people seem to have absolutely no problems with their chips, while others reported that even getting a replacement from RMA didn't help. I'm just hoping that the people at AMD sort out the problem soon, and tell their customers affected by the issue if they should RMA their chips or wait for a software/microcode fix.
            Possibly stating the obvious here, but if the RMA didn't cover motherboard/RAM/etc., how can you conclude that the CPU might be from a bad batch?

            A little more context for your thoughts on this might be in order?

            Comment


            • Originally posted by mlau View Post
              Anecdotal evidence here: Initially I had these segfaults as well, BUT they went away completely with the latest BIOS, setting the memory clock to 2133MHz (64GB), and me reseating the heatsink and applying better thermal grease, because I noticed that the CPU temp would almost instantly shoot up to 92.5°C under load. It's been building llvm-svn and linux-git in a loop (make -j17 on both) in a ramdisk for over 2 days now without any issues, and the reported temp hovers around 90°C; the heatsink is very warm.
              Note that Ryzen X models report CPU temperatures 20°C higher than the real ones.

              See https://community.amd.com/community/...mmunity-update

              Still, about 72.5°C (92.5°C minus the 20°C offset) is pretty high for these chips. You probably need a better cooler.
              Last edited by shmerl; 08-06-2017, 04:52 PM.
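
The offset arithmetic from the post above, as a tiny sketch. The function name `real_temp` is hypothetical, the 20°C default is taken from the post, and whether the offset applies at all depends on the CPU model.

```shell
#!/bin/sh
# Subtract a reporting offset from a temperature reading (degrees Celsius).
# $1 = reported temperature, $2 = offset (defaults to 20, per the post above).
real_temp() {
    # LC_ALL=C keeps the decimal point a "." regardless of the user's locale.
    LC_ALL=C awk -v t="$1" -v off="${2:-20}" 'BEGIN { printf "%.1f\n", t - off }'
}
```

With the 92.5°C peak reported earlier in the thread, `real_temp 92.5` prints `72.5`. Passing `0` as the second argument leaves the reading unchanged for models without an offset.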

              Comment


              • Originally posted by Chewi View Post

                This sounds exactly like what happened to me on Gentoo but not Fedora until I enabled CONFIG_RCU_NOCB_CPU_ALL in my kernel. I checked Debian and they do not enable this, so I tried it and sure enough it froze just the same. My system has been stable for months now, but I haven't had anyone else confirm this fix/workaround yet.
                Originally posted by scorpio810 View Post

                Code:
                grep CONFIG_RCU_NOCB_CPU= /boot/config-4.11.12-vanilla
                CONFIG_RCU_NOCB_CPU=y
                
                grep CONFIG_RCU_NOCB_CPU_ALL= /boot/config-4.11.12-vanilla
                CONFIG_RCU_NOCB_CPU_ALL=y
                These options in the kernel, plus disabling core C6, can help when you have idle freezes.
                I have not seen idle freezes since I added these options to my custom kernel.
                Thank you very much for the information! I have now built my own kernel with CONFIG_RCU_NOCB_CPU and CONFIG_RCU_NOCB_CPU_ALL enabled. Just when I was about to start building the kernel I got one of those freezes again. After I completed the build and rebooted, the system somehow felt more stable, especially while using Firefox, so I have high hopes for this fix. I think I will try this on the Intel system I have access to as well. The kernel I had before came from kernel-ppa/mainline and was labelled with version 4.12.1. I built my current kernel from the same source and a similar config, but with version 4.12.4 instead. Now I just need to wait and see if I get more freezes. It turns out that Ubuntu removed this option from their builds starting with Xenial (16.04). I'm not 100% sure when I first encountered this problem, but it might have been with Xenial.

                I tried running ryzen-test/kill-ryzen.sh again and got a new segmentation fault after 48 minutes so enabling the RCU options has no effect on the gcc bug.
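
The two grep checks quoted above can be folded into one helper. A rough sketch, assuming a plain-text kernel config file such as the ones under /boot; the function name `check_rcu_nocb` is invented here, and a kernel version that no longer has one of these options will simply show it as not enabled.

```shell
#!/bin/sh
# Report whether the two RCU options named above are set to "y" in a kernel
# config file. Prints one line per option; returns 1 if either is not enabled.
check_rcu_nocb() {
    config=$1
    status=0
    for opt in CONFIG_RCU_NOCB_CPU CONFIG_RCU_NOCB_CPU_ALL; do
        if grep -q "^${opt}=y\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
            status=1
        fi
    done
    return "$status"
}
```

A typical call would be `check_rcu_nocb "/boot/config-$(uname -r)"`, assuming the distribution installs the config under that name.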

                Comment


                • Originally posted by Silverthorn View Post
                  I tried running ryzen-test/kill-ryzen.sh again and got a new segmentation fault after 48 minutes so enabling the RCU options has no effect on the gcc bug.
                  do you have a non-Ryzen (AMD or Intel, doesn't matter) system at your disposal? If yes, may I ask that you install your SSD/HDD - on which your current system resides - into that and try that there? Just curious.

                  Thanks in advance...

                  Comment


                  • Originally posted by mlau View Post
                    Anecdotal evidence here: Initially I had these segfaults as well, BUT they went away completely with the latest BIOS, setting the memory clock to 2133MHz (64GB), and me reseating the heatsink and applying better thermal grease, because I noticed that the CPU temp would almost instantly shoot up to 92.5°C under load. It's been building llvm-svn and linux-git in a loop (make -j17 on both) in a ramdisk for over 2 days now without any issues, and the reported temp hovers around 90°C; the heatsink is very warm.
                    Dude get a better heat sink. Unless you're heavily overclocking, 95W TDP CPUs shouldn't hit that high. I've had multiple overclocked systems since i7 2600k, 3770k, 4790k, ... They all stay below 75°C (around 90-100W TDP). Besides, how do you even tell the temperature when no official temp drivers are available? Your mobo prolly reports something lower than the on-die sensor. Just curious.

                    Comment


                    • All that talk about temperatures is quite ridiculous; normal compiling is not a thermal stress test. Actually, it's totally harmless compared to Prime95 or LinX (even without Intel's heating AVX).

                      Comment
