Continuing To Stress Ryzen


  • Originally posted by mlau View Post
    Anecdotal evidence here: Initially I had these segfaults as well, BUT, they went away completely with latest bios, setting memory clock to 2133MHz (64GB) and me reseating the heatsink and applying better thermal grease, because I noticed that the cpu temp would almost instantly shoot up to 92.5°C under load. It's been building llvm-svn and linux-git in a loop (make -j17 on both) in a ramdisk for over 2 days now without any issues, and reported temp hovers around 90°C, heatsink is very warm.
    Dude get a better heat sink. Unless you're heavily overclocking, 95W TDP CPUs shouldn't hit that high. I've had multiple overclocked systems since i7 2600k, 3770k, 4790k, ... They all stay below 75°C (around 90-100W TDP). Besides, how do you even tell the temperature when no official temp drivers are available? Your mobo prolly reports something lower than the on-die sensor. Just curious.

    Comment


    • All that talk about temperatures is quite ridiculous, normal compiling is not a thermal stresstest. Actually, it's totally harmless compared to Prime95 or LinX (even without Intel's heating AVX).

      Comment


      • Unfortunately, kill-ryzen.sh still segfaults under Antergos with the gcc 7.1.1 toolchain. It takes much longer than on Ubuntu, always around 10500s. I don't understand this, but it is very consistent. I am trying to compile a kernel with 1000 Hz now.

        The good news is that on the AMD support forum one of the guys just reported that he got a good processor through RMA. We already had reports of people getting bad processors after RMA, but now we have a case of a good RMA. This may indicate that there are Ryzen batches with problems, or that newer batches are better after tweaks in the production line. This new, good processor was fabricated in June. Brutalix, when did you buy your processor? Is it recent?

        So there seem to be good and bad processors in the wild, which is a good sign. It may indicate that an RMA will fix the problem for those affected. Most people won't require an RMA, for example if they use Windows, or use Linux but do not compile a lot. I am in the process of RMAing my processor. My fingers are crossed.


        Comment


        • Originally posted by Brutalix View Post

          That and removing a lot of drivers not needed. Like radio etc..
          Also, I ran the test script for the BSD bug on my Ubuntu computer: no errors here for 26 hours. If you want, I can send you the log file; it is only 350 MB.

          Kind regards

          B.

          I thought that test script actually required a modification to the kernel which would allow access to the specific address range involved. Without that kernel modification it doesn't have the ability to run on the affected address range and so will pass every time.

          Comment


          • Originally posted by duby229 View Post
            I thought that test script actually required a modification to the kernel which would allow access to the specific address range involved. Without that kernel modification it doesn't have the ability to run on the affected address range and so will pass every time.
            I think you accidentally mixed up the "ryzen_segv_test" program with the (at the moment) FreeBSD-specific program



            where the latter indeed needs a little kernel patch so that user programs may map memory near to the top of the memory address range...

            Comment


            • By the way, the guy who seems to have gotten a good CPU is on his second RMA. The first one also had problems. This gives an idea that, at least for the early batches, the problem must be reasonably common. What are the odds of doing an RMA and getting a CPU with the same problem?!

              Comment


              • Originally posted by RyzenNewbie View Post

                do you have a non-Ryzen (AMD or Intel, doesn't matter) system at your disposal? If yes, may I ask that you install your SSD/HDD - on which your current system resides - into that and try that there? Just curious.

                Thanks in advance...
                Interesting idea. I could do that but I only have an old system readily available. It's possible that I would have to clone the main SSD and disable EFI on it for it to boot. I have a busy week ahead of me so I don't have too much time. What I could do is try running the script from a live image and see what happens.

                Comment


                • Originally posted by Silverthorn View Post
                  Interesting idea. I could do that but I only have an old system readily available. It's possible that I would have to clone the main SSD and disable EFI on it for it to boot. I have a busy week ahead of me so I don't have too much time. What I could do is try running the script from a live image and see what happens.
                  No problem, whenever it suits you best. But I strongly suggest using a clone of your current system in order to get the same environment...

                  Comment


                  • Originally posted by Silverthorn View Post



                    Thank you very much for the information! I have now built my own kernel and enabled RCU with CONFIG_RCU_NOCB_CPU and CONFIG_RCU_NOCB_CPU_ALL. Just when I was about to start building the kernel I got one of those freezes again. After I completed the build and rebooted my system everything somehow felt more stable, especially while using Firefox, so I have high hopes for this fix. I think I will try this on the Intel system I have access to as well. The kernel I had before came from kernel-ppa/mainline and was labelled with version 4.12.1. I built my current kernel from the same source and a similar config but with version 4.12.4 instead. Now I just need to wait and see if I get more freezes. It turns out that Ubuntu removed this option from their builds starting with Xenial (16.04). I'm not 100% sure when I first encountered this problem but it might have been with Xenial.

                    I tried running ryzen-test/kill-ryzen.sh again and got a new segmentation fault after 48 minutes so enabling the RCU options has no effect on the gcc bug.
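For anyone who wants to confirm which RCU options a running kernel was built with, here is a minimal Python sketch. It assumes the distribution ships the build config as /boot/config-<release>, which is common but not universal; the helper names `parse_kernel_config` and `rcu_status` are made up for this example.

```python
import platform
from pathlib import Path

# The two options mentioned in the post above.
RCU_OPTIONS = ("CONFIG_RCU_NOCB_CPU", "CONFIG_RCU_NOCB_CPU_ALL")


def parse_kernel_config(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
    return options


def rcu_status(options):
    """Return the value of each RCU option, or 'absent' if it is not listed."""
    return {name: options.get(name, "absent") for name in RCU_OPTIONS}


if __name__ == "__main__":
    # Assumed location of the build config; adjust for your distribution.
    config_path = Path("/boot") / f"config-{platform.release()}"
    if config_path.exists():
        print(rcu_status(parse_kernel_config(config_path.read_text())))
    else:
        print(f"{config_path} not found; the config may live elsewhere")
```

An option reported as "absent" may simply not exist in that kernel version, so it does not by itself mean the feature is off.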
                    Enabling the RCU options can help only if you have idle freezes!
                    For segfaults: disable Opcache in your BIOS if your motherboard has the option, or disable ASLR by adding norandmaps to your GRUB kernel command line:

                    or with this command :
                    echo 0 > /proc/sys/kernel/randomize_va_space
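To check the current setting before (or after) changing it, a minimal Python sketch like the following could be used. The value meanings follow the Linux sysctl documentation; the function names `read_aslr_mode` and `describe_aslr` are made up for this example.

```python
from pathlib import Path

# Meanings of /proc/sys/kernel/randomize_va_space per the Linux sysctl docs.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "as 1, plus heap (brk) randomized",
}


def read_aslr_mode(path="/proc/sys/kernel/randomize_va_space"):
    """Read the ASLR mode as an integer from the given sysctl file."""
    return int(Path(path).read_text().strip())


def describe_aslr(mode):
    """Return a short description for a randomize_va_space value."""
    return ASLR_MODES.get(mode, "unknown value")


if __name__ == "__main__":
    mode = read_aslr_mode()
    print(f"randomize_va_space = {mode}: {describe_aslr(mode)}")
```

Reading the file needs no special privileges; only writing to it (as in the echo command) requires root.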


                    I have not retried the custom kernels 4.12.3 and 4.12.4, and I went back to the latest 4.11.12. See this post:
                    https://www.phoronix.com/forums/foru...566#post966566

                    and :
                    https://www.phoronix.com/forums/foru...737#post966737
                    Last edited by scorpio810; 06 August 2017, 08:04 PM.

                    Comment


                    • [to Silverthorn]
                      I can't run two machines at once with the same environment, so I'll report on an older system running Mint 17.3 MATE 64-bit that is performing a four-thread compilation as I type. The CPU is a Phenom II X4 965 running Linux kernel 3.13.0-105. It is not overclocked (as far as I remember). Board is Gigabyte 770T-USB3. Randomize is shown as 2. So far after 2 hours of operation, there have been no messages after the "loop n [date and time] start zero" messages.

                      Meanwhile, about 5 hours in on the Mint 18.1 MATE Ryzen machine running kernel 4.11.0-13, pass zero compilations are complete, and pass one compilations are in progress (for those loops that didn't fail initially). The reported faults are:

                      Pass 0
                      Loop  Time     Type
                      15    52s      seg. fault
                      0     56s      general protection fault
                      2     1926s    seg. fault
                      9     8413s    seg. fault

                      Pass 1
                      Loop  Time     Type
                      4     10564s   seg. fault
                      1     12197s   seg. fault

                      I think this supports the association of the problem with the Ryzen environment, but not with a particular aspect of the Ryzen environment. My only dog in that fight is whether this compilation defect is due to tight overclocking, or is more generic as reports seem to suggest. I would hate to have to evaluate dozens of quasi-correlated timing, voltage, and transmission line loading resistance settings with this test added.
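For comparing runs across machines, the fault list can be tallied mechanically. Below is a minimal Python sketch that parses lines in the "loop, seconds, type" shape reported above and counts faults per type. The line format is taken from this post only and is an assumption about how one might transcribe the results, not the actual log format of kill-ryzen.sh.

```python
import re
from collections import Counter

# Fault lines copied from the post above: "<loop> <seconds>s <fault type>".
REPORT = """\
Pass 0
15 52s seg. fault
0 56s general protection fault
2 1926s seg. fault
9 8413s seg. fault
Pass 1
4 10564s seg. fault
1 12197s seg. fault
"""

FAULT_LINE = re.compile(r"^(\d+)\s+(\d+)s\s+(.+)$")


def parse_faults(report):
    """Return a list of (pass_number, loop, seconds, fault_type) tuples."""
    faults = []
    current_pass = None
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Pass "):
            current_pass = int(line.split()[1])
            continue
        match = FAULT_LINE.match(line)
        if match:
            loop, seconds, kind = match.groups()
            faults.append((current_pass, int(loop), int(seconds), kind))
    return faults


def summarize(faults):
    """Return (Counter of fault types, earliest fault time or None)."""
    by_type = Counter(kind for _, _, _, kind in faults)
    first = min((seconds for _, _, seconds, _ in faults), default=None)
    return by_type, first


if __name__ == "__main__":
    by_type, first = summarize(parse_faults(REPORT))
    for kind, count in by_type.most_common():
        print(f"{count} x {kind}")
    print(f"first fault after {first}s")
```

With only six data points this says nothing statistical; it just makes it easier to line up results from different settings.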

                      Comment
