Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads


  • # Multipost to Phoronix and Gentoo forum

    Hey guys, please refer to my (id:sat) posts on the AMD community thread about this problem.
    I think it is highly likely that this is a Ryzen hardware problem, based on my analysis
    of the reproduction on Windows Subsystem for Linux (WSL) and of kernel-level
    trace information.

    The thread about this problem in AMD support community:
    https://community.amd.com/message/2801909

    * Reproduction on other OSes such as Windows, more precisely Windows Subsystem for Linux (WSL), the
    so-called Bash on Ubuntu on Windows.

    => My post beginning with "I ran my reproducer, building linux kernel with make -j16, on WSL
    and it failed at random...."

    * The result of analyzing what caused the SEGVs by setting a tracer in the Linux kernel

    => My post beginning with "I did the above mentioned investigation and got some more information
    from other Ryzen users. Here is the summary(details are below)...."


    * Why I consider Ryzen, rather than other hardware or software, to be the prime suspect

    => My post beginning with 'Please let me summarize "what component is wrong (I bet it's a Ryzen)"
    by taking account of my past analysis and the facts that has reported here, because information
    gets complicated...'
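
The reproducer itself is not shown in this thread; the post only describes it as building the Linux kernel with `make -j16` and seeing it fail at random. A minimal sketch of that idea, assuming a hypothetical helper that runs a build command several times and records the exit codes, might look like this:

```python
import subprocess

def run_repeatedly(cmd, runs, cwd=None):
    """Run `cmd` `runs` times and return the list of exit codes."""
    codes = []
    for _ in range(runs):
        # A compiler that crashes mid-build makes the build exit non-zero.
        result = subprocess.run(cmd, cwd=cwd,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        codes.append(result.returncode)
    return codes

def failure_rate(codes):
    """Fraction of runs that exited with a non-zero status."""
    if not codes:
        return 0.0
    return sum(1 for c in codes if c != 0) / len(codes)
```

Something like `run_repeatedly(["make", "-j16"], runs=20, cwd="/path/to/linux")` would be the obvious way to call it (the path and run count are placeholders). A real test would also need to clean the tree between runs, otherwise later runs have nothing left to compile.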



    • I figured I'd throw a comment in here, as I've seen all these errors at one point or another.
      It seems that any memory speed over 2666 requires a vsoc voltage bump for me to get rid of errors, i.e. 1.225v currently.
      Also, today was the hottest day of the year and I started getting the uop cache parity errors during a Firefox compile on settings that were otherwise stable for over a month.
      Heat could be a big factor for that. The ~1.4v turbo voltage is just intense.

      1800x
      gigabyte gaming K7
      64GB ecc mem overclocked at 2933 (no errors)

      I'm not complaining, I enjoy testing different settings and dialing things in.
      So my dream of 4GHz fixed speed just isn't looking possible in summertime heat. 1.4125v is just too much (not gonna run my fans at 100% all the time). There really is no headroom on these chips; their stock speed/voltage seems to be right near their max as it is, so it's no surprise to me that a few will have issues even at default.

      I'd recommend that people who aren't overclocking consider raising the vsoc a tad to see if it makes their issue go away.
      I'm running the llc settings at "regular" because the VRMs get hot.

      Also worth noting the PWM fan control on my system is inverted.
      So 0 is max speed and 255 is slowest speed.
      I had to modify the fancontrol script to get this right and use a git version of the it87 module for the sensors output.
      This could be very dangerous for some people if they don't notice, as the fans will spin slower as the cpu gets hotter.
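
To make the inverted-PWM hazard concrete, here is a minimal sketch of a temperature-to-PWM mapping. It is not the actual fancontrol script; the function name and the 40/80 °C thresholds are assumptions chosen only for illustration.

```python
def fan_pwm(temp_c, min_temp=40.0, max_temp=80.0, inverted=False):
    """Map a CPU temperature to a PWM value in 0..255.

    On a normal board 255 means full speed. On an inverted board
    (as described above) 0 means full speed, so the value is flipped.
    """
    # Clamp the temperature into [min_temp, max_temp] and scale to 0..1.
    frac = (temp_c - min_temp) / (max_temp - min_temp)
    frac = max(0.0, min(1.0, frac))
    duty = round(frac * 255)
    return 255 - duty if inverted else duty
```

With `inverted=False` on an inverted board, a hot CPU would be sent a value near 255, which that board treats as its slowest fan speed. That is exactly the failure mode the post warns about.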
      Last edited by Soul_keeper; 06-16-2017, 12:36 AM.



      • Originally posted by Soul_keeper View Post
        1800x
        gigabyte gaming K7
        64GB ecc mem overclocked at 2933 (no errors)
        Is ECC working on your mobo? I had heard ECC only worked with certain ASRock and ASUS boards.



        • Just a quick update. I've been running a Debian kernel for a few days. I did have one freeze, but it took a lot longer to occur than the other freezes and it was slightly different in that I was still able to move the X pointer, so I'm discounting that one for now. Given that this kernel doesn't enable CONFIG_RCU_NOCB_CPU_ALL, there must be some other way to avoid the freezes. I'll try to find out what it is, but the motivation is lacking a little now that I've already found a workaround.



          • Originally posted by Chewi View Post
            Just a quick update. I've been running a Debian kernel for a few days. I did have one freeze, but it took a lot longer to occur than the other freezes and it was slightly different in that I was still able to move the X pointer, so I'm discounting that one for now. Given that this kernel doesn't enable CONFIG_RCU_NOCB_CPU_ALL, there must be some other way to avoid the freezes. I'll try to find out what it is, but the motivation is lacking a little now that I've already found a workaround.
            Heh, spoke too soon. It froze again a few hours later. The X pointer kept moving at first but then that froze too. I guess Debian's kernel is affected after all.



            • Originally posted by Chewi View Post

              Heh, spoke too soon. It froze again a few hours later. The X pointer kept moving at first but then that froze too. I guess Debian's kernel is affected after all.
              As this is a hardware bug, why shouldn't the Debian kernel be affected?



              • Originally posted by PuckPoltergeist View Post

                As this is a hardware bug, why shouldn't the Debian kernel be affected?
                Read my previous posts. Regardless of whether this is a hardware bug or not, enabling CONFIG_RCU_NOCB_CPU_ALL stops the freezing, at least in my case. Fedora enables this, Debian does not.
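
For anyone wanting to check which of these options their own kernel was built with, here is a small sketch that looks an option up in kernel config text. The function name is made up for this example. The `/boot/config-<release>` location mentioned afterwards is an assumption that holds on many distributions but not all.

```python
def kconfig_value(config_text, option):
    """Return the value of `option` (e.g. 'y' or 'm'), or None if it is not set.

    Unset options appear as '# CONFIG_FOO is not set' comment lines,
    which never match the 'CONFIG_FOO=' prefix.
    """
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None
```

On a system that ships its config under `/boot`, reading `"/boot/config-" + platform.release()` and passing the text in with `"CONFIG_RCU_NOCB_CPU_ALL"` should show whether the option is enabled. Some kernels expose the config as `/proc/config.gz` instead.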



                • Originally posted by PuckPoltergeist View Post

                  As this is a hardware bug, why shouldn't the Debian kernel be affected?
                  Hardware bugs aren't necessarily triggered by all kernels.

                  Other than undocumented hearsay that it happened to one Ryzen user on NetBSD, and Matt Dillon's (DragonFly BSD) report, the bug is only triggered when using Linux. Still not a single Windows user has reported this.



                  • Originally posted by Chewi View Post

                    Heh, spoke too soon. It froze again a few hours later. The X pointer kept moving at first but then that froze too. I guess Debian's kernel is affected after all.
                    Since I added CONFIG_RCU_NOCB_CPU, CONFIG_RCU_NOCB_CPU_ALL and norandmaps to my vanilla 4.11.x kernel, I have not seen a single freeze.
                    Thanks for the tip!
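
A note on `norandmaps`: as far as I know it is a kernel boot parameter rather than a build-time option, and it turns off address space randomization (the same effect as setting the `kernel.randomize_va_space` sysctl to 0). The sketch below shows one way to check whether it is in effect. The helper names are hypothetical.

```python
def has_boot_flag(cmdline, flag):
    """True if `flag` appears as a standalone word on the kernel command line."""
    return flag in cmdline.split()

def aslr_disabled(randomize_va_space_text):
    """True if the sysctl value text (e.g. '0') means randomization is off."""
    return randomize_va_space_text.strip() == "0"

def read_text(path):
    """Read a small text file such as an entry under /proc."""
    with open(path) as f:
        return f.read()
```

On Linux, `has_boot_flag(read_text("/proc/cmdline"), "norandmaps")` checks the boot line, and `aslr_disabled(read_text("/proc/sys/kernel/randomize_va_space"))` checks the resulting runtime setting.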



                    • Originally posted by Beherit View Post
                      Hardware bugs aren't necessarily triggered by all kernels.

                      Other than undocumented hearsay that it happened to one Ryzen user on NetBSD, and Matt Dillon's (DragonFly BSD) report, the bug is only triggered when using Linux. Still not a single Windows user has reported this.
                      You're wrong:
                      https://community.amd.com/message/2804636#2804636
                      > Yet windows does not trigger it?

                      As I have reported several times, Windows Subsystem for Linux (WSL), the so-called "Bash on Ubuntu on Windows",
                      triggered this kind of problem (see my past reports for the details). WSL is a Linux userland on the Windows kernel
                      (more precisely, it consists of a Linux emulation layer and the NT kernel). NetBSD also triggered a very similar problem.
