Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads


  • #41
    To add my personal anecdote: I ran into this a few times compiling a kernel on f25 while still at stock/auto on my Ryzen 5 1600, 2933 MHz XMP (Corsair 16 GB b3000c15, Hynix chips). I subsequently ran memtest for 12h and found no memory issues.
    I have actually run into it less often since overclocking to 3.7 GHz at 1.325 V, but that may be because I haven't tested extensively yet. It survived a preliminary 12h Prime95 torture test, so I figure it's stable enough.
    ASRock AB350 Pro4 motherboard, most recent stable BIOS, so no access to LLC settings.


    • #42
      What are your RAM timings?


      • #43
        Well, the segfault is in sh, in some standard library code related to strings. I've had sh segfault a couple of times when compiling mesa on my 1600X, but it's very sporadic. Just compiling again usually worked. The CPU is overclocked, but I don't think that's it, since the segfaults are so consistent.

        My guess would be on something being wrong with the CPU caches, maybe a synchronization issue...
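
For anyone who wants a number for how often this happens rather than "compiling again usually worked", here is a minimal sketch that reruns a command and tallies how each run ended. The function names and the example command are hypothetical and not from anyone's setup in this thread. One caveat: a segfault in a grandchild process (for example sh or gcc started by make) normally shows up as a plain non-zero exit of the top-level command, so only the directly launched process is reported as signal-killed.

```python
import signal
import subprocess
import sys
from collections import Counter


def classify(returncode):
    """Map a subprocess return code to 'ok', 'exit:<n>' or 'signal:<NAME>'."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        try:
            return "signal:" + signal.Signals(-returncode).name
        except ValueError:
            return "signal:%d" % -returncode
    return "exit:%d" % returncode


def repeat(cmd, runs):
    """Run cmd `runs` times and count the outcomes."""
    tally = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        tally[classify(proc.returncode)] += 1
    return tally


if __name__ == "__main__":
    # Hypothetical usage: python repeat_build.py 20 make -j16
    runs = int(sys.argv[1])
    print(dict(repeat(sys.argv[2:], runs)))
```

For a build you would also need a clean step between runs, otherwise later runs have nothing left to compile.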


        • #44
          Kind of surprised after this many victims, people are still finding ways to blame the victims. I haven't had these issues, but they seem pretty real and many of the people with them seem extremely competent and thorough in the steps they have taken to resolve and rule out every alternative to a motherboard/BIOS/CPU issue. I hope AMD finds a good resolution soon.

          I did have some stability issues with my Ryzen with an early BIOS on my ASRock AB350 K4 motherboard. The nightly 17.04 build I was running would randomly but rarely hard freeze and have to be rebooted. I never determined whether it was the NVIDIA drivers, nightly build issues, Ryzen issues, stability, or what. It went away after upgrading to the 17.04 release and a new BIOS.

          My only stability issues since were self-inflicted when overclocking. My seemingly stable overclock (higher than 3.85GHz) proved not so stable under heavy compilation and would shut down. After backing down to 3.85GHz the crashes while compiling went away, but I found a useful tool called `stress-ng` that was able to peg all 16 threads to such a degree that it revealed 3.85GHz also wasn't perfectly stable. I'm sure a little extra voltage would have made the difference, but I decided to try 3.8GHz first, which ran `stress-ng` for many hours. That was weeks ago. I'll try 3.85GHz with some extra volts on the next reboot. I just haven't had one since then.


          • #45
            I don't have a Ryzen setup, but I read about a Reddit user who said that disabling C-states in the BIOS solved his compilation problems. He didn't provide any details about it. This is the page:


            Maybe it's worth a try


            • #46
              Originally posted by haagch
              Well, the segfault is in sh, in some standard library code related to strings. I've had sh segfault a couple of times when compiling mesa on my 1600X, but it's very sporadic. Just compiling again usually worked. The CPU is overclocked, but I don't think that's it, since the segfaults are so consistent.
              After reading the AMD community, reddit and Gentoo forum posts, I'm more certain it's a software bug in either gcc (or any/some of its dependencies) or, most likely, certain bash builds.

              A pity the Google Docs document doesn't list the distro used. May I ask which distro you're using? Because contrary to the Phoronix article, I've yet to find a post by someone using something other than Gentoo. I find nothing on the Arch Linux forum about this.

              Those Gentoo users who report having fixed the problem did so by upgrading GCC and then rebuilding world. One comment on Reddit pointed to bash specifically as the cause, claiming that recompiling bash alone is enough to fix this. If the bug were indeed a hardware bug or in GCC, recompiling a whole distro wouldn't even be possible.

              I also recall this post by Philip Müller, one of the core devs of Manjaro Linux, who's been building several releases of the distro for almost two months without reporting any gcc segfaults: https://forum.manjaro.org/t/red-is-the-new-sexy/21245


              (OT: Why is Gentoo still using gcc 4.x or 5.x?)


              • #47
                There is also the "bea0000000000108" machine check error; causes random reboots even when idle:
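
As a side note on reading that value: a minimal sketch, assuming the hex string is a 64-bit x86 machine-check status register (MCi_STATUS) as printed in the kernel log. It decodes only the architectural flag bits 63 down to 57 and the low 16-bit MCA error code; the bits in between are vendor-specific and left alone here. `decode_status` is a hypothetical helper name.

```python
# Architectural flag bits in the upper part of an x86 MCi_STATUS value.
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
}


def decode_status(hex_string):
    """Return the set flag names and the low 16-bit MCA error code."""
    value = int(hex_string, 16)
    set_flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (value >> bit) & 1
    ]
    return {"flags": set_flags, "mca_error_code": value & 0xFFFF}


if __name__ == "__main__":
    print(decode_status("bea0000000000108"))
```

For this value the decoder reports VAL, UC, EN, MISCV, ADDRV and PCC as set, with error code 0x0108. In other words, an uncorrected error with the processor context marked corrupt, which fits a reboot rather than a single crashed process.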


                • #48
                  I have tried gcc 4.9.3, 5.3.0, and 6.3.0; nothing changed. I mean the whole system was recompiled with each gcc version.


                  • #49
                    Originally posted by Sloth

                    FWIW, I'm also using -march=znver1.

                    Pretty sure that these people are seeing problems with insufficient voltage, since some note that a change to LLC fixes the problem - and it wouldn't surprise me even a little bit if some of the BIOSes out there don't have the voltages right just yet.
                    Could be CPU voltage, yep. Or maybe another voltage, like that of the IMC's or the RAM's.

                    For instance, I bought RAM rated at 1.35 V, but it was never stable at that voltage. I had random crashes that kept pointing at my soundcard's drivers. After getting frustrated by those crashes I remembered the low RAM voltage and raised it a notch (I think I set it to 1.400-1.420 V), and now my computer is stable. Note that HWInfo64 reads my RAM voltage at 1.360-1.376 V.

                    Could be a Vdroop problem, since it is one with my motherboard (Gigabyte GA-990XA-UD3 rev 1.0) that doesn't have LLC.


                    • #50
                      Originally posted by ernstp
                      It's just unstable overclocking...
                      It doesn't have to be overclocking; the chip might just be unstable even at stock settings. If Load-Line Calibration is mentioned as a 'fix', then it is likely that some voltages are simply too low under some circumstances - and the cause could be anything: the CPU itself, boards with underpowered VRM designs, BIOS issues, ...
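
To put rough numbers on the "voltage too low under load" idea, here is a back-of-the-envelope sketch assuming a simple linear load-line model. The load current and load-line resistance are hypothetical illustration values, not measurements from any board in this thread; only the 1.325 V set point is borrowed from an earlier post.

```latex
% Simple load-line model: voltage at the CPU sags in proportion to load current.
V_{\text{load}} \approx V_{\text{set}} - I_{\text{load}} \cdot R_{\text{LL}}

% Hypothetical values: V_set = 1.325 V, I_load = 80 A, R_LL = 1.0 mOhm
V_{\text{load}} \approx 1.325\,\text{V} - 80\,\text{A} \times 0.001\,\Omega = 1.245\,\text{V}
```

Under those assumptions an all-core compile would see about 80 mV less than the set voltage. LLC works by shrinking the effective load-line resistance, which is why it would only help if droop really is the cause.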
