Ryzen-Test & Stress-Run Make It Easy To Cause Segmentation Faults On Zen CPUs

  • #21
    Originally posted by chuckula

    I got a better idea: Instead of requiring Linux users to turn off an important security feature that apparently works with Intel chips, ARM chips, POWER chips, MIPS chips, all AMD chips other than Ryzen, etc., why doesn't AMD figure out what's going on and fix their product?
    I quite agree, but I believe AMD is already aware of the issue, so what else can I suggest?



    • #22
      Originally posted by aufkrawall
      I think there are also reports stating that turning off ASLR doesn't work around the issue completely.
      Well it's certainly worked for me. I'm running Gentoo where the problem was more apparent and it's running fine now.
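For anyone wanting to confirm whether ASLR is actually off before drawing conclusions from a test run, here is a minimal Python sketch. It assumes a Linux system that exposes the standard `/proc/sys/kernel/randomize_va_space` sysctl; the function names are made up for illustration and do not come from any tool mentioned in this thread.

```python
from pathlib import Path

# Standard Linux sysctl file that holds the ASLR mode.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of each value as described in the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "conservative randomization (stack, mmap base, VDSO)",
    2: "full randomization (also randomizes the heap)",
}


def describe_aslr(raw: str) -> str:
    """Translate the raw sysctl contents into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown mode {value}")


def current_aslr(path: Path = ASLR_SYSCTL) -> str:
    """Read the sysctl file; report 'unavailable' if it cannot be read."""
    try:
        return describe_aslr(path.read_text())
    except OSError:
        return "unavailable (not Linux, or /proc is not mounted)"


print("ASLR:", current_aslr())
```

A result of "disabled" only says the workaround is in effect; it says nothing about whether the segfaults are gone.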

      I had an unrelated freezing issue that may have been Ryzen-specific but I worked around that by toggling a kernel option.



      • #23
        Originally posted by Chewi

        Well it's certainly worked for me. I'm running Gentoo where the problem was more apparent and it's running fine now.

        I had an unrelated freezing issue that may have been Ryzen-specific but I worked around that by toggling a kernel option.
        Which kernel option was that?



        • #24
          This all sounds quite familiar. I had the same kind of issue with my i7-6700K for months after I first got it. It would run fine under normal workloads. It would even run any prime95 test for hours without trouble. But compiling Wine (which I do often) and other software would, more often than not, fail at random when running make with -j9. The only way I could get the thing to compile reliably was to make sure I only compiled with a single thread.

          Eventually something (possibly a new kernel?) fixed the issue, but I don't remember ever finding out what the fix was. I just remember, many months later, being annoyed at how slow the compilation process was. I was going to investigate the issue again, so I cranked the number of make jobs back up to nproc + 1, and discovered that I could no longer reproduce the failure.

          Seems to be a thing these days that you take a bit of a risk when running the latest CPUs. At least this time the issue sounds far more widely reported than the one I experienced, so it'll probably be fixed more quickly.
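A rough way to put a number on this kind of intermittent failure is to repeat the same parallel build and count the non-zero exit codes. The Python sketch below assumes nothing about the project being built; `default_jobs` and `count_failures` are hypothetical helper names, and `os.cpu_count()` only approximates `nproc` because it ignores CPU affinity limits.

```python
import os
import subprocess
from typing import Optional, Sequence


def default_jobs() -> int:
    """Approximate 'nproc + 1' using the logical CPU count."""
    return (os.cpu_count() or 1) + 1


def count_failures(
    command: Sequence[str],
    runs: int,
    reset: Optional[Sequence[str]] = None,
) -> int:
    """Run `command` `runs` times and return how many runs exited non-zero.

    If `reset` is given (for example a clean step), it is run before each
    attempt and its own exit status is ignored.
    """
    failures = 0
    for _ in range(runs):
        if reset is not None:
            subprocess.run(reset, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=False)
        result = subprocess.run(command, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, check=False)
        if result.returncode != 0:
            failures += 1
    return failures
```

In a source tree this might be called as `count_failures(["make", f"-j{default_jobs()}"], runs=20, reset=["make", "clean"])`. A handful of failures over many runs, none of which reproduce with `-j1`, would match the pattern described above, though it would not by itself tell hardware, kernel, and compiler problems apart.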



          • #25
            There were bugs in Skylake as well, but they got fixed via microcode updates (both intel-ucode and mainboard UEFI).
            AMD has not confirmed this specific bug (even though they have known about it for months), or really said anything else. Users with the problem are just left to mess around with workarounds.



            • #26
              Originally posted by c2h5oh
              One more for you to test: in the BIOS, set the CPU voltage offset to +25mV. I've managed to crash my 1800X fairly consistently within 30 minutes when running large x264 encoding jobs, but this small voltage bump seems to have fixed it: it's been almost 40h of encoding with no issues. If you're running memory at speeds faster than 2666, make sure your SOC voltage is 1.1V (some BIOSes adjust that automatically, some don't).
              I agree these are good things to test. Ryzen seems a bit finicky when it comes to voltage, especially once you get to 3.9GHz (which makes sense with the 1800X's XFR speeds).
              I also agree that double-checking the voltages is important. My motherboard's BIOS keeps my RAM voltage at 1.2V, even though it's supposed to be 1.3V. This alone makes the RAM very unstable.



              • #27
                Originally posted by chuckula
                I got a better idea: Instead of requiring Linux users to turn off an important security feature that apparently works with Intel chips, ARM chips, POWER chips, MIPS chips, all AMD chips other than Ryzen, etc., why doesn't AMD figure out what's going on and fix their product?
                You do realize that he didn't actually offer that as a solution? There's a difference between doing something to debug a problem and doing it to actually solve the problem.



                • #28
                  Since this affects only a small number of users, and a real microcode fix would probably cause a big performance hit, I doubt AMD will fix this. The problem is likely a real bug somewhere in the silicon, in some operation that rarely happens.

                  I'll wait for v2 of the silicon. Threadripper has the exact same modules, right? Maybe.



                  • #29
                    @Michael: I understand that you have more than one Ryzen chip at your disposal. Is there any chance you could run the stress test on the other chips too?



                    • #30
                      Originally posted by debianxfce

                      Can you reproduce the problem with win10?
                      Probably. Can anyone who runs GCC on Windows 10 test this? I don't have a Ryzen and don't run GCC on Windows, so I'm not the one to do it.
