Ryzen-Test & Stress-Run Make It Easy To Cause Segmentation Faults On Zen CPUs


  • bug77
    replied
    Originally posted by chuckula

    Oh and turn off 7 of the cores!
    That was uncalled for. When trying to find a root cause, it's not unusual to take steps you'd never take under normal circumstances.

  • DanielG
    replied
    Looks like FreeBSD and DragonflyBSD developers have debugged and fixed it in their kernels: https://svnweb.freebsd.org/base?view...evision=321899

    Might be worth mentioning in the news post.

  • aufkrawall
    replied
    I think there are also reports stating that turning off ASLR doesn't work around the issue completely.

  • chuckula
    replied
    Originally posted by Chewi
    Michael, did you not try disabling ASLR, which is what fixes this for most people? I haven't had a segfault or freeze in ages.
    I've got a better idea: instead of requiring Linux users to turn off an important security feature that apparently works with Intel chips, ARM chips, POWER chips, MIPS chips, all AMD chips other than Ryzen, etc., why doesn't AMD figure out what's going on and fix their product?

  • Chewi
    replied
    Michael, did you not try disabling ASLR, which is what fixes this for most people? I haven't had a segfault or freeze in ages.
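
For anyone who wants to check which ASLR mode a machine is actually running with before comparing results, here is a minimal Python sketch. It assumes a Linux system exposing the standard `/proc/sys/kernel/randomize_va_space` sysctl; the names `describe_aslr` and `current_aslr` are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path

# Standard Linux sysctl file controlling address space layout randomization.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of the documented values of randomize_va_space.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "full randomization (also heap/brk)",
}


def describe_aslr(raw):
    """Turn the raw sysctl text (e.g. "2\n") into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown mode {value}")


def current_aslr():
    """Describe this machine's ASLR mode, or return None if the file is unreadable."""
    try:
        return describe_aslr(ASLR_PATH.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("ASLR:", current_aslr())
```

The sketch only reads the setting. Disabling ASLR means writing 0 to the same sysctl as root, which gives up a security feature, so it is better treated as a diagnostic step than as a permanent fix.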

  • duby229
    replied
    I haven't actually looked yet, but just from reading this article I get the impression that all the Ryzen test script really does is run a few GCC compile jobs side by side. If that's right, then it's pretty easy to duplicate. Plus, Michael was able to get PTS to show this issue by running multiple benchmarks side by side. That's also pretty easy to duplicate. I just can't imagine that AMD didn't find out about this by the time of the first tape-out samples. It seems like it's just too easy to hit. They must have known.

    It's not at all like the BD bug that turned up shortly after launch; that bug affected practically nobody. It could only be triggered in a very specific scenario. This Zen bug seems much more obvious than that one was.
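
The "jobs side by side" idea can be sketched roughly as below. This is not the actual Ryzen test script or the PTS stress-run implementation, just a minimal Python illustration that assumes a POSIX system; `run_job` and `count_segfaults` are hypothetical names, and the command to run is left to the caller.

```python
import signal
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one command quietly and return its exit status.

    On POSIX a negative status means the process was killed by that signal.
    """
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


def count_segfaults(cmd, jobs, rounds):
    """Run `jobs` copies of `cmd` in parallel, `rounds` times; count SIGSEGV deaths.

    Caveat: this only sees how the direct child died. A compiler driver may
    report a crashed sub-process as an ordinary non-zero exit, so a real
    script would likely also need to inspect the build output.
    """
    segfaults = 0
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for _ in range(rounds):
            for status in pool.map(run_job, [cmd] * jobs):
                if status == -signal.SIGSEGV:
                    segfaults += 1
    return segfaults


if __name__ == "__main__":
    # Placeholder workload: a trivial Python child. Swap in a real build command.
    workload = [sys.executable, "-c", "sum(range(10**6))"]
    print("segfaults:", count_segfaults(workload, jobs=4, rounds=2))
```

Threads are enough here because each worker just waits on a child process; the load itself comes from the children running concurrently.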

  • c2h5oh
    replied
    Originally posted by Michael

    I will have more information out later today or tomorrow as well; I'm running several-hour-long tests in different workload configurations... Now that I can reproduce it super easily via phoronix-test-suite stress-run, I'm encouraged to run more tests, since it's all PTS-automated, and it lets me show off the PTS stress-run capabilities, which I don't often get to talk about otherwise.
    One more for you to test: in the BIOS, set the CPU voltage offset to +25 mV. I've managed to crash my 1800X fairly consistently within 30 minutes when running large x264 encoding jobs, but this small voltage bump seems to have fixed it: it's been almost 40 hours of encoding with no issues. If you're running memory at speeds faster than 2666, make sure your SOC voltage is 1.1 V (some BIOSes adjust that automatically, some don't).

  • chuckula
    replied
    Originally posted by qsmcomp
    Try disabling uop-cache from BIOS.
    Oh and turn off 7 of the cores!

  • qsmcomp
    replied
    Try disabling uop-cache from BIOS.

  • aufkrawall
    replied
    Mesa is compiled with Clang, but it also shows the issue. I randomly encountered it when I had a Ryzen R7 1700 and was compiling LLVM several times.
