Ryzen-Test & Stress-Run Make It Easy To Cause Segmentation Faults On Zen CPUs

  • soulsource
    replied
    Originally posted by storma:
    1800x, crosshair vi hero, 16G corsair (manually set timings and voltage in uefi), 4.12.4-1-ARCH-x86_64, microcode: 0x8001126.
    Ran the ryzen test for just over 30mins with no crash.
    Thirty minutes without a segfault is nowhere near enough to conclude that the system is unaffected. In my experience, the average time until the first segfault is around 2 hours: sometimes much less, sometimes much more, even on the same hardware.
    I wouldn't rule out that a system is affected by the segfault bug unless it has run continuous stress testing for at least 48 hours.
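
The point about run length can be made concrete with a rough back-of-the-envelope model. The sketch below assumes (this is an assumption, not something measured in the thread) that the time to the first segfault on an affected machine follows an exponential distribution with the roughly 2-hour mean quoted above. `prob_no_segfault` is a hypothetical helper name.

```python
import math


def prob_no_segfault(run_hours, mean_hours_to_failure=2.0):
    """Chance that an affected machine survives `run_hours` without a segfault.

    Assumes an exponential (memoryless) time-to-failure model, which is an
    illustrative simplification and not a measured property of Ryzen CPUs.
    """
    return math.exp(-run_hours / mean_hours_to_failure)


# Compare a 30-minute run, a 2-hour run and a 48-hour run.
for hours in (0.5, 2, 48):
    print(f"{hours:>4} h without a segfault: p = {prob_no_segfault(hours):.2e}")
```

Under that assumption, an affected machine would still pass a 30-minute run about 78% of the time, so a clean half hour says very little. Because the time to failure reportedly varies widely, the real distribution may have a longer tail than this model, which is one reason to prefer a much longer run such as the suggested 48 hours.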

  • vein
    replied
    R5 1600X, MSI B350 Tomahawk, 16G Corsair, 4.12.4-1-ARCH-x86_64, microcode: 0x0800111c

    Seg fault after about 2 hours
    Last edited by vein; 08-05-2017, 07:09 AM.

  • storma
    replied
    1800x, crosshair vi hero, 16G corsair (manually set timings and voltage in uefi), 4.12.4-1-ARCH-x86_64, microcode: 0x8001126.
    Ran the ryzen test for just over 30mins with no crash.

  • soulsource
    replied
    Originally posted by GreatEmerald:
    Does this fail with the low-end Ryzens as well? They should leave more room for adequate power supply etc.
    Yes, it does. I've tested three Ryzen machines (8 cores, 6 cores, 4 cores), and all of them show the same issue: a segfault during parallel compilation. I ordered, assembled and installed one of the machines (the 8-core) myself; the other two are office computers that were ordered, assembled and installed by our office IT (and they certainly are not, and never were, overclocked).

    That's also why I'm convinced that a significant number of Ryzen CPUs are affected by it...

  • oleid
    replied
    It would seem that kill-ryzen doesn't kill my Ryzen at all (see above). I guess it must be down to either the memory timings or AMD's kernel.

    Michael: Could you maybe try AMD's ROCm kernel?

  • chithanh
    replied
    Originally posted by Beherit:
    And it's also a compatibility layer.

    See?

    You'll also find WSL listed as an example of a compatibility layer a bit further down the article.

    Still not convinced?
    I do think that the term "compatibility layer" is used in an imprecise fashion in the Wikipedia articles.
    The Windows Subsystem for Linux is no more a compatibility layer than the Win32 subsystem, or the now-defunct OS/2 and POSIX (Interix/SUA) subsystems.


    Originally posted by GreatEmerald:
    Does this fail with the low-end Ryzens as well? They should leave more room for adequate power supply etc.
    The low-end Ryzens do not support SMT, so they are mostly not affected.

  • GreatEmerald
    replied
    Does this fail with the low-end Ryzens as well? They should leave more room for adequate power supply etc.

  • oleid
    replied
    Here are my (preliminary) results:

    phoronix stress run

    PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 phoronix-test-suite stress-run build-linux-kernel build-php pgbench redis


    My system seems to be quite stable so far. The only crashes I can find via dmesg are a bunch from php's configure script, and I'm not sure whether those are supposed to crash. But I'm typing this message at a system load of 50, and Spotify plays without any hiccups.
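
To tell compiler segfaults apart from crashes like the ones attributed to php's configure script, the kernel log can be grouped by process name. This is a minimal sketch, assuming the usual x86 Linux kernel message form `name[pid]: segfault at ...`; the function name `count_segfaults` is hypothetical and not part of any tool mentioned in the thread.

```python
import re
from collections import Counter

# Matches kernel log lines of the form "name[pid]: segfault at ...".
# The leading "[timestamp]" printed by dmesg is skipped by using search().
SEGFAULT_RE = re.compile(r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at ")


def count_segfaults(lines):
    """Return a Counter mapping process name to number of logged segfaults."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts
```

Feed it the lines of `dmesg` output, for example `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()` (reading the kernel log may require root on some systems). Segfaults in compiler processes during the kernel build would be the interesting ones; crashes in short-lived configure test programs may be expected.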

    kill-ryzen.sh
    It has been running for a little while now without any problems.

    Edit: No problems after more than 1.5 hours.

    Could those of you with an unstable system maybe try my kernel? I'm using the official kernel from AMD's ROCm repo, albeit on Arch Linux:

    https://github.com/RadeonOpenCompute/ROCm

    It may be worth mentioning that I had to enter my memory's timings manually to get it running faster than stock speeds on the ASUS PRIME B350M-A. AFAIR, I also had to increase the voltage to get it working, but anything higher than about 2666 MHz wasn't possible.

    Overview:
    Board:     ASUS PRIME B350M-A
    Memory:    CMK32GX4M2B3200C16, two times
    CPU:       AMD Ryzen 7 1800X Eight-Core Processor
    Microcode: 0x800111c
    Kernel:    4.11.0-kfd-compute-rocm-rel-1.6-115
    Last edited by oleid; 08-05-2017, 04:52 AM. Reason: Final kill-ryzen result
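
The microcode revisions reported in this thread are written with different amounts of zero padding, which makes them awkward to compare by eye. A small sketch that normalizes them as integers follows; the `reports` mapping simply restates the values posted by storma, vein and oleid, and `parse_microcode` is a hypothetical helper name. Treating a larger number as a newer revision is an assumption.

```python
def parse_microcode(rev: str) -> int:
    """Parse a hex microcode revision such as '0x0800111c' into an integer."""
    return int(rev, 16)


# Revisions as posted in the thread.
reports = {
    "storma": "0x8001126",
    "vein": "0x0800111c",
    "oleid": "0x800111c",
}

# Print every report with the same zero padding, lowest revision first.
for user, rev in sorted(reports.items(), key=lambda kv: parse_microcode(kv[1])):
    print(f"{user:8s} {parse_microcode(rev):#010x}")
```

Parsed this way, vein's and oleid's systems report the same revision, while storma's reports a numerically higher one.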

  • Beherit
    replied
    Originally posted by chithanh:
    It is a subsystem of the Windows kernel which implements the Linux ABI
    And it's also a compatibility layer.

    Originally posted by https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux
    Windows Subsystem for Linux (WSL) is a compatibility layer for running Linux binary executables (in ELF format) natively on Windows 10.
    See?

    Originally posted by https://en.wikipedia.org/wiki/Compatibility_layer
    In software engineering, a compatibility layer is an interface that allows binaries for a legacy or foreign system to run on a host system.
    You'll also find WSL listed as an example of a compatibility layer a bit further down the article.

    Still not convinced?

    Originally posted by https://blogs.msdn.microsoft.com/wsl/2016/04/22/windows-subsystem-for-linux-overview/
    Pico processes and drivers provide the foundation for the Windows Subsystem for Linux, which runs native unmodified Linux binaries by loading executable ELF binaries into a Pico process’s address space and executes them atop a Linux-compatible layer of syscalls.
    I understand what you're trying to say: that closed source is to blame for this, as the bug appears in something developed entirely in-house by Microsoft. But whether you're right or wrong has little to do with the question of why this bug hasn't been reported by Ryzen users who use native Windows compilers. I've read confirmed reports from people using Linux, FreeBSD, gcc, clang... but none from MSVC devs (or from users of whatever other Windows compilers are in use these days).

    (I'm also curious if someone managed to run a Ryzen based Hackintosh, if segfaults occur in macOS as well)

  • malakudi
    replied
    Thank you, Michael, for your work. Phoronix was our only hope on this issue. I am also affected, with a 1700X, and I am in the process of an RMA.

    @chithanh: debianxfce is a known troll on the Phoronix forums; don't feed him, just ignore him.

    @Beherit: Would a failure with mingw-w64 under Windows be enough for you, or do you want an MSVC workload? I am in the process of testing the ffmpeg build under Windows with both mingw-w64 and MSVC.
