Ryzen-Test & Stress-Run Make It Easy To Cause Segmentation Faults On Zen CPUs


  • #61
    Originally posted by juno View Post
    garegin it does. It has been observed with test tools as well as "productive" MSVC workloads.
    Can you please paste a link to the report? I've searched the MSDN forums ever since this was first reported and found nothing on it. All I've seen are rumours on the Gentoo forums that say little more than "a friend of my friend's brother saw a post somewhere he can't remember, that..."

    Originally posted by chithanh View Post

    The problem does happen in Windows Subsystem for Linux (WSL). This was confirmed in the AMD community forums and in the Gentoo forums.
    Surely, you must be aware that's because of the Linux core in WSL and has nothing to do with Microsoft? Fair enough, I'll be more specific: please point me to an article which shows how to reproduce this bug in Windows, and that means Windows only, not any emulated environment or compatibility layer.

    Comment


    • #62
      Originally posted by debianxfce View Post
      That proves more that problem is in the open source software.
      In the open source software? Which exactly?
      Is it in the Linux kernel? But WSL/FreeBSD/DragonFlyBSD do not use the Linux kernel.
      Is it in glibc? FreeBSD/DragonFlyBSD use different libc.
      Is it in gcc? clang crashes too.

      Originally posted by Beherit View Post
      Surely, you must be aware that's because of the Linux core in WSL and has nothing to do with Microsoft? Fair enough, I'll be more specific: please point me to an article which shows how to reproduce this bug in Windows, and that means Windows only, not any emulated environment or compatibility layer.
      WSL is not an emulated environment or compatibility layer. It is a subsystem of the Windows kernel which implements the Linux ABI and which was written entirely by Microsoft. (Its origins are reportedly the ill-fated "Project Astoria" which implemented Android app compatibility for Windows Mobile)

      Comment


      • #63
        Thank you, Michael, for your work. Phoronix was our only hope on this issue. I am also affected, with a 1700X, and I am in the process of an RMA.

        @chithanh: debianxfce is a known troll on the Phoronix forums, don't feed it. Just ignore him.

        @Beherit: Would a failure with mingw-w64 under Windows be OK for you, or do you want an MSVC workload? I am in the process of testing the build of ffmpeg under Windows with both mingw-w64 and MSVC.

        Comment


        • #64
          Originally posted by chithanh View Post
          It is a subsystem of the Windows kernel which implements the Linux ABI
          And it's also a compatibility layer.

          Originally posted by https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux
          Windows Subsystem for Linux (WSL) is a compatibility layer for running Linux binary executables (in ELF format) natively on Windows 10.
          See?

          Originally posted by https://en.wikipedia.org/wiki/Compatibility_layer
          In software engineering, a compatibility layer is an interface that allows binaries for a legacy or foreign system to run on a host system.
          You'll also find WSL listed as an example of a compatibility layer a bit further down the article.

          Still not convinced?

          Originally posted by https://blogs.msdn.microsoft.com/wsl/2016/04/22/windows-subsystem-for-linux-overview/
          Pico processes and drivers provide the foundation for the Windows Subsystem for Linux, which runs native unmodified Linux binaries by loading executable ELF binaries into a Pico process’s address space and executes them atop a Linux-compatible layer of syscalls.
          I understand what you're trying to say: that closed source is to blame for this, as the bug appears in something fully developed in-house by Microsoft. But whether you're right or wrong has little to do with the question of why this bug hasn't been reported by those Ryzen users who use native Windows compilers. I've read confirmed reports from those using Linux, FreeBSD, gcc, clang... but none from MSVC devs (or users of whatever other Windows compilers are in use these days).

          (I'm also curious whether anyone has managed to run a Ryzen-based Hackintosh, and whether segfaults occur in macOS as well.)

          Comment


          • #65
            Here are my (preliminary) results:

            phoronix stress run

            PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 phoronix-test-suite stress-run build-linux-kernel build-php pgbench redis


            My system seems to be quite stable so far. The only crashes I find via dmesg are a bunch from PHP's configure script, and I'm not sure whether they are supposed to crash. But I'm typing this message at a system load of 50 and Spotify plays without any hiccups.
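For anyone who wants to tally which processes show up as crashed in dmesg, here is a minimal sketch in Python. It assumes the usual kernel log shape `name[pid]: segfault at ADDR ip ...`; the function name `count_segfaults` and the sample process names are made up for illustration, and it is not part of the Phoronix Test Suite or kill-ryzen.sh.

```python
import re
from collections import Counter

# Assumed kernel log shape (illustrative example line):
#   [ 10.000000] conftest[4242]: segfault at 0 ip 00007f... sp 00007f... error 4 in libc-2.25.so[...]
SEGFAULT_RE = re.compile(
    r"(?P<name>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) ip"
)


def count_segfaults(log_text):
    """Return a Counter mapping process name -> number of segfault lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

To use it, pass in the text printed by `dmesg` (for example saved to a file first) and look at which names dominate. A pile of entries from a configure test program points at something different than crashes in the compiler itself.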

            kill-ryzen.sh
            It has been running for a little while now without any problems.

            Edit: No problem after 1.5h++

            Would those of you who get an unstable system maybe try my kernel? I'm using the official kernel from AMD's ROCm repo, albeit on Arch Linux:

            AMD ROCm™ Software - GitHub Home. Contribute to ROCm/ROCm development by creating an account on GitHub.


            It's maybe worth mentioning that I had to manually enter my memory's timings to get it running faster than stock speeds on the ASUS PRIME B350M-A. AFAIR, I had to increase the voltage to get it working, but anything higher than about 2666 MHz wasn't possible.

            Overview:
            Board: ASUS PRIME B350M-A
            Memory: CMK32GX4M2B3200C16, two times
            CPU: AMD Ryzen 7 1800X Eight-Core Processor
            Microcode: 0x800111c
            Kernel: 4.11.0-kfd-compute-rocm-rel-1.6-115
            Last edited by oleid; 05 August 2017, 04:52 AM. Reason: Final kill-ryzen result

            Comment


            • #66
              Does this fail with the low-end Ryzens as well? They should leave more room for adequate power supply etc.

              Comment


              • #67
                Originally posted by Beherit View Post
                And it's also a compatibility layer.

                See?

                You'll also find WSL listed as an example of a compatibility layer a bit further down the article.

                Still not convinced?
                I do think that the term "compatibility layer" is used in an imprecise fashion in the Wikipedia articles.
                The Windows Subsystem for Linux is no more a compatibility layer than the Win32 subsystem, or the now-defunct OS/2 and POSIX (Interix/SUA) subsystems.


                Originally posted by GreatEmerald View Post
                Does this fail with the low-end Ryzens as well? They should leave more room for adequate power supply etc.
                The low-end Ryzens do not support SMT, so they are mostly not affected.

                Comment


                • #68
                  It would seem as if kill-ryzen doesn't kill my ryzen at all (see above). I guess it must be either memory timings or AMD's kernel.

                  Michael : Would you maybe try AMD's ROCm kernel?

                  Comment


                  • #69
                    Originally posted by GreatEmerald View Post
                    Does this fail with the low-end Ryzens as well? They should leave more room for adequate power supply etc.
                    Yes, it does. I've tested three Ryzen machines (8 cores, 6 cores, 4 cores), and all of them show the same issue (namely, a segfault during parallel compilation). One of the machines (the 8-core) was ordered/assembled/installed by me; the other two were computers at our office, ordered/assembled/installed by our office IT (and they certainly are not, and never were, overclocked).

                    That's also why I'm convinced that a significant number of Ryzen CPUs is affected by it...
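For anyone who wants to repeat this kind of check, the idea of looping a parallel build until it breaks can be sketched in Python as below. This is only an illustration under assumptions: `run_until_failure` is a made-up helper name, the build command is whatever you choose to pass in, and it is not the kill-ryzen.sh script discussed in this thread.

```python
import subprocess


def run_until_failure(command, max_runs):
    """Run `command` up to `max_runs` times.

    Returns (run_index, returncode) for the first run that exits non-zero,
    or None if every run succeeded. On POSIX a negative return code means
    the process itself was killed by a signal; a build driver such as make
    normally reports a crashed compiler as an ordinary non-zero exit code.
    """
    for run_index in range(1, max_runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return run_index, result.returncode
    return None
```

A call might look like `run_until_failure(["make", "-j16"], 50)` inside a source tree that gets cleaned between runs; the job count and the number of runs here are arbitrary choices, not recommendations. A failure only tells you that the build broke, so the kernel log still has to be checked to confirm it was a segfault.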

                    Comment


                    • #70
                      1800X, Crosshair VI Hero, 16 GB Corsair (manually set timings and voltage in UEFI), 4.12.4-1-ARCH-x86_64, microcode: 0x8001126.
                      Ran the Ryzen test for just over 30 minutes with no crash.

                      Comment
