Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads

  • #51
    Originally posted by atomsymbol

    It would be helpful to know the instruction mnemonic causing the SEGFAULT. Specifying -march=native can select XOP and FMA4 on certain pre-Ryzen CPUs, which are unsupported on Ryzen.
    That should raise a SIGILL (illegal instruction) instead. Also, according to Agner Fog, Zen still supports FMA4 code; it's just not advertised any longer.
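Since the crash signal itself tells the two cases apart, here is a minimal Python sketch (assuming a POSIX system, where subprocess reports death by signal N as a return code of -N) that classifies how a child process ended. The helper names crash_signal and run_and_classify are hypothetical, and the demo only simulates the signals rather than executing real XOP/FMA4 code:

```python
import signal
import subprocess
import sys
from typing import List, Optional


def crash_signal(returncode: int) -> Optional[str]:
    """Name of the signal that killed a child, or None if it exited normally.

    On POSIX, subprocess reports "killed by signal N" as returncode == -N.
    """
    if returncode >= 0:
        return None
    return signal.Signals(-returncode).name


def run_and_classify(cmd: List[str]) -> str:
    """Run cmd and report whether it exited, died on SIGILL, or died on SIGSEGV."""
    rc = subprocess.run(cmd).returncode
    sig = crash_signal(rc)
    if sig is None:
        return f"exited with status {rc}"
    if sig == "SIGILL":
        # An opcode the CPU doesn't implement traps as #UD, which Linux delivers as SIGILL.
        return "SIGILL: illegal instruction (suspect -march / ISA extensions)"
    if sig == "SIGSEGV":
        return "SIGSEGV: invalid memory access"
    return f"killed by {sig}"


if __name__ == "__main__":
    # Demo: a child Python process that sends itself each signal.
    for sig in (signal.SIGILL, signal.SIGSEGV):
        child = f"import os, signal; os.kill(os.getpid(), signal.{sig.name})"
        print(run_and_classify([sys.executable, "-c", child]))
```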

    • #52
      Originally posted by Kano
      But I really think that AMD sells "banana ware" (products that ripen at the customer) more often than Intel does. In this case it's the problems with XMP RAM, and this CPU's main audience is gamers who buy those chips. But you cannot say that professional use with lots of VMs was tested any better: what do you think about the VME bug? Too much time was spent chasing competitive gaming benchmarks...
      I don't really think it's fair to say that. Look at Intel's recent history. There have been LOADS of issues.

      Haswell TSX broken: http://www.anandtech.com/show/8376/i...eep-broadwelly
      Skylake stability issues: https://www.extremetech.com/computin...plex-workloads
      Several chipset issues: (1) http://www.anandtech.com/show/4142/i...-begins-recall (2) https://www.cnet.com/news/intel-conf...swell-chipset/
      Atom SoCs degrading/breaking: https://www.extremetech.com/computin...-manufacturers
      And of course, the AMT bug: https://www.wired.com/2017/05/hack-b...-7-dang-years/

      • #53
        Let's clear the air here.

        1. This happens with factory-reset BIOS defaults. Please keep your cool about overclocking.
        2. GCC versions and compiler flags have already been extensively iterated on in the Gentoo forums.
        3. Changing LLC/disabling SMT only created a temporary illusion of a fix.
        4. Disabling the C6 state did not help.
        5. Compilation issues have been reported from several distributions: Ubuntu, CentOS and Gentoo.

        • #54
          Originally posted by Beherit

          A pity the Google Docs document doesn't list the distro used. May I ask which distro you're using? Because contrary to the Phoronix article, I've yet to find a post by someone using something other than Gentoo. I find nothing on the Arch Linux forum about this.
          The Google doc (generated from a Google form) is for... shock horror... Gentoo users across about three different Gentoo threads. Hence why it doesn't list other distros.

          Arch and Gentoo users are more likely to build packages that are specific to their system. Binary distros cannot, and will generally target the generic -march=x86-64 so the same binaries run on an Intel or an AMD system.

          My Gentoo install will not run on an i7, and it will not run on a Bulldozer.
          My Ubuntu install at work, by contrast, could have its HDD pulled and plugged into a completely different machine, and it would just boot.
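One way to see why a natively built install doesn't travel is to compare the extensions the build relies on with the flags line of /proc/cpuinfo on the target machine. A minimal Python sketch; the REQUIRED set is a hypothetical subset picked from the -march=znver1 expansion shown in this post, not what any particular package actually uses:

```python
from pathlib import Path
from typing import Set

# Hypothetical subset of extensions a -march=znver1 build may use, spelled as
# /proc/cpuinfo spells them (GCC's -msha shows up there as "sha_ni").
REQUIRED: Set[str] = {"avx2", "bmi2", "sha_ni", "clzero"}


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text: str, required: Set[str] = REQUIRED) -> Set[str]:
    """Extensions the build assumes that this CPU does not advertise."""
    return required - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        gaps = missing_features(cpuinfo.read_text())
        print("all required flags present" if not gaps else f"missing: {sorted(gaps)}")
```

With those example flags, a Haswell-era i7 would come up short on sha_ni and clzero, and a Bulldozer on all four.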


          Originally posted by Beherit
          (OT: Why is Gentoo still using gcc 4.x or 5.x?)
          Gentoo doesn't; users choose. Gentoo offers a lot of GCC versions and only masks them for removal, or keyword-masks or arch-masks them because of bugs.


          Code:
          eix sys-devel/gcc
          [I] sys-devel/gcc
               Available versions:  
               (2.95.3) [M]~*2.95.3-r10
               (3.3.6) [M](~)3.3.6-r1
               (3.4.6) [M]3.4.6-r2
               (4.0.4) [M]**4.0.4
               (4.1.2) [M]4.1.2
               (4.2.4) [M](~)4.2.4-r1
               (4.3.6) [M]4.3.6-r1
               (4.4.7) [M]4.4.7
               (4.5.4) [M]4.5.4
               (4.6.4) [M]4.6.4
               (4.7.4) [M]4.7.4
               (4.8.5) [M]4.8.5
               (4.9.3) 4.9.3
               (4.9.4) 4.9.4
               (5.4.0) (~)5.4.0^s 5.4.0-r3
               (6.3.0) (~)6.3.0
               (7.1.0) (**)7.1.0-r1
                 {altivec awt boundschecking cilk +cxx d debug doc fixed-point +fortran gcj go graphite hardened jit libssp mpx mudflap multilib +nls nopie nossp +nptl objc objc++ objc-gc +openmp +pch +pie regression-test +sanitize +ssp vanilla +vtv}
               Installed versions:  6.3.0(6.3.0)^s(14:49:33 07/05/17)(cxx fortran graphite multilib nls nptl openmp pch pie sanitize ssp vtv -altivec -awt -cilk -debug -doc -fixed-point -gcj -go -hardened -jit -libssp -mpx -objc -objc++ -objc-gc -regression-test -vanilla) 7.1.0-r1(7.1.0)^s(08:39:13 28/05/17)(cxx fortran multilib nls nptl openmp pch sanitize ssp vtv -altivec -awt -cilk -debug -doc -fixed-point -gcj -go -graphite -hardened -jit -libssp -mpx -objc -objc++ -objc-gc -pie -regression-test -vanilla)
               Homepage:            https://gcc.gnu.org/
               Description:         The GNU Compiler Collection


          So consider how a binary distro like Ubuntu compiles its binaries:
          Code:
          echo | gcc -### -E - -march=x86_64
          Using built-in specs.
          COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/7.1.0/gcc
          Target: x86_64-pc-linux-gnu
          Configured with: /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/7.1.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/g++-v7 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 7.1.0-r1 p1.1' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --disable-libmpx --enable-vtable-verify --enable-libvtv --enable-lto --without-isl --enable-libsanitizer --disable-default-pie --enable-default-ssp
          Thread model: posix
          gcc version 7.1.0 (Gentoo 7.1.0-r1 p1.1)
          COLLECT_GCC_OPTIONS='-E' '-march=x86_64'
           /usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/cc1 -E -quiet - "-march=x86_64"
          COMPILER_PATH=/usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../../x86_64-pc-linux-gnu/bin/
          LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../../x86_64-pc-linux-gnu/lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../:/lib/:/usr/lib/
          COLLECT_GCC_OPTIONS='-E' '-march=x86_64'

          Now consider what a Gentoo/Arch user gets when trying to make use of Ryzen-specific features:
          Code:
          echo | gcc -### -E - -march=native
          Using built-in specs.
          COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/7.1.0/gcc
          Target: x86_64-pc-linux-gnu
          Configured with: /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/7.1.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/g++-v7 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 7.1.0-r1 p1.1' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --disable-libmpx --enable-vtable-verify --enable-libvtv --enable-lto --without-isl --enable-libsanitizer --disable-default-pie --enable-default-ssp
          Thread model: posix
          gcc version 7.1.0 (Gentoo 7.1.0-r1 p1.1)
          COLLECT_GCC_OPTIONS='-E' '-march=native'
          /usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/cc1 -E -quiet - "-march=znver1" -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku -mno-rdpid --param "l1-cache-size=32" --param "l1-cache-line-size=64" --param "l2-cache-size=512" "-mtune=znver1"
          COMPILER_PATH=/usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../../x86_64-pc-linux-gnu/bin/
          LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../../x86_64-pc-linux-gnu/lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../:/lib/:/usr/lib/
          COLLECT_GCC_OPTIONS='-E' '-march=native'

          Look at all those additional GCC flags. One, some or all of them may be buggy: the bug may be in GCC, in the opcodes it emits, or in the chip itself.
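To narrow down "one, some or all", the switch list is easier to work with as data. A minimal Python sketch that splits the -m switches on the cc1 line (gcc -### prints it to stderr) into enabled and disabled extensions; isa_switches is a hypothetical helper and the sample line is an excerpt of the output above:

```python
import shlex
from typing import Set, Tuple


def isa_switches(cc1_line: str) -> Tuple[Set[str], Set[str]]:
    """Split the -m switches GCC's driver passes to cc1 into (enabled, disabled)."""
    enabled: Set[str] = set()
    disabled: Set[str] = set()
    for tok in shlex.split(cc1_line):  # shlex strips the quotes around "-march=znver1"
        # Skip non -m tokens and the arch/tune selectors themselves.
        if not tok.startswith("-m") or tok.startswith(("-march=", "-mtune=")):
            continue
        if tok.startswith("-mno-"):
            disabled.add(tok[len("-mno-"):])
        else:
            enabled.add(tok[len("-m"):])
    return enabled, disabled


if __name__ == "__main__":
    # Excerpt of the cc1 line from the -march=native output above.
    line = ('cc1 -E -quiet - "-march=znver1" -mmmx -mno-3dnow -msse4a -mfma '
            '-mno-fma4 -mno-xop -mavx2 --param "l1-cache-size=32" "-mtune=znver1"')
    on, off = isa_switches(line)
    print("enabled: ", sorted(on))
    print("disabled:", sorted(off))
```

Rebuilding a failing package with suspects switched off one at a time (GCC accepts the -mno- forms seen in the output, e.g. -mno-avx2) would be one way to bisect from there.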

          There is already a known bug/non-optimal-code issue in gcc-6.x (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80313). Likewise, as reported here, GCC generated non-optimal code as well (https://www.phoronix.com/scan.php?pa...iciency-Fix-67).

          This tips the balance toward a GCC issue, and gcc-7.x is getting very good for Ryzen.
          Last edited by Naib; 03 June 2017, 06:58 AM.

          • #55
            Originally posted by Beherit
            A pity the Google Docs document doesn't list the distro used. May I ask which distro you're using? Because contrary to the Phoronix article, I've yet to find a post by someone using something other than Gentoo. I find nothing on the Arch Linux forum about this.
            I'm on Arch with a lot of packages from testing, but I haven't seen the segfaults recently. Possibly it's already fixed.

            I just had a gcc segfault, but I think it was a "standard" gcc segfault, not one caused by Ryzen:
            Code:
            /usr/include/c++/7.1.1/bits/stl_tree.h:1769:31: internal compiler error: Segmentation fault
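One rough heuristic for telling the two apart is to rerun the exact failing command several times: a genuine compiler bug should crash the same way on every run, whereas the Ryzen failures discussed in this thread appear to come and go. A minimal Python sketch; the helper names, run count and example file name are arbitrary:

```python
import subprocess
import sys
from collections import Counter
from typing import List


def rerun(cmd: List[str], runs: int = 10) -> Counter:
    """Run cmd repeatedly and tally the return codes (negative = killed by a signal)."""
    tally: Counter = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        tally[result.returncode] += 1
    return tally


def verdict(tally: Counter) -> str:
    """Heuristic only: a deterministic compiler bug should fail the same way every time."""
    if not tally:
        return "no runs"
    if set(tally) == {0}:
        return "no failure"
    if len(tally) == 1:
        return "fails every time"
    return "fails intermittently"


if __name__ == "__main__":
    # Usage (hypothetical file name): python rerun.py g++ -O2 -c crash.cpp
    if len(sys.argv) > 1:
        counts = rerun(sys.argv[1:])
        print(dict(counts), "->", verdict(counts))
```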

            • #56
              Originally posted by Beherit
              After reading the AMD community, Reddit and Gentoo forum posts, I'm more certain it's a software bug in either GCC (or any/some of its dependencies) or, most likely, certain bash builds.

              A pity the Google Docs document doesn't list the distro used. May I ask which distro you're using? Because contrary to the Phoronix article, I've yet to find a post by someone using something other than Gentoo. I find nothing on the Arch Linux forum about this.

              The Gentoo users who report having fixed the problem did so by upgrading GCC and then rebuilding world. One comment on Reddit singled out bash as the cause, claiming that recompiling bash alone is enough to fix this. If the bug really were in the hardware or in GCC, recompiling a whole distro wouldn't even be possible.

              I also recall this post by Philip Müller, one of the core devs of Manjaro Linux, who's been building several releases of the distro for almost two months without reporting any gcc segfaults: https://forum.manjaro.org/t/red-is-the-new-sexy/21245


              (OT: Why is Gentoo still using gcc 4.x or 5.x?)
              I ran into the same issue on Fedora 25 with GCC 6.3.1 on Qubes.

              • #57
                I'm having random reboot issues with a Ryzen 7 1700 when XFR is enabled.
                Disabling Turbo, disabling C6, manually setting the frequency to 3.2 GHz, enabling LLC and increasing the core voltage to 1.25 V seems to help work around the issue.
                I have been using the "sensors" command with the https://github.com/groeck/nct6775 driver to watch the CPU core voltage on my MSI B350M Mortar motherboard.
                With default BIOS settings the core voltage sometimes goes up to 1.35 V but mostly runs at 1.09 V, and nothing happens.
                After some random changes to the BIOS settings, the core voltage sticks below 1.19 V and never reaches 1.20 V, and then the problem occurs.
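The same readings "sensors" shows can be polled from the hwmon sysfs interface to catch a core voltage that never climbs past the 1.20 V mark described above. A minimal Python sketch, assuming the nct6775 driver is loaded and that in0 is the Vcore channel on this board (channel mapping varies between boards, so check the labels "sensors" prints); the sampling interval and the 1.20 V threshold are taken from this post as assumptions, not spec values:

```python
import sys
import time
from pathlib import Path
from typing import Iterable, Optional

HWMON = Path("/sys/class/hwmon")


def find_chip(root: Path = HWMON, prefix: str = "nct") -> Optional[Path]:
    """First hwmon device whose 'name' starts with prefix (nct6775-family chips)."""
    for dev in sorted(root.glob("hwmon*")):
        name = dev / "name"
        if name.exists() and name.read_text().strip().startswith(prefix):
            return dev
    return None


def millivolts_to_volts(raw: str) -> float:
    """hwmon exposes inN_input as an integer number of millivolts."""
    return int(raw.strip()) / 1000.0


def peak(samples: Iterable[float]) -> float:
    return max(samples, default=0.0)


if __name__ == "__main__":
    # Usage: python vcore.py SECONDS  (samples once per second)
    seconds = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    chip = find_chip()
    if chip is not None and seconds > 0:
        vcore = chip / "in0_input"  # assumption: in0 is Vcore on this board
        readings = []
        for _ in range(seconds):
            readings.append(millivolts_to_volts(vcore.read_text()))
            time.sleep(1)
        top = peak(readings)
        note = "  (never reached 1.20 V)" if top < 1.20 else ""
        print(f"peak Vcore {top:.3f} V{note}")
```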

                • #58
                  I'm also a Gentoo user.
                  I bought the Ryzen when it came out and have been running it since. I've probably compiled for a combined 24 hours over these few months, so it has been quite heavily loaded.
                  I saw crashes with early BIOS versions, but for over a month now I haven't had a crash that wasn't caused by excessive overclocking, nor do I see any other problems anymore.

                  Ryzen 1800X @ 3.9 GHz overclock, stock voltage
                  ASUS Prime X370 Pro
                  Kingston 2133 MT/s ECC RAM, currently running at 2666 MT/s (only possible with AGESA 1.0.0.6)

                  gcc 6.3.0
                  CFLAGS_DEFAULT="-O2 -pipe -march=znver1"
                  MAKEOPTS="-j16 -l16"

                  So I can't confirm any of the issues. When I raise the clock to 3.95 GHz, I see uop-cache ECC failures. When I raise the RAM clock beyond 1333 MHz, I see RAM ECC errors.
                  Crashes under heavy load appear at 4 GHz, and halts due to excessive ECC failures appear under heavy load above 2666 MT/s.
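The two RAM figures in this post are the same limit expressed two ways: DDR4 transfers data on both clock edges, so 2666 MT/s corresponds to a 1333 MHz memory clock. DRAM ECC events like these are counted by the kernel's EDAC sysfs interface. A minimal Python sketch, assuming an EDAC driver for the memory controller is loaded; the uop-cache errors mentioned above are CPU-internal and would surface as machine-check events instead, which this does not read:

```python
from pathlib import Path
from typing import Dict, Tuple

EDAC_MC = Path("/sys/devices/system/edac/mc")


def mts_to_mhz(mts: float) -> float:
    """DDR transfers on both clock edges, so the memory clock is half the MT/s rate."""
    return mts / 2


def ecc_counts(root: Path = EDAC_MC) -> Dict[str, Tuple[int, int]]:
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected) counts."""
    counts: Dict[str, Tuple[int, int]] = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text())
        ue = int((mc / "ue_count").read_text())
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    print(f"2666 MT/s = {mts_to_mhz(2666):.0f} MHz memory clock")
    for name, (ce, ue) in ecc_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

Sampling these counters before and after a heavy build gives a simple way to see whether a given memory clock is producing errors.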

                  Clearly this doesn't seem to be a general problem, though the hardware may be faulty for some people. Not for me.

                  • #59
                    As of today, I'm one step closer to obtaining my black belt in googling:

                    FreeBSD system panic after 14 hours of compiling: https://bugs.freebsd.org/bugzilla/sh....cgi?id=219399
                    DragonFly BSD developer Matt Dillon wrote a workaround for a hardware bug in Ryzen: http://gitweb.dragonflybsd.org/drago...d301557fd9ac20

                    Still trying to find reports from Windows/Visual Studio developers.

                    • #60
                      Since switching to AGESA 1.0.0.6 this morning, I've already had one freeze under a "Generic-x86-64" kernel, despite that optimisation level working for 14 hours on 1.0.0.4 yesterday. Either I was lucky yesterday or I'm battling multiple issues. I've just switched to GCC 7.1 in the hope that it helps, but I can't see how it would if you're not using -march. I'm only using it to build the kernel, which you can do by passing CC=gcc-7.1.0 to make.

                      What I find interesting is that I haven't seen any segfaults. Every incident has been an entire system lockup. I've only rebuilt a small handful of packages, and most of my system is still built with -march=nehalem from my old system.

                      I have been running the netconsole kernel module to send console messages over UDP. I usually do get some output following the freeze, but there doesn't appear to be any pattern to the stack traces. It seems like something quite fundamental is failing.
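On the receiving machine, netconsole's UDP stream only needs something listening on the target port. A minimal Python sketch, assuming IPv4 and netconsole's default target port of 6666 (adjust to whatever was passed in the module's netconsole= parameter); the script and helper names are hypothetical:

```python
import socket
import sys
from typing import List, Tuple


def receive(sock: socket.socket, count: int) -> List[Tuple[str, str]]:
    """Read count datagrams and return (sender IP, text) pairs."""
    messages = []
    for _ in range(count):
        data, (host, _port) = sock.recvfrom(4096)
        messages.append((host, data.decode("utf-8", errors="replace").rstrip("\n")))
    return messages


def listen(port: int = 6666) -> None:
    """Print netconsole output until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            for host, text in receive(sock, 1):
                # flush=True so the last lines before a freeze are not stuck in a buffer
                print(f"{host}: {text}", flush=True)


if __name__ == "__main__":
    # Usage: python netcons.py 6666
    if len(sys.argv) > 1:
        listen(int(sys.argv[1]))
```

Redirecting the script's output to a file on the second machine keeps the trace from each freeze for later comparison.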

                      I did try disabling XMP early on, but it didn't help. I'll try it again now that I've updated the BIOS. I haven't tweaked any other BIOS settings, but I'll look into that. I could also try running Fedora off a stick for a while to see how long that holds up, and I might even try borrowing a Fedora kernel for my Gentoo system.
                      Last edited by Chewi; 03 June 2017, 08:34 AM.
