Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads


  • #71
    Originally posted by Chewi View Post
    Mine freezes while doing practically nothing at all.
    Sounds like the issues I had with my 1700 with early BIOS versions. It would survive torture tests, but then would randomly freeze, often while doing nothing. Hours of gaming and no freeze, then a freeze on the desktop a few minutes later. After re-installing Ubuntu and updating to a 1.0.0.4a series BIOS it's been up for weeks without issue at 3.8GHz and thoroughly stress tested. My motherboard (ASRock AB350 K4) hasn't released a 1.0.0.6 BIOS yet. I'm hoping the problems don't return when they do, as I'd like to update in hopes of better RAM speed/timings.

    You said you were going to try disabling XMP. Have you done that yet? My best guess from reading through lots of the Gentoo issues is there are still issues with the memory controller and certain DDR4 modules. If relaxing the RAM speed/timings solves your problem it may be somewhat related.

    Comment


    • #72
      Originally posted by leipero View Post

      I would assume that every chip has "hardware bugs" whose fixes are implemented in BIOS/microcode. I seriously doubt that this is a hardware bug: no matter how non-deterministic it might be, it would happen for every user, and that is clearly not the case, so it must be a software bug (BIOS or higher-level software).
      Unless the hardware bug is compatibility between two pieces of hardware. One of the possibilities is some level of incompatibility/instability the Ryzen memory controller has with certain DDR4 modules. Issues with certain DDR4 modules would obviously be limited to certain users, and RAM issues are often manifested as intermittent issues. The line between memory controller hardware and software, and how much of this behavior can be corrected/adjusted through BIOS/opcode/firmware revisions is something only AMD and maybe some partners know. If the issue is caused by something like that AMD has shown the ability to impact RAM compatibility/stability already with BIOS updates. Hopefully they'll be able to rectify this as well.

      Comment


      • #73
        Originally posted by existensil View Post

        Unless the hardware bug is compatibility between two pieces of hardware. One of the possibilities is some level of incompatibility/instability the Ryzen memory controller has with certain DDR4 modules. Issues with certain DDR4 modules would obviously be limited to certain users, and RAM issues are often manifested as intermittent issues. The line between memory controller hardware and software, and how much of this behavior can be corrected/adjusted through BIOS/opcode/firmware revisions is something only AMD and maybe some partners know. If the issue is caused by something like that AMD has shown the ability to impact RAM compatibility/stability already with BIOS updates. Hopefully they'll be able to rectify this as well.
        Yah, but all compatibility with RAM modules is done via software (BIOS), the base functionality stays in the CPU-MC but everything else is offloaded to BIOS/EFI.

        Comment


        • #74
          Originally posted by existensil View Post
          Sounds like the issues I had with my 1700 with early BIOS versions. It would survive torture tests, but then would randomly freeze, often while doing nothing. Hours of gaming and no freeze, then a freeze on the desktop a few minutes later. After re-installing Ubuntu and updating to a 1.0.0.4a series BIOS it's been up for weeks without issue at 3.8GHz and thoroughly stress tested. My motherboard (ASRock AB350 K4) hasn't released a 1.0.0.6 BIOS yet. I'm hoping the problems don't return when they do, as I'd like to update in hopes of better RAM speed/timings.
          I've been on 1.0.0.4a from the start and 1.0.0.6 since yesterday.

          Originally posted by existensil View Post
          You said you were going to try disabling XMP. Have you done that yet? My best guess from reading through lots of the Gentoo issues is there are still issues with the memory controller and certain DDR4 modules. If relaxing the RAM speed/timings solves your problem it may be somewhat related.
          I did disable XMP once briefly and it still froze but I was using amd-staging at the time to get HDMI audio and that has its own issues. I'm using a vanilla kernel now so I could try disabling XMP again. My RAM was in the official compatibility list for my board though.

          I have booted into Fedora 26 Alpha off a USB stick and chrooted into my Gentoo system to make it a bit less annoying. It's been up 12 hours so far. I'll give it about another 12 hours, which would be the longest it's been up by far. If it survives that, I'll put this Fedora kernel on my Gentoo system.

          It sounds bizarre but I'm starting to wonder if simply building software on Ryzen produces faulty code. That would explain why so many Gentoo users are having trouble. Other distro packages are probably not built on Ryzen.

          Comment


          • #75
            I've got a new MCE from my underclocked Ryzen 7 1700.

            Comment


            • #76
              Originally posted by tholin View Post
              A compile job forks a lot of processes each with their own layout. Try compiling without ASLR. "echo 0 > /proc/sys/kernel/randomize_va_space".
              Oops... it seems to do the trick for me.

              Yesterday I left this machine continuously compiling gcc in one shell and mesa in another.

              With randomize_va_space set to 1 or 2, the mesa compilation almost always fails with the usual segfault in bash or in libc. With randomize_va_space set to 0, gcc was compiled 8 times and mesa 78 times during the test with no segfault or any other failure.

              I don't know what to say: "good!" or "arggg!".
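
For anyone who wants to repeat this kind of soak test, here is a minimal sketch of a loop that runs a build command several times and tallies the results. The function name `repeat_build` is made up for illustration, and it assumes the command you pass does a full rebuild each time and signals failure through its exit status.

```shell
#!/bin/sh
# Run a command repeatedly and tally how often it succeeds or fails.
# repeat_build is a hypothetical helper name, not an existing tool.
# Usage: repeat_build <number-of-runs> <command> [args...]
repeat_build() {
    runs=$1
    shift
    passed=0
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the build's own output; only the exit status matters here.
        if "$@" >/dev/null 2>&1; then
            passed=$((passed + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "passed=$passed failed=$failed"
}

# Example (hypothetical build command, run from a source tree):
#   repeat_build 20 sh -c 'make clean && make --jobs=16'
```

Running the same loop once with randomize_va_space at 2 and once at 0 gives two tallies that can be compared directly.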

              Comment


              • #77
                Originally posted by tholin View Post

                If there is a hardware bug depending on a specific address in the user address space, it would make sense that compile jobs trigger it. Linux uses address space layout randomization to put memory segments at different addresses on each run. A compile job forks a lot of processes, each with its own layout. Try compiling without ASLR. "echo 0 > /proc/sys/kernel/randomize_va_space".
                Thank you for the tip. ;-)
                Just tried it, and it now runs fine without segfaults when I build my cross environment with
                "make --jobs=16 MXE_TARGETS='x86_64-w64-mingw32.static i686-w64-mingw32.static' qt5" on my Debian Sid.
                Before, I saw a lot of "segfault at 10 ip 0000000000000010 sp 00007ffcdbc8df58 error 14 in cc1plus"
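
As a side note on reading that log line, here is a rough sketch that decodes the "error" field. It assumes the kernel prints the x86 page-fault error code in hexadecimal (to my knowledge that is what the x86 fault handler does) and uses the standard bit layout of that code. The helper name `decode_pf_error` is hypothetical.

```shell
#!/bin/sh
# Decode the "error" field of a Linux x86 segfault log line.
# decode_pf_error is a hypothetical helper; it assumes the field is hex and
# follows the x86 page-fault error code bit layout.
decode_pf_error() {
    code=$((0x$1))
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then out="protection"; else out="not-present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then out="$out write"; else out="$out read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then out="$out user"; else out="$out kernel"; fi
    # bit 3: a reserved bit was set in a page-table entry
    if [ $((code & 8)) -ne 0 ]; then out="$out reserved-bit"; fi
    # bit 4: the fault happened on an instruction fetch
    if [ $((code & 16)) -ne 0 ]; then out="$out instruction-fetch"; fi
    echo "$out"
}

# Example: decode_pf_error 14
```

Under those assumptions, "error 14" with the fault address equal to ip (both 0x10 here) reads as a user-mode instruction fetch from an unmapped page, i.e. the process jumped to a bogus address rather than reading or writing bad data.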

                Comment


                • #78
                  Thanks for confirming the ASLR workaround. I don't think any distros disable it by default though so this isn't unique to Gentoo.
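
To check what a given system is currently set to, something like the following sketch may help. The value meanings follow the kernel sysctl documentation for randomize_va_space; the function names `describe_aslr` and `current_aslr` are made up for this example.

```shell
#!/bin/sh
# Map a randomize_va_space value to a short description.
# describe_aslr and current_aslr are hypothetical helper names.
describe_aslr() {
    case "$1" in
        0) echo "ASLR disabled" ;;
        1) echo "stack, mmap base and vDSO randomized" ;;
        2) echo "stack, mmap base, vDSO and heap randomized" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Read the current setting. The path is a parameter so the function can be
# pointed at a test file instead of the live sysctl.
current_aslr() {
    path=${1:-/proc/sys/kernel/randomize_va_space}
    describe_aslr "$(cat "$path")"
}

# Example: current_aslr
```

Reading the value needs no special privileges; only changing it with the echo command quoted earlier in the thread requires root.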

                  Comment


                  • #79
                    Originally posted by Chewi View Post
                    Thanks for confirming the ASLR workaround. I don't think any distros disable it by default though so this isn't unique to Gentoo.
                    Very few people outside of Gentoo compile so much, so chances are much lower.
                    ## VGA ##
                    AMD: X1950XTX, HD3870, HD5870
                    Intel: GMA45, HD3000 (Core i5 2500K)

                    Comment


                    • #80
                      Originally posted by dillon View Post
                      Hi, Matt Dillon here. Yes, I did find what I believe to be a hardware issue with Ryzen related to concurrent operations.
                      Thank you for the extensive clarification on this.

                      Originally posted by dillon View Post
                      The problem occurs more often with high %rip addresses such as near the top of the user stack, which is where DragonFly's signal trampoline traditionally resides.
                      Is the trigger OS-specific? There don't seem to be any Visual Studio devs on Windows reporting this (yet).

                      Originally posted by dillon View Post
                      Only IRETQ seems to trigger it in the manner described above, which means that AMD can probably fix it with a microcode update.
                      I certainly hope so, as I'm planning on buying a Ryzen system and this made me postpone my planned upgrade until it's resolved.

                      Comment
