Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads

    ernstp
    Senior Member

  • ernstp
    replied
    Well when I was messing with overclocking I got this problem. Also, I could run one Cinebench, but not two in a row.


  • Beherit
    Senior Member

  • Beherit
    replied
    Originally posted by GreatEmerald:
    Yea, to me it sounds like motherboard issues, likely not enough voltage to drive stock clocks. There sure have been enough motherboard issues so far, so it wouldn't be very surprising.

    I wonder if anyone tried using the same processor in two motherboards, one known to work and one known not to.
    The segfault error reported here is very different from "just" random crashes or reboots. Voltage surges or undervoltage during load would trigger errors under any heavy load, regardless of OS or activity. The workloads described here are too specific to match the randomness of a dodgy motherboard.

    I, for one, would find it very surprising indeed if this particular compilation issue were caused by motherboard problems so mysterious that they only occur when using the gcc compiler on Linux. There are no reports of this amongst Windows developers; I've not been able to find a single one.

    Digging through forums and googling, those reporting this are 80% Gentoo users, 5% Fedora, 5% Debian and 10% Ubuntu. Here's the earliest report of this I was able to find, courtesy of the Australian edition of Linux Format, dated Apr 11.

    FreeBSD has one report that may be related to this, but it has yet to be verified. DragonFly BSD confirms using two workarounds on the Ryzen platform: one fixes CPU clock rate detection (and thus has nothing to do with this), and the second works around a Ryzen hardware bug. I'd like details on how that bug is triggered and what happens when it is.

    I'm curious whether this fix, recompiling bash, works for everyone, and also whether segfaults occur when using LLVM/Clang on the same source code as with gcc.
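One way to compare compilers on the same source is to repeat the build several times and count the runs that segfault. The following is only a rough sketch in Python: the helper names `looks_like_segfault` and `count_segfaults` are made up for illustration, and it assumes a POSIX system where a crashing compiler either kills the command directly or leaves "Segmentation fault" in the build output.

```python
import signal
import subprocess


def looks_like_segfault(returncode: int, output: str) -> bool:
    """True if the process died from SIGSEGV or its output mentions a segfault.

    On POSIX, subprocess reports death by signal N as return code -N. A crash
    in a child of the command (for example a compiler started by make) only
    shows up in the output, so the text is checked as well.
    """
    return returncode == -signal.SIGSEGV or "Segmentation fault" in output


def count_segfaults(cmd: list[str], runs: int) -> int:
    """Run cmd `runs` times and count the runs that look like a segfault."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so one string holds all output
            text=True,
            errors="replace",
        )
        if looks_like_segfault(result.returncode, result.stdout):
            crashes += 1
    return crashes
```

A comparison could then look like `count_segfaults(["make", "-j16", "CC=gcc"], runs=10)` against the same call with `CC=clang`, with a clean step between runs. Those command lines are placeholders, not something taken from the reports.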


  • Chewi
    Senior Member

  • Chewi
    replied
    That's interesting but users aren't just seeing issues under load. Mine freezes while doing practically nothing at all. As I said before though, there may be multiple issues at play.

    Also be aware that I saw weird segfaults from Java under ppc64 when ASLR is disabled.
    Last edited by Chewi; 03 June 2017, 10:17 AM.


  • tholin
    Junior Member

  • tholin
    replied
    I've been following this problem on Gentoo's forum and it's almost impossible to know what works and what doesn't. The crashes are nondeterministic, so users might change a BIOS setting, get lucky for a while, and then assume the BIOS change solved the problem. There could also be several separate problems with the same symptoms but requiring separate fixes.
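To put a number on "getting lucky", here is a small sketch that assumes crashes arrive as a Poisson process with a fixed average spacing. That model and the figures used with it are assumptions for illustration, not measurements from any of the reports.

```python
import math


def survival_probability(hours: float, mean_hours_between_crashes: float) -> float:
    """Chance of seeing zero crashes in `hours` if nothing was actually fixed.

    Assumes a Poisson process with the given mean spacing between crashes.
    """
    return math.exp(-hours / mean_hours_between_crashes)


def hours_needed(confidence: float, mean_hours_between_crashes: float) -> float:
    """Crash-free hours after which such a run would only happen by luck
    with probability 1 - confidence, under the same assumption."""
    return -mean_hours_between_crashes * math.log(1.0 - confidence)
```

With a hypothetical average of one crash every 14 hours, `survival_probability(14, 14)` is about 0.37, so more than a third of unfixed systems would still get through a clean 14-hour run. `hours_needed(0.95, 14)` comes out at roughly 42 hours.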

    Originally posted by Beherit:
    DragonFly BSD developer Matt Dillon wrote a workaround for a hardware bug in Ryzen: http://gitweb.dragonflybsd.org/drago...d301557fd9ac20
    Can someone clarify this? Is it saying that returning from an interrupt to a "high user %rip address near the end of the user address space (top of user stack)" sometimes crashes? Does DragonFly run programs that execute code from the stack? Or is it saying that a crash might happen when the return pointer is read from the stack when the stack is near the end of the user address space?

    If there is a hardware bug that depends on specific addresses in the user address space, it would make sense that compile jobs trigger it. Linux uses address space layout randomization to put memory segments at different addresses on each run. A compile job forks a lot of processes, each with its own layout. Try compiling without ASLR: "echo 0 > /proc/sys/kernel/randomize_va_space".
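Before and after such a test it helps to confirm which ASLR mode the kernel is actually in. A minimal sketch, assuming a Linux system that exposes the sysctl file named above; the helper names are hypothetical, and the mode descriptions paraphrase the kernel's sysctl documentation.

```python
from pathlib import Path

ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of each value, paraphrased from the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled: addresses are the same on every run",
    1: "partial: stack, mmap base and vDSO are randomized",
    2: "full: as mode 1, plus the heap (brk) is randomized",
}


def describe_aslr(raw: str) -> str:
    """Translate the raw file contents (e.g. '2\\n') into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown mode {value}")


def current_aslr(path: Path = ASLR_PATH) -> str:
    """Read the live setting; only meaningful on Linux."""
    return describe_aslr(path.read_text())
```

Writing 0 to the file needs root and affects every new process. To limit the experiment to one build, `setarch` with its `-R` option runs a single command with randomization turned off.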


  • Peter Fodrek
    Senior Member

  • Peter Fodrek
    replied
    What about fake chips?

    That has often been the case with AMD: “Only” 60 Thousands Fake AMD Chips Arrested, Million Already Shipped

    KingFish 5 Jan, 2005

    “Based on tips provided by Advanced Micro Devices (AMD) Taiwan, the police Friday raided an electronics company located in Tainan, southern Taiwan, and seized a total of 60 000 suspect AMD CPUs. The suspect AMD CPUs, including K7 [AMD Athlon XP] and K8 [AMD Athlon 64] models, were defective CPUs that would normally have been destroyed,” claims an article posted on Taiwan-based web-site DigiTimes.

    http://icrontic.com/article/2005-onl...lready_shipped


    Then a waste company sold AMD chips to China and Germany instead of destroying them
    Amazon accidentally ships counterfeit AMD APUs


    https://www.extremetech.com/computin...rfeit-amd-apus


    or


    The Counterfeit Electronics Problem


    In the Fall of 2003, AMD conducted some raids in Europe, where some of its low speed, low priced microprocessors were being relabeled as high speed, high priced chips. On investigation it was found that some resellers in Shenzhen, China were performing the remarking. AMD also purchased some microprocessors from the resellers and found them to be fakes (Takahash,2004).

    In January 2005, Advanced Micro Devices (AMD), working in cooperation with Taiwanese authorities, seized a total of 60,000 counterfeit AMD microprocessors worth US $9.46 mil ....

    -


    https://file.scirp.org/pdf/JSS_2013121215153599.pdf


  • GreatEmerald
    Senior Member

  • GreatEmerald
    replied
    Yea, to me it sounds like motherboard issues, likely not enough voltage to drive stock clocks. There sure have been enough motherboard issues so far, so it wouldn't be very surprising.

    I wonder if anyone tried using the same processor in two motherboards, one known to work and one known not to.


  • Chewi
    Senior Member

  • Chewi
    replied
    Since switching to AGESA 1.0.0.6 this morning, I've already had one freeze under a "Generic-x86-64" kernel despite that optimisation level working for 14 hours on 1.0.0.4 yesterday. Either I was lucky yesterday or I'm battling multiple issues. I've just switched to GCC 7.1 in the hope that helps but I can't see how it would if you're not using -march. I'm only using it to build the kernel. You can do that by passing CC=gcc-7.1.0.

    What I find interesting is that I haven't seen any segfaults. Every incident has been an entire system lockup. I've only rebuilt a small handful of packages and most of my system is still built against -march=nehalem from my old system.

    I have been running the netconsole kernel module to send console messages over UDP. I usually do get some output following the freeze but there doesn't appear to be any pattern to the stack traces. It seems like something quite fundamental is failing.
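For anyone wanting to capture the same output, the receiving side can be as simple as a UDP socket that prints whatever arrives. This is a rough sketch, not the setup used above: the function names are made up, and port 6666 is netconsole's documented default target port, which has to match whatever the module was actually configured with.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket for incoming netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock: socket.socket, count: int) -> list[str]:
    """Block until `count` datagrams arrive; return them tagged with the sender IP."""
    lines = []
    for _ in range(count):
        data, (sender, _port) = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace").rstrip()
        lines.append(f"{sender}: {text}")
    return lines
```

Run on a second machine, something like `for line in receive_lines(open_listener(), 100): print(line)` would collect the last messages before a freeze. Writing them to a file as they arrive is the obvious next step, since the interesting lines are the final ones.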

    I did try disabling XMP early on but it didn't help. I'll try it again now that I've updated the BIOS. I haven't tweaked any other BIOS settings but I'll look into that. I could also try running Fedora off a stick for a while to see how long that holds up and I might even try borrowing a Fedora kernel for my Gentoo system.
    Last edited by Chewi; 03 June 2017, 08:34 AM.


  • Beherit
    Senior Member

  • Beherit
    replied
    As of today, I'm one step closer towards obtaining my black belt in googleizing:

    FreeBSD system panic after 14 hours of compiling: https://bugs.freebsd.org/bugzilla/sh....cgi?id=219399
    DragonFly BSD developer Matt Dillon wrote a workaround for a hardware bug in Ryzen: http://gitweb.dragonflybsd.org/drago...d301557fd9ac20

    Still trying to find reports from Windows/Visual Studio developers.


  • tg--
    Phoronix Member

  • tg--
    replied
    I'm also a Gentoo user.
    I bought the Ryzen when it came out and have been running it since. I've probably compiled for a combined 24 hours in these few months, so it has been quite heavily loaded.
    I've seen crashes with early BIOS versions, but for over a month I haven't had a crash that wasn't caused by excessive overclocking. Nor do I see any other problems anymore.

    Ryzen 1800X @ 3.9 GHz overclock, stock voltage
    ASUS Prime X370 Pro
    Kingston 2133 MT/s ECC RAM, currently running at 2666 MT/s (only possible with AGESA 1.0.0.6)

    gcc 6.3.0
    CFLAGS_DEFAULT="-O2 -pipe -march=znver1"
    MAKEOPTS="-j16 -l16"

    So I can't confirm any of the issues. When I raise the clock to 3.95 GHz I see uops-cache ECC failures. When I raise the RAM clock beyond 1333 MHz, I see RAM ECC errors.
    Crashes under heavy load appear at 4 GHz; halts due to excessive ECC failures appear under heavy load above 2666 MT/s.
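For readers with ECC memory who want to check for the RAM errors mentioned here, the kernel's EDAC subsystem exposes per-controller counters in sysfs. The sketch below assumes an EDAC driver is loaded for the platform, which may not be true on every Ryzen board or kernel, and the function name is hypothetical. It does not cover the uops-cache errors, which are reported as machine check events instead.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counts(root: Path = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} error counts from EDAC.

    Controllers whose counter files are missing or unreadable are skipped.
    """
    counts = {}
    for controller in sorted(root.glob("mc[0-9]*")):
        try:
            corrected = int((controller / "ce_count").read_text())
            uncorrected = int((controller / "ue_count").read_text())
        except (OSError, ValueError):
            continue
        counts[controller.name] = (corrected, uncorrected)
    return counts
```

Comparing the result before and after a long compile shows whether corrected errors are piling up under load at a given memory speed.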

    Clearly doesn't seem to be a general problem, though hardware may be faulty for some people. Not for me.


  • qsmcomp
    Phoronix Member

  • qsmcomp
    replied
    Having random reboot issues with a Ryzen 7 1700 when XFR is enabled.
    Disabling Turbo, disabling C6, manually setting the frequency to 3.2 GHz, enabling LLC and increasing the core voltage to 1.25 V seems to help work around the issue.
    I have been using the "sensors" command with the https://github.com/groeck/nct6775 driver to watch the CPU core voltage on my MSI B350M Mortar motherboard.
    With default BIOS settings the core voltage sometimes goes up to 1.35 V but mostly runs at 1.09 V. Nothing happened.
    With some random changes to the BIOS settings the core voltage stays below 1.19 V and never goes up to 1.20 V. The problem occurs.
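Watching the voltage by eye is tedious, so here is a hedged sketch that pulls voltage readings out of the text printed by `sensors`. It assumes the usual `label: +1.09 V` line layout; the label `Vcore` and the 1.20 V threshold are assumptions based on the description above, and the real label depends on the chip and driver configuration. Readings printed in mV are deliberately ignored.

```python
import re

# Matches lines such as "Vcore:   +1.09 V  (min = +0.00 V, max = +1.74 V)".
VOLTAGE_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*\+?(?P<volts>\d+(?:\.\d+)?)\s*V\b")


def parse_voltages(sensors_output: str) -> dict[str, float]:
    """Map each voltage label in the sensors text to its current reading in volts."""
    readings = {}
    for line in sensors_output.splitlines():
        match = VOLTAGE_LINE.match(line)
        if match:
            readings[match.group("label").strip()] = float(match.group("volts"))
    return readings


def is_undervolted(readings: dict[str, float], label: str = "Vcore",
                   threshold: float = 1.20) -> bool:
    """True if the chosen reading exists and sits below the threshold."""
    return label in readings and readings[label] < threshold
```

Feeding it the output of `subprocess.run(["sensors"], capture_output=True, text=True).stdout` once a second and logging the result would show whether the voltage ever reaches 1.20 V before a reboot.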

