Some Ryzen Linux Users Are Facing Issues With Heavy Compilation Loads

    pjssilva
    Junior Member

  • pjssilva
    replied
    I am one of the users with an affected Ryzen in that AMD Community thread. There we are wondering how common the problem is. Some people have already received a new CPU through RMA, but the problem persists with the new CPU. We also have examples of people who have exchanged all major components in the system (processor, motherboard, memory, PSU, graphics card) and still face the segfault under heavy compilation.

    A special characteristic is that the problem is not that easy to trigger. You may compile many things and never see it, so a person may have an affected processor without noticing. Fortunately, some smart people have created a simple script that always shows the problem on my system and on the systems of the other people in the thread. The script can be found at

    https://github.com/suaefar/ryzen-test

    You just have to clone the repository, move to the ryzen-test directory and run ./kill_ryzen.sh. It is a very simple script: it downloads the gcc-7.1 source code into a RAM disk and starts one compilation per processor, all running simultaneously. If any compilation fails, it writes a message to the console saying how long it took to hit the failure. On my system the build fails after a few minutes unless I turn off SMT (I am also increasing the SoC and memory voltages, but I am not completely sure this is necessary).

    Now, I would like to ask the fellow readers for a hand. If you have a Ryzen system, can you test it with the kill_ryzen.sh script? Let's say for one or two hours. After that, post the result here even if no failures happen. This may be a nice way to find out how common the problem is. That is why it is important to have both kinds of reports: failures and successful builds.

    Obs: kill_ryzen.sh is an infinite build loop; the easiest way to stop it is to reboot.
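For readers who want to see the idea behind the test without reading the script, here is a minimal Python sketch of the same pattern: run one build loop per processor and report how long it took until the first failure. This is not the actual kill_ryzen.sh; the function name `run_until_failure` and the `max_rounds` parameter are made up for illustration, and the command you pass in (for example a `make` invocation in a build directory) is up to you.

```python
import os
import subprocess
import sys
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_until_failure(cmd, workers=None, max_rounds=None):
    """Run `cmd` repeatedly in parallel loops, one loop per worker.

    Returns the number of seconds until the first non-zero exit status,
    or None if every worker completed `max_rounds` rounds successfully.
    With max_rounds=None the loops only end after a failure.
    """
    workers = workers or os.cpu_count() or 1
    stop = threading.Event()  # set by the first worker that sees a failure
    start = time.monotonic()

    def loop(_worker_id):
        rounds = 0
        while not stop.is_set() and (max_rounds is None or rounds < max_rounds):
            result = subprocess.run(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
            if result.returncode != 0:
                stop.set()  # other workers finish their current run, then exit
                return time.monotonic() - start
            rounds += 1
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(loop, range(workers)))

    failures = [r for r in results if r is not None]
    return min(failures) if failures else None


if __name__ == "__main__":
    # Harmless bounded demo: a command that always succeeds, two rounds each.
    print(run_until_failure([sys.executable, "-c", "pass"], workers=2, max_rounds=2))
```

Unlike the original script, this sketch stops cleanly after the first failure (or after `max_rounds`), so no reboot is needed. A compiler that segfaults makes the build command exit with a non-zero status, which is what the loop watches for.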


  • Zucca
    Senior Member

  • Zucca
    replied
    Originally posted by foppe View Post
    the vSoC drives the chipset/mem controller, not the cpu, so I doubt it really affects overall power draw all that much.
    Oh. That's better news then, I guess. At least from a power/heat generation standpoint.


  • foppe
    Junior Member

  • foppe
    replied
    Originally posted by Zucca View Post
    Based on this, the problem could be circumvented by a BIOS/UEFI update. Although, in my understanding, when raising the operating voltage of the cores you also raise the power consumption, which might lead to more problems with certain CPU+MB combos. Also, the Ryzen 7 1700 then wouldn't be a 65W TDP part anymore.

    I've been planning to upgrade my Opteron 3380 to a Ryzen 7 1700 on my server (MB and RAM as well, of course). I guess I'll still wait. The Opteron 3380 is fairly capable for what I need, but sometimes when encoding videos it takes looooong. But being my home server, it can do all the work at night. So no hurry.
    Not sure if that's necessarily true. Some BIOSes seem to set vSoC at .9v, while mine was at .945v, and the vSoC drives the chipset/mem controller, not the cpu, so I doubt it really affects overall power draw all that much.


  • Zucca
    Senior Member

  • Zucca
    replied
    Originally posted by foppe View Post

    Some. A subset of users seems to benefit from upping the SoC voltage from ~.945 (auto) to 1.05-~1.1v. I'm fairly sure it's mostly an electrical issue, as exchanging processors doesn't seem to make any difference, but it's not yet led to AMD announcing any kind of fix.
    Based on this, the problem could be circumvented by a BIOS/UEFI update. Although, in my understanding, when raising the operating voltage of the cores you also raise the power consumption, which might lead to more problems with certain CPU+MB combos. Also, the Ryzen 7 1700 then wouldn't be a 65W TDP part anymore.

    I've been planning to upgrade my Opteron 3380 to a Ryzen 7 1700 on my server (MB and RAM as well, of course). I guess I'll still wait. The Opteron 3380 is fairly capable for what I need, but sometimes when encoding videos it takes looooong. But being my home server, it can do all the work at night. So no hurry.


  • timon37
    Junior Member

  • timon37
    replied
    I think even after upping the voltage it was still necessary to disable ASLR to reach stability, at least in most cases.
    Note, that's my rough impression from following the AMD forum thread; I didn't keep track of who did what exactly ;p
    Last edited by timon37; 06 July 2017, 05:12 AM.
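For anyone who wants to check whether ASLR is currently active before or after testing, here is a small Python sketch. It reads the standard Linux sysctl file `/proc/sys/kernel/randomize_va_space`; the helper names `describe_aslr` and `current_aslr` are made up for this example.

```python
from pathlib import Path

# Standard Linux location of the ASLR setting.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of the values as described in the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "full randomization (also heap/brk)",
}


def describe_aslr(raw):
    """Translate the raw file content (e.g. '2\\n') into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")


def current_aslr(path=ASLR_PATH):
    """Return a description of the running system's ASLR mode, or None."""
    try:
        return describe_aslr(path.read_text())
    except (OSError, ValueError):
        return None  # not Linux, file unreadable, or unexpected content


if __name__ == "__main__":
    print(current_aslr())
```

Changing the value requires root (for example `sysctl kernel.randomize_va_space=0`). For a single command, `setarch` with its `-R` option runs it with randomization turned off without touching the system-wide setting.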


  • foppe
    Junior Member

  • foppe
    replied
    Originally posted by Zucca View Post
    So. Do we have any more information about this problem?
    Some. A subset of users seems to benefit from upping the SoC voltage from ~.945 (auto) to 1.05-~1.1v. I'm fairly sure it's mostly an electrical issue, as exchanging processors doesn't seem to make any difference, but it's not yet led to AMD announcing any kind of fix.


  • Zucca
    Senior Member

  • Zucca
    replied
    So. Do we have any more information about this problem?


  • PuckPoltergeist
    Senior Member

  • PuckPoltergeist
    replied
    Originally posted by Beherit View Post
    Hardware bugs aren't necessarily triggered by all kernels.

    Other than undocumented hearsay that it happened to one Ryzen user running NetBSD, and Matt Dillon's (DragonFly BSD) report, the bug is only triggered under Linux. Still not a single Windows user has reported it.
    You're wrong:
    https://community.amd.com/message/2804636#2804636
    > Yet windows does not trigger it?

    As I reported several times, Windows Subsystem for Linux (WSL), the so-called "Bash on Ubuntu on Windows", triggered this kind of problem (see my past reports for details). WSL is the Linux userland on the Windows kernel (more precisely, it consists of a Linux emulation layer and the NT kernel). And NetBSD triggered a very similar problem.


  • scorpio810
    Junior Member

  • scorpio810
    replied
    Originally posted by Chewi View Post

    Heh, spoke too soon. It froze again a few hours later. The X pointer kept moving at first but then that froze too. I guess Debian's kernel is affected after all.
    Since I enabled CONFIG_RCU_NOCB_CPU and CONFIG_RCU_NOCB_CPU_ALL in my vanilla 4.11.x kernel and added norandmaps, I have not seen a single freeze.
    Thanks for the tip!
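To check whether a running kernel already has these settings, something like the following Python sketch can help. It assumes the kernel config is exposed either as `/proc/config.gz` or as `/boot/config-<release>`, which depends on the distribution, and treats `norandmaps` as a kernel command-line parameter read from `/proc/cmdline`. The helper names are made up for this example.

```python
import gzip
import platform
from pathlib import Path

# The two config options named in the post above.
WANTED = ("CONFIG_RCU_NOCB_CPU", "CONFIG_RCU_NOCB_CPU_ALL")


def enabled_options(config_text, wanted=WANTED):
    """Map each wanted option to its value ('y', 'm', ...) or None if unset."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skips blanks and '# CONFIG_FOO is not set' lines
        name, sep, value = line.partition("=")
        if sep and name in wanted:
            found[name] = value
    return {name: found.get(name) for name in wanted}


def has_norandmaps(cmdline):
    """True if 'norandmaps' is one of the kernel command-line words."""
    return "norandmaps" in cmdline.split()


def read_kernel_config():
    """Return the running kernel's config text, or None if not available."""
    proc = Path("/proc/config.gz")
    if proc.exists():
        return gzip.decompress(proc.read_bytes()).decode()
    boot = Path("/boot") / f"config-{platform.release()}"
    return boot.read_text() if boot.exists() else None


if __name__ == "__main__":
    config = read_kernel_config()
    if config is not None:
        print(enabled_options(config))
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        print("norandmaps:", has_norandmaps(cmdline.read_text()))
```

An option reported as None is simply not set in that config; whether it exists at all depends on the kernel version.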


  • Beherit
    Senior Member

  • Beherit
    replied
    Originally posted by PuckPoltergeist View Post

    As this is a hardware bug, why shouldn't the Debian kernel be affected?
    Hardware bugs aren't necessarily triggered by all kernels.

    Other than undocumented hearsay that it happened to one Ryzen user running NetBSD, and Matt Dillon's (DragonFly BSD) report, the bug is only triggered under Linux. Still not a single Windows user has reported it.

