Some FreeBSD Users Are Still Running Into Random Lock-Ups With Ryzen


  • Apteryx
    replied
    Originally posted by vortex View Post

    I am sure if someone can find a way to replicate the issue, they will be able to fix it.
    This also doesn't happen on Windows AFAIK, so it is a Linux bug of some type.

    If you can't replicate it because it is too random, then it is a guessing game, and not much progress can be made.

    So, if you have this issue, try to find a way to replicate it, then report the bug.
    Running the dhtcluster service from OpenDHT triggers it easily (https://github.com/savoirfairelinux/opendht). FYI, this is the tech used by GNU Ring for its distributed network. Running this service seems to trigger about one crash per day. Note that if you run anything intensive, the CPU won't lock up. The system needs to be fairly idle for the crash to happen, so I'd suggest leaving that service running on its own.

    I've proposed to AMD that they try an Ansible script I have that deploys this service on a Debian 9 system, in case they'd like to reproduce it on their side, but they have just ignored it.

  • Apteryx
    replied
    Originally posted by Chewi View Post
    Yes, Linux is certainly not free of this issue, nor is any OS for that matter, because this is a hardware issue. I half joked that you wouldn't see it on Windows because it's never idle for long enough. Turns out AMD have privately said exactly the same thing. So yes, they are aware of the issue but it doesn't seem like they're going to admit it publicly. They told me directly that it would be fixed in a BIOS update but I'm not holding my breath. For now, I am using the CONFIG_RCU_NOCB_CPU workaround (which I discovered, you're welcome) and I have seen maybe two freezes in the months since. I can't swear they weren't caused by something else but I don't believe the workaround is 100% effective.
    Are you also running with C-states disabled in UEFI? I also still had crashes with the CONFIG_RCU_NOCB_CPU workaround (thanks for discovering it). But that doesn't mean that disabling C-states alone is the cure: I also had crashes with C-states off but *without* the CONFIG_RCU_NOCB_CPU hack. At this point I'm throwing everything I can at the problem, so ASLR is off, C-states are off, opcache is off, and I'm using CONFIG_RCU_NOCB_CPU. It seems to be holding that way -- touching wood.
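
    For anyone wanting to check whether this workaround is actually in effect on a given kernel, here is a minimal Python sketch. It assumes you have the kernel config text at hand (on Debian-style systems it usually lives in /boot as config-<release>) and looks for CONFIG_RCU_NOCB_CPU=y plus an rcu_nocbs= entry on the kernel command line (/proc/cmdline). The function name rcu_nocb_status is made up for illustration.

```python
import re


def rcu_nocb_status(kernel_config_text, cmdline_text):
    """Summarise whether the RCU no-callbacks workaround looks active.

    kernel_config_text: contents of the kernel config file.
    cmdline_text: contents of /proc/cmdline.
    """
    # The option must be built in ("=y"); a "# ... is not set" line does not count.
    config_enabled = re.search(
        r"^CONFIG_RCU_NOCB_CPU=y$", kernel_config_text, re.MULTILINE
    ) is not None

    # rcu_nocbs=<cpu list> selects which CPUs have their RCU callbacks offloaded.
    match = re.search(r"(?:^|\s)rcu_nocbs=(\S+)", cmdline_text)
    cpu_list = match.group(1) if match else None

    return {"config_enabled": config_enabled, "rcu_nocbs": cpu_list}
```

    Pass it the contents of the kernel config file and of /proc/cmdline; if config_enabled is False or rcu_nocbs is None, the workaround is probably not doing anything on that boot.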

  • angrypie
    replied
    Originally posted by RavFX View Post

    From what I know, only CPUs returned through RMA (binned/checked by AMD) are sure not to have the segfault bug. [...]
    I know that the segfault bug is fixed on week 33+ Ryzens. I'm talking about the not-so-new idle soft-lockup that is the subject of this very thread.

    I'm not sure if well-binned Ryzens (i.e. EPYC and Threadripper) are any better than the run-of-the-mill dies, since the efficiency curve is a characteristic of the 14LPP node they use. At most you'd get +100 or so MHz over the best binned 1800Xs (that mostly do 4.1), hopefully with less insane voltages.

  • typerrrrrrrr
    replied
    Forcing off the C6 states (https://github.com/r4m0n/ZenStates-Linux) with a script in /etc/rc.local has taken my setup from an average uptime of 1 - 1.5 days to 50+ days (I've generally rebooted for other reasons, so I haven't gone beyond that number). Every other attempted fix (RCU, randomizing VA space, etc.) had nearly zero effect.
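
    A rough way to see which idle states the kernel exposes, and whether any are marked disabled, is the generic Linux cpuidle sysfs interface. The sketch below only reads that interface. It is not what ZenStates-Linux does, and the state names it reports depend on the idle driver, so they do not necessarily map one-to-one onto the hardware C6 state. The helper name list_idle_states is hypothetical.

```python
from pathlib import Path


def list_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state_dir, name, disabled) tuples for one CPU's idle states."""
    state_dirs = sorted(
        Path(cpuidle_dir).glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 before state10
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # "1" in the disable file means the kernel will not enter this state.
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state_dir.name, name, disabled))
    return states
```

    Calling list_idle_states() with no argument inspects cpu0 only; other CPUs have their own cpuidle directories.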

  • zaphod
    replied
    @slacka:

    The segmentation fault issue [0], which was talked about here last August and which has been admitted by AMD [1], seems to have affected only the first Ryzen CPUs. BUT this issue here is a different one! See [2], and [3] for the Ubuntu bug report. I don't know of any official response to this issue. Even a Threadripper CPU seems to be affected (last report in [3]). I haven't found any info about EPYC, but EPYC is also a different stepping (B2) from Threadripper and normal Ryzen (B1).

    I am interested in buying a Ryzen CPU but this issue is holding me back. Hopefully it will be fixed in the next Ryzen CPUs, which are due to be released in the next few months.

    [0] https://bugzilla.kernel.org/show_bug.cgi?id=196481
    [1] https://www.phoronix.com/scan.php?pa...-Segv-Response
    [2] https://bugzilla.kernel.org/show_bug.cgi?id=196683
    [3] https://bugs.launchpad.net/linux/+bug/1690085

  • RavFX
    replied
    Originally posted by angrypie View Post

    No, it's not.

    Don't get your feelings hurt, son, it's just a piece of sand.
    From what I know, only CPUs returned through RMA (binned/checked by AMD) are sure not to have the segfault bug. CPUs on the normal market can still have the bug; you can be more or less lucky. You can also ask AMD directly to purchase CPUs without the bug; they can arrange that for any volume order without an issue. But it's not too much of an issue: it takes less than a week to get the replacement, and as a bonus you get an extremely overclockable CPU. I think they just stuff them with superior dies that were supposed to go into Threadripper/EPYC.

    It's good that using rcu_nocb actually fixed the idle crash on mine, as disabling C6 is a no-go for me (living in Mexico, I'm less than 5 kWh away from the DAC tariff, where electricity_bill *= 4).

  • angrypie
    replied
    Originally posted by slacka View Post
    Holy fuck why don't you read the thread before spewing garbage? It was only early CPUs that are affected [...]
    No, it's not.

    Don't get your feelings hurt, son, it's just a piece of sand.

  • soulsource
    replied
    Originally posted by JPFSanders View Post

    Can you check how you did that and let me know, please?

    I have the idling problem. What I do to prevent the computer from freezing is run an emulator session in the background to keep the computer busy; that way it never crashes. (I'm using kernel 4.14.14 ATM.)
    On my mainboard (MSI B350M Mortar) the setting can be found under Overclocking->CPU Features->Core C6 State.
