Linux Fixes Hosts Randomly Rebooting During Virtualization With Ryzen 7000/8000 CPUs


  • Espionage724
    replied
    Originally posted by Forge View Post

    Zen 4 aka Ryzen 7000/8000 does not support VMLOAD/VMSAVE. Never has. The hardware is there, because it's shared silicon with EPYC 4004, but it's not supposed to be enabled/advertised. It is, due to motherboard OEMs just enabling everything and shipping. Easy fix kernel side, means you don't need a firmware update.
    So, AMD being cheap and leaving vendors to be indirectly incompetent, along with end-users having the results after paying for all the hardware involved? I'm still not seeing a chain of confidence

    What is newer hardware doing re-using old hardware and cutting out a feature on the old hardware? And why would someone buy that?


  • theriddick
    replied
    I think I've experienced this a few times and was like WTF... thought I was experiencing some sort of PSU fault!

    I've got the VM flags enabled in the BIOS and don't currently use a VM, but I've experienced this even when no guest VM is running, so perhaps it also happens when support is enabled but unused?


  • Forge
    replied
    Originally posted by Espionage724 View Post


    It's sounding pretty hardware to me if VM on an AMD CPU is causing a host-side reboot, especially if it's a bug/not correct behavior

    The whole point of userspace or whatever from Windows XP -> Vista and other OSs is to not have software crashing hardware. Software being able to trigger hardware to cause a reboot sounds pretty busted and I haven't heard of anything comparable on Intel yet.

    With that above code patch; wtf does that even mean? That sounds like a platform-specific fix that's implying AMD allows their CPUs alongside their own AGESA updates to somehow exist on broken hardware. Yeah that's not inspiring confidence in a chain of stability
    Zen 4 aka Ryzen 7000/8000 does not support VMLOAD/VMSAVE. Never has. The hardware is there, because it's shared silicon with EPYC 4004, but it's not supposed to be enabled/advertised. It is, due to motherboard OEMs just enabling everything and shipping. Easy fix kernel side, means you don't need a firmware update.

    In a more perfect world, we'd probably be able to dump out some supporting microcode/firmware enablement from a board that properly supports EPYC 4004 and enable VMLOAD/VMSAVE for everyone on Zen4, but it's really not worth the effort.

    I was affected by this and replaced quite a bit of hardware trying to diagnose it. Unlike some, I won't be making any melodramatic edicts about it.


  • Espionage724
    replied
    Originally posted by dayone View Post
    This post is about nested VMs, aka VMs running in VMs. It's a software issue and not a hardware issue like with Intel CPUs.

    It's sounding pretty hardware to me if VM on an AMD CPU is causing a host-side reboot, especially if it's a bug/not correct behavior

    The whole point of userspace or whatever from Windows XP -> Vista and other OSs is to not have software crashing hardware. Software being able to trigger hardware to cause a reboot sounds pretty busted and I haven't heard of anything comparable on Intel yet.

    With that above code patch; wtf does that even mean? That sounds like a platform-specific fix that's implying AMD allows their CPUs alongside their own AGESA updates to somehow exist on broken hardware. Yeah that's not inspiring confidence in a chain of stability
    Last edited by Espionage724; 17 November 2024, 06:54 PM.


  • yump
    replied
    Wait, Epyc 4004? I thought that was just client AM5 Ryzen 7000 with a different part number.

    Patch is:

    Code:
    +    /*
    +     * These Zen4 SoCs advertise support for virtualized VMLOAD/VMSAVE
    +     * in some BIOS versions but they can lead to random host reboots.
    +     */
    +    switch (c->x86_model) {
    +    case 0x18 ... 0x1f:
    +    case 0x60 ... 0x7f:
    +        clear_cpu_cap(c, X86_FEATURE_V_VMSAVE_VMLOAD);
    +        break;
    +    }
    +    break;
    Anybody with an actual Epyc 4004 feel like checking?
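
For anyone who does want to check a box, here is a minimal sketch of the two tests involved, not the kernel's actual code. It assumes the quoted switch sits under a family 0x19 case (the snippet only shows the model ranges), and the helper names `cpu_family`, `cpu_model`, `zen4_model_is_quirked` and `flags_has` are made up for illustration. The first three decode a raw CPUID leaf 1 EAX value and apply the same model ranges as the patch; the last looks for a whole-word flag such as `v_vmsave_vmload` in a `flags` line copied from `/proc/cpuinfo`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Display family from CPUID leaf 1 EAX: base family, plus the
 * extended family field when the base family is 0xf. */
static unsigned cpu_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xf;
    unsigned ext = (eax >> 20) & 0xff;

    return base == 0xf ? base + ext : base;
}

/* Display model: the extended model supplies the high nibble when
 * the base family is 0xf (or 0x6 on Intel parts). */
static unsigned cpu_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned model = (eax >> 4) & 0xf;
    unsigned ext = (eax >> 16) & 0xf;

    if (base_family == 0xf || base_family == 0x6)
        return (ext << 4) | model;
    return model;
}

/* Same model ranges as the quoted patch. The family 0x19 check is an
 * assumption about the surrounding kernel code, not part of the snippet. */
static bool zen4_model_is_quirked(uint32_t eax)
{
    unsigned model;

    if (cpu_family(eax) != 0x19)
        return false;
    model = cpu_model(eax);
    return (model >= 0x18 && model <= 0x1f) ||
           (model >= 0x60 && model <= 0x7f);
}

/* True if `name` appears as a whole whitespace-delimited word in `flags`. */
static bool flags_has(const char *flags, const char *name)
{
    size_t n = strlen(name);
    const char *p = flags;

    if (n == 0)
        return false;
    while ((p = strstr(p, name)) != NULL) {
        bool start_ok = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

        if (start_ok && end_ok)
            return true;
        p += n;
    }
    return false;
}
```

Feed `zen4_model_is_quirked()` the EAX value from CPUID leaf 1 and `flags_has()` the flags line from `/proc/cpuinfo`. On a kernel carrying the patch, an affected model should no longer list `v_vmsave_vmload`, since `clear_cpu_cap()` removes the capability before it is reported.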


  • MastaG
    replied
    Just settle down with 6.12 and all of your troubles belong to the past.


  • WannaBeOCer
    replied
    Originally posted by dayone View Post

    That was an issue with early Ryzen 1000 CPUs; you should've RMA'd your CPU like everyone else at that time.

    This post is about nested VMs, aka VMs running in VMs. It's a software issue and not a hardware issue like with Intel CPUs.
    Intel's issue is also a software issue, just like AMD's Ryzen 7000X3D chips, which were overvolted by their microcode and fried instantly. Unlike AMD's, Intel's was a slow roast.


  • caligula
    replied
    Originally posted by sophisticles View Post
    I can confirm this behavior with Ryzen 1600 running different distros bare metal and doing long encodes that last hours.

    Sometimes the system can complete the work, sometimes after hours of going strong it will just shut down as if it got tired of working.
    First-gen Ryzen had some stability issues in idle mode. There were fixes. Some had HW bugs, some could be fixed with a BIOS patch or a Linux service poking some port. I've used several generations of Ryzens since then without any hardware lockups. Typical uptime is 20+ days according to uprecords. Then it's time to update the kernel.
    Last edited by caligula; 18 November 2024, 07:44 AM.


  • domih
    replied
    Damn, another exercise in sophistry in a thread. What an idiotic post, attempting to equate a software bug to a hardware-damaging bug under the fallacious use of "stability". When you see this commenter's name, you know you're going to read something mostly deranged. Sometimes it is so blatant that it is funny. Most of the time, it's tragic.


  • intelfx
    replied
    Originally posted by sophisticles View Post
    Imagine all the ignoramuses that built systems around these AMD CPUs to run VMs only to find that they are not stable.
    Oh boy, someone is really salty.

    Originally posted by sophisticles View Post
    I can confirm this behavior with Ryzen 1600 running different distros bare metal and doing long encodes that last hours.
    It's not this behavior. You should really learn some reading comprehension skills one of these days.
