Linux Fix Pending For Borked Hibernation After Disabling Hyper Threading
If you have begun disabling Intel Hyper Threading on your systems over security concerns in light of MDS/Zombieload and other vulnerabilities making HT look increasingly unsafe, you may have noticed your system doesn't resume properly after hibernation. Fortunately, a fix is on the way.
More operating systems have been adding options or even on the BSD front considering a default around disabling Hyper Threading out of security concerns. On the Linux front HT/SMT is enabled by default but there is now the new convenient mitigations= option (granted also other ways to disable HT/SMT previously, now just bundled under the "mitigations" umbrella) and even with the case of openSUSE has added mitigations/HT options to their installer. If you've decided to disable Hyper Threading, it turns out resuming after hibernation would run into problems and likely just reboot the system rather than successfully resume.
That resume after hibernation issue when Hyper Threading is disabled is now figured out and a patch is pending for the mainline kernel and back-porting back through Linux 4.19.
The fix is bringing back up all the SMT threads during the resume process before offlining them again. The commit message below explains the peculiar issue in more detail.
More operating systems have been adding options or even on the BSD front considering a default around disabling Hyper Threading out of security concerns. On the Linux front HT/SMT is enabled by default but there is now the new convenient mitigations= option (granted also other ways to disable HT/SMT previously, now just bundled under the "mitigations" umbrella) and even with the case of openSUSE has added mitigations/HT options to their installer. If you've decided to disable Hyper Threading, it turns out resuming after hibernation would run into problems and likely just reboot the system rather than successfully resume.
That resume after hibernation issue when Hyper Threading is disabled is now figured out and a patch is pending for the mainline kernel and back-porting back through Linux 4.19.
The fix is bringing back up all the SMT threads during the resume process before offlining them again. The commit message below explains the peculiar issue in more detail.
We always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command-line, all the HT siblings are as a result sitting in mwait or cpudile after going through the online-offline cycle at least once.
This causes a serious issue though when a kernel, which saw 'nosmt' on its commandline, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on address which is no longer valid.
That results in triple fault shortly after cr3 is switched, and machine reboots.
Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetricaly, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait.
23 Comments