Linux's Speculation Handling Was Messed Up After Resuming From Suspend For Boot CPU
A rather interesting fix hit the mainline Linux Git tree today... It turns out that when resuming from S3 suspend, Linux wasn't correctly restoring the boot CPU's MSRs used for speculative execution mitigations.
The Linux kernel was not restoring speculation-related model-specific registers (MSRs) for the boot CPU when resuming from S3 suspend. These registers are important for mitigating speculative execution vulnerabilities, but unfortunately the x86 power code wasn't restoring their intended state on resume. In turn this could leave the boot CPU vulnerable after suspending and resuming the Linux system, though secondary CPUs were correctly restored. With only the boot CPU not being covered, the exposure is at least limited, but it's still surprising that the issue was only uncovered and addressed now in 2022 after all the attention these speculative execution vulnerability mitigations have received over the past few years.
The fix makes use of the existing Linux framework for saving/restoring extra MSRs around the suspend/resume cycle. The MSRs that were not handled correctly until this fix are MSR_IA32_SPEC_CTRL for speculation control, MSR_IA32_TSX_CTRL and MSR_TSX_FORCE_ABORT for the TSX Async Abort (TAA) vulnerability, MSR_IA32_MCU_OPT_CTRL for Special Register Buffer Data Sampling (SRBDS), which is related to the Microarchitectural Data Sampling (MDS) vulnerabilities, and MSR_AMD64_LS_CFG.
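To illustrate the idea, here is a toy Python model of the save-before-suspend, restore-on-resume pattern. It is only a sketch and not the kernel's C code: the ToyCpu class and the save_msrs/restore_msrs helpers are hypothetical names invented for this example, and the MSR addresses are the values as I recall them from the kernel's msr-index.h header.

```python
# Toy model only: the real logic lives in the kernel's x86 power code (C).
# MSR addresses below are assumed from the kernel's msr-index.h.
SPEC_MSRS = {
    "MSR_IA32_SPEC_CTRL": 0x48,
    "MSR_TSX_FORCE_ABORT": 0x10F,
    "MSR_IA32_TSX_CTRL": 0x122,
    "MSR_IA32_MCU_OPT_CTRL": 0x123,
    "MSR_AMD64_LS_CFG": 0xC0011020,
}


class ToyCpu:
    """Hypothetical CPU that only implements a subset of MSRs."""

    def __init__(self, supported):
        self.msrs = {addr: 0 for addr in supported}

    def rdmsr(self, addr):
        if addr not in self.msrs:
            raise KeyError(f"MSR {addr:#x} not implemented")
        return self.msrs[addr]

    def wrmsr(self, addr, value):
        if addr not in self.msrs:
            raise KeyError(f"MSR {addr:#x} not implemented")
        self.msrs[addr] = value

    def power_cycle(self):
        # Model S3: register contents are lost and come back as reset values.
        for addr in self.msrs:
            self.msrs[addr] = 0


def save_msrs(cpu, addrs):
    """Snapshot every listed MSR this CPU implements, skipping the rest."""
    saved = {}
    for addr in addrs:
        try:
            saved[addr] = cpu.rdmsr(addr)
        except KeyError:
            continue  # not present on this CPU model, nothing to save
    return saved


def restore_msrs(cpu, saved):
    """Write the snapshot back after resume."""
    for addr, value in saved.items():
        cpu.wrmsr(addr, value)


# Usage: a mitigation bit set before suspend survives the power cycle.
cpu = ToyCpu(supported=[0x48, 0x122])
cpu.wrmsr(SPEC_MSRS["MSR_IA32_SPEC_CTRL"], 0x1)
snapshot = save_msrs(cpu, SPEC_MSRS.values())
cpu.power_cycle()
restore_msrs(cpu, snapshot)
```

Without the restore step, the toy CPU would come back with its speculation control register at the reset value, which is the situation the boot CPU was in before this fix.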
The fix by Intel engineer Pawan Gupta landed in Linux 5.18 Git today and is also marked for back-porting to the currently supported Linux stable series.