Torvalds Blasts "Beyond Stupid" Flushing L1d On Context Switches - Reverts Code For Now

Written by Michael Larabel in Linux Kernel on 1 June 2020 at 06:03 PM EDT. 51 Comments

Among the initial set of changes merged today for Linux 5.8 was the x86/mm material, which included the controversial feature of opt-in flushing of the L1 data cache on context switches. Linus Torvalds has since decided to revert this functionality because, for now at least, he views it as crazy.

The feature is opt-in via new prctl options and is not enabled by default. It was done in the name of helping those concerned about snoop-assisted data sampling vulnerabilities, cache leakage via side channels, and yet-to-be-uncovered CPU vulnerabilities. For the time being, though, Linux creator Linus Torvalds is not convinced.

Here are the highlights of the commentary he just posted to the kernel mailing list about this change in the x86/mm PR:

Am I mis-reading this?

Because it looks to me like this basically exports cache flushing instructions to user space, and gives processes a way to just say "slow down anybody else I schedule with too".

I don't see a way for a system admin to say "this is stupid, don't do it".

In other words, from what I can tell, this takes the crazy "Intel ships buggy CPU's and it causes problems for virtualization" code (which I didn't much care about), and turns it into "anybody can opt in to this disease, and now it affects even people and CPU's that don't need it and configurations where it's completely pointless".

To make matters worse, it has that SW flushing fallback that isn't even architectural from what I remember of the last time it was discussed, but most certainly will waste a lot of time going through the motions that may or may not flush the L1D after all.

I don't want some application to go "Oh, I'm _soo_ special and pretty and such a delicate flower, that I want to flush the L1D on every task switch, regardless of what CPU I am on, and regardless of whether there are errata or not".

Because that app isn't just slowing down itself, it's slowing down others too.

I have a hard time following whether this might all end up being predicated on the STIBP static branch conditionals and might thus at least be limited only to CPU's that have the problem in the first place.

But I ended up unpulling it because I can't figure that out, and the explanations in the commits don't clarify (and do imply that it's regardless of any other errata, since it's for "undiscovered future errata").

Because I don't want a random "I can make the kernel do stupid things" flag for people to opt into. I think it needs a double opt-in.

At a _minimum_, SMT being enabled should disable this kind of crazy pseudo-security entirely, since it is completely pointless in that situation. Scheduling simply isn't a synchronization point with SMT on, so saying "sure, I'll flush the L1 at context switch" is beyond stupid.

I do not want the kernel to do things that seem to be "beyond stupid".

Because I really think this is just PR and pseudo-security, and I think there's a real cost in making people think "oh, I'm so special that I should enable this".

I'm more than happy to be educated on why I'm wrong, but for now I'm unpulling it for lack of data.

Maybe it never happens on SMT because of all those subtle static branch rules, but I'd really like to that to be explained.

We'll see if the Amazon engineer responsible for the original patch work, the others involved in reviewing the code, or Intel's open-source team have enough justification to get this code back into the mainline Linux kernel.
