
**smitty3268** · 01 June 2020, 08:00 PM

Originally posted by eltomito

Shit! Linus didn't say shit or fuck even once in such a long post! He really did get brainwashed. Don't listen to him anymore, he's controlled by the alien brainwashers!

But wait, we all know he can't reject patches or provide technical arguments against including code now that the CoC exists. It must be an alien lizard who's taken his place and is impersonating him.

**skeevy420** · 01 June 2020, 08:03 PM

Originally posted by eltomito

Shit! Linus didn't say shit or fuck even once in such a long post! He really did get brainwashed. Don't listen to him anymore, he's controlled by the alien brainwashers!

Am I mis-reading this shit?

Because it looks to me like this basically exports cache flushing instructions to user space, and gives processes a way to just say "slow down anybody else I schedule with too".

I don't see a way for a system admin to say "this is fucking stupid, don't do it".

In other words, from what I can tell, this takes the crazy "Intel ships buggy CPU's and it causes problems for virtualization" code (which I didn't much care about), and turns it into "anybody can opt in to this disease, and now it affects even people and CPU's that don't need it and configurations where it's completely pointless".

To make matters worse, it has that SW flushing fallback that isn't even architectural from what I remember of the last time it was discussed, but most certainly will waste a lot of time going through the motions that may or may not flush the L1D after all.

I don't want some shit application to go "Oh, I'm _soo_ fucking special and pretty and such a delicate flower, that I want to flush the L1D on every task switch, regardless of what CPU I am on, and regardless of whether there are errata or not".

Because that app isn't just slowing down itself, it's slowing down others too. Fuck that shit.

I have a hard time following whether this might all end up being predicated on the STIBP static branch conditionals and might thus at least be limited only to CPU's that have the damn problem in the first place.

But I ended up unpulling it because I can't figure that out, and the explanations in the commits don't clarify (and do imply that it's regardless of any other errata, since it's for "undiscovered future errata").

Because I don't want a random "I can make the kernel do stupid things" flag for people to opt into. I think it needs a double opt-in.

At a _minimum_, SMT being enabled should disable this kind of crazy pseudo-security entirely, since it is completely pointless in that situation. Scheduling simply isn't a synchronization point with SMT on, so saying "sure, I'll flush the L1 at context switch" is beyond stupid.
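The "double opt-in" and the SMT condition described above could be sketched roughly like this. This is only an illustration of the decision logic, not kernel code: `FlushPolicy`, `admin_allows`, `cpu_affected` and `should_flush_l1d` are all hypothetical names, and gating on `cpu_affected` is an assumption based on the complaint about CPUs that don't need the flush.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlushPolicy:
    """System-wide state (all field names are hypothetical)."""
    admin_allows: bool   # first opt-in: the admin has enabled the feature at all
    smt_active: bool     # sibling hyperthreads share the L1D
    cpu_affected: bool   # assumption: the CPU has a relevant erratum


def should_flush_l1d(task_opted_in: bool, policy: FlushPolicy) -> bool:
    """Return True only if flushing the L1D on context switch is justified."""
    if not policy.admin_allows:
        # The admin can say "don't do it", whatever the task asked for.
        return False
    if policy.smt_active:
        # With SMT on, a context switch is not a synchronization point,
        # so flushing here would cost time without protecting anything.
        return False
    if not policy.cpu_affected:
        return False
    # Second opt-in: the task itself has to request the flush.
    return task_opted_in


if __name__ == "__main__":
    policy = FlushPolicy(admin_allows=True, smt_active=True, cpu_affected=True)
    print(should_flush_l1d(True, policy))
```

The point of ordering the checks this way is that a task's request is the last thing consulted, so no single process can force the cost onto everyone else.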

I do not want the kernel to do shit that seems to be "beyond stupid".

Because I really think this is just PR and pseudo-security bullshit, and I think there's a real cost in making people think "oh, I'm so fucking special that I should enable this".

I'm more than happy to be educated on why I'm wrong, but for now I'm unpulling this shit for lack of data.

Maybe it never happens on SMT because of all those subtle static branch rules, but I'd really like that to be explained.

FTFY?

**hotaru** · 01 June 2020, 08:03 PM

Originally posted by KoenDG

If Linus's statements on performance check out, and I suspect they do, this would hurt performance everywhere, swamp countless developers with sudden bug tickets about bad performance, and tarnish Linux's reputation as a whole.

so now what we end up with is Linux having a reputation for not being secure. obviously the real solution here is to just drop support for all Intel CPUs, but since Linus won't do that, making the security mitigations optional and pushing decisions about the tradeoffs between security and performance onto the people who will have to deal with the consequences of those decisions seems like the only real way forward.

**nuetzel** · 01 June 2020, 09:46 PM

Good that Linus is on AMD, now...

**chithanh** · 01 June 2020, 09:48 PM

Originally posted by stormcrow

It exposes a CPU scheduling and caching decision, one better left in kernel space and to admins, to user space, where any particular user can slip in a flag that enables L1D flushing whether the CPU errata specify a need for it or not, because $reasons. $reasons need not even be valid. On a multiuser system that's untenable. It also appears to lack a flag that would let admins prevent it from ever being activated when it is not useful or is undesired.

Given that the patches come from Amazon, I think that gives some indication what could be $reason. Amazon wants to move customer VMs between hosts which may have different hardware/firmware mitigation status. Enabling unconditional L1d flush would make vulnerable hosts slow for everything, so Amazon can just tell their customers to enable it for security-critical processes only.

**Kver** · 01 June 2020, 09:51 PM

Originally posted by Danny3

But what about wallets, encryption programs that must be very secure?
Wouldn't this be helpful for those?

No. It would be like nailing the door shut while the window is open because you're afraid of flying shark attacks.

If anything, use of this function could potentially make programs using it *more vulnerable* in a twisted way. Here's how (in a brutally simplified example):

I'm a bad-acting program and I want to read your super secret information, and I'm using the L1 cache to do it. I need to keep reading data until I can confirm that I'm reading from the L1 cache. I do this by timing how long each fetch takes: the L1 cache is fast, so if a read completes quickly, I'm fetching L1 data. I grab as much of the cache as I can and just start watching. Let's say a wallet program is using this new cache flushing API, has just put some sensitive information in that cache, and wants to flush it out. Here's where the stupidity comes in: by flushing the entire cache, the wallet has just flagged to everyone that sensitive data was there. If the bad-acting program was monitoring and recording the L1 cache, it could see that the cache was flushed (everything suddenly slowed down) and know it just hit the jackpot. Doubly so if it kept its own "known" memory primed: if its own data was wiped, then something definitely flushed the cache.
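To make the "flush as a signal" idea concrete, here is a toy simulation. It is a deliberately simplified model, not a working attack: `ToyL1Cache`, `probe` and `flush_observed` are made-up names, a "slow" access is just a set lookup that misses, and real caches (sets, ways, eviction, timing noise) are not modelled at all.

```python
class ToyL1Cache:
    """Toy cache: remembers which addresses are resident (no capacity limit)."""

    def __init__(self):
        self.lines = set()

    def access(self, addr: int) -> bool:
        """Touch an address. Returns True for a hit ("fast"), False for a miss ("slow")."""
        hit = addr in self.lines
        self.lines.add(addr)  # after any access the line is resident
        return hit

    def flush(self) -> None:
        """Drop everything, like a full L1D flush."""
        self.lines.clear()


def probe(cache: ToyL1Cache, addrs) -> int:
    """Access every address and count the slow ones (misses)."""
    return sum(1 for a in addrs if not cache.access(a))


def flush_observed(cache: ToyL1Cache, primed_addrs) -> bool:
    """The observer's heuristic: if ALL of its primed lines went slow at once,
    something flushed the whole cache."""
    addrs = list(primed_addrs)
    return probe(cache, addrs) == len(addrs)


if __name__ == "__main__":
    cache = ToyL1Cache()
    mine = range(8)                 # the observer's own "known" memory
    probe(cache, mine)              # prime: bring the observer's lines into the cache
    cache.access(1000)              # the victim touches something sensitive
    cache.flush()                   # ...and then flushes the whole cache
    print(flush_observed(cache, mine))
```

In this model the observer learns nothing about the data itself, only that a flush happened, which is exactly the "you just flagged that something sensitive was there" point.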

There are problems with this attack; this isn't particularly reliable. Certainly less reliable than the current crop of bugs. But it's still vulnerable, and for certain applications which continually encrypt data the attack can still work. E.g. you can steal HTTPS encryption fairly readily. Now, you could also randomly (or regularly) flush the cache, but at that point you're literally destroying performance for a sysadmin-level threat using marketing-level logic. Also, attacks of this level are usually done in tandem with other vulnerabilities, so there may be methods to make it more reliable.

The fix on this one is with hardware mitigation. Sadly, it does mean that damn near every server today is going to be somewhat vulnerable to these classes of threats forever, but it doesn't mean we throw out nuance. In general, kernel-level security should use kernel-level solutions; if you go higher than that you start giving away userland-level behaviors.

**ryao** · 01 June 2020, 11:17 PM

Originally posted by hotaru

so now what we end up with is Linux having a reputation for not being secure. obviously the real solution here is to just drop support for all Intel CPUs, but since Linus won't do that, making the security mitigations optional and pushing decisions about the tradeoffs between security and performance onto the people who will have to deal with the consequences of those decisions seems like the only real way forward.

If this code were merged, I guarantee someone would point out that it permits a DoS attack on multiuser systems. There is no winning when it comes to this.

**ryao** · 01 June 2020, 11:24 PM

Originally posted by chithanh

Given that the patches come from Amazon, I think that gives some indication what could be $reason. Amazon wants to move customer VMs between hosts which may have different hardware/firmware mitigation status. Enabling unconditional L1d flush would make vulnerable hosts slow for everything, so Amazon can just tell their customers to enable it for security-critical processes only.

That makes sense only if SMT is disabled, but if that is the case, then Amazon should address that directly instead of putting a hack into Linux to work around it. This hack is a potential DoS issue for multiuser systems.

**yoshi314** · 01 June 2020, 11:32 PM

Originally posted by hotaru

so now what we end up with is Linux having a reputation for not being secure. obviously the real solution here is to just drop support for all Intel CPUs, but since Linus won't do that, making the security mitigations optional and pushing decisions about the tradeoffs between security and performance onto the people who will have to deal with the consequences of those decisions seems like the only real way forward.

i think what we end up with is linux having a sane developer at the lead. the patch as it is can be misused in so many ways, just read the thread here.

you can drop intel cpu's in your dreams. companies still keep buying those because "nobody got fired for getting intel".

**duby229** · 01 June 2020, 11:47 PM

Originally posted by Templar82

While Linus certainly knows more about this kind of "deep" CPU stuff than me, this seems like a bit of an overreaction for something that will be optional.

He's totally correct though. Flushing the L1D isn't gonna just slow down the app that initiated it, it's gonna slow down -everything-. And you know damn well some douchebag developing some kind of wallet or something like that is gonna force this on.

Torvalds Blasts "Beyond Stupid" Flushing L1d On Context Switches - Reverts Code For Now
