
Disabling Spectre V2 Mitigations Is What Can Impair AMD Ryzen 7000 Series Performance


  • F.Ultra
    replied
    Originally posted by Developer12 View Post

    Do you know and understand the ways Spectre v2 is mitigated? I said *IBPB*, which is distinct from both IBRS and retpoline. AMD chips don't even possess IBRS.

    *IBPB* is issued during context switches to prevent past branches from affecting future predictions. That's its entire purpose. [1]

    In this test IBPB was enabled during the "mitigations enabled" scenario, though selectively applied, and completely disabled during the no-mitigations run.

    [1] This is particularly important on Windows, because they can't just recompile the world to use retpolines on AMD hardware. People always use old versions of software, and issuing IBPB on every context switch protects *all* applications regardless of whether they've been recompiled.
    Zen 4 does have IBRS; it's enabled automatically when you enter ring 0 and disabled once you exit. Spectre V2 can also be exploited by another process running on a sibling logical processor, so only applying the mitigation when scheduling processes is not enough to mitigate it. Also, Michael runs a single process for each benchmark, so the number of context switches due to scheduling should be quite low (though he doesn't pin threads to cores, so some scheduling does happen within the same process, of course).

    IBPB is used to protect against the scheduling issue, as you write, but are there enough such cases in benchmarks that run a single process at a time to create this type of overhead? I'm unsure whether IBPB can be disabled while keeping the retpolines/IBRS, but if it can, it would be interesting to see such a run to figure this out, because if AMD doesn't do a real barrier with IBPB then things can get real ugly here, and it would be a strange path for them to take.

    edit: I also fail to see how this would benefit them in mitigations=on vs. off, since these are benchmark runs, i.e. the entire machine only runs a single application. There would be no benefit from "oh, this is a new application, so let's retrain", since it's the same application, and retraining from scratch is what every CPU has to do after IBPB anyway. So I still fail to see how this could explain it.
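
    For anyone who wants to check what their own kernel reports, here is a minimal sketch in C that just reads the standard sysfs vulnerabilities entry; the exact string (for example whether IBPB is listed as "conditional" or "always-on") depends on kernel version and hardware:

        /* Minimal sketch: print the kernel's reported Spectre V2 mitigation state,
         * which includes how IBPB is being applied. Assumes the standard Linux
         * sysfs vulnerabilities interface is present. */
        #include <stdio.h>

        int main(void)
        {
            const char *path = "/sys/devices/system/cpu/vulnerabilities/spectre_v2";
            char line[256];
            FILE *f = fopen(path, "r");

            if (!f) {
                perror(path);
                return 1;
            }
            if (fgets(line, sizeof(line), f))
                /* e.g. "Mitigation: Retpolines, IBPB: conditional, ..." */
                printf("spectre_v2: %s", line);
            fclose(f);
            return 0;
        }

    If I remember correctly, spectre_v2_user= is the boot parameter that controls the user-to-user IBPB/STIBP part separately from retpolines/IBRS, so a run with spectre_v2_user=off and everything else left on should answer exactly that question.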
    Last edited by F.Ultra; 05 October 2022, 08:44 PM.



  • Developer12
    replied
    Originally posted by F.Ultra View Post

    The problem with this theory is that retpolines and IBRS are not used when switching between applications; they are used when you make indirect calls. Thousands, if not millions, of those can and will be made within the same application context/thread.
    Do you know and understand the ways Spectre v2 is mitigated? I said *IBPB*, which is distinct from both IBRS and retpoline. AMD chips don't even possess IBRS.

    *IBPB* is issued during context switches to prevent past branches from affecting future predictions. That's its entire purpose. [1]

    In this test IBPB was enabled during the "mitigations enabled" scenario, though selectively applied, and completely disabled during the no-mitigations run.

    [1] This is particularly important on Windows, because they can't just recompile the world to use retpolines on AMD hardware. People always use old versions of software, and issuing IBPB on every context switch protects *all* applications regardless of whether they've been recompiled.
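
    To make that concrete, a minimal sketch of what "issuing IBPB" boils down to: one privileged write to the IA32_PRED_CMD MSR with the IBPB bit set. The helper below is written out by hand for illustration; the real kernel wraps this in its own MSR helpers and only issues it conditionally, depending on which tasks are being switched.

        /* Sketch: an IBPB barrier is a single WRMSR to IA32_PRED_CMD (0x49) with
         * bit 0 set. WRMSR is privileged, so this can only execute in ring 0. */
        #define MSR_IA32_PRED_CMD  0x00000049U
        #define PRED_CMD_IBPB      (1ULL << 0)   /* flush indirect branch predictions */

        static inline void wrmsr64(unsigned int msr, unsigned long long val)
        {
            /* WRMSR takes the MSR index in ECX and the 64-bit value in EDX:EAX. */
            __asm__ __volatile__("wrmsr" : :
                                 "c"(msr),
                                 "a"((unsigned int)val),
                                 "d"((unsigned int)(val >> 32)));
        }

        static inline void issue_ibpb(void)
        {
            wrmsr64(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
        }

    The write itself is cheap; the cost being argued about in this thread is the predictor training that gets thrown away and has to be rebuilt afterwards.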



  • F.Ultra
    replied
    Originally posted by ll1025 View Post
    Linus himself rejected the original retbleed patches back in 2018
    Retbleed was first announced in 2022; did you mean some other patches? The only one I can recall drawing criticism from Linus was the one from Amazon for the snoop vulnerability, and even that was just criticism; he didn't reject it.



  • Anux
    replied
    Originally posted by ll1025 View Post
    The discussions on past articles have had plenty of "we don't need no mitigations" comments, and Linus himself rejected the original retbleed patches back in 2018 due to both performance concerns and some belief that academic security vulnerabilities are unimportant. Heck, the very next comment after mine by skeevy420 expressed the very same sentiment, as did others here (Weasel, rclark).
    Yep, these are some of the names that regularly pop up in Rust threads and other security-related threads with their specific view on things. I would never have thought that they represent the Linux enthusiast community. Did they ever claim to? Do they even know of each other as being part of the same community? Or are you just putting them in your tiny box to put a label on it and generalize your prejudices?

    Linus is only one individual in the Linux community, and he regularly gets criticism for what he says.

    Pretending that this is not a common view is just disingenuous.
    Do you know the meaning of "common view"? Maybe after reading up on it you'll get why that's incompatible with listing four individuals.

    It would be much better if we discussed what was said and not who said it.



  • ll1025
    replied
    Originally posted by Anux View Post
    Since when does Espionage724 represent the whole Linux enthusiast community? Do you even logic or just like to troll?
    The discussions on past articles have had plenty of "we don't need no mitigations" comments, and Linus himself rejected the original retbleed patches back in 2018 due to both performance concerns and some belief that academic security vulnerabilities are unimportant. Heck, the very next comment after mine by skeevy420 expressed the very same sentiment, as did others here (Weasel, rclark).

    Pretending that this is not a common view is just disingenuous.



  • skeevy420
    replied
    Originally posted by ll1025 View Post

    The overwhelming majority of people making those comments are using internet-connected PCs, and are commenting about internet-connected PCs.

    A graphics rendering farm behind "layers of firewalls" is still connected to networks, runs code that will be poorly vetted, is vulnerable to insider threats, and... makes a rather valuable target, as you could learn by asking the major studios that have been hacked.

    APTs these days leverage a vulnerability on some foothold and then pivot internally. Maybe you use some DNS or ARP attacks from the DMZ to get at that render farm; however you do it, the fact that it's two degrees of separation from the internet is just an attack implementation detail, not some panacea.

    Security is done in layers because breach is inevitable. Even air-gapped organizations have been compromised in the past, and that gets a lot easier with the overconfident sense of invulnerability that seems to have spread across the community.
    Can't say that I disagree with that. Unfortunately, networks are controlled by humans, and those humans listen to capitalistic humans, which means they'll have everyone do the bare minimum to get things running because spending money on security is seen as a waste of profits; or they're just 1337, I mean, dumb, and think they know better than everyone else. Forcing security by default is a way to protect the greedy and the stupid from themselves.

    Still, though, I can also recognize the need to run at 100% performance with no mitigating factors slowing things down, regardless of the risk; especially for machines with no internet access, ones running scientific studies, etc. There's an ironic joke in all of this about a supercomputer or PC group running environmental studies using 20% more energy due to mitigations.



  • ll1025
    replied
    Originally posted by skeevy420 View Post

    Dammit. Schrödinger's Comment. While I agree with the sentiment, not every PC is internet-facing or needs that kind of security. Something like a massive video encoding or graphics rendering farm behind layers of firewalls and security doesn't need or want mitigations enabled, or, rather, didn't traditionally need or want them enabled.
    The overwhelming majority of people making those comments are using internet-connected PCs, and are commenting about internet-connected PCs.

    A graphics rendering farm behind "layers of firewalls" is still connected to networks, runs code that will be poorly vetted, is vulnerable to insider threats, and... makes a rather valuable target, as you could learn by asking the major studios that have been hacked.

    APTs these days leverage a vulnerability on some foothold and then pivot internally. Maybe you use some DNS or ARP attacks from the DMZ to get at that render farm; however you do it, the fact that it's two degrees of separation from the internet is just an attack implementation detail, not some panacea.

    Security is done in layers because breach is inevitable. Even air-gapped organizations have been compromised in the past, and that gets a lot easier with the overconfident sense of invulnerability that seems to have spread across the community.



  • F.Ultra
    replied
    Originally posted by Developer12 View Post
    Ah, ok. This makes perfect sense.

    For the people commenting here who don't know how spectre v2 works: the branch predictor is trained to predict which way code will go, so the CPU can work ahead down that path. Unfortunately, it can be mis-trained to predict a jump to a desired part of the kernel or another application, running code there which leaks data. The mitigation is to wipe all the accumulated training from the predictor every time you switch contexts. You can't really fix this in hardware 100%: the CPU needs to be *told* when you're switching from one context to another.

    It's clear what the AMD engineers have done: they're using these "erasures" to their advantage. Every time one is issued it's also a signal to the CPU that you've switched between applications, and so any predictions made with current training are probably wrong. Taking this into account, they've streamlined the process of dumping and retraining. If I were them, I'd even have implemented a cache, so that when I'm told to "wipe" the predictions, I actually *store* them along with probably a tag indicating where they occurred in the address space. Then when I'm told to wipe the predictions again I can look through the cache and try to find a set of trainings that match the current context I've just switched into.

    Suddenly you don't have to re-train the CPU's predictor every time you jump from one process to another. The OS is conveniently telling the CPU that it needs to save and restore the predictor state, so you come back to each app with the saved predictor state ready for action. It's merely a convenient side-effect that this also mitigates the vulnerability. This kinda turns the IBPB (Indirect Branch Prediction Barrier) instruction into a sort of IBSR (Indirect Branch Save and Restore, not to be confused with Indirect Branch Restricted Speculation).

    I suppose you could potentially run into vulnerabilities with cache collisions or with the OS moving one app into the space of another app, but the probability is probably(?) small. I suppose they could just add an extra knob in the microcode that allows the OS to say "hey, I'm moving this app, re-tag your training" or "hey, I'm terminating this app, forget your training."

    I suppose the only way to make this better would be to allow the microcode to write to your hard drive. :P Then it could save the predictor state there and persist it across reboots. It could even be provided by the compiler or from profiling tools so your CPU would never have to learn it itself. Probably not worth the absolutely minuscule increase in performance though. In fact just reading the filesystem might take longer than the initial training.
    The problem with this theory is that retpolines and IBRS are not used when switching between applications; they are used when you make indirect calls. Thousands, if not millions, of those can and will be made within the same application context/thread.
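
    A minimal sketch of what that means in practice (the function-pointer table is made up for illustration; the flags named in the comment are GCC's -mindirect-branch=thunk and Clang's -mretpoline):

        /* Every call through a function pointer is an indirect branch. With
         * retpolines enabled at build time (gcc -mindirect-branch=thunk,
         * clang -mretpoline) the compiler replaces each such call with a thunk
         * that avoids the indirect branch predictor, so the cost is paid at
         * every call site, not once per context switch like an IBPB. */
        #include <stdio.h>

        typedef int (*op_fn)(int, int);   /* hypothetical operation table */

        static int add(int a, int b) { return a + b; }
        static int mul(int a, int b) { return a * b; }

        int main(void)
        {
            op_fn ops[] = { add, mul };
            long acc = 0;

            for (int i = 0; i < 1000000; i++)
                acc += ops[i & 1](i, 2);  /* one indirect call per iteration */

            printf("%ld\n", acc);
            return 0;
        }

    Whether the per-call retpoline cost or the per-switch IBPB cost dominates depends entirely on the workload, which is what this sub-thread is really about.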



  • F.Ultra
    replied
    Originally posted by Luke View Post

    Takes lots of good old-fashioned spy work to first get Putin to order a machine rather than have someone buy one off a shelf randomly with cash, then find out whatever fake name it is ordered under, intercept the shipment (possibly from a warehouse already in Russia), and get the custom CPU, custom-flashed firmware, or whatever they use installed. Kudos to anyone who managed to pull off all of that. The best way in might not be Putin's box at all, but rather one he talks to in the Russian Embassy in an easier country to access.

    Supposedly the Chinese Embassy in either Washington or a pro-US country elsewhere was stupid enough to order computers for delivery; naturally they got some "customizations" during a pit stop on the way. That does not work against an adversary buying off the shelf with cash. While Putin must have a computer SOMEWHERE, word is he has a manual typewriter in his main office.
    Putin, or rather the Kremlin, does not buy stuff under a false name; no big government does. Also, no one is dumb enough to target Putin directly for something like this; you target a department, and those buy machines in the hundreds or thousands. Yes, it requires spy work, but that is what the Yanks have the NSA and CIA for. You don't mandate that Intel and AMD put backdoors into their CPUs in the West to target Putin when he and the Kremlin simply run sensitive stuff on Baikal (or other Russian-developed platforms) instead.

    Targeting him via Intel/AMD is a closed path, so it doesn't matter how much spying you have to do; you have to do it if you want the access.



  • Developer12
    replied
    Ah, ok. This makes perfect sense.

    For the people commenting here who don't know how spectre v2 works: the branch predictor is trained to predict which way code will go, so the CPU can work ahead down that path. Unfortunately, it can be mis-trained to predict a jump to a desired part of the kernel or another application, running code there which leaks data. The mitigation is to wipe all the accumulated training from the predictor every time you switch contexts. You can't really fix this in hardware 100%: the CPU needs to be *told* when you're switching from one context to another.

    It's clear what the AMD engineers have done: they're using these "erasures" to their advantage. Every time one is issued it's also a signal to the CPU that you've switched between applications, and so any predictions made with current training are probably wrong. Taking this into account, they've streamlined the process of dumping and retraining. If I were them, I'd even have implemented a cache, so that when I'm told to "wipe" the predictions, I actually *store* them along with probably a tag indicating where they occurred in the address space. Then when I'm told to wipe the predictions again I can look through the cache and try to find a set of trainings that match the current context I've just switched into.

    Suddenly you don't have to re-train the CPU's predictor every time you jump from one process to another. The OS is conveniently telling the CPU that it needs to save and restore the predictor state, so you come back to each app with the saved predictor state ready for action. It's merely a convenient side-effect that this also mitigates the vulnerability. This kinda turns the IBPB (Indirect Branch Prediction Barrier) instruction into a sort of IBSR (Indirect Branch Save and Restore, not to be confused with Indirect Branch Restricted Speculation).

    I suppose you could potentially run into vulnerabilities with cache collisions or with the OS moving one app into the space of another app, but the probability is probably(?) small. I suppose they could just add an extra knob in the microcode that allows the OS to say "hey, I'm moving this app, re-tag your training" or "hey, I'm terminating this app, forget your training."

    I suppose the only way to make this better would be to allow the microcode to write to your hard drive. :P Then it could save the predictor state there and persist it across reboots. It could even be provided by the compiler or from profiling tools so your CPU would never have to learn it itself. Probably not worth the absolutely minuscule increase in performance though. In fact just reading the filesystem might take longer than the initial training.
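
    Purely to illustrate the save/restore idea above, a toy sketch in C; nothing here reflects real AMD hardware, and the structure, tags and slot count are invented for the example:

        /* Hypothetical "IBSR" behaviour: on each barrier, stash the outgoing
         * predictor state tagged by address space and restore a matching entry
         * if one exists. Real predictor state is not architecturally visible,
         * let alone saveable by software -- this is just a model of the idea. */
        #include <string.h>

        #define SLOTS 8

        struct predictor_state { unsigned char training[256]; };   /* opaque blob */

        struct cached_entry {
            unsigned long asid;              /* tag: which address space trained it */
            struct predictor_state state;
            int valid;
        };

        static struct cached_entry cache[SLOTS];

        /* Called where the hardware would otherwise just wipe its training. */
        void barrier_save_restore(unsigned long prev_asid, unsigned long next_asid,
                                  struct predictor_state *live)
        {
            struct cached_entry *slot = &cache[prev_asid % SLOTS];

            slot->asid = prev_asid;          /* save the outgoing training */
            slot->state = *live;
            slot->valid = 1;

            slot = &cache[next_asid % SLOTS];
            if (slot->valid && slot->asid == next_asid)
                *live = slot->state;         /* restore a matching training set */
            else
                memset(live, 0, sizeof(*live));  /* no match: behave like a plain IBPB */
        }

    The "re-tag" and "forget" knobs mentioned above would then just become updates or invalidations of the asid tags in such a cache.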
    Last edited by Developer12; 04 October 2022, 10:11 PM.

