With AMD Zen 4, It's Surprisingly Not Worthwhile Disabling CPU Security Mitigations


  • #21
    Originally posted by david-nk View Post

    That doesn't explain why browser performance is down by 40%. The browser shouldn't spend enough time in kernel mode for any kernel performance differences to make much of an impact. That definitely requires further investigation.
    I agree that the hardware probably expects mitigations in software and performs worse when they are not there.
    Could the browser performance maybe be explained by the browser itself doing mitigations in userspace if it detects that they are disabled in the kernel? 🤔



    • #22
      Originally posted by david-nk View Post

      That doesn't explain why browser performance is down by 40%. The browser shouldn't spend enough time in kernel mode for any kernel performance differences to make much of an impact. That definitely requires further investigation.
      How is browser performance down by 40%? Versus mitigations being off on the Selenium tests? Take that with a grain of salt before translating it to practical performance in Chrome, let alone Firefox.

      With browsers you have one nearly complete machine either interpreting or JIT-compiling untrusted code that it then passes on to the operating system. The OS then takes that and tells the hardware what to do. A lot of mitigations go into preventing that untrusted code from doing anything nefarious to the rest of the system, so it's not surprising there are operations common to all the browsers, the kernel, and the system libraries that AMD is optimizing at the processor level.

      When you know you're going to reach Z by path X, then it makes sense to review the path X takes in your hardware then optimize X's road surface as much as realistically possible. Known question always results in known answer via proven path. Proven path has certain inefficiencies. Remove inefficiencies. Proven path is then traversed more quickly, say 2 instructions rather than 4 or 5.

      In concept it's really not that different from going from a CPU that can only do addition to one with a dedicated path for multiplication. Both reach the same result, but the CPU with multiplication can do it in 1 instruction rather than N.
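To make the addition-versus-multiplication analogy concrete, here is a minimal Python sketch. The function names are made up for illustration, and counting Python-level operations is only a stand-in for counting CPU instructions.

```python
def multiply_by_repeated_addition(a, n):
    """Return (a * n, additions used) for n >= 0, using only addition."""
    total = 0
    additions = 0
    for _ in range(n):
        total += a       # one addition per loop iteration
        additions += 1
    return total, additions


def multiply_with_dedicated_path(a, n):
    """Return (a * n, operations used) when multiplication is a single operation."""
    return a * n, 1


if __name__ == "__main__":
    print(multiply_by_repeated_addition(7, 6))
    print(multiply_with_dedicated_path(7, 6))
```

Both functions return the same product; only the operation count differs, which is the point of the analogy.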

      While I think it would be worthwhile to find out exactly what the details are with the code paths that have been optimized, I don't find the result all that surprising given a 30k ft view of what's going on in browsers since Spectre was discovered.

      Like I said earlier, it's impressive work. I just wonder if the optimizations are robust, or if they're brittle - that is whether they break less easily or more easily given unexpected circumstances.
      Last edited by stormcrow; 30 September 2022, 05:12 PM.



      • #23
        Originally posted by ssokolow View Post

        Maybe they tuned the branch predictor to expect the patterns embodied by the mitigations. Mispredictions could explain it being worse when you turn them off.
        They did. They now calculate 2 predictions per cycle, which would easily help here when going to the safe branch, since it has already been calculated. They also increased the BTB size in both L1 and L2.



        • #24
          Could this baffling performance be at least in part because Linux or other software no longer expects to run without mitigations, and thus that option has become under-optimized?



          • #25
            Originally posted by JosiahBradley View Post

            They did. They now calculate 2 predictions per cycle, which would easily help here when going to the safe branch, since it has already been calculated. They also increased the BTB size in both L1 and L2.
            Isn't defeating the branch predictor the intended effect of the mitigations? If it's predicting through the mitigations, does that mean that the data leak has possibly been reintroduced?

            Also, this isn't really a case of mitigations having ~zero cost. It's a case of mitigations improving performance by large margins in several benchmarks and small margins in some more, and hurting performance by large margins in enough others that the average is close to zero. That needs to be explained.
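A quick illustration of how large swings in both directions can average out to roughly nothing. The ratios below are made-up placeholders, not the benchmark results from the article, and the geometric mean is assumed as the averaging method.

```python
from statistics import geometric_mean

# Hypothetical per-benchmark ratios (score with mitigations on / score with them off).
# A value above 1.0 means "mitigations on" was faster. These are NOT the article's numbers.
ratios = {
    "bench_a": 1.40,
    "bench_b": 1.10,
    "bench_c": 0.95,
    "bench_d": 0.70,
}


def overall(ratios_by_name):
    """Geometric mean of the per-benchmark ratios."""
    return geometric_mean(list(ratios_by_name.values()))


if __name__ == "__main__":
    # The overall figure lands close to 1.0 even though individual results swing widely.
    print(round(overall(ratios), 3))
```

An overall figure near 1.0 says nothing about the spread, which is why the individual outliers still need explaining.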

            I guess Selenium W.i is WASM imageConvolute, which is presumably some kind of streaming DSP workload, and the others with high margins are also DSP. They all sound like nearly pure userspace benchmarks, which have been largely insensitive to mitigations until now.

            Wild-ass guess: It's speculative store bypass disable. The affected workloads are bandwidth-bound at some level of the cache hierarchy, and incorrectly speculated loads waste slots in the bottleneck. If that's correct, maybe speculative store bypass was discovered before the design of this CPU was finalized, and the SSB-enabled case was not given very much die area/engineering effort.
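For anyone wanting to check the SSB guess on their own machine: recent Linux kernels report a per-process `Speculation_Store_Bypass` field in `/proc/self/status`. Below is a minimal sketch of reading it. The helper names are illustrative, and the field may be missing on older kernels or other operating systems.

```python
from typing import Optional


def parse_ssb_status(status_text: str) -> Optional[str]:
    """Extract the Speculation_Store_Bypass value from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Speculation_Store_Bypass":
            return value.strip()
    return None  # field not present


def current_process_ssb_status(path: str = "/proc/self/status") -> Optional[str]:
    """Read the SSB state for the current process, or None if unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_ssb_status(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(current_process_ssb_status())
```

Comparing the reported state between a `mitigations=off` boot and a default boot would at least show whether SSB handling differs between the two runs.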




            • #26
              Originally posted by yump View Post

              Isn't defeating the branch predictor the intended effect of the mitigations? If it's predicting through the mitigations, does that mean that the data leak has possibly been reintroduced?

              My thoughts exactly…

              Also, didn't this happen with an Intel processor before?



              • #27

                Like it or not, these mitigations are VERY strong signals to the CPU as to what the software will do next. And CPU designers looking for optimizations LOVE such signals.

                It's not that hard to imagine that AMD have tuned their design and made optimizations _expecting_ the usage patterns that they themselves recommend. If the CPU is tuned to *expect* a memory barrier at specific points and you don't issue one, then you do so at your detriment.

                If I were to hypothesize, one possibility is that issuing a memory barrier tells the CPU that it can forget about past state, and that frees up microarchitectural registers to prefetch the next bit of processing. A simple optimization *built on the assumption that mitigations=on.* It could also, ironically, be a very helpful signal to prediction and prefetch when done correctly.
                Last edited by Developer12; 30 September 2022, 09:05 PM.



                • #28
                  Originally posted by milkylainen View Post
                  Nope. Not believing that more code and worse code is going to do better than them removed.
                  Something is fishy here.
                  I agree, something just doesn't add up. The idea that AMD quietly built the mitigations into Zen 4 in such a way that disabling them at the software level (mitigations=off) hurts perf could be one possible answer. But if yet another variant or vulnerability were to come down the pipe, it would be an endless game of cat and mouse.
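For reference, whether a system was actually booted with `mitigations=off` can be read from the kernel command line. A minimal sketch follows, with hypothetical helper names. It assumes the last `mitigations=` occurrence wins and that `auto` is the default, and it ignores per-vulnerability switches such as `nospectre_v2` or `spec_store_bypass_disable=`.

```python
def mitigations_setting(cmdline: str) -> str:
    """Return the value of the mitigations= boot parameter ("auto" if absent)."""
    setting = "auto"  # assumed default when the parameter is not given
    for token in cmdline.split():
        if token.startswith("mitigations="):
            setting = token.split("=", 1)[1]  # assumption: a later occurrence overrides
    return setting


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the kernel command line, or an empty string if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(mitigations_setting(read_cmdline()))
```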

                  It's a crazy theory, but what if AMD has found that building mitigations into the CPU hardware turned out to be a more elegant solution than a software / OS solution a la mitigations=auto? Someone has got to reach out to AMD and satisfy all of our curiosity. We all agree that this makes zero sense, but the answer could be much simpler than we realize.

                  Of course, how will it compare to Intel's 13th gen? Will Intel have baked in the mitigations at the CPU level to advantage themselves as it seems AMD may have? Will it produce the same net negative result via mitigations=off? Man, I am freaking curious about this whole thing now.

                  Come on, AMD, don't be coy. Out with it.



                  • #29
                    Originally posted by GunpowderGuy View Post
                    Could this baffling performance be at least in part because Linux or other software no longer expects to run without mitigations, and thus that option has become under-optimized?
                    Everyone seems to have a theory on what this is. I can only speculate (pun intended) that maybe AMD found an elegant solution and just baked it into the CPU design. Until Lisa herself says it, AMD will probably keep it hush-hush. My bet is that they baked it in somehow. Occam's razor?



                    • #30
                      Originally posted by yump View Post
                      Isn't defeating the branch predictor the intended effect of the mitigations? If it's predicting through the mitigations, does that mean that the data leak has possibly been reintroduced?
                      So, if the Zen 4 branch predictor turns out to be leaky, that means we now need yet another layer to fool the new generation of predictors! That will definitely have a cost on this generation of hardware, and yet further costs on older hardware.

                      This suggests that effectively optimizing away mitigations in hardware is not as wise an approach as relying on software to detect when they're unnecessary and automatically disable them.
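On Linux, software can already see what the kernel has decided per vulnerability through the sysfs directory `/sys/devices/system/cpu/vulnerabilities`, where each file holds a status such as "Not affected", "Vulnerable" or "Mitigation: ...". A minimal sketch of reading it follows. The function names and the coarse labels are illustrative assumptions, not a kernel API.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status: str) -> str:
    """Map a sysfs vulnerability status string to a coarse label."""
    s = status.strip()
    if s.startswith("Not affected"):
        return "not affected"
    if s.startswith("Mitigation"):
        return "mitigated"
    if s.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def report(vuln_dir: Path = VULN_DIR) -> dict:
    """Return {vulnerability name: (label, raw status)} for every readable entry."""
    result = {}
    if not vuln_dir.is_dir():
        return result  # not Linux, or a kernel without this interface
    for entry in sorted(vuln_dir.iterdir()):
        try:
            text = entry.read_text().strip()
        except OSError:
            continue  # skip entries that cannot be read
        result[entry.name] = (classify(text), text)
    return result


if __name__ == "__main__":
    for name, (label, raw) in report().items():
        print(f"{name}: {label} ({raw})")
```

Entries reported as "Not affected" are the cases where the kernel has concluded that the hardware does not need a software mitigation.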

