
ZombieLoad Mitigation Costs For Intel Haswell Xeon, Plus Overall Mitigation Impact


  • #1

    Phoronix: ZombieLoad Mitigation Costs For Intel Haswell Xeon, Plus Overall Mitigation Impact

    With tests over the past week following the disclosure of the Microarchitectural Data Sampling (MDS) vulnerabilities, also known as "ZombieLoad", we've looked at the MDS mitigation costs (and now the overall Spectre/Meltdown/L1TF/MDS impact) for desktop CPUs, servers, and some laptop hardware. I've also begun doing some tests on older hardware, as some Phoronix readers are curious how badly aging Intel Haswell CPUs are affected...

    http://www.phoronix.com/scan.php?pag...ombie-Load-Ref

  • #2
    Someone who wants to make an estimate of the world electricity cost based on the efficiency losses?

    Comment


    • #3
      Originally posted by milkylainen View Post
      Someone who wants to make an estimate of the world electricity cost based on the efficiency losses?
      Long-term, it could lead towards adoption of more-efficient micro-architectures, which have minimal reliance on things like branch prediction, speculative execution, and prefetching (all of which waste energy on unnecessary work and on trying to figure out which work to attempt). That would actually be more energy-efficient, as well. Not the most likely outcome, but possible.

      Predicting the energy costs would be complicated by the fact that there's some elasticity in demand (i.e. meaning less aggregate computation, as the cost goes up). Anyway, if you're going to do that, you might as well also estimate the cost of Intel's manufacturing process delays. How much energy could've been saved, if they'd delivered the efficiency and performance gains expected of 10 nm, on schedule? I'd expect we might see numbers in a similar ballpark.

      Comment


      • #4
        Originally posted by coder View Post
        Long-term, it could lead towards adoption of more-efficient micro-architectures, which have minimal reliance on things like branch prediction, speculative execution, and prefetching (all of which waste energy on unnecessary work and on trying to figure out which work to attempt). That would actually be more energy-efficient, as well. Not the most likely outcome, but possible.

        Predicting the energy costs would be complicated by the fact that there's some elasticity in demand (i.e. meaning less aggregate computation, as the cost goes up). Anyway, if you're going to do that, you might as well also estimate the cost of Intel's manufacturing process delays. How much energy could've been saved, if they'd delivered the efficiency and performance gains expected of 10 nm, on schedule? I'd expect we might see numbers in a similar ballpark.
        Totally agree with you. And it would be stupidly complex to estimate.
        It was just a whim about how much world energy gets wasted (and saved, as you suggest) by even small "mistakes" in design, like this one.
        As a percentage of world energy use, probably not much.
        In absolute money terms, as a fraction of total world energy costs, probably a number with a bunch of zeros.
        When you talk about global penalties, the numbers tend to get pretty big, even from small mistakes.

        Comment


        • #5
          Originally posted by milkylainen View Post
          Someone who wants to make an estimate of the world electricity cost based on the efficiency losses?
          Last year I made a rough estimate of the power savings from the new Linux idle loop, considering only cloud-computing data centers for the year 2015 [1].

          Based on the same data, the power consumption increase would be 215-315 MW for a 13-19% performance decrease.
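For what it's worth, the arithmetic behind a range like that can be reproduced with a simple linear model: if performance drops by X%, the same workload needs roughly X% more machine-hours at the same draw. The ~1.65 GW aggregate baseline below is my own assumption, back-solved so that the quoted range falls out; it is not a figure from the post.

```python
# Back-of-the-envelope check of the 215-315 MW figure above.
# Assumption (mine): linear model -- recovering a fractional performance
# loss costs that same fraction of an assumed ~1.65 GW aggregate draw.
BASELINE_MW = 1655.0  # assumed cloud-datacenter aggregate draw, hypothetical

def extra_power_mw(perf_loss, baseline_mw=BASELINE_MW):
    """Extra megawatts needed to recover a fractional performance loss."""
    return baseline_mw * perf_loss

low = extra_power_mw(0.13)   # ~215 MW
high = extra_power_mw(0.19)  # ~314 MW
print(f"{low:.0f}-{high:.0f} MW")  # prints "215-314 MW" -- the quoted range, up to rounding
```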

          Comment


          • #6
            Originally posted by coder View Post
            Long-term, it could lead towards adoption of more-efficient micro-architectures, which have minimal reliance on things like branch prediction, speculative execution, and prefetching (all of which waste energy on unnecessary work and on trying to figure out which work to attempt). That would actually be more energy-efficient, as well. Not the most likely outcome, but possible.
            The problem is, things like branch prediction, speculative execution, etc., exist because they're a more efficient and effective way to get performance. If done right, that is.

            Comment


            • #7
              Originally posted by t.s. View Post

              The problem is, things like branch prediction, speculative execution, etc., exist because they're a more efficient and effective way to get performance. If done right, that is.
              Nope. They are not more efficient. Spending that silicon on more classic units is way more energy-efficient.
              E.g. fetch, decode, execute, write-back. The problem is that it is _hard_ to parallelize a problem to maximize the use of multiple execution contexts.
              So while a lot of execution units are more energy-efficient if kept fed, feeding them all is not an easier problem.
              Ergo the statement that single-threaded performance still outweighs multi-threaded performance to this day.
              That is why generic CPU design spends resources on single-threaded speed.

              Or to simplify it for you: if you had one execution unit that executes as fast as four execution units, that single unit will handle problems that don't parallelize better than the quad will. That's why branch prediction and speculation exist.
              Not because they are more energy-efficient, but because solving the other problems is way harder.

              That is also why an FPGA or a custom parallel ASIC will beat the living crap out of a generic CPU on problems that can be parallelized.
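To make the speculation trade-off concrete, here's a toy two-bit saturating-counter branch predictor (the textbook scheme, not any real CPU's design): on regular patterns it almost always guesses right, and every miss represents a flushed pipeline full of wasted speculative work.

```python
def predict_and_train(outcomes):
    """Toy 2-bit saturating-counter branch predictor.

    Counter states 0-3: predict 'taken' when counter >= 2.
    Returns the number of mispredictions, i.e. speculative work thrown away.
    """
    counter = 2  # start weakly taken
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1  # mispredicted: pipeline flush, wasted energy
        # train: move toward the observed outcome, saturating at 0 and 3
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# A loop branch (taken 9 times, then falls through) mispredicts only once:
loop = [True] * 9 + [False]
print(predict_and_train(loop))  # prints 1
```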

              Comment


              • #8
                Originally posted by t.s. View Post
                The problem is, things like branch prediction, speculative execution, etc., exist because they're a more efficient and effective way to get performance. If done right, that is.
                Not that milkylainen's post really needs further clarification, but I would say those features are simply ways to squeeze more performance out of the chip, at the expense of disproportionate increases in power consumption. It's like putting a supercharger on a petrol engine.

                As an alternative, cloud operators (the ones seemingly most concerned about these exploits) could run larger numbers of slower, more-efficient cores. The real question is how much single-thread performance they require. That comes down to a question of latency - and whether it ultimately affects user response time. Once the single-thread performance is adequate to meet the latency demands, then the main concern becomes one of energy efficiency.

                What's interesting about the efficiency question is that much in the way that performance gains from manufacturing process advancements have been dropping off, we've also seen declining efficiency gains, as current leakage becomes an ever-bigger problem with each smaller node. In the cloud/datacenter context, this becomes a problem as they continue to scale up. If you blindly extrapolate trends, I wouldn't be surprised to see predictions of datacenters' energy usage outweighing entire cities, at some point. So, something's gotta give.

                This was widely expected to propel ARM and other more trimmed-down, efficient ISAs into the datacenter, though Intel has shown quite some staying power. But with every new exploit that comes to light, that inertia is probably just a little closer to being overcome. Intel is currently predicting declines in their datacenter stranglehold by up to 20% this FY. I expect once they lose that market share, they'll never regain it. Not with x86, and by the time they have something else, there will be other entrenched players they'll have to uproot.
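The many-slower-cores argument above can be sketched with the textbook rule of thumb that per-core throughput scales roughly with clock frequency while dynamic power scales roughly with its cube (power ~ f·V², with V tracking f). The exponents and numbers below are illustrative assumptions, not measurements of any real chip:

```python
def total_power(target_throughput, per_core_freq):
    """Relative total power to hit a fixed aggregate throughput.

    Rule-of-thumb model: per-core throughput ~ f, per-core power ~ f^3.
    All quantities are in arbitrary relative units.
    """
    cores = target_throughput / per_core_freq  # cores needed to hit the target
    return cores * per_core_freq ** 3          # total power ~ cores * f^3

# Same aggregate throughput from fast cores vs. twice as many half-speed cores:
fast = total_power(16, per_core_freq=2.0)  # 8 cores at f=2 -> 64 power units
slow = total_power(16, per_core_freq=1.0)  # 16 cores at f=1 -> 16 power units
print(fast / slow)  # prints 4.0 -- the slower cores win on efficiency
```

The catch, as discussed above, is that this only pays off when the workload actually spreads across all those cores and the per-core latency stays acceptable.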

                Comment


                • #9
                  Originally posted by milkylainen View Post

                  Nope. They are not more efficient. Spending that silicon on more classic units is way more energy-efficient.
                  E.g. fetch, decode, execute, write-back. The problem is that it is _hard_ to parallelize a problem to maximize the use of multiple execution contexts.
                  So while a lot of execution units are more energy-efficient if kept fed, feeding them all is not an easier problem.
                  Ergo the statement that single-threaded performance still outweighs multi-threaded performance to this day.
                  That is why generic CPU design spends resources on single-threaded speed.

                  Or to simplify it for you: if you had one execution unit that executes as fast as four execution units, that single unit will handle problems that don't parallelize better than the quad will. That's why branch prediction and speculation exist.
                  Not because they are more energy-efficient, but because solving the other problems is way harder.

                  That is also why an FPGA or a custom parallel ASIC will beat the living crap out of a generic CPU on problems that can be parallelized.
                  It's a given that it's not that simple, but I think those implementations are like search algorithms in programming, i.e.:
                  - a classic unit is like sequential search. This, of course, needs a powerful single thread.
                  - branch prediction etc. is like binary or interpolation search: trying to do things more efficiently, so it can get by with a less powerful single thread.

                  They try to find a more efficient way to do things.

                  But I could be wrong.
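The analogy in code, for what it's worth: both searches find the same element, but binary search does far fewer probes at the price of demanding a sorted input, much as prediction and speculation buy speed at the price of extra machinery:

```python
def sequential_search(xs, target):
    """Linear scan; returns (index, probes made)."""
    for i, x in enumerate(xs):
        if x == target:
            return i, i + 1
    return -1, len(xs)

def binary_search(xs, target):
    """Binary search over sorted xs; returns (index, probes made)."""
    lo, hi, probes = 0, len(xs) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if xs[mid] == target:
            return mid, probes
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

data = list(range(1000))
print(sequential_search(data, 900)[1])  # prints 901
print(binary_search(data, 900)[1])      # prints 10
```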

                  Comment
