Ampere Altra Performance Shows It Can Compete With - Or Even Outperform - AMD EPYC & Intel Xeon

  • AmericanLocomotive
    Senior Member
    • Aug 2017
    • 227

    #71
    We also can't forget that less than ~10 years ago, people were saying that ARM would never compete with big high-end x86 cores.

    The answer is that there simply wasn't a huge push for ARM in the high-end space, just as there was never really a big push or demand for x86 in the ultra-low-power space. With Intel's extremely high enterprise pricing, its lack of meaningful perf/watt and overall performance improvements for 6-7 years, and the complete lack of competitiveness from AMD, there was suddenly a huge drive for ARM to move upmarket.

    There's a very select group of people in the world who truly know whether the current x86 efficiency "deficiencies" are an inherent result of x86's x86-ness or just a limitation of the current architecture implementations. I doubt any of them would publicly post what they know.

    EDIT: I also wanted to add that a lot of ARM's massive improvements over the past few years have to do with getting access to modern, cutting-edge process nodes, mainly thanks to TSMC finally pulling ahead of Intel with its currently unbeatable 7nm node.
    Last edited by AmericanLocomotive; 16 December 2020, 04:51 PM.


    • coder
      Senior Member
      • Nov 2014
      • 8964

      #72
      Originally posted by Michael_S View Post
      All this makes it surprising that AMD abandoned their own ARM server architecture plans. Though maybe that was just due to financial problems, and we might see an AMD ARM announcement in a few years.
      AMD abandoned their ARM-based server CPUs, because they got ahead of the market and were starved for cash. So, they put all their eggs in the Zen basket, and it's paid off well for them.

      No reason they can't still enter the ARM market (or RISC-V, for that matter), although it'll be tougher to out-design ARM's own cores, at this point.


      • coder
        Senior Member
        • Nov 2014
        • 8964

        #73
        Originally posted by coats View Post
        We're at the point where the decoding of variable-length complex-encoding instructions is the real bottleneck for x86[_64] performance: no one has managed more than 4-wide decode using a reasonable amount of silicon and reasonable power dissipation.
        Why wouldn't decoding scale linearly in area and power? Granted, the simpler your decoder can be, the better.

        ARM's other advantages seem to lie in its larger GPR size and perhaps its relaxed memory-ordering. Maybe also the fact that its scalar FP isn't a bolt-on to its vector FP, like all of the non-x87 extensions to x86 have done? (edit: and almost nobody uses x87, any more.)
        Last edited by coder; 16 December 2020, 05:49 PM.


        • coder
          Senior Member
          • Nov 2014
          • 8964

          #74
          Originally posted by PerformanceExpert View Post
          No that's total rubbish. In both cases distros target the base architecture which is ARMv8.0-A for AArch64, so none of the many extensions are enabled.
          Comparing baseline x86-64 with Cascade Lake ISA support (i.e. AVX-512) is not even on the same planet as ARMv8.0-A vs ARMv8.2-A, unless we're also talking about SVE (but we're not).

          If you want to be taken seriously, start by not making claims so ridiculous on their face!

          Originally posted by PerformanceExpert View Post
          This is simply the way all software works (and has always worked). You cannot ship binaries that rely on the latest features since they don't work on every CPU (you can add runtime checks of course but that is only worth it in specific cases).
          Someone doing heavy-duty machine learning or HPC workloads is absolutely going to recompile their libraries to make full use of their CPUs.
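          The runtime checks mentioned above usually boil down to reading the CPU's advertised feature flags and dispatching to the widest code path available. Here's a minimal, hypothetical sketch in Python using the Linux /proc/cpuinfo flag format; the function and kernel names are purely illustrative, not any real library's API:

          ```python
          # Hypothetical sketch of a runtime ISA-feature check: parse the CPU's
          # flag list (Linux /proc/cpuinfo format) and pick the widest SIMD
          # code path it supports. Names are illustrative, not a real library.

          def cpu_flags(cpuinfo_text):
              """Return the set of feature flags from /proc/cpuinfo text."""
              for line in cpuinfo_text.splitlines():
                  if line.startswith("flags"):
                      return set(line.split(":", 1)[1].split())
              return set()

          def pick_kernel(flags):
              """Dispatch: prefer AVX-512, fall back to AVX2, then scalar code."""
              if "avx512f" in flags:
                  return "avx512"
              if "avx2" in flags:
                  return "avx2"
              return "generic"

          def main():
              # On a Linux host, prints which code path would be taken.
              with open("/proc/cpuinfo") as f:
                  print(pick_kernel(cpu_flags(f.read())))
          ```

          This is the same idea behind compiler-level mechanisms like GCC's function multi-versioning, just done by hand: ship every code path in one binary and choose once at startup.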


          • coder
            Senior Member
            • Nov 2014
            • 8964

            #75
            Originally posted by edwaleni View Post
            Is this ARM CPU the result of incredible design or simply the next step in its evolution?
            It's the initial point of divergence between mobile-oriented and server-oriented cores, for ARM. Prior to that, ARM's cores had been mobile-first, prizing efficiency over raw performance.

            Now, the N-series of cores (of which this chip uses the N1) is still supposed to strike a balance between power, performance, and area. The upcoming V-series is meant to be a truly performance-first design.


            • PerformanceExpert
              Senior Member
              • May 2020
              • 391

              #76
              Originally posted by Space Heater View Post
              Not being the primary driver of performance and efficiency improvements is not the same as "doesn't matter".
              I'm not sure you meant it that way, but that's often implied when people make claims like that. There is no doubt the ISA has a significant impact on the complexity of a CPU and thus on PPA (power, performance, and area). Intel's huge process advantage used to mask the x86 tax, but that has evaporated, and we can now directly compare Arm and AMD cores on the same process.


              • PerformanceExpert
                Senior Member
                • May 2020
                • 391

                #77
                Originally posted by AmericanLocomotive View Post
                "Wittich said the Ampere chip is 14% better than AMD’s fastest Epyc chip on power efficiency and 4% faster on raw performance." - That's from Ampere's Senior's VP of products, and that matches up with those slides.
                So you prefer to use marketing numbers rather than actual measured power efficiency? Really? Is it because real power results are too good to contemplate?

                Those slides and quotes were projections, since there was no hardware available at the time. Even that Watt/core number is a useless maxTDP / #cores value! The Phoronix results suggest Altra draws significantly below TDP; e.g., the Conjugate Gradient test runs at an average of 72% of max TDP (vs. 93% for EPYC).

                Fugaku was never the #1 in the Green 500. That system you linked is not Fugaku, but a much smaller and lower clocked system using the same processors.
                That's grasping at straws: it's the same CPU, just at a slightly lower frequency. The efficiency differs by only 9.5%, a tiny fraction of the efficiency difference between typical CPU bins.


                • rajcina12
                  Junior Member
                  • Jan 2018
                  • 13

                  #78
                  Do I understand correctly that the power draw figures are read via software from internal sensors implemented by three different vendors, and therefore might have completely different semantics or scales? Or is there a standard these sensors implement?
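                  For reference, on Linux the kernel's powercap framework exposes Intel/AMD RAPL-style energy counters under sysfs as a cumulative microjoule value, and tools derive average watts from two samples. A minimal sketch of that arithmetic (the sysfs path is the standard RAPL location; the sampling code and wrap range default are illustrative assumptions, as real tools read max_energy_range_uj from sysfs):

                  ```python
                  # Sketch: derive average watts from a cumulative energy counter,
                  # as exposed by Linux's powercap/RAPL sysfs interface (energy_uj,
                  # in microjoules). The 2**32 default wrap range is an assumption;
                  # real tools read the max_energy_range_uj file instead.

                  import time

                  RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"

                  def average_watts(e0_uj, e1_uj, seconds, max_uj=2**32):
                      """Average power between two counter samples, handling one wrap."""
                      delta = e1_uj - e0_uj
                      if delta < 0:                 # counter wrapped during interval
                          delta += max_uj
                      return delta / 1e6 / seconds  # microjoules -> joules -> watts

                  def sample_package_power(interval=1.0):
                      """Read the package energy counter twice; return average watts."""
                      with open(RAPL_ENERGY) as f:
                          e0 = int(f.read())
                      time.sleep(interval)
                      with open(RAPL_ENERGY) as f:
                          e1 = int(f.read())
                      return average_watts(e0, e1, interval)
                  ```

                  The counter semantics (update rate, wrap range, which domains are covered) do differ per vendor, which is exactly the caveat raised above.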


                  • coder
                    Senior Member
                    • Nov 2014
                    • 8964

                    #79
                    Originally posted by AmericanLocomotive View Post
                    there was never really a big push or demand for x86 in the ultra low power space.
                    Whoa there. You grossly underestimate Intel's desire to push x86 into every possible corner of the computing universe!

                    Intel sank billions into the phone and tablet market, going so far as to substantially subsidize their SoCs. They finally cut their losses on this failed endeavor in 2016:

                    https://www.anandtech.com/show/10288...socs-cancelled

                    Here's one of the most high-profile (if not also best) x86-based phones to result from that effort:

                    https://www.anandtech.com/show/9251/...nfone-2-review

                    And as for even lower-power, they tried to push x86 into IoT with the Quark product line, along with the Edison compute modules:

                    https://www.anandtech.com/show/7305/...r-tiny-devices
                    https://www.anandtech.com/show/8511/...m-now-shipping
                    https://www.anandtech.com/show/13888...crocontrollers


                    • PerformanceExpert
                      Senior Member
                      • May 2020
                      • 391

                      #80
                      Originally posted by coder View Post
                      Comparing baseline x86-64 with Cascade Lake ISA support (i.e. AVX-512) is not even on the same planet as ARMv8.0-A vs ARMv8.2-A, unless we're also talking about SVE (but we're not).

                      If you want to be taken seriously, start by not making claims so ridiculous on their face!

                      Someone doing heavy-duty machine learning or HPC workloads is absolutely going to recompile their libraries to make full use of their CPUs.
                      If you want to be taken seriously, stop making baseless claims. If anything, Arm has the disadvantage here. Many Phoronix benchmarks have heavily optimized code paths for x86 but not for AArch64 (there is only one example where the reverse is true). Even when compiled for a base architecture, these libraries can automatically run AVX-512 code if the CPU supports it.

                      People who want the ultimate performance will obviously recompile and optimize their code. It would be interesting to see how the results change with different settings; however, there is no reason to believe they would be radically different.

