Ampere Altra Performance Shows It Can Compete With - Or Even Outperform - AMD EPYC & Intel Xeon

  • Nelson
    Junior Member
    • Feb 2014
    • 42

    #51
    Originally posted by PerformanceExpert

    It's insane to claim ISA doesn't matter. x86 isn't in any way modern or efficient and requires a significant amount of extra design effort and area just to maintain compatibility and implement all the complex x86 quirks. Having a modern streamlined ISA makes a huge difference, and we can see this everywhere: at the low end, where Atom had no chance of competing in mobile phones; with laptops and desktops (hint: M1); and at the high end, with much more efficient Arm servers and supercomputers. A tiny company like Arm is able to out-design Intel and AMD together, with 2-4 new high-end microarchitectures each year. How is that even possible?

    At some point you have got to ask yourself: are Intel/AMD CPU designers totally incompetent, or does ISA actually matter?

    I think there were two main drivers for mobile adoption: cost and the ability to make custom packages. After a few devices took off, the network effect took over and the risk of making an architecture change against the whole ecosystem (think tooling) was too big. No way Samsung or Apple would miss a generation of phones to switch; they are printing money as it is.

    A simple ISA absolutely helps though. Given the way Intel and AMD have added to x86, almost haphazardly, I’m sort of surprised they haven’t published a set of choice existing instructions with fixed sizes, decodes and alignments, and started encouraging the industry to use only those, with the expectation that they would add a mode to accelerate them or exclude others in the future. They had recommendations in the P5/P6 era that boosted performance by more than 10%. It’s possible they are stuck in a big-business brain haze, but I suspect there are other demons in x86 that eat power and are hard to remove.

    Intel has been insanely good over the decades, and I wouldn’t count them out, but the most shocking thing here is that this isn’t a custom core: it’s an off-the-shelf design that ARM licenses, and it is hanging with AMD’s and Intel’s crown jewels. Custom accelerators are where the giant performance gains will be, and now you can see some compelling ways to get there. Rumor is AMD has been working on ARM core designs too. Intel needs a really big hit in the next year or two to keep x86 on top.


    • PerformanceExpert
      Senior Member
      • May 2020
      • 391

      #52
      Originally posted by coder
      You already have at least 4 options:
      1. Amazon AWS Graviton2 cloud instance.
      2. Qualcomm 8cx or 7c-based laptops
      3. Ampere eMAG 32-core workstation: https://www.anandtech.com/show/15733...64-workstation
      4. Huawei's Kunpeng 920-based 24-core desktop: https://www.notebookcheck.net/Huawei....485582.0.html
      Plus of course the Arm Apple laptops (they can run Linux and Windows).


      • s_j_newbury
        Senior Member
        • Feb 2013
        • 555

        #53
        Originally posted by AdrianBc
        The only problem with ARM is the pending NVIDIA acquisition, because NVIDIA competes with other users of ARM cores, so it is not clear if in the future they will continue to provide ARM cores with higher performance to their competitors.
        The elephant in the room...


        • Michael_S
          Senior Member
          • Aug 2011
          • 1296

          #54
          Originally posted by PerformanceExpert

          No that is wrong. All modern OoO cores translate instructions into micro-ops, but those micro-ops are still very similar to the original ISA. ISA differences are pervasive throughout the whole CPU - kind of obvious when you think of ISA specifics like flag setting, semantics of instructions, special registers etc. So you can never replace say the x86 frontend in one CPU with say an Arm frontend in another.



          This "brand new" 9-month-old Ampere Altra uses the Neoverse N1/Cortex-A76 microarchitecture, which is 2 years old now and has been used in Graviton 2 for a year. Next year we'll get the 128-core Altra Max, which will beat Milan. Plus Neoverse N2/V1, up to 192 cores on 5nm and with DDR5. Neither AMD nor Intel have anything similar for at least 2-3 years. So I don't see either regaining the performance crown. It's not the funeral for x86 yet, but it's certainly the beginning of the end.
          I stand corrected.

          And it makes sense that Amazon and Apple wouldn't invest big in ARM unless the long term picture relative to x86_64 looked very good.

          All this makes it surprising that AMD abandoned their own ARM server architecture plans. Though maybe that was just due to financial problems, and we might see an AMD ARM announcement in a few years.


          • Michael_S
            Senior Member
            • Aug 2011
            • 1296

            #55
            Originally posted by PerformanceExpert

            Plus of course the Arm Apple laptops (they can run Linux and Windows).
            But not on the bare metal, as far as I understand it. Maybe the virtualization in the M1 will let you get bare metal performance out of your Linux and Windows VMs on an Apple ARM device, but this is a Linux enthusiast site so at least some of us, ahem, wouldn't be interested in anything with an M1 unless we could run Linux on the bare metal.


            • Weasel
              Senior Member
              • Feb 2017
              • 4501

              #56
              Originally posted by Nelson
              A simple ISA absolutely helps though.
              A simple ISA is like caring about saving 1 MB of space when you have 1 TB of total disk space.

              Originally posted by Nelson
              The way Intel and amd have added to x86, almost haphazardly, I’m sort of surprised they haven’t published a set of choice existing instructions with fixed sizes, decodes and alignments and started to encourage the industry to use only those with the expectation that they would add a mode to accelerate them or exclude others in the future. They had recommendations in the p5/p6 era that boosted performance >10%. It’s possible they are stuck in big business brain haze, but I suspect there are other demons in x86 that eat power and are hard to remove.
              They actually went with laxer alignment requirements with AVX2 and up for most instructions. You guys are always "surprised", keep smoking the crap stuff.

              What you think matters is not what reality is. Also, variable-length instructions are a perk (not µops). This is why ARM will always be inferior.


              • coats
                Junior Member
                • May 2008
                • 31

                #57
                Originally posted by PerformanceExpert

                It's insane to claim ISA doesn't matter. x86 isn't in any way modern or efficient and requires a significant amount of extra design effort and area just to maintain compatibility and implement all the complex x86 quirks....
                We're at the point where decoding variable-length, complex-encoding instructions is the real bottleneck for x86[_64] performance: no one has managed more than 4-wide decode using a reasonable amount of silicon and reasonable power dissipation. OTOH, ARM's fixed-width instructions are much friendlier to wide decoding: some have attributed a substantial fraction of M1's performance to its 8-wide decode.

                FWIW
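To make the decode-width argument concrete, here is a minimal Python sketch contrasting how instruction boundaries are found in a fixed-width encoding versus a variable-length one. The function names and the `toy_length` encoding are hypothetical illustrations, not real x86: actual x86 instructions run from 1 to 15 bytes and their length depends on prefixes, opcode, ModRM/SIB, displacement and immediate fields. The point it shows is only the dependency structure: with 4-byte AArch64-style instructions every boundary is known up front, while with variable lengths each boundary depends on decoding the previous instruction.

```python
def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Instruction start offsets for a fixed-width ISA (e.g. 4-byte AArch64)."""
    # Boundary k is simply k * width; no boundary depends on any other,
    # so hardware can hand all of them to parallel decoders at once.
    return [k * width for k in range(len(code) // width)]


def toy_length(first_byte: int) -> int:
    """Hypothetical toy encoding, NOT real x86: the low two bits of the
    first byte hold (length - 1), so instructions are 1 to 4 bytes long."""
    return (first_byte & 0b11) + 1


def variable_width_starts(code: bytes) -> list[int]:
    """Instruction start offsets for the toy variable-length encoding."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        # The next boundary is unknown until this instruction's length is
        # decoded: a serial dependency chain across the whole fetch block.
        pc += toy_length(code[pc])
    return starts


if __name__ == "__main__":
    print(fixed_width_starts(bytes(16)))  # [0, 4, 8, 12]
    block = bytes([0b00, 0b01, 0x00, 0b11, 0x00, 0x00, 0x00])
    print(variable_width_starts(block))  # [0, 1, 3]
```

Real x86 decoders shorten this serial chain with tricks such as marking instruction boundaries in a pre-decode stage and caching already-decoded µops, which is part of the extra area and power being argued about in this thread.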



                • tuxd3v
                  Senior Member
                  • Nov 2014
                  • 1731

                  #58
                  Originally posted by coder
                  He specifies the compilation options in the image captions of the plots, but some of the tests, like the TNN deep learning benchmark, do not use -march=native. This should mean that x86 is only using SSE, instead of AVX2 or AVX-512. That would put them at a huge disadvantage.
                  That is the reality: AMD64 distros ship software that isn't optimized to take advantage of the AMD64 architecture.
                  Meanwhile, ARM64 software is compiled for ARM64, taking advantage of ARM64 features.


                  • tuxd3v
                    Senior Member
                    • Nov 2014
                    • 1731

                    #59
                    Originally posted by Michael_S
                    I stand corrected.

                    And it makes sense that Amazon and Apple wouldn't invest big in ARM unless the long term picture relative to x86_64 looked very good.

                    All this makes it surprising that AMD abandoned their own ARM server architecture plans. Though maybe that was just due to financial problems, and we might see an AMD ARM announcement in a few years.
                    The reality is that ARM64 exists because AMD helped ARM create ARM64, back when AMD was thinking of going ARM.


                    • AmericanLocomotive
                      Senior Member
                      • Aug 2017
                      • 225

                      #60
                      Originally posted by PerformanceExpert
                      Eh, did you not notice the 71% higher perf/Watt on Coremark or 60% on Conjugate Gradient? Single-threaded results are irrelevant in servers, so the other power results aren't interesting indeed (neither are single-threaded benchmarks!). I'm hoping AnandTech will publish SPEC scores with perf/Watt results soon, but it seems very unlikely Zen 3 will be able to catch up.
                      Don't you think Ampere would be bragging a teeny bit more about their performance/watt and total performance advantage if every benchmark reflected those two?



                      Zen 3 will close that "performance per rack" metric easily, and the efficiency gains will bring it essentially exactly in line.
                      Most of the US supercomputer wins are political, not wins on actual performance. And most x86 supercomputers are just PCIe accelerators. Not so for the #1 supercomputer, which is Arm-based and has no accelerators (and it is also the most efficient supercomputer, btw).
                      Fugaku isn't anywhere close to being the most efficient supercomputer, not even by a long shot. It's only marginally more efficient than Summit, an IBM POWER9 and NVIDIA Tesla accelerated system that came out 2 years earlier.

                      The current most efficient supercomputing systems are almost all AMD + NVIDIA systems.
