Ampere Altra Performance Shows It Can Compete With - Or Even Outperform - AMD EPYC & Intel Xeon

  • PerformanceExpert
    Senior Member
    • May 2020
    • 391

    #41
    Originally posted by cb88

    You mean compete with CPUs launched August 2019... EPYC Milan is right around the corner also.

    Also the margins on this chip are probably not good compared to EPYC, as it's a big monolithic die (which does contribute to its performance, I imagine, but will also push up the price of the chip).
    Neoverse-N1 was launched back in 2019 too, and Altra Max with 128 cores is around the corner as well. And Neoverse N2/V1 should be out next year too. So it doesn't look like EPYC will ever regain the performance crown.

    In terms of cost, TSMC 7nm yield is above all expectations, and a ~350-400 mm² die will be cheaper than ~1100 mm² of silicon with complex wiring to connect 9 dies.


    • markg85
      Senior Member
      • Oct 2007
      • 509

      #42
      I don't think you've seen the max the hardware is capable of yet.
      I don't have some inside knowledge or something, it's just a hunch based on an educated guess.

      Think of it this way.
      Nearly all the software being tested still runs mainly on x86 CPUs and is written by developers targeting that architecture.
      The fact that it's this competitive is darn impressive, to say the least! But I'm willing to bet that much of the software is far from optimized for ARM.
      To give you an example: making something scale on a couple of threads is already a challenge. Making something scale on 80 cores is a whole new beast to tackle. In scalability tests of multicore-tuned software you often see good scaling up to the 8-12 core range, after which adding more cores doesn't make it (much) faster anymore. This isn't true for everything, but in general it is. Thus there is a lot of room for optimization here.
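The flattening past 8-12 cores described above is what Amdahl's law predicts whenever part of a program stays serial. A minimal sketch, assuming a hypothetical program whose single-core runtime is 95% perfectly parallelizable (the 95% is illustrative, not measured on any benchmark in the article):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.95  # hypothetical: 95% of the single-core runtime parallelizes perfectly
    for n in (1, 8, 12, 32, 80):
        print(f"{n:>2} cores: {amdahl_speedup(p, n):5.2f}x")
```

With p = 0.95 the speedup can never exceed 1 / (1 - p) = 20x, so going from 12 to 80 cores only roughly doubles throughput instead of multiplying it by almost seven. Shrinking the serial part is exactly the kind of optimization work the post has in mind.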

      What would be interesting to see is the per-core CPU utilization, for example for the LZ4 compression test.
      Even then, 100% CPU usage on all cores doesn't mean there isn't room for improvement. Let's not forget that while(true){} also shows 100%, by which I only mean that CPU usage itself, while a good indicator, doesn't tell the whole story either.
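On Linux, the per-core utilization suggested above can be sampled from `/proc/stat`. A minimal sketch, assuming the standard Linux layout (per-core `cpuN` lines whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal, in clock ticks); the function names are hypothetical. As the post warns, a core spinning in `while(true){}` would also show up here as 100% busy: these counters measure time not spent idle, not useful work.

```python
from __future__ import annotations

import os
import time


def parse_proc_stat(text: str) -> dict[str, tuple[int, int]]:
    """Map each per-core line ('cpu0', 'cpu1', ...) to (busy_ticks, total_ticks)."""
    stats: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate 'cpu' line; keep only per-core entries.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        # (guest time is already included in user/nice, so it is not added again).
        ticks = [int(v) for v in fields[1:9]]
        idle = ticks[3] + ticks[4]  # idle + iowait
        total = sum(ticks)
        stats[fields[0]] = (total - idle, total)
    return stats


def per_core_utilization(before: dict[str, tuple[int, int]],
                         after: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percentage of each core's elapsed ticks spent busy between two samples."""
    util: dict[str, float] = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        util[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return util


def sample_utilization(interval_s: float = 1.0, path: str = "/proc/stat") -> dict[str, float]:
    """Read /proc/stat twice, interval_s seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return per_core_utilization(before, after)


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    for cpu, pct in sorted(sample_utilization().items(), key=lambda kv: int(kv[0][3:])):
        print(f"{cpu}: {pct:5.1f}%")
```

Running this while a benchmark is active would show whether all 80 cores are actually busy or whether some sit idle waiting on a serial section.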


      • PerformanceExpert
        Senior Member
        • May 2020
        • 391

        #43
        Originally posted by cb88

        This is dead wrong. This chip is performing basically identically to what a monolithic version of EPYC would, while also having a higher cost to manufacture... This chip is available now, but the EPYC 7742 has been on the market for over a year and Milan is about to launch; monolithic server chips, even ARM ones, are old hat at this point. There are also rumors that Milan has the same gains as Zen 3 on the desktop: higher IPC and higher clocks.

        The idea that the ARM architecture is more energy efficient has been laughable for about a decade now: you need to implement all of the power-hungry architectural features that x86 has to perform at this level, and the ARM instruction set is just as ugly as x86 at this point.

        The problem with your statement is that ARM doesn't scale any better than x86 does... it scales *LOWER*, but it doesn't scale higher.
        No, these efficient monolithic Arm designs show how inefficient chiplets are. Basically, it is beating EPYC at a fraction of the power, silicon area and cost. Neoverse N1 has been available for over a year as well (e.g. Graviton 2). And Altra Max is about to launch too; Milan won't stand a chance when we compare 128 SMT threads with 128 real cores...

        At this point it is not only obvious Arm scales higher than x86, but that x86 has no chance of keeping up. From now on the fastest servers in the world are Arm based. You can see the industry shifting big-time to Arm (most recently Twitter moved to Graviton 2).


        • Michael_S
          Senior Member
          • Aug 2011
          • 1296

          #44
          Originally posted by s_j_newbury

          x86 was always legacy. It's not a coincidence that you can run 8086/8088 code on modern x86 CPUs. It was always the "unique selling point"; virtually every other architecture has had ABI breaks where legacy cruft was removed. While it hasn't completely held it back, it has shaped the development and the ethos of the design. Remember, even Intel wanted to leave it behind via IA64.
          Thank you.

          My understanding is that for years, maybe more than ten years, x86 CPUs have actually been some kind of RISC processor internally that exposes an x86/x86_64 compatibility layer. So the underlying architectural differences between a modern Intel Xeon, AMD EPYC, or this Ampere Altra might even be tiny.

          I may be wrong about that, but if I'm right then the efficiency overhead of the x86/x86_64 compatibility layer might be enough to hurt x86's competitiveness a little. But as others pointed out, this comparison pits the brand new Ampere Altra against an Intel Xeon from Q2 2019 and an AMD EPYC from Q3 2019. Intel and AMD generational improvements in performance-per-watt might put them in the lead now, or soon. And then the Altra 2 can retake the crown in 2022 or 2023.

          This ARM server chip is a nice development, but don't hold the funeral for x86 yet.


          • PerformanceExpert
            Senior Member
            • May 2020
            • 391

            #45
            Originally posted by Space Heater
            Frankly I think it's insulting to the architects to pretend that the choice of ISA is what is primarily driving improvements in efficiency and performance.
            It's insane to claim ISA doesn't matter. x86 isn't in any way modern or efficient and requires a significant amount of extra design effort and area just to maintain compatibility and implement all the complex x86 quirks. Having a modern, streamlined ISA makes a huge difference, and we can see this everywhere: at the low end, where Atom had no chance of competing in mobile phones; in laptops and desktops (hint: M1); and at the high end, with much more efficient Arm servers and supercomputers. A tiny company like Arm is able to out-design Intel and AMD combined, releasing 2-4 new high-end microarchitectures each year. How is that even possible?

            At some point you have got to ask yourself: are Intel/AMD CPU designers totally incompetent, or does ISA actually matter?
            Last edited by PerformanceExpert; 16 December 2020, 09:51 AM.


            • Dr. Righteous
              Senior Member
              • Nov 2015
              • 122

              #46
              The ancient struggle between CISC and RISC lives on! In modern terms at least.


              • AmericanLocomotive
                Senior Member
                • Aug 2017
                • 216

                #47
                Originally posted by PerformanceExpert

                No, these efficient monolithic Arm designs show how inefficient chiplets are. Basically, it is beating EPYC at a fraction of the power, silicon area and cost. Neoverse N1 has been available for over a year as well (e.g. Graviton 2). And Altra Max is about to launch too; Milan won't stand a chance when we compare 128 SMT threads with 128 real cores...
                Might want to recheck your power consumption figures. The only test where the Ampere used a "fraction" of the power was the PHP ST test, and I don't think anyone buys a 128-core EPYC server to run a single-threaded PHP load. The actual "energy needed to complete a task" of the Ampere system in highly threaded loads wasn't hugely different. As it stands, the Altra's efficiency and performance advantage over Zen 2 EPYC (by Ampere's own claims) is smaller than the Zen 2 to Zen 3 improvement.
                At this point it is not only obvious Arm scales higher than x86, but that x86 has no chance of keeping up. From now on the fastest servers in the world are Arm based. You can see the industry shifting big-time to Arm (most recently Twitter moved to Graviton 2).
                ...which is why AMD has already secured major super-computing wins with Zen 3 and Zen 4 chips, right? I would assume the people at Cray know what they are doing.
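The "energy needed to complete a task" argument is average power multiplied by runtime, so a lower power draw only saves energy if the job doesn't take proportionally longer. A minimal sketch with hypothetical numbers (these are NOT figures from the article's benchmarks):

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy to complete a task: average power (W) times runtime (s) gives joules."""
    return avg_power_watts * runtime_seconds


# Hypothetical figures for illustration only, NOT results from the article:
# system A draws 25% less power but needs 25% more time for the same job.
energy_a = energy_joules(avg_power_watts=300.0, runtime_seconds=100.0)
energy_b = energy_joules(avg_power_watts=400.0, runtime_seconds=80.0)

if __name__ == "__main__":
    print(f"A: {energy_a / 1000:.1f} kJ, B: {energy_b / 1000:.1f} kJ")
    print(f"A uses {1 - energy_a / energy_b:.1%} less energy despite 25% lower power draw")
```

In this made-up case, a 25% lower power draw translates into only about 6% less energy for the job, which is why whole-task energy is the fairer comparison for multi-threaded server loads.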

                • PerformanceExpert
                  Senior Member
                  • May 2020
                  • 391

                  #48
                  Originally posted by Michael_S
                  My understanding is that for years, maybe more than ten years, x86 CPUs have actually been some kind of RISC processor internally that exposes an x86/x86_64 compatibility layer. So the underlying architectural differences between a modern Intel Xeon, AMD EPYC, or this Ampere Altra might even be tiny.
                  No that is wrong. All modern OoO cores translate instructions into micro-ops, but those micro-ops are still very similar to the original ISA. ISA differences are pervasive throughout the whole CPU - kind of obvious when you think of ISA specifics like flag setting, semantics of instructions, special registers etc. So you can never replace say the x86 frontend in one CPU with say an Arm frontend in another.
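One concrete illustration of the flag-setting point: x86 `ADD` always writes the arithmetic flags, while AArch64 `ADD` leaves the NZCV flags alone and only the `ADDS` form writes them. The sketch below is a toy model of those architectural semantics only, not of how any real core implements them; the `state` dictionary and register names are hypothetical.

```python
from __future__ import annotations

MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def add64_with_flags(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """64-bit wrapping add; returns the result and N/Z/C/V-style condition flags."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    flags = {
        "N": bool(result & SIGN64),                   # sign bit of result (x86: SF)
        "Z": result == 0,                             # zero result (x86: ZF)
        "C": full > MASK64,                           # unsigned carry out (x86: CF)
        "V": bool(~(a ^ b) & (a ^ result) & SIGN64),  # signed overflow (x86: OF)
    }
    return result, flags


def x86_add(state: dict, dst: str, src: str) -> None:
    """Toy x86 ADD dst, src: two-operand, and it always overwrites the flags.
    (Real x86 ADD also writes AF and PF, which are omitted here.)"""
    state[dst], state["flags"] = add64_with_flags(state[dst], state[src])


def a64_add(state: dict, rd: str, rn: str, rm: str, set_flags: bool = False) -> None:
    """Toy AArch64 ADD rd, rn, rm (set_flags=False) or ADDS rd, rn, rm (set_flags=True)."""
    result, flags = add64_with_flags(state[rn], state[rm])
    state[rd] = result
    if set_flags:
        state["flags"] = flags
```

In this model every x86-style add produces a new flags value that an out-of-order core must be able to rename and track, while the AArch64-style code only creates that dependency where `ADDS` was chosen. That is one small example of ISA semantics reaching past the decoder.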

                  I may be wrong about that, but if I'm right then the efficiency overhead of the x86/x86_64 compatibility layer might be enough to hurt x86's competitiveness a little. But as others pointed out, this comparison pits the brand new Ampere Altra against an Intel Xeon from Q2 2019 and an AMD EPYC from Q3 2019. Intel and AMD generational improvements in performance-per-watt might put them in the lead now, or soon. And then the Altra 2 can retake the crown in 2022 or 2023.

                  This ARM server chip is a nice development, but don't hold the funeral for x86 yet.
                  This "brand new" 9-month-old Ampere Altra uses the Neoverse N1/Cortex-A76 microarchitecture, which is 2 years old now and has been used in Graviton 2 for a year. Next year we'll get the 128-core Altra Max, which will beat Milan. Plus Neoverse N2/V1, with up to 192 cores on 5nm and DDR5. Neither AMD nor Intel will have anything similar for at least 2-3 years. So I don't see either regaining the performance crown. It's not the funeral for x86 yet, but it is certainly the beginning of the end.


                  • PerformanceExpert
                    Senior Member
                    • May 2020
                    • 391

                    #49
                    Originally posted by AmericanLocomotive
                    Might want to recheck your power consumption figures. The only test where the Ampere used a "fraction" of the power was the PHP ST test, and I don't think anyone buys a 128-core EPYC server to run a single-threaded PHP load. The actual "energy needed to complete a task" of the Ampere system in highly threaded loads wasn't hugely different. As it stands, the Altra's efficiency and performance advantage over Zen 2 EPYC (by Ampere's own claims) is smaller than the Zen 2 to Zen 3 improvement.

                    ...which is why AMD has already secured major super-computing wins with Zen 3 and Zen 4 chips, right? I would assume the people at Cray know what they are doing.
                    Eh, did you not notice the 71% higher perf/Watt on Coremark or 60% on Conjugate Gradient? Single-threaded results are irrelevant in servers, so the other power results aren't interesting indeed (neither are single-threaded benchmarks!). I'm hoping AnandTech will publish SPEC scores with perf/Watt results soon, but it seems very unlikely Zen 3 will be able to catch up.

                    Most of the US supercomputer wins are political, not wins on actual performance. And most x86 supercomputers are mostly just hosts for PCIe accelerators. Not so for the #1 supercomputer, which is Arm-based and has no accelerators (and it is also the most efficient supercomputer, btw).
                    Last edited by PerformanceExpert; 16 December 2020, 10:38 AM.
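The perf/Watt percentages quoted above (71% on Coremark, 60% on Conjugate Gradient) can be turned into energy per task. A minimal sketch, assuming "perf" is a throughput score proportional to work done per unit time; the function name is hypothetical:

```python
def energy_saving_from_perf_per_watt(advantage: float) -> float:
    """Fraction of energy saved on a fixed task, given a relative perf/W advantage.

    For a fixed amount of work W finished in time t at average power P:
        perf/W = (W / t) / P = W / (P * t) = W / E,
    so energy per task is inversely proportional to perf/W.
    `advantage` is relative, e.g. 0.71 for "71% higher perf/W".
    """
    if advantage <= -1.0:
        raise ValueError("advantage must be greater than -1")
    return 1.0 - 1.0 / (1.0 + advantage)


if __name__ == "__main__":
    for label, adv in (("Coremark", 0.71), ("Conjugate Gradient", 0.60)):
        saving = energy_saving_from_perf_per_watt(adv)
        print(f"{label}: +{adv:.0%} perf/W -> {saving:.1%} less energy per task")
```

Under that assumption, +71% perf/W works out to roughly 41.5% less energy for the same job and +60% to 37.5%, which puts both sides' "energy to complete a task" framing and the perf/Watt framing on the same scale.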


                    • AdrianBc
                      Senior Member
                      • Nov 2015
                      • 292

                      #50
                      Originally posted by pal666
                      arm is neither modern nor well designed. it was "cheap and dirty" design from 80s, not much newer than x86, its main feature was "low number of transistors"
                      A few others have also replied to this, but I want to state it more clearly.

                      The 64-bit ARM Instruction-Set Architecture (AArch64, introduced with ARMv8) has very little in common with the traditional 32-bit ARM ISA, except the name.

                      Unlike the 64-bit AMD/Intel or POWER or MIPS or SPARC ISAs, the 64-bit ARM ISA is not an extended version of the previous 32-bit ISA, but a new ISA.

                      Because its designers were not constrained by backward compatibility the way the others were, the 64-bit ARM ISA is both modern and well designed.

                      For now, there is nothing better than it that also has acceptable support for tools like compilers etc.

                      The only problem with ARM is the pending NVIDIA acquisition, because NVIDIA competes with other users of ARM cores, so it is not clear if in the future they will continue to provide ARM cores with higher performance to their competitors.