Ampere Altra Performance Shows It Can Compete With - Or Even Outperform - AMD EPYC & Intel Xeon

  • coder
    Senior Member
    • Nov 2014
    • 8863

    Originally posted by PerformanceExpert View Post
    It's not just smaller but also faster. The Neoverse N1 core is about a third of the size of a Zen 2 core and total silicon area of EPYC 7742 is ~1100mm^2 vs ~350-400mm^2 for Altra (so also about a third).
    Zen2 has 256-bit AVX2 pipelines & registers, though.

    Originally posted by PerformanceExpert View Post
    Yet the Altra beats the 7742 on raw performance.
    Based on what? The fact that it manages to win 10 out of the 21 benchmarks in the weird smattering (sometimes with odd compiler options) shown here?
    Last edited by coder; 17 December 2020, 10:50 PM.

    Comment

    • coder
      Senior Member
      • Nov 2014
      • 8863

      Originally posted by PerformanceExpert View Post
      These results were measured using stock installs and settings, thus relevant to what most people would see. You can always tweak and tune further but that was not the goal of this comparison.
      The goal of benchmarking should be to characterize real world performance. If we're talking about a web browser benchmark, then using the stock distro build makes sense, since that's what 99% of users will do. However, if you're measuring HPC or machine learning, then absolutely use -march=native, since anyone seriously doing either will do essentially that.

      Comment

      • coder
        Senior Member
        • Nov 2014
        • 8863

        Originally posted by pal666 View Post
        what word is difficult for you?
        All of them, when assembled in this sequence:
        x86_64 also didn't retain full binary compatibility, they are even on this metric
        What compatibility did it not retain?

        Originally posted by pal666 View Post
        can x86 code be successfully run in x86-64 mode? no.
        What does that have to do with anything?

        Comment

        • coder
          Senior Member
          • Nov 2014
          • 8863

          Originally posted by PerformanceExpert View Post
          However, the standard benchmark is too simplistic and overstates the benefit of GPUs. Fujitsu's goal was to make programming and optimization easier and enable much higher efficiencies on the many HPC codes that don't work well on GPUs.
          I'm not saying which benchmark -- just that benchmarks are the best way to compare its efficiency. The choice of benchmark depends on what you want to do with it.

          Comment

          • Dukenukemx
            Senior Member
            • Nov 2010
            • 1388

            Originally posted by coder View Post
            Technically true, but now it just feels like you're trying to move the goalposts. So, what is it you're really after? I thought you just wanted a way to code for ARM other than on a cellphone or Pi.
            The goal is to let the average person use ARM and learn to work with it. You could do it on a RPI or cellphone but that's harder than using a desktop x86 machine that has standards. This is why I hate Android because everyone has their own boot loader for the OS. iOS is even worse. You need a desktop machine that anyone can buy that can run applications that don't depend on 80+ cores. The Apple M1's don't count because they cost a fortune and only adhere to Apple standards.

            Comment

            • Siuoq
              Senior Member
              • May 2013
              • 126

              Originally posted by coder View Post
              It did actually win a majority, but some of the benchmarks it won are variations of each other.

              The thing that often arouses my suspicion is how the benchmarks featured in these articles are chosen. PTS has thousands of test cases, yet there are only about 25 or so that feature here.
              Yes, there is a huge correlation between some benchmarks, which makes the result -- the geometric mean of all tests -- practically garbage.

              I don't know what will be a better method tho. ( Isn't there a "g factor" or something, that correlates with the test results? Like between different mental tasks. Idk, I am dumb in statistics, and also never learnt it. )
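To make the correlation point concrete, here is a minimal Python sketch of how near-duplicate tests can tilt a geometric mean. The ratios are invented for illustration (they are not taken from the article), and `geometric_mean` is just a helper name chosen for this example.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive performance ratios (system A / system B)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical ratios, invented for illustration only.
# Three unrelated benchmarks: A wins one, loses two.
independent = [1.30, 0.80, 0.90]

# Same suite, but the benchmark A wins shows up as three near-identical variants.
with_variants = [1.30, 1.30, 1.30, 0.80, 0.90]

gm_independent = geometric_mean(independent)
gm_with_variants = geometric_mean(with_variants)

print(f"independent tests:  {gm_independent:.3f}")
print(f"with variants:      {gm_with_variants:.3f}")
```

With these made-up numbers the first mean lands below 1 and the second above 1, so the overall "winner" flips purely because one workload was counted three times. One possible mitigation (an assumption, not something the article does) would be to average the variants of a test first and feed only that single value into the overall mean.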

              Comment

              • kieffer
                Phoronix Member
                • Nov 2015
                • 63

                Originally posted by coder View Post
                What's weird about this claim that "Arm scales higher than x86" is that it's so filled with caveats. For one thing, Altra has separate NUMA domains, which AMD specifically rejected, in their 7002 generation. So, if you really need a lot of cores because you have a heavy workload that requires lots of communication, then symmetric is probably still the way to go. However, if you're just making a cloud/density play, and plan to partition up with lots of VMs or containers that each fit in a single NUMA domain, then this NUMA approach is better.
                AMD advises NPS4 and "L3 as NUMA domain" in its documentation for tuning EPYC Rome for HPC: https://developer.amd.com/wp-content.../56827-1-0.pdf
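For anyone who wants to see how many NUMA domains a given box actually exposes after such tuning, here is a minimal Python sketch. It assumes a Linux system that exposes the usual sysfs layout under `/sys/devices/system/node`; the helper names `parse_cpulist` and `numa_topology` are made up for this example.

```python
from pathlib import Path

def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

def numa_topology(root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids; returns {} if the sysfs directory is absent."""
    topology = {}
    base = Path(root)
    if not base.is_dir():
        return topology
    for node_dir in base.glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            node_id = int(node_dir.name[len("node"):])
            topology[node_id] = parse_cpulist(cpulist.read_text())
    return topology

if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"node{node}: {len(cpus)} CPUs")
```

Running it before and after changing the NUMA-related firmware settings should show the node count change, which is a quick sanity check before pinning VMs or containers to a single domain.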

                Comment

                • juanrga
                  Senior Member
                  • Mar 2013
                  • 137

                  Originally posted by tuxd3v View Post
                  The reality is that ARM64 exists because AMD helped ARM to create ARM64, at the time when AMD was thinking of going ARM.
                  Not true. AMD didn't play any role in the development of ARM64.

                  The ARM and AMD ISAs couldn't be more different; being 64-bit must be the only thing they have in common. One is RISC and the other is CISC, one is a 3-operand ISA and the other isn't, one is a load+op ISA and the other isn't, one is a new ISA created from scratch and the other is an extension of former ISAs, one has a weak memory model and the other has a strong one, ...

                  Comment

                  • juanrga
                    Senior Member
                    • Mar 2013
                    • 137

                    Originally posted by olivier View Post
                    Despite those impressive results, I'm a bit concerned about the fragmentation coming in the server world… x86 "super compatible" is a double edged sword: bad for efficiency but major for compatibility and having no questions on choosing your hardware or upgrading it.

                    More details on that potential fragmentation on the market here: https://medium.com/@olivier.lambert/...d-abddca7a6268
                    It seems the author has missed all the server standards, such as SBSA and SBBR.



                    ARM has publicly released the Server Base Boot Requirements (SBBR) Specification, a follow-on companion to the Server Base System Architecture (SBSA) specification. The SBBR defines the ARMv8 platform firmware abstractions necessary for OS deployment...


                    Originally posted by cb88 View Post
                    If you think modern ARM has any semblance to a "clean" ISA you have no idea what you are talking about... you can't build a clean performant ISA, you have to make concessions for code density and performance.
                    From the RWT link I posted in a former message: "The ARMv8 architecture is classically British; a clean and elegant 64-bit instruction set".

                    Comment

                    • juanrga
                      Senior Member
                      • Mar 2013
                      • 137

                      Originally posted by coder View Post
                      Yeah, but also no.

                      Fujitsu made a deliberate decision to build a CPU that would fill both roles. In my opinion, it's fair to judge its efficiency on standard benchmarks, because efficiency has real world consequences, and that's what you're measuring.
                      Sure, if we are comparing full computers, but not when we are comparing ISAs. If we want to discuss the advantages of the ARM ISA, then we cannot compare the efficiency of Fugaku with the efficiency of x86 plus accelerators. We should compare Fugaku with an x86-only supercomputer.

                      Originally posted by Weasel View Post
                      Yeah it's smaller because it's slower on the same process node.
                      If you read the link, it is smaller because it lacks the x86 tax. Unlike x86 server cores, the ARM server cores do not have to support 32-bit and 16-bit legacy stuff.

                      Moreover, TX2 was as fast as the best x86 despite using a worse 16nm node. And TX3, using 7nm, is faster than any x86 processor. In fact, the link shows performance benchmarks comparing TX3 with Rome and with Cascade Lake.

                      Comment
