Amazon Graviton3 vs. Intel Xeon vs. AMD EPYC Performance

  • #11
    Originally posted by eoerl
    price wise, then at least for AWS the Gravitons are clearly a steal.
    Even if you normalize performance by price, it wouldn't reduce the number of x86 wins by that much. He chose instances priced in the same ballpark, and when the x86 machines took a win, it was often by a sizeable margin.

    Originally posted by eoerl
    could well be that it's at a loss for Amazon, that they see it as an investment, or that AMD and Intel are just inherently overpriced. It's not really possible to answer the question with this article
    Intel & AMD indeed have profit margins on their hardware that Amazon doesn't have to pay itself. I'd bet there are also other parties that get a cut, when Amazon buys x86 systems vs. using its own homebuilt Graviton machines.

    The main takeaway for me is that which instance to use depends quite a lot on your specific workload. The geomeans hide a tremendous amount of variation, from one benchmark to the next.
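
    To make the geomean point concrete, here is a minimal sketch. The benchmark names and ratios are made up for illustration (they are not taken from the article); each ratio is one instance's score divided by the other's. The summary shows how a geomean of about 1.0 can sit on top of a 16x spread between best and worst case.

```python
import math


def geomean(values):
    """Geometric mean of a list of positive ratios."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(ratios):
    """Report the geomean next to the spread it hides."""
    vals = list(ratios.values())
    return {
        "geomean": geomean(vals),
        "min": min(vals),
        "max": max(vals),
        "wins_a": sum(v > 1 for v in vals),  # benchmarks where instance A is faster
        "wins_b": sum(v < 1 for v in vals),  # benchmarks where instance B is faster
    }


# Hypothetical per-benchmark ratios (instance A score / instance B score).
ratios = {
    "bench_a": 2.0,
    "bench_b": 0.5,
    "bench_c": 1.0,
    "bench_d": 4.0,
    "bench_e": 0.25,
}

summary = summarize(ratios)
print(summary)
```

    With numbers like these, "roughly equal on geomean" tells you very little about which instance to pick for one particular workload.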


    • #12
      Originally posted by linuxgeex
      ServeTheHome did a narrower comparison, but they involved Intel engineers to bring x86 acceleration features into the picture. Graviton won very few tests, and by small margins, while Intel spanked Graviton by a factor of 10 in several tests, and in some the margin was so broad that STH didn't even bother publishing it because the results were too embarrassing for AWS... "there's no comparison."
      Did they try using -mcpu=native (unlike Michael)? Which compiler, for Graviton 3?

      Edit: in newer versions of GCC, -mcpu is deprecated (at least for x86 targets, where it is a synonym for -mtune)! The x86 section of the manual even states that:

      Specifying -march=cpu-type implies -mtune=cpu-type.
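
      A rough sketch of how a build script might pick the "native" tuning flag per architecture. This assumes, as far as I can tell from the GCC manual, that the deprecation of -mcpu applies to the x86 back end while the AArch64 back end still documents -mcpu=native. The function name and the list of machine strings are my own, not from any build system.

```python
import platform


def native_tuning_flags(machine=None):
    """Return GCC flags that target the build host's own CPU.

    `machine` defaults to platform.machine(); passing it explicitly
    makes the function easy to test on any host.
    """
    machine = (machine or platform.machine()).lower()
    if machine in ("x86_64", "amd64", "i386", "i686"):
        # On x86, -march=native also implies -mtune=native.
        return ["-march=native"]
    if machine in ("aarch64", "arm64"):
        # On AArch64, -mcpu=native selects both architecture and tuning.
        return ["-mcpu=native"]
    # Unknown architecture: leave the compiler defaults alone.
    return []


print(native_tuning_flags())
```

      So for Graviton 3 the question would still be whether -mcpu=native (or an explicit CPU name) was passed, while for the x86 instances the equivalent question is about -march=native.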


      Originally posted by linuxgeex
      So it's interesting to see how Michael's results without acceleration enabled worked out.
      Some of them clearly are benefiting from cpuid-based optimized codepaths.
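
      For anyone unfamiliar with what such runtime dispatch looks like, here is a simplified sketch. Real libraries usually query CPUID or the kernel's hwcaps directly from C or assembly; this version only parses /proc/cpuinfo-style text, and the codepath names are hypothetical.

```python
def cpu_features(cpuinfo_text):
    """Collect feature names from /proc/cpuinfo-style text.

    Linux lists them under "flags" on x86 and under "Features" on arm64.
    """
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            features.update(value.split())
    return features


def pick_codepath(features):
    """Choose the widest SIMD codepath the CPU advertises (names are illustrative)."""
    if "avx512f" in features:
        return "avx512"
    if "avx2" in features:
        return "avx2"
    if "sve" in features:
        return "sve"
    if "asimd" in features:
        return "neon"
    return "generic"


# Abbreviated sample inputs, not real dumps from the tested instances.
x86_sample = "model name\t: example\nflags\t\t: fpu sse2 avx avx2\n"
arm_sample = "Features\t: fp asimd sve\n"

print(pick_codepath(cpu_features(x86_sample)))
print(pick_codepath(cpu_features(arm_sample)))
```

      On a Linux box you could feed it open("/proc/cpuinfo").read(). A benchmark with such dispatch gets the fast path even when the binary was compiled for a generic baseline, which is why compiler flags alone don't explain every result.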
      Last edited by coder; 28 May 2022, 11:06 PM. Reason: Added correction + gcc user manual citation.


      • #13
        Originally posted by smitty3268
        But if you were attempting to compare the CPU architectures, then I think getting the same # of physical cores makes more sense, because the extra SMT "cores" are just a side-benefit of the architecture, the same way that the looser memory model is a benefit of the ARM architecture.
        In this context, there are two primary metrics which matter:
        • perf/$
        • absolute perf of maximal instance (i.e. if you have a sufficiently scalable workload and really care about performance)
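
        As a small sketch of those two metrics: the instance names, scores and hourly prices below are placeholders I made up, not AWS list prices or results from the article.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    score: float           # benchmark score, higher is better
    price_per_hour: float  # USD per hour (hypothetical)


def perf_per_dollar(inst):
    """Benchmark score delivered per dollar of hourly instance cost."""
    return inst.score / inst.price_per_hour


def rank(instances):
    """Return (best perf/$ instance name, best absolute-perf instance name)."""
    best_value = max(instances, key=perf_per_dollar)
    best_perf = max(instances, key=lambda inst: inst.score)
    return best_value.name, best_perf.name


instances = [
    Instance("arm-16vcpu", score=100.0, price_per_hour=0.50),
    Instance("x86-16vcpu", score=120.0, price_per_hour=0.75),
]

print(rank(instances))
```

        With these placeholder numbers the two metrics disagree: one instance wins on perf/$ and the other on absolute performance, so the answer depends on which one you care about.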

        More generally speaking, I agree that peak single-thread performance and peak all-thread performance are better indicators of microarchitecture sophistication.


        • #14
          It seems like the CPU's massive memory bandwidth (most of the gains seem to come from here), ROB compression and decent general core throughput are making the Neoverse V1 core quite decent. Well, that and the fact that cloud providers continue to be cheap AF by offering vCPUs instead of VMs based on CPUs. At this point, I wonder whether Amazon does this not only out of greed, but also to make their CPUs seem better than they are.

          Of course, these CPUs must cost a pretty penny to make, because they're on TSMC N5 and use a monolithic architecture with only some chiplets for IO, with that same IO using very expensive interconnects (EMIB), etc.

          Still, if they can only get similar performance on TSMC N5, I fear Sapphire Rapids and especially Genoa will obliterate Graviton3, especially on a cloud provider offering fair pricing instead of the vCPU BS.


          • #15
            Originally posted by coder
            Did they try using -mcpu=native (unlike Michael)? Which compiler, for Graviton 3?

            Some of them clearly are benefiting from cpuid-based optimized codepaths.
            This can work to ARM's benefit, eventually. They have the potential to be much wider and more SIMD heavy, though Amazon may or may not be interested in that when they can cleanly substitute two leaner ARM cores for one SMT x86 one.

            Also, hand-written assembly for ARM is more portable.
            Last edited by brucethemoose; 27 May 2022, 02:07 AM.


            • #16
              Originally posted by coder
              Yeah, but he's comparing 16-core Graviton instances with 8-core / 16-thread x86 instances.

              I'm not complaining, as I think it makes sense to stay within comparable price brackets, but it's something to keep in mind. Also, that Amazon is able to offer time on its own CPUs for less $ for business reasons, as well as technical ones.
              Well, ARM CPUs don't tend to have hyperthreading, because hyperthreading is mainly a result of trying to squeeze more performance out of an older CISC-style ISA that has different word sizes for its instructions. The ARM ISA doesn't have this issue, so afaik there isn't any modern ARM CPU that has hyperthreading/SMT.

              So you are either going to compare it to a 16-core Graviton instance or an 8-core instance, and I do think that the 16-core one is the more legitimate comparison. At the end of the day, though, cost should also be a factor, which is hard to do since Graviton is an Amazon-only thing and we have no idea if they are selling it at a loss.


              • #17
                Getting a "115500" 7-zip score using a Desktop Ryzen 3950X / ArchLinux


                • #18
                  So Graviton3 is the HPC king 🤔


                  • #19
                    Originally posted by jjmcwill2003
                    Interesting to see how far their Arm architecture has come in terms of performance.

                    It's too bad there's nothing comparable in the desktop market. (Honeycomb LX seems comparatively ancient and lacks single thread performance).

                    Maybe one of the upcoming RockChip RK3588 based boards will turn out to be interesting, depending on how quickly Linux support arrives.

                    The RK3588 boards will be much faster than any *cheap* (i.e. under $300 for a complete computer) ARM boards that have ever been available until now.

                    Nevertheless the Cortex-A76 cores from RK3588 are nowhere near the performance of the Neoverse V1 cores of Graviton 3 or of any of the Zen cores or of the big Intel cores.

                    The Cortex-A76 cores of RK3588 have about the same speed as the Tremont cores of the Intel Jasper Lake and Elkhart Lake CPUs, which are also available in very cheap computers, even if not as cheap as the cheapest of those based on RK3588 (where some smaller boards with 8 GB DRAM should be less than $150).


                    • #20
                      Originally posted by tambre

                      NVIDIA's Jetson Orin might be the best option once they release the cheaper variants in the autumn. Of course they aren't very open source friendly.
                      The Jetson Orin AGX modules are extremely overpriced, which was also true for the previous generation, Jetson Xavier AGX.

                      On the other hand, the development kit for Jetson Xavier NX had a much more acceptable price in comparison with alternatives. I have one and I am content with it, even if the NVIDIA Carmel cores are rather slow and its Volta GPU is not really faster than the GPU of an Intel NUC of the same price, where the CPU is much faster.

                      Because the cheapest of the Jetson Orin NX modules has kept the same price as Xavier NX, even if the slow Carmel cores have been replaced with fast Cortex-A78AE cores, making the 6-core CPU part comparable with a 4-core Skylake CPU, and the GPU has been replaced by a much faster Ampere GPU, it can be hoped that the development kit for Orin NX will also have about the same price as before.

                      If that turns out to be true, then a Jetson Orin NX will be a good buy. However, it would be in a different price class than an SBC with the Rockchip RK3588.

                      While the latter has similar performance to, and slightly lower prices than, Intel Jasper Lake, a Jetson Orin NX will have a price similar to an Intel NUC with Tiger Lake or Alder Lake. A NUC will have a faster CPU and a slower GPU. The Intel GPU is not much slower, having 75% as many ALUs as Orin NX, but the main advantage of NVIDIA Orin remains the software support for NVIDIA CUDA.

                      One important advantage of NVIDIA Orin is that, if its documentation continues to be as good as what NVIDIA Xavier has now, the documentation needed for improving Linux support will be much better and much more complete than the documentation for any other CPU with ARM cores of comparable speed.

                      Better documentation than for NVIDIA Xavier exists only for the small and slow microcontrollers with ARM cores (from NXP, ST, Microchip, TI etc.).

                      NVIDIA Xavier is very open-source friendly. Now even the GPU driver is open source (even if it was not in the beginning), so it can be hoped that this will continue to be true for NVIDIA Orin.

                      Last edited by AdrianBc; 27 May 2022, 06:42 AM.