Amazon Graviton3 vs. Intel Xeon vs. AMD EPYC Performance


  • #21
    Originally posted by mdedetrich

    Well, ARM CPUs don't tend to have hyperthreading, because hyperthreading is mainly a result of trying to squeeze more performance out of an older CISC-style ISA that has different word sizes for its instructions. The ARM ISA doesn't have this issue, so afaik there isn't any modern ARM CPU that has hyperthreading/SMT.
    Why, then, do RISC designs like IBM POWER have 4-way and even 8-way SMT (8 threads per core)? SPARC also has 8-way SMT.

    The RISC ISAs of POWER and SPARC are newer than ARM's ISA.

    So what you wrote is not true.



    • #22
      Originally posted by HEL88

      Why, then, do RISC designs like IBM POWER have 4-way and even 8-way SMT (8 threads per core)? SPARC also has 8-way SMT.

      The RISC ISAs of POWER and SPARC are newer than ARM's ISA.

      So what you wrote is not true.
      I was talking about the ARM ISA specifically. And it's not that it's impossible, it's that it's a lot less necessary. SMT by definition solves a problem of pipelining instructions, which is an issue that ARM doesn't really have (and it's why even the most high-powered ARM CPUs don't have SMT; they don't need it).



      • #23
        Originally posted by HEL88

        Why, then, do RISC designs like IBM POWER have 4-way and even 8-way SMT (8 threads per core)? SPARC also has 8-way SMT.

        The RISC ISAs of POWER and SPARC are newer than ARM's ISA.

        So what you wrote is not true.
        SPARC is ancient (and long dead), POWER is more modern, but AArch64 came a decade later.

        Recent POWER CPUs are 8-way SMT to reduce per-core software licensing cost (they basically slap two 4-way SMT cores together...). Various Arm CPUs have had SMT, but they weren't very successful. It turns out that you get more performance by adding extra cores than by adding SMT...



        • #24
          Originally posted by mdedetrich

          I was talking about the ARM ISA specifically. And it's not that it's impossible, it's that it's a lot less necessary.
          You wrote:
          has different word sizes for its instructions.

          What does variable instruction length have to do with hyperthreading? Please tell me.

          SMT by definition solves a problem of pipelining instructions, which is an issue that ARM doesn't really have (and it's why even the most high-powered ARM CPUs don't have SMT; they don't need it).
          Pipeline utilization depends on the program, not the ISA. If a program has many data dependencies or accesses memory randomly, pipeline utilization will be low. A good predictor and a big ROB help only slightly with these problems.

          So if, for example, a program accesses RAM in a random pattern and the pipeline is mostly stalled, you can put another thread into the pipeline without degrading performance. This is completely independent of the ISA.
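
The stalled-pipeline argument can be illustrated with a toy model. This is only a sketch, not a model of any real core: it assumes a hypothetical single-issue pipeline, fixed-priority thread selection, and threads that alternate between a fixed compute burst and a fixed memory stall. The names `simulate` and `utilization` and all parameter values are made up for illustration.

```python
def simulate(num_threads, compute_cycles, stall_cycles, total_cycles=10_000):
    """Toy single-issue core (hypothetical, ISA-agnostic).

    Each cycle, the lowest-numbered ready thread issues one instruction.
    After `compute_cycles` instructions a thread waits `stall_cycles`
    cycles for memory before it is ready again.
    Returns the number of instructions issued per thread.
    """
    remaining = [compute_cycles] * num_threads  # instructions left in the current burst
    ready_at = [0] * num_threads                # first cycle at which the thread can issue
    issued = [0] * num_threads
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                issued[t] += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Burst finished: the thread now waits on memory.
                    ready_at[t] = cycle + 1 + stall_cycles
                    remaining[t] = compute_cycles
                break  # only one issue slot per cycle
    return issued


def utilization(issued, total_cycles=10_000):
    """Fraction of issue slots that did useful work."""
    return sum(issued) / total_cycles


# Memory-bound workload (assumed: 10 busy cycles, then a 90-cycle stall).
one_thread = simulate(1, compute_cycles=10, stall_cycles=90)
two_threads = simulate(2, compute_cycles=10, stall_cycles=90)
print(utilization(one_thread), utilization(two_threads))
```

In this toy model the memory-bound case goes from 10% to 20% utilization when a second thread is added, and the first thread issues exactly as many instructions as before. With `stall_cycles=0` the second thread adds nothing, because there are no idle slots to fill. Nothing in the model refers to the instruction encoding.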
          Last edited by HEL88; 28 May 2022, 08:56 AM.



          • #25
            Originally posted by AdrianBc


            The RK3588 boards will be much faster than any *cheap* (i.e. under $300 for a complete computer) ARM boards that have ever been available until now.

            Nevertheless the Cortex-A76 cores from RK3588 are nowhere near the performance of the Neoverse V1 cores of Graviton 3 or of any of the Zen cores or of the big Intel cores.

            The Cortex-A76 cores of RK3588 have about the same speed as the Tremont cores of the Intel Jasper Lake and Elkhart Lake CPUs, which are also available in very cheap computers, even if not as cheap as the cheapest of those based on RK3588 (where some smaller boards with 8 GB DRAM should be less than $150).
            No, Cortex-A76 is basically the same core as used in Graviton 2 and Ampere Altra (Max). It achieves ~91% of the single-threaded SPECint2017 score of the EPYC 7763. So it is a pretty quick core despite being 4 years old... The problem with most of these boards is that they use cost-optimized phone CPUs with very little cache and a slow memory system.



            • #26
              Originally posted by PerformanceExpert
              AArch64 came a decade later.
              LOL. So the x86-64 ISA dates from 2003, which makes it modern too.


              SPARC is ancient
              The last SPARC came out in 2017, so it's not so ancient.

              Yes, but now it's dead.

              Recent POWER CPUs are 8-way SMT to reduce per-core software licensing cost
              Yes, if you IMPROVE performance without adding new cores, you save money on software licensing.
              Last edited by HEL88; 28 May 2022, 09:01 AM.



              • #27
                Originally posted by smitty3268
                I'm not sure how useful it really is to compare processors with the same threadcount when some have SMT and some don't.

                Ultimately I think performance per $ is what primarily matters for the cloud.

                But if you were attempting to compare the CPU architectures, then I think getting the same # of physical cores makes more sense, because the extra SMT "cores" are just a side-benefit of the architecture, the same way that the looser memory model is a benefit of the ARM architecture.
                Just because there are 16 threads in the chosen instances doesn't mean they are being used in every benchmark. Several benchmarks are single-threaded (although this isn't clear from any of the results). There is a reason many people look at rate-1 and rate-N SPEC results: doing so also avoids the differences in CPU-specific optimizations in many Phoronix benchmarks.

                The same number of cores doesn't give a good comparison either; you'd have to look at performance per area (and power) in the same process. Consider for example that the Gravitons have less than half the L2/L3 cache of the EPYC instances.

                But yes, for customers the only thing that ultimately matters is perf/$, and that's exactly why Graviton is getting so popular.



                • #28
                  Originally posted by HEL88
                  LOL. So the x86-64 ISA dates from 2003, which makes it modern too.
                  No, x86-64 isn't a modern ISA like AArch64. Practically everything from x86 was kept as is; only a few opcodes were removed to be used as prefixes. It would have been a great opportunity to make major changes and remove a lot of ancient stuff, but unfortunately that didn't happen.



                  • #29
                    Originally posted by PerformanceExpert
                    The same number of cores doesn't give a good comparison either; you'd have to look at performance per area (and power) in the same process. Consider for example that the Gravitons have less than half the L2/L3 cache of the EPYC instances.

                    But yes, for customers the only thing that ultimately matters is perf/$, and that's exactly why Graviton is getting so popular.
                    I think it just depends on what you are curious about comparing.
                    • 1-core vs 1-core of each architecture. In this case, I think an SMT "core" should be included for x86 for anything able to take advantage of multi-threading, but there should also be single-threaded tests where it is useless.
                    • Max multi-threading performance of the largest chips with the most cores available for each architecture.
                    • 2 chips that each use the same amount of power, at least approximately.
                    • 2 chips that cost the same amount of money, at least approximately.
                    • 2 chips that have the same amount of performance, at least approximately.
                    • The most power-efficient chips of each architecture. Or most cost-effective.
                    • The most popularly used chips of each.
                    • Probably some others too...
                    What I don't think makes any sense is taking an 8-core chip with SMT and directly comparing it to a 16-core chip just because they can both run 16 threads. If one of the other things above matches, then sure. But thread count on its own is a poor reason for a comparison.
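
To make the point concrete, here is a minimal sketch of how the choice of normalization changes the picture. Everything in it is hypothetical: the `Chip` type, the two example chips, and their scores, power, and prices are invented for illustration and are not measurements from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int            # physical cores
    threads: int          # hardware threads (cores x SMT ways)
    watts: float          # assumed package power
    dollars_per_hour: float
    score: float          # benchmark score, higher is better


def normalized(chip):
    """Return the same score under several normalizations."""
    return {
        "per_core": chip.score / chip.cores,
        "per_thread": chip.score / chip.threads,
        "per_watt": chip.score / chip.watts,
        "per_dollar": chip.score / chip.dollars_per_hour,
    }


def winner_by(metric, chips):
    """Name of the chip with the highest value for the given metric."""
    return max(chips, key=lambda c: normalized(c)[metric]).name


# Hypothetical numbers: an 8-core SMT chip and a 16-core non-SMT chip
# that both expose 16 threads.
smt_chip = Chip("8c/16t SMT", cores=8, threads=16, watts=100.0,
                dollars_per_hour=0.5, score=100.0)
wide_chip = Chip("16c/16t", cores=16, threads=16, watts=100.0,
                 dollars_per_hour=0.5, score=150.0)

for metric in ("per_core", "per_thread", "per_watt", "per_dollar"):
    print(metric, winner_by(metric, [smt_chip, wide_chip]))
```

With these made-up inputs the SMT chip leads per physical core while the 16-core chip leads per thread, per watt, and per dollar, which is why the basis of comparison should be stated up front.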
                    Last edited by smitty3268; 28 May 2022, 07:30 PM.



                    • #30
                      Originally posted by BlueSwordM
                      Still, if they can only get similar performance on TSMC N5, I fear Sapphire Rapids and especially Genoa will obliterate Graviton3, especially on a cloud provider providing fair pricing inside of the VCPU BS.
                      Michael did not normalize for power (he lacks the relevant information to do so). However, Amazon claims that a 64-core Graviton3 runs at 100 W, whereas we know that 64-core EPYC and 40-core Ice Lake Xeon CPUs both run well into 200 W territory. Even if Amazon isn't running the x86 chips at peak clocks, they are certainly using significantly more than 100 W.

                      Depending on what you care about, the comparison isn't necessarily fair without taking perf/W into account. It certainly isn't if you want the most direct comparison between the respective cores.
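
As a rough sketch of what normalizing for power would look like, the snippet below uses the two power figures mentioned above (100 W as Amazon's claim, 200 W as an assumed lower bound for the x86 parts). The function names and the benchmark scores are hypothetical; no measured results are implied.

```python
def perf_per_watt(score, watts):
    """Benchmark score divided by package power."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return score / watts


def break_even_speedup(watts_a, watts_b):
    """How many times faster chip B must be than chip A to match A's perf/W."""
    if watts_a <= 0 or watts_b <= 0:
        raise ValueError("watts must be positive")
    return watts_b / watts_a


GRAVITON3_WATTS = 100.0         # Amazon's claim, as cited in the post
X86_WATTS_LOWER_BOUND = 200.0   # assumption: "well into 200 W territory"

# Hypothetical equal scores, standing in for "similar performance".
score = 300.0
print(perf_per_watt(score, GRAVITON3_WATTS))
print(perf_per_watt(score, X86_WATTS_LOWER_BOUND))
print(break_even_speedup(GRAVITON3_WATTS, X86_WATTS_LOWER_BOUND))
```

Under these assumptions an x86 part would need at least twice the benchmark score just to match Graviton3 on perf/W, and more than that if its real power draw is higher than the assumed lower bound.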
