AWS Graviton4 Benchmarks Prove To Deliver The Best ARM Cloud Server Performance

    Phoronix: AWS Graviton4 Benchmarks Prove To Deliver The Best ARM Cloud Server Performance

    This week AWS announced that Graviton4 went into GA with the new R8G instances after Amazon originally announced their Graviton4 ARM64 server processors last year as built atop Arm Neoverse-V2 cores. I eagerly fired up some benchmarks myself and I was surprised by the generational uplift compared to Graviton3. At the same vCPU counts, the new Graviton4 cores are roughly matching Intel Sapphire Rapids performance while being able to tango with the AMD EPYC "Genoa" and consistently showing terrific generational uplift.


  • #2
    Anyone know if word tearing is still a thing? IIRC reading the JVM has/had problems with using volatile on 64bit values with Arm?



    • #3
      Does Amazon offer any instances with SMT/HyperThreading disabled? I think the main way Graviton4 jumps in the lead (when it does) is that you're using 64 distinct cores vs. 64 threads on 32 cores on the x86 CPUs. It would be interesting to nullify that performance advantage.

      Ultimately, what matters for most AWS customers is perf/$, where the whole SMT question becomes rather moot. I'm really just curious how they compare on an apples-to-apples basis, for "science".
      Last edited by coder; 12 July 2024, 02:00 PM.



      • #4
        Originally posted by Jonjolt View Post
        Anyone know if word tearing is still a thing? IIRC reading the JVM has/had problems with using volatile on 64bit values with Arm?
        When? Could this have been on 32-bit ARM cores (or 32-bit code on 64-bit cores)?



        • #5
          Originally posted by Jonjolt View Post
          Anyone know if word tearing is still a thing? IIRC reading the JVM has/had problems with using volatile on 64bit values with Arm?
          Originally posted by coder View Post
          When? Could this have been on 32-bit ARM cores (or 32-bit code on 64-bit cores)?
          It's not just an Arm issue. It's enough of a problem with processors that the Java language spec has a special caveat about word tearing with example code intended to detect the problem with the intention of helping to mitigate it in the broader sense when necessary.

          If anyone really cares that much about it, it's trivial to check any architecture with an existing JVM implementation.
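          A minimal sketch of such a check, loosely modeled on the byte-array example in the Java Language Specification's word tearing section (17.6). The class name `WordTearingCheck` and the `run` helper are made up for illustration and are not the spec's own code. Each thread owns one element of a shared `byte[]` and verifies that nobody else's write to a neighboring byte ever clobbers it:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class name; loosely modeled on the example in JLS 17.6.
final class WordTearingCheck {

    /** Runs one thread per array element; returns how many threads saw a mismatch. */
    static int run(int length, int iterations) {
        final byte[] counts = new byte[length];          // adjacent bytes, one per thread
        final AtomicInteger mismatches = new AtomicInteger();
        Thread[] threads = new Thread[length];

        for (int t = 0; t < length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                byte expected = 0;
                for (int i = 0; i < iterations; i++) {
                    // Only this thread writes counts[id], so it must read back its own value.
                    if (counts[id] != expected) {
                        mismatches.incrementAndGet();    // a neighbor's write clobbered our byte
                        return;
                    }
                    expected++;                          // wraps around at 127, which is fine
                    counts[id] = expected;
                }
            });
        }
        for (Thread th : threads) {
            th.start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        int seen = run(8, 1_000_000);
        System.out.println(seen == 0
                ? "no word tearing observed"
                : "word tearing observed in " + seen + " thread(s)");
    }
}
```

          A result of zero only says that no tearing was observed in that run on that JVM and CPU.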

          Edit: FWIW, the M1 CPU in Macs doesn't trigger the tearing warning when running the example code. I just tested it for my own curiosity using the Arm native version of OpenJDK 22. It would be rash to be sure, and I don't have an M2, M3, or M4 machine, but I'd imagine none of the subsequent M-class processors would if the first version doesn't.

          Edit #2: I realize that just because Apple's Arm chips don't exhibit tearing doesn't necessarily mean other AArch64 CPUs like Graviton4, which may be based on different ARM specification revisions and vendor-specific extensions or exclusions, won't either.

          Edit #3: Volatile is a special case however. It's only thread safe in narrow circumstances to begin with no matter what hardware you run it on.
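          To illustrate the "narrow circumstances" point, here is a small sketch (the class and method names are invented for this example). A `volatile long` guarantees visibility and atomic individual reads and writes, but `++` is a separate read followed by a write, so concurrent increments can be lost on any hardware. `AtomicLong` is shown alongside for comparison:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class: volatile makes writes visible, but ++ is still read-modify-write.
final class VolatileCounterDemo {
    static volatile long volatileCount;
    static final AtomicLong atomicCount = new AtomicLong();

    /** Returns {volatile total, atomic total} after threads * iterations increments of each. */
    static long[] run(int threads, int iterations) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    volatileCount++;                 // not atomic: updates may be lost
                    atomicCount.incrementAndGet();   // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new long[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        long[] totals = run(4, 1_000_000);
        System.out.println("volatile long: " + totals[0] + ", AtomicLong: " + totals[1]);
    }
}
```

          The `AtomicLong` total always equals threads times iterations; the volatile total can come out lower, and by how much varies from run to run.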
          Last edited by stormcrow; 12 July 2024, 04:37 PM.



          • #6
            Originally posted by coder View Post
            Does Amazon offer any instances with SMT/HyperThreading disabled? I think the main way Graviton4 jumps in the lead (when it does) is that you're using 64 distinct cores vs. 64 threads on 32 cores on the x86 CPUs. It would be interesting to nullify that performance advantage.
            Each vCPU on non-Graviton-based Amazon EC2 instances is a thread of x86-based processor, except for R7a instances.
            The EPYC results should therefore already be 64 distinct cores.
            Source: https://aws.amazon.com/ec2/instance-...mory_Optimized

            See also: https://www.phoronix.com/review/aws-m7a-ec2-benchmarks (There's a paragraph about SMT or rather the lack thereof)
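            For anyone who wants to check from inside a Linux instance, a rough sketch follows. It assumes the guest exposes the standard sysfs topology file for cpu0; what a VM reports depends on what the hypervisor chooses to expose, so treat the result as a hint. The class name `SmtCheck` is made up for this example:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: counts the hardware threads that share a core with cpu0.
final class SmtCheck {

    /** Parses a sysfs CPU list such as "0", "0-1" or "0,32" and returns how many CPUs it names. */
    static int countSiblings(String list) {
        int count = 0;
        for (String part : list.trim().split(",")) {
            String[] range = part.split("-");
            int lo = Integer.parseInt(range[0].trim());
            int hi = Integer.parseInt(range[range.length - 1].trim());
            count += hi - lo + 1;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list");
        if (!Files.isReadable(p)) {
            System.out.println("CPU topology information is not available here");
            return;
        }
        String list = new String(Files.readAllBytes(p)).trim();
        int siblings = countSiblings(list);
        System.out.println("cpu0 siblings: " + list
                + (siblings > 1 ? " (SMT appears to be on)" : " (one thread per core)"));
    }
}
```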



            • #7
              Originally posted by YCbCr View Post
              The EPYC results should therefore already be 64 distinct cores.
              Source: https://aws.amazon.com/ec2/instance-...mory_Optimized

              See also: https://www.phoronix.com/review/aws-m7a-ec2-benchmarks (There's a paragraph about SMT or rather the lack thereof)
              Thanks!

              That helps explain why EPYC does well, overall. It also makes Graviton4's wins more impressive, when it has them.



              • #8
                Originally posted by Jonjolt View Post
                Anyone know if word tearing is still a thing? IIRC reading the JVM has/had problems with using volatile on 64bit values with Arm?
                The ARM memory model is weakly ordered (stores can occur out of order from the program's point of view), and other-multi-copy atomic (you can see values you updated that other readers may not, although all other readers will see the same value (updated, or not)). The JVM explicitly calls out the possibility that 64bit values may be processed as 32bit chunks, and those two chunks may not, necessarily, be seen as having been updated at the same time. The x86 memory model is somewhat more strict. There is a history of people assuming the more strict memory model, and having this fail in interesting ways on ARM vs x86. However, as with all else, ARM implementations may be more strict about their memory model implementation, but don't automatically presume other or later implementations will also be more strict, as there can be performance benefits to relaxing the strictness, and sometimes a designer/manufacturer will choose performance.
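                A sketch of how the two-chunk behavior could be probed (names invented for this example). The Java Language Specification (17.7) allows a write to a non-volatile `long` or `double` to be performed as two 32-bit writes, while reads and writes of `volatile` ones are always atomic. The writer below alternates between all-zero and all-one bit patterns, so any value whose upper and lower halves differ was torn:

```java
// Hypothetical probe for non-atomic 64-bit writes as permitted by JLS 17.7.
final class LongTearingCheck {
    static long plain;            // may legally be written as two 32-bit halves
    static volatile long safe;    // volatile long reads and writes are always atomic

    /** A value is torn if its upper and lower 32-bit halves disagree. */
    static boolean isTorn(long v) {
        return (int) (v >>> 32) != (int) v;
    }

    /** Returns how many torn values the calling thread observed while a writer was running. */
    static long countTorn(boolean useVolatile, int iterations) {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                long v = (i & 1) == 0 ? 0L : -1L;   // both halves always match
                if (useVolatile) {
                    safe = v;
                } else {
                    plain = v;
                }
            }
        });
        writer.start();

        long torn = 0;
        for (int i = 0; i < iterations; i++) {
            long v = useVolatile ? safe : plain;
            if (isTorn(v)) {
                torn++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("plain long, torn reads:    " + countTorn(false, 50_000_000));
        System.out.println("volatile long, torn reads: " + countTorn(true, 50_000_000));
    }
}
```

                The volatile count must be zero on any compliant JVM. A zero for the plain field proves little: 64-bit JVMs typically use single 64-bit stores, and a JIT compiler is free to hoist the plain read out of the loop.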



                • #9
                  For anybody interested in a GCP-focused CPU comparison, check out my SPEC mid-2024 take: https://medium.com/google-cloud/spec...h-f57be21e67cd



                  • #10
                    Originally posted by stormcrow View Post


                    It's not just an Arm issue. It's enough of a problem with processors that the Java language spec has a special caveat about word tearing with example code intended to detect the problem with the intention of helping to mitigate it in the broader sense when necessary....
                    Originally posted by CommunityMember View Post

                    The ARM memory model is weakly ordered (stores can occur out of order from the program's point of view), and other-multi-copy atomic (you can see values you updated that other readers may not, although all other readers will see the same value (updated, or not)). The JVM explicitly calls out the possibility that 64bit values may...

                    Thank you, sirs, for the explanation.
