Amazon Graviton3 Compiler Tuning Benchmarks For The Arm Neoverse-V1 Cores
Stemming from my recent AWS Graviton3 benchmarks and looking at Graviton3 against Intel Xeon and AMD EPYC, a number of Phoronix readers expressed interest in seeing some compiler tuning benchmarks for the Graviton3 around its Arm Neoverse-V1 cores with SVE support. Here are some benchmarks for those interested in the compiler tuning impact for this new high performance Arm cloud processor.
The tests in this article were run on an Amazon EC2 c7g.8xlarge instance featuring 32 cores from a Graviton3 platform. Ubuntu 22.04 LTS was running on this EC2 instance with its stock Linux 5.15 kernel, while gcc-snapshot on Ubuntu 22.04 was used to provide a newer GCC 12-based compiler stack.
From this same Graviton3 instance, a number of open-source C/C++ benchmarks were carried out with the following CFLAGS/CXXFLAGS under test:
-O3 -march=armv8.4-a for targeting ARMv8.4, the architecture revision on which Neoverse-V1 is based.
-O3 -march=armv8.4-a+sve for specifying the Arm SVE (Scalable Vector Extension) support found with Neoverse-V1.
-O3 -march=armv8.4-a+sve -mcpu=neoverse-v1 for also enjoying CPU tuning specific to the Neoverse-V1.
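For those wanting to reproduce a similar comparison, the three configurations above could be driven from a small script along these lines. This is only a minimal sketch and not the harness used for the article: the labels in FLAG_SETS and the build_env helper are hypothetical names made up for illustration, and it assumes the benchmark's build system honors the CFLAGS and CXXFLAGS environment variables.

```python
import os

# The three flag sets tested in the article. The dictionary keys are
# hypothetical labels chosen for this sketch, not names from the article.
FLAG_SETS = {
    "armv8.4-a": "-O3 -march=armv8.4-a",
    "armv8.4-a+sve": "-O3 -march=armv8.4-a+sve",
    "armv8.4-a+sve+neoverse-v1": "-O3 -march=armv8.4-a+sve -mcpu=neoverse-v1",
}


def build_env(flags, base_env=None):
    """Return a copy of base_env with CFLAGS and CXXFLAGS both set to flags.

    When base_env is None, the current process environment is copied.
    Assumption: the build being driven reads CFLAGS/CXXFLAGS.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CFLAGS"] = flags
    env["CXXFLAGS"] = flags
    return env


if __name__ == "__main__":
    # Print the environment each build would see; a real run would pass
    # env to the build command instead (e.g. via subprocess.run(..., env=env)).
    for label, flags in FLAG_SETS.items():
        env = build_env(flags, base_env={})
        print(f"{label}: CFLAGS={env['CFLAGS']!r} CXXFLAGS={env['CXXFLAGS']!r}")
```

Keeping the flag strings in one table makes it harder for a typo in a single configuration to slip into only some of the builds.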
Added in GCC 11 was Arm Scalable Vector Extension (SVE) support including the ability for auto-vectorization for many SVE features. Let's take a look to see what difference these compiler flags have on the resulting performance of the generated binaries running on Graviton3 in the Amazon cloud.
The compiler testing on Graviton3 was limited to just those three configurations in order to conserve operating costs in the cloud.