AmpereOne Performance Scaling From 32 To 192 Cores, Core-For-Core Benchmarks Against Ampere Altra Max

Written by Michael Larabel in Processors on 29 August 2024 at 10:25 AM EDT.

Earlier this week I began with AmpereOne A192-32X benchmarks, and I will continue over the next several weeks now that I finally have hands-on time with the 192-core AArch64 server processor in a Supermicro ARS-211M-NR R13SPD 2U server platform. Today's next phase of AmpereOne performance benchmarking looks at how AmpereOne scales across 32, 64, 96, 128, 160, and 192 cores, plus how AmpereOne compares core-for-core at 128 cores against the Ampere Altra Max M128-30 processor. These AmpereOne results at varying core counts are also shown against the AMD EPYC and Intel Xeon competition.

AmpereOne Supermicro BIOS

See the earlier AmpereOne A192-32X review for a broad set of benchmarks showing how AmpereOne compares to the earlier Ampere Altra Max as well as the Intel Xeon and AMD EPYC server CPU competition. Today's article focuses on AmpereOne power and performance across varying core counts.

AmpereOne FlexSpeed and FlexSKU

Originally I wanted to look at AmpereOne's new FlexSKU and FlexSpeed features, which were talked about back during the May 2024 update on AmpereOne. Unfortunately, it turns out that FlexSpeed and FlexSKU aren't yet supported on the Supermicro platforms. Thus for this article's testing I was offlining the CPU cores from within Linux. Offlining the AmpereOne CPU cores worked out well and was the next best option to FlexSKU, given that I only have the AmpereOne A192-32X flagship processor on hand but was curious about the core scaling and how AmpereOne compares to Ampere Altra at the same core count.
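The exact commands used for offlining aren't shown in this article. As a minimal sketch, assuming the standard Linux sysfs CPU hotplug files (`/sys/devices/system/cpu/cpuN/online`), the following Python helper works out which cores to take offline for a target core count. The function names and the choice to keep the lowest-numbered cores online are assumptions made for illustration, not the procedure used for the testing here.

```python
def cpus_to_offline(total_cores: int, keep_online: int) -> list[int]:
    """Return the CPU IDs to offline so that `keep_online` cores remain.

    Assumption: the lowest-numbered cores stay online (cpu0 is never touched).
    """
    if not 1 <= keep_online <= total_cores:
        raise ValueError("keep_online must be between 1 and total_cores")
    return list(range(keep_online, total_cores))


def set_online(cpu: int, online: bool, dry_run: bool = True) -> str:
    """Write 0/1 to the sysfs hotplug file for one CPU.

    With dry_run=True nothing is written; the equivalent shell command
    is returned so it can be reviewed first. Real writes require root.
    """
    path = f"/sys/devices/system/cpu/cpu{cpu}/online"
    value = "1" if online else "0"
    if not dry_run:
        with open(path, "w") as handle:
            handle.write(value)
    return f"echo {value} > {path}"


# Example: plan a 128-core configuration on a 192-core system (dry run only).
plan = [set_online(cpu, online=False) for cpu in cpus_to_offline(192, 128)]
```

Running with `dry_run=False` as root would apply the plan; writing `1` to the same files brings the cores back online.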

CPU Power Consumption Monitor benchmark with settings of Phoronix Test Suite System Monitoring.

Offlining a varying number of CPU cores also helped in looking at the per-core CPU power consumption with AmpereOne. Each AmpereOne core was consuming less than one Watt, which was impressive. But the base CPU power consumption remained high. As shown in the earlier AmpereOne review, the A192-32X CPU had a high base power consumption even at idle, much higher than Ampere Altra Max and the Intel Xeon / AMD EPYC CPUs. In turn, the idle wall power consumption of the AmpereOne server was also much higher than on the other tested server platforms. With this latest testing, we see that even when offlining a great number of CPU cores, the minimum/idle power consumption never dips below 100 Watts. That's very high compared to other CPUs able to idle in the 10~30 Watt range.
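One way to separate the base CPU power from the per-core cost is to fit a straight line, watts = base + per_core × cores, through the readings at each core count. The sketch below is a plain least-squares fit in Python; the `fit_power_model` name is hypothetical and the sample values are placeholders chosen for illustration, not measurements from this article.

```python
def fit_power_model(samples):
    """Least-squares fit of watts = base + per_core * cores.

    samples: iterable of (online_cores, average_cpu_watts) pairs.
    Returns (base_watts, watts_per_core).
    """
    points = list(samples)
    count = len(points)
    if count < 2:
        raise ValueError("need at least two measurements")
    mean_cores = sum(c for c, _ in points) / count
    mean_watts = sum(w for _, w in points) / count
    spread = sum((c - mean_cores) ** 2 for c, _ in points)
    if spread == 0:
        raise ValueError("core counts must differ")
    covariance = sum((c - mean_cores) * (w - mean_watts) for c, w in points)
    per_core = covariance / spread
    base = mean_watts - per_core * mean_cores
    return base, per_core


# Placeholder readings for illustration only, NOT data from this article.
placeholder_samples = [(32, 125.0), (64, 150.0), (96, 175.0)]
base_watts, watts_per_core = fit_power_model(placeholder_samples)
```

The intercept approximates the power drawn regardless of how many cores are online, while the slope approximates the incremental cost of each additional active core.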

AmpereOne CPU Core Scaling Benchmarks

When offlining the CPU cores, the AmpereOne A192-32X continued operating at a consistent 3.2GHz across the remaining cores. With no boost/turbo frequency mode, offlining cores meant staying at 3.2GHz with reduced CPU power consumption, and the power headroom freed up by the offlined cores was left untapped.

It's also worth pointing out that with the current AmpereOne SKU stack, the smallest processor is the AmpereOne A96-36X with 96 cores at 3.6GHz. Ampere Computing is leaving the smaller core counts from 32 to 128 cores to the Ampere Altra products. Hopefully in time AmpereOne will be expanded to cover the lower core counts, given the age of the Ampere Altra platform and those who may want an AArch64 workstation with DDR5 memory, PCIe 5 connectivity, more cache per core, etc.

John The Ripper benchmark with settings of Test: Blowfish. AmpereOne A192-32X @ 192 Cores was the fastest.
John The Ripper benchmark with settings of Test: bcrypt. AmpereOne A192-32X @ 192 Cores was the fastest.
RocksDB benchmark with settings of Test: Random Read. AmpereOne A192-32X @ 192 Cores was the fastest.

With no Simultaneous Multi-Threading (SMT, a.k.a. Hyper Threading) on Ampere processors, the highly scalable workloads scaled nicely up to the 192 cores available with the AmpereOne A192-32X processor. Having all physical cores with no SMT siblings is quite beneficial, especially for cloud/VM workloads where many vendors treat each sibling thread as an additional vCPU.
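To put a number on "scaled nicely", one common approach is scaling efficiency: the speedup over the smallest configuration divided by the increase in core count, where 1.0 means perfectly linear scaling. A minimal Python sketch follows; the `scaling_efficiency` helper is hypothetical and the scores are placeholders, not results from these benchmarks.

```python
def scaling_efficiency(results):
    """Compute scaling efficiency relative to the smallest core count.

    results: dict mapping core count -> throughput (higher is better).
    Returns a dict mapping core count -> efficiency, where 1.0 means
    throughput grew exactly in proportion to the core count.
    """
    if not results:
        raise ValueError("results must not be empty")
    base_cores = min(results)
    base_score = results[base_cores]
    return {
        cores: (score / base_score) / (cores / base_cores)
        for cores, score in sorted(results.items())
    }


# Placeholder throughput scores for illustration only, NOT article data.
placeholder_scores = {32: 100.0, 64: 190.0, 192: 540.0}
efficiency = scaling_efficiency(placeholder_scores)
```

For time-based results where lower is better, such as a time-to-solve test, the reciprocal of the time can be passed in as the throughput.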

Speedb benchmark with settings of Test: Random Read. AmpereOne A192-32X @ 192 Cores was the fastest.
m-queens benchmark with settings of Time To Solve. AmpereOne A192-32X @ 192 Cores was the fastest.
GROMACS benchmark with settings of Implementation: MPI CPU, Input: water_GMX50_bare. AmpereOne A192-32X @ 192 Cores was the fastest.
QuantLib benchmark with settings of Configuration: Multi-Threaded. AmpereOne A192-32X @ 192 Cores was the fastest.

Many workloads these days scale nicely up to today's high core count AArch64 and x86_64 processors... Let's continue.
