AMD EPYC 9554 & EPYC 9654 Benchmarks - Outstanding Performance For Linux HPC/Servers
Those wishing to see all of the benchmarks I ran in full can do so via this OpenBenchmarking.org result page that also has all the per-result CPU power data, performance-per-cost, etc.
Above is a look at the combined CPU power consumption observed for all of the processor configurations under test for the entire duration of the benchmarks carried out. Again, all of the CPU power measurements come from the RAPL interfaces exposed on Linux. The EPYC 9554 in its default (performance determinism) mode had an average power draw of 221 Watts with a peak of 355 Watts, compared to the EPYC 7763 with a 170 Watt average and a peak of 286 Watts; when enabling the power determinism mode the EPYC 9554 jumped to a 234 Watt average with a peak of 404 Watts. Meanwhile the flagship 96-core EPYC 9654 had an average power draw of 223 Watts and a peak of 363 Watts, or a 256 Watt average and a 415 Watt peak in the power determinism mode. For the EPYC 9654 2P configuration that worked out to a 366 Watt average and a peak of 697 Watts, or in the power determinism mode an average of 443 Watts and a peak of 833 Watts.

The power consumption is higher with these new Socket SP5 processors, but as shown by many of the performance-per-Watt metrics, the power efficiency is often ahead of AMD EPYC 7003 "Milan" or at worst roughly similar to those prior generation parts. So the power increases are justified, and there are lower-power EPYC 9004 "Genoa" processors too for those not wanting to get into the 300~400 Watt range.
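For those curious how these numbers are collected, below is a minimal sketch of polling the package energy counter via the Linux powercap/RAPL sysfs interface. The exact sysfs path and the one-second sampling interval are assumptions for illustration; the Phoronix Test Suite handles this polling automatically during benchmarking.

```python
# Minimal sketch of sampling CPU package power via the Linux powercap/RAPL
# sysfs interface. The path below is an assumption -- it can differ by
# kernel version and platform.
import time

ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package 0 energy in microjoules

def read_energy_uj(path=ENERGY_FILE):
    with open(path) as f:
        return int(f.read().strip())

def sample_package_watts(interval=1.0):
    """Return the average package power (Watts) over one sampling interval."""
    before = read_energy_uj()
    time.sleep(interval)
    after = read_energy_uj()
    # The counter eventually wraps at max_energy_range_uj; wrap handling is
    # omitted in this sketch for brevity.
    return (after - before) / 1e6 / interval

if __name__ == "__main__":
    while True:
        print(f"{sample_package_watts():.1f} W")
```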
When taking the geometric mean of all the benchmarks that successfully ran on all processors, here is how things shake out. Even a single EPYC 9554 comes out ahead of the 2P EPYC 7773X configuration overall... AMD 4th Gen EPYC is great with its AVX-512 implementation, DDR5 system memory, twelve memory channels, and other Zen 4 architectural improvements. The 64-core EPYC 9554 2P was 64% faster than the 64-core EPYC 7763 2P configuration overall, or 67% faster when running the EPYC 9554 2P in the power determinism mode. Meanwhile the flagship EPYC 9654 2P was 74% faster than the EPYC 7763 2P, a figure that went up to 85% with the Genoa flagship CPUs running in the power determinism mode. The AMD EPYC 9654 2P was running at over 2x the speed of Intel's current flagship, the Xeon Platinum 8380 2P "Ice Lake" processors.
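For clarity on how such an overall figure is derived, here is a minimal sketch of forming a geometric mean from per-benchmark results; the scores below are purely illustrative placeholders, not actual data from this comparison.

```python
# Minimal sketch of how an overall geometric mean is formed from many
# individual benchmark results. The numbers here are purely illustrative.
from math import prod

def geometric_mean(values):
    """Geometric mean: the n-th root of the product of n values."""
    return prod(values) ** (1.0 / len(values))

# Hypothetical relative scores (each normalized so the baseline CPU = 1.0).
relative_scores = [1.55, 1.80, 1.62, 1.71]
print(f"Overall: {geometric_mean(relative_scores):.2f}x the baseline")
```

The geometric mean is used rather than an arithmetic average so that no single benchmark with outsized absolute values can dominate the combined result.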
The generational uplift from Milan to Genoa was incredible across the wide range of server and HPC benchmarks I've carried out. I am now left to daydream about what Genoa-X will look like next year, knowing there is still even more potential to squeeze out of Zen 4 on the server side, as well as next year's Bergamo CPUs with up to 128 cores focused on cloud computing workloads.
As I've already shown at length on the Ryzen 7000 series desktop side, AMD Zen 4's AVX-512 implementation is remarkably efficient, and that holds even more true on the server side, where there are even more relevant workloads able to make use of AVX-512 and some stunning uplift as shown throughout these benchmarks.
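As a quick aside for those wanting to verify what a given system exposes, below is a minimal sketch that lists the AVX-512 feature flags reported in /proc/cpuinfo on Linux; it's a convenience check, not part of the benchmarking methodology itself.

```python
# Minimal sketch: list the AVX-512 feature flags a Linux system reports in
# /proc/cpuinfo (e.g. avx512f, avx512bw, avx512_vnni, avx512_bf16 on Zen 4).
import re

def avx512_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        text = f.read()
    return sorted(set(re.findall(r"\bavx512\w*", text)))

if __name__ == "__main__":
    flags = avx512_flags()
    print(", ".join(flags) if flags else "No AVX-512 flags reported")
```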
On a CPU pricing basis, the EPYC 9004 series is competitive with existing EPYC 7003 "Milan(X)" processors and Xeon Scalable Ice Lake processors. However, the transition to Genoa also means needing DDR5 ECC system memory that is pricier than DDR4. I haven't received any advance pricing information yet on any EPYC Genoa retail motherboards so I am not sure how that will play out, but presumably with the more complex Socket SP5 and the higher power requirements it will command higher pricing relative to what we've seen with EPYC SP3 motherboards. EPYC Milan processors will continue to be available for those looking at lower-priced servers that still offer very healthy performance.
On the Linux support side, the upstream Linux kernel and other key components are in good shape for at-launch support of the EPYC 9004 series... Granted, that's rather a given with today's Linux server marketshare. But there still is room for AMD to make strides in their Linux/open-source support. For example, AMD was late with their Automatic IBRS patches for the Linux kernel, only posting them last week. It's also only with Linux 6.1 that the AMD CPU cache-to-cache and memory reporting with perf is landing, for those interested in those expanded profiling capabilities. Also only premiering with Linux 6.1 is the LbrExtV2 Last Branch Record functionality new to Zen 4. Meanwhile Linux 6.0 squared away AMD x2AVIC for KVM virtual machines. Still yet to be mainlined in the Linux kernel but available in patch form is the QoS support around slow memory bandwidth allocation with CXL memory and the Bandwidth Monitoring Event Configuration (BMEC).

So there are a few non-critical features that have seen late arrivals for the mainline Linux kernel, but at least all of the key support is in good shape for launch. Of course, once features reach mainline there is also the added time before those kernels see use by various Linux distributions or are back-ported to the enterprise kernels for the likes of RHEL and SLES. On a positive note, AMD's Linux upstreaming timing ahead of launch has been improving with each succeeding generation of EPYC/Zen processors (in large part as they have been hiring a lot more Linux engineers over the past two years).
There still is the unfortunate angle of arguably late compiler tuning support for this new generation of processors. It was only in mid-October that AMD sent out their Znver4 compiler support for GCC, adding the "-march=znver4" target, which was then merged into GCC 13 Git in late October. But this initial support carries over the cost/tuning table from Znver3 -- the Znver4 tuning is expected "later". Hopefully that tuned support will still make it in time for GCC 13, which in turn should see its stable release as GCC 13.1 around March~April next year. But then it won't be until most of the H2'2023 Linux distribution releases like Ubuntu 23.10 that GCC 13 is used as the default system compiler. Had AMD gotten their Znver4 support into GCC well in advance of launch (as Intel is known for doing, having squared away much of their Sapphire Rapids and AMX enablement for GCC 12), it could already be shipping in Ubuntu 22.04 LTS and other recent distributions. There is also a Znver4 patch for GNU Binutils that is sitting on the mailing list and, as of writing this article, has yet to be merged.
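For those wanting to build optimized binaries in the meantime, below is a minimal sketch of probing which Znver "-march" target the installed GCC accepts and falling back to an older target when znver4 isn't recognized; the candidate list and the "native" fallback are assumptions for illustration only.

```python
# Hypothetical helper: probe which Znver -march target the installed GCC
# accepts, falling back from znver4 to older targets. Illustrative only.
import subprocess

def best_znver_march(candidates=("znver4", "znver3", "znver2")):
    for march in candidates:
        # Compile an empty translation unit to see if GCC accepts the target.
        result = subprocess.run(
            ["gcc", f"-march={march}", "-x", "c", "-c", "-o", "/dev/null", "/dev/null"],
            capture_output=True,
        )
        if result.returncode == 0:
            return march
    return "native"  # let GCC pick whatever it does know about

if __name__ == "__main__":
    print(best_znver_march())
```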
Or put another way, in the annual GCC 13 compiler release where AMD is only debuting their Zen 4 support, Intel has already worked out and merged GCC 13 support for various 2023~2024 processors. GCC 13 already has Grand Ridge, Granite Rapids, Meteor Lake, and Sierra Forest queued up, including enablement of various new instructions coming with those processors. It's that kind of timely support I'd love to see out of AMD (many years ago they were punctual with their early GCC support) so that by the time these processors are shipping, the Znver4 support would ideally already be in a released/stable compiler found in the latest Linux distributions. As of writing, no Znver4 patches have been posted for upstream review on the LLVM/Clang side, though LLVM's six-month release cadence at least means a shorter wait once the support does land. Intel continues to lead on the software side with stellar open-source/Linux timing in the vast majority of cases over the past number of years. The upstream enablement timing is a recurring pet peeve I have with AMD each launch cycle; on the compiler side the only logical reason I can come up with is that they want to play their cards close to the vest and not reveal new ISA extension plans for future CPU generations too early.
Granted, unless you are compiling optimized code for the server CPU target, this Znver4 compiler support isn't much of an issue (or any at all) for you. But given AMD's growing appeal in the high performance computing (HPC) space, it's a bit surprising they haven't been pushing out this compiler support earlier. There should at least be a new AMD Optimizing C/C++ Compiler (AOCC) release soon where Zen 4 support is in good shape. Once that new AOCC release is out, I'll certainly be running some compiler benchmarks on Genoa to look at the impact of the tuned compiler support on these Zen 4 server processors.
In addition to the terrific performance and Linux support for launch, another exciting aspect of 4th Gen EPYC from the reference platform side is Titanite running with OpenBMC! It was exciting to see the Linux-based, open-source OpenBMC being used as the software stack for the reference BMC, and hopefully this will carry through to OpenBMC being used by more EPYC 9004 series servers. As well, hopefully the industry/customer interest in open-source firmware continues and AMD is able to engage more around Coreboot and other open-source firmware elements.
From my testing of the EPYC 9554 and EPYC 9654 (and my preliminary testing with the EPYC 9374F -- stay tuned for my full review there!), the performance is incredible both generationally and against the current Xeon Ice Lake competition while waiting for Sapphire Rapids. Particularly for the HPC workloads, the gains from Milan(X) to Genoa and against the competition were some of the most captivating generational improvements I've seen in the past 18+ years of running Phoronix for Linux hardware reviews. The huge raw performance gains are matched by competitive performance-per-Watt and performance-per-dollar, driving great value in the data center. For many of the AVX-512 workloads, the performance-per-dollar of 4th Gen EPYC is extremely compelling.
How well Intel Sapphire Rapids stacks up against 4th Gen EPYC will be an interesting battle. Sapphire Rapids will only go up to 60 cores compared to 96 cores with Genoa, but to Intel's advantage are the new Advanced Matrix Extensions (AMX), AVX-512 FP16, and various new accelerator blocks. For software able to leverage AMX and Intel's accelerator IP it should at least make for a very interesting competition against Genoa, but more traditional server workloads will present a rather significant challenge -- need I remind you the geometric mean for the EPYC 9654 2P was over 2x that of the Xeon Platinum 8380 2P. It will also be interesting to see how Intel competes with the EPYC 9004 series on pricing, especially with Sapphire Rapids introducing Intel On Demand / Software Defined Silicon, which further complicates the pricing picture if tied to the new accelerator blocks that become paramount for delivering competitive performance. Another interesting area for Intel with Sapphire Rapids is their HBM2e SKUs now known as Xeon Max, while next year AMD will have Genoa-X to announce. Other benefits of the AMD 4th Gen EPYC series processors include CXL 1.1+ support and expanded SEV-SNP support with increased memory encryption capabilities and more VMs.
Thanks to AMD for providing the review hardware for this EPYC 9004 series launch-day testing. Stay tuned to Phoronix for many follow-up benchmarks over the weeks ahead looking at AOCC compiler performance, a more detailed AVX-512 server comparison, a memory channel comparison, FreeBSD testing, and more. For any Phoronix reader requests, let me know what else you would like to see tested (@michaellarabel on Twitter or by commenting on this article in the forums). And one last thing, as a friendly reminder: if you enjoy these extensive launch-day benchmarks from this one-man band, consider showing your support by joining Phoronix Premium to enjoy the site ad-free and multi-page articles on a single page, among other benefits. Corporate subscriptions are also available, as are tips via PayPal/Stripe; any support is appreciated given the unfortunate state of the ad industry (what makes these reviews possible) these days and rampant ad-block usage. Now back to more Linux benchmarking with the captivating AMD EPYC Genoa processors.