Intel Xeon Platinum 8490H "Sapphire Rapids" Performance Benchmarks

Written by Michael Larabel in Processors on 10 January 2023.

Well, it was a busy past four days of benchmarking Sapphire Rapids, and that's only hitting the tip of the iceberg when it comes to exploring Sapphire Rapids' performance potential around AMX and the accelerators. Given the very short time so far with my hands on this Xeon Platinum 8490H server, stay tuned for more benchmarks and a closer look at various optimized/tuned software packages making use of all the 4th Gen Xeon Scalable functionality over the weeks ahead. Plus there will be a look at how well Sapphire Rapids performs with Intel's extensively-tuned Clear Linux platform, various compiler benchmarks, and all the usual follow-up tests I tend to do with each major new CPU generation.

From the benchmarks carried out in time for launch day, when taking the geometric mean of all the "creator" tests the Xeon Platinum 8490H managed to come out just ahead of the EPYC 9554 when looking at dual socket performance. The EPYC 9554 has 64 cores to the Xeon Platinum 8490H's 60 cores, while the AMD Genoa part costs just over $9k to the 8490H's $17k price tag. In these creator workloads the Xeon Platinum 8490H 2P came in at 2.16x the performance of the Xeon Platinum 8380 2P Ice Lake processor. Counting as "creator" workloads were OSPRay, OSPRay Studio, C-Ray, Tachyon, POV-Ray, Blender, GraphicsMagick, Embree, oneDNN, OIDN, OpenVINO, ASTC Encoder, timed Godot compilation, and BRL-CAD.
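As a side note on methodology, the geometric mean used for these summaries is just the n-th root of the product of the individual results. A minimal sketch in Python (the ratios below are placeholders for illustration, not the actual result data from these tests):

```python
# Geometric mean as used for summarizing benchmark results: the n-th root of
# the product of n values, computed via logs to avoid overflow on long lists.
import math

def geometric_mean(values):
    # All inputs must be positive (benchmark scores/ratios always are).
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Placeholder performance ratios (new CPU vs. old), NOT real result data:
ratios = [2.05, 2.31, 1.98, 2.40, 2.10]
print(f"Geometric mean: {geometric_mean(ratios):.2f}x")
```

Unlike the arithmetic mean, the geometric mean keeps a single outlier result from dominating the overall summary, which is why it is the conventional way of aggregating heterogeneous benchmark suites.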

When taking the geometric mean of all the conventional high performance computing (HPC) benchmarks for this launch-day comparison, the Xeon Platinum 8490H effectively matched the EPYC 9554 2P as well. Here the 8490H 2P enjoyed 2.27x the performance of the 8380 2P. A single Xeon Platinum 8490H also managed to deliver 1.28x the performance of two Xeon Platinum 8380 processors in these HPC benchmarks. Counting as HPC benchmarks for what was run in time for launch day were NPB, Rodinia, HPCG, MT-DGEMM, AMG, NAMD, GROMACS, LULESH, Pennant, Incompact3D, OpenFOAM, RELION, oneDNN, OpenVINO, LCzero, WRF, and Graph500.

It's with the machine learning benchmarks like oneDNN and OpenVINO with AMX where the Xeon Platinum 8490H dominated among the benchmarks carried out for the Sapphire Rapids launch. With the geometric mean of the machine learning benchmarks the Xeon Platinum 8490H enjoyed over 3x the performance of the Xeon Platinum 8380 and was able to easily outpace the EPYC 9654 and EPYC 9554 processors. Yes, follow-up articles will include more benchmarks looking at the AI performance across other software packages and a closer exploration of toggling AMX, etc.
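For readers wanting to verify whether their own system exposes AMX before digging into such workloads, here is a minimal sketch checking the CPU feature flags Linux reports in /proc/cpuinfo. The amx_tile, amx_bf16, and amx_int8 flag names are what current kernels advertise; note this only checks that the hardware/kernel expose AMX, while actually touching tile data additionally requires a per-process arch_prctl() permission request on Linux:

```python
# Check whether Linux reports the AMX CPU feature flags in /proc/cpuinfo.
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def has_amx(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                # Each logical CPU entry carries a "flags" line of feature names.
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    return AMX_FLAGS <= flags
    except OSError:
        pass  # Not Linux, or /proc unavailable.
    return False

print("AMX advertised:", has_amx())
```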

To no surprise, software that is part of Intel's oneAPI collection or leveraging Intel's vast open-source software portfolio is in good shape for Sapphire Rapids and the new Max Series products.

But for software tasks like code compilation or CPU-based 3D rendering where outright core/thread count matters most, and/or memory-bandwidth-intensive workloads where the 12-channel memory shines, the AMD Genoa parts easily lead.

The Sapphire Rapids performance matched the AMD EPYC Genoa general purpose SKUs when it came to Python execution performance as well as other common scripting language benchmarks.

When taking the geometric mean of all 100+ benchmarks I ran in time for launch day, the Xeon Platinum 8490H 2P was positioned between the EPYC 9374F and EPYC 9554 2P SKUs overall. The benchmarks I completed in time for launch day can be viewed here. Across the wide range of benchmarks carried out, the Xeon Platinum 8490H was delivering 1.79~1.83x the performance of the prior generation Xeon Platinum 8380 Ice Lake processor. A single Xeon Platinum 8490H commonly outperformed two Xeon Platinum 8380 or Xeon Platinum 8362 processors. The flagship AMD EPYC 9654 "Genoa" processor meanwhile was around 16% faster than the Xeon Platinum 8490H 2P configuration while the 64-core EPYC 9554 2P was 9% faster.

Lastly is a look at the CPU power consumption across all of the benchmarks run for this article. The Xeon Platinum 8490H in both single and dual socket configurations consumed significantly more power than the prior-generation Ice Lake CPUs and even the EPYC Genoa SKUs. Across all of the benchmarking, a 303 Watt average and a 379 Watt peak power consumption were observed, as exposed via the RAPL sysfs interfaces. Meanwhile the Xeon Platinum 8380 had a 223 Watt average and a 293 Watt peak across the set of benchmarks, while the EPYC 9654 also averaged 223 Watts with a 363 Watt peak.
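For those wanting to reproduce this style of power monitoring, below is a minimal sketch of sampling package power from the RAPL sysfs interface on Linux. It assumes the common intel-rapl:0 package domain path; domain numbering varies by platform, the cumulative counter can wrap, and newer kernels restrict reading energy_uj to root:

```python
# Sample CPU package power from the Linux RAPL sysfs energy counter.
# energy_uj is a cumulative microjoule counter, so power is the delta over a
# sampling interval. (The counter eventually wraps; a robust monitor would
# also read max_energy_range_uj and handle wraparound.)
import time

RAPL_PKG = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj(path=RAPL_PKG):
    with open(path) as f:
        return int(f.read().strip())

def sample_watts(interval_s=1.0, path=RAPL_PKG):
    e0 = read_energy_uj(path)
    time.sleep(interval_s)
    e1 = read_energy_uj(path)
    return (e1 - e0) / 1e6 / interval_s  # microjoules -> joules -> watts

if __name__ == "__main__":
    try:
        print(f"Package power: {sample_watts():.1f} W")
    except OSError:
        print("RAPL sysfs interface not readable on this system")
```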

The Xeon Platinum 8490H outright dominated in the AI benchmarks with the likes of oneDNN, DeepSparse, and OpenVINO able to fully leverage the Sapphire Rapids capabilities with AMX. There were also very strong showings out of the 4th Gen Xeon processors with the likes of Open Image Denoise for image denoising as well as OSPRay, GraphicsMagick, Python, PHP, OpenJDK Java, etc. The 4th Gen Xeon Scalable performance struggled when it came to HPC/server workloads that scale well to high thread counts, where the 96-core / 192-thread EPYC 9654 could benefit, even in 2P configurations at up to 384 threads. Some of the memory-intensive workloads also did better thanks to AMD EPYC "Genoa" supporting 12 channels of DDR5 memory, but those same workloads are also where the Intel Xeon CPU Max Series with HBM2e should perform very well too. Unfortunately, there is no hands-on access to the Sapphire Rapids HBM2e SKUs yet for benchmarking there.

Over the coming weeks there will also be benchmarks across these processors looking at more workloads able to take advantage of the different accelerators new to Sapphire Rapids. The new accelerators have a lot of potential but it will take time for the software ecosystem to catch up in embracing them. At least the kernel driver support is in order as of Linux 5.19+, and it's now up to the Linux user-space software to catch up. With Sapphire Rapids availability in public clouds expected soon, that should help allow more developers to try out 4th Gen Xeon Scalable and ideally begin adapting their software to leverage the accelerators. Unfortunately for the lower-end Xeon Sapphire Rapids SKUs, the accelerators are limited or outright disabled unless engaging the Intel On Demand licensing model. It will be an interesting few months ahead to see how everything plays out, watching the software adoption for the accelerator support just as I closely watch the other areas of Linux development.

The pricing of 4th Gen Xeon Scalable at the top end of $17k for the Xeon Platinum 8490H is arguably a bit steep, unless all of your workload(s) happen to be able to make optimal use of the accelerators and new Sapphire Rapids capabilities. The AMD EPYC 9654 meanwhile has a list price of $11,805, which, even when factoring in the added cost of going with 12 DIMMs per socket rather than 8 in order to populate all available memory channels, still comes out ahead of the 8490H pricing.

The pricing for Intel's HPC-optimized Xeon CPU Max Series is actually in better shape if not planning to make use of the accelerators aside from DSA. The Xeon CPU Max Series 9480 has 56 cores and the same base/turbo frequency as the 8490H, the same 112.5MB of cache, the same 350 Watt TDP rating, eight channels of DDR5-4800, and 4 DSA devices but no QAT/DLB/IAA accelerators. That 9480 is priced at $12,980 USD, much closer to the EPYC 9654, while the 64GB of HBM2e should be extremely interesting for HPC benchmarks. If you are able to get by on just 64GB of HBM2e as system memory, the pricing is extremely favorable for those HPC-optimized SKUs in running HBM-only mode and not needing to deal with pricey DDR5 DIMMs. As we've seen from the great uplift with the EPYC 7773X for its larger cache, even that 768MB of L3 cache per socket was enough to provide significant advantages in my real-world workloads. Having HBM2e speeds as system memory should impress for workloads like OpenFOAM, NWChem, WRF, and others. The HBM-only mode for the Xeon CPU Max Series is interesting, though with the 9480 at just over 1GB of RAM per core it will not be enough for many modern MPI workloads, but the HBM flat/caching modes can come to the rescue there. In any event this is Intel's most exciting server line-up in many years, with many significant new features that will only see more positive returns as software is adapted to make use of these new capabilities.

Thanks to Intel for providing the Xeon Platinum 8490H review samples and SPR reference server for delivering these benchmarks. Stay tuned for many more Sapphire Rapids benchmarks on Phoronix over the weeks and months ahead.


