Intel Xeon Max 9480/9468 Show Significant Uplift In HPC & AI Workloads With HBM2e

Written by Michael Larabel in Processors on 28 June 2023 at 05:00 PM EDT. Page 7 of 7. 26 Comments.
CPU Power Consumption Monitor benchmark with settings of Phoronix Test Suite System Monitoring.

Here is a look at the CPU power consumption across the wide range of benchmarks conducted. When operating in HBM-only mode, the combined dual-socket processor power consumption was slightly higher. The recorded peak CPU power consumption with HBM in use was at times significantly higher as well. However, this may be due in part to a PowerCap/RAPL driver bug or some other platform oddity, since the IPMI-reported AC server power consumption numbers below don't align with the peaks seen in the PowerCap-provided results.
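For readers wanting to sanity-check PowerCap numbers on their own system, here is a minimal Python sketch of how average package power can be derived from two RAPL energy counter samples. It assumes the usual Linux powercap sysfs layout (`/sys/class/powercap/intel-rapl:0` with `energy_uj` and `max_energy_range_uj`); the function names are just illustrative and this is not necessarily how the Phoronix Test Suite collects its readings.

```python
from pathlib import Path

# Assumed sysfs location of the first RAPL package domain (may differ per system).
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def average_power_watts(start_uj, end_uj, seconds, max_range_uj):
    """Average power in watts between two energy counter readings (microjoules)."""
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The energy counter wrapped around between the two samples.
        delta_uj += max_range_uj
    return delta_uj / 1_000_000 / seconds


def read_uj(name, domain=RAPL_DOMAIN):
    """Read an integer microjoule file such as energy_uj (often requires root)."""
    return int((domain / name).read_text())
```

Sampling `read_uj("energy_uj")` twice a second or so apart and passing both values along with `read_uj("max_energy_range_uj")` to `average_power_watts()` gives a per-package figure that can then be compared against what the BMC reports over IPMI. Mishandled counter wraparound is one way to end up with bogus peaks, though that is only a guess as to what might be happening here.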

System Power Consumption Monitor benchmark with settings of Phoronix Test Suite System Monitoring.

The AC power consumption numbers obtained via the Supermicro IPMI interface show slightly lower power consumption in HBM-only mode than with the HBM inactive or in HBM caching mode, since the sixteen DDR5 server DIMMs weren't populated. So there are some power savings to enjoy if you are able to operate in HBM-only mode for your workloads, plus you avoid the expense of all the DDR5 server memory.

Geometric Mean Of All Test Results benchmark with settings of Result Composite, Intel Xeon Max 9468 and Xeon Max 9480 Linux Benchmarks. Xeon Max 9468 2P: HBM Only was the fastest.

When taking the geometric mean for these workloads able to leverage the HBM2e on Xeon Max, the HBM caching mode boosted the performance by around 10~11%. Going HBM-only improved the performance by another ~8%. Overall, comparing the Xeon Max 9468/9480 with no HBM2e memory in use to running everything off the 128GB (dual socket) of HBM2e, there was 18~20% higher performance for this wide mix of workloads from OpenVINO to OpenFOAM and many other HPC/AI benchmarks tested.
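The way those percentages stack can be shown with a quick Python sketch: the gains multiply rather than add. The `compound()` helper is made up for illustration, and the two relative scores fed to `geometric_mean()` are hypothetical values, not numbers from this article's results.

```python
from statistics import geometric_mean


def compound(*gains_percent):
    """Combine successive percentage gains multiplicatively, returning a percentage."""
    total = 1.0
    for gain in gains_percent:
        total *= 1 + gain / 100
    return (total - 1) * 100


# HBM caching mode (+10~11%) followed by the further HBM-only gain (~8%).
low = compound(10, 8)    # roughly 18.8
high = compound(11, 8)   # roughly 19.9

# Geometric mean of two hypothetical relative scores (illustrative only).
example = geometric_mean([1.0, 4.0])  # roughly 2.0
```

So the rounded 10~11% and ~8% steps work out to roughly 18.8~19.9% combined, which lines up with the 18~20% overall figure. The geometric mean is used for the composite since it keeps any single benchmark with a huge swing from dominating the result.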

Again, though, it highly depends upon the workloads relevant to your computing purposes. For OpenFOAM CFD, OpenVINO AI, and many other workloads there were significant improvements in HBM-only mode. Paired with the savings from not having to invest in DDR5 server memory if able to get by on 64GB or 128GB of HBM2e, and with the flagship Xeon Max 9480 costing roughly $12k, the Xeon Max line-up is a very interesting part of Sapphire Rapids, especially for various HPC and AI workloads. Particularly for AI workloads prepared to make use of Intel's Advanced Matrix Extensions, Xeon Max is effectively a double win between AMX and HBM2e.

With this geo mean the Xeon Max 9468 and Xeon Max 9480 were quite close overall. The Xeon Max 9468 does have a slight frequency advantage over the Xeon Max 9480 while the 9480 obviously has the core count advantage. But with the cores of either processor contending for just 64GB of HBM2e memory, or a little more than 1GB per core, the Xeon Max 9468 can enjoy slightly less resource contention with eight fewer cores.

This Xeon Max testing was also done with air cooling based on the hardware provided. However, Intel does encourage the use of liquid cooling by their partners particularly for the Xeon Max 9480 SKU.

Xeon Max does support AMX and DSA but does not offer any of the QAT / DLB / IAA accelerator devices available with other Sapphire Rapids processors. However, the software ecosystem support around the new Intel accelerators is still limited, so aside from some particular use-cases this isn't much of a blemish for Xeon Max.

The main limitation, though, is having just 64GB of HBM2e memory per CPU, which for the flagship Xeon Max 9480 at 56 cores means a little more than 1GB per core. Those considering Xeon Max for the HBM-only route will need to ensure that they won't hit any memory limits or contention that would negatively impact the performance. Hopefully with future Xeon Max processors we'll see Intel achieve at least 128GB of HBM2e on the higher core count CPUs. Another obstacle is the Xeon Max 9480 topping out at 56 cores compared to the non-Max Sapphire Rapids processors reaching up to 60 cores, AMD 4th Gen EPYC Genoa managing up to 96 cores per socket, and AMD's Bergamo hitting 128 cores per socket. For very memory-bound workloads the Xeon Max line-up with HBM2e can be a delight, but against the competition or even non-Max SPR there are certainly workloads where having a higher core count is more advantageous. In any event it's been very interesting to see how these Xeon Max processors perform in HBM caching and HBM-only modes of operation.
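As a rough Python sketch of the capacity math for anyone weighing the HBM-only route: the 64GB per socket and 56 cores come from the figures above, with 48 cores for the Xeon Max 9468 following from it having eight fewer cores. The helper names are hypothetical, and the fit check is deliberately naive in ignoring OS and runtime memory overhead.

```python
HBM_PER_SOCKET_GB = 64

# Core counts per socket: 56 for the 9480, eight fewer for the 9468.
CORES = {"Xeon Max 9480": 56, "Xeon Max 9468": 48}


def hbm_per_core_gb(cores, hbm_gb=HBM_PER_SOCKET_GB):
    """HBM2e capacity available per core on one socket, in GB."""
    return hbm_gb / cores


def fits_in_hbm(working_set_gb, sockets=2, hbm_gb=HBM_PER_SOCKET_GB):
    """Naive check of whether a working set fits in total HBM (ignores OS overhead)."""
    return working_set_gb <= sockets * hbm_gb


for model, cores in CORES.items():
    print(f"{model}: {hbm_per_core_gb(cores):.2f} GB of HBM2e per core")
```

That works out to about 1.14GB per core on the Xeon Max 9480 and about 1.33GB per core on the Xeon Max 9468, so a job needing around 2GB per core across all cores would be over budget in HBM-only mode on either SKU.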

Stay tuned to Phoronix for more Xeon Max Linux benchmarking, including a look at other elements of the performance and how these results in turn stack up against other processors. Thanks to Intel and Supermicro (Hyper SuperServer SYS-221H-TNR) for providing the Xeon Max hardware used for this interesting round of testing.



About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.