Intel Xeon Max 9480/9468 Show Significant Uplift In HPC & AI Workloads With HBM2e
Today brings a fun and interesting round of benchmarking. Supermicro recently sent over their Hyper SuperServer SYS-221H-TNR and Intel supplied the Xeon Max 9468 and Xeon Max 9480, finally making it possible to benchmark Xeon Max processors, the Sapphire Rapids parts featuring 64GB of HBM2e memory. This initial benchmarking article looks at Xeon Max 9468/9480 dual socket performance in HBM-only mode and HBM caching mode to show some of the workloads where Xeon Max can deliver significant uplift. Both are compared against flat (1LM) mode with nothing assigned to the HBM memory, which shows the impact when the specialized memory goes unused.
Since the launch of Intel 4th Gen Xeon Scalable "Sapphire Rapids" back in January I have been very eager to get my hands on Xeon Max, also commonly referred to as SPR HBM2e. The Xeon Max CPU Series features integrated high-bandwidth memory: for this generation all of the Xeon Max SKUs have 64GB of HBM2e, while core counts range from 32 up to 56 cores with the flagship Xeon Max 9480.
As with the conventional Intel 4th Gen Xeon Scalable line-up, the Intel Xeon Max series supports Advanced Matrix Extensions (AMX), AVX-512, DDR5, CXL 1.1, and other common platform features. With Xeon Max the focus isn't on the accelerator story but rather on workloads able to leverage the high bandwidth memory.
The Xeon Max Series supports three different modes of operation: HBM-only, HBM flat, and HBM cache mode. In HBM-only mode the server operates entirely within the 64GB of HBM2e memory (or 128GB for dual socket scenarios). This mode works by simply not populating any of the DDR5 memory slots on the server and booting. HBM cache mode is the default when running Xeon Max CPU(s) with DDR5 memory installed. In this mode, the HBM2e works transparently as a cache and requires no software-side changes. Lastly there is HBM flat mode, which can be enabled via the BIOS when a Xeon Max server has DDR5 populated. In HBM flat mode, a flat memory region spanning HBM and DRAM is established, giving software more flexibility over how it makes use of the HBM2e. But for HBM flat mode, software changes may be needed.
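For flat mode, one practical question is which memory regions software could target. The following is a minimal Python sketch, not something used for this article's testing. It assumes that in flat mode the HBM2e shows up under Linux as NUMA nodes that have no CPUs attached, and it reads the standard sysfs `cpulist` files to find such nodes. The function names `parse_cpulist` and `cpuless_nodes` are made up for illustration.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist string such as "0-3,8" into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpuless_nodes(sysfs_root: str = "/sys/devices/system/node") -> list[int]:
    """Return the IDs of NUMA nodes whose cpulist is empty.

    Assumption: on a flat-mode Xeon Max system these are the HBM nodes.
    """
    nodes = []
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        if not parse_cpulist((node_dir / "cpulist").read_text()):
            nodes.append(int(node_dir.name[len("node"):]))
    return sorted(nodes)


if __name__ == "__main__":
    print("CPU-less NUMA nodes (candidate HBM nodes):", cpuless_nodes())
```

If such nodes are reported, an unmodified program could in principle be bound to them with a tool like `numactl --membind=<nodes>`, which is one way to use flat mode without touching application code.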
For today's look at HBM2e performance, the HBM flat mode was tested in addition to the HBM caching and HBM-only modes, but without assigning anything to the HBM2e memory. That effectively tests these processors on DDR5 memory alone, showing the impact when the HBM2e goes unused.
The HBM-only mode is very interesting for workloads that can fit within the 64GB per socket capacity. With the Xeon Max 9480 having 56 cores, that's a little more than 1GB of memory per core, which isn't suitable for many of today's highly-threaded workloads, but there are still a decent number of scenarios where 1~2GB of RAM per core is satisfactory. At the bottom end is the Xeon Max 9462 with 32 cores, where that works out to 2GB of HBM2e per core. It will be very interesting if future generations of the Xeon Max series can reach ~128GB or more of HBM2e, which would open up a lot more possibilities for the higher core count parts by ideally providing at least 2GB per core.
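The per-core figures above are simple division, and a short Python sketch makes them easy to reproduce. The 64GB capacity and the core counts are taken from this article; the helper name `hbm_per_core_gb` and the 128GB entry are illustrative only, the latter being a hypothetical future capacity rather than an announced product.

```python
def hbm_per_core_gb(hbm_gb: float, cores: int) -> float:
    """HBM capacity per core, in GB, for a single socket."""
    return hbm_gb / cores


if __name__ == "__main__":
    # 64GB per socket and these core counts come from the article.
    for name, cores in [
        ("Xeon Max 9480", 56),
        ("Xeon Max 9468", 48),
        ("Xeon Max 9462", 32),
    ]:
        print(f"{name}: {hbm_per_core_gb(64, cores):.2f} GB of HBM2e per core")

    # Hypothetical: a 56-core part with 128GB of HBM per socket.
    print(f"128GB, 56 cores: {hbm_per_core_gb(128, 56):.2f} GB per core")
```

Since HBM capacity and core count both double in a dual socket system, the per-core number is the same for one or two sockets.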
The benchmarks today look at various workloads that can get by with the 128GB (dual socket) of HBM-only mode, comparing that against the caching mode with 512GB of DDR5-4800 plus the 128GB HBM2e cache, as well as against the HBM unused/inactive configuration. Follow-up articles will look at other areas of the Xeon Max Linux performance for HPC and AI as well as against the competition.
The Xeon Max 9468 is Intel's 48-core SPR HBM2e part with a base frequency of 2.1GHz, an all-core turbo of 2.6GHz, and a maximum turbo frequency of 3.5GHz. The Xeon Max 9468 has 105MB of cache in addition to the HBM2e. The Xeon Max 9480 flagship processor has 56 cores, a 1.9GHz base clock, 2.6GHz all-core turbo, and 3.5GHz maximum turbo frequency while having a 112.5MB cache. Both the Xeon Max 9468 and Xeon Max 9480 have a 350 Watt TDP rating.
The Intel Xeon Max 9480 has a recommended customer price of $12,980, which is quite a lot lower than the 60-core Xeon Platinum 8490H with its $17,000 price-tag and not too much higher than the AMD EPYC 9654 at around $11,800. Plus, if you are able to get by in HBM-only mode, there is quite a bit of savings on DDR5 memory costs.
For testing the Intel Xeon Max processors, Supermicro supplied a Hyper SuperServer SYS-221H-TNR review unit. The SYS-221H-TNR is a nice dual socket LGA-4677 solution with all the bells and whistles you'd want for Sapphire Rapids. A separate Supermicro SYS-221H-TNR review will follow among the other Xeon Max articles on Phoronix over the weeks ahead.
For this initial round, all of the Supermicro SYS-221H-TNR + Xeon Max 9468/9480 dual socket testing was done with Ubuntu 23.04 using its stock Linux 6.2 kernel and GCC 12.2 compiler, while running with the Intel CPUFreq performance governor. The server was operating in the SNC4 mode for all of the benchmarks.
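For anyone reproducing this kind of setup, it can help to confirm that every core is actually on the performance governor before benchmarking. Below is a minimal Python sketch, not the tooling used for this article, that tallies the governor each CPU reports through the standard Linux cpufreq sysfs files. The function name `governor_counts` is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def governor_counts(cpu_root: str = "/sys/devices/system/cpu") -> Counter:
    """Count how many CPUs report each cpufreq scaling governor."""
    counts = Counter()
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in Path(cpu_root).glob(pattern):
        counts[gov_file.read_text().strip()] += 1
    return counts


if __name__ == "__main__":
    for governor, n in governor_counts().items():
        print(f"{governor}: {n} CPUs")
```

On a system configured as described, the output would be expected to show a single `performance` entry covering all CPUs; any other governor in the list means some cores are not set as intended.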
It is worth pointing out that all the processor tests today were with the stock air cooling of the SYS-221H-TNR. For very demanding Xeon Max 9480 deployments, Intel encourages its partners to use liquid cooling to meet the specified case temperature (TCase) of the given SKU. The recommended TCase is 64°C for the Xeon Max 9480 and 77°C for the Xeon Max 9468.
Let's move on and see where the Xeon Max with HBM2e memory is making an interesting impact on performance.