NVIDIA GH200 CPU Performance Benchmarks Against AMD EPYC Zen 4 & Intel Xeon Emerald Rapids

Written by Michael Larabel in Processors on 8 February 2024 at 01:00 PM EST.

Kicking off our NVIDIA GH200 Grace Hopper benchmarking at Phoronix is an initial look at the performance of the 72-core Grace CPU in a GH200 configured with 96GB of HBM3 memory. This article covers the Grace CPU benchmarks, while the Hopper GPU benchmarks will come in a follow-up article.

GPTshop.ai GH200 AI desktop

NVIDIA's GH200 combines the 72-core Grace CPU with the H100 Tensor Core GPU and supports up to 480GB of LPDDR5 memory plus either 96GB of HBM3 or 144GB of HBM3e memory. The Grace CPU employs Arm Neoverse-V2 cores with 1MB of L2 cache per core and 117MB of L3 cache.

NVIDIA GH200 diagram

GPTshop.ai provided access to the NVIDIA GH200 for benchmarking at Phoronix. GPTshop.ai is building what they aim to be "the ultimate high-end desktop": a supercomputer built around the GH200 and focused on AI and HPC workloads. Their system uses the GH200 Grace Hopper Superchip, dual 2000+ Watt power supplies, and a QCT motherboard, and it can be configured with multiple SSDs as well as various NVIDIA BlueField/ConnectX adapters and more.

Given the desktop focus, the GPTshop.ai GH200 system is housed in a tower chassis. The system has also been optimized to run very quietly while still relying on air cooling, though they do have a GH200 Liquid model for those wanting liquid cooling.

The GH200 does not come cheap: the currently available GPTshop.ai GH200 576GB model starts at 47,500 € (~$41k USD, since no taxes apply when shipped outside the EU). See GPTshop.ai for the different AI desktop system configurations and purchasing.

GH200 CPU output

Standard AArch64 Linux distributions can run on the NVIDIA GH200. For this testing, Ubuntu 23.10 with Linux 6.5 was used in order to have an up-to-date kernel as well as the stock GCC 13 compiler. These toolchain versions are close to what will be found in Ubuntu 24.04 LTS in April, so Ubuntu 23.10 gives a leading-edge look at NVIDIA GH200 Linux performance and at how it stands against the Intel Xeon Scalable, AMD EPYC, and Ampere Altra Max processors in this comparison.

GH200 CPU Linux information

The GPTshop.ai GH200 system as tested had 72 cores, a Quanta S74G motherboard, 480GB of RAM, and 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9 SSDs. All of the server processors tested for this comparison were running at their top-rated memory frequencies with the maximum number of supported memory channels populated.

The other processors featured for this initial GH200 CPU benchmarking included:

- EPYC 8534P
- EPYC 8534PN
- EPYC 9554
- EPYC 9554 2P
- EPYC 9654
- EPYC 9654 2P
- EPYC 9684X
- EPYC 9684X 2P
- EPYC 9754
- EPYC 9754 2P
- Xeon Platinum 8280 2P
- Xeon Platinum 8380
- Xeon Platinum 8380 2P
- Xeon Platinum 8490H
- Xeon Platinum 8490H 2P
- Xeon Platinum 8592+
- Xeon Platinum 8592+ 2P
- Ampere Altra Max M128-30
- GPTshop.ai GH200

As mentioned, Ubuntu 23.10 was in use on all of these servers, and this initial round of benchmarking consists entirely of CPU-focused benchmarks. Benchmarks of the H100 GPU will come in follow-up articles, along with various other Linux performance benchmarks of the GPTshop.ai GH200 and, in the weeks ahead, comparisons of the GPTshop.ai system against QCT and Giga Computing GH200 servers.

GPTshop.ai GH200 Linux Benchmarks

Thanks to GPTshop.ai for providing the remote access for making this independent NVIDIA GH200 benchmarking possible.

Unfortunately there are no power consumption numbers for today's article. The NVIDIA GH200 doesn't currently appear to expose any RAPL/PowerCap/HWMON interface under Linux for reading just the GH200 power/energy use. The BMC on the system does expose the overall system power consumption via its web interface, but that power data was not exposed over IPMI, so it could not be queried cleanly from the host. So, unfortunately, this article has just the initial raw CPU performance benchmark numbers while I work on figuring out a way to nicely read any available power metrics.
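For readers wanting to check their own systems, below is a minimal sketch of probing for such interfaces. It is not the tooling used for this article. It assumes the usual Linux sysfs layouts, namely `energy_uj` files under `/sys/class/powercap` and `power*_input` / `energy*_input` files under `/sys/class/hwmon`; the function names `find_power_sensors` and `read_value` are made up for illustration.

```python
from pathlib import Path


def find_power_sensors(sysfs_root="/sys"):
    """Return sysfs files that could report power or energy readings.

    Assumed layouts (standard Linux sysfs conventions):
      class/powercap/<zone>/energy_uj          cumulative energy, microjoules
      class/hwmon/hwmonN/power<N>_input        instantaneous power, microwatts
      class/hwmon/hwmonN/energy<N>_input       cumulative energy, microjoules
    """
    root = Path(sysfs_root)
    patterns = [
        "class/powercap/*/energy_uj",
        "class/hwmon/hwmon*/power*_input",
        "class/hwmon/hwmon*/energy*_input",
    ]
    found = []
    for pattern in patterns:
        # glob() simply yields nothing if the directory does not exist
        found.extend(sorted(root.glob(pattern)))
    return found


def read_value(path):
    """Read an integer sensor value, or None if unreadable or malformed."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Some counters are root-only; treat those as unavailable
        return None


if __name__ == "__main__":
    sensors = find_power_sensors()
    if not sensors:
        print("No powercap/hwmon power or energy files found")
    for sensor in sensors:
        print(sensor, read_value(sensor))
```

An empty result from a probe like this would match the situation described above, where no host-side power reading is available. Note that some energy counters are readable only by root, which the sketch reports as `None` rather than as a missing sensor.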

