NVIDIA GH200 CPU Performance Benchmarks Against AMD EPYC Zen 4 & Intel Xeon Emerald Rapids

Written by Michael Larabel in Processors on 8 February 2024 at 01:00 PM EST. Page 5 of 5. 90 Comments.
DuckDB benchmark with settings of Benchmark: IMDB. GPTshop GH200 was the fastest.
DuckDB benchmark with settings of Benchmark: TPC-H Parquet. Xeon Platinum 8592+ was the fastest.
RawTherapee benchmark with settings of Total Benchmark Time. Xeon Platinum 8592+ was the fastest.
Stress-NG benchmark with settings of Test: Matrix Math. EPYC 9754 2P was the fastest.
Stress-NG benchmark with settings of Test: Matrix 3D Math. Xeon Platinum 8592+ 2P was the fastest.
Timed Gem5 Compilation benchmark with settings of Time To Compile. Xeon Platinum 8592+ 2P was the fastest.

It was very interesting to see the GH200 CPU compete surprisingly well in raw performance against the current-generation single-socket x86_64 Intel and AMD server processors. Unfortunately, no power monitoring support was available, so power efficiency could not be compared in today's benchmarks.

Geometric Mean Of All Test Results benchmark with settings of Result Composite, GPTshop.ai GH200 Linux Benchmarks. EPYC 9754 2P was the fastest.

On a geometric mean basis across all the benchmarks conducted, the GH200 Grace CPU nearly matched the performance of the Intel Xeon Platinum 8592+ Emerald Rapids processor. The Arm Neoverse-V2-based Grace CPU tended to be much faster than the 128-core Ampere Altra Max AArch64 server. It will be interesting to see how AmpereOne competes, though no hardware is available yet for testing. (Unfortunately, no AMD MI300A hardware is available for testing right now either.) NVIDIA's Arm CPU performance has certainly come a long way since the early days of benchmarking NVIDIA Tegra.
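The composite result is a geometric mean across many tests that use different units and directions: query throughput, where higher is better, alongside compile and render times, where lower is better. The Python below is a minimal sketch that assumes each result is first normalized against a baseline system and that lower-is-better results are inverted so every ratio reads as a speedup. The function names are hypothetical, and this is not the Phoronix Test Suite's or OpenBenchmarking.org's actual implementation.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values, computed in log space to avoid overflow."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_score(result, baseline, higher_is_better):
    """Speedup of `result` over `baseline`; a value above 1.0 means faster than baseline."""
    return result / baseline if higher_is_better else baseline / result

def composite(results, baseline_results, directions):
    """Hypothetical composite: geometric mean of per-test speedups vs. a baseline system.

    results / baseline_results: dict mapping test name -> raw result
    directions: dict mapping test name -> True if higher is better (e.g. queries/sec),
                False if lower is better (e.g. compile time in seconds)
    """
    ratios = [
        relative_score(results[test], baseline_results[test], directions[test])
        for test in baseline_results
    ]
    return geometric_mean(ratios)
```

Because each test enters only as a ratio, no single benchmark dominates just because its raw numbers happen to be large. Also, the geometric mean of ratios equals the geometric mean of one system's results divided by the geometric mean of the baseline's results, so the relative ranking of two systems does not depend on which machine is picked as the baseline. An arithmetic mean of the same normalized ratios would not have that property.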

More of the CPU benchmark numbers are available via this result file. There are also some other benchmarks here from the preliminary testing.

Overall, it was quite fascinating to see the early potential of the NVIDIA GH200 CPU in this benchmarking. Some workloads are still not well optimized for AArch64, and in some cases the higher core counts and dual-socket configurations available with Intel Xeon Emerald Rapids and AMD EPYC Genoa(X) / Bergamo could drive the results much higher. NVIDIA also offers a 144-core Grace Superchip version, albeit not tested yet. There is also the H100 GPU on this NVIDIA Grace Hopper Superchip, which will be part of my next benchmark investigation, among other tests from this GPTshop.ai GH200 system.

Thanks to GPTshop.ai for providing access to the GH200 system and making this benchmarking possible. Those wanting to learn more about the GPTshop.ai systems can do so via GPTshop.ai.

About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.