My takeaway is that there's still a lot of maturing of the software ecosystem left to do ... unfortunately. But yes, a very refreshing set of benchmarks.
NVIDIA GH200 Grace CPU vs. AMD EPYC 9005 Turin CPU Performance
GH200 is targeted at HPC workloads with GPU acceleration. That's why it has 1 Grace and coherent HBM memory with the H100. The Grace superchip with 2x Grace (144 cores and double memory bandwidth, at 500W) is the one to compare to the AMD Turin. The equivalent of GH200 from AMD is MI300A (Zen 4). The newer MI325X is GPU-only just like GB200.
Unfortunately this expensive hardware isn't easily obtainable (and that is what makes them special anyways).
Originally posted by ikoz: GH200 is targeted at HPC workloads with GPU acceleration. That's why it has 1 Grace and coherent HBM memory with the H100.
Originally posted by ikoz: The Grace superchip with 2x Grace (144 cores and double memory bandwidth, at 500W) is the one to compare to the AMD Turin.
I think the most interesting points of comparison were the 64-core 9575F and the 96-core 9655, both of which are rated at 400W. Interestingly, the 9575F has 5.0/3.3 GHz boost/base clocks. The 9655 has 4.5/2.6 GHz. I'm reading Grace has a base clock speed of 2.8 GHz, although I don't know if that's standard or system-specific. In the article, lscpu lists its boost clock as 3.5 GHz.
Originally posted by ikoz: The equivalent of GH200 from AMD is MI300A (Zen 4). The newer MI325X is GPU-only just like GB200.
Unfortunately this expensive hardware isn't easily obtainable (and that is what makes them special anyways).
BTW, the way to benchmark exotic hardware is to find a cloud instance that's available. If you can use spot pricing, it might be fairly affordable to run an hour's worth of benchmarks. However, I'll bet there's probably zero spot availability for any of the latest products. So, Michael would have to reach out to the manufacturers (or a cloud operator) and see if they'd make special arrangements with him. He's already testing the Grace CPU remotely.
Last edited by coder; 08 November 2024, 02:38 PM.
Originally posted by dkokron: I wonder what the performance/$ comparison looks like.
Graviton 4 is much more compelling on perf/$, if you go purely by their billing rates. It uses the same Neoverse V2 cores as Grace, but clocked a bit lower. It uses 96 cores per CPU, with 768-bit DDR5-5600, which works out to 537.6 GB/s. Nvidia claims Grace's LPDDR5X is good for 500 GB/s. So, they probably have about the same bandwidth per core per GHz.
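For anyone who wants to check the arithmetic, here's a rough sketch of the peak-bandwidth figure above. It assumes the usual theoretical-peak formula (bus width in bytes times transfer rate) and takes Nvidia's 500 GB/s claim at face value. The function name is made up for illustration, and the 72-core count for a single Grace comes from the 144-core dual-Grace figure mentioned earlier in the thread. Clock speeds are left out, so this only gets as far as bandwidth per core.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal units).

    Hypothetical helper: bytes per transfer times mega-transfers per second.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000


# Graviton 4: 768-bit DDR5-5600, 96 cores (figures from the post above)
graviton4_bw = peak_bandwidth_gbs(768, 5600)
graviton4_per_core = graviton4_bw / 96

# Grace: 500 GB/s is Nvidia's claimed figure, not derived here; 72 cores per CPU
grace_bw = 500.0
grace_per_core = grace_bw / 72

print(f"Graviton 4: {graviton4_bw:.1f} GB/s total, {graviton4_per_core:.2f} GB/s per core")
print(f"Grace:      {grace_bw:.1f} GB/s total, {grace_per_core:.2f} GB/s per core")
```

Dividing each per-core number by whatever sustained clock you trust for the part gives the per-core-per-GHz figure.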
A socket-level comparison is fair enough, since it compares the performance of one off-the-shelf package (CPU cores, caches, fabric, memory controllers, I/O controllers, other SoC components) against another.
However, the two parts have different core counts, and Turin supports SMT (HT) while Neoverse V2 cores are single-threaded, so it's not an apples-to-apples comparison in a cloud context, where VM instances run on a fixed number of vCPUs.
Most of your benchmarking leans towards bare-metal instances. It would help readers if you also measured the performance of selected workloads in a cloud context, using released parts hosted by CSPs at near-identical configurations in terms of vCPUs and memory.
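To make the vCPU point concrete, here's a minimal sketch. It assumes one vCPU maps to one hardware thread, which is the common convention but not universal, since some providers disable SMT on certain instance types. The 16-vCPU instance size and the function name are hypothetical.

```python
def physical_cores(vcpus: int, threads_per_core: int) -> float:
    """Physical cores backing an instance, assuming 1 vCPU = 1 hardware thread."""
    return vcpus / threads_per_core


vcpus = 16  # hypothetical instance size, same on both platforms

# Turin with SMT enabled: 2 threads per core (assumption about the CSP's config)
turin_cores = physical_cores(vcpus, threads_per_core=2)

# Neoverse V2 (Grace, Graviton 4): single-threaded cores
neoverse_v2_cores = physical_cores(vcpus, threads_per_core=1)

print(f"{vcpus} vCPUs -> Turin: {turin_cores:.0f} cores, Neoverse V2: {neoverse_v2_cores:.0f} cores")
```

So at the same vCPU count, the Arm instance gets twice as many physical cores under these assumptions, which is why socket-level results don't translate directly to per-vCPU results.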
Originally posted by jbhateja: A socket-level comparison is fair enough, since it compares the performance of one off-the-shelf package (CPU cores, caches, fabric, memory controllers, I/O controllers, other SoC components) against another.
However, the two parts have different core counts, and Turin supports SMT (HT) while Neoverse V2 cores are single-threaded, so it's not an apples-to-apples comparison in a cloud context, where VM instances run on a fixed number of vCPUs.
Most of your benchmarking leans towards bare-metal instances. It would help readers if you also measured the performance of selected workloads in a cloud context, using released parts hosted by CSPs at near-identical configurations in terms of vCPUs and memory.
But beyond that, most cloud providers offer very limited access to free/gratis instances for benchmarking, especially after the launch of any new instance type. And it's not within budget to do all those extra cloud benchmark runs when access isn't provided by the CSP. When I do run CSP benchmarks at my own cost, most of the articles don't even make a profit; I do them mostly out of my own technical interest.
Edit: with the CSP comparisons, it's also worth mentioning the lack of CPU power monitoring access.
Michael Larabel
https://www.michaellarabel.com/