Announcement

**AndyChow** · 27 June 2023, 04:33 PM

I'd love to know if anyone is using this in the field. It seems again like another solution looking for a problem. Unless I see benchmarks showing otherwise.

**bridgman** · 27 June 2023, 07:28 PM

Originally posted by AndyChow

I'd love to know if anyone is using this in the field. It seems again like another solution looking for a problem. Unless I see benchmarks showing otherwise.

Did you mean to post in a different thread? This change is about error reporting - routing GPU and CPU memory errors through the same reporting paths when the CPU and GPU are running on a common data fabric. Not sure how one might benchmark it.

AFAIK this is mostly applicable to supercomputers since that is where XGMI interconnect between CPU and GPU is frequently used.

**AndyChow** · 27 June 2023, 11:25 PM

Originally posted by bridgman

Did you mean to post in a different thread? This change is about error reporting - routing GPU and CPU memory errors through the same reporting paths when the CPU and GPU are running on a common data fabric. Not sure how one might benchmark it.

AFAIK this is mostly applicable to supercomputers since that is where XGMI interconnect between CPU and GPU is frequently used.

I was talking about the AMD Instinct MI200 in general. I see things like "we support PyTorch", but I'd like to see benchmarks comparing that to, say, the NVIDIA A100.

**bridgman** · 28 June 2023, 09:39 AM

Ahh, OK. I'll see what I can dig up, but the quick answer is that MI250 tends to be quite a bit faster than A100 at large data formats, while performance tends to be closer to A100 for the smaller data formats. MI210 is basically half of an MI250.

As an example, comparing the Leonardo (#4) and Frontier (#1) supercomputers from the TOP500 list using Rmax (measured) and ignoring CPU FP contribution we get something like:

Leonardo - 13,824 A100 - 239 PF/s Rmax - 304 PF/s peak - 17.3 TF/s Linpack per GPU

Frontier - 37,488 MI250 - 1194 PF/s Rmax - 1680 PF/s peak - 31.9 TF/s Linpack per GPU

This is rough in the sense that the EPYC CPUs in Frontier probably contribute a bit more per-GPU FP64 throughput than the Sapphire Rapids CPUs in Leonardo, but these were the closest numbers I could find quickly.
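For anyone who wants to check the arithmetic, here is a minimal sketch that reproduces the per-GPU numbers from the figures quoted above. The GPU counts and PF/s values are taken straight from this post; the function names and the extra Rmax/Rpeak efficiency column are just illustrative additions, and CPU FP contribution is still ignored.

```python
# Figures as quoted in the post: GPU count, Rmax and peak in PF/s.
# CPU floating-point contribution is ignored, as in the post.
systems = {
    "Leonardo (A100)": {"gpus": 13_824, "rmax_pf": 239, "rpeak_pf": 304},
    "Frontier (MI250)": {"gpus": 37_488, "rmax_pf": 1194, "rpeak_pf": 1680},
}


def per_gpu_tflops(rmax_pf, gpus):
    """Measured Linpack throughput per GPU in TF/s (1 PF/s = 1000 TF/s)."""
    return rmax_pf * 1000 / gpus


def hpl_efficiency(rmax_pf, rpeak_pf):
    """Fraction of theoretical peak achieved in the Linpack run."""
    return rmax_pf / rpeak_pf


per_gpu = {
    name: per_gpu_tflops(s["rmax_pf"], s["gpus"]) for name, s in systems.items()
}
efficiency = {
    name: hpl_efficiency(s["rmax_pf"], s["rpeak_pf"]) for name, s in systems.items()
}

for name in systems:
    print(f"{name}: {per_gpu[name]:.1f} TF/s per GPU, "
          f"{efficiency[name]:.0%} of peak")

# Per-GPU Linpack ratio, MI250 relative to A100.
ratio = per_gpu["Frontier (MI250)"] / per_gpu["Leonardo (A100)"]
print(f"MI250 / A100 per-GPU ratio: {ratio:.2f}x")
```

With these inputs the per-GPU ratio comes out a bit above 1.8x in favour of MI250, subject to the same caveat about CPU contribution.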

EDIT - found a few more for MI210, which is basically half of an MI250

1/2x A100 performance running TF (so MI250 would be ~1x):

AMD MI210 GPU Testing

https://researchcomputing.princeton.edu/amd-mi210-gpu-testing

Overview: The della-milan node features the AMD EPYC 7763 CPU (128 cores), 1 TB of RAM and 2 AMD MI210 GPUs. The Frontier supercomputer, which is the fastest machine in the US, features the MI250X GPU.

GROMACS performance was low but there has been a lot of tuning work since then - will see what I can find.

~1.5x A100 running HPL:

AMD Instinct MI210 MI100 NVIDIA A100 V100 HPL Performance In TFLOPS - ServeTheHome

https://www.servethehome.com/asus-esc4000a-e11-review-2u-1p-amd-epyc-and-instinct-gpu-server/amd-instinct-mi210-mi100-nvidia-a100-v100-hpl-performance-in-tflops/

AMD EDAC/RAS Code Adds GPU/Accelerator Support In Linux 6.5
