AMD EDAC/RAS Code Adds GPU/Accelerator Support In Linux 6.5


  • AMD EDAC/RAS Code Adds GPU/Accelerator Support In Linux 6.5

    Phoronix: AMD EDAC/RAS Code Adds GPU/Accelerator Support In Linux 6.5

    In addition to yesterday bringing EDAC support for AMD Zen 4 client CPUs, the set of RAS (Reliability, Availability, and Serviceability) updates for the Linux 6.5 kernel has separately brought initial GPU/accelerator support...

  • #2
    I'd love to know if anyone is using this in the field. It seems again like another solution looking for a problem. Unless I see benchmarks showing otherwise.


    • #3
      Originally posted by AndyChow
      I'd love to know if anyone is using this in the field. It seems again like another solution looking for a problem. Unless I see benchmarks showing otherwise.
      Did you mean to post in a different thread? This change is about error reporting: routing GPU and CPU memory errors through the same reporting paths when the CPU and GPU are running on a common data fabric. Not sure how one might benchmark it.

      AFAIK this is mostly applicable to supercomputers since that is where XGMI interconnect between CPU and GPU is frequently used.
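For anyone who wants to see what "the same reporting paths" looks like from userspace, here is a minimal sketch that reads the per-memory-controller error counters the EDAC subsystem exposes in sysfs. It assumes the standard layout under /sys/devices/system/edac/mc (mcN directories with mc_name, ce_count and ue_count files). Whether GPU memory shows up as extra mcN entries on a given system is an assumption here, not something confirmed in this thread, and the helper names are made up for illustration.

```python
from pathlib import Path

# Standard EDAC sysfs location; pass a different root to test without hardware.
DEFAULT_EDAC_ROOT = "/sys/devices/system/edac/mc"


def _read_attr(mc_dir, attr, default):
    """Read one sysfs attribute as a stripped string, or return default if absent."""
    path = mc_dir / attr
    return path.read_text().strip() if path.is_file() else default


def read_edac_counts(edac_root=DEFAULT_EDAC_ROOT):
    """Return {"mcN": {"name": str, "ce": int, "ue": int}} for each memory controller.

    ce = corrected errors, ue = uncorrected errors. An empty dict means no
    EDAC driver is loaded (or the directory does not exist).
    """
    root = Path(edac_root)
    if not root.is_dir():
        return {}
    counts = {}
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        counts[mc_dir.name] = {
            "name": _read_attr(mc_dir, "mc_name", "unknown"),
            "ce": int(_read_attr(mc_dir, "ce_count", "0")),
            "ue": int(_read_attr(mc_dir, "ue_count", "0")),
        }
    return counts


if __name__ == "__main__":
    for mc, info in read_edac_counts().items():
        print(f"{mc} ({info['name']}): corrected={info['ce']} uncorrected={info['ue']}")
```

On a machine without an EDAC driver loaded this just prints nothing, which is also why there is not much to benchmark: it is a set of counters, not a performance feature.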
      Last edited by bridgman; 27 June 2023, 09:24 PM.


      • #4
        Originally posted by bridgman

        Did you mean to post in a different thread? This change is about error reporting: routing GPU and CPU memory errors through the same reporting paths when the CPU and GPU are running on a common data fabric. Not sure how one might benchmark it.

        AFAIK this is mostly applicable to supercomputers since that is where XGMI interconnect between CPU and GPU is frequently used.
        I was talking in general about AMD Instinct MI200. I see things like "we support PyTorch", but I'd like to see benchmarks comparing that to, say, the NVIDIA A100.


        • #5
          Ahh, OK. I'll see what I can dig up, but the quick answer is that MI250 tends to be quite a bit faster than A100 at large data formats, while performance tends to be closer to A100 for the smaller data formats. MI210 is basically half of an MI250.

          As an example, comparing the Leonardo (#4) and Frontier (#1) supercomputers from the TOP500 list using Rmax (measured) and ignoring CPU FP contribution we get something like:

          Leonardo - 13,824 A100 - 239 PF/s - 304 PF/s peak - 17.3 TF/s Linpack per GPU

          Frontier - 37,488 MI250 - 1194 PF/s - 1680 PF/s peak - 31.9 TF/s Linpack per GPU

          This is rough in the sense that the EPYC CPUs in Frontier probably contribute a bit more per-GPU FP64 throughput than the Sapphire Rapids CPUs in Leonardo, but these were the closest numbers I could find quickly.
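The per-GPU figures above are just Rmax divided by GPU count. A quick sketch of that arithmetic, using only the numbers quoted in this post (the function and variable names are made up for illustration, and CPU contribution is ignored, as stated above):

```python
# Figures as quoted in the post: GPU count, Rmax and Rpeak in PFLOP/s.
SYSTEMS = {
    "Leonardo (A100)": {"gpus": 13_824, "rmax_pf": 239, "rpeak_pf": 304},
    "Frontier (MI250)": {"gpus": 37_488, "rmax_pf": 1194, "rpeak_pf": 1680},
}


def per_gpu_tflops(rmax_pf, gpus):
    """Measured Linpack throughput per GPU in TFLOP/s (1 PFLOP/s = 1000 TFLOP/s)."""
    return rmax_pf * 1000 / gpus


def hpl_efficiency(rmax_pf, rpeak_pf):
    """Fraction of theoretical peak actually achieved on Linpack."""
    return rmax_pf / rpeak_pf


if __name__ == "__main__":
    for name, s in SYSTEMS.items():
        tf = per_gpu_tflops(s["rmax_pf"], s["gpus"])
        eff = hpl_efficiency(s["rmax_pf"], s["rpeak_pf"])
        print(f"{name}: {tf:.1f} TF/s per GPU, {eff:.0%} of peak")
```

With these inputs the per-GPU ratio comes out to roughly 1.8x in favor of MI250 on Linpack, subject to the same caveat about CPU contribution.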

          EDIT - found a few more for MI210, which is basically half of an MI250

          Roughly 1/2x A100 performance running TensorFlow (so MI250 would be ~1x):

          Overview: The della-milan node features the AMD EPYC 7763 CPU (128 cores), 1 TB of RAM and 2 AMD MI210 GPUs. The Frontier supercomputer, which is the fastest machine in the US, features the MI250X GPU. Connecting: If you have an account on the Della cluster and you have written to [email protected] for access to della-milan (you must be added t...


          GROMACS performance was low but there has been a lot of tuning work since then - will see what I can find.

          ~1.5x A100 running HPL:

          Last edited by bridgman; 28 June 2023, 10:09 AM.
