Radeon ROCm 3.1 Released With RAS For Vega 7nm, SLURM Support

Written by Michael Larabel in Radeon on 28 February 2020 at 08:38 AM EST.
A new version of the Radeon Open eCosystem "ROCm" stack is available today, but it still doesn't deliver Navi support.

Radeon Open eCosystem 3.1 now versions its default installation directory structure, adds RAS support for 7nm Vega, and introduces SLURM support.
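For scripts that need to locate a ROCm install once the directory is versioned, a minimal Python sketch follows. It assumes a naming scheme of `/opt/rocm-<major>.<minor>.<patch>`, which this article does not spell out, and the helper names `newest_rocm_dir` and `installed_rocm_dirs` are hypothetical.

```python
import re
from pathlib import Path

# Assumption (not stated in the article): versioned installs are named
# rocm-<major>.<minor>.<patch>, e.g. /opt/rocm-3.1.0.
_VERSIONED = re.compile(r"^rocm-(\d+)\.(\d+)\.(\d+)$")


def newest_rocm_dir(candidates):
    """Return the candidate path with the highest version, or None if none match."""
    best = None
    best_version = None
    for path in candidates:
        match = _VERSIONED.match(Path(path).name)
        if not match:
            continue  # skip unversioned names such as /opt/rocm
        # Compare versions numerically so that 3.10.0 sorts above 3.9.0.
        version = tuple(int(part) for part in match.groups())
        if best_version is None or version > best_version:
            best, best_version = path, version
    return best


def installed_rocm_dirs(root="/opt"):
    """List subdirectories of root as strings; empty if root does not exist."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return [str(p) for p in root_path.iterdir() if p.is_dir()]
```

Calling `newest_rocm_dir(installed_rocm_dirs())` would then pick the highest-versioned install under `/opt`, if any exists on the system.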

The Reliability, Availability, and Serviceability (RAS) capabilities cover HBM ECC memory error handling, GFX/MMHUB ECC errors, and PCIe uncorrectable errors. When such uncorrectable errors occur, the RAS behavior is to perform a GPU reset using BACO (Bus Active, Chip Off). This 7nm Vega work is presumably still under the microscope for the Vega-based "Arcturus" compute accelerator coming this year.

The other new feature of ROCm 3.1 is support for AMD GPUs in SLURM, the Simple Linux Utility for Resource Management. This cluster management and job scheduling system for Linux clusters can now interact with AMD GPUs. The SLURM support is a useful addition given the increasing number of supercomputing wins with Radeon GPUs and other larger AMD GPU deployments.
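As a rough illustration of what scheduling onto GPUs looks like from the user side, the Python sketch below generates a small SLURM batch script. It assumes the cluster administrator has exposed the GPUs as a generic resource named `gpu` (requested with `--gres=gpu:N`) and that `rocm-smi` is on the compute node's path. Neither detail comes from this article, and the function name `build_sbatch_script` is hypothetical.

```python
def build_sbatch_script(job_name, gpus, command):
    """Return the text of a SLURM batch script requesting `gpus` GPUs on one node.

    Assumption: the site configures its GPUs as a generic resource called
    "gpu"; clusters may use a different resource name or type.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --gres=gpu:{gpus}",  # generic-resource request for GPUs
        f"srun {command}",             # run the command inside the allocation
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a job that just reports GPU status via rocm-smi.
    print(build_sbatch_script("rocm-check", 1, "rocm-smi"), end="")
```

The generated text could be saved to a file and submitted with `sbatch`; whether the job actually lands on an AMD GPU depends on the site's SLURM configuration.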

ROCm 3.1 downloads and more details are available via GitHub. Unfortunately, there are still no signs of GFX10/Navi support.