AMD GPU Operator Announced For Automated Driver Installation & Kubernetes Support

Written by Michael Larabel in AMD on 29 January 2025 at 08:21 PM EST.
AMD today announced two new software projects to improve its software support for Instinct accelerator / graphics deployments within the data center: AMD GPU Operator and AMD Device Metrics Exporter.

These new software tools from AMD are designed to help ease the setup and ongoing maintenance for server administrators managing clusters of AMD GPU/accelerator enabled servers in the data center.

AMD GPU Operator provides automated installation and management of the AMD driver / ROCm compute stack, easy deployment of AMD GPU device plug-ins, simplified GPU resource allocation for containers, automatic worker node labeling, and support for upstream/vanilla Kubernetes.
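
To illustrate what GPU resource allocation for containers can look like from the user side, here is a minimal sketch that builds a Kubernetes Pod manifest requesting one GPU. It assumes the extended resource name `amd.com/gpu`, which is the name the AMD Kubernetes device plug-in is understood to advertise; the announcement itself does not spell this out. The pod name and container image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest (as a dict) that requests `gpus` AMD GPUs.

    Assumption: the device plug-in exposes GPUs under the extended
    resource name "amd.com/gpu". Adjust if your cluster differs.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested through limits.
                    "resources": {"limits": {"amd.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = gpu_pod_manifest("rocm-test", "example.com/rocm-workload:latest")
    print(json.dumps(manifest, indent=2))
```

Since Kubernetes accepts JSON manifests, the printed output could be saved to a file and applied with `kubectl apply -f`, provided the cluster actually advertises that resource name.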

The AMD Device Metrics Exporter provides Prometheus-formatted metrics collection for AMD GPUs within HPC and AI environments, covering various GPU telemetry data, Kubernetes integration, and more. Among the metrics collected by the AMD Device Metrics Exporter are operating temperatures, performance/utilization data, clock speeds, power consumption, device memory statistics, and PCI Express metrics.
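
For readers unfamiliar with the Prometheus text format, the sketch below parses a few sample lines into name, labels, and value. The metric and label names are hypothetical placeholders rather than the exporter's real names, and the parser is deliberately simplified: it ignores escaped quotes inside label values and discards optional timestamps.

```python
import re

# Sample exposition text. The metric names and the "gpu_id" label are
# hypothetical, not the exporter's actual output.
SAMPLE = """\
# HELP example_gpu_temperature_celsius Hypothetical GPU temperature metric.
# TYPE example_gpu_temperature_celsius gauge
example_gpu_temperature_celsius{gpu_id="0"} 54.0
example_gpu_temperature_celsius{gpu_id="1"} 57.5
example_gpu_power_watts{gpu_id="0"} 312.0
"""

# metric_name{optional labels} value [optional integer timestamp]
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # Not a sample line this simplified parser understands.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice a Prometheus server would scrape and parse the exporter's endpoint itself; a hand-rolled parser like this is mainly useful for quick inspection or ad hoc scripts.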

AMD GPU Operator aims to deliver a "zero-touch GPU setup" with its automatic ROCm driver management while being paired with enterprise-minded features to make the initial deployment and ongoing maintenance much easier for AMD hardware within varying sizes of AI and HPC deployments.

So far these new software tools from AMD only support the Instinct MI300X / MI250 / MI210 hardware. The Kubernetes support covers Ubuntu 22.04 LTS and Ubuntu 24.04 LTS, while Red Hat OpenShift is supported on Red Hat Core OS.

More details on these new AMD GPU enterprise software tools are available via rocm.blogs.amd.com. These new tools arrive one day after the release of ROCm 6.3.2. The new tools are open-source with the code available via device-metrics-exporter and gpu-operator on GitHub.

AMD GPU Operator quietly saw its v1.0 release this past November and the AMD Device Metrics Exporter celebrated its v1.0 release in December, but the software was only "announced" today via the ROCm blog.