AMD Unified AI Software Stack Has The Potential To Be A Very Big Deal

Written by Michael Larabel in AMD on 15 July 2024 at 09:00 AM EDT.
Alongside all of the exciting Ryzen 9000 and Ryzen AI 300 series details shared last week at the AMD Tech Day in Los Angeles, I also found it very interesting that AMD shared a bit more about the "Unified AI Software Stack" it is working to release in the coming quarters.

AMD is working to introduce a Unified AI Software Stack around the end of the calendar year. Simply put, it aims to ensure a performant and optimally accelerated AI experience whether that means tasking your CPU cores, NPU, or GPU(s) with AI workloads. Details were relatively light at this point, as there are still months to go until its debut, but it's a very interesting topic and I wish they would have had a session dedicated entirely to it -- but presumably we'll be learning much more as the actual software release nears. This also isn't the first time we've heard AMD bring up a "Unified AI Software Stack": the company briefly mentioned such plans back in 2022 and again in 2023, but now it looks like they are finally coming to fruition.

AMD Unified AI Software Stack


This isn't a replacement for ROCm or other existing AMD AI software efforts. From the sounds of it, it will serve as a sort of AI workload scheduler plus supporting libraries that provide a unified offloading interface to AMD's heterogeneous products. Developers building on the AMD Unified AI Software Stack will target LLVM's wonderful MLIR intermediate representation. From there the Unified AI Software Stack will evaluate the MLIR characteristics and determine whether to send the work off to the CPU, GPU, or NPU for execution. At least that's how it was summed up rather simply when brought up last week in LA. So taking into account the MLIR to be executed and each AMD device's capabilities (hardware features, current resource utilization, etc.), the Unified AI Software Stack will be able to make an informed choice about where to place the workload for execution.
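As a rough mental model of that placement decision, here is a minimal Python sketch of a heuristic dispatcher. Everything in it -- the device list, the op sets, the utilization numbers -- is a hypothetical illustration of the idea described above, not anything AMD has shown:

```python
# Hypothetical sketch of the kind of placement decision described above.
# None of these names come from AMD; they only illustrate matching workload
# characteristics against per-device capabilities and current utilization.
from dataclasses import dataclass

@dataclass
class Device:
    name: str            # "cpu", "gpu", "npu"
    supported_ops: set   # high-level ops this backend can execute
    utilization: float   # current load, 0.0 .. 1.0

@dataclass
class Workload:
    ops: set             # high-level ops found in the MLIR module

def place(workload: Workload, devices: list) -> Device:
    # Keep only devices whose backend exposes every op in the workload.
    capable = [d for d in devices if workload.ops <= d.supported_ops]
    if not capable:
        raise RuntimeError("no device supports this workload")
    # Among capable devices, prefer the least-loaded one.
    return min(capable, key=lambda d: d.utilization)

devices = [
    Device("cpu", {"matmul", "conv2d", "softmax", "fft"}, utilization=0.7),
    Device("gpu", {"matmul", "conv2d", "softmax"},        utilization=0.2),
    Device("npu", {"matmul", "conv2d"},                   utilization=0.1),
]

# A conv-heavy graph fits the idle NPU; a softmax op forces work onto the GPU.
print(place(Workload({"matmul", "conv2d"}), devices).name)   # -> npu
print(place(Workload({"matmul", "softmax"}), devices).name)  # -> gpu
```

A real scheduler would of course weigh far more than two signals, but the capability-filter-then-load-balance shape captures what was described on stage.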

Focusing on MLIR (the Multi-Level Intermediate Representation) as the IR makes sense given its wide usage across the LLVM community and developer base. It also makes sense given AMD's acquisition of Nod.ai last year and all of its MLIR developer talent. Unlike LLVM IR and some alternative intermediate representations, MLIR can represent data-flow graphs such as those used by TensorFlow, and it offers a variety of other features that help with deep learning workloads and with handling hardware-specific operations.
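To make the data-flow point concrete, here is what a tiny MLIR module can look like, embedded as a Python string purely for illustration. The function name and tensor shapes are generic examples, not anything from AMD's slides:

```python
# Illustrative only: a small MLIR module (func/tensor/linalg dialects) held in
# a Python string. Each SSA value (%a, %b, %0, %1) is an edge in a data-flow
# graph of whole tensor operations, which gives a scheduler far more to reason
# about than the low-level scalar instructions plain LLVM IR would expose.
EXAMPLE_MLIR = r"""
func.func @matmul(%a: tensor<128x256xf32>,
                  %b: tensor<256x64xf32>) -> tensor<128x64xf32> {
  %0 = tensor.empty() : tensor<128x64xf32>
  %1 = linalg.matmul ins(%a, %b : tensor<128x256xf32>, tensor<256x64xf32>)
                     outs(%0 : tensor<128x64xf32>) -> tensor<128x64xf32>
  return %1 : tensor<128x64xf32>
}
"""

if __name__ == "__main__":
    print(EXAMPLE_MLIR)
```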

I wasn't able to get a direct answer last week in LA, but I brought up the fact that a few weeks earlier I spotted the new Peano LLVM-based compiler for AMD NPUs. Presumably this AMD Peano compiler will be part of the Unified AI Software Stack, handling the MLIR-to-NPU (AMD XDNA) execution path.

This also leaves open questions about whether Xilinx / XDNA will see ROCm support... Phoronix readers may recall that back in 2020 ROCm was going to support Xilinx FPGAs, but since then there hasn't been much on the topic of ROCm for Xilinx or, in turn, the XDNA IP now making up AMD's NPUs. We'll see if this Unified AI Software Stack ends up making that unnecessary, at least on the XDNA side, thanks to the new Peano compiler and the like, with MLIR serving as the common denominator for the AMD AI compute ecosystem.

It will also be interesting to see whether this Unified AI Software Stack ends up working out well for any non-AMD products, given the number of different LLVM back-ends out there and the focus on common MLIR. Likewise, it will be interesting to see how well this offload manager / job scheduler works for non-AI workloads: MLIR can handle much more than just AI workloads, but at the same time it is limited by each backend's exposed capabilities.

It's also possible -- and likely -- that the AMD Unified AI Software Stack will introduce more than just these new scheduler / decision-maker (execution manager) bits, so we'll see when the time comes what else AMD may introduce to ease the ISV developer experience. At the same time, expect ROCm to continue to mature and advance for AI workloads (and more) on GPUs, alongside AMD's other library and compiler improvements. From the sounds of it, the AMD Unified AI Software Stack will be open-source, just as AMD's other software components are. The slide on the Unified AI Software Stack seems to indicate that new optimization utilities and compiler technology may come as part of this new software offering.

For now there are many more questions than answers about the AMD Unified AI Software Stack, but I am hopeful it will lead to many new and interesting AMD AI opportunities over the coming months. At the very least it should help ISVs juggle their AI workload management between AMD CPUs, GPUs, and NPUs. At the same time, it's unfortunate that such a workload partitioner / execution scheduler wasn't available from the start when AMD first introduced Ryzen AI NPUs, or even earlier with a CPU+GPU+Xilinx focus, to help prepare the developer community for taking advantage of AMD's heterogeneous compute offerings. Instead, by the time it publicly arrives, Ryzen AI NPUs will have been in the marketplace for the better part of two years. But as they say, better late than never, and hopefully as 2024 continues on into 2025 we will see more continued serious software investment by AMD.