Intel Proposing XeGPU Dialect For LLVM MLIR
As part of Intel's ongoing quest to maximize the compute performance of its GPUs and accelerators, the company's compiler engineers have proposed introducing a XeGPU dialect for LLVM's MLIR.
Jianhui Li with Intel is proposing a XeGPU MLIR dialect for Intel GPUs. He explained on Friday in this request for comments:
To support high-performance GEMM code generation on Intel GPU, we propose XeGPU dialect. XeGPU dialect provides an abstraction that closely models Xe instructions. XeGPU ops are introduced when a special Xe instruction can’t be expressed by LLVM/SPIR-V dialect, for example, like matrix instruction (AKA DPAS) and 2D block load. It matches the hardware instructions’ semantics including the matrix sizes. XeGPU dialect is similar to NVGPU and AMDGPU dialect and works as a bridge dialect providing target-specific operations on MLIR memref and vector data types.
XeGPU dialect models a subset of Xe GPU’s unique features focusing on GEMM performance. The operations include 2d load, dpas, atomic, scattered load, 1d load, named barrier, mfence, and compile-hint. These operations provide a minimum set to support high-performance MLIR GEMM implementation for a wide range of GEMM shapes. XeGPU dialect complements Arith, Math, Vector, and Memref dialects. This allows XeGPU based MLIR GEMM implementation fused with other operations lowered through existing MLIR dialects.
Intel already has an Intel Extension for MLIR as well as some high-performance XeGPU GEMM implementations. Intel has demonstrated this code achieving close-to-peak GEMM performance on Intel Max Series graphics hardware.
"XeGPU dialect models a subset of Xe GPU’s ISA. This is the counterpart of NVGPU and AMDGPU dialects, which provide a bridge dialect in the MLIR gradual lowering. XeGPU dialect works with MLIR memref and vector type and complements Arith, Math, Vector, and Memref dialects. XeGPU operations are introduced when there is a special Xe instruction not modeled by LLVM/SPIR-V dialect, for example, like DPAS and 2D block load. In some cases, one XeGPU op may lower to a sequence of instructions for a dedicated and performance-critical function. For example, create_tdesc is mapped to a fixed sequence of instructions to create an address description."
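To make the bridging role concrete, here is a purely illustrative sketch of how a small GEMM tile might be expressed with the operations the RFC names (create_tdesc, 2D block load, and DPAS). The exact op names, type syntax, and tile shapes below are assumptions for illustration only, not the dialect's finalized form:

```mlir
// Hypothetical syntax: op and type names are illustrative assumptions
// based on the RFC's description, not the finalized XeGPU dialect.
func.func @gemm_tile(%A: memref<8x16xf16>, %B: memref<16x16xf16>,
                     %C: memref<8x16xf32>) {
  // create_tdesc builds an address description for a 2D block;
  // per the RFC it may lower to a fixed sequence of instructions.
  %tA = xegpu.create_tdesc %A : memref<8x16xf16> -> !xegpu.tensor_desc<8x16xf16>
  %tB = xegpu.create_tdesc %B : memref<16x16xf16> -> !xegpu.tensor_desc<16x16xf16>
  %tC = xegpu.create_tdesc %C : memref<8x16xf32> -> !xegpu.tensor_desc<8x16xf32>

  // 2D block loads bring whole tiles into vectors sized for DPAS.
  %a = xegpu.load_2d %tA : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
  %b = xegpu.load_2d %tB : !xegpu.tensor_desc<16x16xf16> -> vector<16x16xf16>
  %c = xegpu.load_2d %tC : !xegpu.tensor_desc<8x16xf32> -> vector<8x16xf32>

  // DPAS: matrix multiply-accumulate matching the hardware
  // instruction's semantics, including the matrix sizes.
  %d = xegpu.dpas %a, %b, %c
       : vector<8x16xf16>, vector<16x16xf16>, vector<8x16xf32> -> vector<8x16xf32>

  xegpu.store_2d %d, %tC : vector<8x16xf32>, !xegpu.tensor_desc<8x16xf32>
  return
}
```

Note how the ops operate directly on standard MLIR memref and vector types, which is what lets XeGPU-based GEMM code fuse with operations lowered through the existing Arith, Math, Vector, and Memref dialects.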
This RFC documentation has more details on this proposed XeGPU dialect for MLIR for interested compiler engineers and enthusiasts.
This ongoing graphics compiler work, the new Xe kernel driver, and Intel's other ongoing open-source investments will be all the more important by the time the Falcon Shores APU launches in 2025. Exciting times ahead.