Intel Begins Volleying Open-Source Patches Around Intel AMX
On top of AVX-512, DL Boost, and the company's other efforts to better optimize Xeon for modern AI workloads, Advanced Matrix Extensions (AMX) aims to further improve AI performance for both training and inference. AMX consists of "tiles", a set of two-dimensional registers representing sub-arrays of a larger memory image, and accelerators that can operate on those tiles. The initial AMX features are BFloat16, TILE, and INT8, while new accelerators can be introduced later on.
Some of the highlights from the updated Intel reference guide with the AMX addition this week:
Intel Advanced Matrix Extensions (Intel AMX) is a new 64-bit programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and an accelerator able to operate on tiles, the first implementation is called TMUL (tile matrix multiply unit).
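To give a rough feel for what a tile multiply does, here is a small pure-Python emulation of an INT8 tile dot-product. It is only a sketch under assumptions: the 4-bytes-per-group operand layout reflects my reading of the signed INT8 TMUL operation, the function name `tile_dot_int8` is made up for illustration, and 32-bit overflow behavior is not modeled.

```python
MAX_ROWS = 16   # a tile holds at most 16 rows...
MAX_COLSB = 64  # ...of at most 64 bytes each (1 KB total)


def tile_dot_int8(c, a, b):
    """Accumulate signed 8-bit dot products of tiles a and b into tile c.

    Assumed layout (illustrative, not taken from the article):
      a: M rows of 4*K signed bytes
      b: K rows of 4*N signed bytes (groups of 4 bytes per output column)
      c: M rows of N integer accumulators
    """
    m_rows, k_groups, n_cols = len(a), len(b), len(c[0])
    if m_rows > MAX_ROWS or k_groups > MAX_ROWS:
        raise ValueError("a tile has at most 16 rows")
    if 4 * k_groups > MAX_COLSB or 4 * n_cols > MAX_COLSB:
        raise ValueError("a tile row has at most 64 bytes")
    for m in range(m_rows):
        for k in range(k_groups):
            for n in range(n_cols):
                # Each accumulator gains a 4-element dot product per k step.
                for i in range(4):
                    c[m][n] += a[m][4 * k + i] * b[k][4 * n + i]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3, 4]]              # 1 row, 4 bytes (K = 1)
    b = [[1, 1, 1, 1, 2, 2, 2, 2]]  # 1 row, 8 bytes (N = 2)
    print(tile_dot_int8([[0, 0]], a, b))  # [[10, 20]]
```

On real hardware this whole loop nest would be a single instruction working on tile registers, which is the point of the "autonomous multi-cycle execution" wording in the guide.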
An Intel AMX implementation enumerates to the programmer how the tiles can be programmed by providing a palette of options. Two palettes are supported; palette 0 represents the initialized state, and palette 1 consists of 8 KB of storage spread across 8 tile registers named TMM0..TMM7. Each tile has a maximum size of 16 rows x 64 bytes, (1 KB), however the programmer can configure each tile to smaller dimensions appropriate to their algorithm. The tile dimensions supplied by the programmer (rows and bytes_per_row, i.e., colsb) are metadata that drives the execution of tile and accelerator instructions. In this way, a single instruction can launch autonomous multi-cycle execution in the tile and accelerator hardware. The palette value (palette_id) and metadata are held internally in a tile related control register (TILECFG). The TILECFG contents will be commensurate with that reported in the palette_table (see “CPUID—CPU Identification” in Chapter 1 for a description of the available parameters).
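As an illustration of the palette 1 metadata described above, the following Python sketch packs a tile configuration into a 64-byte image. The byte layout (palette_id, start_row, then per-tile colsb and rows arrays) is my understanding of the memory format used to load TILECFG and should be checked against the reference guide; `make_tilecfg` and its arguments are hypothetical names used only for this example.

```python
import struct

MAX_TILES = 8   # TMM0..TMM7 in palette 1
MAX_ROWS = 16   # maximum rows per tile
MAX_COLSB = 64  # maximum bytes per row


def make_tilecfg(tiles, palette_id=1, start_row=0):
    """Build a 64-byte tile configuration image.

    tiles maps a tile index (0..7) to a (rows, colsb) pair.
    Assumed layout: byte 0 palette_id, byte 1 start_row, bytes 2-15
    reserved, bytes 16-47 colsb as 16-bit values, bytes 48-63 rows.
    """
    colsb = [0] * 16  # slots 8..15 stay zero (reserved)
    rows = [0] * 16
    for index, (n_rows, n_colsb) in tiles.items():
        if not 0 <= index < MAX_TILES:
            raise ValueError(f"tile index {index} out of range")
        if not 1 <= n_rows <= MAX_ROWS:
            raise ValueError(f"rows {n_rows} out of range")
        if not 1 <= n_colsb <= MAX_COLSB:
            raise ValueError(f"colsb {n_colsb} out of range")
        rows[index] = n_rows
        colsb[index] = n_colsb
    # "<" = little-endian, no padding; 14x = fourteen reserved zero bytes.
    return struct.pack("<BB14x16H16B", palette_id, start_row, *colsb, *rows)


if __name__ == "__main__":
    cfg = make_tilecfg({0: (16, 64), 1: (8, 32)})
    print(len(cfg))  # 64
    # All eight tiles at maximum size add up to the 8 KB quoted above.
    print(MAX_TILES * MAX_ROWS * MAX_COLSB)  # 8192
```

Configuring a tile smaller than 16 x 64 simply means writing smaller rows and colsb values for that tile; unused tiles are left at zero.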
Intel AMX is an extensible architecture. New accelerators can be added, or the TMUL accelerator may be enhanced to provide higher performance.
Intel AMX instructions use new registers and inherit basic behavior from Intel architecture in the same manner that Intel SSE and Intel AVX did.
See the reference guide (PDF), where chapter three outlines Intel AMX, for all of the details on this AI-focused addition set to premiere with Sapphire Rapids processors.
With AMX details becoming public, Linux/open-source patch work is also beginning to flow around Intel AMX to ensure the software support is in good shape ahead of those processors shipping in 2021. On Friday, Intel open-source compiler toolchain expert H.J. Lu began landing initial changes in the GNU Assembler (Gas) to accommodate Intel AMX. Also merged on Friday, for the GNU C Library (Glibc), was Intel AMX detection. Recent Linux kernel changes around the XSAVES supervisor states also appear to be in preparation for Intel AMX, among other upcoming CPU features.
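For those wanting to check a system once kernel support is in place, a minimal Python sketch along these lines could look for the AMX feature flags in /proc/cpuinfo. This is not how Glibc performs its detection (that goes through CPUID); it assumes the kernel reports the three features as `amx_bf16`, `amx_tile`, and `amx_int8`, and the helper names are hypothetical.

```python
AMX_FLAGS = ("amx_bf16", "amx_tile", "amx_int8")


def amx_flags_from_cpuinfo(text):
    """Return the AMX flags found on the first 'flags' line of cpuinfo text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in AMX_FLAGS if flag in present]
    return []


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read cpuinfo, returning an empty string where it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    found = amx_flags_from_cpuinfo(read_cpuinfo())
    print(", ".join(found) if found else "no AMX flags reported")
```

A missing flag only means the running kernel does not report it, so on older kernels this will come back empty even on AMX-capable hardware.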
Outside of the key Linux/open-source components, other AMX patch work has also begun flowing out of Intel. For instance, yesterday also brought patches for the Xbyak JIT assembler to handle AMX.
As more Intel AMX bits land on the Linux/open-source side, I'll be sure to note them on Phoronix.