Intel AMX Support Lands For Linux 5.16

Written by Michael Larabel in Intel on 2 November 2021 at 12:00 AM EDT.
After going through a number of rounds of patch revisions over the past year, Intel's kernel-side changes for supporting Advanced Matrix Extensions (AMX) with next-gen Xeon Scalable "Sapphire Rapids" processors have landed for Linux 5.16!

Intel was quick to land the initial AMX changes in the LLVM and GCC compilers, but getting the kernel-side bits squared away took more than a year. The support is now in mainline, months ahead of Sapphire Rapids' ramp in Q2.

A number of kernel changes were required to enable AMX usage, including that user-space applications need to actually request AMX usage from the kernel via an arch_prctl() interface. With AVX and most other instruction set extensions, applications can just use the instructions if the CPU supports them; with AMX, permission first needs to be requested from the kernel. This should help isolate frequency down-clocking and other performance impacts, since the kernel is given more control and is aware of which applications plan to make use of AMX instructions.

A lot of other kernel code was also changed in preparing for Advanced Matrix Extensions.

On Monday the support was merged as part of the "x86/fpu" changes for Linux 5.16, with the pull request describing it as follows:
- Add AMX (Advanced Matrix eXtensions) support (finally):

AMX is a large XSTATE component which is going to be available with Sapphire Rapids XEON CPUs. The feature comes with an extra MSR (MSR_XFD) which allows to trap the (first) use of an AMX related instruction, which has two benefits:

1) It allows the kernel to control access to the feature

2) It allows the kernel to dynamically allocate the large register state buffer instead of burdening every task with the extra 8K or larger state storage.

It would have been great to gain this kind of control already with AVX512.

The support comes with the following infrastructure components:

1) arch_prctl() to
- read the supported features (equivalent to XGETBV(0))
- read the permitted features for a task
- request permission for a dynamically enabled feature

Permission is granted per process, inherited on fork() and cleared on exec(). The permission policy of the kernel is restricted to sigaltstack size validation, but the syscall obviously allows further restrictions via seccomp etc.
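
As a rough illustration of the three arch_prctl() operations listed above, the following C sketch queries the supported features, requests permission for AMX tile data, and then reads back the permitted features. The option codes (0x1021 to 0x1023) and the tile data feature number (18) are taken from the kernel's x86 headers and xstate documentation as of Linux 5.16 and are defined locally in case the installed headers predate them. The helper names xcomp_has_feature() and amx_request_permission() are hypothetical and only for illustration; this is a minimal sketch, not how any particular library does it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* arch_prctl() option codes added for Linux 5.16 (uapi asm/prctl.h).
 * Defined locally because older headers may not provide them. */
#ifndef ARCH_GET_XCOMP_SUPP
#define ARCH_GET_XCOMP_SUPP 0x1021
#endif
#ifndef ARCH_GET_XCOMP_PERM
#define ARCH_GET_XCOMP_PERM 0x1022
#endif
#ifndef ARCH_REQ_XCOMP_PERM
#define ARCH_REQ_XCOMP_PERM 0x1023
#endif

/* XSTATE component number of the AMX tile data registers. */
#define XFEATURE_XTILEDATA 18

/* Hypothetical helper: test one feature bit in an XSTATE bitmask. */
static int xcomp_has_feature(uint64_t mask, unsigned int feature)
{
    assert(feature < 64);
    return (int)((mask >> feature) & 1u);
}

/* Hypothetical helper: returns 0 if the process now holds permission
 * to use AMX tile data, -1 otherwise (old kernel, no AMX, or denied). */
static int amx_request_permission(void)
{
#ifdef SYS_arch_prctl
    uint64_t mask = 0;

    /* 1) Read the supported features (equivalent to XGETBV(0)). */
    if (syscall(SYS_arch_prctl, (long)ARCH_GET_XCOMP_SUPP, &mask) != 0 ||
        !xcomp_has_feature(mask, XFEATURE_XTILEDATA))
        return -1;

    /* 2) Request permission; the argument is a feature number,
     *    not a bitmask. */
    if (syscall(SYS_arch_prctl, (long)ARCH_REQ_XCOMP_PERM,
                (long)XFEATURE_XTILEDATA) != 0)
        return -1;

    /* 3) Read back the permitted features to confirm the grant. */
    mask = 0;
    if (syscall(SYS_arch_prctl, (long)ARCH_GET_XCOMP_PERM, &mask) != 0)
        return -1;

    return xcomp_has_feature(mask, XFEATURE_XTILEDATA) ? 0 : -1;
#else
    /* arch_prctl() is not available on this architecture. */
    errno = ENOSYS;
    return -1;
#endif
}
```

A program would presumably call amx_request_permission() once at startup, before executing any AMX instruction. On kernels older than 5.16 or on CPUs without AMX the helper simply returns -1, so the caller can fall back to a non-AMX code path.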

The big refactoring of the FPU code, which allowed to do a proper integration has been started exactly 3 weeks ago. Refactoring of the existing FPU code and of the original AMX patches took a week and has been subject to extensive review and testing. The only fallout which has not been caught in review and testing right away was restricted to AMX enabled systems, which is completely irrelevant for anyone outside Intel and their early access program. There might be dragons lurking as usual, but so far the fine grained refactoring has held up and eventual yet undetected fallout is bisectable and should be easily addressable before the 5.16 release. Famous last words...