Intel AMX Support Lands For Linux 5.16
Intel was quick to land the initial AMX changes in the LLVM and GCC compilers, but getting the kernel-side bits squared away took more than a year. That support is now in mainline, months ahead of Sapphire Rapids' ramp in Q2.
A number of kernel changes were required to enable AMX, including a requirement that user-space applications explicitly request AMX usage from the kernel via an arch_prctl() interface. With AVX and most other instruction set extensions, applications can simply use the instructions if the CPU supports them; with AMX, permission must first be requested from the kernel. Since the kernel then knows which applications plan to use AMX instructions and has more control over them, this should help isolate frequency down-clocking and other performance impacts.
On top of that, a lot of other kernel code was changed in preparation for Advanced Matrix Extensions.
On Monday the support was merged as part of the "x86/fpu" changes for Linux 5.16.
- Add AMX (Advanced Matrix eXtensions) support (finally):
AMX is a large XSTATE component which is going to be available with Sapphire Rapids Xeon CPUs. The feature comes with an extra MSR (MSR_XFD) which allows trapping the (first) use of an AMX related instruction, which has two benefits:
1) It allows the kernel to control access to the feature
2) It allows the kernel to dynamically allocate the large register state buffer instead of burdening every task with the extra 8K or larger state storage.
It would have been great to gain this kind of control already with AVX512.
The support comes with the following infrastructure components:
1) arch_prctl() to
- read the supported features (equivalent to XGETBV(0))
- read the permitted features for a task
- request permission for a dynamically enabled feature
Permission is granted per process, inherited on fork() and cleared on exec(). The permission policy of the kernel is restricted to sigaltstack size validation, but the syscall obviously allows further restrictions via seccomp etc.
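For illustration, the three arch_prctl() operations listed above can be exercised from user space roughly as follows. This is a minimal sketch rather than code from the pull request: the helper names (mask_has_feature, xcomp_read_mask, amx_request_tiledata) are made up for this example, the constants are the values found in the kernel's uapi asm/prctl.h header for 5.16 and the XTILEDATA feature number (18), and on non-x86-64 or pre-5.16 systems the helpers simply report failure.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Values as found in the kernel's uapi <asm/prctl.h> (Linux 5.16+). */
#ifndef ARCH_GET_XCOMP_SUPP
#define ARCH_GET_XCOMP_SUPP 0x1021 /* read the supported features */
#define ARCH_GET_XCOMP_PERM 0x1022 /* read the permitted features */
#define ARCH_REQ_XCOMP_PERM 0x1023 /* request permission for a feature */
#endif

/* XSTATE feature number of the AMX tile data component. */
#define XFEATURE_XTILEDATA 18

/* Hypothetical helper: returns 1 if the feature bit is set in mask, else 0. */
static int mask_has_feature(uint64_t mask, unsigned feature)
{
    return (int)((mask >> feature) & 1);
}

/* Hypothetical helper: reads a feature bitmask via arch_prctl().
 * Returns 0 on success, -1 on failure or on unsupported platforms. */
static int xcomp_read_mask(long op, uint64_t *mask)
{
#if defined(__linux__) && defined(__x86_64__)
    return syscall(SYS_arch_prctl, op, mask) == 0 ? 0 : -1;
#else
    (void)op;
    (void)mask;
    return -1;
#endif
}

/* Hypothetical helper: asks the kernel for permission to use AMX tile data.
 * Returns 0 if permission is held afterwards, -1 otherwise. */
static int amx_request_tiledata(void)
{
    uint64_t mask = 0;

    /* Step 1: check that the kernel and CPU support the feature at all. */
    if (xcomp_read_mask(ARCH_GET_XCOMP_SUPP, &mask) != 0 ||
        !mask_has_feature(mask, XFEATURE_XTILEDATA))
        return -1;

#if defined(__linux__) && defined(__x86_64__)
    /* Step 2: request permission; the argument is the feature number. */
    if (syscall(SYS_arch_prctl, (long)ARCH_REQ_XCOMP_PERM,
                (long)XFEATURE_XTILEDATA) != 0)
        return -1;
#endif

    /* Step 3: confirm the permission now shows up for this process. */
    mask = 0;
    if (xcomp_read_mask(ARCH_GET_XCOMP_PERM, &mask) != 0)
        return -1;
    return mask_has_feature(mask, XFEATURE_XTILEDATA) ? 0 : -1;
}
```

An application would presumably call amx_request_tiledata() once at startup, before executing any AMX instruction, and fall back to a non-AMX code path if it returns -1.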
The big refactoring of the FPU code, which allowed a proper integration, was started exactly 3 weeks ago. Refactoring of the existing FPU code and of the original AMX patches took a week and has been subject to extensive review and testing. The only fallout which has not been caught in review and testing right away was restricted to AMX enabled systems, which is completely irrelevant for anyone outside Intel and their early access program. There might be dragons lurking as usual, but so far the fine grained refactoring has held up, and any as yet undetected fallout is bisectable and should be easily addressable before the 5.16 release. Famous last words...