Intel Details APX - Advanced Performance Extensions
Following Advanced Vector Extensions (AVX) and more recently Advanced Matrix Extensions (AMX) for furthering the x86_64 CPU compute potential, Intel has now published initial details on APX: Advanced Performance Extensions.
Intel's Advanced Performance Extensions provide access to more registers and add further features to enhance general-purpose CPU performance. Intel says APX will deliver performance gains across a wide swath of workloads without costing much in terms of CPU power or silicon area.
Among the early details Intel outlined with APX:
"Intel APX doubles the number of general-purpose registers (GPRs) from 16 to 32. This allows the compiler to keep more values in registers; as a result, APX-compiled code contains 10% fewer loads and more than 20% fewer stores than the same code compiled for an Intel 64 baseline. Register accesses are not only faster, but they also consume significantly less dynamic power than complex load and store operations.
Compiler enabling is straightforward – a new REX2 prefix provides uniform access to the new registers across the legacy integer instruction set. Intel AVX instructions gain access via new bits defined in the existing EVEX prefix. In addition, legacy integer instructions now can also use EVEX to encode a dedicated destination register operand – turning them into three-operand instructions and reducing the need for extra register move instructions. While the new prefixes increase average instruction length, there are 10% fewer instructions in APX-compiled code, resulting in similar code density as before.
The new GPRs are XSAVE-enabled, which means that they can be automatically saved and restored by XSAVE/XRSTOR sequences during context switches. They do not change the size and layout of the XSAVE area as they take up the space left behind by the deprecated Intel MPX registers.
...
The performance features introduced so far will have limited impact in workloads that suffer from a large number of conditional branch mispredictions. As out-of-order CPUs continue to become deeper and wider, the cost of mispredictions increasingly dominates performance of such workloads. Branch predictor improvements can mitigate this to a limited extent only as data-dependent branches are fundamentally hard to predict.
To address this growing performance issue, we significantly expand the conditional instruction set of x86, which was first introduced with the Intel® Pentium® Pro in the form of CMOV/SET instructions. These instructions are used quite extensively by today’s compilers, but they are too limited for broader use of if-conversion (a compiler optimization that replaces branches with conditional instructions)."
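To make the if-conversion idea from Intel's description more concrete, here is a minimal C sketch of the transformation itself. The function names and sample data are made up for illustration and are not from Intel's documentation. C cannot express the new APX conditional instructions directly, so this only shows the source-level shape: the first loop has a data-dependent branch, the second expresses the same logic as a select that a compiler may lower to CMOV-style code. An optimizing compiler may well generate identical code for both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: the data-dependent `if` is hard to predict
   when the values in `v` are effectively random. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold) {
            sum += v[i];
        }
    }
    return sum;
}

/* If-converted form: the condition selects a value instead of
   steering control flow, so no conditional jump is required. */
int64_t sum_above_select(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t add = (v[i] > threshold) ? v[i] : 0;
        sum += add;
    }
    return sum;
}

/* Sanity check (illustrative sample data): both forms must agree. */
void check_forms_agree(void) {
    const int32_t sample[] = {5, -3, 12, 7, 0, 9, -8, 20};
    const size_t n = sizeof sample / sizeof sample[0];
    assert(sum_above_branchy(sample, n, 6) == sum_above_select(sample, n, 6));
}
```

The select form works here because adding zero is harmless. Per the quoted text, today's CMOV/SET instructions are too limited for broader use of if-conversion, which is the gap the expanded conditional instruction set is meant to close.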
Intel APX ultimately comes down to 16 more general purpose registers, three-operand instruction formats with a new data destination register for many integer operations, conditional ISA improvements, optimized register state save/restore operations, and a new 64-bit absolute direct jump instruction.
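As a rough illustration of why a separate destination register helps, the following C sketch is a toy cost model of my own, not anything from Intel. It counts the instructions needed for `dst = src1 + src2` when the add must overwrite one of its inputs, versus when the destination is a third operand. Register ids are plain integers, and the model deliberately ignores LEA and other existing tricks.

```c
#include <assert.h>

/* Toy model: instructions needed for dst = src1 + src2 when the add
   overwrites its first operand (legacy two-operand form). */
int cost_two_operand(int dst, int src1, int src2) {
    if (dst == src1 || dst == src2) {
        return 1;   /* add dst, other   (addition commutes) */
    }
    return 2;       /* mov dst, src1 ; add dst, src2 */
}

/* Toy model: with a dedicated destination operand, a single
   instruction suffices whichever registers are involved. */
int cost_three_operand(int dst, int src1, int src2) {
    (void)dst; (void)src1; (void)src2;
    return 1;
}
```

For example, `cost_two_operand(2, 0, 1)` is 2 while `cost_three_operand(2, 0, 1)` is 1. That saved move is the kind of instruction Intel's description says the three-operand forms remove.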
Overall, Intel's Advanced Performance Extensions sound very promising from a performance perspective. It is also great that compiler enablement is a low barrier and that, for user-space applications, adopting APX should just be a matter of recompiling without any expected code changes. Hopefully we'll see Linux and open-source toolchain support for Intel APX, along with Intel processors supporting APX, sooner rather than later. Given Intel's history, now that the APX technical documents are public, I wouldn't be at all surprised if the Linux/open-source patches begin appearing within days.
Those wanting to learn more about Intel APX can do so via the Intel.com page.
As part of the Intel APX disclosure, Intel today also announced AVX10, an evolution of AVX-512 for future CPUs that includes support for both P-core and E-core designs.