Intel Continues GCC Compiler Preparations For AVX10 & APX

Written by Michael Larabel in Intel on 25 September 2023 at 11:28 AM EDT. 5 Comments

Since announcing the Advanced Performance Extensions (APX) and AVX10 back in July, Intel's open-source compiler engineers have been busy preparing the GCC and LLVM/Clang compiler toolchains for these major CPU extensions coming in future Intel processors.

There's been initial AVX10.1 support added to the GNU Compiler Collection (GCC), GNU Assembler preparations, and more. In recent days Intel compiler engineers have also posted more patches.

Last Thursday, patches were sent out adding the -mevex512 option for AVX-512, along with the related -mno-evex512, for toggling the 512-bit registers and 64-bit mask registers. As the patch series explains:

After previous discussion, instead of supporting option -mavx10.1, we will first [introduce] option -m[no-]evex512, which will enable/disable 512 bit register and 64 bit mask register.

It will not change the current option behavior since if AVX512F is enabled with no evex512 option specified, it will automatically enable 512 bit register and 64 bit mask register.

[The patches are organized as follows]:

Patch 1 added initial support for option -mevex512.

Patch 2-6 refined current intrin file to push evex512 target for all 512 bit intrins. Those scalar intrins remained untouched.

Patch 7-11 added OPTION_MASK_ISA2_EVEX512 for all related builtins.

Patch 12 disabled zmm register, 512 bit libmvec call for no-evex512, also requested evex512 for vectorization when using 512 bit register.

Patch 13-17 supported evex512 in related patterns.

Patch 18 added testcases for -mno-evex512 and allowed its usage.

The patches currently cause scan-asm fail for pr89229-{5,6,7}b.c since we will emit scalar vmovss here. When trying to use x/ymm 16+ w/o avx512vl but with avx512f+evex512, I suppose we could either emit scalar or zmm instructions. It is quite a rare case on HW since there is no HW w/o avx512vl but with avx512f, so I prefer [not to] add [maintenance] effort here to get a [slight] perf improvement. But it could be changed to former behavior.
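
The defaulting rule in the patch description above (AVX512F with no explicit evex512 option still gets 512-bit registers and 64-bit mask registers) can be sketched as a small C model. This is only an illustration and not GCC source: the enum, its values, and the function name are hypothetical, and the model assumes for simplicity that 512-bit registers are never available without AVX512F.

```c
#include <stdbool.h>

/* Hypothetical tri-state for the -m[no-]evex512 command-line option. */
enum evex512_opt {
    EVEX512_UNSPECIFIED, /* neither -mevex512 nor -mno-evex512 given */
    EVEX512_ON,          /* -mevex512 */
    EVEX512_OFF          /* -mno-evex512 */
};

/* Illustrative model, not GCC code: returns true when 512-bit registers
   and 64-bit mask registers would be usable.
   Assumption: without AVX512F there is nothing to enable. */
bool evex512_effective(bool avx512f, enum evex512_opt opt)
{
    if (!avx512f)
        return false;
    /* With AVX512F enabled, only an explicit -mno-evex512 turns it off;
       leaving the option unspecified keeps the existing behavior. */
    return opt != EVEX512_OFF;
}
```

Under this model, existing builds that pass only -mavx512f behave as before, and only an explicit -mno-evex512 restricts code generation to the narrower registers.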

Separately, updated patches preparing APX EGPR support were posted on Friday. As explained in the earlier APX EGPR patches for GCC:

"Intel Advanced performance extension (APX) has been released. It contains several extensions such as extended 16 general purpose registers (EGPRs)...APX introduces a REX2 prefix to help represent EGPR for several legacy/SSE instructions. For the remaining ones, it promotes some of them using evex prefix for EGPR. The main issue in APX is that not all legacy/sse/vex instructions support EGPR. For example, instructions in legacy opcode map2/3 cannot use REX2 prefix since there is only 1bit in REX2 to indicate map0/1 instructions, e.g., pinsrd. Also, for most vector extensions, EGPR is supported in their evex forms but not vex forms, which means the mnemonics with no evex forms also cannot use EGPR, e.g., vphaddw. Such limitation brings some challenge with current GCC infrastructure...."

The APX EGPR patches for GCC thus bring handling for legacy instructions, initial APX_F enabling code, and other early work in preparation for the Intel Advanced Performance Extensions.
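
The encoding limitation quoted above can be sketched in C as a rough model. It assumes the usual numbering of the legacy opcode maps (map 0 for one-byte opcodes, map 1 for the 0F escape, map 2 for 0F 38, map 3 for 0F 3A). The type and function names are hypothetical, and the model deliberately ignores the legacy instructions that APX promotes to EVEX.

```c
#include <stdbool.h>

/* Legacy opcode maps (assumed numbering): 0 = one-byte opcodes,
   1 = 0F, 2 = 0F 38, 3 = 0F 3A. */
enum legacy_map { MAP0 = 0, MAP1 = 1, MAP2 = 2, MAP3 = 3 };

/* Illustrative only: REX2 carries a single map-select bit, so it can
   reach map 0 or map 1 but not map 2 or map 3. An instruction such as
   pinsrd (map 3) therefore cannot take an EGPR through REX2. */
bool rex2_can_encode(enum legacy_map map)
{
    return map == MAP0 || map == MAP1;
}

/* Hypothetical classification of a vector instruction's encodings. */
enum vec_form { FORM_VEX_ONLY, FORM_HAS_EVEX };

/* Illustrative only: EGPRs are reachable through the EVEX form, so a
   mnemonic with no EVEX form (e.g. vphaddw) cannot use them. */
bool vector_insn_allows_egpr(enum vec_form form)
{
    return form == FORM_HAS_EVEX;
}
```

A compiler has to make this kind of per-instruction decision when allocating registers, which is the challenge for the current GCC infrastructure that the patch description refers to.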


The APX and AVX10.1+ efforts are a big undertaking, but at least Intel's open-source/Linux engineers have been very active in pushing out new patches quickly and getting relevant bits upstreamed, so that by the time processors appear with these capabilities there should be nice out-of-the-box Linux support.
