Originally posted by WannaBeOCer
First of all, the claim that Intel has a better AVX-512 implementation that is not "double-pumped" is a myth, and it is easily debunked by reading the Intel Optimization Manuals.
The implementation of almost all AVX-512 instructions on Intel CPUs is also "double-pumped", in the sense that the throughput for 512-bit instructions is the same as for 256-bit instructions, because the 512-bit instructions are executed by combining two 256-bit execution pipelines into a single one.
The only exception is FMA: some of the most expensive Intel CPUs, priced in the thousands of dollars, have a second 512-bit FMA unit, so for FMA alone they get double the throughput when 512-bit instructions are used. This second FMA unit is present in all Xeon Platinum models, in part of the Xeon Gold range, and in those Xeon W models that support AVX-512. It was also present in a few of the HEDT Intel CPUs.
All the Intel CPUs and AMD CPUs that support AVX-512 have exactly the same throughput: two 512-bit instructions per clock cycle.
The various models differ only in their restrictions on which of the more complex instructions may fill both slots in the same clock cycle.
On most Intel CPUs, only 1 of the 2 instructions may be an FMA or an FADD.
On Zen 4, only 1 of the 2 instructions may be an FMA, but the other can be an FADD, so this is better than for most Intel CPUs with AVX-512.
On Intel Xeon Platinum and similar CPUs, both instructions can be an FMA.
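To put rough numbers on the three cases above, here is a minimal sketch that counts peak single-precision FLOPs per cycle per core. It assumes 16 FP32 lanes per 512-bit register, counts an FMA as 2 FLOPs per lane and an FADD as 1, and takes the best-case instruction mix straight from the restrictions listed above. The names `BEST_MIX` and `peak_flops_per_cycle` are made up for illustration, and the model ignores everything except the FMA/FADD slots.

```python
# Simplified model: peak FP32 FLOPs per clock cycle per core, counting only
# the 512-bit FMA/FADD slots described in the post. Illustrative names only.

LANES_FP32 = 512 // 32                   # 16 single-precision lanes per 512-bit register
FLOPS_PER_LANE = {"FMA": 2, "FADD": 1}   # an FMA is one multiply plus one add

# Best-case floating-point instructions issued per cycle for each class of CPU.
BEST_MIX = {
    "most Intel CPUs (one FMA unit)": ["FMA"],            # other slot cannot be FMA or FADD
    "Zen 4": ["FMA", "FADD"],                             # one FMA plus one FADD
    "Xeon Platinum and similar (two FMA units)": ["FMA", "FMA"],
}


def peak_flops_per_cycle(mix):
    """Sum the FLOPs contributed by each instruction in the per-cycle mix."""
    return sum(LANES_FP32 * FLOPS_PER_LANE[op] for op in mix)


for name, mix in BEST_MIX.items():
    print(f"{name}: {peak_flops_per_cycle(mix)} FP32 FLOPs/cycle")
```

Under these assumptions the three cases come out at 32, 48 and 64 FLOPs per cycle. The Zen 4 figure is only reachable when the code actually has independent additions to pair with its FMAs; a pure FMA stream would sit at 32, the same as on most Intel CPUs.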
Not only does Zen 4 have better throughput than the majority of Intel CPUs by being able to do both an FMA and an FADD per cycle, it also has double the throughput for certain kinds of shuffle and permute instructions.
There are also various other improvements described at:
The most expensive models of the upcoming Sapphire Rapids CPUs will again have double FMA throughput per clock cycle, and maybe Intel will give up on market segmentation and stop disabling the second FMA unit on the cheaper CPUs.
Even if that happens, Zen 4 will still have a better AVX-512 implementation than most of the already existing Intel CPUs.