Originally posted by torsionbar28
With 256-bit AVX, desktop chips mostly have 2 execution units, so you often end up with two 256-bit instructions completing in one cycle versus one AVX-512 instruction, which is the same amount of work per cycle. The gain mostly shows up only when two 256-bit instructions can't be issued at the same time, or when AVX-512 lets you reduce the complexity of the algorithm.
The biggest problem with AVX-512 is that you need a hand-tuned program for it, and preferably a processor with at least 2 AVX-512 units.
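To put rough numbers on the unit-count argument, here's a back-of-the-envelope sketch in Python. It assumes each unit can issue one vector instruction per cycle and that the elements are 32-bit; the function name `peak_lanes_per_cycle` is made up for illustration, and real chips have other limits this ignores.

```python
def peak_lanes_per_cycle(units: int, vector_bits: int, element_bits: int = 32) -> int:
    """Upper bound on elements processed per cycle.

    Assumption (not from any vendor spec): every unit issues exactly
    one full-width vector instruction per cycle.
    """
    lanes_per_instruction = vector_bits // element_bits
    return units * lanes_per_instruction


# Two 256-bit units vs. one 512-bit unit: same peak work per cycle.
two_256 = peak_lanes_per_cycle(units=2, vector_bits=256)
one_512 = peak_lanes_per_cycle(units=1, vector_bits=512)

# Two 512-bit units: this is where the width actually pays off.
two_512 = peak_lanes_per_cycle(units=2, vector_bits=512)

print(two_256, one_512, two_512)  # 16 16 32
```

So under these assumptions a single AVX-512 unit only ties with two 256-bit units, and you need the second 512-bit unit (plus code that keeps both busy) to double the peak.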