LLVM Working On Intel AVX-512 Support

  • #11
    Originally posted by HeavensRevenge
    We need OpenMP support WAY more urgently than new vector instructions...
    Neither is more urgent than the other. An enthusiast GPU like the GeForce GTX 680 has 8 cores, each with 6 vector units of 32 elements, so it can run 8 x 6 x 32 = 1536 strands simultaneously. But it suffers from the overhead of heterogeneous computing and has a tiny consumer install base. AVX-512 should make it feasible, in the not too distant future, to have mainstream CPUs with 8 cores, each with 2 vector units of 16 elements, running 8 x 2 x 16 = 256 strands at 3-4 times the GPU's frequency.
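
    The strand counts above are just cores times vector units times elements per unit. A minimal sketch of that arithmetic follows; the function name peak_strands is made up for illustration, and the inputs are the figures quoted in this post, not measured values.

```c
#include <assert.h>

/* Peak number of SPMD strands (vector lanes) that can be active at once:
   cores x vector units per core x elements per vector unit.
   The name and signature are hypothetical, for illustration only. */
unsigned peak_strands(unsigned cores, unsigned units_per_core,
                      unsigned lanes_per_unit)
{
    assert(cores > 0 && units_per_core > 0 && lanes_per_unit > 0);
    return cores * units_per_core * lanes_per_unit;
}
```

    With the numbers from this post, peak_strands(8, 6, 32) gives 1536 for the GPU and peak_strands(8, 2, 16) gives 256 for the hypothetical AVX-512 CPU. Scaling 256 by the 3-4x clock advantage gives very roughly 768 to 1024 strands at GPU-equivalent clocks, which is only a back-of-the-envelope comparison.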

    AVX-512 is the first x86 instruction set extension that is specifically targeted at the SPMD programming model that is also used by GPUs (e.g. it supports predication through dedicated mask registers). To execute 16 loop iterations in parallel, you need compilers to vectorize your code in the SPMD fashion. That's what LLVM is working on. So you don't want to miss out on that.
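
    To make the predication idea concrete, here is a rough sketch in plain C. It does not use real AVX-512 intrinsics; it only emulates what a 16-lane masked operation does, so it runs on any CPU. The function names relu_scalar and relu_spmd and the LANES constant are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* 16 x 32-bit floats fill one 512-bit register */

/* Scalar reference: one loop iteration at a time, with a branch. */
void relu_scalar(float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (x[i] < 0.0f)
            x[i] = 0.0f;
}

/* SPMD-style emulation: each block of up to 16 iterations follows one
   instruction stream, and a 16-bit mask records which lanes take the
   branch. The inner lane loops stand in for single vector operations. */
void relu_spmd(float *x, size_t n)
{
    for (size_t base = 0; base < n; base += LANES) {
        /* Lanes past the end of the array stay switched off (tail handling). */
        size_t live = (n - base < LANES) ? (n - base) : LANES;
        uint16_t mask = 0;

        /* "Compare" step: set one mask bit per lane where the condition holds. */
        for (size_t lane = 0; lane < live; ++lane)
            if (x[base + lane] < 0.0f)
                mask |= (uint16_t)(1u << lane);

        /* "Masked store" step: only lanes whose mask bit is set are written. */
        for (size_t lane = 0; lane < live; ++lane)
            if (mask & (1u << lane))
                x[base + lane] = 0.0f;
    }
}
```

    Both functions produce the same result for any n. The point of the second form is that the branch has turned into data (the mask), which is the shape a vectorizing compiler needs in order to replace each inner lane loop with one masked vector instruction.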

    That said, multi-threading is an equally important aspect of maximizing the CPU's performance. Fortunately Intel recently added the TSX extensions to greatly facilitate and optimize thread synchronization.



