LLVM Working On Intel AVX-512 Support


  • LLVM Working On Intel AVX-512 Support

    Phoronix: LLVM Working On Intel AVX-512 Support

    Intel developers working on the LLVM compiler infrastructure have been adding AVX-512 instruction set support in recent days. AVX-512 provides 512-bit SIMD instructions that handle twice as many data elements per instruction as AVX/AVX2 and four times as many as SSE...

    http://www.phoronix.com/vr.php?view=MTQyODk

  • #2
    OpenMP

    We need OpenMP support WAY more urgently than new vector instructions. Supporting more instructions is always good, but it matters less than the speedup even a simple OpenMP implementation could give.
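
    For readers who have not used it, here is a minimal sketch of the kind of code OpenMP support lets a compiler parallelize. The function name `sum_array` is made up for illustration. With OpenMP enabled the loop iterations are split across threads and the partial sums are combined; a compiler without OpenMP support ignores the pragma and runs the loop serially, with the same result.

```c
#include <stddef.h>

/* Hypothetical example: sum an array of doubles.
   With OpenMP enabled, iterations are distributed across threads and
   each thread's partial sum is combined by the reduction clause.
   Without OpenMP, the pragma is ignored and the loop runs serially. */
double sum_array(const double *data, size_t n)
{
    double total = 0.0;
    long i;

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)n; ++i) {
        total += data[i];
    }
    return total;
}
```

    The point is that a single pragma on an ordinary loop is all the source change needed, which is why it is often described as the easy way to use extra cores.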



    • #3
      Yes, as one keeps doubling the width of these special registers, the yield gets smaller and smaller because the use cases become rarer. It's the same with threads: 2 threads (cores) are enough for responsive programs (a GUI thread and a worker thread), which almost every program uses, while more than 2 threads (cores) are much less useful and only help in relatively few cases, like parallel compilation or video encoding.

      Point is, AVX2 and/or SSE4.2 offer plenty of acceleration on the CPU. If that's still not enough, don't double the registers: the CPU isn't meant for that, the GPU is.



      • #4
        You must be kidding me...

        At the moment, clang wastes 75% of a CPU's compute capability on a 4-core and about 83% on a 6-core. How in the world is that remotely OK?

        Now just realize clang uses only 1 of those at the moment, basically wasting the money you spent on a core count of more than 2. I WANT clang/LLVM to do multithreading, and OpenMP is the normal and easy way to do it.

        It's beyond bad, because the problem is compounded on a 6-8 core machine: only 1 hyperthread is usable, not a whole "core", so that's half a physical core being used. I do know what you mean, UI responsiveness is fine with just 2 cores, but it's just a terrible state to be in.



        • #5
          Originally posted by HeavensRevenge
          At the moment, clang wastes 75% of a CPU's compute capability on a 4-core and about 83% on a 6-core. How in the world is that remotely OK?

          Now just realize clang uses only 1 of those at the moment, basically wasting the money you spent on a core count of more than 2. I WANT clang/LLVM to do multithreading, and OpenMP is the normal and easy way to do it.

          It's beyond bad, because the problem is compounded on a 6-8 core machine: only 1 hyperthread is usable, not a whole "core", so that's half a physical core being used. I do know what you mean, UI responsiveness is fine with just 2 cores, but it's just a terrible state to be in.
          It's being worked on by Intel, Apple and others within the LLVM Group. When there is an inclusion into trunk I'm sure Phoronix will publish it.



          • #6
            Elena Demikhovsky

            You forgot to mention that this work is being done by Elena Demikhovsky from the Intel site in Israel.



            • #7
              Originally posted by xterminator
              LLVM is terrible

              It should be replaced with GCC.
              GCC is terrible

              It should be replaced with LLVM.



              • #8
                I love Clang & LLVM

                I love clang & llvm. They are marvelous, and I love playing with them more than GCC by far. It's just that this is an Achilles' heel: basically it's a car without a windshield, or with only 1 gear. We need to give llvm code a gearbox so it can gear up and go as fast as needed, without the artificial limitation of having no OpenMP support.



                • #9
                  Originally posted by HeavensRevenge
                  I love clang & llvm. They are marvelous, and I love playing with them more than GCC by far. It's just that this is an Achilles' heel: basically it's a car without a windshield, or with only 1 gear. We need to give llvm code a gearbox so it can gear up and go as fast as needed, without the artificial limitation of having no OpenMP support.
                  Wow, hyperbole or what. I've used a copy of GCC for ages that doesn't have OpenMP support, and it has never been an issue building anything or running any software. No one would consider using a car like that; with GCC or another compiler, that isn't even close to the case. Yes, it would be nice for LLVM to have OpenMP, and it will happen in due time, but the reality is that for the average end user it doesn't make any difference and the software still works as is. The other thing is that only a very tiny percentage of software even supports OpenMP.



                  • #10
                    Originally posted by mark45
                    Yes, as one keeps doubling the width of these special registers, the yield gets smaller and smaller because the use cases become rarer. It's the same with threads: 2 threads (cores) are enough for responsive programs (a GUI thread and a worker thread), which almost every program uses, while more than 2 threads (cores) are much less useful and only help in relatively few cases, like parallel compilation or video encoding.

                    Point is, AVX2 and/or SSE4.2 offer plenty of acceleration on the CPU. If that's still not enough, don't double the registers: the CPU isn't meant for that, the GPU is.
                    You miscalculated. It's two threads for each program! This means that running multiple responsive programs would benefit from at least 4 cores (a minimum of 2 programs, each with a GUI thread and a worker thread).
                    It's definitely useful to have more cores. And what if you want to edit some documents while playing music or something else in the background? That's two programs already, not including antivirus software, which also takes resources.



                    • #11
                      Originally posted by HeavensRevenge
                      We need OpenMP support WAY more urgently than new vector instructions...
                      Neither is more urgent or less urgent than the other. An enthusiast GPU like the GeForce GTX 680 has 8 cores with 6 vector units of 32 elements, so it can run 1536 strands simultaneously. But it obviously suffers from heterogeneous overhead and has a tiny consumer install base. AVX-512 should make it feasible to have mainstream CPUs with 8 cores with 2 vector units of 16 elements, running 256 strands at 3-4 times higher frequency, in the not too distant future.
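
                      The strand counts above are just cores x vector units per core x elements per unit. A tiny sketch of that arithmetic follows; the function names are made up for illustration, and scaling by clock ratio is a rough assumption that ignores memory bandwidth, latency hiding and everything else.

```c
/* Hypothetical helper: number of strands (lanes) that can execute at once. */
unsigned parallel_lanes(unsigned cores, unsigned units_per_core,
                        unsigned elems_per_unit)
{
    return cores * units_per_core * elems_per_unit;
}

/* Rough assumption: scale the lane count by a relative clock frequency
   to get a "lane-equivalent" figure comparable across chips. */
unsigned lane_equivalents(unsigned lanes, unsigned clock_ratio)
{
    return lanes * clock_ratio;
}
```

                      With the figures above, `parallel_lanes(8, 6, 32)` gives 1536 for the GPU and `parallel_lanes(8, 2, 16)` gives 256 for the CPU. At 3 to 4 times the frequency, the CPU's 256 lanes correspond to roughly 768 to 1024 lane-equivalents by this crude measure.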

                      AVX-512 is the first x86 instruction set extension that is specifically targeted at the SPMD programming model that is also used by GPUs (e.g. it supports predication through dedicated mask registers). To execute 16 loop iterations in parallel, you need compilers to vectorize your code in the SPMD fashion. That's what LLVM is working on. So you don't want to miss out on that.
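
                      To make the predication idea concrete, here is a plain C model of what a vectorizer does with a conditional loop. It is only a sketch: it uses no real AVX-512 instructions or intrinsics, the function names are made up, and the 16-bit `mask` variable merely stands in for a hardware mask register covering 16 32-bit lanes.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* 512 bits / 32-bit elements */

/* The loop as the programmer writes it: one element per iteration. */
void scale_positive_scalar(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (b[i] > 0.0f)
            a[i] = b[i] * 2.0f;
    }
}

/* Model of the predicated SPMD form: 16 iterations per step.
   Step 1 evaluates the condition for every lane and records it in a mask.
   Step 2 performs the store only in lanes whose mask bit is set.
   Lanes past the end of the array are simply masked off, so no
   separate scalar tail loop is needed. */
void scale_positive_masked(float *a, const float *b, size_t n)
{
    for (size_t base = 0; base < n; base += LANES) {
        uint16_t mask = 0;

        for (size_t lane = 0; lane < LANES; ++lane) {
            size_t i = base + lane;
            if (i < n && b[i] > 0.0f)
                mask |= (uint16_t)(1u << lane);
        }

        for (size_t lane = 0; lane < LANES; ++lane) {
            if (mask & (1u << lane))
                a[base + lane] = b[base + lane] * 2.0f;
        }
    }
}
```

                      On real hardware each inner lane loop would be a single vector instruction, which is where the 16 iterations in parallel come from; both functions here produce identical results.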

                      That said, multi-threading is an equally important aspect of maximizing the CPU's performance. Fortunately Intel recently added the TSX extensions to greatly facilitate and optimize thread synchronization.


