Intel Begins Working On "Knights Mill" Support For LLVM/Clang



    Phoronix: Intel Begins Working On "Knights Mill" Support For LLVM/Clang

    Intel compiler engineers have begun mainlining "Knights Mill" enablement within the LLVM compiler stack...


  • #2
    Poor Intel. Forever tied to the CPU and x86 (AMD64) like the albatross around the Ancient Mariner's neck in "The Rime of the Ancient Mariner". The market has chosen, and it's GPUs for intense mathematics. A few vector routines glued to a crap ton of x86 CPUs is not going to cut it at scale. Intel should have bought Nvidia back in the late 90s, but myopic management and a fundamentalist belief in Moore's Law kept them a monoculture tech company. The only hope Intel has is putting an Altera FPGA on-die with these CPUs, but of course that increases the complexity of programming them, and that ruins the marketing message Intel is using to sell these Xeon Phis.



    • #3
      The whole Xeon Phi thing appears to be a complete embarrassment for Intel. Most of the papers and books, even the ones written by people with ties to Intel, could not make this damn thing go fast enough. For instance, the Volta GPU with its Tensor cores improved performance for some kernels used in deep-learning software by six to ten times compared to the Pascal architecture, which is already many times faster than the Intel Xeon (not Phi).



      • #4
        I am very, very excited about Knights Mill. Ultimately, individuals will decide where this ends up, but this is how we shall first achieve exascale computing.



        • #5
          No... Xeon Phi will NOT usher in exascale computing. Intel, like Microsoft, is a tech company boxed in by its own hubris and success. Just as Microsoft did not and could not see the mobile, distributed, and open-source compute models coming, Intel did not see the low-power SoC model of computing or the GPGPU model of computing for HPC. Intel CANNOT... let me repeat... CANNOT scale vector-routine improvements or x86 improvements faster than the improvements made by GPU manufacturers such as Nvidia and AMD. Yes, they bought FPGA maker Altera a few years back, and FPGAs can do a CRAP TON of calculations at relatively low power, but the cost comes in the complexity of programming and of the software tools and frameworks. So you are back to the problems of scale, rapidity of improvement across generations, and complex programming paradigms.

          A GPU is the computing sweet spot for HPC. It is cheaper to scale than banks of hard-to-program FPGAs, it can be programmed more easily than FPGAs, and it is only slightly more complex to program than x86 cores with AVX routines. GPU chips, through Moore's Law, can shrink and improve nearly as quickly as CPUs and faster than FPGAs.



          • #6
            Originally posted by defaultUser View Post
            The whole Xeon Phi thing appears to be a complete embarrassment for Intel. Most of the papers and books, even the ones written by people with ties to Intel, could not make this damn thing go fast enough. For instance, the Volta GPU with its Tensor cores improved performance for some kernels used in deep-learning software by six to ten times compared to the Pascal architecture, which is already many times faster than the Intel Xeon (not Phi).
            It was an idea that started with IBM's Cell, at a time when Intel was trying to tell us (and themselves) that they would take the "Netburst" (P4) architecture to 10 GHz by around 2007.
            The Cell was ground-breaking in many regards, but the GPU then took over, providing a better fit for most (not all) of the problems Cell could solve.
            Itanium, Pentium 4, Quark, Knights Corner: these are failed directions, each of which alone could kill most companies. Their quasi-monopoly apparently allows them to waste a lot of money keeping some managers' ideas and egos alive.
            It's almost as if they have a separate "special olympics" branch, alongside the top-of-class engineers who keep them a big player.



            • #7
              There are people who know a lot about microarchitectures who are of the opinion that the correct approach for data-parallel problems is basically the "traditional vector" ISA, as invented by Seymour Cray back in the 1970s. Unfortunately, the microprocessor industry largely ignored this, and thus in mainstream processors we have suffered the scourge of packed-SIMD extensions (themselves a 1950s invention) for the past several decades. For modern examples of "proper" vector ISAs, see ARM SVE or the RISC-V vector extension. Though AVX-512 is, to some extent, getting there too.
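
              To make that concrete, here's a minimal plain-C sketch of the strip-mining pattern a "proper" vector ISA encourages. The setvl helper and the vlmax value below are just stand-ins for a set-vector-length instruction and the machine's real maximum; no actual intrinsics are used, so treat it as an illustration of the programming model rather than real vector code:

              #include <stddef.h>

              /* Hypothetical stand-in for a vector ISA's "set vector length"
                 instruction: the hardware reports how many elements it will
                 process this pass, up to its own maximum. */
              static size_t setvl(size_t remaining, size_t vlmax) {
                  return remaining < vlmax ? remaining : vlmax;
              }

              /* y[i] += a * x[i] as a strip-mined, vector-length-agnostic loop
                 in the Cray / RISC-V "V" style: the same code works whatever
                 vector width the hardware happens to implement. */
              void axpy_vla(size_t n, float a, const float *x, float *y) {
                  const size_t vlmax = 8;              /* made-up hardware maximum */
                  for (size_t i = 0; i < n; ) {
                      size_t vl = setvl(n - i, vlmax); /* elements handled this pass */
                      for (size_t j = 0; j < vl; ++j)  /* conceptually one vector op */
                          y[i + j] += a * x[i + j];
                      i += vl;
                  }
              }

              With packed SIMD (SSE/AVX/AVX-512 style) the width is baked into the instruction encoding, so the equivalent code is written for one fixed width plus a scalar tail loop and has to be rewritten or recompiled when the width changes, which is exactly the complaint above.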

              GPUs are good for (some) data-parallel problems not because SIMT is the ultimate programming model but because they architecturally get a lot of other things right (lots of cores plus massive memory bandwidth).

              See https://riscv.org/wp-content/uploads...p-june2015.pdf for some arguments why a real vector ISA is better than either packed-SIMD or GPU-style SIMT programming models.



              • #8
                Originally posted by Jumbotron View Post
                No... Xeon Phi will NOT usher in exascale computing. Intel, like Microsoft, is a tech company boxed in by its own hubris and success. Just as Microsoft did not and could not see the mobile, distributed, and open-source compute models coming, Intel did not see the low-power SoC model of computing or the GPGPU model of computing for HPC. Intel CANNOT... let me repeat... CANNOT scale vector-routine improvements or x86 improvements faster than the improvements made by GPU manufacturers such as Nvidia and AMD. Yes, they bought FPGA maker Altera a few years back, and FPGAs can do a CRAP TON of calculations at relatively low power, but the cost comes in the complexity of programming and of the software tools and frameworks. So you are back to the problems of scale, rapidity of improvement across generations, and complex programming paradigms.

                A GPU is the computing sweet spot for HPC. It is cheaper to scale than banks of hard-to-program FPGAs, it can be programmed more easily than FPGAs, and it is only slightly more complex to program than x86 cores with AVX routines. GPU chips, through Moore's Law, can shrink and improve nearly as quickly as CPUs and faster than FPGAs.
                That bit about Microsoft... apparently they feel their monopoly is now stronger than ever, so they're raising license fees for Windows. Exactly what was it they couldn't see? Intel has been in basically the same position for years.



                • #9
                  Originally posted by AndyChow View Post
                  I am very, very excited about Knights Mill. Ultimately, individuals will decide where this ends up, but this is how we shall first achieve exascale computing.
                  Unlikely. Not even Intel thinks that these days.
                  The next big machine is Sierra: POWER9 plus Nvidia, ~0.1 exaflops, maybe mid-2018.
                  After that we get Aurora, which is Intel, supposed to hit an exaflop, but which is a "future Intel part, NOT Knights Hill" and which appears to be unlike the existing Phi family.

                  This morning a presentation filtered out of the Department of Energy's Office of Science showing the roadmap to exascale, with a 2021 machine at Argonne.



                  • #10
                    Originally posted by Jumbotron View Post
                    No... Xeon Phi will NOT usher in exascale computing. Intel, like Microsoft, is a tech company boxed in by its own hubris and success. Just as Microsoft did not and could not see the mobile, distributed, and open-source compute models coming, Intel did not see the low-power SoC model of computing or the GPGPU model of computing for HPC. Intel CANNOT... let me repeat... CANNOT scale vector-routine improvements or x86 improvements faster than the improvements made by GPU manufacturers such as Nvidia and AMD. Yes, they bought FPGA maker Altera a few years back, and FPGAs can do a CRAP TON of calculations at relatively low power, but the cost comes in the complexity of programming and of the software tools and frameworks. So you are back to the problems of scale, rapidity of improvement across generations, and complex programming paradigms.

                    A GPU is the computing sweet spot for HPC. It is cheaper to scale than banks of hard-to-program FPGAs, it can be programmed more easily than FPGAs, and it is only slightly more complex to program than x86 cores with AVX routines. GPU chips, through Moore's Law, can shrink and improve nearly as quickly as CPUs and faster than FPGAs.
                    I agree with most of what you are saying, but it's not clear that a traditional GPU is EXACTLY the optimal choice.
                    Consider something that I like to call a "laned" processor, which is like a GPU except that each lane of a wavefront has its own PC rather than the wavefront's shared PC (and perhaps a few other modifications, like a small L0 cache in front of each lane). Such a processor is essentially in the same business as a GPU:
                    --- each lane is a simple one- (or perhaps two-) wide in-order CPU executing a simple narrow-vector (say 128-bit?) instruction set
                    --- there is a huge pool of registers and an expectation of register-set swapping (i.e. running a new "virtual" lane) on a cache miss
                    --- BUT there isn't the enforced strictly-in-sync execution of wavefronts (and its attendant costs in the face of branching).
                    There is exploitation of locality (instructions and data, spatial and temporal) through a deep cache hierarchy, but there isn't the same sort of requirement of more or less identical behavior across all lanes.
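
                    To put a toy number on that branching cost, here is a back-of-the-envelope C model. The 8-lane group, the per-path costs and the full-serialization assumption are all made up for illustration (real GPUs also have re-convergence tricks that soften the worst case):

                    #include <stdio.h>

                    #define LANES 8

                    int main(void) {
                        /* Toy cost model: each lane of an 8-wide group takes a
                           different path through a data-dependent branch, and
                           every path costs 10 "cycles". */
                        int path_cost[LANES];
                        for (int lane = 0; lane < LANES; ++lane)
                            path_cost[lane] = 10;

                        /* Shared-PC wavefront (GPU-style SIMT): divergent paths are
                           serialized with lane masking, so the group pays for every
                           distinct path back to back. */
                        int lockstep_cycles = 0;
                        for (int lane = 0; lane < LANES; ++lane)
                            lockstep_cycles += path_cost[lane];

                        /* Per-lane PCs (the "laned" design above): each lane follows
                           only its own path, so the group finishes when the slowest
                           lane finishes. */
                        int laned_cycles = 0;
                        for (int lane = 0; lane < LANES; ++lane)
                            if (path_cost[lane] > laned_cycles)
                                laned_cycles = path_cost[lane];

                        printf("lockstep wavefront: %d cycles, per-lane PCs: %d cycles\n",
                               lockstep_cycles, laned_cycles);
                        return 0;
                    }

                    In this worst case the lockstep group pays the sum of the divergent paths (80 cycles) while independent lanes pay only for the slowest one (10 cycles).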

                    My guess is that a design like this gives you what you want from a throughput engine, without the overhead and the specifically graphics-targeted design decisions of a GPU.
                    Now, is there a market large enough for such a design? We shall see...
                    IMHO you COULD run graphics on top of such a design (at the cost of adding a few extra features that are irrelevant to HPC, like texture support and fp16 support). My GUESS is that that is where Apple is headed with their "GPU".

                    Now, if Intel wants a design that's a better throughput engine than Phi, I would imagine they'd go down this path:
                    --- many more lanes than the number of cores in a Phi
                    --- HW-driven register switching (as opposed to OS-mediated threads and SMT; imagine if every lane of a GPU required an OS thread...)
                    --- drop back from the pain of 512-bit vectors to the much nicer design point of, say, 128 or 256 bits
                    --- and (this is what Intel does NOT want to hear) because each lane needs to be simple and sleek, you need to be running a simple and sleek instruction set. x86 overhead may not be much on Skylake-sized cores, but it is a huge overhead when we're envisaging individual lanes that are each simpler than an A35 or so.

                    Are they willing yet to admit that x86 is not the solution to EVERY computing problem?
                    They weren't for Atom, they weren't for Quark, they weren't for Phi. Has that string of failures pounded some sense into them?

