Is Xeon Phi every OSS enthusiast's wet dream


  • Is Xeon Phi every OSS enthusiast's wet dream

    The first time I read this article http://semiaccurate.com/2012/11/12/w...or-a-xeon-phi/ I wondered whether this card would be the best way to get high-end graphics with OSS drivers. It should be pretty straightforward to run LLVM on top of it, right?

    Obviously these cards are aimed at HPC, but the prices are in line with Teslas, so Intel should be able to sell these chips at similar prices to consumers.

    Can anyone with a better understanding of the driver stack tell me whether I'm wrong or right?

  • #2
    In short, the answer is: "It IS the best way, but it will take a little longer than you think."

    Pricing aside, the most glaring problem is that LLVM doesn't have a backend for Knights Corner, and SSE instructions won't work on KNC either.
    We would also need a (virtual?) DRM driver to handle all the mess such as memory management and interaction with i915. You won't want a pure rendering card.
    Finally, hopefully Intel will push the KNC kernel driver into Linux mainline; otherwise we are stuck with RHEL6/CentOS6/SL6/SUSE and their ancient software.

    Nevertheless, it is still the most promising path towards high-performance rendering with OSS.

    P.S.
    http://software.intel.com/en-us/arti...ck-start-guide
    and
    http://registrationcenter.intel.com/.../readme-en.txt
    show a clearer picture of the stack. Better than I thought, but not perfect.
    Last edited by zxy_thf; 11-13-2012, 05:54 AM.

    Comment


    • #3
      Thanks for the reply. Some version of KC will probably trickle down to the consumer market eventually. The days of mixing and matching parts from different vendors are coming to an end, so if Intel wants to stay relevant for graphically intensive applications it must develop a high-performance GPU. I remember reading rumors that future integrated graphics might even be based on it. It will probably take some time though...

      What I find most interesting is that, if I understood correctly, since KC is "easier" to program, developing a driver for it would probably be "easier" than for other GPUs. That would reduce the amount of effort a complete GPU driver requires, improving our chances of having open drivers. If such a trend catches on with other GPU manufacturers, that should be great for consumers, right?

      Comment


      • #4
        Originally posted by Figueiredo View Post
        Thanks for the reply. Some version of KC will probably trickle down to the consumer market eventually. The days of mixing and matching parts from different vendors are coming to an end, so if Intel wants to stay relevant for graphically intensive applications it must develop a high-performance GPU. I remember reading rumors that future integrated graphics might even be based on it. It will probably take some time though...

        What I find most interesting is that, if I understood correctly, since KC is "easier" to program, developing a driver for it would probably be "easier" than for other GPUs. That would reduce the amount of effort a complete GPU driver requires, improving our chances of having open drivers. If such a trend catches on with other GPU manufacturers, that should be great for consumers, right?
        (I only skimmed part of KC's ISA, sorry for any mistakes.)
        OpenGL developers wouldn't need to write a "driver" so much as an "OpenGL server". That's what developers were doing in the "good old days" :P, when SGI dominated workstations.
        Unfortunately, implementing one is still a lot of work, but this time people wouldn't need to deal with two ISAs for each card.

        As for open-source GPU drivers, I'm still not optimistic. It will take a long time before the community poses a real threat to NV/AMD's solutions. For example, Mesa's state tracker can only handle OpenGL 3.1, not 4.x.

        Comment


        • #5
          Originally posted by zxy_thf View Post
          As for open-source GPU drivers, I'm still not optimistic. It will take a long time before the community poses a real threat to NV/AMD's solutions. For example, Mesa's state tracker can only handle OpenGL 3.1, not 4.x.
          I guess what I was expecting is that if a GPU can handle code that is general enough, the same driver would work for every GPU that is that general. Bear with me for a moment.

          Obviously I'm going out on a limb here, but let's imagine Xeon Phi crushes the competition in the HPC space and enters the consumer market (probably integrated as the GPU of some future Intel SoC).

          AMD and NVIDIA would be pressed to put out more "general" GPUs, perhaps even accepting the ARM instruction set. Maybe they would even decide to push this "programmability" into future OpenGL revisions.

          In this scenario, if a driver similar to llvmpipe ran on every GPU out there, the effort of building and maintaining a driver would be much less than it is now. Right? This could potentially be a huge win for open source: a single driver to rule them all, much like Linux itself. One can only dream...

          I'm sorry if I made any mistakes with the terms or concepts; I'm not a programmer.

          Comment


          • #6
            Originally posted by Figueiredo View Post
            I guess what I was expecting is that if a GPU can handle code that is general enough, the same driver would work for every GPU that is that general. Bear with me for a moment.

            Obviously I'm going out on a limb here, but let's imagine Xeon Phi crushes the competition in the HPC space and enters the consumer market (probably integrated as the GPU of some future Intel SoC).

            AMD and NVIDIA would be pressed to put out more "general" GPUs, perhaps even accepting the ARM instruction set. Maybe they would even decide to push this "programmability" into future OpenGL revisions.

            In this scenario, if a driver similar to llvmpipe ran on every GPU out there, the effort of building and maintaining a driver would be much less than it is now. Right? This could potentially be a huge win for open source: a single driver to rule them all, much like Linux itself. One can only dream...

            I'm sorry if I made any mistakes with the terms or concepts; I'm not a programmer.
            It's possible, but first AMD & NV would have to agree on the ISA, or at least on some features of their processors.
            Just as one can't easily develop an OS for two CPUs where one has an MMU and interrupts and the other doesn't, we currently can't develop such a general driver for all GPUs because they have too many differences. For example, Intel, AMD and NVIDIA each handle context switching in a different way.
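
            To make that concrete, here is a toy C sketch (entirely my own illustration, not real Mesa/Gallium code, and all names are made up): even with one common interface, each vendor still needs its own backend for the parts where the hardware differs, such as context switching.

            Code:
            /* Toy illustration: a "general" driver still needs per-vendor hooks. */
            #include <stdio.h>

            struct gpu_ops {
                const char *name;
                void (*switch_context)(int ctx_id);   /* the hardware-specific part */
            };

            static void intel_switch(int ctx_id)  { printf("Intel-specific switch to ctx %d\n", ctx_id); }
            static void amd_switch(int ctx_id)    { printf("AMD-specific switch to ctx %d\n", ctx_id); }
            static void nvidia_switch(int ctx_id) { printf("NVIDIA-specific switch to ctx %d\n", ctx_id); }

            int main(void)
            {
                struct gpu_ops vendors[] = {
                    { "Intel",  intel_switch  },
                    { "AMD",    amd_switch    },
                    { "NVIDIA", nvidia_switch },
                };
                /* Generic code only sees the common interface... */
                for (int i = 0; i < 3; i++)
                    vendors[i].switch_context(42);    /* ...but each body is vendor-specific. */
                return 0;
            }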

            BTW, maybe the funniest thing is: if a GPU is "general enough", why do we still call it a GPU? Just because it can output video?

            Comment


            • #7
              I guess the two big challenges will be texture processing (Larrabee had dedicated texture units) and scaling to a larger number of threads.

              My recollection is that recent llvmpipe versions scaled pretty well up to 3 cores but hit diminishing returns after that (see Michael's test below, but ignore the 12-thread result because there you're running "hyper-threads" rather than more cores):

              http://www.phoronix.com/scan.php?pag...llvmpipe&num=1

              I think the scaling issue should be manageable (GPUs manage it today with the equivalent of 20+ cores) -- I'm less sure about texturing simply because there's a lot of processing power hidden in the texture filtering.
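
              To give a feel for how much arithmetic hides in the texture filtering, here is a minimal bilinear sample in C (my own sketch; real texture units also do mipmap selection, anisotropic filtering, format decompression and caching):

              Code:
              /* Minimal bilinear sample from an 8-bit single-channel texture.
               * u and v are assumed to be in [0, 1]. Even this cut-down version is
               * 4 texel fetches plus ~8 multiply-adds for ONE channel of ONE sample,
               * and a scene does millions of these per frame. */
              #include <math.h>

              float bilinear_sample(const unsigned char *tex, int w, int h, float u, float v)
              {
                  float x = u * (w - 1), y = v * (h - 1);
                  int x0 = (int)floorf(x), y0 = (int)floorf(y);
                  int x1 = x0 + 1 < w ? x0 + 1 : x0;
                  int y1 = y0 + 1 < h ? y0 + 1 : y0;
                  float fx = x - x0, fy = y - y0;

                  float t00 = tex[y0 * w + x0], t10 = tex[y0 * w + x1];
                  float t01 = tex[y1 * w + x0], t11 = tex[y1 * w + x1];

                  float top = t00 + fx * (t10 - t00);
                  float bot = t01 + fx * (t11 - t01);
                  return top + fy * (bot - top);
              }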

              Originally posted by zxy_thf View Post
              BTW, maybe the funniest thing is: if a GPU is "general enough", why do we still call it a GPU? Just because it can output video?
              Have you no faith in Marketing? GPU will just become "General-purpose Processing Unit"
              Last edited by bridgman; 11-13-2012, 11:22 AM.

              Comment


              • #8
                bridgman,

                Due to my ignorance on the subject, I couldn't tell from AMD's roadmap whether such "programmability" is also expected in the AMD camp. Obviously you can only share what has already been made public, but if you could be so kind as to briefly explain how the HSA improvements differ from a Xeon + Xeon Phi setup, I'm sure we layman users would greatly appreciate it.

                Comment


                • #9
                  Originally posted by zxy_thf View Post
                  It's possible, but first AMD & NV would have to agree on the ISA, or at least on some features of their processors.
                  Just as one can't easily develop an OS for two CPUs where one has an MMU and interrupts and the other doesn't, we currently can't develop such a general driver for all GPUs because they have too many differences. For example, Intel, AMD and NVIDIA each handle context switching in a different way.
                  I think the keyword here is LLVM. Yes, you can easily develop an OS for two CPUs using LLVM. Of course, if one CPU lacks some feature like an MMU, the OS must be able to cope with the lack of that component. But you are talking about quite a small difference here, one that LLVM should have no problem handling.

                  Code that targets LLVM does not target a specific ISA...
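
                  As a rough illustration of what I mean (standard clang/llc usage; of course there is no KNC backend registered today, which is exactly the missing piece):

                  Code:
                  /* saxpy.c -- ordinary portable C, no ISA-specific intrinsics. */
                  void saxpy(int n, float a, const float *x, float *y)
                  {
                      for (int i = 0; i < n; i++)
                          y[i] = a * x[i] + y[i];
                  }

                  /* Compile once to (mostly) target-independent LLVM IR:
                   *     clang -O2 -S -emit-llvm saxpy.c -o saxpy.ll
                   * then lower that same IR for whichever backends are registered, e.g.:
                   *     llc -mtriple=x86_64-unknown-linux-gnu saxpy.ll -o saxpy_x86.s
                   *     llc -mtriple=armv7-unknown-linux-gnueabihf saxpy.ll -o saxpy_arm.s
                   * A KNC target would need its own backend before llc could do the same for it. */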

                  Originally posted by zxy_thf View Post
                  BTW, maybe the most funny thing is, if a GPU is "general enough", why we have to call it GPU? Just because it can output videos?
                  Why are we still calling something a "sound card" when it's most often part of the chipset? For historical reasons. We already have OpenCL and other standards that make a GPU a lot more than a GPU.

                  Comment


                  • #10
                    Originally posted by zxy_thf View Post
                    (I only skimmed part of KC's ISA, sorry for any mistakes.)
                    OpenGL developers wouldn't need to write a "driver" so much as an "OpenGL server". That's what developers were doing in the "good old days" :P, when SGI dominated workstations.
                    Unfortunately, implementing one is still a lot of work, but this time people wouldn't need to deal with two ISAs for each card.

                    As for open-source GPU drivers, I'm still not optimistic. It will take a long time before the community poses a real threat to NV/AMD's solutions. For example, Mesa's state tracker can only handle OpenGL 3.1, not 4.x.
                    Actually they would need to write a Mesa driver, as Mesa already has an OpenGL server.

                    What do you mean? The Mesa drivers for NV/AMD are at 3.1 too. A Xeon Phi with a good Mesa driver has a fair chance of giving us performance that neither NV nor AMD can currently match.

                    Yes, I know the proprietary drivers have more features and performance, but that's totally irrelevant. For a bunch of reasons I need FOSS drivers and have to judge a device by how it performs with FOSS drivers. And I know I'm not alone with such use cases.

                    Comment


                    • #11
                      Originally posted by mateli View Post
                      Actually they would need to write a Mesa driver, as Mesa already has an OpenGL server.

                      What do you mean? The Mesa drivers for NV/AMD are at 3.1 too. A Xeon Phi with a good Mesa driver has a fair chance of giving us performance that neither NV nor AMD can currently match.

                      Yes, I know the proprietary drivers have more features and performance, but that's totally irrelevant. For a bunch of reasons I need FOSS drivers and have to judge a device by how it performs with FOSS drivers. And I know I'm not alone with such use cases.
                      I said we need a "server" because, for Xeon Phi, Mesa needs to be split into two processes.
                      One is the library for traditional OpenGL applications that have no idea about Xeon Phi.
                      The other is a Xeon Phi application running in a different process on a different host, because a Xeon Phi is effectively a standalone machine connected over the PCIe bus.

                      A brief TODO list for Xeon Phi rendering:
                      1. Write a Mesa driver on the host (your Core i?, Athlon, PPC or any CPU you like).
                      The driver needs to translate OpenGL commands into messages in some intermediate form and pass them to the server running on the Xeon Phi.
                      In other words, the state tracker (Gallium) stays on the host.
                      (Sending OpenGL commands directly would be possible, but I'd rather run the state tracker on a superscalar processor.)

                      2. Write an OpenGL server on the Xeon Phi.
                      The server needs to parse those messages and do the actual rendering work.
                      This server could use llvmpipe, but rewriting it from scratch is also possible, especially for some commercial OpenGL vendors.


                      Optimizing the OpenGL server is another story, but I believe Xeon Phi is the future of OSS high-performance 3D.
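
                      To make the split concrete, here is a very rough C sketch of the kind of message passing I mean (the message format and the phi_send/phi_recv transport calls are made up for illustration; a real implementation would presumably sit on top of Intel's SCIF interface and would batch state, manage buffers, synchronize, and so on):

                      Code:
                      /* Hypothetical wire format shared by the host-side Mesa driver
                       * and the OpenGL "server" running on the Xeon Phi. */
                      #include <stdint.h>

                      enum phi_cmd_op {
                          PHI_CMD_UPLOAD_VBO,
                          PHI_CMD_DRAW_ARRAYS,
                          PHI_CMD_SWAP_BUFFERS,
                      };

                      struct phi_cmd {
                          uint32_t op;         /* enum phi_cmd_op                         */
                          uint32_t size;       /* bytes of payload following this header  */
                          uint32_t args[4];    /* e.g. first vertex, vertex count, vbo id */
                          /* payload follows: vertex data, shader IR, ...                 */
                      };

                      /* Host side (Mesa/Gallium driver): encode and send.
                       *     struct phi_cmd cmd = { PHI_CMD_DRAW_ARRAYS, 0, { first, count, vbo, 0 } };
                       *     phi_send(&cmd, sizeof cmd);              // hypothetical transport call
                       *
                       * Phi side (OpenGL server): receive, decode, render (e.g. via llvmpipe).
                       *     while (phi_recv(&cmd, sizeof cmd) > 0)   // hypothetical transport call
                       *         switch (cmd.op) { case PHI_CMD_DRAW_ARRAYS: ... }
                       */
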
                      Last edited by zxy_thf; 03-21-2013, 04:56 AM.

                      Comment


                      • #12
                        Originally posted by Figueiredo View Post
                        bridgman,

                        Due to my ignorance on the subject, I couldn't tell from AMD's roadmap whether such "programmability" is also expected in the AMD camp. Obviously you can only share what has already been made public, but if you could be so kind as to briefly explain how the HSA improvements differ from a Xeon + Xeon Phi setup, I'm sure we layman users would greatly appreciate it.
                        Sorry, just noticed this now. Even without HSA, most of the programmability is already included in currently shipping GPUs. The main differences are:

                        1. GPUs keep texture filtering in fixed-function hardware rather than moving it to general purpose processors. Texture processing is generally required for small rectangular areas of texture rather than individual pixels, and there are some significant performance & power efficiency benefits to be had from using fixed-function hardware because of the ability to share results from intermediate calculations more efficiently.

                        2. GPUs handle the task of spreading work across parallel threads and cores using fixed function hardware rather than software, which helps a lot with scaling issues.

                        Pretty much everything else has already moved from fixed function hardware into the general purpose processors. The ISA on the general purpose processors is a bit more focused on graphics and HPC tasks -- that's what the ISA guide for each new HW generation covers.

                        In KC-speak, the HD 79xx has 32 independent cores, each with a scalar ALU and a 2048-bit SIMD floating point ALU (organized as 4 x 512-bit, ie 4 x 16-way SIMD), running up to 40 threads on each core. The fixed-function hardware that spreads work across threads and cores allows each of the cores to have relatively more floating point power (which is what graphics and HPC both require) and relatively less scalar power.
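
                        As a quick sanity check on those numbers (my own arithmetic, not from AMD's docs):

                        Code:
                        #include <stdio.h>

                        int main(void)
                        {
                            int cores          = 32;   /* independent cores on the HD 79xx */
                            int simds_per_core = 4;    /* 4 x 512-bit SIMD units per core  */
                            int lanes_per_simd = 16;   /* 16-way single-precision SIMD     */

                            int lanes = cores * simds_per_core * lanes_per_simd;
                            /* 32 * 4 * 16 = 2048, which matches the "2048 stream processors"
                             * figure AMD quotes for the HD 7970. */
                            printf("total SP lanes: %d\n", lanes);
                            return 0;
                        }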

                        What HSA brings is tighter integration between the main (superscalar) CPU cores and the GPU cores to reduce the overhead and programming complexity of offloading work to a separate device -- shared pageable virtual memory, cache coherency between CPU and GPU cores, simpler/faster dispatch of work between GPU and CPU etc...
                        Last edited by bridgman; 03-21-2013, 09:45 AM.

                        Comment


                        • #13
                          re: #2, a couple more comments for completeness...

                          For compute, the fixed function hardware takes N-dimensional array-level compute commands and spreads the work across cores & threads.

                          For graphics, the fixed function hardware takes "draw using these lists of triangles" commands and implements the non-programmable parts of the GL/DX graphics pipelines:

                          - pick out individual vertices and spread the vertex shader processing across cores and threads
                          - reassemble processed vertices into triangles, scan convert each triangle to identify pixels
                          - spread the pixel/fragment shader work across cores & threads

                          (a modern graphics pipeline has a lot more stages than just vertex & fragment processing, but you get the idea; the same applies to the other stages as well)
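
                          In software terms, that fixed-function "glue" corresponds roughly to the outer loops of a toy rasterizer like the one below (a deliberately simplified sketch of my own -- no clipping, depth, attribute interpolation or perspective correction):

                          Code:
                          /* Toy illustration of the non-programmable work around the two
                           * programmable stages (vertex shader, fragment shader). A GPU does
                           * this distribution in fixed-function hardware; llvmpipe does it in
                           * software across CPU threads. */
                          #include <stddef.h>

                          typedef struct { float x, y, z; } vec3;
                          typedef vec3 (*vertex_shader_fn)(vec3 in);
                          typedef void (*fragment_shader_fn)(int px, int py);

                          static int inside_triangle(vec3 a, vec3 b, vec3 c, float px, float py)
                          {
                              /* Edge-function test; real hardware adds fill rules, fixed point, tiling. */
                              float e0 = (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
                              float e1 = (c.x - b.x) * (py - b.y) - (c.y - b.y) * (px - b.x);
                              float e2 = (a.x - c.x) * (py - c.y) - (a.y - c.y) * (px - c.x);
                              return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
                          }

                          void draw_triangles(const vec3 *verts, size_t n_verts, int width, int height,
                                              vertex_shader_fn vs, fragment_shader_fn fs)
                          {
                              for (size_t i = 0; i + 2 < n_verts; i += 3) {
                                  /* 1. vertex stage: programmable, spread across cores/threads on a GPU */
                                  vec3 a = vs(verts[i]), b = vs(verts[i + 1]), c = vs(verts[i + 2]);

                                  /* 2. primitive assembly + scan conversion: fixed-function on a GPU */
                                  for (int py = 0; py < height; py++)
                                      for (int px = 0; px < width; px++)
                                          if (inside_triangle(a, b, c, px + 0.5f, py + 0.5f))
                                              /* 3. fragment stage: programmable, spread across cores/threads */
                                              fs(px, py);
                              }
                          }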

                          Comment


                          • #14
                            Originally posted by bridgman View Post
                            In KC-speak, the HD 79xx has 32 independent cores, each with a scalar ALU and a 2048-bit SIMD floating point ALU (organized as 4 x 512-bit, ie 4 x 16-way SIMD), running up to 40 threads on each core. The fixed-function hardware that spreads work across threads and cores allows each of the cores to have relatively more floating point power (which is what graphics and HPC both require) and relatively less scalar power.
                            Maybe I've learned something wrong, but I can't get my head around the "40 threads".
                            I think the cores in SI run wavefronts, not independent threads. Isn't SMT an unnecessary complexity for GPUs?

                            Comment


                            • #15
                              Originally posted by zxy_thf View Post
                              Maybe I've learned something wrong, but I can't get my head around the "40 threads".
                              I think the cores in SI run wavefronts, not independent threads. Isn't SMT an unnecessary complexity for GPUs?
                              What I'm calling a thread in KC-speak is a wavefront in GPU-speak. Basically the same thing these days... a single thread using the SIMD hardware to process 64 elements in parallel (each 16-way SIMD actually performs a vector operation on 64 elements in 4 clocks).

                              SMT these days usually refers to dynamically sharing execution units in a superscalar processor, which is complex as you say. GPUs generally rely on thread-level parallelism rather than instruction-level parallelism (although VLIW shader cores use both, with the compiler implementing ILP), so running multiple threads is a lot less complex.

                              Think about the old "barrel processor" model from the 60s and 70s, where the processor has multiple register sets and switches between threads on a per-clock basis rather than using the parallel execution units required for superscalar operation to run instructions from more than one thread in a single clock.
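
                              A toy simulator loop captures the idea (purely my own illustration, not how Larrabee/KC is actually implemented):

                              Code:
                              /* Toy "barrel processor": N hardware thread contexts sharing one
                               * pipeline. Each clock the core issues from the next context in
                               * round-robin order -- no dynamic sharing of execution units as in SMT. */
                              #include <stdio.h>

                              #define NUM_CONTEXTS 4

                              struct hw_context {
                                  int pc;          /* per-thread program counter */
                                  int regs[16];    /* per-thread register set    */
                              };

                              int main(void)
                              {
                                  struct hw_context ctx[NUM_CONTEXTS] = {{0}};

                                  for (int clock = 0; clock < 12; clock++) {
                                      struct hw_context *t = &ctx[clock % NUM_CONTEXTS]; /* switch every clock */
                                      /* "Execute" one instruction from this thread's stream. */
                                      t->regs[0] += 1;
                                      t->pc += 1;
                                      printf("clock %2d: thread %d ran the instruction at pc=%d\n",
                                             clock, clock % NUM_CONTEXTS, t->pc - 1);
                                  }
                                  return 0;
                              }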

                              IIRC Larrabee uses the same approach -- multiple threads per core but only one thread at a time. I'm not sure which model KC uses but I suspect it also runs one thread at a time per core.
                              Last edited by bridgman; 03-21-2013, 01:02 PM.

                              Comment
