Intel Has A Single-Chip Cloud Computer


  • #16
    Are there other things that GPUs are unquestionably better at than CPUs?
    It's typically highly parallelizable low-precision floating point code, but over the last few years a lot of progress has been made to improve the GPU in other areas and I'm sure that will continue.

    I suppose that performing FP calculations is very parallelizable?
    Not intrinsically, no. Multiplying 2.5 * 3.5 is no more parallel than multiplying 2 * 3. However, many of the applications that heavily use fp hardware are - anything that benefits from SSE support, for example, is probably at least somewhat parallelizable.
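
    To make that concrete, here's a minimal C sketch (the function and array names are just made up for illustration). A single multiply has nothing to split up, but a loop of independent multiplies is exactly the kind of data-parallel work that SIMD units and GPUs are built for:

    Code:
    #include <stddef.h>

    /* One multiply: nothing here can be spread across lanes or cores. */
    float scale_one(float x)
    {
        return x * 3.5f;
    }

    /* N independent multiplies: every iteration stands alone, so a
       vectorizing compiler (or a GPU) can run many of them per clock. */
    void scale_many(const float *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            out[i] = in[i] * 3.5f;
    }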

    Also, GPU hardware is built from the ground up around floating point operations, while most of the code that goes through a cpu is integer based. Floating point hardware is a lot more complicated (and therefore expensive) than integer hardware, so the amount of fp resources that can be justified on a cpu is fairly limited.

    I believe they're talking about having a single gpu on chip to handle multiple cpu cores (threads) at a time, so that will let the hardware stretch its legs a little and of course the compilers will probably be tuned to try to schedule as many fp operations at a time as they can.

    So something like the Cell architecture?
    It is sort of like the Cell architecture in that they're talking about having different kinds of specialized hardware on the same chip. Cell had a weak general purpose core with a bunch of highly specialized SPUs, while AMD is talking about multiple x86 cores along with a gpu that would handle most of the fp load.

    I'm just guessing here, but I think the idea would be for the hardware to automatically do all the offloading here, which is different than Cell. With Cell, you had to be very careful about what you were programming where, and I would assume that this stuff from AMD would just take normal code and have the cpu fetch/decode logic forward fp calls through to another part of the chip automatically for you.
    Last edited by smitty3268; 12-03-2009, 08:48 PM.



    • #17
      Sorry, when I mentioned SIMD instruction extensions I was talking about the SIMD extensions which are already built into x86 processors today. SIMD extensions are how most of the serious floating point work is done on x86 today, but unless you're writing math libraries or game engines you probably don't see them.

      Originally x86 processors only handled integer work, and a separate x87 coprocessor handled floating point operations (or you trapped down to a software emulation library). The coprocessor had its own set of registers and other state information, but it pulled instructions out of the same stream as the rest of your program. The floating point coprocessor moved onto the same die as the CPU around the 486 days, but the separate instructions and registers remained (and are still there today AFAIK).

      Enter SIMD. The Intel MMX extensions added integer SIMD functions -- the ability to process multiple sets of data with a single instruction. AMD's 3DNow! added the first floating point SIMD extensions, targeted at 3D game geometry but useful in other areas as well. Intel's SSE extensions (Streaming SIMD aka Screaming Sindy) added more floating point capabilities along with a third set of registers.
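
      To give a feel for what those extensions look like from C, here's a minimal sketch using the SSE intrinsics from <xmmintrin.h> (the function name and the assumption that n is a multiple of 4 are just for illustration). One _mm_mul_ps does four float multiplies in a single instruction:

      Code:
      #include <xmmintrin.h>   /* SSE intrinsics */

      /* Multiply two float arrays four elements at a time with SSE.
         Assumes n is a multiple of 4; a real routine would handle the tail. */
      void mul_arrays(const float *a, const float *b, float *out, int n)
      {
          for (int i = 0; i < n; i += 4) {
              __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats */
              __m128 vb = _mm_loadu_ps(b + i);
              __m128 vc = _mm_mul_ps(va, vb);    /* 4 multiplies in one instruction */
              _mm_storeu_ps(out + i, vc);        /* store 4 results */
          }
      }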

      In general compilers don't directly use the SIMD extensions - you get at them through math libraries or calls into hand-tweaked assembler routines. This is starting to change but we're still at the early stages - OpenCL is one attempt to make the SIMD extensions on CPUs generally accessible.
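
      As a rough sketch of what the OpenCL route looks like (kernel only -- the host-side setup code is omitted, and the kernel name is just made up for illustration), the same element-wise multiply is written once per element in OpenCL C, and the runtime maps the work-items onto CPU SIMD lanes or GPU threads:

      Code:
      /* OpenCL C kernel: each work-item computes one element. */
      __kernel void vec_mul(__global const float *a,
                            __global const float *b,
                            __global float *out)
      {
          size_t i = get_global_id(0);   /* index of this work-item */
          out[i] = a[i] * b[i];
      }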

      While all this was happening, GPUs were gradually evolving into floating point SIMD engines as well, but with a higher degree of streaming (trading off cache coherence for throughput) and parallelism than CPUs (HD5870 has 20 SIMD engines each executing 80 floating point ops per clock, ie 1600 operations per clock or 3200 FLOPs/clock for MAD).

      This obviously reminds one of Seymour Cray's comment about the emerging conflict between highly parallel microprocessor-based systems and optimized supercomputers:

      If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?
      It took a lot of years, but the chickens are finally starting to win.

      The GPU vs CPU discussion is primarily about whether the work currently handled by the CPU's SIMD extensions (SSEx) can be handled as well or better by a GPU-type architecture instead. So far it looks pretty good - the biggest application of SIMD floating point was 3D geometry and that moved to the GPU quite a few years ago. Math libraries come next, and many of them have been ported to GPUs. APIs like OpenCL and DirectCompute are designed to pick up most of the remaining workload and isolate it from the hardware specifics.
      Last edited by bridgman; 12-03-2009, 10:23 PM.



      • #18
        FYI, Larrabee is dead. (As I said was going to happen so long ago).

        http://news.cnet.com/8301-13924_3-10409715-64.html



        • #19
          Originally posted by deanjo View Post
          FYI, Larrabee is dead. (As I said was going to happen so long ago).
          From what I've read it's not officially dead, they're just not releasing the first version because it's not competitive, which is hardly unusual with new architectures in the consumer graphics market. Of course they may decide never to build a second version with the lessons they've learned from this one, but they haven't killed Itanium yet so Larrabee may still be with us a decade or two from now.

          Personally I've never thought that Larrabee made much sense -- why put x86 instructions into a massively parallel architecture if you could use a new instruction set and eliminate all the complex instruction decoding? -- but I wouldn't write it off yet.



          • #20
            Originally posted by movieman View Post
            From what I've read it's not officially dead, they're just not releasing the first version because it's not competitive, which is hardly unusual with new architectures in the consumer graphics market. Of course they may decide never to build a second version with the lessons they've learned from this one, but they haven't killed Itanium yet so Larrabee may still be with us a decade or two from now.
            Really, in a decade or two there would be no point, as by that time CPUs should be massively parallel, with possibly hundreds (maybe thousands) of cores, making an x86-based graphics card a relatively moot product.



            • #21
              Originally posted by movieman View Post
              Personally I've never thought that Larrabee made much sense -- why put x86 instructions into a massively parallel architecture if you could use a new instruction set and eliminate all the complex instruction decoding? -- but I wouldn't write it off yet.
              My guess is that Intel views Larrabee as a jumping point into the same HPC computing space that NVidia seems to be betting their company on, rather than just a video card. If you view the hardware as being for more general purposes and not just 3D acceleration then keeping the x86 ISA could become a selling point.



              • #22
                Originally posted by smitty3268 View Post
                My guess is that Intel views Larrabee as a jumping point into the same HPC computing space that NVidia seems to be betting their company on, rather than just a video card. If you view the hardware as being for more general purposes and not just 3D acceleration then keeping the x86 ISA could become a selling point.
                I'm thinking that Larrabee is more of an R&D product whose advancements will show up in other venues like the CPU, much like the i740 graphics which had technologies that would later carry on in the Intel GMA line of IGPs.



                • #23
                  Originally posted by movieman View Post
                  Of course they may decide never to build a second version with the lessons they've learned from this one, but they haven't killed Itanium yet so Larrabee may still be with us a decade or two from now.
                  Also, given that Itanic is effectively being kept 'alive' (and I use that term generously) by HP, I would say it has been, for all intents and purposes, dead for years. The last chip was built on 90nm and its successor has yet to be seen. Compaq kept Alpha 'alive' for years too, and we all know what happened to it.



                  • #24
                    Originally posted by deanjo View Post
                    Also, given that Itanic is effectively being kept 'alive' (and I use that term generously) by HP, I would say it has been, for all intents and purposes, dead for years. The last chip was built on 90nm and its successor has yet to be seen. Compaq kept Alpha 'alive' for years too, and we all know what happened to it.
                    Yeah, Intel bought Alpha and killed it...

                    There are very good reasons for maintaining several major architectures. Itanic may have its problems, but if you killed it completely, it would be a loss for anyone interested in diversity in IT.



                    • #25
                      Originally posted by RobbieAB View Post
                      Yeah, Intel bought Alpha and killed it...

                      There are very good reasons for maintaining several major architectures. Itanic may have its problems, but if you killed it completely, it would be a loss for anyone interested in diversity in IT.
                      Oh, I have nothing against diversity, but the market that Itanic was supposed to address got clubbed with a 400-pound pole when AMD brought out their 64-bit solution. Bang for the buck, it pretty much snuffed IA-64 out of real existence. UltraSPARC and PPC thrived better than Itanic ever did. IDC predicted IA-64 system sales would reach $38bn/yr by 2001, but to my knowledge Itanic's peak was around $1 billion in sales in 2004.
                      Last edited by deanjo; 12-05-2009, 05:56 PM.



                      • #26
                        Well... IA64 was also being beaten by Alpha until Intel bought and killed it. IA64 had major issues, but if we consider the number of "big chip" designs now against 10 years ago, it's a worrying trend: Alpha and MIPS effectively dead; PPC, SPARC, and IA64 essentially gone from the workstation market. AMD64, good as it is, is still carrying handicaps deriving from its x86 origins. Admittedly, Itanic is a tad irrelevant in the context of that trend as it's an Intel chip.

                        On a different level, one has to wonder how much Intel learned from the Itanic project which has since been fed back into their x86(_64) chip range.



                        • #27
                          Originally posted by smitty3268 View Post
                          If you view the hardware as being for more general purposes and not just 3D acceleration then keeping the x86 ISA could become a selling point.
                          But if you're going to have to recompile anyway, then you don't care what the underlying instruction set is, just how fast it can execute your code; which will almost certainly be faster if you can eliminate all those transistors and pipeline stages required to decode the complex x86 instruction set.



                          • #28
                            Originally posted by RobbieAB View Post
                            On a different level, one has to wonder how much Intel learned from the Itanic project which has since been fed back into their x86(_64) chip range.
                            Which is also why I said I see Larrabee more as an R&D project whose tech will eventually be carried on into other Intel products.



                            • #29
                              Originally posted by movieman View Post
                              But if you're going to have to recompile anyway, then you don't care what the underlying instruction set is, just how fast it can execute your code; which will almost certainly be faster if you can eliminate all those transistors and pipeline stages required to decode the complex x86 instruction set.
                              Makes me wonder... if x86 is such a bad thing, why has nobody yet produced an x86 chip where you can switch off the translation layer?



                              • #30
                                Originally posted by Ant P. View Post
                                Makes me wonder... if x86 is such a bad thing, why has nobody yet produced an x86 chip where you can switch off the translation layer?
                                Because the translation layer is what makes it run at a reasonable speed.

                                Essentially it takes the complex x86 instructions and turns them into a sequence of simple RISC-type micro-ops, which the core then schedules and executes. I don't know about AMD, but recent Intel chips cache the translated instructions so they don't need to decode x86 instructions all the time.
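
                                As a toy illustration of the idea only (this is not any real chip's decoder or micro-op format, just a sketch of the CISC-to-RISC translation), a single read-modify-write x86 instruction might be split into three simple micro-ops:

                                Code:
                                #include <stdio.h>

                                /* Toy model: split one CISC-style "add [mem], reg" into three
                                   simple micro-ops.  Purely conceptual -- real decoders and real
                                   micro-op encodings are far more involved. */
                                int main(void)
                                {
                                    const char *uops[] = {
                                        "load:  tmp      <- mem[rbx]",
                                        "add:   tmp      <- tmp + eax",
                                        "store: mem[rbx] <- tmp",
                                    };

                                    for (int i = 0; i < 3; ++i)
                                        printf("uop %d: %s\n", i, uops[i]);
                                    return 0;
                                }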

