NEC Is Looking To Contribute SX-Aurora VE Accelerator Support To LLVM


  • NEC Is Looking To Contribute SX-Aurora VE Accelerator Support To LLVM

    Phoronix: NEC Is Looking To Contribute SX-Aurora VE Accelerator Support To LLVM

    The newest compiler back-end proposed for merging into the LLVM compiler code-base is for the NEC SX-Aurora VE (Vector Engine) accelerator card...


  • #2
Cool! Michael, you didn't mention the cost for one of these vector processors. I'm kinda assuming it's out of reach of mainstream users, but you never know.

    • #3
It is quite reasonable, less than Nvidia's top GPUs. Talk to NEC; if you demonstrate a need for a cluster of them, they will give you a development workstation for a very good price.

      • #4
        Originally posted by wizard69 View Post
        ... you didn’t mention the cost for one of these vector processors.
This is probably one of those cases of "if you have to ask, you can't afford it".

        • #5
          Originally posted by pegasus View Post
It is quite reasonable, less than Nvidia's top GPUs.
They're also considerably slower than Nvidia's top GPUs. But I guess the point is that they're (now) FOSS and don't involve a US-based supplier.

          I hope they support OpenCL. IMO, that's a much better option than relying on LLVM offload.

          • #6
"Slower" depends on the type of code you want to run on them. These are old-school vector machines, and you want code that lends itself well to vectorization. Therefore nothing like OpenCL is required, just some compiler smarts. They were able to demonstrate 1600x speedups compared to CPU for real applications, which is more than Nvidia can show, I believe. See
https://www.nextplatform.com/2017/11...vector-engine/ and
https://www.nextplatform.com/2018/10...ormance-boost/

            • #7
              Originally posted by pegasus View Post
"Slower" depends on the type of code you want to run on them. These are old-school vector machines, and you want code that lends itself well to vectorization.
              And GPUs are new-school vector machines that can deliver good performance with less vectorization.

              Originally posted by pegasus View Post
              Therefore nothing like OpenCL is required, just some compiler smarts.
              You could say the same for GPUs, but it turns out that you get better application performance if you expose the device memory hierarchy and make the programmer refactor their code to parallelize well and explicitly map out data sharing among work items.
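That trade-off can be sketched in plain C: the tiled transpose below stages each block through a small scratch buffer, which is the same idea as copying global memory into OpenCL `__local` memory so that work-items share data explicitly. A hypothetical illustration of the technique, not NEC's or any vendor's actual code:

```c
#include <stddef.h>

#define TILE 4  /* stands in for a work-group's local-memory tile */

/* Tiled matrix transpose: each TILE x TILE block is staged through a
 * small buffer so both the read and the write sweep memory in
 * contiguous runs -- the plain-C analogue of explicitly mapping data
 * sharing among work items instead of leaving it to the compiler. */
void transpose_tiled(size_t n, const float *in, float *out) {
    float tile[TILE][TILE];
    for (size_t bi = 0; bi < n; bi += TILE)
        for (size_t bj = 0; bj < n; bj += TILE) {
            /* load one block into the scratch buffer */
            for (size_t i = 0; i < TILE && bi + i < n; ++i)
                for (size_t j = 0; j < TILE && bj + j < n; ++j)
                    tile[i][j] = in[(bi + i) * n + (bj + j)];
            /* write it back transposed */
            for (size_t j = 0; j < TILE && bj + j < n; ++j)
                for (size_t i = 0; i < TILE && bi + i < n; ++i)
                    out[(bj + j) * n + (bi + i)] = tile[i][j];
        }
}
```

On a GPU, the scratch buffer would live in fast on-chip local memory shared by a work-group; the refactoring burden on the programmer is the same.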

              Originally posted by pegasus View Post
They were able to demonstrate 1600x speedups compared to CPU for real applications, which is more than Nvidia can show, I believe. See
              https://www.nextplatform.com/2017/11...vector-engine/ and
              https://www.nextplatform.com/2018/10...ormance-boost/
              I dunno. I just skimmed the articles, but I'm not seeing any 1600x number. What I did see is this:
              While the Vector Engine cannot come close to the double precision performance of a Volta GPU, which weighs in at 7.8 teraflops (the number has been tweaked up a smidgen with higher clock speeds since the announcement in May), or offer the many other flavors of compute like Tensor Core dot product engines of 8-bit integer or half precision 16-bit floating point, the Vector Engine does best the Volta when it comes to memory bandwidth, at 1.2 TB/sec compared to 900 GB/sec for Volta, and in memory capacity, at 48 GB of HBM2 compared to 16 GB for Volta.
              Of course, that article is old and Nvidia now offers the Tesla V100 with 32 GB and I think 1 TB/sec (but I could be wrong, on that point).

              I'm not really opposed to having more parallel architectures in the world. And I get why some countries might want non-US supply chains. Fully open-source is also nice. I'm just saying I don't think it's got anything on Nvidia's V100 or AMD's MI60 (which does hit 1 TB/sec). Well, 50% more memory, I guess. But both offer a lot besides raw compute and bandwidth, as we all know.

Anyway, if you want to talk Japanese HPC chips, I think the PEZY-SC2 is a much more interesting option:
https://en.wikichip.org/wiki/pezy/pezy-scx/pezy-sc2

              • #8
Consider that NEC still has customers happily using their SX line of vector machines. But those machines are getting old, and this Aurora Tsubasa card is basically their replacement, with the additional benefit of being designed as an accelerator, which gives it the potential to gain some more customers. If you check the HPCG Top500, you'll find SX machines very high on the HPL/HPCG ratio. The fact that they rank so high is a consequence of the lack of memory bandwidth in modern architectures, and there are still many engineering and weather codes out there that are bottlenecked mainly on memory bandwidth.

                • #9
                  Originally posted by pegasus View Post
Consider that NEC still has customers happily using their SX line of vector machines.
Providing an upgrade path for legacy customers is also a decent reason to build them. Of course, another option might be for them to partner with a GPU maker and provide a software transition path, but I know Japan's HPC efforts are heavily subsidized specifically to protect some indigenous capacity to build such hardware.

My only point was that I doubt they can beat the big GPUs from teams Red, Green, and (soon) Blue in most workloads.

                  • #10
Originally posted by coder View Post
Anyway, if you want to talk Japanese HPC chips, I think the PEZY-SC2 is a much more interesting option:
https://en.wikichip.org/wiki/pezy/pezy-scx/pezy-sc2
Then read the _real_ docs by NEC (Dr. Erich Focht), starting at page 29:
Keynote at the WPMVP 2019 workshop at PPoPP 2019, Washington, DC, February 16, 2019. The talk introduces the NEC SX-Aurora TSUBASA vector engine in the context of the history of vector computers.


The _current_ VE (on sale since Feb 2018, 16 nm FinFET) is close to a Tesla V100 in performance but with a lower power draw.
The price is significantly lower than the V100's.
