Intel Publishes oneAPI Level 0 Specification


  • #11
    Originally posted by CommunityMember
    oneAPI provide another higher level abstraction (DPC++) and builds on SYCL (of which OpenCL is a subset).
    Heh, my inner cynic wonders if Intel felt they needed their own programming language in order to be taken seriously. So, following in the footsteps of such giants as Sun/Oracle, Microsoft, Google, Apple, and even the non-profit Mozilla Foundation, they're "embracing and extending" C++. This also potentially serves their need to have some form of vendor lock-in.

    Well, if they at least offer good OpenCL and SYCL support, then I suppose I'll be satisfied with that.



    • #12
      Originally posted by coder
      Well, if they at least offer good OpenCL and SYCL support, then I suppose I'll be satisfied with that.
      Yes, I would also be happy if that were true... for all vendors. Intel is probably the vendor with the best overall OpenCL support, across CPUs, GPUs and FPGAs. AMD dropped CPU support and now only supports OpenCL on (some) GPUs. Nvidia has reasonable OpenCL 1.2 support in their proprietary drivers, going back 12 years' worth of GPUs. POCL has come on in leaps and bounds and now offers very interesting open-source OpenCL CPU support. And there are some proprietary ARM implementations, mainly for GPUs (e.g. Mali). In general, the landscape is fragmented and OpenCL support could be in much better shape.

      Any device with good lower-level OpenCL support will basically have good higher-level SYCL support, since the latter typically runs on top of the former.

      Originally posted by coder
      The OpenCL community seems to refer to this issue as "performance portability". The Khronos working group is certainly aware of the problem, though I have no idea what work is being done towards addressing it.
      Since OpenCL is so close to hardware, I think it's very difficult to solve this at the OpenCL kernel API level. SYCL may partially solve this, since it runs at a higher level, but I have to experiment with it, since current implementations are at a very "alpha" stage.



      • #13
        Originally posted by fakenmc
        Yes, I would also be happy if that were true... for all vendors. Intel is probably the vendor with the best overall OpenCL support, across CPUs, GPUs and FPGAs. AMD dropped CPU support and now only supports OpenCL on (some) GPUs.
        It seems pretty clear that Navi support is just a matter of time.

        Originally posted by fakenmc
        Nvidia has reasonable OpenCL 1.2 support in their proprietary drivers, going back 12 years' worth of GPUs.
        Not for their Tegra/Jetson devices, it must be said! Those support CUDA, but no OpenCL.



        • #14
          Originally posted by coder
          It seems pretty clear that Navi support is just a matter of time.
          I believe so, yes. Unfortunately, support for GPUs that aren't that old (e.g. HD 7970) is non-existent in both ROCm and the AMD proprietary driver. The HD 7970, for example, still offers better compute performance than a few low-to-mid tier newer GPUs.

          Originally posted by coder
          Not for their Tegra/Jetson devices, it must be said! Those support CUDA, but no OpenCL.
          Very unfortunate and short-sighted. Hopefully SYCL implementations such as hipSYCL can mitigate the issue going forward.



          • #15
            Originally posted by fakenmc

            I believe so, yes. Unfortunately, support for GPUs that aren't that old (e.g. HD 7970) is non-existent in both ROCm and the AMD proprietary driver.
            Ah the legendary 7970. Relative to most consumers 8 years is ancient. I agree with you that it's not that old but AMD and every other company will go after what most people do or want. Hopefully the community can hack it with some inefficient wrapper or something...

            The HD 7970, for example, still offers better compute performance than a few low-to-mid tier newer GPUs.
            With more than double ;-) the FP64 performance of an Nvidia Titan RTX, it still packs a heavy punch in that regard. Its FP32 is actually slow by today's standards, though. If you're going to take compute seriously, then you need to factor in power usage. (Stats are based on theoretical performance; I don't have access to all of the cards listed.)


            High tier: HD 7970 (Reference, OpenCL 1.2 support)
            Die Size: 352 mm² - TDP: 300W - Shading Units: 2048 - FP32: 4.3 TFLOPS

            Low tier: RX 560 (Reference, OpenCL 2.0 support)
            Die Size: 123 mm² - TDP: 75W - Shading Units: 1024 - FP32: 2.6 TFLOPS

            Low tier: RX 5500 XT (Reference, OpenCL 2.0 support)
            Die Size: 158 mm² - TDP: 130W - Shading Units: 1408 - FP32: 5.2 TFLOPS

            Mid tier: RX 590 (Reference, OpenCL 2.0 support)
            Die Size: 232 mm² - TDP: 175W - Shading Units: 2304 - FP32: 7.1 TFLOPS

            High tier: RTX 2080 Ti (Reference, OpenCL 1.2 support)
            Die Size: 754 mm² - TDP: 300W - Shading Units: 4352 - FP32: 13.4 TFLOPS

            High tier: Radeon VII (Reference, OpenCL 2.0 support)
            Die Size: 331 mm² - TDP: 295W - Shading Units: 3840 - FP32: 13.4 TFLOPS
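
To put a rough number on the power-usage point, here is a small Python sketch that divides each reference card's FP32 figure by its TDP. The figures are the theoretical ones from the list above; the names `REFERENCE_CARDS`, `gflops_per_watt` and `rank_by_efficiency` are only illustrative, and TDP is a crude stand-in for real power draw under a compute load.

```python
# Reference figures copied from the list above: (name, TDP in watts, FP32 in TFLOPS).
# These are theoretical peak numbers, not measurements.
REFERENCE_CARDS = [
    ("HD 7970", 300, 4.3),
    ("RX 560", 75, 2.6),
    ("RX 5500 XT", 130, 5.2),
    ("RX 590", 175, 7.1),
    ("RTX 2080 Ti", 300, 13.4),
    ("Radeon VII", 295, 13.4),
]


def gflops_per_watt(tdp_watts, fp32_tflops):
    """Theoretical FP32 GFLOPS per watt of TDP (assumes TDP approximates power draw)."""
    return fp32_tflops * 1000.0 / tdp_watts


def rank_by_efficiency(cards):
    """Return (name, GFLOPS/W) pairs sorted from most to least efficient."""
    scored = [(name, gflops_per_watt(tdp, tflops)) for name, tdp, tflops in cards]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, efficiency in rank_by_efficiency(REFERENCE_CARDS):
        print(f"{name:12s} {efficiency:6.1f} GFLOPS/W")
```

On these paper figures the HD 7970 lands at about 14 GFLOPS/W, roughly a third of what the Radeon VII and the 2080 Ti manage.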


            Overclocked:

            Sapphire HD 7970 Toxic 6 GB
            FP32: 4.915 TFLOPS
            FP64: 1229 GFLOPS (1:4)

            ASUS ROG STRIX RX 560 GAMING OC
            FP16: 2.716 TFLOPS (1:1)
            FP32: 2.716 TFLOPS
            FP64: 169 GFLOPS (1:16)

            ASRock Challenger RX 5500 XT (This overclocked version is expected to be released in 4 days)
            FP16: 10.5130483402 TFLOPS (2:1)
            FP32: 5.25652417009 TFLOPS
            FP64: 328.53276063 GFLOPS (1:16)

            XFX RX 590 50th Anniversary
            FP16: 7.373 TFLOPS (1:1)
            FP32: 7.373 TFLOPS
            FP64: 460 GFLOPS (1:16)

            GALAX RTX 2080 Ti HOF Limited Edition
            FP16: 31.60 TFLOPS (2:1)
            FP32: 15.80 TFLOPS
            FP64: 493 GFLOPS (1:32)

            Radeon VII (No AIB Partners boards were overclocked)
            FP16: 26.88 TFLOPS (2:1)
            FP32: 13.44 TFLOPS
            FP64: 3360 GFLOPS (1:4)
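
The FP64 numbers above appear to follow directly from each card's FP32 figure and the ratio in parentheses. Here is a minimal sketch of that arithmetic, assuming the ratio means FP64:FP32 throughput; `OVERCLOCKED` and `fp64_gflops` are made-up names for illustration.

```python
# (name, FP32 in TFLOPS, N from the listed "1:N" FP64 ratio), copied from the list above.
OVERCLOCKED = [
    ("Sapphire HD 7970 Toxic 6 GB", 4.915, 4),
    ("ASUS ROG STRIX RX 560 GAMING OC", 2.716, 16),
    ("ASRock Challenger RX 5500 XT", 5.25652417009, 16),
    ("XFX RX 590 50th Anniversary", 7.373, 16),
    ("GALAX RTX 2080 Ti HOF Limited Edition", 15.80, 32),
    ("Radeon VII", 13.44, 4),
]


def fp64_gflops(fp32_tflops, ratio_divisor):
    """Theoretical FP64 throughput in GFLOPS, given FP32 TFLOPS and a 1:N ratio."""
    return fp32_tflops * 1000.0 / ratio_divisor


if __name__ == "__main__":
    for name, fp32, divisor in OVERCLOCKED:
        print(f"{name}: {fp64_gflops(fp32, divisor):.2f} GFLOPS FP64 (1:{divisor})")
```

This gives 1228.75 GFLOPS for the 7970 Toxic and 3360 GFLOPS (3.36 TFLOPS) for the Radeon VII. Some of the listed values look rounded or truncated to whole GFLOPS (e.g. 169.75 shown as 169).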



            • #16
              Originally posted by Jabberwocky
              Ah the legendary 7970. Relative to most consumers 8 years is ancient. I agree with you that it's not that old but AMD and every other company will go after what most people do or want. Hopefully the community can hack it with some inefficient wrapper or something...
              I have an HD 7870, which I replaced with a GTX 980 Ti that I got for a bit over $400 in a clearance sale. I was simply stunned at the difference in gaming & benchmark performance under Windows.

              In a Linux machine, I also ran a HD 6850, which was nice and not too hot. But, then I needed OpenGL 4.3 before the fp64 emulation went in for it, so I picked up a little RX 550 that offered the majority of the performance for a lot less power.

              Recently, I jumped on a Radeon VII. The fp64 and HBM2 capacity & bandwidth were too tempting. Lately, it's on sale for $530 (which is still more than I ever thought I'd spend on a GPU).



              • #17
                Originally posted by Jabberwocky

                Ah the legendary 7970. Relative to most consumers 8 years is ancient. I agree with you that it's not that old but AMD and every other company will go after what most people do or want. Hopefully the community can hack it with some inefficient wrapper or something...
                ...
                Thanks, nice recap.
