PoCL 3.0 Released With Minimal OpenCL 3.0 Implementation For CPUs

  • PoCL 3.0 Released With Minimal OpenCL 3.0 Implementation For CPUs

    Phoronix: PoCL 3.0 Released With Minimal OpenCL 3.0 Implementation For CPUs

    PoCL 3.0 has been formally released today for this portable OpenCL implementation that supports execution on CPUs or other back-ends by way of LLVM such as for targeting AMD HSA, NVIDIA GPUs, and other accelerators. With PoCL 3.0 comes initial OpenCL 3.0 support while the actual conformance results are still pending...

    https://www.phoronix.com/scan.php?pa...L-3.0-Released

  • #2
    What's the current status of compute acceleration? A mess!

    NVIDIA:
    - CUDA
    - OpenCL (an old version)
    - Rusticl OpenCL (Nouveau, poor performance)
    - PoCL
    - Vulkan Compute
    - SYCL (via hipSYCL)
    - SYCL (via ComputeCpp)
    - DirectCompute (on Windows)

    AMD:
    - APP OpenCL (original implementation before ROCm era)
    - Clover OpenCL
    - ROCm OpenCL
    - Rusticl OpenCL
    - ROCm HIP
    - CUDA (partial, via ROCm HIPIFY)
    - CUDA (partial, via SYCLomatic)
    - PoCL
    - Vulkan Compute (with Mesa RADV)
    - Vulkan Compute (with AMDVLK)
    - Vulkan Compute (with proprietary driver)
    - Metal Performance Shaders (on macOS)
    - SYCL (via hipSYCL)
    - SYCL (via ComputeCpp)
    - DirectCompute (on Windows)

    Intel:
    - Probably PoCL too
    - Vulkan Compute (with ANV)
    - Vulkan Compute (with proprietary driver on Windows)
    - Metal Performance Shaders (on macOS)
    - NEO OpenCL
    - Beignet OpenCL
    - intel_clc OpenCL
    - Rusticl OpenCL
    - oneAPI Level Zero
    - CUDA (partial, via SYCLomatic)
    - CUDA (partial, via ZLUDA)
    - SYCL (via oneAPI DPC++)
    - SYCL (via ComputeCpp)
    - DirectCompute (on Windows)

    CPU:
    - SYCL (via hipSYCL)
    - SYCL (via ComputeCpp)
    - PoCL
    - Vulkan Compute (using Lavapipe)

    However, for some reason, CUDA has over 90% of the usage share, despite thousands of efforts being made to liberate ourselves from it!

    • #3
      tildearrow, don't forget that DPC++ also has a CPU backend, and experimental CUDA and HIP backends. That (or maybe ComputeCpp) seems like the most promising option for making cross-compatible binaries, although they do it by plugging in different backends at runtime (CUDA/HIP/Level Zero/CPU) rather than by targeting a vendor-independent driver API. It's a mess.

      And of course, ROCm doesn't even work on all AMD GPUs.

      • #4
        Khronos aren't too good at bundling their APIs. They have Audio APIs, they have OpenCL, video codec APIs, video playback APIs, then all the AI and computer vision stuff...

        ...but you never know what is going to work on any one computer. OpenGL 4.0, but no OpenCL. Vulkan, but no video playback. 3d but no video compression. They should occasionally draw a line in the sand and say "in order to be Vulkan 1.2 compatible, it *must* also support these APIs: OpenAL, OpenCL 1.2, ...".

        Even if they are older library versions or just software implementations, they should at least be mandatory to be present for compatibility. This is really why Nvidia won the API wars: you can have a fully Vulkan certified graphics driver, but that doesn't guarantee that anything else is usable... Khronos needs to target all GPU features.

        • #5
          Originally posted by tildearrow
          What's the current status of compute acceleration? A mess!
          Is that a list you're maintaining? If you copied it from somewhere, you should include a link.

          I don't really get why the list includes Windows and macOS, other than to pad it out and make it seem even more complicated than it is. Same thing with obsolete options, like Beignet. One could get the sense that you're really trying to sow FUD, here.

          I also doubt Vulkan compute is supported by all of those backends. I think you're wrong to assume every Vulkan implementation supports Vulkan Compute. It has a different SPIR-V.

          Originally posted by tildearrow
          for some reason, CUDA has over 90% of the usage share, despite thousands of efforts being made to liberate ourselves from it!
          According to whom?

          Anyway, the problem of failing to coalesce around a good alternative is largely down to most of the big players deciding to push their own solutions - Apple, Google, and Microsoft. Google had the clout to make OpenCL happen, if they'd used it as the base of their compute stack and made it a requirement for Android. Instead, they banned it!

          And without an API being ubiquitous, it can't really gain a lot of traction among app developers. If OpenCL support were as universal as OpenGL support, a lot more apps would be using it. PoCL holds the potential to help make that a reality. Rusticl will also help with that.
          Last edited by coder; 12 June 2022, 03:51 AM.

          • #6
            Originally posted by OneTimeShot
            Khronos aren't too good at bundling their APIs. They have Audio APIs, they have OpenCL, video codec APIs, video playback APIs, then all the AI and computer vision stuff...
            They're separate standards for a reason. If they were meant to be coupled, then they'd all be a single standard.

            Originally posted by OneTimeShot
            They should occasionally draw a line in the sand and say "in order to be Vulkan 1.2 compatible, it *must* also support these APIs: OpenAL, OpenCL 1.2, ...".
            That's not their job. That's something a higher-level entity should do, like how Google sets API support requirements in Android. We could have the same thing on Linux, if one or more of the big distros would simply decide to do it.

            Originally posted by OneTimeShot
            you can have a fully Vulkan certified graphics driver, but that doesn't guarantee that anything else is usable...
            Vulkan on one GPU isn't the same as Vulkan on another. It has so many optional features that it's basically its own mini-version of the API support problem you're complaining about. That's why Vulkan 1.3 had to add the concept of Profiles.

            https://www.phoronix.com/scan.php?pa...vulkan-13-2022
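
To make the Profiles idea concrete, here is a minimal sketch of how a named bundle of required features can be checked against what a device reports. This is not the Vulkan Profiles API: the `Profile` class, the profile name, and the helper functions are hypothetical and exist only for illustration. The feature names are real `VkPhysicalDeviceFeatures` members, but the particular bundle is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A named bundle of features a device must expose (hypothetical type)."""
    name: str
    required: frozenset


def missing_features(profile, supported):
    """Return the required features the device does not report, sorted."""
    return sorted(profile.required - set(supported))


def conforms(profile, supported):
    """A device conforms only if nothing in the bundle is missing."""
    return not missing_features(profile, supported)


# Made-up profile; a real one would come from a published profile definition.
compute_profile = Profile(
    name="example_compute_profile",
    required=frozenset({"shaderFloat64", "shaderInt64", "shaderInt16"}),
)

# Two imaginary devices that both "support Vulkan" but differ in optional features.
device_a = {"shaderFloat64", "shaderInt64", "shaderInt16", "geometryShader"}
device_b = {"shaderInt64", "geometryShader"}

print(conforms(compute_profile, device_a))
print(missing_features(compute_profile, device_b))
```

The point of the design is that an app can ask one yes/no question about a profile instead of probing every optional feature on its own.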

            • #7
              Originally posted by tildearrow
              What's the current status of compute acceleration? A mess!

              However, for some reason, CUDA has over 90% of the usage share, despite thousands of efforts being made to liberate ourselves from it!
              Very simply, OpenCL is a mess. CUDA was earlier and was better, and it had far better documentation and far better support from Nvidia engineers. Scientists were using workstations that had Nvidia GPUs anyway, so widespread OpenCL support was not useful. Simply put, Nvidia alone did more for CUDA than the rest of the world did for OpenCL.

              The remaining technologies are either simply too fresh to judge (Vulkan compute is relatively fresh, but it is also made more for use in game engines alongside rendering) or are simply CUDA copycats that aim for compatibility with CUDA.

              • #8
                Originally posted by piotrj3
                Very simply, OpenCL is a mess,
                Really? In what ways?

                Originally posted by piotrj3
                CUDA was earlier and was better
                Because OpenCL came after, it could take key concepts from CUDA and implement them in a cleaner and more consistent way. I dabbled with CUDA, after learning a bit about OpenCL, and I came to the opposite conclusion as you - that CUDA was more of a mess.

                I'll grant you that OpenCL lagged behind CUDA. That's always going to be true of standards. A single industry player can blaze ahead with their own API and update it in sync with their hardware advances. A standard ends up being trailing edge, because standards bodies like to see multiple implementations of a feature before it's incorporated in the core standard.

                This often isn't a big problem, because most apps don't need all the newest features. And for those which do, there are usually vendor-specific extensions you can use that get the job done.

                Originally posted by piotrj3
                Scientists were using workstations that had Nvidia GPUs anyway, so widespread OpenCL support was not useful.
                GPU compute exists on phones and for plenty of non-scientific apps. As I mentioned, Apple and Google are both responsible for killing off OpenCL on phones, even though most SoCs did support it. Google explicitly banned Android phones from shipping with OpenCL drivers.

                Originally posted by piotrj3
                Simply put, Nvidia alone did more for CUDA than the rest of the world did for OpenCL.
                When you look at how many implementations of OpenCL existed for different hardware, that really doesn't hold up.

                However, where Nvidia succeeded was by seeding the academic community with hardware and software tools, as well as hosting their GPU Technology Conference. This is largely why early deep learning frameworks supported CUDA first and foremost. With something that's an industry standard, no vendor has the same interest in pushing it into the hands of users, influencers, and into popular & promising software projects.

                CUDA therefore succeeded less by virtue of technical superiority than because Nvidia understood the strategic importance of pushing it and building momentum behind it.

                Originally posted by piotrj3
                (Vulkan compute is relatively fresh, but it is also made more for use in game engines alongside rendering)
                People misunderstand and misuse the term "Vulkan compute". Proper Vulkan compute is not used in game engines, nor does Vulkan guarantee the sort of precision that would be needed to use it for scientific purposes.

                Also, Vulkan is a complex API that's difficult to use well. That doesn't mean you can't use it via a framework, but simply talking about "Vulkan compute", on its own, is too simplistic.

                • #9
                  Originally posted by coder
                  Really? In what ways?


                  Because OpenCL came after, it could take key concepts from CUDA and implement them in a cleaner and more consistent way. I dabbled with CUDA, after learning a bit about OpenCL, and I came to the opposite conclusion as you - that CUDA was more of a mess.

                  I'll grant you that OpenCL lagged behind CUDA. That's always going to be true of standards. A single industry player can blaze ahead with their own API and update it in sync with their hardware advances. A standard ends up being trailing edge, because standards bodies like to see multiple implementations of a feature before it's incorporated in the core standard.

                  This often isn't a big problem, because most apps don't need all the newest features. And for those which do, there are usually vendor-specific extensions you can use that get the job done.


                  GPU compute exists on phones and for plenty of non-scientific apps. As I mentioned, Apple and Google are both responsible for killing off OpenCL on phones, even though most SoCs did support it. Google explicitly banned Android phones from shipping with OpenCL drivers.


                  When you look at how many implementations of OpenCL existed for different hardware, that really doesn't hold up.

                  However, where Nvidia succeeded was by seeding the academic community with hardware and software tools, as well as hosting their GPU Technology Conference. This is largely why early deep learning frameworks supported CUDA first and foremost. With something that's an industry standard, no vendor has the same interest in pushing it into the hands of users, influencers, and into popular & promising software projects.

                  CUDA therefore succeeded less by virtue of technical superiority than because Nvidia understood the strategic importance of pushing it and building momentum behind it.


                  People misunderstand and misuse the term "Vulkan compute". Proper Vulkan compute is not used in game engines, nor does Vulkan guarantee the sort of precision that would be needed to use it for scientific purposes.

                  Also, Vulkan is a complex API that's difficult to use well. That doesn't mean you can't use it via a framework, but simply talking about "Vulkan compute", on its own, is too simplistic.
                  You are the first person I know to make such a claim, but I get that different people have different preferences, so I won't get into it.

                  Vulkan compute (or Vulkan in general) is, I know, strongly low-level and has quite a different philosophy. But I know projects that have successfully employed it. Take waifu2x, for example: it doesn't have a popular OpenCL backend, but it has CUDA and Vulkan ones, and the Vulkan backend works very, very well. In fact, a project like that doesn't seem like one that should favour Vulkan over OpenCL or CUDA, but here we are.

                  Point about precision: I don't agree with you here. If a device is compliant with both OpenCL and Vulkan, it is obvious that both are going to use the same underlying hardware, with the same underlying precision, for the same kind of operation. So even though Vulkan doesn't mandate it, the precision issue you describe is only a potential issue on some kind of weird hardware that supports Vulkan compute shaders but doesn't support OpenCL at all. You are correct, though, that some number formats in OpenCL are not supported by Vulkan. So if you rely on a very particular number format when porting OpenCL code to Vulkan, you might run into a problem.

                  Also, as far as I know, devices that support Vulkan actually do offer full IEEE 754 compliance in practice. The Vulkan spec documents the maximum allowed error for operations quite well.
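
For anyone unfamiliar with how a "maximum allowed error" is usually expressed, it is typically a bound in ULPs (units in the last place). Below is a rough sketch of measuring that for float32 values. The function names and the tolerance of 3 ULP are my own illustrative choices, not values taken from the Vulkan or OpenCL specs. It also simplifies things by comparing against a float32-rounded reference rather than the infinitely precise result, and it does not handle NaN.

```python
import struct


def float32_key(x: float) -> int:
    """Round x to float32 and map its bit pattern to an integer that orders like the float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Sign-magnitude to a monotonic integer line; +0.0 and -0.0 both map to 0.
    return bits if bits < 0x80000000 else 0x80000000 - bits


def ulp_distance(result: float, reference: float) -> int:
    """Count the representable float32 steps between result and reference."""
    return abs(float32_key(result) - float32_key(reference))


def within_tolerance(result: float, reference: float, max_ulp: int) -> bool:
    """True if result is at most max_ulp float32 steps away from reference."""
    return ulp_distance(result, reference) <= max_ulp


one_step = 2.0 ** -23  # spacing of float32 values just above 1.0

print(ulp_distance(1.0 + one_step, 1.0))
print(within_tolerance(1.0 + 3 * one_step, 1.0, max_ulp=3))
print(within_tolerance(1.0 + 4 * one_step, 1.0, max_ulp=3))
```

Running the same kernel through two APIs and comparing both outputs against a CPU reference this way would show whether the precision actually differs on a given device.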

                  • #10
                    Originally posted by tildearrow
                    What's the current status of compute acceleration? A mess!

                    There is only one solution: C++ !!! :-P
