PoCL 3.0 Released With Minimal OpenCL 3.0 Implementation For CPUs


  • boboviz
    replied
    Originally posted by coder View Post
    The problem with running naive C++ on GPUs is that you have to keep branch divergence to a minimum, or else your performance is going to be garbage. And that means large swaths of STL are out of the question, due to the amount of heap allocation.
    I know there are some problems with C++, but the idea is to have a "unique" (and STANDARD) starting point...



  • coder
    replied
    Originally posted by boboviz View Post
    There is only one solution: C++ !!! :-P
    Yeah, and plenty of half-way solutions, like C++AMP, DPC++, and SYCL. Even OpenCL 2.2 added many C++ features to the kernel language.

    The HSA IR supposedly has all of the facilities needed to support generic C++ (including exceptions!), but then HSA never really caught on.

    The problem with running naive C++ on GPUs is that you have to keep branch divergence to a minimum, or else your performance is going to be garbage. And that means large swaths of STL are out of the question, due to the amount of heap allocation.
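The divergence cost described above can be sketched with a toy model (illustrative only, not any real GPU's scheduler): lanes in a warp execute in lockstep, so a data-dependent branch where lanes disagree forces the warp to execute both sides with masking, paying for the sum of the two paths.

```python
# Toy SIMT model: a "warp" of lanes runs in lockstep. When lanes disagree
# on a branch, the warp runs BOTH paths (inactive lanes masked off), so the
# cost is the sum of the two paths rather than just the taken one.

def warp_cost(conditions, cost_if_true, cost_if_false):
    """Cycles for one warp under lockstep execution with branch masking."""
    if all(conditions):                      # uniform: only the true path runs
        return cost_if_true
    if not any(conditions):                  # uniform: only the false path runs
        return cost_if_false
    return cost_if_true + cost_if_false      # divergent: both paths serialized

# 32 lanes all taking the cheap path, vs. a single lane diverging onto the
# expensive path (e.g. a slow heap-allocating branch):
uniform = warp_cost([False] * 32, cost_if_true=100, cost_if_false=10)    # 10
divergent = warp_cost([True] + [False] * 31, cost_if_true=100, cost_if_false=10)  # 110
```

One straggler lane makes the whole warp 11x slower in this toy model, which is why code full of unpredictable branches (and allocation-heavy STL paths) maps poorly onto GPUs.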



  • coder
    replied
    Originally posted by piotrj3 View Post
    I know Vulkan compute (or Vulkan in general) is strongly low-level and has quite a different philosophy. But I know projects that successfully employed it.
    You can write compute shaders in Vulkan without using the Vulkan Compute SPIR-V dialect. A lot of the projects using Vulkan for computation are doing just that.

    Originally posted by piotrj3 View Post
    We have waifu2x, for example, which doesn't have a popular OpenCL backend but has CUDA and Vulkan backends.
    That's deep learning, which has far lower precision requirements. That you can use Vulkan for it says nothing about Vulkan's suitability for things like CAD, CFD, FEA, finance, or HPC.

    Originally posted by piotrj3 View Post
    Point about precision: I don't agree with you here. If you have a device compliant with both OpenCL and Vulkan, it is obvious that they are going to use the same underlying hardware, with the same underlying precision, for the same kind of operation.
    It sounds like you've never seen the inside of a math library. GPUs don't have a hardware block for each and every mathematical function - they have hardware for evaluating things like Taylor-series approximations, but it can only evaluate a certain number of terms per cycle, for instance. For greater accuracy, you have to evaluate more terms, which takes longer. If the implementation assumes you don't need higher precision, it's going to stop sooner.
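The terms-versus-accuracy trade-off can be illustrated with a hand-rolled truncated Taylor series for sin (a sketch for intuition, not any vendor's actual math-library implementation):

```python
import math

def sin_taylor(x, terms):
    """Evaluate sin(x) from the first `terms` terms of its Taylor series."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2n+2)(2n+3)) to step the series.
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

x = 1.0
err3 = abs(sin_taylor(x, 3) - math.sin(x))   # few terms: cheap, less accurate
err7 = abs(sin_taylor(x, 7) - math.sin(x))   # more terms: slower, more accurate
```

With 3 terms the error at x = 1 is around 2e-4; with 7 terms it drops below 1e-9. An implementation that stops early is faster but cannot be used where tighter error bounds are required.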

    And if you look at the precision requirements in the Vulkan spec, they match those of GLSL, which are far lower than OpenCL's.

    Originally posted by piotrj3 View Post
    So when Vulkan doesn't mandate it, the precision issue you talk about is only a potential issue
    Yes, you can have an implementation which delivers more than the minimum precision, but it can't be assumed. That impacts portability and limits software developers' ability to support customers, if we're talking about software with any special precision requirements.
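Such precision guarantees are typically stated in ULPs (units in the last place). A minimal sketch of measuring ULP distance between float32 values, using the standard bit-pattern trick (illustrative only, not taken from either spec):

```python
import struct

def _ordered(x):
    """Map a float32's bit pattern to an integer that orders like the reals."""
    u = struct.unpack('<I', struct.pack('<f', x))[0]
    # Negative floats have larger raw bit patterns; flip them below zero.
    return u if u < 0x80000000 else 0x80000000 - u

def ulp_distance(a, b):
    """How many representable float32 steps apart two values are."""
    return abs(_ordered(a) - _ordered(b))

# The next representable float32 above 1.0 is exactly 1 ULP away:
bits = struct.unpack('<I', struct.pack('<f', 1.0))[0]
next_up = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
d = ulp_distance(1.0, next_up)   # -> 1
```

A spec line like "sin: at most 4 ULP of error" bounds exactly this distance between the returned value and the correctly rounded result; a looser bound permits cheaper, less accurate implementations.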

    Originally posted by piotrj3 View Post
    Also, as far as I know, devices that support Vulkan actually do offer full IEEE 754 compliance in practice.
    That spec doesn't dictate the accuracy of actual implementations.

    Originally posted by piotrj3 View Post
    The Vulkan spec gives quite good documentation about the maximum allowed error for operations.
    Yes, I've read it, and it's quite a bit looser than OpenCL's. Rather than reposting it, you can just read this post for my comparative analysis of Vulkan vs. OpenCL precision guarantees:



  • boboviz
    replied
    Originally posted by tildearrow View Post
    What's the current status of compute acceleration? A mess!

    […full per-vendor list of compute APIs snipped; it is quoted in full in piotrj3's reply below]
    There is only one solution: C++ !!! :-P



  • piotrj3
    replied
    Originally posted by coder View Post
    […coder's reply, quoted here in full, snipped; see his post below]
    You are the first person I know to make such a claim, but I get that different people have different preferences, so I won't get into it.

    I know Vulkan compute (or Vulkan in general) is strongly low-level and has quite a different philosophy. But I know projects that successfully employed it. We have waifu2x, for example, which doesn't have a popular OpenCL backend but has CUDA and Vulkan backends, and the Vulkan backend works very well. In fact, a project like that has no particular reason to favor Vulkan over OpenCL or CUDA, but here we are.

    Point about precision: I don't agree with you here. If you have a device compliant with both OpenCL and Vulkan, it is obvious that they are going to use the same underlying hardware, with the same underlying precision, for the same kind of operation. So when Vulkan doesn't mandate it, the precision issue you talk about is only a potential issue for some weird kind of hardware that supports Vulkan compute shaders but doesn't support OpenCL at all. Although you are correct that there are some number formats in OpenCL that Vulkan doesn't support, so if you rely on a very particular number format when porting OpenCL to Vulkan, you might run into a problem.

    Also, as far as I know, devices that support Vulkan actually do offer full IEEE 754 compliance in practice. The Vulkan spec gives quite good documentation about the maximum allowed error for operations.



  • coder
    replied
    Originally posted by piotrj3 View Post
    Very simply, OpenCL is a mess,
    Really? In what ways?

    Originally posted by piotrj3 View Post
    CUDA was earlier and was better
    Because OpenCL came after, it could take key concepts from CUDA and implement them in a cleaner and more consistent way. I dabbled with CUDA after learning a bit about OpenCL, and I came to the opposite conclusion from you - that CUDA was more of a mess.

    I'll grant you that OpenCL lagged behind CUDA. That's always going to be true of standards. A single industry player can blaze ahead with their own API and update it in sync with their hardware advances. A standard ends up being trailing edge, because standards bodies like to see multiple implementations of a feature before it's incorporated into the core standard.

    This often isn't a big problem, because most apps don't need all the newest features. And for those which do, there are usually vendor-specific extensions you can use that get the job done.

    Originally posted by piotrj3 View Post
    Scientists were using workstations that had Nvidia GPUs anyway, so widespread OpenCL support was not useful.
    GPU compute exists on phones and for plenty of non-scientific apps. As I mentioned, Apple and Google are both responsible for killing off OpenCL on phones, even though most SoCs did support it. Google explicitly banned Android phones from shipping with OpenCL drivers.

    Originally posted by piotrj3 View Post
    Simply put, Nvidia alone did more for CUDA than the rest of the world did for OpenCL.
    When you look at how many implementations of OpenCL existed for different hardware, that really doesn't hold up.

    However, where Nvidia succeeded was by seeding the academic community with hardware and software tools, as well as hosting their GPU Technology Conference. This is largely why early deep learning frameworks supported CUDA first and foremost. With something that's an industry standard, no vendor has the same interest in pushing it into the hands of users, influencers, and into popular & promising software projects.

    CUDA therefore succeeded less by virtue of technical superiority than because Nvidia understood the strategic importance of pushing it and building momentum behind it.

    Originally posted by piotrj3 View Post
    (Vulkan compute is relatively fresh, but it is also made more for use in game engines alongside rendering)
    People misunderstand and misuse the term "Vulkan compute". Proper Vulkan compute is not used in game engines, nor does Vulkan guarantee the sort of precision that would be needed to use it for scientific purposes.

    Also, Vulkan is a complex API that's difficult to use well. That doesn't mean you can't use it via a framework, but simply talking about "Vulkan compute", on its own, is too simplistic.



  • piotrj3
    replied
    Originally posted by tildearrow View Post
    What's the current status of compute acceleration? A mess!

    NVIDIA:
    - CUDA
    - OpenCL (an old version)
    - Rusticl OpenCL (Nouveau, poor performance)
    - PoCL
    - Vulkan Compute
    - SYCL (via hipSYCL)
    - SYCL (via ComputeCpp)
    - DirectCompute (on Windows)

    AMD:
    - APP OpenCL (original implementation before ROCm era)
    - Clover OpenCL
    - ROCm OpenCL
    - Rusticl OpenCL
    - ROCm HIP
    - CUDA (partial, via ROCm HIPIFY)
    - CUDA (partial, via SYCLomatic)
    - PoCL
    - Vulkan Compute (with Mesa RADV)
    - Vulkan Compute (with AMDVLK)
    - Vulkan Compute (with proprietary driver)
    - Metal Performance Shaders (on macOS)
    - SYCL (via hipSYCL)
    - SYCL (via ComputeCpp)
    - DirectCompute (on Windows)

    Intel:
    - Probably PoCL too
    - Vulkan Compute (with ANV)
    - Vulkan Compute (with proprietary driver on Windows)
    - Metal Performance Shaders (on macOS)
    - NEO OpenCL
    - Beignet OpenCL
    - intel_clc OpenCL
    - Rusticl OpenCL
    - oneAPI Level Zero
    - CUDA (partial, via SYCLomatic)
    - CUDA (partial, via ZLUDA)
    - SYCL (via oneAPI DPC++)
    - SYCL (via ComputeCpp)
    - DirectCompute (on Windows)

    CPU:
    - SYCL (via hipSYCL)
    - SYCL (via ComputeCpp)
    - PoCL
    - Vulkan Compute (using Lavapipe)

    However, for some reason, CUDA has over 90% of the usage share, despite thousands of efforts being made to liberate ourselves from it!
    Very simply, OpenCL is a mess. CUDA was earlier, was better, had way better documentation, and had way better support from Nvidia engineers. Scientists were using workstations that had Nvidia GPUs anyway, so widespread OpenCL support was not useful. Simply put, Nvidia alone did more for CUDA than the rest of the world did for OpenCL.

    The remaining technologies are simply too fresh to judge (Vulkan compute is relatively fresh, but it is also made more for use in game engines alongside rendering), or they are simply CUDA copycats that want to achieve compatibility with CUDA.



  • coder
    replied
    Originally posted by OneTimeShot View Post
    Khronos aren't too good at bundling their APIs. They have Audio APIs, they have OpenCL, video codec APIs, video playback APIs, then all the AI and computer vision stuff...
    They're separate standards for a reason. If they were meant to be coupled, then they'd all be a single standard.

    Originally posted by OneTimeShot View Post
    They should occasionally draw a line in the sand and say "in order to be Vulkan 1.2 compatible, it *must* also support these APIs: OpenAL, OpenCL 1.2, ...".
    That's not their job. That's something a higher-level entity should do, like how Google sets API support requirements in Android. We could have the same thing on Linux, if one or more of the big distros would simply decide to do it.

    Originally posted by OneTimeShot View Post
    you can have a fully Vulkan certified graphics driver, but that doesn't guarantee that anything else is usable...
    Vulkan on one GPU isn't the same as Vulkan on another. It has so many optional features that it's basically its own mini-version of the API support problem you're complaining about. That's why Vulkan 1.3 had to add the concept of Profiles.

    https://www.phoronix.com/scan.php?pa...vulkan-13-2022



  • coder
    replied
    Originally posted by tildearrow View Post
    What's the current status of compute acceleration? A mess!
    Is that a list you're maintaining? If you copied it from somewhere, you should include a link.

    I don't really get why the list includes Windows and macOS, other than to pad it out and make it seem even more complicated than it is. Same with obsolete options, like Beignet. One could get the sense that you're really trying to sow FUD, here.

    I also doubt Vulkan compute is supported by all of those backends. I think you're wrong to assume every Vulkan implementation supports Vulkan Compute. It has a different SPIR-V.

    Originally posted by tildearrow View Post
    for some reason, CUDA has over 90% of the usage share, despite thousands of efforts being made to liberate ourselves from it!
    According to whom?

    Anyway, the problem of failing to coalesce around a good alternative is largely down to most of the big players - Apple, Google, and Microsoft - deciding to push their own solutions. Google had the clout to make OpenCL happen, if they'd used it as the base of their compute stack and made it a requirement for Android. Instead, they banned it!

    And without an API being ubiquitous, it can't really gain a lot of traction among app developers. If OpenCL support were as universal as OpenGL support, a lot more apps would be using it. PoCL holds the potential to help make that a reality. Rusticl will also help with that.
    Last edited by coder; 12 June 2022, 03:51 AM.



  • OneTimeShot
    replied
    Khronos aren't too good at bundling their APIs. They have Audio APIs, they have OpenCL, video codec APIs, video playback APIs, then all the AI and computer vision stuff...

    ...but you never know what is going to work on any one computer. OpenGL 4.0, but no OpenCL. Vulkan, but no video playback. 3D, but no video compression. They should occasionally draw a line in the sand and say "in order to be Vulkan 1.2 compatible, it *must* also support these APIs: OpenAL, OpenCL 1.2, ...".

    Even if they are older library versions or just software implementations, they should at least be required to be present, for compatibility's sake. This is really why Nvidia won the API wars: you can have a fully Vulkan-certified graphics driver, but that doesn't guarantee that anything else is usable... Khronos needs to target all GPU features.

