CUDA vs. OpenCL GPGPU Performance On NVIDIA's Pascal

  • pszilard
    replied
    Originally posted by Stebs View Post
    Would this situation change when/if Nvidia released an OpenCL 2.0 or even 2.1 driver? Or are the missing features and/or the performance gap independent of the available OpenCL version? (And did you use OpenCL 1.2 for Nvidia, or still 1.1?)
    The code in question is OpenCL 1.2 (but it uses little from the 1.2-only features). I highly doubt OpenCL 2.0 will change the picture much; workgroup built-ins may help a bit. Part of the performance difference (although I have not quantified how much, especially as the performance measurement tools are crippled) is due to the lack of lane shuffle and floating-point (or 64-bit integer) atomics support. Frankly, they could expose all of those (and more, like the __ldg builtin) even in OpenCL 1.2 -- that's what vendor extensions are for. Of course, there needs to be a will to support OpenCL, which seems to be seen as a danger to CUDA evangelization, so I believe that without serious pressure on them, the quality of NVIDIA's OpenCL support will remain poor.
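    For illustration, here is a minimal CUDA sketch (a hypothetical kernel, not taken from any real code) of the kind of features named above (lane shuffle, floating-point atomics, the __ldg builtin), none of which NVIDIA exposes to its OpenCL stack, not even as vendor extensions:

        // Hypothetical block-wide sum using the CUDA-only features discussed
        // above: lane shuffle, float atomicAdd and the __ldg read-only load.
        __global__ void sum_kernel(const float *in, float *out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            // __ldg: load through the read-only data cache.
            float val = (i < n) ? __ldg(&in[i]) : 0.0f;

            // Lane shuffle: warp-level reduction with no shared-memory traffic
            // (__shfl_down in pre-CUDA-9 toolkits, __shfl_down_sync since).
            for (int offset = warpSize / 2; offset > 0; offset /= 2)
                val += __shfl_down_sync(0xffffffff, val, offset);

            // Floating-point atomics: one atomic add per warp.
            if ((threadIdx.x & (warpSize - 1)) == 0)
                atomicAdd(out, val);
        }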



  • Stebs
    replied
    Originally posted by pszilard View Post
    They don't support a lot of the features that allow actual scientific/HPC codes to run fast (not synthetic benchmarks with who knows how efficient implementations), e.g. warp shuffle just to name one.

    Our kernels run at least 2x slower in OpenCL compared to CUDA and that's not a fluke or ill-optimized OpenCL.
    Would this situation change when/if Nvidia released an OpenCL 2.0 or even 2.1 driver? Or are the missing features and/or the performance gap independent of the available OpenCL version? (And did you use OpenCL 1.2 for Nvidia, or still 1.1?)



  • pszilard
    replied

    Originally posted by Foolou View Post
    Yes, it is a shame - but the performance we get from K80s is just too nice at the moment. At least there is some hope that OpenACC (or maybe OpenMP) extensions in C/C++ will be usable on other offloading compute resources like the Xeon Phi 2, so that we can reuse our code.

    It is really a pity that vendor politics seems to prevent a common interface for shared memory parallelism on offloaded resources.
    OpenACC on Intel? Not a chance, have you seen this? :-/
    https://www.youtube.com/watch?v=RBFPBaxl_Jw

    Originally posted by Foolou View Post
    So, being someone from that HPC world, I assure you: there are a lot of second thoughts, but no alternatives at the moment - only design decisions that might help us be more independent in the future.
    Let's be honest, many of us have jumped happily on the CUDA train and have not looked back much. Most have not even made an attempt to port to OpenCL, file bugs with NVIDIA, and complain loudly that what they are doing is not fair. Sure, it takes effort, but without sobering up, realizing that NVIDIA's vendor lock-in efforts are working very well, and doing one's best to counteract them, if with nothing else than strong feedback, not much will change.



  • pszilard
    replied

    Originally posted by Linuxhippy View Post
    It is quite sad (but typical) to see NVidia neglecting the open standard OpenCL (no OpenCL 2.x support, a less optimized runtime compared to CUDA) - instead, they push their proprietary CUDA.
    Sad it is, but also very strong vendor-bias/lock-in campaign too.

    Originally posted by Linuxhippy View Post
    What makes me wonder is that, instead of worrying about depending on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.
    What's the alternative? Sadly, there isn't really one. If you're lucky, AMD's GPUs can keep up with NVIDIA's, but good luck fighting the compiler, the runtime, and the lack of exposed features. Their hardware is good IMO, but the combination of a poor software stack and dev tools, as well as the inherent challenge of having to deal with a nasty, huge and aggressive competitor, makes the situation very difficult for AMD. Plus, the issues that come with relying on the relatively slow evolution of an open standard don't make it any easier for them to compete.
    Last edited by pszilard; 06-14-2016, 09:14 PM.



  • pszilard
    replied
    I work with both CUDA and OpenCL in HPC and these benchmarks definitely don't represent real-world performance! OpenCL on NVIDIA is horribly lagging behind CUDA in terms of feature support (a hint at the reason: https://twitter.com/jrprice89/status/667466444355993600). They don't support a lot of the features that allow actual scientific/HPC codes to run fast (not synthetic benchmarks with who knows how efficient implementations), e.g. warp shuffle just to name one.

    Our kernels run at least 2x slower in OpenCL compared to CUDA and that's not a fluke or ill-optimized OpenCL.

    So Michael, please pick some more relevant/representative benchmarks.
    [Edit/plug: for a start you could consider our code, GROMACS, a widely used open-source molecular simulation package. Besides CUDA and OpenCL support (on NVIDIA and AMD GPUs), it also has SIMD kernels for a dozen or more processor architectures, as well as OpenMP multi-threading and MPI.]
    Last edited by pszilard; 06-14-2016, 09:12 PM.
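    As a rough illustration of why the missing warp shuffle alone hurts, below is a sketch (in CUDA syntax for consistency; the OpenCL version would use local memory and barrier(CLK_LOCAL_MEM_FENCE)) of the shared-memory reduction an OpenCL kernel on NVIDIA has to fall back on:

        // Hypothetical fallback: without lane shuffle, every partial sum goes
        // through shared (OpenCL: local) memory with a full block barrier at
        // each step, which means extra traffic and synchronization that a
        // shuffle-based reduction avoids. Assumes blockDim.x is a power of two
        // and that the kernel is launched with blockDim.x * sizeof(float)
        // bytes of dynamic shared memory.
        __global__ void sum_kernel_no_shuffle(const float *in, float *out, int n)
        {
            extern __shared__ float scratch[];
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            scratch[threadIdx.x] = (i < n) ? in[i] : 0.0f;
            __syncthreads();

            // Tree reduction through shared memory, halving the stride each step.
            for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
                if (threadIdx.x < stride)
                    scratch[threadIdx.x] += scratch[threadIdx.x + stride];
                __syncthreads();
            }

            if (threadIdx.x == 0)
                atomicAdd(out, scratch[0]);
        }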



  • Foolou
    replied
    Originally posted by Linuxhippy View Post
    What makes me wonder is that, instead of worrying about depending on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.
    Yes, it is a shame - but the performance we get from K80s is just too nice at the moment. At least there is some hope that OpenACC (or maybe OpenMP) extensions in C/C++ will be usable on other offloading compute resources like the Xeon Phi 2, so that we can reuse our code.

    It is really a pity that vendor politics seems to prevent a common interface for shared memory parallelism on offloaded resources.

    So, being someone from that HPC world, I assure you: there are a lot of second thoughts, but no alternatives at the moment - only design decisions that might help us be more independent in the future.



  • bug77
    replied
    Originally posted by Linuxhippy View Post
    It is quite sad (but typical) to see NVidia neglecting the open standard OpenCL (no OpenCL 2.x support, a less optimized runtime compared to CUDA) - instead, they push their proprietary CUDA.

    What makes me wonder is that, instead of worrying about depending on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.
    I guess between vendor neutrality and this: http://www.phoronix.com/scan.php?pag...gtx-1080&num=2
    the choice is pretty clear-cut when you're after performance.



  • Linuxhippy
    replied
    It is quite sad (but typical) to see NVidia neglecting the open standard OpenCL (no OpenCL 2.x support, a less optimized runtime compared to CUDA) - instead, they push their proprietary CUDA.

    What makes me wonder is that, instead of worrying about depending on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.



  • Masush5
    replied
    Originally posted by oleid View Post
    How does AMD perform in these tests?
    http://www.phoronix.com/scan.php?pag...gtx-1080&num=2



  • oleid
    replied
    How does AMD perform in these tests?

