
CUDA vs. OpenCL GPGPU Performance On NVIDIA's Pascal


  • CUDA vs. OpenCL GPGPU Performance On NVIDIA's Pascal

    Phoronix: CUDA vs. OpenCL GPGPU Performance On NVIDIA's Pascal

    Following yesterday's Deep Learning and CUDA Benchmarks On The GeForce GTX 1080 Under Linux, one Phoronix reader inquired about OpenCL vs. CUDA performance on the GTX 1080... Is one GPGPU compute API faster than the other with NVIDIA's proprietary driver? Here are some side-by-side benchmarks...

    http://www.phoronix.com/scan.php?pag...OpenCL-vs-CUDA

  • #2
    How does AMD perform in these tests?



    • #3
      Originally posted by oleid View Post
      How does AMD perform in these tests?
      http://www.phoronix.com/scan.php?pag...gtx-1080&num=2



      • #4
        It is quite sad (but typical) to see NVidia neglecting the open standard OpenCL (no OpenCL 2.x support, a less optimized runtime compared to CUDA) - instead, they push their proprietary CUDA.

        What makes me wonder: instead of worrying about being dependent on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.



        • #5
          Originally posted by Linuxhippy View Post
          It is quite sad (but typical) to see NVidia neglecting the open standard OpenCL (no OpenCL 2.x support, a less optimized runtime compared to CUDA) - instead, they push their proprietary CUDA.

          What makes me wonder: instead of worrying about being dependent on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.
          I guess between vendor neutrality and this: http://www.phoronix.com/scan.php?pag...gtx-1080&num=2
          the choice is pretty clear cut when you're after performance.



          • #6
            Originally posted by Linuxhippy View Post
            What makes me wonder: instead of worrying about being dependent on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.
            Yes, it is a shame - but the performance we get from K80s is just too nice at the moment. At least there is some hope that OpenACC (or maybe OpenMP) extensions in C/C++ will be usable on other offloading compute resources like the Xeon Phi 2, so that we can reuse our code.

            It is really a pity that vendor politics seem to prevent a common interface for shared-memory parallelism on offloaded resources.

            So, being someone from that HPC world, I assure you: there are a lot of second thoughts, but no alternatives at the moment - only design decisions that might help us be more independent in the future.
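            For readers unfamiliar with the directive-based portability mentioned above, here is a minimal, hedged sketch of what OpenMP target offload looks like in C. The pragma is only a hint: a compiler with offload support may run the loop on a GPU or a Phi, while any other compiler simply ignores it and runs the loop on the host, which is exactly the code-reuse property being hoped for.

            ```c
            #include <assert.h>
            #include <math.h>
            #include <stdio.h>

            #define N 1024

            /* saxpy: y = a*x + y. With an offload-capable OpenMP compiler the
             * loop can run on an accelerator; otherwise the pragma is ignored
             * and the loop executes on the host with identical results. */
            void saxpy(float a, const float *x, float *y, int n)
            {
                #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
                for (int i = 0; i < n; ++i)
                    y[i] = a * x[i] + y[i];
            }

            int main(void)
            {
                float x[N], y[N];
                for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

                saxpy(3.0f, x, y, N);

                for (int i = 0; i < N; ++i)
                    assert(fabsf(y[i] - 5.0f) < 1e-6f);  /* 3*1 + 2 = 5 */
                printf("saxpy ok\n");
                return 0;
            }
            ```

            The same source compiles unchanged for host, GPU, or Phi targets; only the compiler flags differ, which is the whole appeal over maintaining separate CUDA and OpenCL kernels.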



            • #7
              I work with both CUDA and OpenCL in HPC, and these benchmarks definitely don't represent real-world performance! OpenCL on NVIDIA is horribly lagging behind CUDA in terms of feature support (a hint at the reason: https://twitter.com/jrprice89/status/667466444355993600). They don't support a lot of the features that allow actual scientific/HPC codes to run fast (as opposed to synthetic benchmarks with who-knows-how-efficient implementations), e.g. warp shuffle, just to name one.

              Our kernels run at least 2x slower in OpenCL compared to CUDA, and that's not a fluke or ill-optimized OpenCL.

              So Michael, please pick some more relevant/representative benchmarks.
              [Edit/plug: for a start you could consider our code, GROMACS, a widely used open-source molecular simulation package. Besides CUDA and OpenCL support (on NVIDIA and AMD GPUs) it also has SIMD kernels for a dozen or more processor architectures, as well as OpenMP multi-threading and MPI.]
              Last edited by pszilard; 06-14-2016, 09:12 PM.
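              For readers wondering what warp shuffle buys you: CUDA's `__shfl_down_sync` lets the 32 lanes of a warp exchange register values directly, so a warp-wide reduction needs no shared memory at all. The following is a hedged, host-side C model of that reduction pattern (the array stands in for the 32 lanes; on a real GPU each "lane" would be a thread doing `val += __shfl_down_sync(0xffffffff, val, offset)`):

              ```c
              #include <assert.h>
              #include <stdio.h>

              #define WARP_SIZE 32

              /* Host-side model of a CUDA shuffle-based warp reduction: at each
               * step every "lane" adds the value held by the lane `offset`
               * positions above it; after log2(32) = 5 steps, lane 0 holds the
               * sum of all 32 lane values. */
              static int warp_reduce_sum(const int *lane_val)
              {
                  int cur[WARP_SIZE], nxt[WARP_SIZE];
                  for (int i = 0; i < WARP_SIZE; ++i) cur[i] = lane_val[i];

                  for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
                      for (int lane = 0; lane < WARP_SIZE; ++lane)
                          nxt[lane] = cur[lane] +
                                      (lane + offset < WARP_SIZE ? cur[lane + offset] : 0);
                      for (int lane = 0; lane < WARP_SIZE; ++lane) cur[lane] = nxt[lane];
                  }
                  return cur[0]; /* only lane 0 holds the complete sum */
              }

              int main(void)
              {
                  int v[WARP_SIZE];
                  for (int i = 0; i < WARP_SIZE; ++i) v[i] = i; /* 0+1+...+31 = 496 */
                  assert(warp_reduce_sum(v) == 496);
                  printf("%d\n", warp_reduce_sum(v));
                  return 0;
              }
              ```

              Without shuffle, an OpenCL kernel on NVIDIA has to route the same exchange through local (shared) memory with extra barriers, which is part of why identical algorithms can run measurably slower under OpenCL.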



              • #8

                Originally posted by Linuxhippy View Post
                It is quite sad (but typical) to see NVidia neglecting the open standard OpenCL (no OpenCL-2.x support, less optimized runtime compared to CUDA) - instead they push their proprietary CUDA.
                Sad it is, but it is also a very strong vendor-bias/lock-in campaign.

                Originally posted by Linuxhippy View Post
                What makes me wonder: Instead of worrying about being depending on a single supplier, the HPC world seems to be quite happy buying NVidia Teslas without a second thought.
                What's the alternative? Sadly, there isn't really one. If you're lucky, AMD's GPUs can keep up with NVIDIA's, but good luck fighting the compiler, the runtime, and the lack of exposed features. Their hardware is good IMO, but the combination of a poor software stack and dev-tools, plus the inherent challenge of having to deal with such a huge and aggressive competitor, renders the situation very difficult for AMD. And the issues that come with relying on the relatively slow evolution of an open standard don't make it any easier for them to compete.
                Last edited by pszilard; 06-14-2016, 09:14 PM.



                • #9

                  Originally posted by Foolou View Post
                  Yes, it is a shame - but the performance we get from K80s is just too nice at the moment. At least there is some hope that OpenACC (or maybe OpenMP) extensions in C/C++ will be usable on other offloading compute resources like the Xeon Phi 2, so that we can reuse our code.

                  It is really a pity that vendor politics seem to prevent a common interface for shared-memory parallelism on offloaded resources.
                  OpenACC on Intel? Not a chance, have you seen this? :-/
                  https://www.youtube.com/watch?v=RBFPBaxl_Jw

                  Originally posted by Foolou View Post
                  So being someone from that HPC world I assure you: there is a lot of second thoughts, but no alternatives at the moment - only design decisions that might help to be more independent in the future.
                  Let's be honest: many of us have happily jumped on the CUDA train and have not looked back much. Most have not even made an attempt to port to OpenCL, file bugs with NVIDIA, and complain loudly that what they are doing is not fair. Sure, it takes effort, but without sobering up, realizing that NVIDIA's vendor lock-in efforts are working very well, and doing one's best to counteract them, if with nothing else than strong feedback, not much will change.



                  • #10
                    Originally posted by pszilard View Post
                    They don't support a lot of the features that allow actual scientific/HPC codes to run fast (as opposed to synthetic benchmarks with who-knows-how-efficient implementations), e.g. warp shuffle, just to name one.

                    Our kernels run at least 2x slower in OpenCL compared to CUDA and that's not a fluke or ill-optimized OpenCL.
                    Would this situation change if/when NVIDIA released an OpenCL 2.0 or even 2.1 driver? Or are the missing features and/or performance independent of the available OpenCL version? (And did you use OpenCL 1.2 for NVIDIA, or still 1.1?)

