NVIDIA CUDA Code In A JIT Interpreted Manner Via Cling


  • NVIDIA CUDA Code In A JIT Interpreted Manner Via Cling

    Phoronix: NVIDIA CUDA Code In A JIT Interpreted Manner Via Cling

    Cling, the C++ interpreter built upon LLVM and Clang, is in the process of seeing support for NVIDIA CUDA...


  • #2
    Students shouldn't waste time on CUDA lock-in. CUDA is not even an open standard. So Nvidia can bork any implementation they don't like using some trash legal precedents.



    • #3
      Originally posted by shmerl View Post
      Students shouldn't waste time on CUDA lock-in. CUDA is not even an open standard. So Nvidia can bork any implementation they don't like using some trash legal precedents.
      While your tone is questionable, the general idea is sound. CUDA needs to die in favor of open standards.



      • #4
        Originally posted by shmerl View Post
        Students shouldn't waste time on CUDA lock-in. CUDA is not even an open standard.
        The thing is:
        - Most of the HPC centers I know of tend to have CUDA hardware for their GPGPU tasks. I haven't heard of a university running an AMD GPGPU cluster. (Though AMD does produce dedicated hardware, so there might be a closed commercial one somewhere.)
        - Nvidia simply managed to grab this market faster. (Which is ironic, given that most of the early research - BrookGPU, CTM, sound effects written as OpenGL shaders - was initially done on ATI hardware.)
        - Training isn't necessarily only for bachelor-level students, where the most important thing is understanding the core concepts behind GPGPU and the exact implementation technology (Nvidia's CUDA, AMD ROCm's HCC, or OpenCL) matters less. Training can also mean preparing PhD-level students to work on the cluster where they'll be writing and running their simulations. In that case the exact technology does matter, down to the version number. A Jupyter notebook compatible with CUDA would certainly help there; see the sketch right below this list.
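
        To give an idea, a CUDA-capable Cling session could look roughly like this. This is a hypothetical sketch: the kernel name and sizes are made up, and it assumes Cling's CUDA mode accepts ordinary runtime-API C++ entered statement by statement:

        #include <cuda_runtime.h>
        #include <cstdio>

        // Defined interactively, like any other C++ entity in Cling.
        __global__ void scale(float* v, float f) {
            v[threadIdx.x] *= f;
        }

        // Host statements run as soon as they are entered: no separate
        // nvcc compile-link-run cycle between experiments.
        float* d = nullptr;
        cudaMalloc(&d, 256 * sizeof(float));
        scale<<<1, 256>>>(d, 2.0f);
        cudaDeviceSynchronize();
        printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
        cudaFree(d);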


        Now, I would like open standards to be more prominent (and in the crypto-mining world they did catch on, with very good OpenCL implementations appearing there).
        Thus I applaud AMD's efforts (including their attempts at CUDA interoperability with HIP).



        • #5
          Originally posted by DrYak View Post

          The thing is:
          - Most of the HPC centers I know of tend to have CUDA hardware for their GPGPU tasks. I haven't heard of a university running an AMD GPGPU cluster. (Though AMD does produce dedicated hardware, so there might be a closed commercial one somewhere.)
          OpenCL works on NV as well, afaik?
          But I would also be interested in why CUDA was able to get such a good foothold. I have nothing to do with GPGPU, but from what I've found, CUDA is not massively faster than OpenCL, even on NV hardware?



          • #6
            Originally posted by Termy View Post
            OpenCL works on NV as well, afaik?
            The "problem" (which probably isn't seen as an actual problem by the top execs at Nvidia) with OpenCL on Nvidia hardware is that Nvidia's OpenCL drivers aren't nearly as mature and as well optimized as their CUDA stack.

            If you have Nvidia hardware (like those HPC centers) and if you only plan to run your code on the hardware you have at hand, CUDA is still considered a better bet.

            Even more so because some of these projects tend to be site-specific (you write a simulation to process a specific data set on a specific cluster).
            I wouldn't call it "throwaway", because it's not exactly single-use or one-shot, but it is really bound to a specific center (it's "single *type of* use").
            Thus, on the portability vs. cycle-shaving scale, they tend to favour the thing that saves 1-2% performance over the 5-10 year lifetime of the project/cluster, even if it makes the code a little less likely to be useful to a completely different team at another site.

            Also, CUDA tends to be a tiny bit higher level and more user-friendly than OpenCL, which can save a bit of dev time; that is probably the next bullet point a project would go after. A rough sketch of the contrast follows.
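
            To illustrate that point, here is a minimal sketch (not production code; the kernel and sizes are made up). The whole CUDA side of a vector scale fits in a handful of single-source lines, while the trailing comments list the extra host-side ceremony a plain OpenCL port would typically need:

            #include <cuda_runtime.h>

            // Kernel and host code live in one .cu file (single-source).
            __global__ void scale(float* v, float f, int n) {
                int i = blockIdx.x * blockDim.x + threadIdx.x;
                if (i < n) v[i] *= f;
            }

            void run(float* host, int n) {
                float* dev = nullptr;
                cudaMalloc(&dev, n * sizeof(float));
                cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
                scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);  // typed launch, no string kernels
                cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
                cudaFree(dev);
            }

            // An OpenCL equivalent keeps the kernel source as a string and adds
            // explicit clGetPlatformIDs / clCreateContext / clCreateCommandQueue
            // setup, clCreateProgramWithSource + clBuildProgram at run time, then
            // clSetKernelArg and clEnqueueNDRangeKernel for every launch.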

            Originally posted by Termy View Post
            But i would also be interested in why CUDA was able to get such a good foothold?
            They managed to quickly rush in with special dedicated hardware (Tesla compute nodes), available through well-established partners: basically ready-to-use server units that you could screw as-is into your server rack and start using.
            Meanwhile, at the same time, ATI was making cards dedicated to workstations, so equipping a cluster would have required some custom-built solution.
            And if you're planning to build a large cluster, the first option seems much more attractive than the second.

            (Compatibility-wise, at the beginning Nvidia made sure that their gaming cards could run CUDA too, so that small labs could assemble a multi-GPU workstation out of COTS parts and gain know-how there that would eventually transfer to compute nodes.
            University students would thus be motivated to build a (relatively cheap) test machine that could also double as an (extremely high-end) Crysis-playing machine after hours.)

            Also, even if it came a bit later historically, Nvidia's CUDA SDK was one of the easiest to use that early on. It felt very polished (especially compared to other technologies like BrookGPU).

            All this made CUDA an attractive product.

            Combine that with some strong financial incentives (probably rebates for universities building clusters) and they managed to grab the market, while locking everyone into a proprietary solution.

            OpenCL is a decent alternative, but it came too late, at a time when CUDA had already captured the mind share. And Nvidia dragged their feet on bringing a decent implementation into their SDK, so it wasn't worth writing for OpenCL if you only target Nvidia hardware; CUDA simply works better on the hardware you already happen to have at hand.



            • #7
              Originally posted by shmerl View Post
              Students shouldn't waste time on CUDA lock-in. CUDA is not even an open standard. So Nvidia can bork any implementation they don't like using some trash legal precedents.
              What the short-sighted haters don't see is that we are using today's only reliably working, modern C++, single-source (!) implementation for a demonstrator, because we want to get our work and science done now and not in 5 or 10 years. Our contribution is, by the way, fully FOSS. Yes, the programming standard itself is still a unicorn that needs permissive licensing, but step by step.

              As soon as HIP, C++ AMP and the like make it into Clang over the next months and years, future generations will have an easy job adapting our initial work to fully open standards that have not only open tooling (as this solution already has, down to the PTX level) but also an open standards body and permissive licensing.

              Open source is about doing, not about bullying others into doing the things one would like to have. You like the work and think the stack should be more open? Contribute to the ongoing open HIP or SYCL frontend work in Clang and copy-paste from our work for the JIT. It's made exactly for that purpose.



              • #8
                "This is believed to be the first interpreter out there for the CUDA runtime API."
                What about numba,cuda which is around for a couple of years ?



                • #9
                  That was stated too briefly in the summary, sorry about that. It's the first interpreter for the CUDA C++ runtime API.

                  Also, be aware that out-of-source solutions such as jitify/NVRTC/... use the CUDA driver API. The difference looks subtle at first but is significant for actual code bases, which virtually all use the runtime API. A sketch of the two paths follows.
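
                  To make the distinction concrete, here is a hedged sketch of the same launch through both APIs (the kernel and names are illustrative, and cuInit plus context setup for the driver path are elided):

                  #include <cuda_runtime.h>  // runtime API (cuda*)
                  #include <cuda.h>          // driver API (cu*)

                  __global__ void scale(float* v, float f) { v[threadIdx.x] *= f; }

                  // Runtime API: the kernel is compiled into the binary and
                  // launched with the <<<>>> syntax. This is what typical
                  // application code bases are written against.
                  void launch_runtime(float* dev) {
                      scale<<<1, 256>>>(dev, 2.0f);
                      cudaDeviceSynchronize();
                  }

                  // Driver API: the path jitify/NVRTC-style tools take. The kernel
                  // arrives as a PTX string (here assumed to export an extern "C"
                  // entry named "scale") and is loaded and launched by hand.
                  void launch_driver(CUdeviceptr dev, const char* ptx) {
                      CUmodule mod;
                      CUfunction fn;
                      cuModuleLoadData(&mod, ptx);
                      cuModuleGetFunction(&fn, mod, "scale");
                      float f = 2.0f;
                      void* args[] = { &dev, &f };
                      cuLaunchKernel(fn, 1, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
                      cuCtxSynchronize();
                      cuModuleUnload(mod);
                  }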



                  • #10
                    Originally posted by ax3l View Post
                    What the short-sighted haters don't see is that we are using today's only reliably working, modern C++, single-source (!) implementation for a demonstrator, because we want to get our work and science done now and not in 5 or 10 years. Our contribution is, by the way, fully FOSS. Yes, the programming standard itself is still a unicorn that needs permissive licensing, but step by step.

                    As soon as HIP, C++ AMP and the like make it into Clang over the next months and years, future generations will have an easy job adapting our initial work to fully open standards that have not only open tooling (as this solution already has, down to the PTX level) but also an open standards body and permissive licensing.

                    Open source is about doing, not about bullying others into doing the things one would like to have. You like the work and think the stack should be more open? Contribute to the ongoing open HIP or SYCL frontend work in Clang and copy-paste from our work for the JIT. It's made exactly for that purpose.
                    Well said. I'm sorry you had to read those comments.
                    As someone who routinely uses CUDA, notebooks, and Xeus Cling, I thank you for your work; I'm eager to try it.

