Announcement

**ssam** · 21 June 2011, 11:41 AM

Originally posted by V!NCENT

What really sucks balls about OpenCL is that you need to specifically target all kinds of different cards, even though your code will run on any OpenCL device. The problem is that it takes hardcore GPU understanding. For example, the bank size and terminology are different between nVidia and ATi. Imagine programming soundcards >.<

Does the current IR successfully work as a GPU design abstraction with Clover, so that Clover converts OpenCL into general code that works just as well on nVidia as on ATi? That would be a massive win all over the place.

As I understand it, if you write code in OpenCL then it will work fine on ATI, Nvidia, multicore CPUs etc. But if you want the code to be super fast then you need to pay close attention to things like memory layout, shared caches, and other hardware-dependent stuff, because memory bandwidth and cache misses can be significant. I think that tweaking would be very hard to automate.

**Veerappan** · 21 June 2011, 12:48 PM

Originally posted by ssam

As I understand it, if you write code in OpenCL then it will work fine on ATI, Nvidia, multicore CPUs etc. But if you want the code to be super fast then you need to pay close attention to things like memory layout, shared caches, and other hardware-dependent stuff, because memory bandwidth and cache misses can be significant. I think that tweaking would be very hard to automate.

The warp/work group sizes can drastically vary between hardware, and the ideal code can as well (vector programming vs other methods). During program startup, it is possible to compile the OpenCL kernels and run quick performance tests to pick an ideal method, but that assumes that you are willing to write the auto-tuning code and also to write multiple codepaths.

But you are right. If you write code that works on one OpenCL device (e.g. Nvidia), it should work on another device (CPU, DSP, AMD card, etc). There are extensions that can come into play, but as long as the device you are trying to execute on supports what you need, it should at least execute and produce results.
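As a rough illustration of the extension point: OpenCL devices report their extensions as one space-separated string (the `CL_DEVICE_EXTENSIONS` query of `clGetDeviceInfo`), so a startup check can be as small as the sketch below. The function name `missing_extensions` and the example string are made up for illustration; a real program would get the string from the OpenCL runtime.

```python
def missing_extensions(device_extensions, required):
    """Return the required extension names the device does not advertise.

    device_extensions: the space-separated string a device reports.
    required: extension names the program depends on.
    """
    available = set(device_extensions.split())
    return [name for name in required if name not in available]


# Hypothetical extension string, standing in for what a device would report.
reported = "cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"

# Anything listed here means the program should fall back or refuse to run.
print(missing_extensions(reported, ["cl_khr_fp64", "cl_khr_3d_image_writes"]))
```

If the returned list is empty, the device supports everything the program asked for.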

Performance tuning of OpenCL code is affected by the specific hardware you're running on, but the code should at least execute properly on other devices.
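The startup auto-tuning idea mentioned above can be sketched roughly as follows. This is plain Python with no OpenCL in it: the two `sum_*` functions are hypothetical stand-ins for alternative kernel code paths, and `pick_fastest` is an invented helper name. A real tuner would build each kernel variant, run it on the device, and time that instead.

```python
import time


def pick_fastest(variants, sample_input, repeats=3, clock=time.perf_counter):
    """Time each candidate on sample_input and return (name, best_time).

    variants: dict mapping a label to a callable (the alternative code paths).
    The best of `repeats` runs is kept per variant to reduce timing noise.
    """
    best_name, best_time = None, float("inf")
    for name, fn in variants.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = clock()
            fn(sample_input)
            elapsed = min(elapsed, clock() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name, best_time


def sum_scalar(values):
    """Stand-in for a scalar code path: one element per step."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_chunked(values, width=4):
    """Stand-in for a vector-style code path: `width` elements per step."""
    total = 0.0
    for i in range(0, len(values), width):
        total += sum(values[i:i + width])
    return total


# Quick test at program startup; the winner is used for the rest of the run.
data = [float(i) for i in range(10000)]
name, seconds = pick_fastest({"scalar": sum_scalar, "chunked": sum_chunked}, data)
print("selected code path:", name)
```

Which variant wins depends on the machine, which is the whole point: the choice is made on the hardware the program actually runs on, at the cost of maintaining more than one code path.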

**ssam** · 21 June 2011, 12:52 PM

Originally posted by Veerappan

Performance tuning of OpenCL code is affected by the specific hardware you're running on, but the code should at least execute properly on other devices.

Am I right in thinking that even between different models from the same manufacturer you might need to do different optimisations?

**Veerappan** · 21 June 2011, 03:13 PM

Originally posted by ssam

Am I right in thinking that even between different models from the same manufacturer you might need to do different optimisations?

Yeah. Case in point would be AMD. Their r600/r700 chips were mostly 5-wide vector units, but the Cayman chips have moved to 4-wide vector units. The next architecture is supposedly going to be SIMD-based, which will lead to entirely different optimization strategies (possibly similar to Fermi, but we'll see).

**plonoma** · 21 June 2011, 03:58 PM

It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.

Hopefully this project could become part of the kernel in the future?

Really looking forward to being able to use OpenCL on Linux.

**89c51** · 21 June 2011, 04:11 PM

Originally posted by plonoma

It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.

Doesn't this defeat the purpose of OpenCL?

Unless at some point we get CPUs that are so fast that we don't need dedicated graphics chips.

**bridgman** · 21 June 2011, 04:28 PM

Originally posted by Veerappan

Yeah. Case in point would be AMD. Their r600/r700 chips were mostly 5-wide vector units, but the Cayman chips have moved to 4-wide vector units. The next architecture is supposedly going to be SIMD-based, which will lead to entirely different optimization strategies (possibly similar to Fermi, but we'll see).

One point I don't see mentioned much - current architectures are VLIW *and* SIMD.

The SIMDs are 16-wide on high end chips and 4- or 8-wide on lower end chips.

**Veerappan** · 22 June 2011, 05:13 PM

Originally posted by bridgman

One point I don't see mentioned much - current architectures are VLIW *and* SIMD.

The SIMDs are 16-wide on high end chips and 4- or 8-wide on lower end chips.

Thanks for the clarification. The VLIW part is going away in the next architecture, right? At least that's what the recent presentations that I've read (e.g. Anandtech) have all stated.

**Veerappan** · 22 June 2011, 05:22 PM

Originally posted by plonoma

It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.

Hopefully this project could become part of the kernel in the future?

Really looking forward to being able to use OpenCL on Linux.

Seoul National University / Samsung OpenCL run-time:

http://opencl.snu.ac.kr/

License: LGPL v3
Supports: ARM, DSPs (TI) and Cell SPUs

It uses LLVM and Clang, and they've got future plans for x86 support. It builds on x86_64, but I haven't gotten more than a simple hello world program to link, and the hello world program explicitly tells me that my CPU model (Phenom II X6 1055T) is currently unsupported.

I'm not saying that it's feature complete or that it's perfect, but I've heard from people who've used it on ARM and it does the trick. Given that it's LGPL, I don't see any license issues with using it on Linux.

I don't see it going into the kernel (it is something that should probably remain in user-space as a library), but it might be something that could be included in distributions in the future after some further testing/porting.


Gallium3D Clover Can Now Execute OpenCL Native Kernels
