Thread: Gallium3D Clover Can Now Execute OpenCL Native Kernels

  1. #11

    Quote: Originally Posted by V!NCENT
    What really sucks balls about OpenCL is that you need to specifically target all kinds of different cards, even though your code will run on any OpenCL device. The problem is that it takes hardcore GPU understanding. For example, the bank size and terminology are different between nVidia and ATi. Imagine programming sound cards >.<

    Does the current IR successfully work as a GPU design abstraction with Clover, so that Clover converts OpenCL into general code that works just as well on nVidia as on ATi? That would be a massive win all over the place.
    As I understand it, if you write code in OpenCL then it will work fine on ATI, NVIDIA, multicore CPUs, etc. But if you want the code to be super fast, you need to pay close attention to things like memory layout, shared caches, and other hardware-dependent details, because memory bandwidth and cache misses can be significant. I think that tweaking would be very hard to automate.

  2. #12
    Join Date
    Nov 2008
    Location
    Madison, WI, USA
    Posts
    837

    Quote: Originally Posted by ssam
    As I understand it, if you write code in OpenCL then it will work fine on ATI, NVIDIA, multicore CPUs, etc. But if you want the code to be super fast, you need to pay close attention to things like memory layout, shared caches, and other hardware-dependent details, because memory bandwidth and cache misses can be significant. I think that tweaking would be very hard to automate.
    The warp/work group sizes can drastically vary between hardware, and the ideal code can as well (vector programming vs other methods). During program startup, it is possible to compile the OpenCL kernels and run quick performance tests to pick an ideal method, but that assumes that you are willing to write the auto-tuning code and also to write multiple codepaths.

    But you are right. If you write code that works on one OpenCL device (e.g. Nvidia), it should work on another device (CPU, DSP, AMD card, etc). There are extensions that can come into play, but as long as the device you are trying to execute on supports what you need, it should at least execute and produce results.

    Performance tuning of OpenCL code is affected by the specific hardware you're running on, but the code should at least execute properly on other devices.
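
    The startup auto-tuning idea above (time each code path briefly, then keep the fastest) can be sketched without any OpenCL at all. This is only an illustration in plain Python: `pick_fastest`, `scalar_sum`, and `chunked_sum` are hypothetical names, and the two functions are stand-ins for what would really be two separately compiled kernels.

```python
import time


def pick_fastest(variants, sample_input, repeats=3):
    """Time every variant on sample_input and return (winner_name, timings).

    timings maps each variant name to its best wall-clock time in seconds.
    """
    timings = {}
    for name, fn in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(sample_input)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    winner = min(timings, key=timings.get)
    return winner, timings


# Hypothetical stand-ins for two code paths of the same computation.
def scalar_sum(data):
    """One element at a time."""
    total = 0.0
    for x in data:
        total += x
    return total


def chunked_sum(data, width=4):
    """Process the data in groups of `width` elements."""
    total = 0.0
    for i in range(0, len(data), width):
        total += sum(data[i:i + width])
    return total


VARIANTS = {"scalar": scalar_sum, "chunked4": chunked_sum}

if __name__ == "__main__":
    data = [float(i) for i in range(10000)]
    winner, timings = pick_fastest(VARIANTS, data)
    print("fastest variant on this machine:", winner)
```

    The cost is exactly what the post says: every variant has to be written and maintained, and the timing pass runs on each startup unless the result is cached per device.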

  3. #13

    Quote: Originally Posted by Veerappan
    Performance tuning of OpenCL code is affected by the specific hardware you're running on, but the code should at least execute properly on other devices.
    Am I right in thinking that even between different models from the same manufacturer you might need to do different optimisations?

  4. #14
    Join Date
    Nov 2008
    Location
    Madison, WI, USA
    Posts
    837

    Quote: Originally Posted by ssam
    Am I right in thinking that even between different models from the same manufacturer you might need to do different optimisations?
    Yeah. Case in point would be AMD. Their r600/r700 chips were mostly 5-wide vector units, but the Cayman chips have moved to 4-wide vector units. The next architecture is supposedly going to be SIMD-based, which will lead to entirely different optimization strategies (possibly similar to Fermi, but we'll see).
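
    A rough way to see why the move from 5-wide to 4-wide units changes the ideal code: treat each unit as a bundle with that many issue slots and ask what fraction a kernel can fill. This is an idealised back-of-the-envelope model, not a description of the real scheduler, and `slot_utilisation` is a hypothetical helper.

```python
def slot_utilisation(independent_ops, unit_width):
    """Fraction of issue slots filled when a kernel offers
    `independent_ops` independent operations per cycle on a unit
    that is `unit_width` slots wide (idealised model)."""
    if independent_ops < 0 or unit_width <= 0:
        raise ValueError("ops must be >= 0 and width must be > 0")
    return min(independent_ops, unit_width) / unit_width


if __name__ == "__main__":
    for width in (5, 4):
        for ops in (1, 4):
            share = slot_utilisation(ops, width)
            print(f"{ops} op(s) on a {width}-wide unit: {share:.0%} of slots used")
```

    Under this model, code written around 4-component vectors leaves one slot idle on a 5-wide unit but fills a 4-wide unit completely, while purely scalar code wastes most of either.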

  5. #15
    Join Date
    Sep 2010
    Posts
    419

    It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.

    Hopefully this project could become part of the kernel in the future?

    Really looking forward to being able to use OpenCL on Linux.

  6. #16
    Join Date
    Jan 2009
    Posts
    1,516

    Quote: Originally Posted by plonoma
    It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.
    Doesn't this defeat the purpose of OpenCL?

    Unless at some point we get CPUs that are so fast that we don't need special graphics chips.

  7. #17
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,279

    Quote: Originally Posted by Veerappan
    Yeah. Case in point would be AMD. Their r600/r700 chips were mostly 5-wide vector units, but the Cayman chips have moved to 4-wide vector units. The next architecture is supposedly going to be SIMD-based, which will lead to entirely different optimization strategies (possibly similar to Fermi, but we'll see).
    One point I don't see mentioned much - current architectures are VLIW *and* SIMD.

    The SIMDs are 16-wide on high end chips and 4- or 8-wide on lower end chips.

  8. #18
    Join Date
    Nov 2008
    Location
    Madison, WI, USA
    Posts
    837

    Quote: Originally Posted by bridgman
    One point I don't see mentioned much - current architectures are VLIW *and* SIMD.

    The SIMDs are 16-wide on high end chips and 4- or 8-wide on lower end chips.
    Thanks for the clarification. The VLIW part is going away in the next architecture, right? At least that's what the recent presentations that I've read (e.g. Anandtech) have all stated.
    Last edited by Veerappan; 06-22-2011 at 05:22 PM.

  9. #19
    Join Date
    Nov 2008
    Location
    Madison, WI, USA
    Posts
    837

    Quote: Originally Posted by plonoma
    It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.

    Hopefully this project could become part of the kernel in the future?

    Really looking forward to being able to use OpenCL on Linux.
    Seoul National University / Samsung OpenCL run-time:
    http://opencl.snu.ac.kr/

    License: LGPL v3
    Supports: ARM, DSPs (TI) and Cell SPUs

    It uses LLVM and Clang, and they've got future plans for x86 support. It builds on x86_64, but I haven't gotten more than a simple hello world program to link, and that program explicitly tells me that my CPU model (Phenom II X6 1055T) is currently unsupported.

    I'm not saying that it's feature complete or that it's perfect, but I've heard from people who've used it on ARM and it does the trick. Given that it's LGPL, I don't see any license issues with using it on Linux.

    I don't see it going into the kernel (it is something that should probably remain in user-space as a library), but it might be something that could be included in distributions in the future after some further testing/porting.
