Gallium3D Clover Can Now Execute OpenCL Native Kernels


  • #11
    Originally posted by V!NCENT View Post
    What really sucks balls about OpenCL is that you need to specifically target all kinds of different cards, even though your code will run on any OpenCL device. The problem is that you need hardcore GPU understanding. For example, the bank size and terminology are different between nVidia and ATi. Imagine programming soundcards >.<

    Does the current IR successfully work as a GPU design abstraction with Clover, so that Clover converts OpenCL into general code that works just as well on nVidia as on ATi? That would be a massive win all over the place.
    As I understand it, if you write code in OpenCL then it will work fine on ATI, NVIDIA, multicore CPUs, etc. But if you want the code to be super fast, you need to pay close attention to things like memory layout, shared caches, and other hardware-dependent stuff, because memory bandwidth and cache misses can be significant. I think that tweaking would be very hard to automate.

    Comment


    • #12
      Originally posted by ssam View Post
      As I understand it, if you write code in OpenCL then it will work fine on ATI, NVIDIA, multicore CPUs, etc. But if you want the code to be super fast, you need to pay close attention to things like memory layout, shared caches, and other hardware-dependent stuff, because memory bandwidth and cache misses can be significant. I think that tweaking would be very hard to automate.
      The warp/work group sizes can drastically vary between hardware, and the ideal code can as well (vector programming vs other methods). During program startup, it is possible to compile the OpenCL kernels and run quick performance tests to pick an ideal method, but that assumes that you are willing to write the auto-tuning code and also to write multiple codepaths.
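The startup auto-tuning idea described above could look roughly like the sketch below. It is not OpenCL host code: the two candidate functions are hypothetical pure-Python stand-ins for alternative kernel code paths, and `pick_fastest` is a made-up helper name. The point is only the shape of the approach, which is to time each code path on a small sample and keep the winner.

```python
import time


def _time_once(fn, arg):
    """Return the wall-clock seconds one call to fn(arg) takes."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def pick_fastest(candidates, sample_input, repeats=3):
    """Time every candidate on sample_input; return (name, best_seconds)."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        # Keep the minimum over a few runs to reduce timing noise.
        elapsed = min(_time_once(fn, sample_input) for _ in range(repeats))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name, best_time


# Hypothetical stand-ins for two code paths of the same computation.
def sum_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_builtin(values):
    return sum(v * v for v in values)


CANDIDATES = {"loop": sum_squares_loop, "builtin": sum_squares_builtin}

if __name__ == "__main__":
    name, seconds = pick_fastest(CANDIDATES, list(range(10_000)))
    print(f"selected code path: {name} ({seconds:.6f} s)")
```

In a real application the candidates would be differently written kernels compiled for the detected device, and the cost mentioned above still applies: every code path has to be written and kept correct.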

      But you are right. If you write code that works on one OpenCL device (e.g. Nvidia), it should work on another device (CPU, DSP, AMD card, etc). There are extensions that can come into play, but as long as the device you are trying to execute on supports what you need, it should at least execute and produce results.
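As a rough illustration of the extension caveat: an OpenCL device reports its extensions as a single space-separated string (the `CL_DEVICE_EXTENSIONS` query), so checking "does this device support what I need" comes down to a set lookup. The sketch below assumes that string has already been fetched; `missing_extensions` is a hypothetical helper, and the example string is made up rather than taken from a real device.

```python
def missing_extensions(device_extensions, required):
    """Return the required extension names that the device does not list."""
    available = set(device_extensions.split())
    return [name for name in required if name not in available]


# Space-separated, in the format CL_DEVICE_EXTENSIONS uses.
# Contents are illustrative only, not copied from an actual device.
EXAMPLE_EXTENSIONS = "cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"

if __name__ == "__main__":
    needed = ["cl_khr_fp64", "cl_khr_byte_addressable_store"]
    missing = missing_extensions(EXAMPLE_EXTENSIONS, needed)
    if missing:
        print("cannot use this code path, missing:", ", ".join(missing))
    else:
        print("all required extensions present")
```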

      Performance tuning of OpenCL code is affected by the specific hardware you're running on, but the code should at least execute properly on other devices.

      Comment


      • #13
        Originally posted by Veerappan View Post
        Performance tuning of OpenCL code is affected by the specific hardware you're running on, but the code should at least execute properly on other devices.
        Am I right in thinking that even between different models from the same manufacturer you might need to do different optimisation?

        Comment


        • #14
          Originally posted by ssam View Post
          Am I right in thinking that even between different models from the same manufacturer you might need to do different optimisation?
          Yeah. Case in point would be AMD. Their r600/r700 chips were mostly 5-wide vector units, but the Cayman chips have moved to 4-wide vector units. The next architecture is supposedly going to be SIMD-based, which will lead to entirely different optimization strategies (possibly similar to Fermi, but we'll see).
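A back-of-the-envelope way to see why the 5-wide versus 4-wide difference matters is to count how many instruction slots a group of operations fills. The sketch below is a deliberately idealized model: it assumes every operation is independent and that any slot can take any operation, which real hardware does not guarantee. The function names are hypothetical.

```python
import math


def bundles_needed(independent_ops, width):
    """Instruction bundles required to issue the ops on a width-wide unit."""
    return math.ceil(independent_ops / width)


def slot_utilization(independent_ops, width):
    """Fraction of issued slots that do useful work (idealized)."""
    bundles = bundles_needed(independent_ops, width)
    return independent_ops / (bundles * width)


if __name__ == "__main__":
    for ops in (4, 5):
        for width in (5, 4):
            print(
                f"{ops} ops on {width}-wide: "
                f"{bundles_needed(ops, width)} bundle(s), "
                f"{slot_utilization(ops, width):.1%} of slots used"
            )
```

Under these assumptions, code written around groups of four operations (such as float4 math) leaves one slot idle on a 5-wide unit and fills a 4-wide unit exactly, while code hand-packed into groups of five needs a second, mostly empty bundle on the 4-wide unit.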

          Comment


          • #15
            It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.

            Hopefully this project could become part of the kernel in the future?

            Really looking forward to being able to use OpenCL on Linux.

            Comment


            • #16
              Originally posted by plonoma View Post
              It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.
              Doesn't this defeat the purpose of OpenCL?

              Unless at some point we get CPUs that are so fast that we don't need special graphics chips.

              Comment


              • #17
                Originally posted by Veerappan View Post
                Yeah. Case in point would be AMD. Their r600/r700 chips were mostly 5-wide vector units, but the Cayman chips have moved to 4-wide vector units. The next architecture is supposedly going to be SIMD-based, which will lead to entirely different optimization strategies (possibly similar to Fermi, but we'll see).
                One point I don't see mentioned much - current architectures are VLIW *and* SIMD.

                The SIMDs are 16-wide on high end chips and 4- or 8-wide on lower end chips.

                Comment


                • #18
                  Originally posted by bridgman View Post
                  One point I don't see mentioned much - current architectures are VLIW *and* SIMD.

                  The SIMDs are 16-wide on high end chips and 4- or 8-wide on lower end chips.
                  Thanks for the clarification. The VLIW part is going away in the next architecture, right? At least that's what the recent presentations that I've read (e.g. Anandtech) have all stated.
                  Last edited by Veerappan; 22 June 2011, 05:22 PM.

                  Comment


                  • #19
                    Originally posted by plonoma View Post
                    It would be wonderful if Linux could have a FLOSS OpenCL implementation for CPUs.

                    Hopefully this project could become part of the kernel in the future?

                    Really looking forward to being able to use OpenCL on Linux.
                    Seoul National University / Samsung OpenCL run-time:


                    License: LGPL v3
                    Supports: ARM, DSPs (TI) and Cell SPUs

                    It uses LLVM and Clang, and they have future plans for x86 support. It builds on x86_64, but I haven't gotten more than a simple hello world program to link, and the hello world program explicitly tells me that my CPU model (Phenom II X6 1055T) is currently unsupported.

                    I'm not saying that it's feature complete or that it's perfect, but I've heard from people who've used it on ARM and it does the trick. Given that it's LGPL, I don't see any license issues with using it on Linux.

                    I don't see it going into the kernel (it is something that should probably remain in user-space as a library), but it might be something that could be included in distributions in the future after some further testing/porting.

                    Comment
