20-Way NVIDIA/AMD GPU Darktable OpenCL Photography Performance

  • #21
    Really happy to see AMD cards performing so well.


    • #22
      Originally posted by twoertwein
      it would be nice to have a comparison of some Intel CPUs vs. their GPUs with beignet
      This is the "boat" one on a Broadwell laptop with latest Beignet:

      Code:
      *GPU*: [dev_process_export] pixel pipeline processing took *64,495* secs (14,497 CPU)
      
      *CPU*: [dev_process_export] pixel pipeline processing took *43,410* secs (168,420 CPU)
      
      model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
      model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
      model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
      model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
      
      00:02.0 VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09)
      
      Linux arch-laptop 4.8.10-1-ARCH #1 SMP PREEMPT Mon Nov 21 11:55:43 CET 2016 x86_64 GNU/Linux
      Last edited by darkbasic; 11-28-2016, 01:04 PM.
      ## VGA ##
      AMD: X1950XTX, HD3870, HD5870
      Intel: GMA45, HD3000 (Core i5 2500K)


      • #23
        darkbasic, thanks! I have a similar experience on my HD 4000. Maybe it is faster on Intel's Iris graphics with L4 cache?

        Tuxee, if OpenCL is working in other applications, you might want to open an issue with darktable.


        • #24
          Originally posted by darkbasic
          This is the "boat" one on a Broadwell laptop with latest Beignet:
          Lol, faster on CPU.... I hope it's just because the OpenCL driver for Intel is immature.
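
To put a number on "faster on CPU", here is a minimal sketch that parses the two timing lines quoted from darkbasic's post and computes the ratio. The helper name `parse_timing` and the regular expression are my own assumptions, not part of darktable; the pattern only covers the log format shown in that post, including the locale-specific comma as decimal separator and the forum's asterisk bold markers.

```python
import re

# Log lines copied from the post above. The asterisks are forum bold markers
# and the comma is a locale-specific decimal separator.
GPU_LINE = "*GPU*: [dev_process_export] pixel pipeline processing took *64,495* secs (14,497 CPU)"
CPU_LINE = "*CPU*: [dev_process_export] pixel pipeline processing took *43,410* secs (168,420 CPU)"

# Hypothetical pattern for this one log format; not taken from darktable itself.
_TIMING = re.compile(r"took \*?(\d+[.,]\d+)\*? secs \((\d+[.,]\d+) CPU\)")


def parse_timing(line):
    """Return (wall_seconds, cpu_seconds) from one pipeline timing line."""
    match = _TIMING.search(line)
    if match is None:
        raise ValueError(f"no timing found in: {line!r}")
    wall, cpu = (float(value.replace(",", ".")) for value in match.groups())
    return wall, cpu


gpu_wall, gpu_cpu = parse_timing(GPU_LINE)
cpu_wall, cpu_cpu = parse_timing(CPU_LINE)

# Ratio > 1 means the OpenCL (GPU) export took longer than the CPU-only export.
slowdown = gpu_wall / cpu_wall
# CPU time divided by wall time approximates how many threads were kept busy.
busy_threads = cpu_cpu / cpu_wall

print(f"OpenCL path: {gpu_wall:.3f} s, CPU path: {cpu_wall:.3f} s, ratio {slowdown:.2f}x")
print(f"CPU-only run kept about {busy_threads:.1f} threads busy")
```

So the OpenCL export on the HD 5500 took roughly 1.5 times as long as the CPU-only export, and the CPU-only run kept close to all four hardware threads of the i7-5600U busy.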


          • #25
            Originally posted by starshipeleven
            Lol, faster on CPU.... I hope it's just because the OpenCL driver for Intel is immature.
            I don't agree about the lack of maturity of Beignet.

            Those Intel CPUs and IGPs share the same memory bandwidth, so there is nothing to gain on that side. The desktop Core i5/i7 have 4x AVX vector operations, which are very capable. Those same chips have an entry-level IGP.
            Take a laptop chip with an Iris IGP and the story is different, as they have half the CPU cores and twice the GPU cores, i.e. 4x more GPU processing power relative to the CPU. It all makes sense.


            • #26
              Originally posted by kieffer
              Those Intel CPUs and IGPs share the same memory bandwidth, so there is nothing to gain on that side. The desktop Core i5/i7 have 4x AVX vector operations, which are very capable. Those same chips have an entry-level IGP.
              Entry level my ass, the Intel HD Graphics is entry level, and it's in Pentiums/Celerons.

              Also, it is a GPU, a coprocessor that is designed for purely parallel loads like rendering. I don't know what loads it gets with OpenCL, but I thought OpenCL was meant for parallel loads, as it targets GPUs.

              Take a laptop chip with an Iris IGP and the story is different, as they have half the CPU cores and twice the GPU cores, i.e. 4x more GPU processing power relative to the CPU. It all makes sense.
              This still does not tell me why a relatively decent iGPU is getting beaten by a CPU on a load where the GPU should do better by design.


              • #27
                A few figures for the IGPs of Skylake:
                * HD Graphics 510: 12 units
                * HD Graphics 515/520/530: 24 units
                * Iris Graphics 540/550: 48 units + 64 MB L4 cache
                * Iris Pro Graphics 580: 72 units + 64/128 MB L4 cache

                So yes, I dare say again that the 520 is an entry-level IGP, which shares the same memory bandwidth with the 4 associated CPU cores.
                I compared on a Core i7 67xx and can confirm that the performance measured on the CPU is the same as on the GPU (for the OpenCL application I am developing).
                Conversely, the Iris in a MacBook Pro (for example) outperforms the 2 CPU cores.

                Most OpenCL kernels have been written with discrete GPUs in mind (including mine). Those GPUs have 300 GB/s of memory bandwidth (sorry, the Titan X is my usual target), so with only 30 GB/s on an IGP the kernels may not be running in optimal conditions.
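
The bandwidth argument can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a kernel that is limited purely by memory traffic; the 300 GB/s and 30 GB/s figures are the ones from this post, while the 24-megapixel, 4-channel float32 image and the function name `min_transfer_time` are illustrative assumptions, not measurements.

```python
def min_transfer_time(bytes_moved, bandwidth_gb_per_s):
    """Lower bound on runtime in seconds for a purely memory-bound kernel."""
    return bytes_moved / (bandwidth_gb_per_s * 1e9)


# Assumed workload (not from the benchmark): a 24-megapixel image with
# 4 float32 channels per pixel, read once and written once.
pixels = 24_000_000
bytes_per_pixel = 4 * 4          # 4 channels x 4 bytes
bytes_moved = pixels * bytes_per_pixel * 2  # one read plus one write

discrete = min_transfer_time(bytes_moved, 300.0)   # discrete GPU figure from the post
integrated = min_transfer_time(bytes_moved, 30.0)  # IGP figure from the post

print(f"discrete GPU:   at least {discrete * 1e3:.2f} ms per pass")
print(f"integrated GPU: at least {integrated * 1e3:.2f} ms per pass")
print(f"ratio: {integrated / discrete:.0f}x")
```

Under these assumptions every pass over the image costs at least ten times more on the IGP, and that bandwidth is also shared with the CPU cores, so a kernel tuned for a discrete card has little room to pull ahead of the CPU.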


                • #28
                  Originally posted by kieffer
                  A few figures for the IGPs of Skylake:
                  * HD Graphics 510: 12 units
                  * HD Graphics 515/520/530: 24 units
                  * Iris Graphics 540/550: 48 units + 64 MB L4 cache
                  * Iris Pro Graphics 580: 72 units + 64/128 MB L4 cache

                  So yes, I dare say again that the 520 is an entry-level IGP, which shares the same memory bandwidth with the 4 associated CPU cores.
                  I know that iGPUs are all garbage compared with a dedicated GPU, but the entry-level iGPU is the 510, or Intel HD Graphics in older chips.

                  The 520 and friends are midrange, Iris is high end.

                  And all iGPUs share bandwidth with the CPU (the CPU can also use the L4 cache).

                  I compared on a Core i7 67xx and can confirm that the performance measured on the CPU is the same as on the GPU (for the OpenCL application I am developing).
                  Conversely, the Iris in a MacBook Pro (for example) outperforms the 2 CPU cores.
                  This only shows the relative power of the GPUs with the current driver; it does not tell us anything about the driver's quality.

                  Most OpenCL kernels have been written with discrete GPUs in mind (including mine). Those GPUs have 300 GB/s of memory bandwidth (sorry, the Titan X is my usual target), so with only 30 GB/s on an IGP the kernels may not be running in optimal conditions.
                  Yeah, I'm not doubting that most OCL applications target decent GPUs; I'm doubting that the current Intel driver is mature enough to fully utilize the hardware.


                  • #29
                    Originally posted by starshipeleven
                    This only shows the relative power of the GPUs with the current driver; it does not tell us anything about the driver's quality.
                    On the MacBook Pro 13", which features a Core i5-4308U, I compared OpenCL on OSX vs Linux.
                    * Linux: the GPU driver is Beignet; the CPU drivers are AMD, Intel and pocl.
                    * OSX: both the CPU and GPU drivers are labelled Apple.

                    On CPU, the Intel driver on Linux outperforms OSX (Apple's drivers are really bad on CPU; moreover, many features are not available).
                    AMD has decent performance and is also the most robust on the CPU. Pocl is the slowest on Linux but still slightly faster than Apple's implementation.

                    On GPU, Beignet beats Apple's implementation again, but not by much. This allows me to state that Beignet is getting most of the juice out of the Gen7.5 GPU.

                    My benchmark is (basically) a map operation for 1/3 of the time and a sparse-matrix-dense-vector multiplication using a CSR representation for the other 2/3 of the time, as described in this article: https://arxiv.org/pdf/1412.6367v1.pdf
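
For readers unfamiliar with CSR, here is a minimal pure-Python sketch of a sparse-matrix-dense-vector product in that representation. It is a generic textbook version written for illustration, not the OpenCL kernel from the article; the function name `csr_matvec` and the small example matrix are assumptions.

```python
def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    data    : non-zero values, row by row
    indices : column index of each non-zero value
    indptr  : indptr[r]..indptr[r + 1] delimits the non-zeros of row r
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Indirect read of x: this scattered access pattern is what
            # tends to make the operation bound by memory bandwidth.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Example 3x4 matrix (illustrative only):
# [[1, 0, 2, 0],
#  [0, 0, 3, 0],
#  [4, 5, 0, 6]]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
indices = [0, 2, 2, 0, 1, 3]
indptr = [0, 2, 3, 6]
x = [1.0, 2.0, 3.0, 4.0]

y = csr_matvec(data, indices, indptr, x)
print(y)
```

Each output row is independent of the others, which is why the outer loop maps naturally onto parallel work-items on a GPU, while the inner loop does very little arithmetic per byte fetched.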

