20-Way NVIDIA/AMD GPU Darktable OpenCL Photography Performance

  • kieffer
    replied
    Originally posted by starshipeleven
    This only shows the relative power of the GPUs with the current driver; it does not tell us anything about the driver's quality.
    On a 13" MacBook Pro with a Core i5-4308U, I compared OpenCL on macOS vs Linux.
    * On Linux, the GPU driver is Beignet; the CPU drivers are AMD, Intel and pocl.
    * On OS X, the CPU and GPU drivers are both labelled Apple.

    On the CPU, the Intel driver on Linux outperforms OS X (Apple's driver is really bad on CPU; moreover, many features are not available).
    AMD has decent performance and is also the most robust on the CPU. Pocl is the slowest on Linux but still slightly faster than Apple's implementation.

    On the GPU, Beignet beats Apple's implementation again, but not by much. This lets me state that Beignet is getting most of the juice out of the Gen7.5 GPU.

    My benchmark is (basically) a map operation for 1/3 of the time and a sparse-matrix dense-vector multiplication using a CSR representation for the other 2/3 of the time, as described in this article: https://arxiv.org/pdf/1412.6367v1.pdf
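For readers unfamiliar with the CSR layout mentioned above, here is a minimal pure-Python sketch of a sparse-matrix times dense-vector product. It is not the benchmark kernel from the post or the linked article; the function name `csr_matvec` and the small 3x3 matrix are hypothetical and chosen only for illustration.

```python
def csr_matvec(data, indices, indptr, x):
    """Multiply a CSR-encoded sparse matrix by a dense vector x.

    data    : non-zero values, stored row by row
    indices : column index of each value in `data`
    indptr  : indptr[r]..indptr[r+1] delimits the entries of row r
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored (non-zero) entries of this row are visited.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]

y = csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0])
print(y)
```

Each output row reads `x` at scattered column positions, so on most hardware this kind of kernel tends to be limited by memory access rather than by arithmetic.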




  • starshipeleven
    replied
    Originally posted by kieffer
    A few figures for the IGPs of Skylake:
    * HD Graphics 510: 12 units
    * HD Graphics 515/520/530: 24 units
    * Iris Graphics 540/550: 48 units + 64 MB L4 cache
    * Iris Pro Graphics 580: 72 units + 64 / 128 MB L4 cache.

    So yes, I dare to write again that the 520 is an entry-level IGP, which shares the same bandwidth with the 4 associated CPU cores.
    I know that iGPUs are all garbage compared with a dedicated GPU, but the entry-level iGPU is the 510, or Intel HD Graphics in older chips.

    The 520 and friends are midrange; Iris is high end.

    And all iGPUs share bandwidth with the CPU (the CPU can also use the L4 cache).

    I compared on a Core i7 67xx and confirm that the performance measured on the CPU is the same as on the GPU (for the OpenCL application I am developing).
    Conversely, the Iris on a MacBook Pro (for example) outperforms the 2 CPU cores.
    This only shows the relative power of the GPUs with the current driver; it does not tell us anything about the driver's quality.

    Most OpenCL kernels have been written with discrete GPUs in mind (including mine). Those GPUs have 300 GB/s of bandwidth (sorry, the Titan X is my usual target), so with only 30 GB/s the kernels may not be running in optimal conditions.
    Yeah, I'm not doubting that most OCL applications target decent GPUs; I'm doubting that the current Intel driver is mature enough to fully utilize the hardware.



  • kieffer
    replied
    A few figures for the IGPs of Skylake:
    * HD Graphics 510: 12 units
    * HD Graphics 515/520/530: 24 units
    * Iris Graphics 540/550: 48 units + 64 MB L4 cache
    * Iris Pro Graphics 580: 72 units + 64 / 128 MB L4 cache.

    So yes, I dare to write again that the 520 is an entry-level IGP, which shares the same bandwidth with the 4 associated CPU cores.
    I compared on a Core i7 67xx and confirm that the performance measured on the CPU is the same as on the GPU (for the OpenCL application I am developing).
    Conversely, the Iris on a MacBook Pro (for example) outperforms the 2 CPU cores.

    Most OpenCL kernels have been written with discrete GPUs in mind (including mine). Those GPUs have 300 GB/s of bandwidth (sorry, the Titan X is my usual target), so with only 30 GB/s the kernels may not be running in optimal conditions.
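A back-of-the-envelope sketch of the bandwidth argument above, assuming a purely memory-bound kernel whose run time is bounded below by the bytes moved divided by the memory bandwidth. The function name and the 1 GB workload are hypothetical, picked only for illustration; the 300 GB/s and 30 GB/s figures are the ones quoted in the post.

```python
def min_transfer_time_s(bytes_moved, bandwidth_gb_per_s):
    """Lower bound on run time (seconds) for a memory-bound kernel.

    Assumes the kernel does nothing but stream `bytes_moved` bytes
    through memory at the full quoted bandwidth (an idealization).
    """
    return bytes_moved / (bandwidth_gb_per_s * 1e9)


workload_bytes = 1e9  # hypothetical workload: 1 GB of memory traffic

discrete = min_transfer_time_s(workload_bytes, 300.0)   # discrete GPU figure from the post
integrated = min_transfer_time_s(workload_bytes, 30.0)  # shared-memory figure from the post

slowdown = integrated / discrete
print(f"discrete: {discrete * 1e3:.2f} ms, integrated: {integrated * 1e3:.2f} ms")
print(f"bandwidth-bound slowdown: {slowdown:.1f}x")
```

Under this idealization the integrated part cannot do better than one tenth of the discrete card's throughput, regardless of driver quality; real kernels also depend on caches and compute, which this estimate ignores.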



  • starshipeleven
    replied
    Originally posted by kieffer
    Those Intel CPUs and IGPs share the same bandwidth, so there is nothing to win on this side. The desktop Core i5/i7 have 4x AVX vector operations, which are very capable. Those same chips have an entry-level IGP.
    Entry level my ass; Intel HD Graphics is the entry level, and it's in Pentiums/Celerons.

    Also it is a GPU, a coprocessor that is designed for purely parallel loads like rendering. I don't know what loads it gets with OpenCL, but I thought it was meant for parallel loads as it targets GPUs.

    Take a laptop CPU with an Iris IGP and the story is different, as it has half the CPU cores and twice the GPU cores, i.e. 4x more GPU processing power relative to the CPU. Everything is logical.
    This still does not tell me why a relatively decent iGPU is getting beaten by a CPU on a load where the GPU should do better by design.



  • kieffer
    replied
    Originally posted by starshipeleven
    Lol, faster on CPU.... I hope it's just because the OpenCL driver for Intel is immature.
    I don't agree about the lack of maturity of Beignet.

    Those Intel CPUs and IGPs share the same bandwidth, so there is nothing to win on this side. The desktop Core i5/i7 have 4x AVX vector operations, which are very capable. Those same chips have an entry-level IGP.
    Take a laptop CPU with an Iris IGP and the story is different, as it has half the CPU cores and twice the GPU cores, i.e. 4x more GPU processing power relative to the CPU. Everything is logical.



  • starshipeleven
    replied
    Originally posted by darkbasic
    This is the "boat" one on a Broadwell laptop with latest Beignet:
    Lol, faster on CPU.... I hope it's just because the OpenCL driver for Intel is immature.



  • twoertwein
    replied
    darkbasic thanks! I have a similar experience on my HD 4000. Maybe it is faster on Intel's Iris graphics with L4 cache?

    Tuxee, if OpenCL is working in other applications, you might want to open an issue with darktable.



  • darkbasic
    replied
    Originally posted by twoertwein
    it would be nice to have a comparison of some Intel CPUs vs. their GPUs with beignet
    This is the "boat" one on a Broadwell laptop with latest Beignet:

    Code:
    *GPU*: [dev_process_export] pixel pipeline processing took 64,495 secs (14,497 CPU)
    
    *CPU*: [dev_process_export] pixel pipeline processing took 43,410 secs (168,420 CPU)
    
    model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    
    00:02.0 VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09)
    
    Linux arch-laptop 4.8.10-1-ARCH #1 SMP PREEMPT Mon Nov 21 11:55:43 CET 2016 x86_64 GNU/Linux
    Last edited by darkbasic; 28 November 2016, 01:04 PM.
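To put a number on the two timings above, here is a small sketch that parses the wall-clock figure out of each `[dev_process_export]` line and compares them. It assumes the comma is a decimal separator (so `64,495` means 64.495 seconds); the helper name `wall_seconds` is hypothetical, and the regex is only meant to match lines shaped like the two quoted here.

```python
import re

# Captures the integer and fractional parts of the first number after "took",
# skipping any non-digit decoration in between; accepts "," or "." as separator.
TOOK_RE = re.compile(r"took\D*(\d+)[.,](\d+)")


def wall_seconds(log_line):
    """Return the wall-clock seconds reported in a darktable export line."""
    match = TOOK_RE.search(log_line)
    if match is None:
        raise ValueError(f"no timing found in: {log_line!r}")
    whole, frac = match.groups()
    return float(f"{whole}.{frac}")


gpu_line = "[dev_process_export] pixel pipeline processing took 64,495 secs (14,497 CPU)"
cpu_line = "[dev_process_export] pixel pipeline processing took 43,410 secs (168,420 CPU)"

gpu_s = wall_seconds(gpu_line)
cpu_s = wall_seconds(cpu_line)
ratio = gpu_s / cpu_s
print(f"GPU path: {gpu_s} s, CPU path: {cpu_s} s, GPU/CPU = {ratio:.2f}")
```

On these figures the OpenCL path on the HD Graphics 5500 took roughly one and a half times as long as the CPU-only run.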



  • boxerab
    replied
    Really happy to see AMD cards performing so well.



  • Tuxee
    replied
    Originally posted by twoertwein

    strange, why should darktable care about the type of GPU? As far as I know, darktable tests whether the GPU has enough memory and whether it is fast enough. Based on that it may disable OpenCL support. I have an old Intel HD 4000, which is slower than the CPU
    That's the output:

    [opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
    unknown GPU generation
    [opencl_init] found 1 platform
    [opencl_init] could not get device id size: -1
    [opencl_init] found 0 device
    [opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
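If the `-1` in the log above is the raw OpenCL status code (an assumption; the log itself does not say so), it corresponds to `CL_DEVICE_NOT_FOUND` in the OpenCL headers, which would fit the "found 0 device" line that follows. A minimal lookup sketch, with a hypothetical helper name and only a few of the standard codes listed:

```python
# A few status codes as defined in the OpenCL headers (CL/cl.h).
CL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
}


def describe_cl_status(code):
    """Map a numeric OpenCL status to its symbolic name (hypothetical helper)."""
    return CL_STATUS_NAMES.get(code, f"unknown status {code}")


print(describe_cl_status(-1))
```

A platform that loads but reports no usable device usually points at the driver or ICD setup rather than at darktable itself, which is worth ruling out before filing an issue.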

