Really happy to see AMD cards performing so well.
20-Way NVIDIA/AMD GPU Darktable OpenCL Photography Performance
-
Originally posted by twoertwein:
It would be nice to have a comparison of some Intel CPUs vs. their GPUs with Beignet.
Code:
GPU: [dev_process_export] pixel pipeline processing took 64,495 secs (14,497 CPU)
CPU: [dev_process_export] pixel pipeline processing took 43,410 secs (168,420 CPU)
model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09)
Linux arch-laptop 4.8.10-1-ARCH #1 SMP PREEMPT Mon Nov 21 11:55:43 CET 2016 x86_64 GNU/Linux
Last edited by darkbasic; 28 November 2016, 01:04 PM.

## VGA ##
AMD: X1950XTX, HD3870, HD5870
Intel: GMA45, HD3000 (Core i5 2500K)
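The two timing lines in the log can be compared directly. The Python sketch below parses them, assuming the comma in "64,495" is a locale decimal separator (64.495 s) and that the number in parentheses is the CPU time darktable prints next to the wall-clock time. `PERF_RE` and `parse_perf` are hypothetical helper names, not part of darktable.

```python
import re
from typing import Tuple

# Assumed line shape, copied from the log above. The numbers are taken to use
# a comma as the decimal separator (locale-dependent): "64,495" -> 64.495.
PERF_RE = re.compile(r"processing took ([\d,]+) secs \(([\d,]+) CPU\)")


def parse_perf(line: str) -> Tuple[float, float]:
    """Return (wall_seconds, cpu_seconds) from a pipeline timing line."""
    m = PERF_RE.search(line)
    if m is None:
        raise ValueError(f"not a timing line: {line!r}")
    wall, cpu = (float(g.replace(",", ".")) for g in m.groups())
    return wall, cpu


gpu_wall, gpu_cpu = parse_perf(
    "[dev_process_export] pixel pipeline processing took 64,495 secs (14,497 CPU)")
cpu_wall, cpu_cpu = parse_perf(
    "[dev_process_export] pixel pipeline processing took 43,410 secs (168,420 CPU)")

# Ratio > 1 means the OpenCL (GPU) export was slower than the CPU-only export.
print(f"GPU export takes {gpu_wall / cpu_wall:.2f}x the CPU export wall time")
# CPU time / wall time = average number of logical CPUs kept busy.
print(f"CPU run kept ~{cpu_cpu / cpu_wall:.1f} logical CPUs busy on average")
```

On these numbers the GPU export takes roughly 1.5x as long as the CPU export, and the CPU-only run keeps close to all four logical CPUs (the four "model name" lines) busy.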
-
Originally posted by starshipeleven:
Lol, faster on CPU... I hope it's just because the OpenCL driver for Intel is immature.
These Intel CPUs and IGPs share the same memory bandwidth, so there is nothing to gain on that side. The desktop Core i5/i7 have four cores with AVX vector units, which are very capable. Those same chips have entry-level IGPs.
Take a laptop chip with an Iris IGP and the story is different: it has half the CPU cores and twice the GPU cores, i.e. 4x more GPU processing power relative to the CPU. It all makes sense.
-
Originally posted by kieffer:
These Intel CPUs and IGPs share the same memory bandwidth, so there is nothing to gain on that side. The desktop Core i5/i7 have four cores with AVX vector units, which are very capable. Those same chips have entry-level IGPs.
Also, it is a GPU, a coprocessor designed for purely parallel loads like rendering. I don't know what kind of load it gets with OpenCL, but I thought OpenCL was meant for parallel loads, since it targets GPUs.
Take a laptop chip with an Iris IGP and the story is different: it has half the CPU cores and twice the GPU cores, i.e. 4x more GPU processing power relative to the CPU. It all makes sense.
-
A few figures for the Skylake IGPs:
* HD Graphics 510: 12 execution units
* HD Graphics 515/520/530: 24 execution units
* Iris Graphics 540/550: 48 execution units + 64 MB L4 cache
* Iris Pro Graphics 580: 72 execution units + 64/128 MB L4 cache
So yes, I'll say it again: the 520 is an entry-level IGP, and it shares the same memory bandwidth with its 4 associated CPU cores.
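The execution-unit counts can be turned into rough peak FP32 throughput. The sketch below assumes 16 FLOPs per clock per Gen9 EU (two SIMD-4 FPUs, each doing a fused multiply-add) and 32 FLOPs per clock per AVX2 core (two 8-wide FMA units); the 1.0 GHz GPU clock and 3.0 GHz CPU clock are hypothetical round numbers, not the specs of any particular chip.

```python
# Assumed FP32 throughput per clock, counting a fused multiply-add as 2 FLOPs.
FLOPS_PER_EU_PER_CLOCK = 16    # Gen9 EU: 2 SIMD-4 FPUs x FMA
FLOPS_PER_CORE_PER_CLOCK = 32  # AVX2 core: 2 x 8-wide FMA units


def peak_gflops(units: int, flops_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPS: units x FLOPs per clock x clock (GHz)."""
    return units * flops_per_clock * clock_ghz


# EU counts from the list above; the clocks are hypothetical round numbers.
skylake_igps = {
    "HD Graphics 510": 12,
    "HD Graphics 515/520/530": 24,
    "Iris Graphics 540/550": 48,
    "Iris Pro Graphics 580": 72,
}
for name, eus in skylake_igps.items():
    print(f"{name}: ~{peak_gflops(eus, FLOPS_PER_EU_PER_CLOCK, 1.0):.0f} GFLOPS @ 1.0 GHz")

print(f"4 AVX2 cores: ~{peak_gflops(4, FLOPS_PER_CORE_PER_CLOCK, 3.0):.0f} GFLOPS @ 3.0 GHz")
```

Under these assumptions a 24-EU part and four AVX2 cores land at the same theoretical peak, while the 48- and 72-EU Iris parts pull clearly ahead. Real clocks and achieved efficiency will shift the numbers, but not the rough proportions.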
I tested a Core i7 67xx and can confirm that the performance measured on the CPU is the same as on the GPU (for the OpenCL application I am developing).
By contrast, the Iris in a MacBook Pro (for example) outperforms its 2 CPU cores.
Most OpenCL kernels (mine included) have been written with discrete GPUs in mind. Those GPUs have around 300 GB/s of memory bandwidth (sorry, the Titan X is my usual target), so with only 30 GB/s the kernels may not run under optimal conditions.
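To put the 300 GB/s vs. 30 GB/s gap in numbers, here is a back-of-the-envelope Python sketch of the minimum time a purely memory-bound kernel needs for one pass over its data. The workload (a 24-megapixel image held as four float32 channels, read once and written once) is a hypothetical example, not a figure from this thread, and real kernels never reach peak bandwidth.

```python
def min_pass_time_ms(num_bytes: float, bandwidth_gb_s: float) -> float:
    """Lower bound (ms) to move num_bytes at the given bandwidth (1 GB/s = 1e9 B/s)."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3


# Hypothetical workload: 24 MP image, 4 float32 channels, one read + one write.
pixels = 24e6
bytes_per_pixel = 4 * 4                  # 4 channels x 4 bytes
traffic = 2 * pixels * bytes_per_pixel   # read + write, in bytes

for label, bw in [("shared IGP/CPU memory, ~30 GB/s", 30.0),
                  ("discrete GPU, ~300 GB/s", 300.0)]:
    print(f"{label}: >= {min_pass_time_ms(traffic, bw):.1f} ms per pass")
```

Because the bound scales inversely with bandwidth, a kernel tuned to be bandwidth-limited on a 300 GB/s card is at least 10x slower per pass on 30 GB/s shared memory, no matter how many execution units the IGP has.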
-
Originally posted by kieffer:
A few figures for the Skylake IGPs:
* HD Graphics 510: 12 execution units
* HD Graphics 515/520/530: 24 execution units
* Iris Graphics 540/550: 48 execution units + 64 MB L4 cache
* Iris Pro Graphics 580: 72 execution units + 64/128 MB L4 cache
So yes, I'll say it again: the 520 is an entry-level IGP, and it shares the same memory bandwidth with its 4 associated CPU cores.
The 520 and friends are midrange; Iris is high end.
And all iGPUs share bandwidth with the CPU (the CPU can also use the L4 cache).
I tested a Core i7 67xx and can confirm that the performance measured on the CPU is the same as on the GPU (for the OpenCL application I am developing).
By contrast, the Iris in a MacBook Pro (for example) outperforms its 2 CPU cores.
Most OpenCL kernels (mine included) have been written with discrete GPUs in mind. Those GPUs have around 300 GB/s of memory bandwidth (sorry, the Titan X is my usual target), so with only 30 GB/s the kernels may not run under optimal conditions.
-
Originally posted by starshipeleven:
This only shows the relative power of the GPUs with the current driver; it does not tell anything about the driver's quality.
* On Linux, the GPU driver is Beignet, and the CPU drivers are AMD, Intel and pocl.
* On OS X, both drivers are labelled Apple.
On the CPU, Intel's driver on Linux outperforms OS X (Apple's driver is really bad on the CPU, and many features are not available).
AMD has decent performance and is also the most robust on the CPU. pocl is the slowest on Linux but still slightly faster than Apple's implementation.
On the GPU, Beignet beats Apple's implementation again, but not by much. This lets me state that Beignet is getting most of the juice out of the Gen7.5 GPU.
My benchmark is (basically) a map operation for 1/3 of the time and a sparse-matrix dense-vector multiplication using a CSR representation for the other 2/3 of the time, as described in this article: https://arxiv.org/pdf/1412.6367v1.pdf
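For readers unfamiliar with the second kernel, here is a minimal sequential Python sketch of a CSR sparse-matrix dense-vector product, preceded by a simple element-wise map. It is not the poster's OpenCL code: `map_op` and `csr_spmv` are hypothetical names, and the map step is assumed to be a plain per-element function. On a GPU, a common approach is to hand each row of the outer loop to its own work-item or work-group.

```python
from typing import Callable, List, Sequence


def map_op(f: Callable[[float], float], xs: Sequence[float]) -> List[float]:
    """Element-wise map: each output depends on one input only (trivially parallel)."""
    return [f(v) for v in xs]


def csr_spmv(data: Sequence[float], indices: Sequence[int],
             indptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x, with A in CSR form and x a dense vector.

    data    -- nonzero values, stored row by row
    indices -- column index of each entry in data
    indptr  -- row r's nonzeros are data[indptr[r]:indptr[r + 1]]
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # rows are independent of each other
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Gather from x through the column index: irregular memory access
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
indices = [0, 2, 2, 0, 1]
indptr = [0, 2, 3, 5]

x = map_op(lambda v: 2.0 * v, [0.5, 0.5, 0.5])  # hypothetical map step
y = csr_spmv(data, indices, indptr, x)
```

Each nonzero costs one multiply-add but needs a value, a column index and an element of `x` from memory, which is why this kind of kernel tends to be limited by memory bandwidth rather than by compute.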