Hmm, also interesting: I got a 31011133.23 average for the mandelGPU test. The test was not running at 1920x1080, though, but at the default resolution, so that may account for the difference.
If the resolution isn't important though, as it sometimes isn't, then I'm getting about 1.5x the performance of the GTX 460 with the HD5970. This is more in line with what I was expecting.
Looking At The OpenCL Performance Of ATI & NVIDIA On Linux
-
Hmm. On my HD5970 for SmallPT 1.6 GPU Caustic3, I'm getting 45200 KSamples/sec on the GPU, and ~16000 KSamples/sec on my Core i7 920. Neither part is overclocked; they're at their factory default clock rates.
The GPU number is lower than either of Michael's Radeons, but still a ways faster than Michael's GT 240. The numbers seem unaffected by whether compiz is on. I find it hard to accept that an HD5970 gets poorer results than a 5770. Even if an HD5970 is two 5850 cores together, shouldn't even one of those cores single-handedly outperform a 5770? And wouldn't OpenCL have the smarts to use both cores automatically to make it nearly twice as fast?
I noticed something funky about the tests, though. When the test is running, the on-screen output says something like 52000 KSamples/sec at the bottom. This is substantially larger than the 45000 KSamples/sec reported by PTS in its output. I'm not sure why there is such a large discrepancy. Bug in PTS? Bug in the test?
Either way, it seems (disappointingly) that an HD5970 is only about 3 times faster at this test than a Core i7. It is probably more economical to use a bunch of CPUs than to use GPUs for this kind of workload, seeing how a Core i7 is much cheaper than a dual-GPU HD5970. We already know from other tests that a GPU is many, many, many times faster than the CPU at OpenGL 3D rendering, so maybe the parts needed for GPGPU are kept to a modest level on Evergreen in order to support top-of-the-line 3D graphics. I'm not complaining, since I don't use GPGPU for anything other than PTS.
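As a quick sanity check on the figures quoted in this post, here is a minimal sketch that works out the two ratios. It uses only the numbers given above; the variable names are made up for illustration, and the CPU and on-screen figures are the approximate values as quoted.

```python
# Figures quoted in the post, in KSamples/sec.
gpu_pts = 45200       # HD5970 result as reported by PTS
cpu_pts = 16000       # Core i7 920 result (quoted as "~16000")
gpu_onscreen = 52000  # approximate figure shown in the test's own window

# How much faster the GPU is than the CPU on this test.
speedup = gpu_pts / cpu_pts

# Relative gap between the on-screen figure and the PTS figure.
gap = (gpu_onscreen - gpu_pts) / gpu_pts

print(f"GPU vs CPU speedup: {speedup:.2f}x")
print(f"On-screen vs PTS gap: {gap:.1%}")
```

So "about 3 times" is a slight round-up of a ratio a little under 3, and the on-screen number is roughly 15% higher than what PTS records.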
-
MandelGPU might suffer from something similar (redefining __constant to __global), but I haven't checked.
-
Michael, please keep in mind that SmallPtGPU contains a bug/incompatibility that seriously limits performance on NVidia hardware, especially pre-Fermi.
Here's a diff that fixes it. This improves performance more than ten-fold on G80/GT200.
-
Great benchmark!
Thank you Michael!
It would be nice to compare it with some CPUs, so we can actually see whether low-end cards make sense for this kind of processing work.
I'm not sure which OpenCL CPU implementation is efficient and uses SSE2 etc. Maybe Intel has such an implementation of the OpenCL compiler, or LLVM has some OpenCL frontend/parser.
-
do any of those benchmarks use double precision floating point?
(and as always: it would be nice if you could put error bars on the plots)
-
Originally posted by kernelOfTruth:
ah - thanks for the clarification !
it's just that the results look rather in favor of Nvidia
do you have a 5830 available (afaik that's a cypress LE) ?
-
Originally posted by Michael:
No access to such hardware...
it's just that the results look rather in favor of Nvidia
do you have a 5830 available (afaik that's a cypress LE) ?
-
What about the FirePros? It could be fun to see whether they are that much faster in OpenCL than the consumer cards.