NVIDIA vs. AMD OpenCL Linux Benchmarks With Darktable 2.2
-
Are these tests only FP32? If so, FP64 benchmarks would be interesting. I expect a reversed situation there: Kepler is not bad at FP64, and AMD is usually much better. Thanks for these tests though, they are really interesting. Let's wait now for AMD's new line coming in 2017.
-
Originally posted by dungeon View Post
He is using Debian, where amdgpu-pro isn't available and the open-source driver's OpenCL is unusable... so some of the focus has shifted around a zero point.
And the benchmarking doesn't show that, so in practice he gets nothing from AMD there. From that point of view, what he is missing isn't an answer... so he misses nothing, double or even triple nothing.
-
If OpenCL were a priority for me, AMD would seem like a great choice. The RX 470 is pretty competitive, and you could buy three of those for the price of a single GTX 1080. Depending on what and how you intend to process your data, you wouldn't have to worry about anything like CrossFire, so you'd get a pretty fast system.
-
Originally posted by L_A_G View Post
The GTX 680, 760, 780 Ti, 950, 960, 970 and 980 all finishing within the margin of error in the Masskrug test is pretty intriguing. What's going on there? It looks like they're all limited by some hardware resource other than the traditional available compute units or memory bandwidth.
My guess would be that threads are stalling because the special function units (which in CUDA-based GPUs are separate from the CUDA cores and much fewer in number) are being used to capacity, so threads have to wait for them. If that's the case, then there's probably some optimization work that could be done for much improved performance, as a GPU with 2048 cores and a 256-bit memory interface should not perform within the margin of error of a card with 768 of the same cores and a 128-bit memory interface, both running at roughly the same clock rate.
-
Originally posted by sdack View Post
Agreed. There is obviously an issue here, but it is impossible to tell if it is within Darktable, the OpenCL API or the driver. The benchmark is evidence of a problem, but that's about it. Titling it "Nvidia vs. AMD" was premature when a 780 Ti outperforms a 980, and both are being outperformed by a factor of 3 by a 1060.
Last year I had a job at my university where I got to work on optimizing an HPC application using CUDA. After making some good optimizations I found that I was getting roughly the same performance on both a GTX 970 and a 680. I scratched my head for a while until my professor gave me the advice that I should try to avoid using modulo (%) all that much, because it's apparently pretty expensive on GPUs. I had been using modulo for threads to figure out which part of the compute job was their task, and after changing it so that the same effect was achieved with the regular arithmetic operations of division, multiplication and subtraction, I found that the time to do one pretty big job had been reduced by about half on the GTX 970 and by about a quarter on the 680.
Not 100% sure if my assessment of what happened was correct, but my guess would be that I was running up against the maximum capacity of the special function units, as CUDA cores don't seem to be capable of doing modulo in hardware and the compiler doesn't seem to know how to, or just won't, transform those modulo operations into regular arithmetic operations.