The CUDA support isn't down to personal preference or anything like that. The same code is simply about 40% slower with OpenCL than with CUDA on Nvidia cards, and most users don't value open-source ideals enough to justify such a slowdown. Also, CUDA kernels can be precompiled, while OpenCL has to build them at runtime first, which takes about a minute.
As for why OpenCL's feature set is smaller in Cycles: AMD's drivers have been broken in that regard for a few years (and still are, afaik). The Cycles kernel is pretty huge by GPGPU standards, and AMD's driver just fails to compile it and crashes Blender instead. For about a year now, Cycles has worked on AMD thanks to modifications that split the kernel into multiple smaller ones so the driver doesn't crash, but that split code doesn't support some features yet due to the added complexity.
For benchmarking, it's possible to use OpenCL on Nvidia, with both the split and the full kernel (Nvidia's drivers are more stable in that regard). To do so, run Blender with "--debug-value 256", go to the new "Debug" panel in the Render settings and set the OpenCL kernel type to Split or Mega (= full kernel). Then you can enable OpenCL in the User Preferences, just as you would for CUDA.
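If you want to script the launch, here's a minimal Python sketch. It only wraps the "--debug-value 256" flag mentioned above; the helper name `blender_debug_command` is made up for illustration, and it assumes the executable is called `blender` and is on your PATH. The Debug panel and User Preferences steps still have to be done by hand.

```python
import shutil
import subprocess


def blender_debug_command(executable="blender", debug_value=256):
    """Build the argument list for starting Blender with a debug value.

    `executable` is an assumption: adjust it if Blender is not on PATH
    under that name.
    """
    return [executable, "--debug-value", str(debug_value)]


if __name__ == "__main__":
    cmd = blender_debug_command()
    if shutil.which(cmd[0]):
        # Starts Blender; the "Debug" panel should then show up
        # in the Render settings.
        subprocess.run(cmd)
    else:
        print("blender not found on PATH; would run:", " ".join(cmd))
```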
A quick test on my system (GTX 780) takes 13.5 s with OpenCL Split, 12.4 s with OpenCL Mega and 9.3 s with CUDA.
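To put those three numbers in relation to each other, here's a quick back-of-the-envelope calculation in Python. It's a single run on one card, so treat it as a rough indication and not a proper benchmark; the helper name `slowdown_percent` is just for illustration.

```python
# Render times in seconds from the GTX 780 test above.
times = {"OpenCL split": 13.5, "OpenCL Mega": 12.4, "CUDA": 9.3}


def slowdown_percent(t, baseline):
    """How much longer `t` takes than `baseline`, in percent."""
    return (t / baseline - 1.0) * 100.0


# Compare each OpenCL variant against the CUDA time.
slowdowns = {
    name: slowdown_percent(t, times["CUDA"])
    for name, t in times.items()
    if name != "CUDA"
}

for name, pct in slowdowns.items():
    print(f"{name}: {pct:.0f}% slower than CUDA")
# OpenCL split: 45% slower than CUDA
# OpenCL Mega: 33% slower than CUDA
```

So on this card the two OpenCL variants land on either side of the 40% figure.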
Also, the Pascal results are probably pretty bad currently because the work group size etc. isn't specified in the code for Pascal yet, so it just reuses the Maxwell settings. CUDA 8 also produces slower .cubins than 7.5, so if you used a build made with CUDA 7.5 (like the official releases) and added a Pascal kernel built with CUDA 8, the difference might just be down to the compiler.