Is there an intrinsic reason why OpenCL was so much slower?
Some possibilities I have considered:
* OpenCL is slower on this NVIDIA hardware because less effort has gone into optimizing the drivers, firmware & software
* OpenCL is slower on this hardware because the OpenCL standard itself is less able to take advantage of the hardware features than CUDA is
* OpenCL is slower on this hardware because NVIDIA have deliberately limited it in order to push their proprietary CUDA standard
LCZero Chess Engine Performance With OpenCL vs. CUDA + cuDNN vs. FP16 With Tensor Cores
Phoronix: LCZero Chess Engine Performance With OpenCL vs. CUDA + cuDNN vs. FP16 With Tensor Cores
A Phoronix reader pointed out LCZero (Leela Chess Zero) a few days ago as an interesting chess engine that is powered by neural networks and supports BLAS, OpenCL, and NVIDIA CUDA+cuDNN back-ends. Particularly with the FP16 cuDNN support, this chess engine can be super fast on NVIDIA's latest Turing GPUs with tensor cores...