LCZero Chess Engine Performance With OpenCL vs. CUDA + cuDNN vs. FP16 With Tensor Cores
With LCZero's build process being sane across its different back-ends, and the program proving to be benchmark-friendly and meeting my requirements, it's now available via the Phoronix Test Suite by simply running phoronix-test-suite benchmark lczero (granted, which back-ends are usable will vary depending upon your hardware/driver support). More details are over on OpenBenchmarking.org.
Given its back-end coverage, I set out this weekend to test the various Maxwell/Pascal/Turing GPUs I had available. Here are those initial numbers. Tests comparing against Radeon GPUs with OpenCL will be coming in the next few days.
With the OpenCL back-end for LCZero, the GeForce RTX Turing GPUs were already performing quite well in relation to the GeForce 900 Maxwell and GeForce 1000 Pascal graphics cards.
When switching over to the CUDA + cuDNN back-end, the performance for all of the GPUs at least doubled in comparison to the OpenCL back-end. In the case of the TITAN RTX, its performance was 2.35x the OpenCL back-end.
The CUDA FP16 back-end was only working on the Turing GPUs, but even there the data speaks volumes. With this FP16 support able to utilize Turing's tensor cores, the TITAN RTX was 2.2x the speed of the conventional CUDA back-end. In the case of the RTX 2060, the performance was nearly 2.9x the speed of the standard CUDA back-end or 6.7x in relation to the OpenCL back-end.
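For readers who want to relate these multipliers to one another, here is a small Python sketch of the arithmetic. It only uses the rounded ratios quoted above, so the derived figures are approximations rather than measured results, and the variable and function names are made up for illustration.

```python
# Speed-up ratios quoted in the text (rounded figures, not raw benchmark data).
titan_cuda_vs_opencl = 2.35    # TITAN RTX: CUDA + cuDNN relative to OpenCL
titan_fp16_vs_cuda = 2.2       # TITAN RTX: CUDA FP16 relative to CUDA + cuDNN
rtx2060_fp16_vs_cuda = 2.9     # RTX 2060: "nearly 2.9x", treated here as 2.9
rtx2060_fp16_vs_opencl = 6.7   # RTX 2060: CUDA FP16 relative to OpenCL


def chain(*ratios):
    """Multiply successive speed-ups to get the end-to-end speed-up."""
    result = 1.0
    for ratio in ratios:
        result *= ratio
    return result


# TITAN RTX: OpenCL -> CUDA -> FP16, chained together.
titan_fp16_vs_opencl = chain(titan_cuda_vs_opencl, titan_fp16_vs_cuda)

# RTX 2060: the CUDA-over-OpenCL ratio implied by the two quoted figures.
rtx2060_cuda_vs_opencl = rtx2060_fp16_vs_opencl / rtx2060_fp16_vs_cuda

print(f"TITAN RTX FP16 vs. OpenCL (derived): {titan_fp16_vs_opencl:.2f}x")
print(f"RTX 2060 CUDA vs. OpenCL (derived): {rtx2060_cuda_vs_opencl:.2f}x")
```

Worked through, the TITAN RTX comes out at roughly 5.2x the OpenCL back-end when using FP16, and the RTX 2060's implied CUDA-over-OpenCL gain of about 2.3x is consistent with all of the GPUs at least doubling when moving off OpenCL.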
More benchmarks of LCZero will be coming up in some future articles on Phoronix with this OpenCL/CUDA/BLAS benchmark now being available via the Phoronix Test Suite.