LCZero Neural Network Chess Benchmarks With OpenCL: Radeon vs. NVIDIA
Yesterday I posted a number of LCZero chess engine benchmarks on NVIDIA GPUs using its OpenCL back-end as well as its CUDA+cuDNN back-end, which offered massive performance gains compared to OpenCL on the many NVIDIA GPUs tested. With the CUDA+cuDNN code performing so much better than OpenCL, some wondered whether NVIDIA was intentionally gimping its OpenCL performance. Well, here are side-by-side results now with Radeon GPUs on OpenCL.
With the interesting LCZero chess engine now being available via the Phoronix Test Suite, I carried out some additional tests of this neural-network-powered chess engine on a Ryzen Threadripper 2990WX, re-testing all of the NVIDIA cards with OpenCL as well as testing the Polaris/Vega Radeon cards on ROCm 2.0.
The earlier article has the CUDA+cuDNN results for those interested, while this is just the OpenCL look. Additionally, a reference run was carried out with the LCZero BLAS back-end using OpenBLAS on the Ryzen Threadripper 2990WX to see how the CPU-based performance compares for this open-source project.
The Radeon OpenCL performance with LCZero came in well short of expectations. Keep in mind the NVIDIA results here were still obtained using OpenCL; when switching over to the CUDA+cuDNN back-end, the results can be multiple times faster, especially with FP16 on the NVIDIA Turing hardware, as shown in yesterday's article. The Radeon Polaris data itself was rather odd given the positioning of the cards, due to an LCZero and/or ROCm 2.0 shortcoming.
It's very possible the LCZero OpenCL code isn't as well tuned as its CUDA code-base, but at least these results appear to show that NVIDIA isn't intentionally crippling its OpenCL driver to make CUDA look better. Or, if it is, then the ROCm performance is even more drastically behind. Regardless, OpenCL on either GPU vendor is at least much faster than running this neural network chess benchmark on the CPU with OpenBLAS. Those wanting to try out LCZero on their own system can install the Phoronix Test Suite and run phoronix-test-suite benchmark lczero. For now, LCZero is certainly most worthwhile with the CUDA+cuDNN back-end for its drastically better performance.
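For those following along, the benchmark command mentioned above can be wrapped in a small guard. This is only a minimal sketch: it assumes the Phoronix Test Suite is already installed and on the PATH, and the function name run_lczero_benchmark is a hypothetical helper, not part of the Phoronix Test Suite or LCZero.

```shell
# Hypothetical helper: run the lczero test profile only if the
# Phoronix Test Suite client can be found on the PATH.
run_lczero_benchmark() {
    if command -v phoronix-test-suite >/dev/null 2>&1; then
        # The command as given in the article.
        phoronix-test-suite benchmark lczero
    else
        echo "phoronix-test-suite not found on PATH; install it first" >&2
        return 127
    fi
}
```

Calling run_lczero_benchmark from a shell then either starts the test or reports that the Phoronix Test Suite still needs to be installed.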