Lczero Neural Network Chess Benchmarks With OpenCL Radeon vs. NVIDIA


  • Lczero Neural Network Chess Benchmarks With OpenCL Radeon vs. NVIDIA

    Phoronix: Lczero Neural Network Chess Benchmarks With OpenCL Radeon vs. NVIDIA

    Yesterday I posted a number of Lczero chess engine benchmarks on NVIDIA GPUs using its OpenCL back-end as well as its CUDA+cuDNN back-end, with the CUDA+cuDNN code offering massive performance gains over OpenCL on the many NVIDIA GPUs tested. With CUDA+cuDNN performing so much better than OpenCL, some wondered whether NVIDIA was intentionally gimping its OpenCL performance. Well, here are the results side-by-side now with Radeon GPUs on OpenCL...

    http://www.phoronix.com/scan.php?pag...-Radeon-NVIDIA
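
    For anyone wanting to reproduce these numbers locally, below is a rough sketch of how one might drive Lc0's benchmark mode from a small Python script and compare its OpenCL and CUDA+cuDNN back-ends. The weights filename and the output-parsing regex are assumptions rather than anything taken from the article, so adjust them to your own setup:

    ```python
    # Hedged sketch: sweep Lc0 back-ends and report nodes per second.
    # Assumes an `lc0` binary on PATH and a network file named weights.pb.gz;
    # the "nodes per second" parsing below is a guess at the benchmark output format.
    import re
    import subprocess

    BACKENDS = ["opencl", "cudnn"]   # plain OpenCL vs. CUDA+cuDNN
    WEIGHTS = "weights.pb.gz"        # placeholder network file

    def run_benchmark(backend: str) -> float:
        """Run `lc0 benchmark` with the given back-end and return nodes/second."""
        out = subprocess.run(
            ["lc0", "benchmark", f"--backend={backend}", f"--weights={WEIGHTS}"],
            capture_output=True, text=True, check=True,
        ).stdout
        match = re.search(r"([\d.]+)\s*nodes per second", out)
        return float(match.group(1)) if match else 0.0

    if __name__ == "__main__":
        for backend in BACKENDS:
            print(f"{backend}: {run_benchmark(backend):.0f} nps")
    ```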

  • #2
    The RX Vega 64 is slower than the RX 590.
    This code must be a masterpiece ^^


  • #3
    And with ROCm 2 being slow, here is a run with a Vega 56 and the AMDGPU-PRO OpenCL backend:
    https://openbenchmarking.org/result/...SK-VEGA5685245
    Last edited by ObiWan; 01-15-2019, 08:15 AM.

  • #4
    That's a factor of two. ROCm OpenCL really needs some tuning.

  • #5
    For the CPU backend, I think you can link against Intel's MKL-DNN library (instead of OpenBLAS) to get much better performance. Search for mkl-dnn on GitHub.
    Also, I assume you are already running it with enough threads (e.g. --threads=32).
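
    To check the thread-count point, here is a small sketch that sweeps --threads values against the BLAS (CPU) back-end. The thread counts and the weights filename are arbitrary examples, not values from the article or the benchmarks:

    ```python
    # Hedged sketch: see how the CPU (BLAS) back-end scales with --threads.
    # Assumes an `lc0` binary built with a BLAS back-end (e.g. OpenBLAS or MKL)
    # and a network file named weights.pb.gz; the thread counts are arbitrary.
    import subprocess

    WEIGHTS = "weights.pb.gz"

    for threads in (1, 4, 16, 32):
        result = subprocess.run(
            ["lc0", "benchmark", "--backend=blas",
             f"--threads={threads}", f"--weights={WEIGHTS}"],
            capture_output=True, text=True,
        )
        print(f"--threads={threads}")
        # The last few lines of the benchmark output contain the nps summary.
        print("\n".join(result.stdout.splitlines()[-3:]))
    ```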

  • #6
    Whoever wrote their OpenCL stack should be shot. It reminds me of how piss-poor Blender's monolithic stack was before AMD rewrote it into a split stack and threaded it properly.

  • #7
    Originally posted by ObiWan View Post
    And with ROCm 2 being slow, here is a run with a Vega 56 and the AMDGPU-PRO OpenCL backend:
    https://openbenchmarking.org/result/...SK-VEGA5685245
    It isn't just ROCm. Most of this slowness comes from the client's poor coding and limited knowledge of OpenCL as well.

  • #8
    Originally posted by Marc Driftmeyer View Post
    It isn't just ROCm. Most of this slowness comes from the client's poor coding and limited knowledge of OpenCL as well.
    Additionally, I don't even think it is a good idea to use OpenCL directly rather than building the AI engine on top of PyTorch/TensorFlow.
    Last edited by zxy_thf; 01-15-2019, 08:49 PM.
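
    For what it's worth, here is a minimal, purely illustrative PyTorch sketch of an AlphaZero-style policy/value network, just to show what "building on top of PyTorch" could look like. It is not Lc0's actual architecture; the layer sizes, the 112-plane input encoding and the 1858-move policy size are assumptions for the example:

    ```python
    # Illustrative sketch only: a tiny AlphaZero-style policy/value network.
    # NOT Lc0's architecture; input planes, filter count and policy size are assumed.
    import torch
    import torch.nn as nn

    class TinyChessNet(nn.Module):
        def __init__(self, planes: int = 112, filters: int = 64, policy_size: int = 1858):
            super().__init__()
            # Small convolutional trunk over the 8x8 board.
            self.trunk = nn.Sequential(
                nn.Conv2d(planes, filters, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(filters, filters, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            # Policy head: logits over candidate moves.
            self.policy_head = nn.Linear(filters * 8 * 8, policy_size)
            # Value head: scalar evaluation in [-1, 1].
            self.value_head = nn.Sequential(
                nn.Linear(filters * 8 * 8, 128), nn.ReLU(),
                nn.Linear(128, 1), nn.Tanh(),
            )

        def forward(self, x: torch.Tensor):
            features = self.trunk(x).flatten(1)
            return self.policy_head(features), self.value_head(features)

    if __name__ == "__main__":
        net = TinyChessNet()
        board = torch.randn(1, 112, 8, 8)        # fake encoded position
        policy_logits, value = net(board)
        print(policy_logits.shape, value.shape)  # [1, 1858] and [1, 1]
    ```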

  • #9
    Originally posted by zxy_thf View Post
    Additionally, I don't even think it is a good idea to use OpenCL directly rather than building the AI engine on top of PyTorch/TensorFlow.
    I thought neither TensorFlow nor PyTorch worked with OpenCL?

  • #10
    Thanks, very interesting test!
    So getting an RTX 2060 is now probably the way to go for a really strong chess engine. Lczero just took second place at the TCEC computer chess tournament, behind Stockfish.
    But against Lczero on a 2060 you would need quite expensive hardware to make Stockfish competitive. I would need to find some numbers for Stockfish's speed depending on the number of cores, but at the current ratio of roughly 1:1000 in nodes per second, Lczero is most likely stronger than Stockfish. That said, Stockfish would still be very useful for deep tactical analysis.
