RadeonSI/R600g Mesa 11.2-devel Clover OpenCL Benchmarks On Linux 4.5

  • RadeonSI/R600g Mesa 11.2-devel Clover OpenCL Benchmarks On Linux 4.5

    Phoronix: RadeonSI/R600g Mesa 11.2-devel Clover OpenCL Benchmarks On Linux 4.5

    Following this morning's article, "Russian Super-Computing Users Get Tired Of Catalyst, Start Looking At Open-Source AMD," I ran some fresh open-source Radeon OpenCL benchmarks of my own using the Gallium3D Clover state tracker, since the HPC researchers are also curious how the very latest open-source AMD graphics stack performs. Here are some initial results with Mesa 11.2-devel Git built against LLVM 3.9 SVN (thanks Padoka!) on the Linux 4.5 Git kernel...

  • #2
    I added Beignet Git (compiled against LLVM 3.7 for now) to my Padoka PPA some time ago, so if you want to test how well it performs, it's in the PPA for the taking.

    Right now, Beignet has a bug when running doubles on Haswell (I notified the Beignet developers), but it should work really well with Gen8+ CPUs (Gen7.5 Haswell will work, but it might segfault).

    Beignet is also being compiled with OpenCL Mesa support, so it should show up in clinfo and related tools.

    Happy benchmarking!
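
For anyone who wants to see which OpenCL implementations are registered before running clinfo, here is a minimal Python sketch. It assumes the usual Khronos ICD loader layout on Linux, where each implementation installs a `.icd` file under `/etc/OpenCL/vendors`. That directory and the helper name `list_icd_files` are assumptions for illustration; the PPA packages may install their files elsewhere.

```python
from pathlib import Path


def list_icd_files(vendor_dir="/etc/OpenCL/vendors"):
    """Return {icd file name: library named inside it} for one vendor directory.

    The default path is an assumption about a typical Linux ICD loader setup.
    """
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return {}
    entries = {}
    for icd in sorted(directory.glob("*.icd")):
        # Each .icd file normally holds a single line: the library to load.
        entries[icd.name] = icd.read_text().strip()
    return entries


if __name__ == "__main__":
    for name, library in list_icd_files().items():
        print(f"{name}: {library}")
```

If an implementation is installed but has no entry here, clinfo would probably not list it either.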


    • #3
      Unfortunately, our other common OpenCL tests like LuxMark, JuliaGPU, SmallPT-GPU, etc., currently still fail on the open-source Clover Gallium3D driver stack.

      When will Clover become useful? I'm starting to lose hope.
      ## VGA ##
      AMD: X1950XTX, HD3870, HD5870
      Intel: GMA45, HD3000 (Core i5 2500K)


      • #4
        The latest pocl release also seems to be able to run on top of HSA. I have no idea how well it works or how much of the OpenCL standard is implemented, but it might be interesting to see a comparison, if you are looking for other things to benchmark.


        • #5
          I tried the test on Fedora, but it fails with this error.

          checking OpenCL/opencl.h usability... no
          checking OpenCL/opencl.h presence... no
          checking for OpenCL/opencl.h... no
          checking CL/opencl.h usability... yes
          checking CL/opencl.h presence... yes
          checking for CL/opencl.h... yes
          checking for usable OpenCL library... no
          configure: error:

          I don't know which dependency is missing, but I installed clpeak and ran the test with that instead. There seems to be a bug on ARUBA, because on TURKS it works.

          bash-4.3$ clpeak

          Platform: Clover
          Device: AMD ARUBA (DRM 2.43.0, LLVM 3.7.0)
          Driver version : 11.1.0 (Linux x64)
          Compute units : 6
          Clock frequency : 0 MHz
          Build Log: input.cl:48:2009: warning: null character ignored
          unsupported call to function __floatunsidf in compute_dp_v1

          Device: AMD TURKS (DRM 2.43.0, LLVM 3.7.0)
          Driver version : 11.1.0 (Linux x64)
          Compute units : 6
          Clock frequency : 500 MHz

          Global memory bandwidth (GBPS)
          float : 23.26
          float2 : 23.83
          float4 : 23.10
          float8 : 14.28
          float16 : 8.11

          Single-precision compute (GFLOPS)
          float : 94.50
          float2 : 185.05
          float4 : 189.67
          float8 : 217.56
          float16 : 261.36

          No double precision support! Skipped

          Integer compute (GIOPS)
          int : 47.39
          int2 : 94.34
          int4 : 92.86
          int8 : 93.47
          int16 : 93.81

          Transfer bandwidth (GBPS)
          enqueueWriteBuffer : 1.50
          enqueueReadBuffer : 0.87
          enqueueMapBuffer(for read) : 1.66
          memcpy from mapped ptr : 1.83
          enqueueUnmap(after write) : 700.69
          memcpy to mapped ptr : 1.50

          Kernel launch latency : 455.13 us


          Platform: Portable Computing Language
          Device: pthread-AMD A8-4555M APU with Radeon(tm) HD Graphics
          Driver version : 0.12 (Linux x64)
          Compute units : 4
          Clock frequency : 1600 MHz

          Global memory bandwidth (GBPS)
          float : 4.79
          float2 : 5.39
          float4 : 5.72
          float8 : 6.90
          float16 : 7.93

          Single-precision compute (GFLOPS)
          float : 3.87
          float2 : 1.33
          float4 : 4.14
          float8 : 8.41
          float16 : 15.58

          Transfer bandwidth (GBPS)
          enqueueWriteBuffer : 2.60
          enqueueReadBuffer : 1.97
          enqueueMapBuffer(for read) : 72550.12
          memcpy from mapped ptr : 1.92
          enqueueUnmap(after write) : 162688.16
          memcpy to mapped ptr : 1.87

          Kernel launch latency : 91.65 us
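
To compare numbers like these without reading them by eye, here is a small Python sketch that pulls one section out of clpeak's text output and computes how much the widest vector type gains over scalar code. The function name `parse_clpeak_section` and the embedded sample are illustrative assumptions; the sample figures are copied from the TURKS run above.

```python
import re


def parse_clpeak_section(text, section):
    """Collect 'label : value' pairs that follow a section heading."""
    values = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_section:
            if line.startswith(section):
                in_section = True
            continue
        match = re.fullmatch(r"(\w+)\s*:\s*([\d.]+)", line)
        if match is None:
            if values:
                break  # first non-data line after the data ends the section
            continue
        values[match.group(1)] = float(match.group(2))
    return values


# Sample copied from the TURKS output in this post.
TURKS_SAMPLE = """\
Single-precision compute (GFLOPS)
float : 94.50
float2 : 185.05
float4 : 189.67
float8 : 217.56
float16 : 261.36

No double precision support! Skipped
"""

sp = parse_clpeak_section(TURKS_SAMPLE, "Single-precision compute")
speedup = sp["float16"] / sp["float"]
print(f"float16 vs float: {speedup:.2f}x")
```

On TURKS the float16 kernel comes out roughly 2.8 times faster than the scalar one. That would be consistent with a VLIW design benefiting from vectorized code, though this thread does not confirm the cause.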

          Last edited by boffo; 02 February 2016, 04:45 PM.


          • #6
            Originally posted by boffo
            I tried the test on Fedora, but it fails with this error.

            checking OpenCL/opencl.h usability... no
            checking OpenCL/opencl.h presence... no
            checking for OpenCL/opencl.h... no
            checking CL/opencl.h usability... yes
            checking CL/opencl.h presence... yes
            checking for CL/opencl.h... yes
            checking for usable OpenCL library... no
            configure: error:

            I don't know what dependency is missing. But I installed clpeak and I did the test.
            It seems you are missing the opencl-headers package.
            Originally posted by boffo
            There seems to be a bug on ARUBA, because on TURKS it works.
            Interestingly, according to the article it fails on CAYMAN too. Both ARUBA and CAYMAN are VLIW4, while the others are VLIW5 or GCN. Maybe this issue is specific to VLIW4.


            • #7
              Originally posted by chithanh
              It seems you are missing the opencl-headers package.
              No, the opencl-headers package is installed.
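
In the configure log above, the header checks for `CL/opencl.h` pass and the failing step is "checking for usable OpenCL library", so the problem may be on the library side and not the headers. Here is a minimal Python sketch to check whether an OpenCL runtime library can be found and loaded at all. The helper name `find_opencl_library` is hypothetical, and this is only one possible explanation, not a confirmed diagnosis.

```python
import ctypes
import ctypes.util


def find_opencl_library():
    """Return the name of a loadable OpenCL library, or None if none is found."""
    name = ctypes.util.find_library("OpenCL")
    if name is None:
        return None
    try:
        ctypes.CDLL(name)  # try to actually load it, not just locate it
    except OSError:
        return None
    return name


if __name__ == "__main__":
    found = find_opencl_library()
    print(found if found else "no loadable OpenCL library found")
```

Note that this only tests the runtime library. A configure script also needs the unversioned `libOpenCL.so` to link against, which distributions usually ship in a separate development package, so a positive result here does not rule that out.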


              • #8
                This clpeak tool is neat. I noticed it was not on the AUR, so I uploaded it there in case anyone on Arch wants to try it without pulling down the whole PTS:


                • #9
                  Originally posted by groo_pcd
                  I added Beignet Git (compiled against LLVM 3.7 for now) to my Padoka PPA some time ago, so if you want to test how well it performs, it's in the PPA for the taking.

                  Right now, Beignet has a bug when running doubles on Haswell (I notified the Beignet developers), but it should work really well with Gen8+ CPUs (Gen7.5 Haswell will work, but it might segfault).

                  Beignet is also being compiled with OpenCL Mesa support, so it should show up in clinfo and related tools.

                  Happy benchmarking!
                  Beignet works really well for me with IVB. There is an issue, though:

                  The i915 hangcheck falsely identifies some OpenCL workloads as a hung GPU and triggers a reset, which causes the CL kernel to abort.

                  echo N >/sys/module/i915/parameters/enable_hangcheck does work around it.
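
Before and after applying that workaround, it can help to confirm what the parameter is currently set to. Here is a minimal read-only Python sketch; the helper name `hangcheck_enabled` is hypothetical, and it assumes the kernel reports boolean module parameters as `Y` or `N`.

```python
from pathlib import Path

# Path taken from the workaround above.
HANGCHECK_PARAM = "/sys/module/i915/parameters/enable_hangcheck"


def hangcheck_enabled(param_path=HANGCHECK_PARAM):
    """Return True/False for the current setting, or None if the file is absent."""
    path = Path(param_path)
    if not path.exists():
        return None  # i915 not loaded, or parameter not exposed
    value = path.read_text().strip()
    return value in ("Y", "y", "1")


if __name__ == "__main__":
    state = hangcheck_enabled()
    if state is None:
        print("enable_hangcheck not found")
    else:
        print("hangcheck is", "enabled" if state else "disabled")
```

The echo command needs root, and a value written through sysfs does not persist across reboots, so the check is worth repeating after a restart.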


                  • #10
                    Here is the result of running clpeak on an AMD A10-7400P laptop with dual Radeon GPUs. Interestingly, the lower-end GPU is AMD HAINAN, an SI-architecture part paired with Kaveri. I noticed that freedesktop.org lists Hybrid support as "MOSTLY", but I haven't seen a change compared to running the Radeon Crimson driver on Windows.

                    Code:
                    $ clpeak
                    
                    Platform: Clover
                      Device: AMD KAVERI (DRM 2.43.0, LLVM 3.7.0)
                        Driver version  : 11.1.0 (Linux x64)
                        Compute units   : 6
                        Clock frequency : 654 MHz
                    
                        Global memory bandwidth (GBPS)
                          float   : 18.12
                          float2  : 18.77
                          float4  : 18.95
                          float8  : 13.79
                          float16 : 8.37
                    
                        Single-precision compute (GFLOPS)
                          float   : 474.30
                          float2  : 477.80
                          float4  : 473.74
                          float8  : 473.69
                          float16 : 470.87
                    
                        Double-precision compute (GFLOPS)
                          double   : 31.27
                          double2  : 31.21
                          double4  : 31.08
                          double8  : 30.28
                          double16 : 30.57
                    
                        Integer compute (GIOPS)
                          int   : 94.68
                          int2  : 99.22
                          int4  : 96.05
                          int8  : 98.66
                          int16 : 98.01
                    
                        Transfer bandwidth (GBPS)
                          enqueueWriteBuffer         : 3.61
                          enqueueReadBuffer          : 2.73
                          enqueueMapBuffer(for read) : 1456.51
                            memcpy from mapped ptr   : 2.84
                          enqueueUnmap(after write)  : 1203.75
                            memcpy to mapped ptr     : 2.81
                    
                        Kernel launch latency : 174.12 us
                    
                      Device: AMD HAINAN (DRM 2.43.0, LLVM 3.7.0)
                        Driver version  : 11.1.0 (Linux x64)
                        Compute units   : 5
                        Clock frequency : 855 MHz
                    
                        Global memory bandwidth (GBPS)
                          float   : 11.03
                          float2  : 11.54
                          float4  : 11.57
                          float8  : 11.73
                          float16 : 7.66
                    
                        Single-precision compute (GFLOPS)
                          float   : 515.12
                          float2  : 513.40
                          float4  : 519.06
                          float8  : 514.49
                          float16 : 507.25
                    
                        Double-precision compute (GFLOPS)
                          double   : 33.72
                          double2  : 33.70
                          double4  : 33.67
                          double8  : 33.59
                          double16 : 33.44
                    
                        Integer compute (GIOPS)
                          int   : 107.02
                          int2  : 107.01
                          int4  : 106.92
                          int8  : 106.99
                          int16 : 106.84
                    
                        Transfer bandwidth (GBPS)
                          enqueueWriteBuffer         : 4.06
                          enqueueReadBuffer          : 2.78
                          enqueueMapBuffer(for read) : 1356.42
                            memcpy from mapped ptr   : 2.73
                          enqueueUnmap(after write)  : 1552.55
                            memcpy to mapped ptr     : 2.73
                    
                        Kernel launch latency : 173.02 us
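
To put the single-precision figures above in context, here is a rough Python sketch that compares them with a theoretical peak. It assumes 64 lanes per GCN compute unit and counts a fused multiply-add as two FLOPs per lane per clock. The clock values are the ones Clover reported, which may not match the clocks actually sustained during the run, so treat the percentages as estimates.

```python
def gcn_peak_gflops(compute_units, clock_mhz, lanes_per_cu=64, flops_per_lane=2):
    """Theoretical single-precision peak: CUs x lanes x FLOPs per clock x clock.

    lanes_per_cu and flops_per_lane are assumptions about GCN hardware.
    """
    return compute_units * lanes_per_cu * flops_per_lane * clock_mhz / 1000.0


# Compute units, clocks and best measured GFLOPS copied from the output above.
devices = {
    "KAVERI": {"cus": 6, "mhz": 654, "measured": 477.80},
    "HAINAN": {"cus": 5, "mhz": 855, "measured": 519.06},
}

for name, d in devices.items():
    peak = gcn_peak_gflops(d["cus"], d["mhz"])
    share = d["measured"] / peak
    print(f"{name}: peak {peak:.1f} GFLOPS, measured {share:.0%} of peak")
```

Under those assumptions both GPUs reach about 95% of their theoretical single-precision peak with Clover.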
