AMD Ryzen 3 3200G, openSUSE Leap 15.2.
With amdgpu-pro OpenCL headless driver:
Code:
:~> clpeak

Platform: AMD Accelerated Parallel Processing
  Device: gfx902
    Driver version  : 3180.7 (PAL,HSAIL) (Linux x64)
    Compute units   : 8
    Clock frequency : 1250 MHz

    Global memory bandwidth (GBPS)
      float   : 42.63
      float2  : 44.12
      float4  : 45.13
      float8  : 46.01
      float16 : 45.59

    Single-precision compute (GFLOPS)
      float   : 1252.46
      float2  : 1251.55
      float4  : 1249.05
      float8  : 1238.10
      float16 : 1225.67

    half-precision compute (GFLOPS)
      half   : 1250.89
      half2  : 2421.75
      half4  : 2409.56
      half8  : 2381.20
      half16 : 2237.11

    Double-precision compute (GFLOPS)
      double   : 79.14
      double2  : 79.10
      double4  : 79.01
      double8  : 78.85
      double16 : 78.41

    Integer compute (GIOPS)
      int   : 252.97
      int2  : 252.96
      int4  : 252.96
      int8  : 252.96
      int16 : 252.94

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 42.38
      enqueueReadBuffer          : 14.94
      enqueueMapBuffer(for read) : 35320.46
        memcpy from mapped ptr   : 14.29
      enqueueUnmap(after write)  : 78951.60
        memcpy to mapped ptr     : 14.34

    Kernel launch latency : 48.96 us
Code:
:~> clpeak

Platform: AMD Accelerated Parallel Processing
  Device: gfx902+xnack
    Driver version  : 3212.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 11
    Clock frequency : 1250 MHz

    Global memory bandwidth (GBPS)
      float   : 12.66
      float2  : 12.75
      float4  : 12.76
      float8  : 12.74
      float16 : 12.20

    Single-precision compute (GFLOPS)
      float   : 202.43
      float2  : 202.28
      float4  : 201.77
      float8  : 200.75
      float16 : 198.48

    half-precision compute (GFLOPS)
      half   : 202.33
      half2  : 396.88
      half4  : 394.08
      half8  : 389.46
      half16 : 384.83

    Double-precision compute (GFLOPS)
      double   : 12.76
      double2  : 12.76
      double4  : 12.71
      double8  : 12.71
      double16 : 12.58

    Integer compute (GIOPS)
      int   : 40.81
      int2  : 40.81
      int4  : 40.81
      int8  : 40.80
      int16 : 40.79

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 2.11
      enqueueReadBuffer          : 5.34
      enqueueMapBuffer(for read) : 33925.49
        memcpy from mapped ptr   : 5.23
      enqueueUnmap(after write)  : 70409.30
        memcpy to mapped ptr     : 5.41

    Kernel launch latency : -500964480.00 us
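
To put the two runs side by side, here is a rough sketch that divides a handful of figures from the first run (driver 3180.7, PAL) by the matching figures from the second run (driver 3212.0, HSA1.1,LC). The numbers are copied by hand from the output above; the `RESULTS` table and the `ratios` helper are just illustrative names, not part of clpeak.

```python
# Selected results copied from the two clpeak runs above.
# Each tuple is (first run, driver 3180.7 PAL; second run, driver 3212.0 HSA1.1,LC).
RESULTS = {
    "global memory bandwidth, float (GBPS)": (42.63, 12.66),
    "single-precision, float (GFLOPS)": (1252.46, 202.43),
    "half-precision, half2 (GFLOPS)": (2421.75, 396.88),
    "double-precision, double (GFLOPS)": (79.14, 12.76),
    "integer, int (GIOPS)": (252.97, 40.81),
    "enqueueWriteBuffer (GBPS)": (42.38, 2.11),
    "enqueueReadBuffer (GBPS)": (14.94, 5.34),
}


def ratios(results):
    """Return first-run / second-run for every metric."""
    return {name: first / second for name, (first, second) in results.items()}


if __name__ == "__main__":
    for name, ratio in ratios(RESULTS).items():
        print(f"{name:40s} {ratio:6.2f}x")
```

The compute figures come out at roughly 6x in favour of the first run, with a smaller gap on global memory bandwidth and a larger one on enqueueWriteBuffer. Kernel launch latency is left out because the second run reports a negative value, which cannot be a real measurement.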