In case anyone wants to compare, here is the OpenCL performance of the 4-core Mali-G610 on the RK3588.
Platform: ARM Platform
  Device: Mali-G610 r0p0
    Driver version  : 3.0 (Linux ARM64)
    Compute units   : 4
    Clock frequency : 1000 MHz

    Global memory bandwidth (GBPS)
      float   : 23.49
      float2  : 25.14
      float4  : 25.20
      float8  : 19.88
      float16 : 12.13

    Single-precision compute (GFLOPS)
      float   : 437.30
      float2  : 465.54
      float4  : 457.02
      float8  : 428.78
      float16 : 401.26

    Half-precision compute (GFLOPS)
      half   : 436.25
      half2  : 864.25
      half4  : 895.39
      half8  : 869.74
      half16 : 827.63

    No double precision support! Skipped

    Integer compute (GIOPS)
      int   : 122.98
      int2  : 123.53
      int4  : 122.55
      int8  : 121.17
      int16 : 122.02

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 7.22
      enqueueReadBuffer          : 8.19
      enqueueMapBuffer(for read) : 52.29
        memcpy from mapped ptr   : 7.48
      enqueueUnmap(after write)  : 59.96
        memcpy to mapped ptr     : 8.57

    Kernel launch latency : 36.42 us
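
For anyone comparing against another GPU, a few derived ratios may be easier to line up than the raw numbers. The snippet below is only a rough sketch: the figures are copied from the output above, the variable and function names are my own, and the per-compute-unit value assumes the GPU really ran at the 1000 MHz the driver reports for the whole test.

```python
# Figures copied from the benchmark output above (GFLOPS).
fp32 = {"float": 437.30, "float2": 465.54, "float4": 457.02,
        "float8": 428.78, "float16": 401.26}
fp16 = {"half": 436.25, "half2": 864.25, "half4": 895.39,
        "half8": 869.74, "half16": 827.63}

compute_units = 4
clock_ghz = 1.0  # assumption: the reported 1000 MHz was sustained


def best(results):
    """Return (vector type, value) for the highest-scoring entry."""
    name = max(results, key=results.get)
    return name, results[name]


fp32_name, fp32_peak = best(fp32)
fp16_name, fp16_peak = best(fp16)

# How much faster the best half result is than the best float result.
half_speedup = fp16_peak / fp32_peak

# Measured FP32 operations per compute unit per clock cycle.
flops_per_cu_per_clock = fp32_peak / (compute_units * clock_ghz)

print(f"best fp32: {fp32_name} at {fp32_peak:.2f} GFLOPS")
print(f"best fp16: {fp16_name} at {fp16_peak:.2f} GFLOPS")
print(f"half vs float speedup: {half_speedup:.2f}x")
print(f"fp32 ops per CU per clock: {flops_per_cu_per_clock:.1f}")
```

To compare with another device, swap in its numbers, compute unit count, and clock; the ratios say nothing about real workloads, only about these synthetic peaks.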