OpenBLAS 0.3.14 Released With Performance Improvements For AMD Ryzen, POWER10

On x86_64, OpenBLAS 0.3.14 adds an optimized BFloat16 GEMM kernel for Intel Cooper Lake processors and auto-detection for Rocket Lake and Tiger Lake, while AMD Ryzen processors get improved performance in the SASUM / DASUM / SROT / DROT kernels. The x86_64 code also fixes detection of AMD's Clang-based AOCC compiler, adds support for the BLAS/CBLAS tests on Windows, and brings other fixes.
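For readers unfamiliar with these kernel names: ASUM returns the sum of the absolute values of a vector's elements, and ROT applies a plane (Givens) rotation to a pair of vectors, with the S and D prefixes denoting single and double precision. The following is a minimal pure-Python sketch of those reference semantics, assuming contiguous vectors (unit stride). It is not OpenBLAS code and says nothing about how the optimized Ryzen kernels are written; the function names `asum` and `rot` are illustrative only.

```python
def asum(x):
    """Sum of absolute values, as BLAS SASUM/DASUM compute it (unit stride assumed)."""
    return sum(abs(v) for v in x)


def rot(x, y, c, s):
    """Apply a plane rotation in place, as BLAS SROT/DROT do (unit stride assumed).

    For every index i, using the original values of x[i] and y[i]:
        x[i] <- c * x[i] + s * y[i]
        y[i] <- c * y[i] - s * x[i]
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        xi, yi = x[i], y[i]  # keep the originals so y uses the old x
        x[i] = c * xi + s * yi
        y[i] = c * yi - s * xi


if __name__ == "__main__":
    print(asum([1.0, -2.0, 3.5]))  # 6.5
    x, y = [1.0, 2.0], [3.0, 4.0]
    rot(x, y, 0.0, 1.0)            # c = cos 90 degrees, s = sin 90 degrees
    print(x, y)                    # [3.0, 4.0] [-1.0, -2.0]
```

With c = cos θ and s = sin θ, `rot` applies the 2×2 matrix [[c, s], [-s, c]] to each (x_i, y_i) pair, so the 90° case in the usage example maps x to y and y to -x.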
Beyond x86_64, on the POWER front there are now optimized POWER10 kernels for SSCAL / DSCAL / CSCAL / ZSCAL / SROT / DROT / CDOT / SASUM / DASUM, and several other existing kernels also perform better on IBM POWER10. The POWER code can now also be compiled with NVIDIA's HPC compiler.
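The SCAL and CDOT kernels are similarly simple at the reference level: SCAL multiplies a vector in place by a scalar (real for SSCAL/DSCAL, complex for CSCAL/ZSCAL), and BLAS defines two complex dot products, CDOTU (unconjugated) and CDOTC (which conjugates the first vector). The sketch below, again assuming unit stride and using illustrative function names, shows both dot variants in pure Python; it describes what the routines compute, not how the POWER10 kernels are implemented.

```python
def scal(alpha, x):
    """Scale x in place by alpha, as BLAS ?SCAL does (unit stride assumed).

    Works for real alpha and x (SSCAL/DSCAL) and complex alpha and x (CSCAL/ZSCAL).
    """
    for i in range(len(x)):
        x[i] = alpha * x[i]


def dotu(x, y):
    """Unconjugated complex dot product, sum of x[i] * y[i] (CDOTU semantics)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((a * b for a, b in zip(x, y)), 0j)


def dotc(x, y):
    """Conjugated complex dot product, sum of conj(x[i]) * y[i] (CDOTC semantics)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((a.conjugate() * b for a, b in zip(x, y)), 0j)


if __name__ == "__main__":
    v = [1.0, -3.0]
    scal(2.0, v)
    print(v)            # [2.0, -6.0]
    x = [1 + 2j, 3 - 1j]
    y = [2 - 1j, 1j]
    print(dotu(x, y))   # (5+6j)
    print(dotc(x, y))   # (-1-2j)
```

The two dot products differ only in whether the first operand is conjugated, which is why BLAS exposes them as separate routines rather than a flag.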
On the ARM64 front, there is now support for compiling with the NVIDIA HPC and NAG Fortran compilers. A RISC-V compilation fix, several new CBLAS interfaces (CROTG, ZROTG, CSROT, and ZDROT), and various other compiler fixes round out the release.
More details and downloads for the OpenBLAS 0.3.14 release via GitHub.