OpenBLAS 0.3.14 Released With Performance Improvements For AMD Ryzen, POWER10

Written by Michael Larabel in Programming on 18 March 2021 at 12:00 AM EDT.
OpenBLAS 0.3.14 is out today as the newest version of this open-source BLAS (Basic Linear Algebra Subprograms) library, which continues to focus on maximizing performance on x86_64 and other architectures.

On x86_64, OpenBLAS 0.3.14 adds an optimized BFloat16 GEMM kernel for Intel Cooper Lake processors and auto-detection for Rocket Lake and Tiger Lake, while AMD Ryzen processors see improved performance for the SASUM / DASUM / SROT / DROT kernels. The x86_64 code also fixes detection of AMD's Clang-based AOCC compiler, adds support for the BLAS/CBLAS tests on Windows, and includes other fixes.
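
For readers unfamiliar with the routine names, the following is a minimal pure-Python sketch of what the ASUM and ROT routines compute for real-valued vectors. It only illustrates the standard BLAS semantics; it is not OpenBLAS code and says nothing about how the tuned Ryzen kernels are implemented. The function names `asum` and `rot` are hypothetical helpers chosen for this illustration.

```python
def asum(x):
    """Reference semantics of SASUM/DASUM: sum of absolute values of x."""
    return sum(abs(v) for v in x)


def rot(x, y, c, s):
    """Reference semantics of SROT/DROT: apply a plane rotation.

    Each pair (x[i], y[i]) becomes (c*x[i] + s*y[i], c*y[i] - s*x[i]).
    Real BLAS updates x and y in place; this sketch returns new lists.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    new_x = [c * xi + s * yi for xi, yi in zip(x, y)]
    new_y = [c * yi - s * xi for xi, yi in zip(x, y)]
    return new_x, new_y


if __name__ == "__main__":
    print(asum([1.0, -2.0, 3.0]))
    print(rot([1.0, 0.0], [0.0, 1.0], 0.0, 1.0))
```

Both routines are simple single-pass vector operations, so their speed depends largely on how well a kernel uses the processor's vector units and memory bandwidth.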

Outside of x86_64, on the POWER front there are now optimized POWER10 kernels for SSCAL / DSCAL / CSCAL / ZSCAL / SROT / DROT / CDOT / SASUM / DASUM. Performance of other existing kernels on IBM POWER10 has also improved. The POWER code can now also be compiled with NVIDIA's HPC compiler.
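
As a rough illustration of the SCAL family named above, here is a pure-Python sketch of the operation: every element of a vector is multiplied by a scalar. The S/D variants work on real data and the C/Z variants on complex data. The helper name `scal` is hypothetical, and the code only mirrors the mathematical definition, not the POWER10 kernels themselves.

```python
def scal(alpha, x):
    """Reference semantics of xSCAL: return alpha * x, element by element.

    Real BLAS overwrites x in place; this sketch returns a new list.
    Python complex numbers stand in for the CSCAL/ZSCAL case.
    """
    return [alpha * v for v in x]


if __name__ == "__main__":
    print(scal(2.0, [1.0, -3.0]))       # real case (SSCAL/DSCAL)
    print(scal(1j, [1 + 0j, 2j]))       # complex case (CSCAL/ZSCAL)
```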

On the ARM64 front there is support for compiling with the NVIDIA HPC and NAG Fortran compilers. A RISC-V compilation fix, several new CBLAS interfaces (CROTG, ZROTG, CSROT, and ZDROT), and various other compiler fixes round out this release.

More details and downloads for the OpenBLAS 0.3.14 release are available via GitHub.