OpenBLAS 0.3.16 Brings Various CPU Fixes, More Optimizations

Written by Michael Larabel in Programming on 12 July 2021 at 09:00 AM EDT.
OpenBLAS, the popular open-source high-performance BLAS/LAPACK implementation, has seen a new release with more CPU/architecture-specific work as well as some new common optimizations.

OpenBLAS 0.3.16 was released on Sunday, and some of the changes in this release include:

- Added CPU type detection for Intel Ice Lake SP, while Tiger Lake detection has been fixed.

- CPU type detection is also now in place for newer Centaur/Zhaoxin CPUs.

- AVX-512 CPUs should see better SGEMV_N and SGEMV_T performance for small values of N.

- Performance improvements around xGER, xSPR, xSPR2, xSYR, xSYR2, xTRSV, SGEMV_N, and DGEMV_N for small input sizes and consecutive arguments.

- Performance improvements for xGETRF, xPOTRF and xPOTRI for small input sizes.

- Initial support for the Arm Cortex-A55.

- Fixed building OpenBLAS for the Apple M1 when using GCC/GFortran.
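
For readers less familiar with the BLAS naming in the list above: the GEMV routines compute y := alpha * op(A) * x + beta * y, where the _N suffix means op(A) = A and _T means op(A) = A^T, and the S/D prefix selects single or double precision. Below is a minimal pure-Python reference sketch of that operation. The function name `gemv` and its argument layout are illustrative assumptions, not OpenBLAS code or its API, and Python floats are double precision, so this only shows what is computed, not how OpenBLAS optimizes it.

```python
def gemv(trans, alpha, a, x, beta, y):
    """Reference GEMV: returns alpha * op(A) * x + beta * y.

    trans="N" uses op(A) = A   (the xGEMV_N case);
    trans="T" uses op(A) = A^T (the xGEMV_T case).
    `a` is a list of rows with shape m x n (hypothetical layout for
    illustration; real BLAS takes a flat array plus a leading dimension).
    """
    m, n = len(a), len(a[0])
    if trans == "N":
        # x has length n, result has length m
        return [alpha * sum(a[i][j] * x[j] for j in range(n)) + beta * y[i]
                for i in range(m)]
    if trans == "T":
        # x has length m, result has length n
        return [alpha * sum(a[i][j] * x[i] for i in range(m)) + beta * y[j]
                for j in range(n)]
    raise ValueError("trans must be 'N' or 'T'")


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # m = 2 rows, n = 3 columns

print(gemv("N", 1.0, a, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0]))   # [6.0, 15.0]
print(gemv("T", 1.0, a, [1.0, 1.0], 0.0, [0.0, 0.0, 0.0]))   # [5.0, 7.0, 9.0]
```

Here n is the column count, following the usual BLAS argument naming, so "small N" in the release notes refers to matrices with few columns.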

Downloads and more details on all of the OpenBLAS 0.3.16 changes are available via GitHub.