OpenBLAS 0.3.16 Brings Various CPU Fixes, More Optimizations
OpenBLAS, the popular open-source high-performance BLAS/LAPACK implementation, has seen a new release with more CPU/architecture-specific work as well as some new general optimizations.
OpenBLAS 0.3.16 was released on Sunday. Some of the changes in this release include:
- Added CPU type detection for Intel Ice Lake SP, while Tiger Lake detection has been fixed.
- CPU type detection is also now in place for newer Centaur/Zhaoxin CPUs.
- AVX-512 CPUs should see better SGEMV_N and SGEMV_T performance for small N.
- Performance improvements for xGER, xSPR, xSPR2, xSYR, xSYR2, xTRSV, SGEMV_N, and DGEMV_N for small input sizes and consecutive arguments.
- Performance improvements for xGETRF, xPOTRF, and xPOTRI for small input sizes.
- Initial support for the Arm Cortex-A55.
- Fixed building OpenBLAS for the Apple M1 when using GCC/GFortran.
Downloads and more details on all of the OpenBLAS 0.3.16 changes are available via GitHub.