OpenBLAS 0.3.26 Brings More x86_64 Optimizations, Better LoongArch64 & ARM64

OpenBLAS 0.3.26 was released this week as the newest feature update to this open-source Basic Linear Algebra Subprograms (BLAS) library.

OpenBLAS 0.3.26 delivers much faster GESV performance for small problem sizes, pulls in various fixes from the reference LAPACK code, makes various build system improvements, and adds a number of architecture-specific optimizations and fixes.
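For readers unfamiliar with the routine, GESV is the LAPACK driver that solves a square linear system A x = b using LU factorization with partial pivoting. The block below is a minimal pure-Python sketch of that computation for a single right-hand side. It is not OpenBLAS code, and the function name `solve_small` is made up for illustration.

```python
def solve_small(a, b):
    """Solve a x = b for a square matrix a (list of rows) and vector b.

    Illustrative only: Gaussian elimination with partial pivoting,
    the same mathematical approach GESV takes, written for clarity
    rather than speed.
    """
    n = len(a)
    # Work on an augmented copy so the caller's inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


if __name__ == "__main__":
    print(solve_small([[2, 1], [1, 3]], [3, 5]))
```

For systems this small, fixed overhead can easily outweigh the arithmetic itself, which is why small problem sizes are singled out in the release.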

On the x86_64 side, OpenBLAS 0.3.26 fixes the CASUM computation on Skylake-X and newer targets in cases where AVX-512 is not supported, includes other AVX-512 related fixes, works around a problem in the pre-AVX kernel for GEMV, and speeds up thread management on Microsoft Windows.


OpenBLAS 0.3.26 also fixes several issues on ARM64 (AArch64), provides new optimizations for Neoverse-V1 along with other performance tuning, adds support for the Apple M1 and newer targets in DYNAMIC_ARCH builds, and more. There are also various IBM POWER optimizations and new or improved optimized kernels for almost all BLAS functions on LoongArch64.

Downloads and more details on the OpenBLAS 0.3.26 release are available via GitHub.
