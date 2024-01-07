
OpenBLAS 0.3.26 Brings More x86_64 Optimizations, Better LoongArch64 & ARM64
OpenBLAS 0.3.26 features much faster GESV performance for small problem sizes, pulls in various fixes from the reference LAPACK code, makes various build system improvements, and adds a number of architecture-specific optimizations and fixes.
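For readers unfamiliar with the routine: GESV is the LAPACK driver that solves a dense linear system A x = b using LU factorization with partial pivoting. The following is only a rough pure-Python sketch of that kind of computation on a small problem, written as Gaussian elimination with partial pivoting; the function name `solve_small` is hypothetical and this is not OpenBLAS code.

```python
def solve_small(a, b):
    """Solve a x = b for a small dense system (illustrative sketch only)."""
    n = len(a)
    # Work on an augmented copy [A | b] so the caller's data is untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: pick the row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate the entries below the pivot in column k.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


if __name__ == "__main__":
    print(solve_small([[2, 1], [1, 3]], [3, 5]))
```

For problems this small, the arithmetic itself is cheap, so fixed per-call costs make up a larger share of the run time than they do for large matrices.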
On the x86_64 side, OpenBLAS 0.3.26 fixes the CASUM computation on Skylake-X and newer targets in cases where AVX-512 is not supported, includes other AVX-512 related fixes, works around a problem in the pre-AVX kernel for GEMV, and speeds up thread management on Microsoft Windows.
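As a reminder of what CASUM is meant to return: for a single-precision complex vector it sums the absolute values of the real and imaginary parts of each element, rather than the complex modulus. A minimal reference sketch in plain Python, with the hypothetical name `casum_reference` and no relation to the optimized OpenBLAS kernels, might look like this:

```python
def casum_reference(x):
    """Sum of |Re(z)| + |Im(z)| over a complex vector (illustrative sketch)."""
    # Note: this is deliberately not abs(z), which would be the complex modulus.
    return sum(abs(z.real) + abs(z.imag) for z in x)


if __name__ == "__main__":
    print(casum_reference([3 + 4j, -1 - 2j]))
```

A scalar reference like this is the sort of baseline an optimized kernel's output can be checked against.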
OpenBLAS 0.3.26 also fixes several issues on ARM64 (AArch64), provides some new optimizations for Neoverse-V1 along with other performance tuning, adds support for the Apple M1 and newer targets in DYNAMIC_ARCH builds, and more. There are also various IBM POWER optimizations and new/improved optimized kernels for almost all BLAS functions on LoongArch64.
Downloads and more details on the OpenBLAS 0.3.26 release are available via GitHub.