OpenBLAS 0.3.28 was released today. This open-source, optimized BLAS library supports a wide range of processors across multiple architectures, and the new release brings further optimizations and new CPU-optimized code paths.

OpenBLAS 0.3.28 reworks the "HUGETLB" implementation inherited from GotoBLAS, improves multi-threaded GEMM performance for certain matrices, improves BLAS3 performance on large multi-core systems through enhanced parallelism, speeds up initial memory allocation, and includes a range of other general optimizations and fixes.

OpenBLAS 0.3.28 also brings official support for Intel Xeon Emerald Rapids and Intel Core Ultra (Meteor Lake) processors. On x86_64 there is also new auto-detection of Zhaoxin KX-7000 CPUs, a fix for auto-detection of old Intel Prescott CPUs, improved compiler options for CMake and LLVM builds on AVX-512 capable targets, and other optimizations.
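
For readers less familiar with the GEMM routine mentioned in this release, the operation it computes is C = alpha * A * B + beta * C. The following is only a minimal pure-Python sketch of that arithmetic; the function name `gemm` and its argument order are illustrative assumptions, not the OpenBLAS API, and it bears no relation to the optimized, multi-threaded kernels OpenBLAS actually ships.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c for matrices given as lists of rows.

    Hypothetical reference helper for illustration only; OpenBLAS implements
    this operation with tuned, architecture-specific kernels.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Every row of a must have as many entries as b has rows.
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    # c must have the shape of the product, m x n.
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be an m x n matrix")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[1.0, 1.0], [1.0, 1.0]]
    print(gemm(2.0, a, b, 0.5, c))
```

The triple loop here does on the order of m * n * k multiply-adds, which is why GEMM and the other BLAS3 routines benefit so much from the parallelism and per-CPU kernel work described in this release.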

On the ARM64 side, there is improved GEMM performance on the Arm Neoverse V1, new optimized kernels for the A64FX, and other changes. This release also includes a number of LoongArch, RISC-V, and POWER optimizations.

Downloads and more details on OpenBLAS 0.3.28, a leading open-source BLAS implementation, are available via GitHub.