OpenBLAS 0.3.28 Brings More Optimizations, Meteor Lake & Emerald Rapids Support
OpenBLAS 0.3.28 was released today. This open-source optimized BLAS library caters to a wide range of processors across various architectures, and the new release brings yet more optimizations and new CPU-optimized code paths.
OpenBLAS 0.3.28 reworks the "HUGETLB" implementation inherited from GotoBLAS, improves multi-threaded GEMM performance for certain matrices, improves BLAS3 performance on large multi-core systems via enhanced parallelism, speeds up initial memory allocation, and includes a range of other optimizations and fixes.
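For readers less familiar with the terminology, GEMM is the general matrix-matrix multiply at the heart of the BLAS3 routines: it computes C = alpha * A * B + beta * C. The following is only a naive reference sketch of that definition in plain Python, not OpenBLAS code; the function name `gemm` and the list-of-lists layout are illustrative assumptions. OpenBLAS performs the same arithmetic with hand-tuned, multi-threaded kernels for each CPU.

```python
def gemm(alpha, a, b, beta, c):
    """Naive reference GEMM: return alpha * (a @ b) + beta * c.

    Matrices are row-major lists of lists. This is an illustrative
    sketch of the operation's definition, not how OpenBLAS does it.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Check that a is m x k, b is k x n, and c is m x n.
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if any(len(row) != n for row in b):
        raise ValueError("b is not rectangular")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")

    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[1.0, 1.0], [1.0, 1.0]]
    print(gemm(2.0, a, b, 1.0, c))
```

The triple loop performs m * n * k multiply-adds, which is why a tuned BLAS can pay off on large matrices.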
OpenBLAS 0.3.28 also brings official support for Intel Xeon Emerald Rapids and Intel Core Ultra (Meteor Lake) processors. There is also now auto-detection of Zhaoxin KX-7000 CPUs, a fix for auto-detection of old Intel Prescott CPUs, improved compiler options for CMake and LLVM builds on AVX-512 capable targets, and other x86_64 optimizations.
On the ARM64 side, there is improved GEMM performance on the Arm Neoverse V1, new optimized kernels for the A64FX, and other changes. This BLAS library update also includes a number of LoongArch, RISC-V, and POWER optimizations.
Downloads and more details on OpenBLAS 0.3.28, this leading open-source BLAS implementation, are available via GitHub.