OpenBLAS 0.3.25 Adds New AVX-512 Optimizations For Sapphire Rapids & More
Ahead of Supercomputing SC23 week, a new version of OpenBLAS, the leading open-source Basic Linear Algebra Subprograms (BLAS) library, has been published. OpenBLAS 0.3.25 brings improvements for Intel and AMD x86_64 CPUs, a number of general improvements, and continued tuning for other architectures like ARM64, POWER, and LoongArch.
OpenBLAS 0.3.25 brings a number of general improvements to this BLAS library, fixes building with the Cray CCE compiler, back-ports some changes from the upcoming LAPACK 3.12 reference library release, and delivers various architecture-specific improvements.
For Intel CPUs, there are new AVX-512 optimizations for ?ASUM (the BLAS sum-of-absolute-values routines, where "?" stands for the precision prefix) on Sapphire Rapids and Cooper Lake processors. For AMD CPUs, there is a fix for compile-time auto-detection of AMD Ryzen Zen 3 and Zen 4 processors.
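For readers unfamiliar with the routine, here is a minimal pure-Python sketch of what the ?ASUM family computes. It only illustrates the reference BLAS semantics and says nothing about how the new AVX-512 kernels are written. The function names `asum` and `asum_complex` are made up for this example and are not OpenBLAS entry points; the real routines are SASUM, DASUM, SCASUM, and DZASUM.

```python
# Illustrative sketch only: mirrors reference BLAS ?ASUM semantics.
# `asum` and `asum_complex` are hypothetical names, not OpenBLAS APIs.

def asum(n, x, incx=1):
    """Real case (SASUM/DASUM): sum of |x[i*incx]| for i = 0..n-1."""
    # Reference BLAS returns zero for a non-positive n or stride.
    if n <= 0 or incx <= 0:
        return 0.0
    return sum(abs(x[i * incx]) for i in range(n))


def asum_complex(n, x, incx=1):
    """Complex case (SCASUM/DZASUM): sum of |Re| + |Im|, not the modulus."""
    if n <= 0 or incx <= 0:
        return 0.0
    return sum(abs(x[i * incx].real) + abs(x[i * incx].imag) for i in range(n))


if __name__ == "__main__":
    v = [1.0, -2.0, 3.0, -4.0]
    print(asum(4, v))          # every element
    print(asum(2, v, incx=2))  # every second element: v[0] and v[2]
    print(asum_complex(2, [3 - 4j, -1 + 2j]))
```

Since the operation is a single pass over the vector with no dependencies between elements other than the running sum, it is a natural fit for wide SIMD units such as AVX-512.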
On the ARM64 side there are various fixes, IBM POWER likewise sees a number of fixes, and 64-bit LoongArch gains optimized SGEMV and DTRSM kernels.
Downloads and more details on the OpenBLAS 0.3.25 release are available via GitHub.