OpenBLAS 0.3.10 Released With Initial BFloat16 Support, x86_64 Optimizations
A new feature release is now available for this leading open-source BLAS linear algebra library.
With this Sunday's release of OpenBLAS 0.3.10 come initial BFloat16 (BF16) support with an initial SHGEMM implementation, various imported LAPACK bug fixes, thread-locking improvements, an API for setting thread affinity on Linux, CMake build system improvements, support for MIPS 24K/24KE processors based on the P5600 kernels, an optimized SGEMM kernel for the Cortex-A53, improved ThunderX2 performance, various performance improvements for recent x86_64 CPUs, AVX-512 fixes, and assorted other fixes and optimizations throughout.
From our perspective, the most exciting addition is the initial BFloat16 support, given the Intel and Arm CPUs coming to market that support this half-precision floating-point format, along with the x86_64 optimizations. BFloat16 is important for machine learning / AI workloads, and we anticipate more OpenBLAS BF16 support moving forward. Among the x86_64 optimizations are better DGEMM performance on Skylake-X, better STRSM performance for Haswell / Skylake-X / Ryzen, and other fixes and improvements.
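For readers wanting to see whether these kernel improvements reach their own code: NumPy is commonly built against OpenBLAS, in which case an ordinary double-precision matrix multiply dispatches to OpenBLAS's DGEMM routine, the same one whose Skylake-X performance improved in this release. A minimal sketch (assuming a NumPy build linked against OpenBLAS, which you can check with `np.show_config()`):

```python
# Illustrative sketch: a float64 matrix multiply in NumPy goes through the
# BLAS dgemm routine when NumPy is linked against OpenBLAS.
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((512, 512))   # float64 operands -> DGEMM code path
b = rng.random((512, 512))

c = a @ b                    # BLAS-backed matrix multiply

print(c.shape)               # (512, 512)
```

Which BLAS actually backs the build varies by distribution and install method, so `np.show_config()` (or, on Linux, inspecting the loaded shared libraries) is the way to confirm OpenBLAS is in use.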
The full list of OpenBLAS 0.3.10 changes is available via the project's GitHub.