
OpenBLAS continues striving to compete with Intel's MKL and other optimized BLAS implementations, and its additional AVX2 and AVX-512 work should help performance on the latest Intel and AMD CPUs. There is now an AVX-512 DGEMM kernel, the AVX-512 SGEMM kernel was "significantly" improved, and there are new AVX-512 optimized kernels for CGEMM and ZGEMM. On the AVX2 front, the kernels for STRMM, SGEMM, and CGEMM are said to have been significantly sped up, along with new kernels for CGEMM3M and ZGEMM3M.
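For readers less familiar with the kernel names: the GEMM routines all compute the general matrix multiply C := alpha*A*B + beta*C, with the prefix giving the data type (S for single precision, D for double precision, C for single-precision complex, Z for double-precision complex). The sketch below is a plain, unoptimized Python reference for that operation and is not OpenBLAS code; the function name `gemm` and the list-of-lists layout are assumptions made purely for illustration, and the transpose options of the real routines are left out.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a @ b) + beta * c.

    Matrices are row-major lists of lists. This is an illustrative
    triple loop, not the vectorized kernels OpenBLAS ships.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Check that A is m x k, B is k x n, and C is m x n.
    if (any(len(row) != k for row in a)
            or any(len(row) != n for row in b)
            or len(c) != m
            or any(len(row) != n for row in c)):
        raise ValueError("incompatible matrix shapes")
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, a, b, 0.5, c)
print(result)
```

The optimized kernels compute the same result; AVX2 and AVX-512 let them process several matrix elements per instruction, which is where the speed-ups come from.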
OpenBLAS 0.3.8 also adds QEMU virtual CPU detection, Intel Goldmont Plus CPU auto-detection, ARMv8 performance optimizations, various POWER optimizations, integration of LAPACK 3.9.0, CMake build system improvements, and other general optimizations. There is also GCC 10 compiler support and improved compilation with g95 and non-GNU versions of the LD linker. Rounding out the release is official NetBSD support.
More details on the OpenBLAS 0.3.8 release are available via GitHub.