OpenBLAS 0.3.10 Released With Initial BFloat16 Support, x86_64 Optimizations

  • OpenBLAS 0.3.10 Released With Initial BFloat16 Support, x86_64 Optimizations

    Phoronix: OpenBLAS 0.3.10 Released With Initial BFloat16 Support, x86_64 Optimizations

    A new feature release is now available for this leading open-source BLAS linear algebra library...
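    For readers unfamiliar with the format: bfloat16 is simply the upper 16 bits of an IEEE-754 float32, so it keeps float32's full exponent range but only an 8-bit significand. The sketch below is a generic illustration of the format using truncation; it is not the OpenBLAS API, and the names and the sample value are made up for the example.

    ```c
    /* Minimal sketch of the bfloat16 format: the top 16 bits of a float32.
     * Generic illustration only (truncation, no rounding), not OpenBLAS code.
     * Build with something like: cc bf16_demo.c -o bf16_demo */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    typedef uint16_t bfloat16;            /* raw upper-half bit pattern of a float32 */

    static bfloat16 float_to_bf16(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);   /* reinterpret the float's bits safely */
        return (bfloat16)(bits >> 16);    /* keep sign, exponent, top 7 mantissa bits */
    }

    static float bf16_to_float(bfloat16 b)
    {
        uint32_t bits = (uint32_t)b << 16; /* dropped mantissa bits become zero */
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main(void)
    {
        float x = 3.14159265f;
        bfloat16 b = float_to_bf16(x);
        printf("%.8f -> 0x%04x -> %.8f\n", x, (unsigned)b, bf16_to_float(b));
        return 0;
    }
    ```

    Because the conversion is just a 16-bit shift, moving between bfloat16 storage and float32 arithmetic is very cheap, which is what makes the format attractive for GEMM-style kernels.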

  • #2
    Typo:

    Originally posted by phoronix View Post
    various performance improvements for recent x86_64 CPus,

    • #3
      This is a tad funny looking back: I did a float16 package that I presented at a math conference in 2004. The only difference at that time was that I used a LUT, and I could keep the precision of float32 or even float64 in most cases.

      Since any operation based on a 12-bit ADC can only produce 2**12 possibilities, and the result of a two-variable operation adds <=1 bit to the domain size, you can keep this pretty fast and small. This kept the FPU fed with data from main memory 2x or 4x faster. It's nice to see that precision is not that important anymore.
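
      As a rough illustration of the idea (a minimal sketch, not the original 2004 package): with a 12-bit ADC there are only 2**12 possible input codes, so an expensive function can be tabulated once in full double precision and then applied with a single indexed load. The 0-5 V scaling and the log1p function are made-up placeholders.

      ```c
      /* Minimal sketch of the lookup-table idea for low-entropy ADC data.
       * Hypothetical example, not the package described above.
       * Build with something like: cc lut_demo.c -lm -o lut_demo */
      #include <math.h>
      #include <stdio.h>

      #define ADC_BITS   12
      #define ADC_LEVELS (1u << ADC_BITS)   /* 4096 possible input codes */

      static double lut[ADC_LEVELS];

      /* Fill the table once, at full double precision. */
      static void build_lut(void)
      {
          for (unsigned code = 0; code < ADC_LEVELS; ++code) {
              double volts = code * (5.0 / (ADC_LEVELS - 1)); /* hypothetical 0..5 V range */
              lut[code] = log1p(volts);                       /* any expensive function */
          }
      }

      int main(void)
      {
          build_lut();
          unsigned sample = 2048;           /* raw ADC code from a measurement */
          printf("f(code %u) = %.17g\n", sample, lut[sample]);
          return 0;
      }
      ```

      The precision point follows because the table entries themselves are computed in float64, so the only error left is the quantisation already present in the ADC data.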

      • #4
        By far the most important math library. The AMD equivalents, BLIS and libFLAME, are performant but produce numerical errors, so they are useless. Intel's MKL is very good and usable on Epyc/Threadripper, too. ATLAS is (too) slow, as are the reference BLAS/LAPACK implementations.
        This is one area where AMD really has to catch up...

        • #5
          Originally posted by tchiwam View Post
          This is a tad funny looking back: I did a float16 package that I presented at a math conference in 2004. The only difference at that time was that I used a LUT, and I could keep the precision of float32 or even float64 in most cases.
          Got a link to a paper?
