
Intel Publishes Whitepaper On New BFloat16 Floating-Point Format For Future CPUs


  • Intel Publishes Whitepaper On New BFloat16 Floating-Point Format For Future CPUs

    Phoronix: Intel Publishes Whitepaper On New BFloat16 Floating-Point Format For Future CPUs

    Intel has published their initial whitepaper on BF16/BFloat16, a new floating point format to be supported by future Intel processors...

    http://www.phoronix.com/scan.php?pag...-Deep-Learning

  • coder
    replied
    Originally posted by microcode View Post
    Saturating addition is a good choice for some use cases.
    I don't disagree with that statement, but that doesn't really speak to my point. I was just trying to illustrate what low precision this format has.

    IEEE 754 half-precision was created as a balance between range and precision, whereas BFloat16 is all about range. IMO, that limits its potential for a great many uses. It's fine for deep learning, but not a whole lot else. I'd rather the industry stuck with existing half-precision.
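The range-versus-precision tradeoff can be made concrete by computing the spacing between adjacent representable values near a given magnitude. A minimal sketch, assuming the usual bit layouts (fp16: 10 stored mantissa bits, BF16: 7):

```python
import math

def ulp_at(x, mantissa_bits):
    """Spacing between adjacent representable values near x for a
    binary float format with the given number of stored mantissa bits."""
    e = math.floor(math.log2(abs(x)))
    return 2.0 ** (e - mantissa_bits)

print(ulp_at(256.0, 10))  # fp16 spacing at 256 is 0.25, so 256 + 1 is exact
print(ulp_at(256.0, 7))   # BF16 spacing at 256 is 2.0, so 257 isn't representable
```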



  • microcode
    replied
    Originally posted by coder View Post
    As I said, I think the half-floats specified in IEEE 754 are more generally useful. That's what GPUs have used, to date.

    To wit:

Q: when is 256 + 1 = 256?

    A: when you're using BFloat16.
    Saturating addition is a good choice for some use cases.



  • coder
    replied
    Originally posted by microcode View Post
    I don't really see why people think this is such a big deal. This is incredibly simple to do in hardware, and it has major benefits for these and some other workloads.
    As I said, I think the half-floats specified in IEEE 754 are more generally useful. That's what GPUs have used, to date.

    To wit:

Q: when is 256 + 1 = 256?

    A: when you're using BFloat16.
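The Q&A above can be checked in software: since BF16 is just the top 16 bits of an IEEE 754 float32, truncating the low 16 bits of the float32 encoding models it (a sketch that ignores rounding mode; round-to-nearest-even gives the same answer for this case):

```python
import struct

def to_bf16(x: float) -> float:
    """Model BF16 by truncating the low 16 bits of the float32 encoding."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

print(to_bf16(256.0 + 1.0))  # 256.0 -- 257 needs 9 significant bits, BF16 has 8
print(to_bf16(256.0 + 2.0))  # 258.0 -- the spacing at 256 is exactly 2
```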



  • coder
    replied
    Originally posted by wizard69 View Post
This seems to be a long way off. What I’d like to know is why Intel or AMD hasn’t defined a specialized processor core for these workloads, like Apple and other ARM developers have done with their ML accelerators.
    Because they both make GPUs, which are kinda that.

    Specialized hardware blocks make more sense in cell phone SoCs because power-efficiency is more valuable than die area. Qualcomm took a slightly different approach of enhancing its existing DSP block to run machine learning (although they can also employ the Adreno GPU and CPU blocks).

Both Intel and AMD added support for IEEE 754-based half-precision floats years ago. I think Intel added it in Gen8 (Broadwell, 2014) HD Graphics and AMD added it in Vega.

Intel is adding "DL Boost" to their upcoming 10 nm CPU cores, which is probably the context of this article. I think it's basically a subset of AVX-512 vector extensions that utilizes this BFloat16 format.
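The article doesn't spell out the DL Boost semantics, so as an illustration only, here is a hypothetical model of the behavior usually attributed to bf16 dot-product hardware: operands stored in BF16, products accumulated in float32.

```python
import struct

def to_bf16(x: float) -> float:
    # BF16 modeled as a float32 with the low 16 bits truncated
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

def to_f32(x: float) -> float:
    # round a Python float (double) to float32
    return struct.unpack('<f', struct.pack('<f', x))[0]

def bf16_dot(a, b):
    """Hypothetical bf16 dot product: bf16 inputs, float32 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_bf16(x) * to_bf16(y))
    return acc

print(bf16_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0 -- exact for these inputs
```

The design point is that the accumulator stays in float32, so rounding error from the narrow inputs doesn't compound across a long sum.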

    AMD keeps adding more deep learning instructions to GCN, but not yet anything that can compare with Nvidia's Tensor cores.



  • mlau
    replied
    Originally posted by microcode View Post

    Well, if they do that you can easily sue them on any shipment they represent that way. It is not FP32 precision, it is FP32 range, it's a specialized format. I highly doubt that this will be misrepresented egregiously and I don't really see why people think this is such a big deal. This is incredibly simple to do in hardware, and it has major benefits for these and some other workloads.
What it is also great for is marketing: "Look, our next generation has made a large jump in FP32(******) performance!" (******): in very select workloads where precision doesn't matter, blah blah, another five lines of clarifications in even smaller font size.

    Intel is after all a company driven by the marketing department.

    I don't doubt that it's a nifty format for certain uses. And if it's easy to implement in HW and SW, all the better.
    Last edited by mlau; 11-16-2018, 03:02 AM.



  • microcode
    replied
    Originally posted by mlau View Post
    And it will look good in benchmarks: "Almost FP32 precision at FP16 perf!". And then the "almost" will some day silently be dropped.
    Well, if they do that you can easily sue them on any shipment they represent that way. It is not FP32 precision, it is FP32 range, it's a specialized format. I highly doubt that this will be misrepresented egregiously and I don't really see why people think this is such a big deal. This is incredibly simple to do in hardware, and it has major benefits for these and some other workloads.
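The "FP32 range, not FP32 precision" distinction is easy to demonstrate with a sketch that models BF16 as a float32 with the low 16 bits dropped:

```python
import math
import struct

def to_bf16(x: float) -> float:
    # BF16 modeled as a float32 with the low 16 bits truncated
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

print(to_bf16(3.0e38))         # still finite: BF16 keeps float32's 8-bit exponent
print(to_bf16(1.0 + 2.0**-8))  # 1.0: the 8th mantissa bit doesn't fit in BF16
```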



  • wizard69
    replied
This seems to be a long way off. What I’d like to know is why Intel or AMD hasn’t defined a specialized processor core for these workloads, like Apple and other ARM developers have done with their ML accelerators.



  • coder
    replied
    Originally posted by jacob View Post
    Will AMD CPUs be legally able to support this?
    I thought I read it came from Google. Anyway, it's what they use in their TPU2's.

    I'm skeptical it's really any faster than half-precision floats, other than conversion to/from normal fp32. IMO, half-precision is generally more useful.

    Without denormals, even fp32 isn't very usable for many applications (hence, the popularity of fp64 for GPU compute).
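The gradual-underflow point can be illustrated with Python's native float64 (the same mechanism IEEE 754 specifies for fp32), assuming the host honors subnormals rather than flushing them to zero:

```python
import sys

tiny = sys.float_info.min  # smallest normal float64, ~2.2250738585072014e-308
sub = tiny / 2.0           # halving it yields a *subnormal*, not zero
print(sub > 0.0)           # True with gradual underflow; flush-to-zero gives 0.0
print(sub * 2.0 == tiny)   # True -- no information lost in this case
```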



  • carewolf
    replied
You could basically introduce that today: if the compiler can assume those things are not important (i.e. not supposed to be supported), it can optimize much better.

