Announcement

**mlau** · 16 November 2018, 02:59 AM

Originally posted by microcode:

> Well, if they do that, you can easily sue them over any shipment they represent that way. It is not FP32 precision; it is FP32 range. It's a specialized format. I highly doubt this will be egregiously misrepresented, and I don't really see why people think it's such a big deal. This is incredibly simple to do in hardware, and it has major benefits for these and some other workloads.

What it's also great for is marketing: "Look, our next generation has made a large jump in FP32† performance!" († In very select workloads where precision doesn't matter, blah blah, followed by another five lines of clarifications in an even smaller font size.)

Intel is, after all, a company driven by its marketing department.

I don't doubt that it's a nifty format for certain uses. And if it's easy to implement in HW and SW, all the better.
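A minimal Python sketch of microcode's point that the format is cheap to implement: a BFloat16 value is simply the upper 16 bits of an IEEE 754 float32, so it keeps float32's sign bit and 8-bit exponent (and therefore its range) but only 7 of its 23 mantissa bits. The helper names are illustrative, and plain truncation is used here only to expose the bit layout; practical converters usually round to nearest even instead.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the raw 32-bit pattern of x rounded to IEEE 754 float32."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def split(bits: int, exp_bits: int, man_bits: int) -> tuple:
    """Split a raw bit pattern into (sign, biased exponent, mantissa) fields."""
    mantissa = bits & ((1 << man_bits) - 1)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    sign = bits >> (man_bits + exp_bits)
    return sign, exponent, mantissa

x = 3.0e38                    # close to the top of float32's range
b32 = f32_bits(x)
b16 = b32 >> 16               # BFloat16 by truncation: keep only the top 16 bits

print(split(b32, 8, 23))      # float32: 1 sign, 8 exponent, 23 mantissa bits
print(split(b16, 8, 7))       # BFloat16: same sign and exponent, top 7 mantissa bits

# Widening back to float32 is just a 16-bit shift; the value stays finite.
back = struct.unpack('<f', struct.pack('<I', b16 << 16))[0]
print(back)

# IEEE 754 half precision (struct format 'e') cannot hold this magnitude at all.
try:
    struct.pack('<e', x)
except OverflowError as err:
    print("binary16:", err)
```

The shift in both directions is the whole conversion, which is why the hardware cost is small; what is given up is everything below the seventh mantissa bit.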

**coder** · 17 January 2019, 02:51 AM

Originally posted by wizard69:

> This seems a long way off. What I'd like to know is why Intel or AMD hasn't defined a specialized processor core for these workloads, the way Apple and other ARM developers have with dedicated ML accelerators.

Because they both make GPUs, which are kinda that.

Specialized hardware blocks make more sense in cell-phone SoCs, where power efficiency is worth more than die area. Qualcomm took a slightly different approach, enhancing its existing DSP block to run machine learning (although it can also use the Adreno GPU and CPU blocks).

Both Intel and AMD added support for IEEE 754 half-precision floats years ago. I think Intel added it in the Gen 8 (Broadwell, 2014) HD Graphics, and AMD added packed, double-rate FP16 in Vega.
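For comparison with BFloat16, here is a hand-written decoder for the IEEE 754 half-precision (binary16) layout: 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits. It is an illustrative sketch (the function names are hypothetical), cross-checked against Python's built-in `struct` format `'e'`, which packs and unpacks binary16.

```python
import math
import struct

def decode_half(bits: int) -> float:
    """Decode a 16-bit IEEE 754 binary16 pattern: 1 sign, 5 exponent, 10 mantissa bits."""
    sign = -1.0 if bits >> 15 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2**-14 * 2**-10.
        return sign * mantissa * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (mantissa 0) or NaN.
        return sign * math.inf if mantissa == 0 else math.nan
    # Normal number: implicit leading 1, exponent bias of 15.
    return sign * (1 + mantissa / 1024) * 2.0 ** (exponent - 15)

def reference(bits: int) -> float:
    """Decode the same pattern with struct's built-in binary16 support."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

# Check all 65,536 patterns; NaNs never compare equal, so they are matched by isnan.
for bits in range(1 << 16):
    ours, ref = decode_half(bits), reference(bits)
    assert (math.isnan(ours) and math.isnan(ref)) or ours == ref, hex(bits)

print(decode_half(0x3C00))   # 1.0
print(decode_half(0x7BFF))   # 65504.0, the largest finite binary16 value
print(decode_half(0x0001))   # 2**-24 (about 5.96e-08), the smallest positive subnormal
```

The 10-bit mantissa is what gives binary16 its extra precision over BFloat16; the 5-bit exponent is what caps its range at 65504.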

Intel is adding "DL Boost" to its upcoming 10 nm CPU cores, which is probably the context of this article. I think it's basically a subset of the AVX-512 vector extensions that uses this BFloat16 format.
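A rough scalar emulation of how BFloat16 is typically used in such vector extensions: BFloat16 inputs multiplied in pairs, with the products summed into a float32 accumulator. This is the general shape of the dot-product instruction (VDPBF16PS) in Intel's AVX-512 BF16 extension, but it is only a hedged sketch: the helper names are hypothetical, Python's binary64 arithmetic stands in for the hardware datapath, and the real instruction's exact rounding and denormal handling are not modeled.

```python
import struct

def to_f32(x: float) -> float:
    """Round x to the nearest float32 value (returned as a Python float)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def to_bf16(x: float) -> float:
    """Round x to BFloat16: round-to-nearest-even on the float32 bit pattern."""
    if x != x:                                   # NaN: pass through unchanged
        return x
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)          # ties go to the even upper half
    return struct.unpack('<f', struct.pack('<I', ((bits >> 16) & 0xFFFF) << 16))[0]

def dot_bf16_f32(a: list, b: list) -> float:
    """Dot product with BFloat16 inputs and a float32 accumulator, two products per step."""
    assert len(a) == len(b) and len(a) % 2 == 0
    acc = 0.0
    for i in range(0, len(a), 2):
        # A product of two 8-bit significands fits exactly in float32 (and binary64).
        pair = to_bf16(a[i]) * to_bf16(b[i]) + to_bf16(a[i + 1]) * to_bf16(b[i + 1])
        acc = to_f32(acc + pair)                 # accumulate at float32 precision
    return acc

print(dot_bf16_f32([1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 2.0, 1.0]))  # 11.0
print(dot_bf16_f32([1 / 3] * 4, [1.0] * 4))   # 1.3359375, not 4/3: the inputs lost precision
```

The wide accumulator keeps rounding error from piling up across a long sum; the error that remains comes from the inputs having only 8 significant bits.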

AMD keeps adding deep-learning instructions to GCN, but nothing yet that compares with Nvidia's Tensor Cores.

**coder** · 17 January 2019, 02:58 AM

Originally posted by microcode:

> I don't really see why people think this is such a big deal. This is incredibly simple to do in hardware, and it has major benefits for these and some other workloads.

As I said, I think the half-floats specified in IEEE 754 are more generally useful. That's what GPUs have used to date.

To wit:

Q: When is 256 + 1 = 256?

A: When you're using BFloat16.
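The riddle can be checked in a few lines. BFloat16 carries 8 significant bits (7 stored plus the implicit leading 1), so 257, which needs 9, is not representable: it sits exactly halfway between 256 and 258 and rounds to the even neighbour, 256. Binary16 has 11 significant bits and holds 257 exactly. A minimal sketch, assuming round-to-nearest-even conversion from float32; the helper names are illustrative, not any particular library's API.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest BFloat16 value (ties to even), returned as a Python float."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]   # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                   # round on the 16 dropped bits
    return struct.unpack('<f', struct.pack('<I', ((bits >> 16) & 0xFFFF) << 16))[0]

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value using struct's 'e' format."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_bf16(256.0 + 1.0))   # 256.0
print(to_half(256.0 + 1.0))   # 257.0
print(to_bf16(256.0 + 2.0))   # 258.0: steps of 2 are still representable

# Counting in a BFloat16 accumulator stalls once the spacing between values exceeds 1.
count = 0.0
for _ in range(1000):
    count = to_bf16(count + 1.0)
print(count)                  # 256.0, not 1000.0
```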

**microcode** · 18 January 2019, 10:29 PM

Originally posted by coder:

> As I said, I think the half-floats specified in IEEE 754 are more generally useful. That's what GPUs have used to date.
>
> To wit:
>
> Q: When is 256 + 1 = 256?
>
> A: When you're using BFloat16.

Saturating addition is a good choice for some use cases.

**coder** · 18 January 2019, 11:01 PM

Originally posted by microcode:

> Saturating addition is a good choice for some use cases.

I don't disagree with that statement, but it doesn't really speak to my point. I was just trying to illustrate how little precision this format has.

IEEE 754 half precision was designed as a balance between range and precision (5 exponent bits, 10 mantissa bits), whereas BFloat16 is all about range (8 exponent bits, the same as float32, but only 7 mantissa bits). IMO, that limits its usefulness for a great many applications. It's fine for deep learning, but not much else. I'd rather the industry stuck with the existing half-precision format.
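To put numbers on the range-versus-precision trade-off, the limits of a binary floating-point format follow directly from its exponent width e and stored mantissa width m: the bias is 2^(e-1) - 1, and the all-ones exponent is reserved for infinity and NaN. This sketch assumes exactly that IEEE-style encoding for all three formats; the function name is illustrative.

```python
import math

def limits(exp_bits: int, man_bits: int) -> dict:
    """Key limits of an IEEE-style binary format with the given field widths."""
    bias = 2 ** (exp_bits - 1) - 1
    return {
        "max": (2 - 2.0 ** -man_bits) * 2.0 ** bias,    # largest finite value
        "min_normal": 2.0 ** (1 - bias),                # smallest normal value
        "epsilon": 2.0 ** -man_bits,                    # gap between 1.0 and the next value
        "digits": (man_bits + 1) * math.log10(2),       # approx. significant decimal digits
        "exact_ints": 2 ** (man_bits + 1),              # every integer up to this is exact
    }

for name, e, m in [("binary16", 5, 10), ("bfloat16", 8, 7), ("float32", 8, 23)]:
    lim = limits(e, m)
    print(f"{name:9} max={lim['max']:.4g} min_normal={lim['min_normal']:.3g} "
          f"eps={lim['epsilon']:.3g} digits={lim['digits']:.1f} exact_ints={lim['exact_ints']}")
```

The `exact_ints` column is the 256 + 1 riddle in another form: BFloat16 runs out of consecutive integers at 256, binary16 at 2048, while BFloat16's maximum matches float32's order of magnitude (about 3.4e38) instead of binary16's 65504.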


Intel Publishes Whitepaper On New BFloat16 Floating-Point Format For Future CPUs
