Originally posted by coder
I haven't seen the Habana Gaudi training code. I'd guess a controlling thread initiates a batch of Ethernet DMA transfers using the RoCE feature, while the AI-operation threads just wait for their inputs to arrive. The weights probably get stored off in HBM blocks.
NVDA probably gets its FP64 by fusing or pipelining FP16 or FP32 operations. I believe Intel's Xe-HPC has dedicated FP64 units.
The HPC people are using AI now, but they still want their 64-bit operations. I saw a good presentation on a project from CERN; there's a write-up here:
https://www.intel.com/content/www/us...mer-story.html