AMD Aims For 30x Energy Efficiency Improvement For AI Training + HPC By 2025

Originally posted by david-nk
Last edited by sdack; 29 September 2021, 04:36 PM.
Lord almighty, I wish people would stop saying artificial intelligence when they mean machine learning, which has absolutely nothing to do with AI at all.
We could actually have AI now, but over five decades ago companies chose machine learning instead, because developing true AI would require understanding how the human brain works, and that would have taken a coordinated 30-year plan among researchers and industry.
So they went for the super crap called machine learning instead, which reached its practical limit long ago.
That's why our "smart" devices can't even pretend to understand compound sentences, or any complex sentence, and never will.
True AI will come, and there are a handful of incredibly brilliant researchers working on truly understanding the brain, some even in the midst of creating its connectome right now.
But we're already 50 years behind because of the greed and lack of vision of both industry and far too many scientists.
Originally posted by brucethemoose
AMD already has their 2025 products in the pipe in some form. As you said, this isn't a "goal" so much as a hint at what's already cookin'.
Originally posted by sdack
You are asking a stupid question. Start by understanding AI basics before you try to understand architectures. For a first example, look at cmix, a compressor that achieves higher compression ratios than LZMA by applying AI on top of standard compression methods. It shows how AI can be used in just about every discipline, on just about every problem, and why it is important to have AI capabilities within CPUs, too. Learn how AIs work and you will understand that it is not about size; they can yield advantages at any scale. You do not need a GPU just to use AI. To give another example, AMD uses AI logic within the branch prediction of their Zen CPUs. It is very simple and basic, yet achieves better results than before.
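For readers curious what "AI logic in branch prediction" looks like: Zen's predictor is reported to be perceptron-based. Below is a minimal, generic perceptron branch predictor in the style of Jiménez's classic design. All sizes, the table indexing, and the training threshold are illustrative textbook choices, not AMD's actual implementation.

```python
class PerceptronPredictor:
    """Toy global-history perceptron branch predictor (illustrative only)."""

    def __init__(self, history_len=8, num_entries=64, threshold=None):
        # Jiménez's rule-of-thumb training threshold.
        self.threshold = threshold if threshold is not None else int(1.93 * history_len + 14)
        # One weight vector (bias + one weight per history bit) per table entry.
        self.weights = [[0] * (history_len + 1) for _ in range(num_entries)]
        # Global outcome history: +1 = taken, -1 = not taken.
        self.history = [1] * history_len

    def predict(self, pc):
        # Index the table by (hashed) branch address; dot-product with history.
        w = self.weights[pc % len(self.weights)]
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))
        return y >= 0, y  # predicted direction and confidence value

    def update(self, pc, taken):
        pred, y = self.predict(pc)
        t = 1 if taken else -1
        w = self.weights[pc % len(self.weights)]
        # Train only on a misprediction or a low-confidence output.
        if pred != taken or abs(y) <= self.threshold:
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]
```

After a handful of updates on a strongly biased branch, the weights saturate past the confidence threshold and training stops, which is exactly the "very simple and basic" learning behaviour described above.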
All I want is ARM's sane variable-length vectors (SVE) over variable-sized data. It's the only sensible approach to this, and it works brilliantly. But instead we get Intel's AVX clusterf**k, and apparently AMD is willing to just keep following that shitty, braindead design and as a result always be years behind in delivering it to market. I would love to understand how such a competent team can relentlessly keep dropping the ball in this area.
Originally posted by david-nk
Can you name some neural network architectures, then, that are designed to be primarily trained on a CPU instead of a GPU or TPU?
Abstract:
"Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters to maintain enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training. However, specialized hardware is expensive and hard to generalize to a multitude of tasks. The progress on the algorithmic front has failed to demonstrate a direct advantage over powerful hardware such as NVIDIA-V100 GPUs. This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms, with multi-core parallelism and workload optimization. Using just a CPU, SLIDE drastically reduces the computations during both training and inference outperforming an optimized implementation of Tensorflow (TF) on the best available GPU. Our evaluations on industry-scale recommendation datasets, with large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 3.5 times (1 hour vs. 3.5 hours) faster than the same network trained using TF on Tesla V100 at any given accuracy level. On the same CPU hardware, SLIDE is over 10x faster than TF. We provide codes and scripts for reproducibility."
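The "smart randomized algorithms" the abstract refers to are built around locality-sensitive hashing: instead of computing every neuron in a wide layer, SLIDE hashes the input and evaluates only the neurons whose weight vectors land in the same hash bucket, since those are the ones likely to activate strongly. Here is a toy sketch of that selection idea using SimHash (signed random projections). All names, sizes, and the single-table scheme are illustrative simplifications, not code from the SLIDE repository, which uses multiple tables and other hash families.

```python
import random

def simhash(vec, planes):
    # One bit per random hyperplane: which side of the plane is vec on?
    bits = 0
    for p in planes:
        dot = sum(v * w for v, w in zip(vec, p))
        bits = (bits << 1) | (dot >= 0)
    return bits

class SparseLayer:
    """Toy LSH-gated fully connected layer (one hash table, SimHash)."""

    def __init__(self, in_dim, out_dim, num_bits=4, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0, 1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.planes = [[rng.gauss(0, 1) for _ in range(in_dim)]
                       for _ in range(num_bits)]
        # Hash every neuron's weight vector into a bucket once, up front.
        self.buckets = {}
        for i, w in enumerate(self.weights):
            self.buckets.setdefault(simhash(w, self.planes), []).append(i)

    def forward(self, x):
        # Evaluate only neurons sharing the input's bucket; everything else
        # is treated as inactive. This is what makes the per-sample cost
        # sublinear in layer width on a CPU.
        active = self.buckets.get(simhash(x, self.planes), [])
        return {i: sum(w * v for w, v in zip(self.weights[i], x))
                for i in active}
```

A forward pass returns activations for a small fraction of the 2^num_bits-bucketed neurons, and the backward pass (omitted here) would likewise update only those active neurons.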