AMD Aims For 30x Energy Efficiency Improvement For AI Training + HPC By 2025

Originally posted by david-nk
Last edited by sdack; 29 September 2021, 04:36 PM.
Lord almighty, I wish people would stop saying artificial intelligence when they mean machine learning, which has absolutely nothing to do with AI at all.
We could actually have AI now, but over five decades ago companies chose machine learning instead, because developing true AI would require understanding how the human brain works, and that would have taken a coordinated 30-year plan among researchers and industry.
So they went for the super crap called machine learning instead, which reached its practical limit long ago.
That's why our "smart" devices can't even pretend to understand compound sentences, or any complex sentence, and never will.
True AI will come, and there are a handful of incredibly brilliant researchers working on truly understanding the brain, some even in the midst of creating its connectome right now.
But we're already 50 years behind because of the greed and lack of vision of both industry and far too many scientists.
Originally posted by brucethemoose
AMD already has their 2025 products in the pipe in some form. As you said, this isn't a "goal" so much as a hint at what's already cookin'.
Originally posted by sdack
You are asking a stupid question. Start by understanding AI basics before you try to understand architectures. For a first example, look at cmix, a compressor that achieves higher compression ratios than LZMA by applying AI on top of standard compression methods. It shows how AI can be used in just about every discipline, on just about every problem, and why it is important to have AI capabilities within CPUs, too. Learn how AIs work and you will understand that it is not about size; they can yield advantages at any scale. You do not need a GPU just to use AI. To give another example, AMD uses AI logic within the branch prediction of their Zen CPUs. It is very simple and basic, yet achieves better results than before.
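For readers curious what "AI logic in branch prediction" looks like: Zen's predictor is reported to be perceptron-based. Below is a minimal, generic perceptron branch predictor in the style of Jiménez's classic design. All sizes, the table indexing, and the training threshold are illustrative textbook choices, not AMD's actual implementation.

```python
class PerceptronPredictor:
    """Toy global-history perceptron branch predictor (illustrative only)."""

    def __init__(self, history_len=8, num_entries=64, threshold=None):
        # Jiménez's rule-of-thumb training threshold.
        self.threshold = threshold if threshold is not None else int(1.93 * history_len + 14)
        # One weight vector (bias + one weight per history bit) per table entry.
        self.weights = [[0] * (history_len + 1) for _ in range(num_entries)]
        # Global outcome history: +1 = taken, -1 = not taken.
        self.history = [1] * history_len

    def predict(self, pc):
        # Index the table by (hashed) branch address; dot-product with history.
        w = self.weights[pc % len(self.weights)]
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))
        return y >= 0, y  # predicted direction and confidence value

    def update(self, pc, taken):
        pred, y = self.predict(pc)
        t = 1 if taken else -1
        w = self.weights[pc % len(self.weights)]
        # Train only on a misprediction or a low-confidence output.
        if pred != taken or abs(y) <= self.threshold:
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]
```

After a handful of updates on a strongly biased branch, the weights saturate past the confidence threshold and training stops, which is exactly the "very simple and basic" learning behaviour described above.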
All I want is ARM's sane variable-length vectors (SVE) over variable-sized data. It's the only sensible approach to this, and it works brilliantly. But instead we get Intel's AVX clusterf**k, and apparently AMD is willing to just keep following that shitty, braindead design and as a result always be years behind in delivering it to market. I would love to understand how such a competent team can relentlessly keep dropping the ball in this area.
Originally posted by david-nk
Can you name some neural network architectures, then, that are designed to be primarily trained on a CPU instead of a GPU or TPU?
Abstract:
"Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters to maintain enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training. However, specialized hardware is expensive and hard to generalize to a multitude of tasks. The progress on the algorithmic front has failed to demonstrate a direct advantage over powerful hardware such as NVIDIA-V100 GPUs. This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms, with multi-core parallelism and workload optimization. Using just a CPU, SLIDE drastically reduces the computations during both training and inference outperforming an optimized implementation of Tensorflow (TF) on the best available GPU. Our evaluations on industry-scale recommendation datasets, with large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 3.5 times (1 hour vs. 3.5 hours) faster than the same network trained using TF on Tesla V100 at any given accuracy level. On the same CPU hardware, SLIDE is over 10x faster than TF. We provide codes and scripts for reproducibility."
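The "smart randomized algorithms" the abstract refers to are built around locality-sensitive hashing: instead of computing every neuron in a wide layer, SLIDE hashes the input and evaluates only the neurons whose weight vectors land in the same hash bucket, since those are the ones likely to activate strongly. Here is a toy sketch of that selection idea using SimHash (signed random projections). All names, sizes, and the single-table scheme are illustrative simplifications, not code from the SLIDE repository, which uses multiple tables and other hash families.

```python
import random

def simhash(vec, planes):
    # One bit per random hyperplane: which side of the plane is vec on?
    bits = 0
    for p in planes:
        dot = sum(v * w for v, w in zip(vec, p))
        bits = (bits << 1) | (dot >= 0)
    return bits

class SparseLayer:
    """Toy LSH-gated fully connected layer (one hash table, SimHash)."""

    def __init__(self, in_dim, out_dim, num_bits=4, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0, 1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.planes = [[rng.gauss(0, 1) for _ in range(in_dim)]
                       for _ in range(num_bits)]
        # Hash every neuron's weight vector into a bucket once, up front.
        self.buckets = {}
        for i, w in enumerate(self.weights):
            self.buckets.setdefault(simhash(w, self.planes), []).append(i)

    def forward(self, x):
        # Evaluate only neurons sharing the input's bucket; everything else
        # is treated as inactive. This is what makes the per-sample cost
        # sublinear in layer width on a CPU.
        active = self.buckets.get(simhash(x, self.planes), [])
        return {i: sum(w * v for w, v in zip(self.weights[i], x))
                for i in active}
```

A forward pass returns activations for a small fraction of the 2^num_bits-bucketed neurons, and the backward pass (omitted here) would likewise update only those active neurons.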