
AMD Aims For 30x Energy Efficiency Improvement For AI Training + HPC By 2025


  • #11
    Originally posted by david-nk View Post
    Can you name some neural network architectures then that are designed to be primarily trained on a CPU instead of a GPU or TPU?
You are asking a stupid question. Start by understanding AI basics before you try to understand architectures. If you need a first example, look at cmix, a compressor that achieves higher compression ratios than LZMA by applying AI on top of standard compression methods. It shows how AI can be used in just about every discipline and just about every problem, and why it is important to have AI capabilities within CPUs, too. Learn how AI works and you will understand that it is not about size; it can yield advantages at any scale. You do not need a GPU first to use AI. To give you another example, AMD uses AI logic within the branch prediction of their Zen CPUs. It is very simple and basic, and yet achieves better results than before.
    Last edited by sdack; 29 September 2021, 04:36 PM.
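For readers who want a concrete picture of "AI in a branch predictor": AMD has described Zen's branch prediction as perceptron-based, and the general technique (the perceptron predictor of Jiménez & Lin) fits in a few lines. Below is a minimal, illustrative Python sketch of a hashed-perceptron predictor; it is a toy model of the technique, not AMD's actual design, and the table size, history length, and threshold are arbitrary choices.

```python
# Toy hashed-perceptron branch predictor (illustrative sketch, not AMD's design).

HISTORY_LEN = 16    # bits of global branch history used as perceptron inputs
TABLE_SIZE  = 1024  # number of perceptrons (weight vectors) in the table
THRESHOLD   = 45    # training threshold; ~1.93 * HISTORY_LEN + 14 in the literature

# weights[i] is one perceptron: a bias weight plus one weight per history bit
weights = [[0] * (HISTORY_LEN + 1) for _ in range(TABLE_SIZE)]
history = [1] * HISTORY_LEN  # global history encoded as +1 (taken) / -1 (not taken)

def predict(pc):
    """Return (prediction, confidence, table index) for the branch at address pc."""
    idx = (pc ^ hash(tuple(history))) % TABLE_SIZE
    w = weights[idx]
    y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], history))
    return y >= 0, abs(y), idx

def update(idx, taken, prediction, confidence):
    """Train the selected perceptron on the actual outcome and shift the history."""
    t = 1 if taken else -1
    w = weights[idx]
    if prediction != taken or confidence <= THRESHOLD:
        w[0] += t
        for i, hi in enumerate(history):
            w[i + 1] += t * hi
    history.pop(0)
    history.append(t)

# Usage: predict first, then train once the real outcome is known.
pred, conf, idx = predict(pc=0x4005D0)
update(idx, taken=True, prediction=pred, confidence=conf)
```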



    • #12
      Lord almighty, I wish people would stop saying artificial intelligence when they mean machine learning, which has absolutely nothing to do with AI at all.

We could actually have AI now, but over five decades ago companies chose machine learning instead, because developing true AI would require understanding how the human brain works. And that would have taken a coordinated 30-year plan among researchers and industry.

      So they went for the super crap called machine learning instead, which reached its practical limit long ago.

      That's why our "smart" devices can't even pretend to understand compound sentences, or any complex sentence, and never will.

      True AI will come, and there are a handful of incredibly brilliant researchers working on truly understanding the brain, some even in the midst of creating its connectome right now.

      But we're already 50 years behind because of the greed and lack of vision of both industry and far too many scientists.



      • #13
        Originally posted by brucethemoose View Post
        AMD already has their 2025 products in the pipe in some form. As you said, this isn't a "goal" so much as a hint at what's already cookin.
        I would say that it is a 'goal' if some necessary pieces haven't been locked down yet. That would be the microcode/firmware/userspace programs of 2025 if nothing else. Unless they already meet those numbers with prototype silicon and software, I don't have a problem with allowing them a bit of theatre.



        • #14
i hope it won't osborne current hardware



          • #15
            Originally posted by pal666 View Post
i hope it won't osborne current hardware
            One man's i7-3770 is another man's treasure.



            • #16
              Originally posted by sdack View Post
You are asking a stupid question. Start by understanding AI basics before you try to understand architectures. If you need a first example, look at cmix, a compressor that achieves higher compression ratios than LZMA by applying AI on top of standard compression methods. It shows how AI can be used in just about every discipline and just about every problem, and why it is important to have AI capabilities within CPUs, too. Learn how AI works and you will understand that it is not about size; it can yield advantages at any scale. You do not need a GPU first to use AI. To give you another example, AMD uses AI logic within the branch prediction of their Zen CPUs. It is very simple and basic, and yet achieves better results than before.
It's not a stupid question; you're making an extraordinary claim. Today AI is all about deep learning, and that requires GPU acceleration for decent performance. Hence, OP is right: nobody is really concerned about efficiency improvements for CPUs doing AI training, because CPUs aren't doing the AI training today (unless you're like me, AMD drops support for your graphics card in their ROCm stack right after you buy it, and at current prices a new car would be cheaper than a new graphics card).
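For context on the CPU-versus-GPU point: in the mainstream frameworks the training code itself is device-agnostic; the GPU is simply much faster for large dense models. A minimal sketch, assuming PyTorch (whose ROCm builds expose AMD GPUs through the same torch.cuda API), showing that the only difference is the device selection:

```python
# Minimal sketch: the same training loop runs on CPU or GPU; only the device differs.
import torch
import torch.nn as nn

# torch.cuda.is_available() is also True on ROCm builds with a supported AMD GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Toy random batch just to show the loop; real training would stream a dataset.
x = torch.randn(64, 256, device=device)
y = torch.randint(0, 10, (64,), device=device)

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"trained on {device}, final loss {loss.item():.4f}")
```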



              • #17
All I want is ARM's sane variable-length vectors (SVE) over variable-sized data. It's the only sensible approach to this, and it works brilliantly. But instead we get Intel's AVX clusterf**k, and apparently AMD is willing to just keep following that shitty braindead design and, as a result, always be years behind in delivering it to market. I would love to understand how such a competent team can relentlessly keep dropping the ball in this area.



                • #18
                  I'll just leave this here:
[Embedded video: John Gustafson, "An Energy Efficient and Massively Parallel Approach to Valid Numerics" (slidecast)]



                  • #19
                    Originally posted by david-nk View Post
                    Can you name some neural network architectures then that are designed to be primarily trained on a CPU instead of a GPU or TPU?
                    SLIDE - https://www.cs.rice.edu/~as143/Papers/SLIDE_MLSys.pdf

                    Abstract:
                    "Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters to maintain enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training. However, specialized hardware is expensive and hard to generalize to a multitude of tasks. The progress on the algorithmic front has failed to demonstrate a direct advantage over powerful hardware such as NVIDIA-V100 GPUs. This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms, with multi-core parallelism and workload optimization. Using just a CPU, SLIDE drastically reduces the computations during both training and inference outperforming an optimized implementation of Tensorflow (TF) on the best available GPU. Our evaluations on industry-scale recommendation datasets, with large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 3.5 times (1 hour vs. 3.5 hours) faster than the same network trained using TF on Tesla V100 at any given accuracy level. On the same CPU hardware, SLIDE is over 10x faster than TF. We provide codes and scripts for reproducibility"



                    • #20
Going down to 3nm, as their roadmap targets for then, would certainly help.

