AMD Releases AMD-135M: An Open-Source Small Language Model


  • AMD Releases AMD-135M: An Open-Source Small Language Model

    Phoronix: AMD Releases AMD-135M: An Open-Source Small Language Model

    AMD today announced "AMD-135M" as the first small language model the company is publicly releasing. AMD-135M is open-source, with the training code, dataset, and weights all available to help in the development of other SLMs and LLMs...


  • #2
    AMD really taking advantage of their last-mover advantage here


    • #3
      Great, another thing that apparently doesn't run on the XDNA TPUs they've been selling for two generations now.


      • #4
        Originally posted by fallingcats View Post
        Great, another thing that apparently doesn't run on the XDNA TPUs they've been selling for two generations now.
        They mention running it on the Ryzen NPU in the GitHub repo.


        • #5
          Originally posted by LtdJorge View Post

          They mention running it on the Ryzen NPU in the GitHub repo.
          Oh wow, way to be proven wrong I guess. I did specifically have a look at the blog trying to see if it supported XDNA, but apparently they're using the term NPU now. What happened to TPU anyway? Was the term too closely tied to Google?

          For the repo though, I can't find any info on how they actually managed to run it on the NPU. They seem to be more concerned with proving that the speculative decoding trick from half a year ago is working. That plus the fact that they're running the model quantized to int4 on the NPU (when it should be capable of Block FP16) makes me think that the performance is shit for anything actually useful.
          Last edited by fallingcats; 28 September 2024, 12:55 AM.
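
          For reference, the "trick" in question is just standard draft/target speculative decoding: the tiny model proposes a few tokens and the larger model verifies them in one forward pass. A minimal sketch of that scheme using Hugging Face transformers' assisted generation is below; both checkpoint ids are my assumptions rather than anything taken from AMD's repo, and this says nothing about how the NPU path itself is wired up.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint ids, for illustration only; the draft and target
# models must share a tokenizer for assisted generation to work.
TARGET_ID = "codellama/CodeLlama-7b-hf"
DRAFT_ID = "amd/AMD-Llama-135m-code"

tokenizer = AutoTokenizer.from_pretrained(TARGET_ID)
target = AutoModelForCausalLM.from_pretrained(TARGET_ID)
draft = AutoModelForCausalLM.from_pretrained(DRAFT_ID)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt")

# assistant_model turns on assisted (speculative) decoding: the draft model
# proposes candidate tokens and the target model verifies them, so with
# greedy decoding the output matches what the target would produce alone.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))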


          • #6
            Originally posted by fallingcats View Post
            XDNA, but apparently they're using the term NPU now.
            I believe that AMD was reluctant to use NPU because traditionally, NPU in AMDspeak meant "APU with the graphics part disabled", e.g. the Athlon X4 950 (for reference, google "AM4 NPU" with the quotes).

            But now that everyone says NPU when referring to AI accelerators, I guess AMD has to join.


            • #7
              Originally posted by fallingcats View Post

              Oh wow, way to be proven wrong I guess. I did specifically have a look at the blog trying to see if it supported XDNA, but apparently they're using the term NPU now. What happened to TPU anyway? Was the term too closely tied to Google?

              For the repo though, I can't find any info on how they actually managed to run it on the NPU. They seem to be more concerned with proving that the speculative decoding trick from half a year ago is working. That plus the fact that they're running the model quantized to int4 on the NPU (when it should be capable of Block FP16) makes me think that the performance is shit for anything actually useful.
              Yeah, TPU is more of a Google thing. Ryzen XDNA has always been called NPU AFAIK. I mean, there's only been two generations.


              • #8
                Originally posted by LtdJorge View Post

                Yeah, TPU is more of a Google thing. Ryzen XDNA has always been called NPU AFAIK. I mean, there's only been two generations.
                It's actually a Versal AI Core from Xilinx (which was bought by AMD); maybe that's why they call it "X"DNA as well?
                But it's quite limited in data types.


                • #9
                  Originally posted by LtdJorge View Post

                  They mention running it on the Ryzen NPU in the GitHub repo.
                  I am worried by this move. First, the NPU has been out for two generations and the support is meh! Second, AMD promised we'd be able to run 30-billion-parameter models at 100 tokens/s in the near future: https://www.theregister.com/2024/07/15/amd_ai_pc_goal/

                  The problem? Why are they spitting out a miserable 135-million-parameter model? Such a thing is a dud. LLMs have demonstrated very good properties, but only starting at 30B+ params. The 7B models are practically just demos, and the same goes for 12-14B. The omnipresent quantization (often at 4 bits) represents a big loss in perplexity, except for 70B+ models where it gets compensated by the sheer size of the model.

                  The only realistic way for small models (meaning 2-15B params) to be functional is to have them fine-tuned on a very narrow use case, and of course they then become practically single-purpose.

                  What worries me is that AMD isn't able to keep its promises with the NPU and is trying a PR stunt with these two models. After all, the biggest problem with large language models is memory bandwidth, much more than computing power.
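
                  To put a rough number on the bandwidth point: single-stream decode speed is bounded by how fast the weights can be streamed from memory, roughly bandwidth divided by bytes of weights read per token. A back-of-envelope sketch follows; the ~120 GB/s figure is an assumed value for an LPDDR5X laptop APU, not a measurement.

# Back-of-envelope: single-stream LLM decode is roughly memory-bandwidth bound,
# so tokens/s ~ memory_bandwidth / bytes_of_weights_read_per_token.
params = 30e9            # 30-billion-parameter model (AMD's stated target)
bytes_per_param = 0.5    # int4 quantization ~ 0.5 bytes per weight
bandwidth = 120e9        # assumed ~120 GB/s for an LPDDR5X laptop APU

weight_bytes = params * bytes_per_param              # ~15 GB streamed per token
print(f"{bandwidth / weight_bytes:.1f} tokens/s")    # ~8 tokens/s, nowhere near 100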


                  • #10
                    Originally posted by pabloski View Post

                    I am worried by this move. First, the NPU has been out for two generations and the support is meh! Second, AMD promised we'd be able to run 30-billion-parameter models at 100 tokens/s in the near future: https://www.theregister.com/2024/07/15/amd_ai_pc_goal/

                    The problem? Why are they spitting out a miserable 135-million-parameter model? Such a thing is a dud. LLMs have demonstrated very good properties, but only starting at 30B+ params. The 7B models are practically just demos, and the same goes for 12-14B. The omnipresent quantization (often at 4 bits) represents a big loss in perplexity, except for 70B+ models where it gets compensated by the sheer size of the model.

                    The only realistic way for small models (meaning 2-15B params) to be functional is to have them fine-tuned on a very narrow use case, and of course they then become practically single-purpose.

                    What worries me is that AMD isn't able to keep its promises with the NPU and is trying a PR stunt with these two models. After all, the biggest problem with large language models is memory bandwidth, much more than computing power.
                    I think this model is actually a demo of training, not of inference. They want developers to get their hands on training with AMD Instinct, but a model this small can be tried out first on Ryzen and the like, with extremely fast inference.

                    So it's just a model for demonstrating training, not for actual results.
