Rusticl Capable Of Running Tinygrad For LLaMA Model


  • #11
    Originally posted by kiffmet

    The goal is to make AI accessible to everyone, regardless of the device. CPUs, GPUs, DSPs, custom accelerators, … basically everything with an FPU. For example, think about PyTorch or TensorFlow: it took ages until the former finally received ROCm support, while the latter had to be forked and always lags a few versions behind upstream. Tinygrad aims to make the programming aspect of creating a NN easier as well.
    Yes, exactly.

    Why not contribute to Apache TVM? It already has all sorts of exotic hardware as functional targets.

    Why not build on top of IREE or MLIR, or contribute to the PyTorch MLIR effort?

    And there are a gazillion other smaller efforts: https://github.com/merrymercy/awesome-tensor-compilers

    And other not-yet-public efforts like HippoML, or others I can't even remember or find on Google, because there have been so many. There was one really promising one, built as a kind of alternative Python interpreter with a lightly restricted subset of Python.

    In other words, aside from Hotz I guess, I don't see what tinygrad is doing differently. Bigger projects like TVM and MLIR already have extremely fast, real demos of transformer models and such on all sorts of hardware (including AMD). More focused efforts on the scale of tinygrad (like GGML) are SOTA in specific niches. Triton+PyTorch is vendor neutral and explicitly endorsed by AMD. Tinygrad is... good on AMD at some point in the future, and other vendors later?
    Last edited by brucethemoose; 15 July 2023, 12:05 AM.



    • #12
      Originally posted by brucethemoose

      Yes, exactly.

      Why not contribute to Apache TVM? It already has all sorts of exotic hardware as functional targets.

      Why not build on top of IREE or MLIR, or contribute to the PyTorch MLIR effort?

      And there are a gazillion other smaller efforts: https://github.com/merrymercy/awesome-tensor-compilers

      And other not-yet-public efforts like HippoML, or others I can't even remember or find on Google, because there have been so many. There was one really promising one, built as a kind of alternative Python interpreter with a lightly restricted subset of Python.

      In other words, aside from Hotz I guess, I don't see what tinygrad is doing differently. Bigger projects like TVM and MLIR already have extremely fast, real demos of transformer models and such on all sorts of hardware (including AMD). More focused efforts on the scale of tinygrad (like GGML) are SOTA in specific niches. Triton+PyTorch is vendor neutral and explicitly endorsed by AMD. Tinygrad is... good on AMD at some point in the future, and other vendors later?
      Originally posted by brucethemoose

      You are talking about Hotz?

      He is not. There are a couple of frameworks that can run fast OpenCL or Vulkan LLaMA (among other models) right now, some of them with CUDA-rivaling performance on Nvidia. Last I checked, tinygrad is so slow at LLaMA (5 s/token?) that it's basically not functional, though I can't find any recent benchmarks, and I can't find a single interface that is using it as a backend.


      But this Hotz guy has attracted tons of attention on Twitter and YouTube... Honestly, I don't understand all this hype around tinygrad and Hotz.
      I can't say whether or not he is a genius, but he has contributed a LOT to open tech. Tinygrad will be playing a large role in another project he started, comma.ai, an open-source driving assistant that is actually quite good, and IIRC there were limitations in the existing compute stacks which contributed to tinygrad's development. As for what tinygrad is doing differently: it's actually working. Unlike all those other projects, tinygrad is making real progress, and that alone makes it stand out above the rest.



      • #13
        Originally posted by vancha
        How is one person currently doing a better job than an entire company that has actively been trying to enter that segment of the market? He's right, the AMD drivers are a mess for compute workloads. Not something I can say from experience, but I've heard it numerous times from people who do.
        The guy's a genius, so I wonder what he'll end up with; hopefully he reaches his goal.
        And I thought that you were talking about the Mesa developers 😅



        • #14
          Originally posted by Quackdoc

          As for what tinygrad is doing differently: it's actually working. Unlike all those other projects, tinygrad is making real progress, and that alone makes it stand out above the rest.
          ...Well, this is what has me skeptical. As far as I can tell, there is no way I can just compile tinygrad from source and run some transformers or diffusers model faster than PyTorch right now, assuming I have a 7900XTX. But I can do this with some of the other frameworks.



          • #15
            Originally posted by brucethemoose

            ...Well, this is what has me skeptical. As far as I can tell, there is no way I can just compile tinygrad from source and run some transformers or diffusers model faster than PyTorch right now, assuming I have a 7900XTX. But I can do this with some of the other frameworks.
            Run faster? No idea. Run at all? Yeah. Historically, other compute libraries on my RX580 have been a massive pain, to the point where I could spend a day mucking about; with tinygrad I had a demo running in about half an hour. Quite happy with that.



            • #16
              Originally posted by Quackdoc

              Run faster? No idea. Run at all? Yeah. Historically, other compute libraries on my RX580 have been a massive pain, to the point where I could spend a day mucking about; with tinygrad I had a demo running in about half an hour. Quite happy with that.
              Is it 4GB?

              Kobold.cpp (running a ggml backend) should already be fast and will get faster soon (when the Vulkan PR is merged), with a one-click download.

              SHARK Stable Diffusion should also work. It's not quite one click... Honestly, I have not kept up with the recent Stable Diffusion AMD developments, like the DirectML port or auto-installation in InvokeAI/VoltaML.

              I don't think there is enough VRAM for either of the Apache TVM Vulkan demos from MLC, unfortunately. 4GB is going to be very limiting in the future, except in frameworks that can split models onto the CPU effectively.
              Last edited by brucethemoose; 15 July 2023, 02:09 AM.



              • #17
                Originally posted by brucethemoose

                Is it 4GB?

                Kobold.cpp (running a ggml backend) should already be fast and will get faster soon (when the Vulkan PR is merged), with a one-click download.

                SHARK Stable Diffusion should also work. It's not quite one click... Honestly, I have not kept up with the recent Stable Diffusion AMD developments, like the DirectML port or auto-installation in InvokeAI/VoltaML.

                I don't think there is enough VRAM for either of the Apache TVM Vulkan demos from MLC, unfortunately. 4GB is going to be very limiting in the future, except in frameworks that can split models onto the CPU effectively.
                Correct, it's 4GB. I don't use Windows, so DirectML is a no-go; there was an ONNX version. I wasn't looking for something that works well; I'm more concerned with it working at all. I had a really bad time with a lot of stuff, but tinygrad is quite easy to get going.
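
                For reference, this is roughly what "easy to get going" looks like; a minimal sketch rather than code from this thread, assuming a mid-2023 tinygrad checkout where the OpenCL backend (which is what Rusticl exposes) is selected with the GPU=1 environment variable, i.e. run as GPU=1 python demo.py:

                    # Minimal tinygrad forward pass; ops are lazy and only execute on the
                    # selected backend when .numpy() forces realization.
                    from tinygrad.tensor import Tensor

                    x = Tensor.rand(64, 128)        # random input batch
                    w = Tensor.rand(128, 10)        # weight matrix
                    y = (x @ w).relu().softmax()    # matmul -> ReLU -> softmax, built lazily
                    print(y.numpy().shape)          # realizes the graph; prints (64, 10)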



                • #18
                  Originally posted by brucethemoose
                  (...) As far as I can tell, there is no way I can (...) run some transformers or diffusers model faster than PyTorch right now, assuming I have a 7900XTX. But I can do this with some of the other frameworks.
                  Are you really trying to frame the project being in its early stages as a bad thing? If so, then the whole world should just stop developing software altogether.
                  Last edited by kiffmet; 17 July 2023, 06:46 PM.



                  • #19
                    Originally posted by kiffmet

                    Are you really trying to frame the project being in its early stages as a bad thing? If so, then the whole world should just stop developing software altogether.
                    I am not against tinygrad. I look forward to trying it out, and hacking in the demos as backends for existing UIs once they are fast.

                    I am excited about it.

                    ...But I am not hyped about it. And I don't like the sentiment that it's the future hope for ML outside Nvidia. Taking the headline as an example, Apache TVM and llama.cpp are already way better at running LLaMA on AMD, and both are improving rapidly.
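
                    To make that concrete, here is a rough sketch of the llama.cpp route (not code from this thread), assuming the llama-cpp-python bindings on top of a llama.cpp build with GPU support (CLBlast or ROCm); the model path and layer count are placeholders:

                        # Load a quantized LLaMA model and offload part of it to the GPU.
                        # n_gpu_layers splits the model between GPU and CPU, which is how
                        # small-VRAM cards stay usable.
                        from llama_cpp import Llama

                        llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_gpu_layers=20)
                        out = llm("Q: What is Rusticl? A:", max_tokens=64, stop=["Q:"])
                        print(out["choices"][0]["text"])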
                    Last edited by brucethemoose; 17 July 2023, 09:26 PM.

