Llamafile 0.8.5 Delivers Greater Performance: Tiny Models 2x Faster On Threadripper

  • #1
    Phoronix: Llamafile 0.8.5 Delivers Greater Performance: Tiny Models 2x Faster On Threadripper

    The Mozilla Ocho group has published the newest version of Llamafile, the open-source project that makes it very easy to distribute and run large language models (LLMs) as a single file. Llamafile is an excellent solution for easily sharing and running LLMs, supporting both speedy CPU-based execution and GPU acceleration where available...
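    To see the single-file angle in practice: a launched llamafile serves a llama.cpp-style OpenAI-compatible HTTP API on localhost:8080 by default, so a short script can talk to it. A minimal sketch, assuming a llamafile is already running locally (the startup command, model name, and prompt are illustrative):

    ```python
    # Minimal sketch: query a running llamafile over its local
    # OpenAI-compatible endpoint. Assumes a llamafile was started first,
    # e.g. `./some-model.llamafile --server` (filename illustrative),
    # listening on the default port 8080.
    import json
    import urllib.request

    payload = {
        "model": "local",  # llamafile serves one model; the name is a placeholder
        "messages": [{"role": "user", "content": "Summarize llamafile in one sentence."}],
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
    ```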

  • #2
    Does llamafile depend on the software stack provided by the hardware vendor for compute?

    i.e. ROCm from AMD

  • #3
    Originally posted by Jedibeeftrix View Post
    Does llamafile depend on the software stack provided by the hardware vendor for compute?

    i.e. ROCm from AMD
    For GPU acceleration, yes. I think you can also use ROCr; on NVIDIA you need CUDA.

    CPU performance isn't that bad on the smaller models, though (but they suck).

  • #4
    I look forward to the time when workstations like the one pictured are dirt cheap on eBay.

  • #5
    What size counts as a tiny model? 7B or even smaller?

  • #6
    Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post
    I look forward to the time when workstations like the one pictured are dirt cheap on eBay.
    I see you're playing the long game, then.

    Respect

  • #7
    Originally posted by Jedibeeftrix View Post
    Does llamafile depend on the software stack provided by the hardware vendor for compute?

    i.e. ROCm from AMD
    To my knowledge, yes. If you want a portable solution that can also run on fully open drivers, KoboldCpp is worth checking out. Just like llamafile, it's a fork of llama.cpp and comes with its own more user-friendly UI, API servers, and various backends in one file.

    You could run a K-quant GGUF on Vulkan if you wish to avoid ROCm or CUDA. But if you do use CUDA, all you need is the proprietary blob. I managed to run it on the Manjaro live CD before without installing any packages.

    It also includes a --benchmark option, which may be interesting for the test suite, considering the benchmark can output to CSV and the various backends are easy to launch: at least on the NVIDIA side it's all a single binary. Only ROCm relies on a separate fork.
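    To make that concrete, here is a minimal sketch of scripting such a benchmark run; the binary name, model path, and the exact --benchmark CSV behavior are taken from the comment above rather than verified, so treat them all as assumptions:

    ```python
    # Hypothetical wrapper around the --benchmark mode described above.
    # Assumes ./koboldcpp takes a model path and appends benchmark rows
    # to the given CSV file (binary name, flags, and paths are illustrative).
    import csv
    import subprocess

    subprocess.run(
        ["./koboldcpp", "--model", "model.Q4_K_M.gguf", "--benchmark", "results.csv"],
        check=True,
    )

    # Print whatever columns the benchmark emitted.
    with open("results.csv", newline="") as f:
        for row in csv.DictReader(f):
            print(row)
    ```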

  • #8
    Just a nitpick, but "tiny large language model" sounds off. Maybe they should label it a "portable language model"?
