Llamafile 0.8.5 Delivers Greater Performance: Tiny Models 2x Faster On Threadripper


  • yoshi314
    replied
    Just a nitpick, but "tiny large language model" sounds off. Maybe they should label it a "portable language model"?



  • Henk717
    replied
    Originally posted by Jedibeeftrix
    Does llamafile depend on the software stack providing compute from the hardware vendor?

    i.e. ROCm from AMD
    To my knowledge, yes. If you want a portable solution that can also run on fully open drivers, KoboldCpp is worth checking out. Like llamafile, it's a fork of llama.cpp and comes with its own more user-friendly UI, API servers, and various backends in one file.

    You could run a K-quant GGUF on Vulkan if you wish to avoid ROCm or CUDA. If you do use CUDA, all you need is the proprietary blob. I managed to run it on the Manjaro live CD before without installing any packages.

    It also includes a --benchmark mode which may be interesting for the test suite, considering the benchmark can output to CSV, and it's pretty easy to launch the various backends since, at least on the NVIDIA side, it's all a single binary. Only ROCm relies on a separate fork.
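    A rough sketch of how that could be scripted for a test suite. The --benchmark flag and its CSV output are the features described above, but everything else here is an assumption: the bare "koboldcpp" command name, the model path, and the backend flag names are placeholders to verify against `koboldcpp --help` on your build.

        import subprocess

        # Placeholder model path; any GGUF quant should work.
        MODEL = "tinyllama-1.1b.Q4_K_M.gguf"

        # Backend selection flags are assumptions; confirm the exact
        # spelling with `koboldcpp --help` before relying on them.
        BACKENDS = {
            "cpu": [],                  # default CPU path
            "vulkan": ["--usevulkan"],  # assumed Vulkan flag
            "cuda": ["--usecublas"],    # assumed CUDA flag
        }

        for name, flags in BACKENDS.items():
            csv_out = f"bench-{name}.csv"
            # --benchmark writing results to CSV is the feature
            # mentioned in the comment above.
            subprocess.run(
                ["koboldcpp", "--model", MODEL, "--benchmark", csv_out, *flags],
                check=True,
            )
            print(f"{name}: results written to {csv_out}")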



  • Kjell
    replied
    Originally posted by pWe00Iri3e7Z9lHOX2Qx
    I look forward to the time when workstations like the one pictured are dirt cheap on eBay.
    I see you're playing the long game then

    Respect



  • Lycanthropist
    replied
    What size counts as a tiny model? 7B or even smaller?



  • pWe00Iri3e7Z9lHOX2Qx
    replied
    I look forward to the time when workstations like the one pictured are dirt cheap on eBay.



  • rabcor
    replied
    Originally posted by Jedibeeftrix
    Does llamafile depend on the software stack providing compute from the hardware vendor?

    i.e. ROCm from AMD
    For GPU acceleration, yes. I think you can also use ROCr; on NVIDIA you need CUDA.

    CPU performance ain't that bad on the smaller models, though (but they suck).



  • Jedibeeftrix
    replied
    Does llamafile depend on the software stack providing compute from the hardware vendor?

    i.e. ROCm from AMD



  • Llamafile 0.8.5 Delivers Greater Performance: Tiny Models 2x Faster On Threadripper

    Phoronix: Llamafile 0.8.5 Delivers Greater Performance: Tiny Models 2x Faster On Threadripper

    The Mozilla Ocho group has published its newest version of Llamafile, the open-source project that makes it very easy to distribute and run large language models (LLMs) as a single file. Llamafile is an excellent solution for easily sharing and running LLMs, supporting both speedy CPU-based execution and GPU acceleration where available...
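    Since a llamafile bundles the llama.cpp server, one way to exercise it is over plain HTTP. A minimal sketch, assuming a llamafile is already running locally on the usual default port 8080 with an OpenAI-style chat endpoint; the model filename, launch flags, prompt, and "local" model name are all placeholders, not a documented interface.

        import json
        import urllib.request

        # Assumes a llamafile was started locally first, e.g. something like:
        #   ./TinyLlama-1.1B-Chat.llamafile --server --nobrowser
        # (exact flags vary by version; port 8080 is the usual default).
        URL = "http://localhost:8080/v1/chat/completions"

        payload = {
            # A llamafile serves whichever model it embeds, so the name
            # here is nominal.
            "model": "local",
            "messages": [
                {"role": "user", "content": "Say hello in one sentence."}
            ],
        }

        req = urllib.request.Request(
            URL,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            reply = json.load(resp)

        print(reply["choices"][0]["message"]["content"])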
