Llamafile 0.8.5 Delivers Greater Performance: Tiny Models 2x Faster On Threadripper
The Mozilla Ocho group has published the newest version of Llamafile, the open-source project that makes it very easy to distribute and run large language models (LLMs) as a single file. Llamafile is an excellent solution for easily sharing and running LLMs, supporting both speedy CPU-based execution and GPU acceleration where available.
Llamafile 0.8.5 delivers yet more performance tuning, on top of the recent work around AVX2 optimizations, more AMD GPU offloading, and other improvements. Justine Tunney explained the latest performance work in Llamafile 0.8.5:
"As of #435 the K quants now go consistently 2x faster than llama.cpp upstream. On big CPUs like Threadripper we've doubled the performance of tiny models, for both prompt processing and token generation for tiny models."
Doubling the performance for tiny models on AMD Ryzen Threadripper-class hardware!
HP Z6 G5 A with AMD Ryzen Threadripper PRO 7000 series
Llamafile 0.8.5 also delivers faster AVX2 matrix multiplication for MoE models and legacy quants. There are also some AMD Zen 4 performance optimizations, BF16 NVIDIA CUDA support, and other improvements.
Downloads and more details on the Llamafile 0.8.5 release are available via GitHub. I'll be working on new Llamafile benchmarks soon.