Llamafile 0.8 Releases With LLaMA3 & Grok Support, Faster F16 Performance

Written by Michael Larabel in Free Software on 25 April 2024 at 06:36 AM EDT.
Llamafile, out of Mozilla's Ocho group, has been one of the more interesting projects of the AI era. It makes it easy to run and distribute large language models (LLMs) packaged as a single self-contained file. Building on Llama.cpp, Llamafile lets an entire LLM ship as one file with both CPU and GPU execution support. Llamafile 0.8 is out now to join in on the LLaMA3 fun, while also adding support for other models and improving CPU performance.

Llamafile 0.8 is an exciting release that adds support for LLaMA3, Grok, Mixtral 8x22b, and Command-R models.

Mixture of Experts (MoE) models like Mixtral and Grok also now run 2-5x faster on CPUs thanks to refactored tinyBLAS CPU code. F16 performance has improved as well: around 20% faster on the Raspberry Pi 5, around 30% faster on Intel Skylake, and around 60% faster on an Apple M2.

Llamafile 0.8 also brings improved CPU feature detection and other enhancements. The release highlights are:
- Added support for LLaMA3
- Added support for Grok
- Added support for Mixtral 8x22b
- Added support for Command-R models
- MoE models (e.g. Mixtral, Grok) now run 2-5x faster on CPU
- F16 is now 20% faster on Raspberry Pi 5 (TinyLLaMA 1.1b prompt eval improved from 62 to 75 tok/sec)
- F16 is now 30% faster on Skylake (TinyLLaMA 1.1b prompt eval improved from 171 to 219 tok/sec)
- F16 is now 60% faster on Apple M2 (Mistral 7b prompt eval improved from 79 to 128 tok/sec)
- Added the ability to override the chat template in the web GUI when creating llamafiles
- Improved markdown rendering and syntax highlighting in the server
- Improved CPU feature detection

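The headline F16 percentages are rounded from the prompt-eval throughput numbers in the changelog. As a quick sanity check, the short Python snippet below (plain arithmetic on the listed tok/sec figures, not part of Llamafile itself; the `speedup_percent` helper is purely illustrative) shows the gains work out to roughly 21%, 28%, and 62%:

```python
# Prompt-eval throughput (tok/sec) before and after, as listed in the
# Llamafile 0.8 release notes. Names here are illustrative only.
BENCHMARKS = {
    "Raspberry Pi 5 (TinyLLaMA 1.1b)": (62, 75),
    "Skylake (TinyLLaMA 1.1b)": (171, 219),
    "Apple M2 (Mistral 7b)": (79, 128),
}


def speedup_percent(before: float, after: float) -> float:
    """Return the relative throughput gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


if __name__ == "__main__":
    for platform, (before, after) in BENCHMARKS.items():
        gain = speedup_percent(before, after)
        print(f"{platform}: {before} -> {after} tok/sec = +{gain:.1f}%")
```

So the Skylake gain is closer to 28% than 30%, while the Raspberry Pi 5 and Apple M2 figures round cleanly to 20% and 60%.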
Llamafile 0.8 is available for download from GitHub. I'll be working on new Llamafile benchmarks soon.
About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.
