Llamafile 0.8.7 Brings Fixes, Better ARM Performance & Preps For New Server

Written by Michael Larabel in Mozilla on 24 June 2024 at 11:11 AM EDT.
Llamafile has been one of the better new initiatives out of Mozilla in recent years. Llamafile makes it easy to distribute and run large language models as a single file, supports both CPU and GPU execution, and makes LLMs much more approachable for end-users. Out today is Llamafile 0.8.7 with more performance optimizations and new features.

After recent Llamafile releases focused on tuning Intel/AMD AVX performance, today's Llamafile 0.8.7 release brings ARM performance improvements: better performance on Arm for legacy and K-quants, plus optimized matrix multiplication for I-quants on AArch64.

Llamafile 0.8.7 also fixes some AMD GPU issues on Windows by now always using tinyBLAS there, improves CPU brand detection, and includes other fixes.

Looking ahead, a new Llamafile server is being prepared for rollout. Justine Tunney mentioned in the v0.8.7 release announcement on GitHub:
"It should be noted that, in future releases, we plan to introduce a new server for llamafile. This new server is being designed for performance and production-worthiness. It's not included in this release, since the new server currently only supports a tokenization endpoint. However the endpoint is capable of doing 2 million requests per second whereas with the current server, the most we've ever seen is a few thousand."

The patch adding the new Llamafile server notes that it is not only much faster than the current server but also designed to be crash-proof, reliable, and preempting.

Llamafile continues to look great for easily distributing and running large language models. Learn more about this open-source project via Llamafile.ai.