Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times For AMD Zen 4

Written by Michael Larabel in Programming on 31 March 2024 at 10:00 AM EDT.
A new release of Llamafile is available this Easter Sunday from the Mozilla Ocho group. Llamafile packages a large language model (LLM) into a single file that can be both distributed and run, making LLMs much easier for developers and end-users to share and use. Llamafile remains one of the more interesting non-browser projects to come out of Mozilla in recent times and so far appears to have a bright future.

Llamafile builds on Llama.cpp to deliver an entire LLM as a single-file executable that works on most systems and can leverage both CPU and GPU execution, which makes large language models much more convenient to deploy.

With Llamafile 0.7 out today there is finally AVX-512 support! Those testing out Llamafile 0.7 on AVX-512 enabled CPUs like AMD Zen 4 are finding around 10x faster prompt evaluation times with this support. It's a very nice Easter gift for those with AVX-512 capable CPUs who use Llamafile to run large language models on the CPU.
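
For readers wondering whether their own system would benefit, here is a minimal Python sketch for checking if a Linux machine advertises AVX-512. It assumes the x86 `/proc/cpuinfo` layout with a `flags` line and looks only for the `avx512f` (AVX-512 Foundation) flag. The helper names `cpu_flags` and `has_avx512` are made up for illustration, and this is not how Llamafile itself detects the feature.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 or non-Linux input)


def has_avx512(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag (avx512f) is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux on x86
    if path.exists():
        print("AVX-512F available:", has_avx512(path.read_text()))
    else:
        print("No /proc/cpuinfo here; this sketch only covers Linux.")
```

A CPU that reports `avx512f` is a candidate for the faster code path described above, though the actual speed-up will depend on the model, quantization, and workload.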

AMD AM5 CPUs for Easter with Korbinian Starkbier


I've been running some Llamafile benchmarks for a few months and look forward to trying out Llamafile 0.7 to see its performance gains on AVX-512 capable Intel and AMD processors.

Llamafile 0.7 also brings BF16 CPU support, a security fix, various Windows improvements, around 8x faster prompt evaluation on the Raspberry Pi 5 with F16 weights, and various other improvements.

Downloads and more information on Llamafile 0.7 are available via GitHub.