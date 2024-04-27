
Llamafile 0.8.1 GPU LLM Offloading Works Now With More AMD GPUs
The most significant change in Friday's Llamafile 0.8.1 release is working GPU support for more AMD graphics processors and accelerators. Some of the AMD offload code within Llamafile assumed that "GFX" graphics IP version identifiers are purely numeric rather than alphanumeric, which mistakenly broke GPU offload for a number of AMD Instinct / Radeon parts. For hardware like the Instinct MI250 with the GFX90A IP, the "A" was not parsed correctly and so was not passed to the HIP compiler. In turn, this would error out and break Llamafile GPU acceleration on any AMD GPU with non-numeric characters in its GFX identifier. That's now fixed in Llamafile 0.8.1, so AMD GPU acceleration works on more hardware for Llamafile-based large language model deployments.
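To make the failure mode concrete, here is a minimal Python sketch of how a numeric-only parse drops the trailing letter from an identifier like GFX90A. It is not Llamafile's actual code (which is C/C++), and the helper names `parse_arch_numeric`, `parse_arch_alnum`, and `offload_flag` are hypothetical, made up purely for illustration. The only real-world detail assumed is that HIP compilers accept a target in the form `--offload-arch=<gfx id>`.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; not Llamafile's implementation.

# Buggy assumption: the GFX identifier consists of digits only.
NUMERIC_ONLY = re.compile(r"gfx(\d+)")

# Corrected assumption: the identifier may also contain letters (e.g. gfx90a).
ALPHANUMERIC = re.compile(r"gfx([0-9a-z]+)")


def parse_arch_numeric(name: str) -> Optional[str]:
    """Parse a GFX identifier, keeping only the leading digits (the bug)."""
    m = NUMERIC_ONLY.match(name.lower())
    return f"gfx{m.group(1)}" if m else None


def parse_arch_alnum(name: str) -> Optional[str]:
    """Parse a GFX identifier, keeping digits and letters (the fix)."""
    m = ALPHANUMERIC.match(name.lower())
    return f"gfx{m.group(1)}" if m else None


def offload_flag(name: str) -> Optional[str]:
    """Build the compiler flag from the fully parsed identifier."""
    arch = parse_arch_alnum(name)
    return f"--offload-arch={arch}" if arch else None
```

With the numeric-only pattern, `parse_arch_numeric("gfx90a")` yields `gfx90`, which is not the MI250's target, while `parse_arch_alnum("gfx90a")` keeps the full `gfx90a`. Purely numeric identifiers such as `gfx1030` come through unchanged either way, which is why only some AMD parts were affected.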
Additionally, Llamafile 0.8.1 now ships pre-built NVIDIA and AMD ROCm modules for both Windows and Linux users, further easing the deployment of Llamafile single-file LLMs that support both CPU and GPU execution.
Llamafile 0.8.1 also adds support for the Phi-3 Mini 4k model, fixes a bug causing GPU module crashes, gives Command-R Plus support proper 64-bit indexing, and includes other fixes.
Downloads and more details on the new Llamafile 0.8.1 release are available via Mozilla-Ocho on GitHub.