AMD BIOS Tuning Guide Impact For Boosting AI/ML Performance On EPYC 9005 Series

Written by Michael Larabel in Software on 29 November 2024 at 10:36 AM EST. Page 3 of 4.
Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Text Generation 128. AI/ML Tuning Recommendations was the fastest.
Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 512. AI/ML Tuning Recommendations was the fastest.

Following AMD's workload tuning guide and applying its BIOS recommendations took only a few minutes and pushed the EPYC AI/ML performance higher.

Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 1024. AI/ML Tuning Recommendations was the fastest.
Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 2048. AI/ML Tuning Recommendations was the fastest.
Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: granite-3.0-3b-a800m-instruct-Q8_0, Test: Text Generation 128. AI/ML Tuning Recommendations was the fastest.
Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Mistral-7B-Instruct-v0.3-Q8_0, Test: Prompt Processing 512. AI/ML Tuning Recommendations was the fastest.
Llama.cpp benchmark with settings of Backend: CPU BLAS, Model: Mistral-7B-Instruct-v0.3-Q8_0, Test: Prompt Processing 1024. AI/ML Tuning Recommendations was the fastest.

Llama.cpp on the 96-core AMD EPYC Zen 5 processor also showed added performance gains after applying AMD's BIOS tuning recommendations for AI/ML workloads.
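For readers wanting to approximate the llama.cpp runs above outside of the Phoronix Test Suite, llama.cpp ships a llama-bench tool that covers the same prompt-processing and text-generation cases. This is only a sketch: the model file path and thread count below are assumptions, not details taken from this article.

```shell
# Sketch: approximating the llama.cpp tests above with llama-bench.
# The GGUF file path and -t 96 (one thread per core) are assumptions.
./llama-bench \
    -m ./Llama-3.1-Tulu-3-8B-Q8_0.gguf \
    -p 512,1024,2048 \
    -n 128 \
    -t 96
```

The -p values mirror the Prompt Processing 512/1024/2048 tests and -n 128 mirrors the Text Generation 128 test.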

Whisperfile benchmark with settings of Model Size: Small. AI/ML Tuning Recommendations was the fastest.
Whisperfile benchmark with settings of Model Size: Medium. AI/ML Tuning Recommendations was the fastest.
Whisper.cpp benchmark with settings of Model: ggml-small.en, Input: 2016 State of the Union. AI/ML Tuning Recommendations was the fastest.
Whisper.cpp benchmark with settings of Model: ggml-medium.en, Input: 2016 State of the Union. AI/ML Tuning Recommendations was the fastest.

Whisper.cpp also saw some gains from AMD's AI/ML workload tuning guide on 5th Gen EPYC.
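The Whisper.cpp transcription tests above can likewise be approximated with the project's own command-line tool. As a hedged sketch only: the model and audio file paths and the thread count are assumptions, and the article's 2016 State of the Union audio input is not distributed with whisper.cpp.

```shell
# Sketch: timing a whisper.cpp transcription similar to the tests above.
# Model path, input WAV path, and -t 96 are assumptions, not from the article.
./main -m models/ggml-small.en.bin -f input.wav -t 96
```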
