Llama.cpp AI Performance With The GeForce RTX 5090

Written by Michael Larabel in Graphics Cards on 27 January 2025 at 02:33 PM EST.
Llama.cpp benchmark with settings of Backend: NVIDIA CUDA, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Text Generation 128. RTX 5090 was the fastest.

For Llama.cpp with Llama 3.1 8B and looking at text generation with 128 tokens, there was a huge win for the GeForce RTX 5090: 1.58x the performance of the GeForce RTX 4090. That is a much larger gain than the relatively small delta going from the RTX 3090 to the RTX 4090.
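As a quick aid for reading figures like "1.58x the performance", here is a minimal Python sketch of how such a ratio relates to a "percent faster" number. The function names and the sample throughput values in the usage lines are hypothetical placeholders for illustration, not measurements from this review.

```python
def speedup(new_tokens_per_sec: float, old_tokens_per_sec: float) -> float:
    """Return how many times the throughput of the new card exceeds the old one."""
    if old_tokens_per_sec <= 0:
        raise ValueError("old_tokens_per_sec must be positive")
    return new_tokens_per_sec / old_tokens_per_sec


def percent_faster(ratio: float) -> float:
    """Convert a speedup ratio (e.g. 1.58) into a percent improvement (e.g. 58)."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder throughputs (tokens per second), NOT results from the article.
    ratio = speedup(158.0, 100.0)
    print(f"{ratio:.2f}x, or about {percent_faster(ratio):.0f}% faster")
```

Read this way, a 1.58x result corresponds to roughly 58% more tokens per second, while a result described as 17% faster corresponds to a ratio of about 1.17x.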

Llama.cpp benchmark with settings of Backend: NVIDIA CUDA, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Text Generation 128. RTX 5090 was the fastest.

The GeForce RTX 5090 consumed much more power than the prior NVIDIA graphics cards, but on a performance-per-Watt basis it was still comparable to the GeForce RTX 4090 and RTX 4080 SUPER.
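For readers who want to reproduce a performance-per-Watt comparison on their own hardware, the metric can be sketched as throughput divided by average power draw over the run. This is a minimal illustration under that assumption; the function name and the sample numbers are hypothetical and are not the figures measured in this review.

```python
def perf_per_watt(tokens_per_sec: float, avg_power_watts: float) -> float:
    """Return throughput per Watt: tokens per second divided by average power draw."""
    if avg_power_watts <= 0:
        raise ValueError("avg_power_watts must be positive")
    return tokens_per_sec / avg_power_watts


if __name__ == "__main__":
    # Placeholder values for two hypothetical cards, NOT data from the article.
    cards = {
        "card_a": (200.0, 400.0),  # (tokens/sec, average Watts)
        "card_b": (130.0, 260.0),
    }
    for name, (tps, watts) in cards.items():
        print(f"{name}: {perf_per_watt(tps, watts):.3f} tokens/sec per Watt")
```

With these placeholder inputs both cards land on the same efficiency even though one is much faster and draws much more power, which is the kind of outcome described above.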

Llama.cpp benchmark with settings of Backend: NVIDIA CUDA, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 2048. RTX 5090 was the fastest.

For prompt processing with a batch size of 2048, the RTX 5090 was about 17% faster than the RTX 4090 -- which itself was a huge improvement over the RTX 30 and other RTX 40 graphics cards.

Llama.cpp benchmark with settings of Backend: NVIDIA CUDA, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 2048. RTX 5090 was the fastest.

On a performance-per-Watt basis for Llama 3.1 prompt processing, the RTX 5090 came in between the RTX 4090 and RTX 4080 SUPER for its power efficiency.

Llama.cpp benchmark with settings of Backend: NVIDIA CUDA, Model: Llama-3.1-Tulu-3-8B-Q8_0, Test: Prompt Processing 2048. RTX 5090 was the fastest.

Even with its increased power use, the GeForce RTX 5090 Founders Edition graphics card continues to operate rather efficiently from a thermal standpoint.
