Llama.cpp AI Performance With The GeForce RTX 5090
For Llama.cpp with Llama 3.1 8B, text generation with 128 tokens was a huge win for the GeForce RTX 5090: 1.58x the performance of the GeForce RTX 4090. That is a much more significant gain than the relatively small delta going from the RTX 3090 to the RTX 4090.
The GeForce RTX 5090 consumed much more power than the prior NVIDIA graphics cards, but on a performance-per-Watt basis it was still comparable to the GeForce RTX 4090 and RTX 4080 SUPER.
For prompt processing with a batch size of 2048, the RTX 5090 was about 17% faster than the RTX 4090 -- which itself was a huge improvement over the RTX 30 and other RTX 40 graphics cards.
On a performance-per-Watt basis for Llama 3.1 prompt processing, the RTX 5090's power efficiency landed between that of the RTX 4090 and RTX 4080 SUPER.
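For readers wanting to reproduce the two derived metrics used above, here is a minimal Python sketch of the arithmetic: relative speedup is one card's throughput divided by another's, and performance-per-Watt is throughput divided by average power draw. The function names and the sample figures are illustrative assumptions and are not measurements from this review; they were picked only to show how a card that is 1.58x faster while drawing 1.58x the power ends up with the same efficiency.

```python
def speedup(new_tokens_per_s: float, old_tokens_per_s: float) -> float:
    """Relative performance of the newer card versus the older one."""
    if old_tokens_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_tokens_per_s / old_tokens_per_s


def perf_per_watt(tokens_per_s: float, avg_power_w: float) -> float:
    """Throughput delivered per Watt of average power draw."""
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return tokens_per_s / avg_power_w


# Hypothetical placeholder values, NOT results from the benchmarks above.
older_card = {"tokens_per_s": 100.0, "avg_power_w": 250.0}
newer_card = {"tokens_per_s": 158.0, "avg_power_w": 395.0}

ratio = speedup(newer_card["tokens_per_s"], older_card["tokens_per_s"])
older_eff = perf_per_watt(older_card["tokens_per_s"], older_card["avg_power_w"])
newer_eff = perf_per_watt(newer_card["tokens_per_s"], newer_card["avg_power_w"])

print(f"speedup: {ratio:.2f}x")
print(f"older card: {older_eff:.3f} tokens/s per Watt")
print(f"newer card: {newer_eff:.3f} tokens/s per Watt")
```

When power draw rises by the same factor as throughput, the two efficiency figures match, which is why a much faster but hungrier card can still land level with its predecessor on performance-per-Watt.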
Even with the increased power use of the GeForce RTX 5090, this Founders Edition graphics card continues to run rather efficiently from a thermal standpoint.