Radeon DRM Graphics Benchmarks On Linux 3.15
Phoronix: Radeon DRM Graphics Benchmarks On Linux 3.15
Our latest focus in benchmarking the Linux 3.15 kernel is the Radeon DRM kernel graphics driver. There have been some reports of small performance changes with this newest kernel currently under development, in part due to some video memory optimizations that landed this cycle. This article has benchmarks of four AMD Radeon graphics cards running Linux 3.14 and 3.15 Git.
You should have waited a little bit longer. I've just pushed out a patch that will improve performance for SI and CIK quite a bit: http://lists.freedesktop.org/archive...ay/058858.html
I'll do new benchmarks when it's merged... As usual, whenever I post benchmarks, there are always comments along the lines of "well, you should have waited for XXX", but needless to say this is the current 3.15 situation and I'll happily run more benchmarks whenever any other interesting code lands.
Originally Posted by Deathsimple
What's the state of the R9 290 on the open-source driver?
I remember asking about large page support before. Is there really no support for it in r600-eg?
Michael, just a technical question, sorry if it's a noob one...
If the performance change in 3.15 Git comes from a video memory optimization, shouldn't this be benchmarked with cards that have a limited amount of video memory (e.g. APUs, iGPUs) or with games that need more VRAM than is available on the dGPU? Or should everything just be faster with this?
Correct, there is no support for it before NI, and even NI through CIK only support it for VRAM, not GART.
Originally Posted by curaga
Fair question. This particular optimization does not affect how efficiently the actual memory is used, just the efficiency of how the translation cache (TLB) between GPU and memory is used. Rather than storing instructions/data, the TLB is part of the memory management unit and caches the physical addresses corresponding to virtual addresses issued by the CPU. It's just like any other cache in the sense that cache misses require you to access much slower memory... each time you get a TLB miss the MMU has to walk page tables in memory to find the translation and then free up a TLB entry to store the translation it reads from memory.
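As a rough illustration of the hit/miss behaviour described above, here is a toy model in Python. It is only a sketch: the class name ToyTLB, the 4K page size, the LRU replacement policy, and the dictionary standing in for the page tables are all assumptions made for the example, not how the radeon driver or the hardware is actually implemented.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; assumed page size for this toy model


class ToyTLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries, page_table):
        self.entries = entries
        self.page_table = page_table  # virtual page number -> physical frame number
        self.cache = OrderedDict()    # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)  # free up the least recently used entry
            # This dictionary lookup stands in for the slow page table walk.
            self.cache[vpn] = self.page_table[vpn]
        return self.cache[vpn] * PAGE_SIZE + offset


# Hypothetical mapping: virtual page N lives in physical frame N + 100.
page_table = {vpn: vpn + 100 for vpn in range(8)}
tlb = ToyTLB(entries=2, page_table=page_table)
for addr in (0, 8, 4096, 16, 8192, 0):
    tlb.translate(addr)
print(tlb.hits, tlb.misses)  # prints "3 3"
```

Accesses that stay within a page already in the TLB are hits; touching a third page with only two entries forces an eviction and another walk.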
Originally Posted by tomtomme
More efficient use of TLBs means fewer interruptions while the MMU walks through page tables, and higher *effective* memory bandwidth. Same idea as huge page support on the CPU's MMU -- using 2M pages rather than 4K pages can give a 30% performance improvement in some cases just from reduced overhead walking page tables and reloading TLB entries. TLB misses are really expensive -- without a TLB each memory access on a GPU would require ~2 additional accesses for page tables, while a CPU would require 3-4 additional accesses since the page tables have more levels. Yes that's 3-4 page table accesses for every "useful" memory access.
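The arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope model under stated assumptions: the 5% miss rate and the 64-entry TLB are made-up illustrative numbers, and the model ignores the fact that real hardware often caches page table entries as well. A miss rate of 1.0 corresponds to the "without a TLB" case.

```python
def accesses_per_useful_access(miss_rate, walk_accesses):
    """Average memory accesses per useful access: 1 + miss_rate * walk cost."""
    return 1 + miss_rate * walk_accesses


def tlb_reach(entries, page_size):
    """Amount of memory covered by a fully populated TLB, in bytes."""
    return entries * page_size


KIB = 1024
MIB = 1024 * KIB
ASSUMED_ENTRIES = 64  # hypothetical TLB size, chosen only for illustration

for name, walk in (("GPU, ~2 accesses per walk", 2), ("CPU, 4 accesses per walk", 4)):
    for miss_rate in (0.0, 0.05, 1.0):  # 0.05 is an assumed rate, 1.0 means no TLB
        cost = accesses_per_useful_access(miss_rate, walk)
        print(f"{name}, miss rate {miss_rate:.2f}: {cost:.2f} accesses")

for label, size in (("4K pages", 4 * KIB), ("2M pages", 2 * MIB)):
    reach_kib = tlb_reach(ASSUMED_ENTRIES, size) // KIB
    print(f"{label}: {ASSUMED_ENTRIES} entries cover {reach_kib} KiB")
```

With the same number of entries, 2M pages cover 512 times as much memory as 4K pages, which is why larger pages cut down on misses for workloads that touch a lot of memory.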
The obvious follow-on question is "why not make the TLBs really big?" -- the answer is that TLBs are content-addressable memories and so fairly expensive in terms of die size, so the die area required for a bigger TLB can usually be better used by adding more cache, or more cores etc...
Last edited by bridgman; 05-02-2014 at 08:36 AM.