Thread: Radeon DRM Graphics Benchmarks On Linux 3.15

  1. #1
    Join Date
    Jan 2007
    Posts
    15,108

    Radeon DRM Graphics Benchmarks On Linux 3.15

    Phoronix: Radeon DRM Graphics Benchmarks On Linux 3.15

    Our latest focus in benchmarking the Linux 3.15 kernel is the Radeon DRM kernel graphics driver. There have been some reports of small performance changes with this newest kernel, which is currently under development, in part due to some video memory optimizations that landed this cycle. This article has benchmarks of four AMD Radeon graphics cards running Linux 3.14 and 3.15 Git.

    http://www.phoronix.com/vr.php?view=20331

  2. #2
    Join Date
    Oct 2008
    Location
    Germany
    Posts
    74

    You should have waited a little bit longer. I've just pushed out a patch that will improve performance for SI and CIK quite a bit: http://lists.freedesktop.org/archive...ay/058858.html

  3. #3

    Quote Originally Posted by Deathsimple
    You should have waited a little bit longer. I've just pushed out a patch that will improve performance for SI and CIK quite a bit: http://lists.freedesktop.org/archive...ay/058858.html

    I'll do new benchmarks when it's merged. As usual, whenever I post benchmarks there are always comments along the lines of "well, you should have waited for XXX", but this is the current 3.15 situation, and I'll happily run more benchmarks whenever any other interesting code lands.

  4. #4
    Join Date
    May 2010
    Posts
    89

    What's the state of the R9 290 on the open-source driver?

  5. #5
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,185

    I remember asking about large page support before. Is there really no support for it in r600-eg?

  6. #6
    Join Date
    Mar 2012
    Posts
    325

    Michael, just a technical question, sorry if it's a noob one...
    If the performance change in 3.15 Git is due to a video-memory optimization, shouldn't this be benchmarked with cards that have a limited amount of video memory (e.g. APUs, iGPUs) or with games that need more VRAM than is available on the dGPU? Or should everything just be faster with this?

  7. #7
    Join Date
    Oct 2008
    Location
    Germany
    Posts
    74

    Quote Originally Posted by curaga
    Is there really no support for it in r600-eg?

    Correct, there is no support for it before NI, and even NI through CIK only support it for VRAM, not GART.

  8. #8
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,514

    Quote Originally Posted by tomtomme
    Michael, just a technical question, sorry if it's a noob one...
    If the performance change in 3.15 Git is due to a video-memory optimization, shouldn't this be benchmarked with cards that have a limited amount of video memory (e.g. APUs, iGPUs) or with games that need more VRAM than is available on the dGPU? Or should everything just be faster with this?

    Fair question. This particular optimization does not affect how efficiently the memory itself is used, only how efficiently the translation cache (TLB) between the GPU and memory is used. Rather than storing instructions or data, the TLB is part of the memory management unit (MMU) and caches the physical addresses corresponding to virtual addresses issued by the processor (here, the GPU). It's just like any other cache in the sense that a miss forces an access to much slower memory: on each TLB miss the MMU has to walk page tables in memory to find the translation, and then free up a TLB entry to store the translation it read.
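
    A minimal sketch of that hit/miss behavior, assuming a fully associative TLB with LRU replacement and 4K pages. The names `ToyTLB` and `run_demo` and the page mapping are made up for illustration, and real GPU hardware is considerably more complicated than this:

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page; 4K is an assumption for this sketch


class ToyTLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries, page_table):
        self.entries = entries
        self.page_table = page_table  # virtual page number -> physical page number
        self.cache = OrderedDict()    # the cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1  # real hardware would walk page tables here
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)  # evict least recently used entry
            self.cache[vpn] = self.page_table[vpn]
        return self.cache[vpn] * PAGE_SIZE + offset


def run_demo():
    # Hypothetical mapping: virtual page n lives in physical page n + 100.
    page_table = {vpn: vpn + 100 for vpn in range(64)}
    tlb = ToyTLB(entries=4, page_table=page_table)
    # Sequential walk over 8 pages with a 1 KiB stride: 4 accesses per page.
    for vaddr in range(0, 8 * PAGE_SIZE, 1024):
        tlb.translate(vaddr)
    return tlb.hits, tlb.misses
```

    In `run_demo()` each of the 8 pages is touched four times in a row, so only the first touch of a page misses. An access pattern that cycles through more pages than the TLB has entries would miss far more often.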

    More efficient use of TLBs means fewer interruptions while the MMU walks through page tables, and higher *effective* memory bandwidth. It's the same idea as huge page support on the CPU's MMU -- using 2M pages rather than 4K pages can give a 30% performance improvement in some cases, just from reduced overhead walking page tables and reloading TLB entries. TLB misses are really expensive -- without a TLB, each memory access on a GPU would require ~2 additional accesses for page tables, while a CPU would require 3-4 additional accesses since its page tables have more levels. Yes, that's 3-4 page table accesses for every "useful" memory access.
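
    To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a hypothetical 64-entry TLB (not a figure from any real chip) and a very simple cost model in which every miss costs one extra memory access per page-table level, with no caching of the page tables themselves:

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of memory covered by a fully populated TLB."""
    return entries * page_size


def accesses_per_useful_access(miss_rate, walk_levels):
    """Average memory accesses per useful access, counting page-table walks.

    Simplified model: each miss costs `walk_levels` extra memory accesses.
    """
    return 1 + miss_rate * walk_levels


ENTRIES = 64  # hypothetical TLB size, chosen only for illustration

reach_4k = tlb_reach(ENTRIES, 4 * KIB)  # 256 KiB covered with 4K pages
reach_2m = tlb_reach(ENTRIES, 2 * MIB)  # 128 MiB covered with 2M pages

# "No TLB" is the same as every access missing (miss_rate = 1.0).
gpu_no_tlb = accesses_per_useful_access(1.0, 2)  # 1 useful + 2 page-table accesses
cpu_no_tlb = accesses_per_useful_access(1.0, 4)  # 1 useful + 4 page-table accesses
```

    With the same number of entries, 2M pages cover 512 times as much memory as 4K pages, which is why larger pages cut the miss rate so sharply for workloads that touch a lot of memory.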

    The obvious follow-on question is "why not make the TLBs really big?" -- the answer is that TLBs are content-addressable memories and so fairly expensive in terms of die size, so the die area required for a bigger TLB can usually be put to better use adding more cache, more cores, etc.
    Last edited by bridgman; 05-02-2014 at 08:36 AM.
