
Thread: R6xx/R7XX kernel 2.6.33 module performance hacks

  1. #11
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    134

    Quote Originally Posted by tettamanti
    I'm not convinced it's the AND; ignore it for a moment: with consecutive OUT_RINGs the CPU still needs to compute the next index into the ring before actually writing into it, so it's possible that a mov into the ring is stalled by the inc of the index.
    On a 2-issue superscalar CPU, a mov followed by an index inc can be executed in one cycle, and another mov followed by another index inc can be executed on the very next cycle. In this case no stall occurs; however, the inc takes up one of the two available execution pipes, so the result is one move per CPU cycle.
    With open-coded offsets, instead, the index is known at compile time and the compiler emits the movs back to back.
    In this case both pipes are filled with move instructions every CPU cycle, so the result is two moves per CPU cycle.
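To make the two code shapes being compared concrete, here is a minimal C sketch. The ring struct, its field names and the helper functions are hypothetical, loosely modeled on the old DRM OUT_RING pattern (store, increment the write index, AND it with the ring mask); it is not the code from the linked patches. In the first variant every store depends on the index produced by the previous increment and mask; in the second, one up-front check that the packet does not wrap lets the four stores use constant offsets from a single base pointer, so they are independent of each other. Whether that actually doubles store throughput depends on the CPU and on what the compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring, loosely modeled on the old DRM OUT_RING pattern:
 * a power-of-two array of dwords, a write index and a wrap mask.
 * Names are illustrative only, not taken from the radeon driver. */
#define RING_DWORDS 1024u

struct ring {
    uint32_t buf[RING_DWORDS];
    uint32_t wptr;     /* index of the next dword to write */
    uint32_t ptr_mask; /* RING_DWORDS - 1 */
};

/* Variant 1: store, increment, mask. Each store needs the index
 * produced by the previous increment and AND. */
static inline void out_ring(struct ring *r, uint32_t v)
{
    r->buf[r->wptr++] = v;
    r->wptr &= r->ptr_mask;
}

static void emit4_sequential(struct ring *r, const uint32_t v[4])
{
    out_ring(r, v[0]);
    out_ring(r, v[1]);
    out_ring(r, v[2]);
    out_ring(r, v[3]);
}

/* Variant 2: open-coded offsets. One wrap check up front (assumed to be
 * the caller's job in real code), then four stores at constant offsets
 * from a single base pointer, and a single index update at the end. */
static void emit4_open_coded(struct ring *r, const uint32_t v[4])
{
    assert(r->wptr + 4 <= RING_DWORDS); /* packet must not wrap */
    uint32_t *p = &r->buf[r->wptr];
    p[0] = v[0];
    p[1] = v[1];
    p[2] = v[2];
    p[3] = v[3];
    r->wptr = (r->wptr + 4) & r->ptr_mask;
}
```

When the packet does not wrap, both functions leave the same dwords in the ring and the same write index; the open-coded version trades the per-dword mask for a single wrap check.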

  2. #12
    Join Date
    Jan 2010
    Posts
    21

    Quote Originally Posted by Obscene_CNN
    I didn't check with that one. I will try it.

    I did have someone test it with ut2004 with great results
    That someone would be me

    x11perf gives me some improvement:

    before

    Code:
    4800000 reps @   0.0012 msec (855000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0012 msec (848000.0/sec): Char in 80-char aa line (Charter 10)
    after
    Code:
    4800000 reps @   0.0011 msec (883000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0012 msec (866000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0011 msec (942000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0011 msec (906000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0012 msec (844000.0/sec): Char in 80-char aa line (Charter 10)
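As a rough check on the runs above, this short C sketch averages the posted rates (values copied from the two Code blocks; the helper names are only for illustration). The "after" mean (888200/sec) comes out about 4.3% above the "before" mean (851500/sec), but the slowest "after" run (844000/sec) is below both "before" runs, so with this few samples the gain is small compared with the run-to-run spread.

```c
#include <assert.h>
#include <stddef.h>

/* Rates (chars/sec) copied from the x11perf -aa10text runs above. */
static const double before_rates[] = { 855000.0, 848000.0 };
static const double after_rates[]  = { 883000.0, 866000.0, 942000.0,
                                       906000.0, 844000.0 };

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Relative change of the "after" mean over the "before" mean, in percent. */
static double percent_change(void)
{
    double b = mean(before_rates, sizeof before_rates / sizeof before_rates[0]);
    double a = mean(after_rates, sizeof after_rates / sizeof after_rates[0]);
    return (a - b) / b * 100.0;
}
```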
    glxgears gives the same fps, but it is a lot smoother: without these 21 patches the gears would pause briefly.

    The most noticeable difference (for me) is in ut2004: on maps that were unplayable, the frame rate went from 13~17 fps to about 19~22 fps, and playable maps have better fps and, most importantly (as with glxgears), no more pauses/stutter. I guess these patches improved the minimum frame rate and eliminated the pauses.

    My system is an AMD X2 3800 with a 3850 AGP (8x), running at a resolution of 1600x1050.

    BTW, the patches applied cleanly on 2.6.32 radeon-testing; I played ut2004 for about an hour and haven't had any crashes or rendering bugs (AFAICS). The system also suspended to RAM and resumed correctly this morning.
    Last edited by xming; 01-09-2010 at 03:53 AM.

  3. #13
    Join Date
    May 2007
    Posts
    319

    Quote Originally Posted by Obscene_CNN
    Here are some kernel 2.6.33 module performance patches for r6xx/7xx chipsets that I wrote.

    http://pastebin.ca/1743103
    http://pastebin.ca/1743100

    It made Torcs playable on my laptop.

    The benchmark x11perf -aa10text shows more than a 5% improvement on my laptop.

    Please give me a before-and-after benchmark with the command x11perf -aa10text if you could.
    Is this under KMS or non-KMS? The non-KMS x11perf path is really kernel-heavy since we flush after every operation; under KMS with the latest -ati it's a very different profile, since it batches operations.

    Dave.
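The difference Dave describes can be illustrated with a toy model in C. Everything here is hypothetical (the struct, OP_DWORDS, BATCH_DWORDS, submit()); it is not the -ati DDX or radeon DRM interface. It only counts how many kernel submissions each strategy makes for the same stream of drawing operations: flushing after every operation costs one submission per op, while batching submits only when the buffer fills or on an explicit flush.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: each drawing op needs OP_DWORDS dwords of
 * commands, and the batch buffer holds BATCH_DWORDS dwords. */
#define OP_DWORDS    8u
#define BATCH_DWORDS 4096u

struct batch {
    size_t used;    /* dwords queued but not yet submitted */
    size_t submits; /* number of kernel submissions so far */
};

/* Stands in for one trip into the kernel (ioctl + ring update). */
static void submit(struct batch *b)
{
    assert(b->used <= BATCH_DWORDS);
    if (b->used == 0)
        return; /* nothing queued, no kernel trip */
    b->submits++;
    b->used = 0;
}

/* Non-KMS style as described in the thread: flush after every op. */
static void draw_op_unbatched(struct batch *b)
{
    b->used += OP_DWORDS;
    submit(b);
}

/* KMS style: queue ops, submit only when the next op would not fit. */
static void draw_op_batched(struct batch *b)
{
    if (b->used + OP_DWORDS > BATCH_DWORDS)
        submit(b);
    b->used += OP_DWORDS;
}
```

With these assumed sizes, 1000 ops cost 1000 submissions unbatched versus 2 batched (one when the buffer fills, one final flush), which fits Dave's point that the non-KMS path leans on the ring-write functions far more heavily.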

  4. #14
    Join Date
    Jan 2010
    Posts
    21

    Quote Originally Posted by airlied
    Is this under KMS or non-KMS? The non-KMS x11perf path is really kernel-heavy since we flush after every operation; under KMS with the latest -ati it's a very different profile, since it batches operations.

    Dave.
    My results are both under KMS.

  5. #15
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    134

    Quote Originally Posted by airlied
    Is this under KMS or non-KMS? The non-KMS x11perf path is really kernel-heavy since we flush after every operation; under KMS with the latest -ati it's a very different profile, since it batches operations.

    Dave.
    My results are non-KMS, so I guess with xming's tests it's a win either way.

  6. #16
    Join Date
    Dec 2007
    Posts
    2,371

    It won't make a difference for KMS as Obscene_CNN's patches only affect the non-KMS paths.

  7. #17
    Join Date
    May 2007
    Posts
    319

    Quote Originally Posted by Obscene_CNN
    My results are non-KMS, so I guess with xming's tests it's a win either way.
    Non-KMS is overusing those functions: we hit the ring for every single drawing operation under X. It would be better to fix the DDX to batch like we do for KMS; I just don't care enough about non-KMS to do it. You'd probably get a 30% or so increase, and then these hacks would be a lot less useful.

    Dave.

  8. #18
    Join Date
    Jan 2010
    Posts
    21

    Quote Originally Posted by agd5f
    It won't make a difference for KMS as Obscene_CNN's patches only affect the non-KMS paths.

    so I must be imagining things

  9. #19
    Join Date
    Dec 2008
    Posts
    988

    Quote Originally Posted by xming
    so I must be imagining things
    You probably experienced something that's similar to the placebo effect

  10. #20
    Join Date
    Jan 2010
    Posts
    21

    Quote Originally Posted by monraaf
    You probably experienced something that's similar to the placebo effect
    Then so did both me and my computer: x11perf *is* giving better results.
