
Let's tune that unpatched, pre-3.12 ondemand governor to perform admirably!


  • Let's tune that unpatched, pre-3.12 ondemand governor to perform admirably!

    Below is my experience of making the unpatched ondemand governor behave acceptably on non-Intel CPUs before kernel 3.12 arrives.
    Of course, if you are on Intel, use pstate instead.

    uname: linux 3.10-3-amd64 debian, from 3.10.11-1 (2013-09-10)
    radeon kernel: 2.33
    mesa: 9.1.7
    Debian Testing @ 24.10.13

    Hardware: Athlon II x4 630, HD5850.

    Ondemand governor - unpatched, old style.

    What to look for:
    The ideal setting gives the highest FPS under load (glxgears) while the CPU stays completely idle on a normal desktop.

    Leaving the sampling down factor at its default or another small value is bad - the CPU drops back to low frequency too quickly, whereas fewer frequency switches mean more CPU time goes to the productive load.
    However, too large a sampling down factor is equally bad - the cores keep running at high frequency long after the actual load is done.

    Also, if upthreshold is set too low, the CPU will spend most of its time boosting background tasks.
    If it's too high, the CPU will skip minor yet important tasks, causing lag in areas such as scrolling windows, playing videos or light 3D loads.

    A less powerful CPU, or a powerful but very energy-efficient one, will hit the lower barrier more often and hence stutter more in easy tasks.
    A CPU that is powerful enough but not efficient at power saving will hit the lower barrier less often, causing lag everywhere except under high load.
    An Intel CPU with the pstate governor can ignore all this - pstate acts upon actual CPU load rather than polling, so it's always efficient.

    This means the right parameters for untuned ondemand differ from CPU to CPU. The patched ondemand in the upcoming 3.12 should no longer behave this way, but I have not been able to test that yet.

    Glxgears is not a benchmark
    It's not. It's a light 3D load.
    Comparing glxgears results driver vs driver or platform vs platform is meaningless;
    however, comparing glxgears results on the same platform does act as a CPU throughput benchmark.

    Setup
    Open several terminal windows:
    one running "vblank_mode=0 glxgears" as needed;
    another watching the CPU state via "watch -n 1 'grep MHz /proc/cpuinfo'";
    and a last one for entering power commands (as root):
    Sampling down factor is set by: echo _value_ > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
    Up threshold is set by: echo _value_ > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
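
The two echo commands above can be wrapped into one small helper. This is only a sketch: the function name set_ondemand and the ONDEMAND_DIR variable are my own, not part of any tool, and the real sysfs writes still need root.

```shell
#!/bin/sh
# Hypothetical helper: write both ondemand tunables in one step.
# ONDEMAND_DIR defaults to the sysfs path used above; it can be pointed
# at a scratch directory for a dry run.
ONDEMAND_DIR="${ONDEMAND_DIR:-/sys/devices/system/cpu/cpufreq/ondemand}"

set_ondemand() {
    # $1 = sampling_down_factor, $2 = up_threshold
    echo "$1" > "$ONDEMAND_DIR/sampling_down_factor" || return 1
    echo "$2" > "$ONDEMAND_DIR/up_threshold" || return 1
    # Read both values back so a rejected write does not go unnoticed.
    echo "sampling_down_factor=$(cat "$ONDEMAND_DIR/sampling_down_factor")" \
         "up_threshold=$(cat "$ONDEMAND_DIR/up_threshold")"
}
```

For example, "set_ondemand 1 95" would restore the default values used as the baseline in the tests.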

    Results are visible immediately.
    "Idle desktop" here means those terminals with nothing running, while moving the mouse around a lot.

    Observed CPU graph dynamic behavior
    CPU stays low, all the time
    Most CPU cores stay low all the time.
    One core may spike high very briefly, about once in 5 seconds.
    For the "CPU, load" field, it is allowed to spike more often.
    Cores jump all over the place, at various frequencies.
    At least one core is in high mode, nearly constantly.
    Cores stay mostly in high state. 1-2 cores may drop to low briefly.
    CPU cores stay high, all the time
    CPU cores will jump into various states, unpredictably.
    Notably, they cannot take middle states - it's either "all or nothing".
    The tables below abbreviate these behaviors as staticL, dynL, dynM, dynH, staticH and chaotic.
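
To put rough numbers on these observations instead of eyeballing the watch output, something like the following could be used. It is a sketch only: freq_summary is a made-up name, and it assumes the usual "cpu MHz : value" lines found in /proc/cpuinfo on x86.

```shell
# Hypothetical helper: summarise the "cpu MHz" lines of a cpuinfo-style file.
# $1 = file to read (defaults to /proc/cpuinfo)
freq_summary() {
    awk -F: '/MHz/ {
        f = $2 + 0                      # numeric frequency of this core
        n++
        if (n == 1 || f < min) min = f
        if (n == 1 || f > max) max = f
    }
    END { if (n) printf "cores=%d min=%.0f max=%.0f\n", n, min, max }' "${1:-/proc/cpuinfo}"
}
```

Calling it in a loop, e.g. "while sleep 1; do freq_summary; done", shows at a glance whether all cores sit low, all sit high, or the spread between them is large.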

    So, basically, I am looking for settings that give staticL on idle and dynH under load, while delivering maximum glxgears FPS.

    Initial tests
    Goal: sample the default behavior and test the role of sampling down
    Case:                  (1)      (2)      (3)      (perf)
    sampling down (delay): 1*       10       100      -NA-
    upthreshold (barrier): 95*      95*      95*      -NA-
    CPU, load:             staticL  staticL  chaotic  staticH
    CPU, idle:             staticL  dynL     chaotic  staticH
    GLX fps(avg), x1000:   1,75**   1,8      2,6-5    6,3**
    * defaults (untuned ondemand, default values)
    ** these are the default lowest (ondemand, untuned) and default highest (performance) values.
    They matter because they act as baselines - lowest and topmost, respectively.
    When comparing driver releases, compare lowest/topmost as a percentage within the same driver.
    That way it is possible to see how power management actually affects the results, leaving driver improvements out of the picture.
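
As a worked example of that percentage, using the two baseline figures from the table (1,75 and 6,3 thousand FPS). The helper name fps_percent is just for illustration.

```shell
# Hypothetical helper: express one FPS figure as a percentage of another.
# $1 = measured FPS, $2 = reference FPS (here: the performance governor result)
fps_percent() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", 100 * a / b }'
}

fps_percent 1750 6300   # untuned ondemand relative to performance, prints 27.8%
```

So on this machine untuned ondemand delivers a bit over a quarter of what the performance governor does in glxgears.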

    Case (3) shows that overdoing sampling down is no good:
    The CPU fluctuates fairly often. Under load it switches to full power on all cores for 1 second.
    Then all cores drop to low power. Then only one core is up and the rest are down.
    FPS fluctuates correspondingly - chaotic. Stuttering everywhere.

    CPU barrier test
    Goal: find the ideal CPU load barrier (upthreshold).
    Case:                  (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)     ((12))   (13)
    sampling down (delay): 1*       1*       1*       1*       1*       1*       1*       1*       1*       1*
    upthreshold (barrier): 90*      80*      70*      60*      50*      40*      30*      20*      19*      15*
    CPU, load:             staticL  dynL     dynM     dynM     dynM     dynM     dynH     staticH  staticH  staticH
    CPU, idle:             staticL  staticL  staticL  staticL  staticL  staticL  staticL  dynL     dynL     dynM
    GLX fps(avg), x1000:   1,75     2,1-2,3  3,8      4,4      5,0      5,5      5,9      6,2***   6,3 !    6,3***
    *** here we are hitting the performance ceiling (compare with the performance governor result)

    After running cases 4-13, I added one fine-grained case, 12, and renumbered the former case 12 to 13.
    With case 11 we are already close to the ceiling, so it makes sense to look for a slightly more "saturating" setting that still does not cause high CPU stress at idle. An upthreshold of 19 seems to be the optimal value - top performance with very acceptable idle. Going below 19, values of 18 and lower give a picture similar to 15.
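
A sweep like the one above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure I actually used: the function names are made up, it assumes glxgears prints its usual "N frames in 5.0 seconds = X FPS" lines, and it needs root, a running X session and the coreutils timeout command.

```shell
# Hypothetical helper: average the FPS figures from glxgears output on stdin.
# Assumes lines of the form "N frames in 5.0 seconds = X FPS".
avg_fps() {
    awk '$NF == "FPS" { sum += $(NF-1); n++ }
         END { if (n) printf "%.0f\n", sum / n }'
}

# Hypothetical sweep: set each up_threshold given as an argument,
# run glxgears for 20 seconds and print the average FPS.
sweep_up_threshold() {
    for t in "$@"; do
        echo "$t" > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
        fps=$(vblank_mode=0 timeout 20 glxgears 2>/dev/null | avg_fps)
        echo "up_threshold=$t avg_fps=$fps"
    done
}
```

For instance "sweep_up_threshold 90 80 70 60 50 40 30 20 19 15" would walk through the values tested here. The idle behavior still has to be judged by eye.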

    Fine-tuning via sampling down
    What I dislike about an upthreshold of 19 is that it still puts too much strain on the CPU at idle. Moving one window should not send two cores to top frequency and one to half, with only one resting. It's too much.

    Still, what happens if an upthreshold of 30, which is very NEAR to ideal and (still) has a more stable idle, is boosted by increasing the sampling down delay? This should effectively prevent switching down too fast and allow more work to get done at high frequency.

    Goal: make 30% behave like 19%
    Case:                  ((10))   (5)      (6)      (7)      (8)
    sampling down (delay): 1*       5*       10*      7*       6*
    upthreshold (barrier): 30*      30*      30*      30*      30*
    CPU, load:             dynH     dynH     dynH     dynH     dynH
    CPU, idle:             staticL  staticL  dynL**   staticL  staticL
    GLX fps(avg), x1000:   5,9      6,2      6,3      6,3      6,25
    ** this dynL is increasingly chaotic: a CPU core more often locks straight into the high state and stays there longer. Sometimes an additional core powers up while the previous one is still high, resulting in two cores running high, although for a very short time. Not the kind of idle I'd like... although FPS was superb.

    From case 6 on, I started to fine-tune the sampling down value. Case 7 was good, but a bit too chaotic at idle.
    It appears case (8) is the most efficient combination of performance AND idle.

    So, for my CPU, the magic values are - sampling down: 6, upthreshold: 30

    So... what's *optimum* for your system until 3.12 arrives?

  • #2

    Small update upcoming:
    I have compared the actual energy drain of my system under various conditions and arrived at somewhat better settings.
    Additionally, I will post how the pre-3.12 energy drain (radeon/open radeon + AMD CPU/cpufreq) differs from Windows XP (Catalyst + PowerNow).

    Stay tuned.


    • #3
      Apparently, the ondemand governor has been slowing down games under WINE by 40-70% since 2009 already.

      Watch here how it slowed down StarCraft II, and that ain't glxgears!


      • #4
        Okay, as said before, here is the updated test, combined with power usage:

        This thread is no longer current!


        • #5
          It would be really nice if someone could do a pstate versus ondemand power savings comparison. In absolute load tests, both seem evenly matched, even though my Ivy Bridge and Haswell i7s behave better with pstate under varying load. Also, with the thermal daemon I don't get BIOS-induced throttling in high load/high temperature scenarios.