Tuning pre-3.12 ondemand governor - version 2, now with energy efficiency

    The outdated values from the previous version of this test are here.
    You can see how ondemand has slowed down Linux gaming for decades here.

    This guide is probably not (at least not directly) applicable to 3.12, as 3.12 should get an updated ondemand.
    Still, it has to be retested there; until then I can't say anything.

    Settings:
    uname: linux 3.10-3-amd64 debian, from 3.10.11-1 (2013-09-10)
    radeon kernel: 2.33
    mesa: 9.1.7
    Debian Testing @ 24.10.13
    Hardware: Athlon II x4 630, HD5850 (r600g class)

    Point 1:
    Questions

    Q: Is it possible to reach the same energy efficiency with the open source radeon drivers under Linux as in Windows?
    A: Yes, it is; it has been measured against XP.
    The problem should not affect Intel CPUs, which use the much more efficient P-state driver.
    And it should not affect 3.12+ kernels anymore.

    Q: Glxgears is not a benchmark!
    A: It's not. It is nothing but a light 3D load, able to show the CPU overhead/throughput of a specific driver.
    However, while comparing results between different drivers and systems is meaningless,
    comparing results within the scope of the same driver on the same system shows the efficiency of CPU throughput.
    Under these conditions it may very well act as a benchmark: the more fps, the more efficiently the CPU contributes to performance.

    Q: What's the point of optimizing light load, outside of glxgears?
    A: Scrolling, window switching, video playback, 2D operations, UI response time, light games including 2D ones.
    Also note that performance is not everything. While good performance is nice, it's not so nice to have the CPU sit and burn energy for really minor tasks; that is like delivering pizza in a tank. It's also important to watch idle power consumption and idle clocks.

    Q: How do I find the ideal parameters?
    A:
    Open several terminal windows.
    In one of them, run "vblank_mode=0 glxgears".
    In another, watch the CPU state via: watch -n 1 "grep MHz /proc/cpuinfo"
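
A more compact view of the same information, as a rough sketch: the function name core_mhz and its optional file argument are made up for this example, and it assumes the "cpu MHz" lines that /proc/cpuinfo prints on x86.

```shell
#!/bin/sh
# core_mhz [FILE]: print the clock of every core on a single line, in whole MHz.
# FILE defaults to /proc/cpuinfo; the argument (hypothetical) only exists so a
# saved copy can be inspected instead.
core_mhz() {
    awk -F: '/^cpu MHz/ { printf "%s%d", sep, $2; sep = " " } END { print "" }' "${1:-/proc/cpuinfo}"
}
```

Calling it in a loop such as "while sleep 1; do core_mhz; done" gives one line per second, which makes switching patterns easier to spot than the multi-line grep output.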

    As for the governor, there are two values: up threshold and sampling down factor.
    The first defines at which load the CPU is forced out of its low state,
    the second how long it stays up afterwards.
    They can be manipulated in the following way:
    Sampling down factor is set by:
    echo _value_ > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
    Read value by:
    cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

    Up Threshold is set by:
    echo _value_ > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
    Read value by:
    cat /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
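
When trying many combinations it may be handy to wrap the two writes and the read-back in one helper. This is only a sketch: set_ondemand and the ONDEMAND_DIR variable are names invented here, the directory is the one used above, and writing to it needs root.

```shell
#!/bin/sh
# Location of the global ondemand tunables on pre-3.12 kernels (path from the post).
# ONDEMAND_DIR is a hypothetical override, useful for a dry run against a scratch directory.
ONDEMAND_DIR="${ONDEMAND_DIR:-/sys/devices/system/cpu/cpufreq/ondemand}"

# set_ondemand DELAY BARRIER: write sampling_down_factor and up_threshold,
# then read both back so a rejected value is noticed immediately.
set_ondemand() {
    echo "$1" > "$ONDEMAND_DIR/sampling_down_factor" || return 1
    echo "$2" > "$ONDEMAND_DIR/up_threshold" || return 1
    echo "delay=$(cat "$ONDEMAND_DIR/sampling_down_factor") barrier=$(cat "$ONDEMAND_DIR/up_threshold")"
}
```

For example, "set_ondemand 6 35" as root applies a delay of 6 and a barrier of 35.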

    These values can be set on every boot by means of the init system, by writing to /etc/sysfs.conf (sysfsutils), or anything similar.
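
For the sysfs.conf route, a sketch of what the entries could look like. It assumes the usual sysfsutils syntax of "attribute = value" with the path given relative to /sys; sysfs_conf_lines is a made-up helper name. It also assumes that ondemand is already the active governor when sysfsutils runs at boot, since the ondemand directory does not exist otherwise.

```shell
#!/bin/sh
# sysfs_conf_lines DELAY BARRIER: print the two lines to add to /etc/sysfs.conf.
# Attribute paths in that file are written relative to /sys.
sysfs_conf_lines() {
    printf 'devices/system/cpu/cpufreq/ondemand/sampling_down_factor = %s\n' "$1"
    printf 'devices/system/cpu/cpufreq/ondemand/up_threshold = %s\n' "$2"
}
```

Running "sysfs_conf_lines 6 35 >> /etc/sysfs.conf" as root would append both settings; check the file afterwards so the entries are not duplicated.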

    To simulate an idle desktop, I had only those terminals running while moving the mouse a lot.
    You also need a power usage meter, available for around $10.

    Q: What states and factors are to be considered?
    A: Two states: idle and load. The factors are CPU throughput (higher is better), energy consumption (lower is better) and CPU frequency.
    At idle, CPU throughput is irrelevant; CPU frequency should mostly stay constant and low, and consumption must be low.
    For a performance task, CPU throughput must be at its maximum; all other factors play little role.

    Q: What CPU behavior patterns were observed?
    A: During testing, I encountered different CPU behavior dynamics.
    These are static ones, where the CPU keeps a consistent state, and dynamic ones, where the CPU switches states. Finally there is chaotic, which is simply constant switching and is not good, because each switch adds latency.
    Obviously the static low state is optimal at idle, while for 3D there should be high fps; there the CPU behavior plays no role, except that it shouldn't be chaotic.

    The CPU behavior legend:
    staticL - CPU cores stay low all the time.
    staticH - CPU cores stay high all the time.
    dynL - most CPU cores stay low, one or two cores may spike briefly.
    dynM - cores maintain different frequencies, at least one core is at the top frequency.
    dynH - most CPU cores stay high, one or two cores may drop briefly.
    chaotic - CPU cores jump from the lowest state to the highest without a pattern. Notable inability to take middle states.

    Point 2:
    The analysis.

    Code:
    windows xp sp3 x32, catalyst; amd cpu driver installed
    Case:			1				2
    Mode:			desktop(eq. CPU drv off)	portable(eq. CPU drv on)
    State:			idling				idling
    Watts, on idle:		~149				~132

    Linux stock measurement table with default values
    kernel 3.10-3-amd64 debian testing, mesa 9.1.7 (open radeon)

    Code:
    Case:			1	2		3		4		5
    Governor:		*boot*	ondemand,def.	ondemand,def.	performance	performance
    Sampling down(delay):	NA	1		1		NA		NA
    Upthreshold(barrier):	NA	95%		95%		NA		NA
    CPU behavior, idle:	NA	staticL		staticL		staticH		staticH
    CPU behavior, load:	NA	staticL		staticL		staticH		staticH
    GPU profile:		NA	high		low		high		low
    Watts, on idle:		~200	~170		~135		~188		~151
    GLX fps(avg),x1000:	NA	1,75		1,73		6,3		4,45
    Notice the similarity in watt usage between Windows case 1 and Linux case 5 (~149 vs ~151), and between Windows case 2 and Linux case 3 (~132 vs ~135).
    This shows that both stacks are actually pretty close in idle usage, whether the CPU driver was on or off.

    Notice that with the default ondemand (pre-3.12) the CPU stays in its low state even under load, regardless of whether the GPU is in the high or low profile.


    Linux: effect of decreasing the up threshold (barrier) on CPU throughput and idle usage.
    Code:
    Case:			1	2	3	4	5	6	7	8	9
    Governor:		ondemnd	ondemnd	ondemnd	ondemnd	ondemnd	ondemnd	ondemnd	ondemnd	ondemnd
    Sampling down(delay):	1	1	1	1	1	1	1	1	1
    Upthreshold(barrier):	90	80	70	60	50	40	30	20	15
    CPU behavior, idle:	staticL	staticL	staticL	staticL	staticL	staticL	staticL	dynL	dynM
    CPU behavior, load:	staticL	dynL	dynM	dynM	dynM	dynM	dynH	staticH	staticH
    GPU profile:		high	high	high	high	high	high	high	high	high
    Watts, on idle:							~170	~184	~188
    GLX fps(avg),x1000:	1,75	2,2	3,8	4,4	5	5,5	5,9	6,2	6,3
    Performance was adequate once the barrier was lowered to 30; below that, the CPU became too active at idle.
    Idle power usage was okay down to a barrier of 40, after which it moved close to that of the plain performance governor.

    Let's test the difference the sampling down factor makes.

    Code:
    Case:			1	2	3	4	5	6	7	8
    Governor:		ondemnd	ondemnd	ondemnd	ondemnd	ondemnd	ondemnd	ondemnd	ondemnd
    Sampling down(delay):	1	5	10	7	1	6	6	5
    Upthreshold(barrier):	30	30	30	30	40	40	35	35
    CPU behavior, idle:	staticL	staticL	dynL	staticL	staticL	staticL	staticL	staticL
    CPU behavior, load:	dynH	dynH	dynH	dynH	dynH	dynH	dynH	dynH
    GPU profile:		high	high	high	high	high	high	high	high
    Watts, on idle:		~184	~180	~188	~182	~170	~171	~170	~170
    GLX fps(avg),x1000:	5,9	6,2	6,3	6,3	5,5	6,2	6,3	6,2
    Case 2 showed that increasing the delay helps boost performance, while at the same time lowering usage thanks to less CPU state switching.
    Case 3 delivered good performance, but also increased the power usage, as the CPU started to stay longer in the high state.

    In the previous round of testing, I settled on case 3 for production, until I measured actual idle usage, that is. It turns out CPU behaviour and GLX performance are not everything; one also needs to measure actual power usage.

    Lowering the delay to 7, in case 4, improved the energy efficiency and the CPU idle state without affecting performance.
    Because case 4 still has high usage, I decided to experiment with the barrier some more.

    Indeed, a barrier of 40 with a delay of 6 (case 6) already showed similar performance to case 4, but with about 11 watts less usage (~171 vs ~182).
    Lowering the barrier just a bit, to 35 in case 7, gave both ideal idle usage and full performance under load.
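
To put the trade-off from the sampling down table into one number, the cases can be ranked by glxgears fps divided by idle watts. This is a crude score and not something measured in the test: the fps are taken under load and the watts at idle, so it only helps to sort the cases. The function names are made up; the rows are copied from the table, with the decimal comma written as a point.

```shell
#!/bin/sh
# rank_cases: read "case delay barrier idle_watts kfps" rows on stdin and print
# them sorted by fps per idle watt, best first.
rank_cases() {
    awk '{ printf "%.1f case=%s delay=%s barrier=%s\n", $5 * 1000 / $4, $1, $2, $3 }' | LC_ALL=C sort -rn
}

# table_rows: the eight cases of the sampling down table.
table_rows() {
    printf '%s\n' \
        '1 1 30 184 5.9' \
        '2 5 30 180 6.2' \
        '3 10 30 188 6.3' \
        '4 7 30 182 6.3' \
        '5 1 40 170 5.5' \
        '6 6 40 171 6.2' \
        '7 6 35 170 6.3' \
        '8 5 35 170 6.2'
}
```

"table_rows | rank_cases" puts case 7 (delay 6, barrier 35) on top at about 37 fps per watt, which matches the choice made above by looking at the table.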


    After the tests, I booted into a 3.11 kernel with the radeon.dpm=1 switch. dmesg responded with:
    Code:
    $ dmesg|grep -i  "radeon"|grep -i "initialized"
    [drm] radeon: irq initialized.
    [drm] radeon: dpm initialized.
    This means that radeon is running in DPM (dynamic power management) mode.
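
The same check can be scripted, for example to print a warning at login when the switch was forgotten. A sketch only: radeon_dpm_active is a made-up name, and it simply looks for the log line quoted above.

```shell
#!/bin/sh
# radeon_dpm_active: read kernel log text on stdin and succeed (exit status 0)
# if it contains the "radeon: dpm initialized" line.
radeon_dpm_active() {
    grep -qi 'radeon: dpm initialized'
}
```

Usage: dmesg | radeon_dpm_active && echo "dpm on" || echo "dpm off"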

    With dynamic reclocking enabled and combined with the tuned ondemand, this is how Linux+Radeon (r600g) stacks up against Windows XP with Catalyst:
    Code:
    Case:			w1				w2				l1		l2
    Mode:			desktop(eq. CPU drv off)	portable(eq. CPU drv on)	performance	ondemand (6:35)
    State:			idling				idling				idling		idling
    Watts, on idle:		~149				~132				~150		~135
    As one can see, power usage is now quite similar. But there is a bonus: performance on Linux is much smoother and faster.
    Last edited by brosis; 03 November 2013, 08:42 AM. Reason: typo

  • #2
    Why's this not a guest article with graphs? Same audience as posting it here in the fora
