Outdated values for this test can be found here.
You can see here how ondemand has slowed down Linux gaming for decades.
This guide should not be (at least directly) applicable to kernel 3.12, as 3.12 should get an updated ondemand governor.
Still, it has to be retested; until then, nothing can be said for sure.
uname: linux 3.10-3-amd64 debian, from 3.10.11-1 (2013-09-10)
radeon kernel: 2.33
Debian Testing @ 24.10.13
Hardware: Athlon II x4 630, HD5850 (r600g class)
Q: Is it possible to reach the same energy efficiency with the open source radeon drivers under Linux as in Windows?
A: Yes, it is. It has been measured against Windows XP.
The ondemand issue should not affect Intel CPUs, which use the much more efficient P-state driver.
And it should not affect 3.12+ kernels anymore.
Q: Glxgears is not a benchmark!
A: It's not. It is nothing but a light 3D load, capable of showing the CPU overhead/throughput of a specific driver.
However, while comparing results between different drivers and systems is meaningless,
comparing results within the scope of the same driver on the same system shows the efficiency of CPU throughput.
Under these conditions it may very well act as a benchmark: the more fps, the more efficiently the CPU contributes to performance.
Q: What's the point of optimizing light load, outside of glxgears?
A: Scrolling, window switching, video playback, 2D operations, UI response time, light games including 2D.
Also note that performance is not everything. While good performance is nice, it is not so nice to have the CPU sit and burn energy on really minor tasks, which is analogous to delivering pizza in a tank. It is also important to watch idle power consumption and idle clocks.
Q: How to find ideal parameters?
A: Open several terminal windows.
In one of them, run "vblank_mode=0 glxgears".
In another, watch the CPU state via 'watch -n 1 "grep MHz /proc/cpuinfo"'.
As for the governor, there are two values: up threshold and sampling down factor.
The first defines which load should force the CPU to come out of sleep (raise its frequency);
the second, how long it should stay awake before it may clock down again.
They can be manipulated in the following way:
Sampling down factor is set by:
echo _value_ > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
Read value by:
Up Threshold is set by:
echo _value_ > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
Read value by:
These values can be set on every boot by means of the init system, by writing to /etc/sysfs.conf (sysfsutils), or anything similar.
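The set and read steps above can be wrapped into a small shell sketch. The function names and the ONDEMAND_DIR override are made up for illustration, and reading a value back is assumed to be a plain cat of the same sysfs file. Writing to the real path needs root.

```shell
#!/bin/sh
# Directory holding the ondemand tunables (path taken from the guide).
# ONDEMAND_DIR is an illustrative override, handy for dry runs.
ONDEMAND_DIR="${ONDEMAND_DIR:-/sys/devices/system/cpu/cpufreq/ondemand}"

# Print the current value of one tunable, e.g. up_threshold.
read_tunable() {
    cat "$ONDEMAND_DIR/$1"
}

# Write a new value to one tunable (needs root on the real sysfs path).
set_tunable() {
    echo "$2" > "$ONDEMAND_DIR/$1"
}

# Print both tunables as "delay:barrier".
show_tunables() {
    echo "$(read_tunable sampling_down_factor):$(read_tunable up_threshold)"
}
```

With these loaded as root, something like "set_tunable sampling_down_factor 6; set_tunable up_threshold 35; show_tunables" should print 6:35.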
To simulate an idle desktop, I had only those terminals running while moving the mouse a lot.
You also need a power usage meter, available for around $10.
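If you want a single number out of a glxgears run, a sketch like the following may help. It assumes glxgears prints its usual "N frames in 5.0 seconds = X FPS" lines. avg_fps is a made-up helper name, and this is not necessarily how the averages in the tables of this guide were taken.

```shell
#!/bin/sh
# Average the FPS figures from glxgears output read on stdin.
# Only lines ending in " FPS" are counted; the number sits in the
# second-to-last field of such a line.
avg_fps() {
    LC_ALL=C awk '/ FPS$/ { sum += $(NF-1); n++ }
                  END { if (n) printf "%.1f\n", sum / n }'
}
```

A possible use, assuming the coreutils timeout command is available, is "vblank_mode=0 timeout 30 glxgears | avg_fps".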
Q: What states and factors are to be considered?
A: Two states: idle and load. Factors are: CPU throughput (higher is better), energy consumption (lower is better), and CPU frequency.
For idle, CPU throughput is irrelevant; CPU frequency should mostly stay constant and low, and consumption must be low.
For a performance task, CPU throughput must be at its maximum; all other factors play little role.
Q: What CPU behavioral schemes were observed?
A: During testing, I encountered different CPU behavior dynamics.
These are: static ones, with the CPU keeping a consistent state; dynamic ones, with the CPU switching states; and finally chaotic, which is simply constant switching and is not good, as the constant switching adds latency.
Obviously, for idle the static low state is optimal, while for 3D there should be high fps. There, CPU behavior plays no role, except that it shouldn't be chaotic.
The CPU behavior legend:
staticL - CPU cores stay low, all the time.
staticH - CPU cores stay high, all the time.
dynL - most CPU cores stay low, one or two cores may spike briefly.
dynM - cores maintain different frequencies, at least one core is in the top position.
dynH - most CPU cores stay high, one or two cores may drop briefly.
chaotic - CPU cores jump from the lowest state into the highest, without pattern. Notable inability to take middle states.
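To help tell these states apart while watching /proc/cpuinfo, one snapshot can be boiled down to how many cores sit at the bottom and at the top frequency. This is only a sketch: summarize_mhz is a hypothetical helper, and the lowest and highest MHz values are parameters you have to supply for your own CPU. A single snapshot cannot separate dynL from chaotic, since those differ only in behavior over time.

```shell
#!/bin/sh
# Summarize one snapshot of /proc/cpuinfo-style input read from stdin.
# $1 = lowest frequency in MHz, $2 = highest frequency in MHz
# (both are values the reader supplies; nothing is detected automatically).
summarize_mhz() {
    LC_ALL=C awk -F: -v low="$1" -v high="$2" '
        /^cpu MHz/ {
            mhz = $2 + 0              # numeric value after the colon
            total++
            if (mhz <= low)  nlow++   # core sits at the bottom state
            if (mhz >= high) nhigh++  # core sits at the top state
        }
        END { printf "low=%d high=%d total=%d\n", nlow, nhigh, total }
    '
}
```

It would be fed like "summarize_mhz LOW HIGH < /proc/cpuinfo", with LOW and HIGH replaced by the frequency limits of the CPU in question.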
Windows XP SP3 x32, Catalyst; AMD CPU driver installed
Case:            1                         2
Mode:            desktop(eq. CPU drv off)  portable(eq. CPU drv on)
State:           idling                    idling
Watts, on idle:  ~149                      ~132
Linux stock measurement table with default values
kernel 3.10-3-amd64 debian testing, mesa 9.1.7 (open radeon)
Notice the similarity in watt usage between Windows case 1 and Linux case 5, and between Windows case 2 and Linux case 3.
Case:                  1              2              3              4              5
Governor:              *boot*         ondemand,def.  ondemand,def.  performance    performance
Sampling down(delay):  NA             1              1              NA             NA
Upthreshold(barrier):  NA             95%            95%            NA             NA
CPU behavior, idle:    NA             staticL        staticL        staticH        staticH
CPU behavior, load:    NA             staticL        staticL        staticH        staticH
GPU profile:           NA             high           low            high           low
Watts, on idle:        ~200           ~170           ~135           ~188           ~151
GLX fps(avg),x1000:    NA             1,75           1,73           6,3            4,45
This shows that both stacks are actually pretty close in idle usage, whether the CPU driver was on or off.
Notice that with the default ondemand (pre-3.12), regardless of whether the GPU is in the high or low profile, the CPU stays asleep even under load (staticL).
Linux: effect of decreasing the up threshold (barrier) on CPU throughput and idle usage.
Performance was adequate with the barrier lowered to 30; below that, the CPU became too busy on idle (dynL, dynM).
Case:                  1        2        3        4        5        6        7        8        9
Governor:              ondemand ondemand ondemand ondemand ondemand ondemand ondemand ondemand ondemand
Sampling down(delay):  1        1        1        1        1        1        1        1        1
Upthreshold(barrier):  90       80       70       60       50       40       30       20       15
CPU behavior, idle:    staticL  staticL  staticL  staticL  staticL  staticL  staticL  dynL     dynM
CPU behavior, load:    staticL  dynL     dynM     dynM     dynM     dynM     dynH     staticH  staticH
GPU profile:           high     high     high     high     high     high     high     high     high
Watts, on idle: ~170 ~184 ~188
GLX fps(avg),x1000:    1,75     2,2      3,8      4,4      5        5,5      5,9      6,2      6,3
Idle power usage was okay down to a barrier of 40, after which it moved close to that of the bare performance governor.
Let's test the difference made by the sampling down factor.
Case 2 showed that increasing the delay helps boost performance, while at the same time lowering usage due to less CPU state switching.
Case:                  1        2        3        4        5        6        7        8
Governor:              ondemand ondemand ondemand ondemand ondemand ondemand ondemand ondemand
Sampling down(delay):  1        5        10       7        1        6        6        5
Upthreshold(barrier):  30       30       30       30       40       40       35       35
CPU behavior, idle:    staticL  staticL  dynL     staticL  staticL  staticL  staticL  staticL
CPU behavior, load:    dynH     dynH     dynH     dynH     dynH     dynH     dynH     dynH
GPU profile:           high     high     high     high     high     high     high     high
Watts, on idle:        ~184     ~180     ~188     ~182     ~170     ~171     ~170     ~170
GLX fps(avg),x1000:    5,9      6,2      6,3      6,3      5,5      6,2      6,3      6,2
Case 3 delivered good performance, but also increased power usage, as the CPU started to stay longer in the high-performance state.
In previous testing, I stayed with case 3 for production, until I measured actual idle usage, that is. It turns out that CPU behaviour and GLX performance are not everything; one also needs to measure actual power usage.
Lowering the delay to 7, in case 4, improved energy efficiency and the CPU idle state without affecting performance.
Because case 4 still has high usage, I decided to experiment with the barrier more.
Indeed, a barrier of 40 with a delay of 6 (case 6) already showed similar performance to case 4, but with roughly 11 watts less usage (~171 vs ~182).
Lowering the barrier just a bit, to 35 in case 7, gave both ideal CPU idle usage and performance under load.
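The selection above can be replayed mechanically as a rough sketch. The rows are copied from the sampling down factor table, with decimal commas written as points and the approximate watt readings taken at face value. pick_cases and its two cut-offs are my own illustration, not part of the original procedure.

```shell
#!/bin/sh
# Columns: case, delay, barrier, idle watts, fps x1000
# (values copied from the sampling down factor table).
table='1 1 30 184 5.9
2 5 30 180 6.2
3 10 30 188 6.3
4 7 30 182 6.3
5 1 40 170 5.5
6 6 40 171 6.2
7 6 35 170 6.3
8 5 35 170 6.2'

# Print the cases drawing at most $1 watts on idle while reaching
# at least $2 thousand fps, formatted as "case N (delay:barrier)".
pick_cases() {
    echo "$table" | LC_ALL=C awk -v maxw="$1" -v minfps="$2" \
        '$4 <= maxw && $5 >= minfps { print "case " $1 " (" $2 ":" $3 ")" }'
}
```

Asking for the strictest combination seen in the table, "pick_cases 170 6.3", leaves only case 7 (6:35), while relaxing it to "pick_cases 171 6.2" also lets cases 6 and 8 through.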
After the tests, I booted into a 3.11 kernel with the radeon.dpm=1 switch. The dmesg responded with:
$ dmesg|grep -i "radeon"|grep -i "initialized"
[drm] radeon: irq initialized.
[drm] radeon: dpm initialized.
Meaning that radeon is running in DPM mode.
With dynamic reclocking enabled, combined with the tuned ondemand, this is how Linux+Radeon (r600g) stands compared to Windows XP with Catalyst:
As one can see, power usage is now quite similar. But there is a bonus: performance on Linux is much smoother and faster.
Case:            w1                        w2                        l1           l2
Mode:            desktop(eq. CPU drv off)  portable(eq. CPU drv on)  performance  ondemand (6:35)
State:           idling                    idling                    idling       idling
Watts, on idle:  ~149                      ~132                      ~150         ~135