Ondemand governor dramatically slows down Mesa performance

  • #41
    Benchmarks!


    I have run some benchmarks to see how big the differences are, and they are really really small.

    The top two pictures show the numbers for a Linux kernel compilation with 4 jobs on an AMD Phenom I 9850 (quad core), run 3 times each. The numbers show real time spent and user time spent as reported by time(1). The differences are tiny and the variation in the results is strong: the difference is somewhere between 1 and 3 seconds, while the entire compilation takes a whole 3 1/2 minutes!

    The bottom two pictures show the numbers for the Unigine OpenGL benchmark. On all settings it reported exactly the same average frame rate of 73.7 fps. The only differences are in the minimum and maximum frame rates and in the final score, and the score itself differs by only 3 points. The graphics card is a GTX260 with the 308.88 driver, set to a fixed clock rate (performance).

    The two numbers for the Ondemand governor are the threshold values (95% default, 50%, 25%) and the sampling down factors, which for my CPU are 18ms or 55 samples/s (factor 1x default) and 180ms or 5.5 samples/s (factor 10x).

    OS is Debian Wheezy with a kernel 3.9.3 and Mesa 8.0.5.
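    As a sanity check, the sampling figures above follow from simple arithmetic; a toy calculation (the 18 ms base interval is the value reported in this post, not a general constant):

```python
# Toy calculation of the ondemand sampling figures quoted above.
# The 18 ms base interval is the value reported in this post, not a constant.
base_interval_ms = 18                       # sampling interval at factor 1x

for factor in (1, 10):                      # sampling down factors tested
    interval_ms = base_interval_ms * factor
    samples_per_s = 1000 / interval_ms      # how often load is re-evaluated
    print(f"factor {factor}x: {interval_ms} ms, ~{samples_per_s:.1f} samples/s")
```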
    Last edited by sdack; 01 June 2013, 05:14 PM.

    Comment


    • #42
      More



      Some more benchmarks. Here one can see how awesome glxgears truly is. It beats the frame rates of Lightsmark by a factor of 30x!

      Comment


      • #43
        Originally posted by sdack View Post
        You have to be the first person ever to have a problem with glxgears.
        A lot of developers have stated that glxgears is not a GL benchmark but a "graphics driver CPU overhead benchmark". And you have proven it all by yourself:

        Originally posted by sdack View Post


        Some more benchmarks. Here one can see how awesome glxgears truly is. It beats the frame rates of Lightsmark by a factor of 30x!
        Just look how glxgears anomalously starves for cycles when the CPU is power-switching at the 25% load trigger.
        My theory is that with such a low barrier, the switching process itself steals CPU cycles from glxgears. This is because glxgears' load routines are light and thus produce a lot of spikes. The "conservative" governor, for example, should perform a lot better here.

        On the contrary, Lightsmark IS a GL benchmark, and it demonstrates how the graphics pipeline starves at the 95% threshold. So this counters your own claim that ondemand plays no role.

        The current state of power management in the Linux kernel is awkward. For example, the policy names ondemand and conservative do not reflect their actual nature.
        Naming them "static" and "dynamic" would make more sense.
        "Static" mode could have a "frequency" variable settable to "always high", "always low" or uint (custom).
        "Dynamic" mode could have "behavior" and "threshold" variables; "behavior" settable to "sharp" or "smooth", and "threshold" to "high" (95%), "medium" (66%), "low" (33%) or uint (custom).

        Feel free to suggest this at LKML.

        I have a few questions for you: what benchmark software are you using here? What distribution, and what kernel/modifications? And could you redo the tests with the "conservative" governor instead?
        I would like to retest your configuration, but with a 5850 on the open-source drivers instead...
        Last edited by brosis; 02 June 2013, 09:13 AM.

        Comment


        • #44
          Originally posted by ChrisXY View Post
          Does randr provideroffloadsink not work better yet?
          I'm not sure what that means exactly.



          This is the thread where I learned how to set it up. Since then I moved the 1280x1024 screen and the 1680x1050 screen to the 6850 and plugged a Sony 32" 720p LCD into the 4200 using an HDMI cable. Sound is plugged into a Sony receiver using optical S/PDIF, which itself is plugged into the TV using optical S/PDIF.

          Comment


          • #45
            Originally posted by sdack View Post
            I have run some benchmarks to see how big the differences are, and they are really really small.

            The top two pictures show the numbers for a Linux kernel compilation with 4 jobs on an AMD Phenom I 9850 (quad core), run 3 times each. The numbers show real time spent and user time spent as reported by time(1). The differences are tiny and the variation in the results is strong: the difference is somewhere between 1 and 3 seconds, while the entire compilation takes a whole 3 1/2 minutes!

            The bottom two pictures show the numbers for the Unigine OpenGL benchmark. On all settings it reported exactly the same average frame rate of 73.7 fps. The only differences are in the minimum and maximum frame rates and in the final score, and the score itself differs by only 3 points. The graphics card is a GTX260 with the 308.88 driver, set to a fixed clock rate (performance).

            The two numbers for the Ondemand governor are the threshold values (95% default, 50%, 25%) and the sampling down factors, which for my CPU are 18ms or 55 samples/s (factor 1x default) and 180ms or 5.5 samples/s (factor 10x).

            OS is Debian Wheezy with a kernel 3.9.3 and Mesa 8.0.5.
            Thanks for using proper benchmarks. Using proper benchmarks to argue that glxgears is a benchmark too is kinda silly, though.

            Comment


            • #46
              Originally posted by duby229 View Post
              I'm not sure what that means exactly.
              If you have RandR and xrandr 1.4 (xorg-server 1.13, I think) or later, xrandr --help should list:
              --listproviders
              --setprovideroutputsource <prov-xid> <source-xid>
              --setprovideroffloadsink <prov-xid> <sink-xid>

              xrandr --listproviders should show two GPUs. For me:
              Provider 0: id: 0x70 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 3 outputs: 8 associated providers: 0 name:Intel
              Provider 1: id: 0x45 cap: 0xd, Source Output, Source Offload, Sink Offload crtcs: 6 outputs: 0 associated providers: 0 name:radeon

              With either xrandr --setprovideroutputsource 1 0 or xrandr --setprovideroutputsource 0 1 you can tell RandR to "add" the outputs of one GPU to the outputs of the other.

              Comment


              • #47
                Originally posted by brosis View Post
                A lot of developers have stated that glxgears is not a GL benchmark but a "graphics driver CPU overhead benchmark". And you have proven it all by yourself:



                Just look how glxgears anomalously starves for cycles when the CPU is power-switching at the 25% load trigger.
                My theory is that with such a low barrier, the switching process itself steals CPU cycles from glxgears. This is because glxgears' load routines are light and thus produce a lot of spikes. The "conservative" governor, for example, should perform a lot better here.

                On the contrary, Lightsmark IS a GL benchmark, and it demonstrates how the graphics pipeline starves at the 95% threshold. So this counters your own claim that ondemand plays no role.

                The current state of power management in the Linux kernel is awkward. For example, the policy names ondemand and conservative do not reflect their actual nature.
                Naming them "static" and "dynamic" would make more sense.
                "Static" mode could have a "frequency" variable settable to "always high", "always low" or uint (custom).
                "Dynamic" mode could have "behavior" and "threshold" variables; "behavior" settable to "sharp" or "smooth", and "threshold" to "high" (95%), "medium" (66%), "low" (33%) or uint (custom).

                Feel free to suggest this at LKML.

                I have a few questions for you: what benchmark software are you using here? What distribution, and what kernel/modifications? And could you redo the tests with the "conservative" governor instead?
                I would like to retest your configuration, but with a 5850 on the open-source drivers instead...
                glxgears is not influenced by dark magic. glxgears reacts to fine influences, and some developers do not want to deal with these because it becomes too much work for what you get in return. That does not make glxgears a bad benchmark. At best it makes the developers picky, but neither is bad. I have placed Lightsmark next to glxgears and posted them separately, because they share the same fate.

                Where do you see Lightsmark in the future?! The problem for Linux is not the 3D hardware. The hardware is fine, but the problems are found in the drivers and libraries. Sadly, Lightsmark with its few polygons is unable to measure this. Ironically, you are defending it as a tool for measuring CPU-GPU starvation when it really gives very little to the GPU in the first place, and only feeds it the same thing over and over again. It is a bit like the definition of insanity, and so you get an insane frame rate of 600+ fps, just like glxgears now gives you 18,000 fps. I assume this is also the root of the problem observed by the OP: the CPU has too little to do and stays throttled. I do not think it is the governors; rather, the benchmark is getting old.

                The 25% measurement of glxgears is nothing more than a deviation in the results. I was not actually being serious when I posted its values here. If I wanted to be serious about glxgears then I would have to run more samples and cut off the minima and maxima before averaging, and I did not do this for any of the benchmarks, as it becomes too much work.
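                For what it's worth, "cut off the minima and maxima before averaging" is just a trimmed mean; a minimal sketch (the fps numbers are made up for illustration):

```python
def trimmed_mean(samples, trim=1):
    """Average after dropping the `trim` lowest and highest samples."""
    if len(samples) <= 2 * trim:
        raise ValueError("not enough samples to trim")
    kept = sorted(samples)[trim:len(samples) - trim]
    return sum(kept) / len(kept)

# Hypothetical glxgears runs; the 21000 outlier no longer skews the average.
fps_runs = [17950, 18020, 18100, 17890, 21000, 18010]
print(trimmed_mean(fps_runs))  # drops 17890 and 21000 before averaging
```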

                Also, you only seem to have a problem with the naming of the power-saving governors, but not with the performance governor... Why does it not bother you that the performance governor does not actually give you performance, as its name suggests? It really is a "no governor" of a governor: a placebo, a NOP, a don't-do-anything-at-all-and-waste-kernel-space of a governor. So why is your beef only with the power-saving governors?!

                Oh, and feel free to run your own tests.

                Comment


                • #48
                  Originally posted by Can2fieldSD
                  What its developers forgot is to put a large "Glxgears is NOT a benchmark" as the background...!
                  Oh, sweet fail. The first thing I noticed after quoting your comment is a bunch of images within it that did not get displayed. It appears you forgot to host your images properly!

                  One can use ping to measure the speed of light, and yet it is not a benchmark or a tool designated for such measurements. It is what one makes of it! It is the same with glxgears, x11perf or dhrystone. Just pressing a button and watching numbers run across a screen does not make you an expert in benchmarking. It makes you an expert in watching benchmarks.
                  Last edited by sdack; 03 June 2013, 04:17 AM.

                  Comment


                  • #49
                    Originally posted by sdack View Post
                    glxgears is not influenced by dark magic. glxgears reacts to fine influences, and some developers do not want to deal with these because it becomes too much work for what you get in return. That does not make glxgears a bad benchmark. At best it makes the developers picky, but neither is bad. I have placed Lightsmark next to glxgears and posted them separately, because they share the same fate.
                    Where do you see Lightsmark in the future?! The problem for Linux is not the 3D hardware. The hardware is fine, but the problems are found in the drivers and libraries. Sadly, Lightsmark with its few polygons is unable to measure this. Ironically, you are defending it as a tool for measuring CPU-GPU starvation when it really gives very little to the GPU in the first place, and only feeds it the same thing over and over again. It is a bit like the definition of insanity, and so you get an insane frame rate of 600+ fps, just like glxgears now gives you 18,000 fps. I assume this is also the root of the problem observed by the OP: the CPU has too little to do and stays throttled. I do not think it is the governors; rather, the benchmark is getting old.
                    1) I didn't claim that glxgears is a bad benchmark. I claimed it's a useless benchmark, in the sense that it's detached from the actual workload these chips are made for. I have two AMD cards: an HD5850 (Athlon II) and a Mobility X1900 (T7200). At the same resolution the X1900 produced more frames in glxgears than the 5850, and it produced more frames when compositing was ON. If you want to optimize driver CPU overhead, then feel free to optimize it instead of attacking developers that neither you nor I pay regularly.

                    2) What exact fate are you talking about? Even 3DMark01 didn't lose its relevance if one tests at the DX7 level. Besides Lightsmark, which has returned very consistent results for a whole variety of cards since the whole open-source effort started, including with the nvidia driver, we have Unigine (which has unpatched flaws of its own, as Vadim Grilin stated and as tests confirmed), we have Xonotic, we have OpenArena 0.8.8, we have a bunch of other games; and lastly we have many commercial closed native games and Wine platinum-status games with a built-in FPS indicator (although Wine WILL be slowed down due to DX-to-GLSL shader translation on the CPU).

                    3) You posted very interesting benchmark data, but you make very strange assumptions. It was instantly clear to me that Lightsmark was indeed starving at the 95% barrier, while glxgears was fine running on a clocked-down CPU, yet started to go crazy once the barrier was so low that the constant power switching started to interfere with the glxgears process.

                    4) The root of the problem is that we need better-named power management policies, and we also need application-profiling software that can set the corresponding mode based on the executable name instead of wildly guessing along the line.

                    Originally posted by sdack View Post
                    The 25% measurement of glxgears is nothing more than a deviation in the results. I was not actually being serious when I posted its values here. If I wanted to be serious about glxgears then I would have to run more samples and cut off the minima and maxima before averaging, and I did not do this for any of the benchmarks, as it becomes too much work.

                    Also, you only seem to have a problem with the naming of the power-saving governors, but not with the performance governor... Why does it not bother you that the performance governor does not actually give you performance, as its name suggests? It really is a "no governor" of a governor: a placebo, a NOP, a don't-do-anything-at-all-and-waste-kernel-space of a governor. So why is your beef only with the power-saving governors?!

                    Oh, and feel free to run your own tests.
                    5) When performance drops to 10% you don't call it a deviation in the results, and supplying the whole data was meaningful on your part; averaging brings us nowhere. It's the same as with frame latency: one gets 120 fps constantly, but every 500 ms there is a frame drop to 1 fps. The averaged 100 fps brings us nowhere, because it is not useful. One can think of it this way: averaging results equals blurring out bugs.
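                    The "averaging equals blurring out bugs" point can be illustrated with a toy frame-time trace (assumed numbers, not measurements): mostly smooth 120 fps frames plus one full-second stall.

```python
# 59 smooth frames at 120 fps plus one 1000 ms stall (assumed numbers).
frame_times_ms = [1000 / 120] * 59 + [1000.0]

avg_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
worst_fps = 1000 / max(frame_times_ms)

# The average still looks playable while the worst frame is a full-second freeze.
print(f"average: {avg_fps:.1f} fps, worst frame: {worst_fps:.1f} fps")
```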

                    6) The performance governor would not fit my naming scheme, but it is fine by itself. Performance means performance, and how it achieves that does not matter in that sense. Powersave is also a very good match. Both have the definite meaning that they achieve maximum performance and maximum power saving, which is valid. For example, on the open-source radeon driver that would correspond to the power profiles high and low, which also match. It's a static profile.
                    But ondemand does not match, because it does not start on demand, but rather when CPU load reaches 95%; it also reacts very sharply, which is not reflected by the name. I doubt ANY one-word caption can cover its behavior. Conservative is also very inaccurate. Conservative what? Does it conserve power? Firstly, it creates a naming conflict with ondemand: no one can intuitively tell from the names how they differ without looking into the manual. Secondly, both names only refer to how fast the power switching occurs, but they omit the very important barrier value, resulting in power inefficiency on a standard desktop.

                    7) Yes, I will do my own tests; I will take my time and perform them thoroughly and seriously, because I care, not because of pointless trolling. When I said "feel free to report that to LKML" there was no irony: I just don't have the time yet, and posting this on LKML would mean "patches welcome", and I am no expert to write and maintain such a patch. But if I could, I would.

                    8) And I recommend you stop talking to copy-paste spambots, please ( Can2fieldSD )...

                    9) Also, I would be extremely thankful if you named the software you used for benchmarking and for creating such graphs. I can imagine the backend is maybe PTS, but how did you draw the graphs? Because min/avg/max split across one graph is very interesting.
                    Last edited by brosis; 03 June 2013, 09:35 AM.

                    Comment


                    • #50
                      Whatever you think of glxgears, it's pretty good at demonstrating the issue with the ondemand governor and OpenGL applications. Though it also depends on the drivers, the CPU/GPU performance balance, etc. E.g. for me, glxgears with the ondemand governor results in ~1300 fps, and with performance ~8000 fps: more than a 6x difference (with r600g and the high power profile for the GPU in both cases).

                      My understanding of the issue is that even though glxgears is a single-threaded app and one might expect 100% load on a single core, that thread has to wait while the kernel driver and GPU process the command streams (this is offloaded to a separate thread in r600g). If, for example, kernel + GPU processing takes about 15% of the time, the main thread will spend those 15% waiting, and its load will never be higher than 100 - 15 = 85%, so it will never trigger a frequency increase with the ondemand governor, because the default threshold is 95%. Also, I think these waits encourage the scheduler to move the waiting thread between CPUs, distributing the load and reducing the average load per CPU even more. In the end, the typical distribution of load per CPU with glxgears for me looks like the following:

                      Code:
                      # sudo cpupower monitor sh -c "vblank_mode=0 glxgears"
                      
                          |Mperf               
                      CPU | C0   | Cx   | Freq 
                         0| 25.87| 74.13|   824
                         1| 19.12| 80.88|   805
                         4| 10.21| 89.79|   805
                         3| 48.68| 51.32|   805
                         5| 52.97| 47.03|   815
                         2|  3.14| 96.86|   805
                      That is, the load per CPU is mostly below 50%, and the frequency never goes up from the minimal 800 MHz. The ondemand governor only sees these low per-core loads; it doesn't understand that we in fact have a single-threaded app that could greatly benefit from higher frequencies.
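                      The threshold arithmetic above (85% busy vs. the 95% default) can be sketched as a toy model; the 95% up_threshold is ondemand's documented default, while the 15% wait share is the assumed example from this post:

```python
UP_THRESHOLD = 95    # ondemand's default up_threshold, in percent
gpu_wait_pct = 15    # assumed share of time the thread waits on kernel + GPU

observed_load = 100 - gpu_wait_pct           # the busiest the thread can appear
raises_freq = observed_load >= UP_THRESHOLD  # ondemand only ramps up above threshold

print(f"observed load {observed_load}% -> frequency raised: {raises_freq}")
```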

                      Also, I think the argument that ondemand is intended for power saving is not quite correct; the described behavior is not what most users expect. Users expect the frequency to be raised if the application needs it, so that they always get the best performance from performance-sensitive apps. The Windows CPU governor works more like this expectation, at least with 3D apps, though I suspect it simply detects 3D activity and raises the frequency in such cases. Anyway, even on Windows, forcing the CPUs to max frequency has sometimes helped me with 3D app performance, though it usually doesn't provide noticeable benefits.

                      Comment
