Announcement

**anth** · 22 September 2021, 04:59 PM

It'd be useful to include the power supply and cooling in the hardware specs when benchmarking this sort of thing.

It'd also be interesting to see tests like this comparing the same CPU etc but with different cooling to see the effect of different software settings when power use is the limiting factor to performance.

**Teggs** · 22 September 2021, 06:05 PM

Looks like it will be a bit before this is ready, but at least Valve helped get the ball rolling. I suppose it is a good sign for both performance governors that they agree with each other. Since laptops and such are involved, I wonder if it is a goal to make the Powersave governor not suck as well? Or is that outside the bounds of this project?

**yump** · 22 September 2021, 06:20 PM

We should keep in mind that the purpose of a CPU governor is to save energy. The point of comparison should not be the performance governor running flat-out at maximum frequency. It should be the performance governor running fixed-frequency at a speed that uses the same power as the governor under test. Or going the other way, run a game at locked 60 FPS, and compare how much energy the governors use. When I tune the CPU frequency scaling on my machine, I loop playback of a YouTube video on a difficult-to-decode section, figure out how low I can go with fixed-frequency without dropping frames, and then try to do better with the governor.
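For the "compare how much energy the governors use" part, here is a rough sketch of how one might read the package energy counter through the kernel's powercap sysfs interface. The zone path `intel-rapl:0` is an assumption (it varies by machine, may be absent, and usually needs root to read), and `energy_delta_uj` and `measure_joules` are made-up helper names.

```python
from pathlib import Path
import time

# Assumed powercap zone; name and availability vary by machine,
# and reading it usually requires root.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy used between two counter readings, allowing for one wraparound."""
    if after >= before:
        return after - before
    # The counter wrapped past max_range and restarted near zero.
    return after + max_range - before


def measure_joules(seconds: float, zone: Path = RAPL_ZONE) -> float:
    """Package energy in joules consumed over `seconds` of wall-clock time."""
    max_range = int((zone / "max_energy_range_uj").read_text())
    before = int((zone / "energy_uj").read_text())
    time.sleep(seconds)
    after = int((zone / "energy_uj").read_text())
    return energy_delta_uj(before, after, max_range) / 1e6
```

Run the same locked-FPS game or looped video for the same duration under each governor and compare the joules reported.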

That said, these results should not reflect poorly on amd_pstate, which is just hooking up actual control of the CPU frequency to the cpufreq governor. They should reflect poorly on the governors. Where acpi_cpufreq performs better, it's just a matter of luck that giving the governors 3 discrete frequencies to work with, instead of the full range, tricks them into making slightly less stupid decisions.

The thing is, all of the governors are bad.

The old governors, ondemand and conservative, have no idea what to do with single-thread-bound applications on multi-core machines, which bounce around between cores behind the governor's back. You can sort-of get conservative to behave nicely if you set up_threshold and down_threshold to like, 23% and 22% on a 4-core system, but this obviously does not scale to big systems with 8+ cores and SMT. A controller trying to keep the CPU load below 6% won't save much energy.
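To put numbers on that, here is a small sketch. It assumes one fully busy thread that the scheduler bounces evenly across all logical CPUs, which is a simplification, and the function name is just for illustration.

```python
def single_thread_load_pct(n_cpus: int) -> float:
    """Average per-CPU load (in %) from one fully busy thread that is
    spread evenly across n_cpus logical CPUs (simplifying assumption)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return 100.0 / n_cpus


for n in (4, 8, 16):
    pct = single_thread_load_pct(n)
    print(f"{n:2d} CPUs: one busy thread looks like {pct:.2f}% load per CPU")
```

On 4 CPUs that thread looks like 25% load, hence thresholds just under it; on 16 logical CPUs it looks like 6.25%, which is where the 6% figure comes from.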

The newfangled schedutil assumes that the timescale relevant to making DVFS decisions is the same as the timescale for making scheduling decisions, and that 80% CPU utilization is good for all use cases (consider that for an application that chooses not to pipeline GPU commands to minimize latency, that gives the GPU 20% of the frame to do its work...). IMO the raison d'être of schedutil is, "don't step on the scheduler's toes".
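The 80% figure falls out of the frequency formula given in the kernel's cpufreq documentation, `f = 1.25 * f_0 * util / max`. A minimal sketch, assuming frequency-invariant utilization (so `f_0` is the maximum frequency) and ignoring rate limiting, iowait boosting, and rounding to available frequencies; the function names are mine:

```python
def schedutil_target_khz(util: float, max_util: float, max_freq_khz: float) -> float:
    """Frequency schedutil would aim for: 1.25 * f_max * util / max,
    clamped to the maximum frequency."""
    return min(1.25 * max_freq_khz * util / max_util, max_freq_khz)


def busy_fraction(util: float, max_util: float, max_freq_khz: float) -> float:
    """Fraction of time the CPU is busy once it runs at the target frequency."""
    needed_khz = max_freq_khz * util / max_util  # cycles the load really needs
    return needed_khz / schedutil_target_khz(util, max_util, max_freq_khz)


# A load that needs half of a 4 GHz CPU (util 512 out of 1024):
print(schedutil_target_khz(512, 1024, 4_000_000))  # 2500000.0
print(busy_fraction(512, 1024, 4_000_000))         # 0.8
```

So whenever the target is below the maximum frequency, the governor settles where the CPU is about 80% busy, regardless of what the application actually needs.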

An ideal governor would:

1. Know the latency requirements of each process. (Probably needs hairy heuristics and/or per-application hints from userspace; everybody hated that.)
2. Know whether any thread is likely to miss its latency requirement. (Need to solve the halting problem, or estimate based on history & heuristics.)
3. Be able to follow the dependency chain across IPC boundaries, to catch cases where the actual response latency is the result of several threads working in sequence. Or maybe classify "waiting on something that is not I/O" as CPU utilization for DVFS purposes. Perhaps some of Brendan Gregg's off-CPU profiling stuff could be re-purposed for that.
4. Use some strategy based in Actual Control Theory to bring number 2 in line with number 1.

It turns out that CPU frequency scaling is an extremely difficult problem.

**arQon** · 22 September 2021, 09:15 PM

Originally posted by S.Pam

What's up with schedutil anyway? Seems to be worse than ondemand for both performance and power consumption?!

schedutil has consistently been much worse than any other governor, since day one. It may work someday, and it may not: I think it's anybody's guess, but it's still an utter failure after many years, so whoever's looking after it may well get tired of it and just abandon it as a lesson learned. Even if so though, I expect nothing of value will be lost: it's currently terrible, and if it was ever going to be worth using that should have happened by now in at least ONE case, and it hasn't.

I have no idea if schedutil is *conceptually* sound or not, and it may well be so - but on a practical level, it's a disaster.
amd_pstate is barely at alpha quality, and STILL does a better job than schedutil most of the time, which I think tells you everything you need to know about schedutil: namely that you should simply ignore it completely for at least now. If it ever stops being completely useless that would be such a notable improvement that Michael would undoubtedly let us know.

**yump** · 22 September 2021, 11:03 PM

Originally posted by arQon

amd_pstate is barely at alpha quality, and STILL does a better job than schedutil most of the time,

You have a misunderstanding. The CPU frequency scaling system is split into a governor layer and a driver layer. The governor layer chooses a performance level for each CPU core, and the driver layer is responsible for figuring out what performance levels are available and conveying the governor's choice to the hardware. The governors are performance, powersave, ondemand, conservative, and schedutil. The (x86) drivers are acpi_cpufreq, intel_pstate, and (soon) amd_pstate. So (as you can see in the article we are commenting on) any governor can be paired with either driver.
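You can see the two layers side by side in sysfs. A minimal sketch that reads the standard cpufreq policy attributes for one CPU; the `cpu0` path is an assumption about your machine, and `read_policy` is just an illustrative helper:

```python
from pathlib import Path

# Assumed location of the cpufreq policy attributes for the first CPU.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_policy(cpufreq_dir: Path = CPUFREQ_DIR) -> dict:
    """Return the driver, the active governor, and the governors on offer."""
    def read(name: str) -> str:
        path = cpufreq_dir / name
        # Missing attributes (e.g. no cpufreq support) read as empty.
        return path.read_text().strip() if path.exists() else ""

    return {
        "driver": read("scaling_driver"),
        "governor": read("scaling_governor"),
        "available_governors": read("scaling_available_governors").split(),
    }


if __name__ == "__main__":
    print(read_policy())
```

The driver and the governor are reported as separate attributes, which is the point: swapping one does not swap the other.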

intel_pstate is kind of weird because for a long time (and still when configured to "active" mode, which is no longer the default) it overrode the governor layer with its own implementation of something very much like "ondemand".

**Linuxxx** · 23 September 2021, 03:30 AM

Originally posted by yump

It turns out that CPU frequency scaling is an extremely difficult problem.

That's why it will be interesting to see how "schedutil" will behave on the Steam Deck, since Valve will most likely stick to it by default.

And simply using the 'performance' governor on an APU is not really feasible for gaming workloads, because it will starve the on-board GPU's power budget.

**niner** · 23 September 2021, 04:44 AM

Originally posted by arQon

schedutil has consistently been much worse than any other governor, since day one. It may work someday, and it may not: I think it's anybody's guess, but it's still an utter failure after many years, so whoever's looking after it may well get tired of it and just abandon it as a lesson learned. Even if so though, I expect nothing of value will be lost: it's currently terrible, and if it was ever going to be worth using that should have happened by now in at least ONE case, and it hasn't.

Actually schedutil was the solution on our servers. It cut latency in half without measurably increasing power usage. Before schedutil we could choose between long request times and a large power bill. So maybe you simply haven't looked at the right scenario when judging its usefulness? Because honestly, when running a single application continuously at full power, you don't need that smart a governor. For that scenario, performance does everything you want. The hard part is getting medium or low utilization right.

**intelfx** · 23 September 2021, 04:52 AM

Originally posted by yump

intel_pstate is kind of weird because for a long time (and still when configured to "active" mode, which is no longer the default) it overrode the governor layer with its own implementation of something very much like "ondemand".

Unfortunately I have to submit a correction to your correction:

Originally posted by https://github.com/torvalds/linux/blob/v5.14/Documentation/admin-guide/pm/intel_pstate.rst#active-mode

This is the default operation mode of intel_pstate for processors with hardware-managed P-states (HWP) support. If it works in this mode, the scaling_driver policy attribute in sysfs for all CPUFreq policies contains the string "intel_pstate".

intel_pstate only defaults to passive mode for processors lacking HWP, which are really old families (like Haswell). This totally makes sense, because active mode means bypassing governors and deferring P-state selection to hardware and it is useless if hardware is in fact incapable of doing so. If active mode is used on a processor lacking HWP, intel_pstate uses a crude software fallback governor which is like schedutil but worse.

For processors with the HWP feature, intel_pstate defaults to active mode and I don't see that changing any time soon.

---

Incidentally, the current state of amd_pstate (no pun intended) is very similar to intel_pstate in passive mode.

**Adarion** · 23 September 2021, 08:18 AM

I really like that some AMD/ATI devs post here. They give you important hints and insights.
Highly appreciated.

**yump** · 23 September 2021, 08:27 AM

intelfx

Active+HWP is letting the CPU's embedded controller do the governing.

Active+no HWP is the only mode where intel_pstate is both the driver and the governor.

Coincidentally, my CPU is a "really old" Haswell. I would say that intel_pstate's powersave governor is like ondemand, but different. Schedutil's big thing is that threads take their contribution to load with them when they migrate between CPUs. Unfortunately, in practice its slightly smaller thing is that it uses the maximum p-state way too aggressively for bursty soft real-time workloads like gaming and video playback, and yet somehow doesn't manage to get itself out of the way for some batch jobs, such as the video encoders in this article.

Originally posted by niner

Actually schedutil was the solution on our servers. It cut latency in half without measurably increasing power usage. Before schedutil we could choose between long request times and a large power bill. So maybe you simply haven't looked at the right scenario when judging its usefulness? Because honestly, when running a single application continuously at full power, you don't need that smart a governor. For that scenario, performance does everything you want. The hard part is getting medium or low utilization right.

It sounds like your workload is similar to the ones used by the developers of schedutil. Running a single application continuously at full power should be an easy case for any governor, but look at schedutil's abysmal showing on the AOM AV1 video encode benchmark in this article, and similarly on Intel. Clearly something is amiss.

There are characteristics that I might expect a server workload to have that would make it relatively easy to govern for. Specifically, I expect they have a negative sloping load-line: working faster causes the (non-frequency-adjusted) CPU load to decrease. Rather sharply, assuming you have a fixed volume of requests coming in. But a batch job has a flat load-line, and GUI workloads like video decoding or vsync'd games can even be locally positive-sloping. Consider the case where if the CPU is just slightly too slow, you get behind and drop a frame or miss a vblank interval, or Kerbal Space Program's behavior of varying its physics time step in response to the availability of CPU time.
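A toy model of the first two load-lines, to make the shapes concrete. The demand figure and function names are made up for illustration; this is not a measurement of any real workload.

```python
def server_load(freq_ghz: float, demand_ghz: float = 1.0) -> float:
    """Fixed request volume needing demand_ghz worth of cycles per second:
    the busy fraction falls as the CPU gets faster (negative slope)."""
    return min(1.0, demand_ghz / freq_ghz)


def batch_load(freq_ghz: float) -> float:
    """A batch job always has more work queued: 100% busy at any speed (flat)."""
    return 1.0


for f in (1.0, 2.0, 4.0):
    print(f"{f:.1f} GHz: server {server_load(f):.0%}, batch {batch_load(f):.0%}")
```

With the server model, raising the frequency lowers the observed load, so a utilization-based governor gets clear feedback. With the batch model the load stays at 100% whatever the governor does, so utilization alone tells it nothing about how much faster it should go.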

The performance governor is a non-starter for any mobile device. As costly as your power bill is, delivering that power round-trip through a Li-ion battery would be many times more costly.

An Early Look At The AMD P-State CPPC Driver Performance vs. ACPI CPUFreq
