Linux 5.11 Intel P-State Schedutil Tuned For Better Efficiency. Avoid Running "Too Fast"
This pull request for P-State reworks its passive-mode fast-switch path so that it avoids "running some workloads too fast" and now sides with better energy efficiency in select cases while still providing sufficient performance for handling the current work. Another P-State change allows the guaranteed performance value for a given CPU to be increased after boot time.
As for the P-State Schedutil work addressing workloads that currently run "too fast", Intel's Rafael Wysocki, who also serves as the Linux power management maintainer, wrote in the patch series he authored:
Using intel_pstate in the passive mode with HWP enabled, in particular under the schedutil governor, is still kind of problematic, because it has to assume that it should not allow the frequency to fall below the one requested by the governor. For this reason, it translates the target frequency into HWP.REQ.MIN which generally causes the processor to run a bit too fast.
Moreover, this allows the HWP algorithm to use any frequency between the target one and HWP.REQ.MAX that corresponds to the policy max limit and some workloads cause it to go for the max turbo frequency prematurely which hurts energy-efficiency without improving performance, even though the schedutil governor itself would not allow the frequency to ramp up so fast.
This patch series attempts to improve the situation by introducing a new driver callback allowing the driver to receive more information from the governor. In particular, this allows the min (required) and target (desired) performance levels to be passed to it and those can be used to give better hints to the hardware.
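To make the difference concrete, here is a minimal sketch of the two hinting strategies described above. It is a simplified model, not the kernel's code: the `HwpRequest` class, the function names, and the convention that a desired value of 0 means "no hint" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HwpRequest:
    """Hypothetical, simplified stand-in for the fields of an HWP request."""
    min_perf: int      # floor the hardware must not go below
    max_perf: int      # ceiling, corresponding to the policy max limit
    desired_perf: int  # hint for the hardware; 0 is assumed to mean "no hint"


def old_request(target_perf: int, policy_max: int) -> HwpRequest:
    """Old behavior as described: the governor's target becomes the floor.

    The hardware is then free to pick anything between the target and the
    policy max, which is how it can jump to max turbo prematurely.
    """
    return HwpRequest(min_perf=target_perf, max_perf=policy_max, desired_perf=0)


def new_request(min_perf: int, target_perf: int, policy_max: int) -> HwpRequest:
    """New behavior as described: pass both the required and desired levels.

    The required level becomes the floor and the target becomes a hint,
    clamped so that floor <= desired <= policy max.
    """
    floor = min(min_perf, policy_max)
    desired = max(floor, min(target_perf, policy_max))
    return HwpRequest(min_perf=floor, max_perf=policy_max, desired_perf=desired)


if __name__ == "__main__":
    # Same governor target (20) in both cases, policy max of 40.
    print(old_request(target_perf=20, policy_max=40))
    print(new_request(min_perf=10, target_perf=20, policy_max=40))
```

In this model the old path pins the floor at the governor's target, while the new path keeps a lower floor and only hints at the target, which is the extra information the new callback is meant to carry.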
That series improving P-State Schedutil behavior is now in Linux 5.11 via this PR, which contains those P-State changes and also exposes CPPC frequency domain information via sysfs (the latter work driven by Arm).
Schedutil is the modern Linux scaling governor for CPUFreq/P-State that makes use of the kernel's scheduler utilization data. Schedutil is increasingly used -- and defaulted to -- in more environments. In our own tests Schedutil still generally comes up short of the "performance" governor, but we will be delivering some fresh power/performance governor tests soon with Intel/AMD CPUs.
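For readers curious how Schedutil turns scheduler utilization into a frequency request, the rough idea can be sketched as below. This is an illustrative approximation of the governor's documented mapping (about 1.25 x max frequency x utilization / capacity), not the kernel source; the function names and the example numbers are assumptions.

```python
def schedutil_raw_freq(util: int, max_capacity: int, max_freq_khz: int) -> int:
    """Approximate schedutil mapping: 1.25 * max_freq * util / max_capacity.

    Integer math is used here, with max_freq >> 2 supplying the extra 25%
    of headroom on top of the utilization-proportional frequency.
    """
    return (max_freq_khz + (max_freq_khz >> 2)) * util // max_capacity


def schedutil_clamped_freq(util: int, max_capacity: int,
                           min_freq_khz: int, max_freq_khz: int) -> int:
    """Clamp the raw request to the policy limits (a simplification)."""
    raw = schedutil_raw_freq(util, max_capacity, max_freq_khz)
    return max(min_freq_khz, min(raw, max_freq_khz))


if __name__ == "__main__":
    # Hypothetical CPU: 800 MHz to 4 GHz, utilization on a 0..1024 scale.
    for util in (128, 512, 900):
        print(util, schedutil_clamped_freq(util, 1024, 800_000, 4_000_000))
```

Because of the 25% headroom, the request reaches the maximum frequency at roughly 80% utilization rather than at 100%.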