P-State Algorithm Change, Schedutil IOWait Boosting

Written by Michael Larabel in Linux Kernel on 1 August 2016 at 02:44 AM EDT. 9 Comments

While the work is still in early form and won't be merged for the next kernel cycle (v4.8), a series of patches was published on Sunday to improve CPU frequency selection under Linux, including an algorithm change for the Intel P-State scaling driver.

Rafael Wysocki posted the "[RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting" patch series, looking for feedback on some changes related to CPU frequency scaling. Wysocki admits he hasn't yet thoroughly tested the impact of the changes, but wants to see whether other developers agree it would be a step in the right direction.

The work includes adding iowait boosting to the CPUFreq Schedutil governor introduced in Linux 4.7. There's also a patch for improving access to scheduler utilization data.

Also notable in this patch series is the patch titled "Change P-state selection algorithm for Core". Many Phoronix readers frequently complain of poor experiences when using the P-State driver rather than CPUFreq. This algorithm change is described by Rafael Wysocki as follows:

The PID-base P-state selection algorithm used by intel_pstate for Core processors is based on very weak foundations. Namely, its decisions are mostly based on the values of the APERF and MPERF feedback registers and it only estimates the actual utilization to check if it is not extremely low (in order to avoid getting stuck in the highest P-state in that case).

Since it generally causes the CPU P-state to ramp up quickly, it leads to satisfactory performance, but the metric used by it is only really valid when the CPU changes P-states by itself (ie. in the turbo range) and if the P-state value set by the driver is treated by the CPU as the upper limit on turbo P-states selected by it.

As a result, the only case when P-states are reduced by that algorithm is when the CPU has just come out of idle, but in that particular case it would have been better to bump up the P-state instead. That causes some benchmarks to behave erratically and attempts to improve the situation lead to excessive energy consumption, because they make the CPU stay in very high P-states almost all the time.

Consequently, the only viable way to fix that is to replace the erroneous algorithm entirely with a better one.

To that end, notice that setting the P-state proportional to the actual CPU utilization (measured with the help of MPERF and TSC) generally leads to reasonable behavior, but it does not reflect the "performance boosting" nature of the current P-state selection algorithm. It may be made more similar to that algorithm, though, by adding iowait boosting to it.

Specifically, if the P-state is bumped up to the maximum after receiving the UUF_IO flag via cpufreq_update_util(), it will allow tasks that were previously waiting on I/O to get the full capacity of the CPU when they are ready to process data again and that should lead to the desired performance increase overall without sacrificing too much energy.

For this reason, use the above approach for Core processors in intel_pstate.
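
To make the quoted description more concrete, here is a minimal sketch in Python of the idea it outlines: pick a P-state proportional to the busy fraction (the MPERF delta divided by the TSC delta over a sampling interval) and jump straight to the maximum when an update carries the I/O-wait flag. This is not the kernel code from the patch series. The names `Sample` and `target_pstate`, the rounding, and the clamping to a minimum P-state are assumptions made for illustration, and the sketch ignores details such as how long a boost should last.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical sampling interval for a single CPU."""
    mperf_delta: int      # MPERF ticks, which only advance while the CPU is not idle
    tsc_delta: int        # TSC ticks over the whole interval
    iowait: bool = False  # True if the update carried the I/O-wait flag


def target_pstate(sample: Sample, min_pstate: int, max_pstate: int) -> int:
    """Return a P-state proportional to utilization, boosted on I/O wait.

    Assumption: the P-state is scaled linearly against max_pstate and
    clamped to [min_pstate, max_pstate]; the real patch may differ.
    """
    # I/O-wait boost: a task that was waiting on I/O gets full CPU capacity.
    if sample.iowait:
        return max_pstate

    # No usable interval: fall back to the lowest P-state.
    if sample.tsc_delta <= 0:
        return min_pstate

    # Busy fraction: share of the interval the CPU spent not idle.
    busy = min(sample.mperf_delta / sample.tsc_delta, 1.0)

    # Proportional selection, clamped to the allowed range.
    target = round(busy * max_pstate)
    return max(min_pstate, min(max_pstate, target))
```

With a minimum P-state of 8 and a maximum of 32, a half-busy interval such as `Sample(500, 1000)` maps to P-state 16, while the same interval with `iowait=True` maps to 32.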

Hopefully these CPUFreq and P-State changes will be refined and tested so that they'll be ready for a kernel release in the not too distant future. Of course, as the work gets closer to mainline, I'll be benchmarking its performance impact at Phoronix.
