Linux 6.12 To Drop Old Code That Slows Down CPU Frequency Polling
The Linux 6.12 kernel cycle later this year brings a change that will affect users of the "Schedutil" CPU frequency scaling governor. The change drops the "LATENCY_MULTIPLIER" that has been in the kernel code for nearly two decades and slows down how frequently CPU frequency evaluation occurs. In turn, the revised logic allows CPUFreq frequency re-evaluation to occur more often.
The CPUFreq LATENCY_MULTIPLIER sets the interval between frequency evaluations to 1000x the transition latency of the processor, subject to a cap on the maximum delay. That 1000x multiplier once made sense but not so much anymore with modern processors. Qais Yousef, who pushed for the LATENCY_MULTIPLIER removal, explained in his patch:
"The current LATENCY_MULTIPLIER which has been around for nearly 20 years causes rate_limit_us to be always in ms range.
On M1 mac mini I get 50 and 56us transition latency, but due to the 1000 multiplier we end up setting rate_limit_us to 50 and 56ms, which gets capped into 2ms and was 10ms before e13aa799c2a6 ("cpufreq: Change default transition delay to 2ms")
On Intel I5 system transition latency is 20us but due to the multiplier we end up with 20ms that again is capped to 2ms.
Given how good modern hardware and how modern workloads require systems to be more responsive to cater for sudden changes in workload (tasks sleeping/wakeup/migrating, uclamp causing a sudden boost or cap) and that 2ms is quarter of the time of 120Hz refresh rate system, drop the old logic in favour of providing 50% headroom.
rate_limit_us = 1.5 * latency.
I considered not adding any headroom which could mean that we can end up with infinite back-to-back requests.
I also considered providing a constant headroom (e.g: 100us) assuming that any h/w or f/w dealing with the request shouldn't require a large headroom when transition_latency is actually high.
But for both cases I wasn't sure if h/w or f/w can end up being overwhelmed dealing with the freq requests in a potentially busy system. So I opted for providing 50% breathing room.
This is expected to impact schedutil only as the other user, dbs_governor, takes the max(2*tick, transition_delay_us) and the former was at least 2ms on 1ms TICK, which is equivalent to the max_delay_us before applying this patch. For systems with TICK of 4ms, this value would have almost always ended up with 8ms sampling rate.
For systems that report 0 transition latency, we still default to returning 1ms as transition delay.
This helps in eliminating a source of latency for applying requests..."
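To put numbers on the patch description, here is a minimal Python sketch of the arithmetic rather than the kernel code itself. The function names are hypothetical, and using integer math for the 1.5x headroom is an assumption; the 1000x multiplier, the 2ms cap, the 1ms fallback for zero latency, and the max(2*tick, transition_delay_us) rule for dbs_governor all come from the quoted message.

```python
USEC_PER_MSEC = 1000
LATENCY_MULTIPLIER = 1000          # the old 1000x factor
MAX_DELAY_US = 2 * USEC_PER_MSEC   # the 2ms cap mentioned in the patch


def old_rate_limit_us(latency_us: int) -> int:
    """Old behaviour as described: 1000x the latency, capped at 2ms."""
    if latency_us == 0:
        return USEC_PER_MSEC       # 0 reported latency falls back to 1ms
    return min(latency_us * LATENCY_MULTIPLIER, MAX_DELAY_US)


def new_rate_limit_us(latency_us: int) -> int:
    """New behaviour as described: the latency plus 50% headroom."""
    if latency_us == 0:
        return USEC_PER_MSEC       # unchanged 1ms fallback
    return latency_us + latency_us // 2  # integer 1.5x (an assumption)


def dbs_sampling_us(tick_us: int, transition_delay_us: int) -> int:
    """dbs_governor rule from the quote: max(2 * tick, transition delay)."""
    return max(2 * tick_us, transition_delay_us)


if __name__ == "__main__":
    # Transition latencies quoted in the patch message, in microseconds.
    for name, latency in [("M1 Mac mini", 50), ("M1 Mac mini", 56),
                          ("Intel i5", 20), ("reports 0", 0)]:
        print(f"{name:12} latency={latency:3}us "
              f"old={old_rate_limit_us(latency):5}us "
              f"new={new_rate_limit_us(latency):5}us")
```

For the 50us case this works out to 2000us before the patch versus 75us after it, while `dbs_sampling_us(1000, ...)` stays at 2000us either way on a 1ms tick, which matches the patch's note that only schedutil is expected to be affected.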
This patch removing the latency multiplier, which lowers the latency of CPU frequency evaluation/selection, is being picked up as part of the power management subsystem changes intended for the Linux 6.12 kernel.
We'll see what more CPUFreq and P-State driver enhancements come for the power management code over the coming weeks to benefit Linux 6.12, which is likely to be this year's Long Term Support (LTS) kernel version.