A Fix For The Severe Linux Performance Regression Spotted By Torvalds
Before a snow storm knocked out Linus Torvalds' Internet and electricity and disrupted the Linux 6.8 merge window, his weekend was already in rough shape: new Linux 6.8 code had introduced a performance regression that made his Linux kernel builds take twice as long as with previous kernels. An AMD Linux engineer was able to reproduce the regression, and upstream developers now have what is believed to be a fix for the issue in the latest scheduler code.
The big performance regression reported by Linus Torvalds stemmed from the scheduler changes in Linux 6.8, but it wasn't immediately clear to the developer behind the bisected commit what was causing it. In the ensuing discussion, Wyes Karny of AMD reported that he too could reproduce the regression. Rather than a high-end AMD Ryzen Threadripper like the one used by Torvalds, Wyes was using a modest AMD Ryzen 5600G desktop. One important note he raised was that the regression only reproduced when disabling ACPI CPPC from the BIOS and using ACPI CPUFreq with the Schedutil governor.
Most AMD Zen 2 and newer systems support ACPI CPPC and thus, with modern kernels, Ryzen systems typically use the new AMD P-State driver. But for select Zen 2 / Zen 3 systems and older (or those with CPPC disabled from the BIOS), the ACPI CPUFreq driver is still used, and the default CPU frequency governor is typically "Schedutil", which leverages the scheduler's utilization data.
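For readers wanting to check whether a system matches the configuration Wyes described, the following is a minimal Python sketch that reads the standard CPUFreq sysfs entries for CPU 0. The helper names `read_policy` and `matches_reported_setup` are made up for illustration, and the driver string "acpi-cpufreq" and governor string "schedutil" are assumed to be what the kernel reports for this setup.

```python
from pathlib import Path

# Standard CPUFreq sysfs directory for the first CPU's policy.
SYSFS_CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_policy(base=SYSFS_CPUFREQ):
    """Return (driver, governor) for a CPUFreq policy directory.

    Either value is None when the sysfs file cannot be read,
    for example on a system without CPUFreq support.
    """
    def read(name):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            return None

    return read("scaling_driver"), read("scaling_governor")


def matches_reported_setup(driver, governor):
    """True for the ACPI CPUFreq + Schedutil combination from the report.

    Hypothetical helper: it only compares strings and says nothing
    about whether a given kernel actually has the regression.
    """
    return driver == "acpi-cpufreq" and governor == "schedutil"


if __name__ == "__main__":
    driver, governor = read_policy()
    print(f"driver={driver} governor={governor}")
    print("matches reported setup:", matches_reported_setup(driver, governor))
```

A system using the AMD P-State driver would report a different `scaling_driver` value and so would not match.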
A patch was proposed in that mailing list thread and the particular issues around this regression were discussed. In the end, Vincent Guittot believes he has a fix for the regression, and Wyes was able to successfully test the patch.
Guittot has now sent out "sched/fair: Fix frequency selection for non invariant case" as the patch to fix this nasty regression in the new Linux 6.8 code when using ACPI CPUFreq + Schedutil. He explains with the patch:
"When frequency invariance is not enabled, get_capacity_ref_freq(policy) returns the current frequency and the performance margin applied by map_util_perf(), enabled the utilization to go above the maximum compute capacity and to select a higher frequency than the current one.
The performance margin is now applied earlier in the path to take into account some utilization clampings and we can't get an utilization higher than the maximum compute capacity.
We must use a frequency above the current frequency to get a chance to select a higher OPP when the current one becomes fully used. Apply the same margin and returns a frequency 25% higher than the current one in order to switch to the next OPP before we fully use the cpu at the current one."
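The effect described in the patch message can be illustrated with a small Python model. This is a sketch under stated assumptions, not the kernel code: it assumes the governor requests roughly `ref_freq * util / max_capacity`, that utilization is clamped to a maximum capacity of 1024, and the function names are invented for the example. Only the "25% higher than the current one" margin comes from the patch description.

```python
# Assumed capacity scale: utilization is clamped to this maximum.
MAX_CAPACITY = 1024


def ref_freq_before_fix(cur_freq_khz):
    """Non-invariant case before the fix: reference is the current frequency."""
    return cur_freq_khz


def ref_freq_after_fix(cur_freq_khz):
    """Non-invariant case after the fix: current frequency plus 25%."""
    return cur_freq_khz + (cur_freq_khz >> 2)


def next_freq(ref_freq_khz, util, max_capacity=MAX_CAPACITY):
    """Simplified frequency request: scale the reference by utilization.

    Utilization cannot exceed max_capacity, mirroring the clamping
    mentioned in the patch message.
    """
    util = min(util, max_capacity)
    return ref_freq_khz * util // max_capacity


if __name__ == "__main__":
    cur = 2_000_000  # hypothetical current frequency in kHz
    for util in (512, 820, 1024):
        before = next_freq(ref_freq_before_fix(cur), util)
        after = next_freq(ref_freq_after_fix(cur), util)
        print(f"util={util:4d} before={before} kHz after={after} kHz")
```

In this model, a fully utilized CPU never requests more than its current frequency before the fix, so it cannot climb to a higher OPP. With the 25% margin, the request rises above the current frequency once utilization reaches roughly 80% of capacity.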
In the end it was a one-line code fix that addressed this performance regression, which had caused Linus Torvalds' empty kernel builds to go from 22 seconds to 44 seconds.
Assuming all continues to test well with the new patch, the fix should be working its way to the Linux 6.8 Git code once Linus Torvalds' Internet and electricity are restored.