The Leading Cause Of The Recent Linux Kernel Power Problems

Written by Michael Larabel in Software on 26 June 2011 at 11:17 PM EDT. Page 2 of 3. 144 Comments.

Without further ado, the biggest cause of the 2.6.38 power issue (according to my testing software and the hardware I've been running) is due to a change in behavior regarding ASPM. ASPM is the Active-State Power Management for PCI Express. Namely, to blame is commit 2f671e2dbff6eb5ef4e2600adbec550c13b8fe72 that is titled "PCI: Disable ASPM if BIOS asks us to."

"We currently refuse to touch the ASPM registers if the BIOS tells us that ASPM isn't supported. This can cause problems if the BIOS has (for any reason) enabled ASPM on some devices anyway. Change the code such that we explicitly clear ASPM if the FADT indicates that ASPM is not supported, and make sure we tidy up appropriately on device removal in order to deal with the hot-plug case. If ASPM is disabled because the BIOS doesn't hand over control then we won't touch the registers."

This change is what was made during the Linux 2.6.38 merge window, pushed mainline in early January, and is causing increased power usage on a plethora of systems. Active-State Power Management is a feature that is supposed to save system power by setting a lower power state for unused PCI Express links. The downside to ASPM is that it can increase device latency due to the time required in switching PCI-E link power modes, but on the positive side it's capable of saving a fair amount of power, which can make it worthwhile for mobile systems. ASPM can work on desktops too, but it is not as common there due to the possible increase in latency when switching states. With the 2.6.38 commit that is noted, if the BIOS indicates it does not support ASPM, the behavior is changed.

Evidently, some BIOSes have their ASPM support misconfigured and thus problems can arise if the PCI-E link power mode is dropped on an unsupported device. There are a few mentions of hangs and other issues under Linux associated with this power management feature. It's not really a surprise though that the BIOSes would be misconfigured given all of the other BIOS-related problems under Linux and the once very poor suspend-and-resume support due to all of the workarounds and hacks that BIOS/hardware vendors have done to cater towards Microsoft Windows power management. In this case, it seems a large number of mobile systems are supporting ASPM but not properly advertising the support via the standard BIOS ACPI FADT (Fixed ACPI Description Table). Some Linux drivers even forcibly disable ASPM on Linux (e.g. this kernel patch).

With this being a PCI Express power management bug, the number of systems potentially affected is massive and doesn't isolate the problem to those just using a select driver or obscure configuration. PCI-E ASPM is mostly used on mobile systems, but on at least some desktop systems I have seen the power consumption affected by this issue.

Given the thousands of users having this 2.6.38 power regression by this change, there is a big ASPM problem at hand. Fortunately, as PCI-E ASPM problems are not new, a few boot options can be used. Namely, most people affected by this issue will want to add "pcie_aspm=force" to their boot command line. Simply adding this will force Active-State Power Management to be enabled. This is supported before the Linux 2.6.38 kernel (looks to be going back to circa 2.6.27) and it is still supported today in the latest upstream Git. Just adding that to 2.6.38+ kernels on the systems I have tested will workaround this problem-causing commit and lead to noticeable power savings.


Related Articles