A Proper Solution To The Linux ASPM Problem
At long last, it looks like there is an adequate solution to the Active State Power Management (ASPM) problem in the Linux kernel, a.k.a. the well-known and widespread power regression in the Linux 2.6.38 kernel, which has been causing many laptops to consume significantly more power than they should. This is not another workaround, but rather a behavioral change in the kernel to better decide when PCI Express ASPM support should be toggled.
Since the release of the Linux 2.6.38 kernel in March of this year, users of a large number of mobile and desktop systems running this release (or any post-2.6.38 kernel) have noticed a significant increase in power consumption. I had spotted Ubuntu 11.04 development releases drawing much more power than earlier releases and then traced it to a regression within the Linux 2.6.38 kernel that affects all distributions using this kernel. The Phoronix Test Suite stack automatically bisected the issue to a change in how ASPM is handled.
Active State Power Management is part of the PCI Express specification, but not all hardware plays well with this power management method. The change in the Linux 2.6.38 kernel, which was meant to fix system lock-ups on select hardware where ASPM would go awry, disabled ASPM unless the BIOS advertised support for it. It turns out, however, that a vast number of systems supporting ASPM do not actually advertise it from the BIOS. As a result, systems that had been running fine with ASPM and experienced no issues were now running without it. A whole range of motherboards do not handle this as expected. The solution from some motherboard vendors was just to use Windows.
For systems where ASPM is no longer enabled, PCI-E ASPM support can be forced by passing "pcie_aspm=force" when booting the kernel. However, that assumes the user is aware of this power-sucking issue in the first place and knows how to edit their GRUB configuration, etc. It is a workaround for power users, but is not a solution to the fundamental issue.
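For those who want to confirm whether the workaround is in effect on a running system, here is a minimal sketch in Python. It assumes the kernel command line is readable at /proc/cmdline and that the kernel was built with PCIe ASPM support, so that /sys/module/pcie_aspm/parameters/policy exists and marks the active policy in square brackets. The helper names are made up for illustration, and the script only reads state without changing anything.

```python
import re
from pathlib import Path


def aspm_cmdline_option(cmdline):
    """Return the value of the first pcie_aspm= option on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            return token.split("=", 1)[1]
    return None


def active_policy(policy_text):
    """Return the bracketed (active) entry from the ASPM policy string, or None."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


def read_text(path):
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed locations; they may be absent on kernels without ASPM support.
    cmdline = read_text("/proc/cmdline") or ""
    policy = read_text("/sys/module/pcie_aspm/parameters/policy")

    print("pcie_aspm boot option:", aspm_cmdline_option(cmdline))
    print("active ASPM policy:", active_policy(policy) if policy else "unavailable")
```

If the boot option prints as None, the kernel was started without "pcie_aspm=force" and is relying on its own decision about ASPM.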
The Linux PCI subsystem maintainers were not aware of how severe the impact of this commit to the 2.6.38 kernel would be. The solution to the ASPM issue at the time was to have more Linux PCI-E drivers set the ASPM bits directly (basically black/white-listing within the drivers) or to figure out how Microsoft Windows determines when to enable or disable ASPM support. Many months have passed and there was no ASPM activity in the 2.6.39, 3.0, or 3.1 kernels. However, it seems like we may finally have a solution.
Matthew Garrett, the respected Red Hat engineer who has worked quite a lot on Linux power management and lately on UEFI bring-up, proposed a patch on Thursday afternoon entitled "pci: Rework ASPM disable code."
This patch reworks how the kernel decides whether to enable/disable the PCI-E ASPM support. Due to the lack of documentation from Microsoft and others, the thinking was that hardware did not support ASPM unless it was advertised by the BIOS, but that may actually be wrong.