Linux Kernel Power Bug Now High Importance In Ubuntu
The bug introduced during the development of the Linux 2.6.38 kernel causing excessive power consumption is very real, is occurring on many different hardware platforms, and has just been deemed a bug of high importance by the Ubuntu Kernel Team. This serious regression was just made widely known on Friday in my Mobile Users Beware: Linux Has Major Power Regression article and then further detailed in The Tests Showing Ubuntu 11.04 On A Power Consumption Binge.
At first, some tried to write this off as not being a bug or as just a Phoronix story, but it has proved to be a very real issue. Worst of all, it lives in the upstream Linux kernel and affects all Linux distributions using 2.6.38 or later; 2.6.38 has already been released as stable, and the bug is still present in the Linux 2.6.39 kernel. The bug has been in the mainline kernel since the second week of January. That's almost four months!
There is also this Launchpad bug report, created earlier this month by a Canonical software engineer; now, with the attention from Phoronix, 33 others have officially confirmed being 'affected' by this bug in its entry. This morning, the Ubuntu Kernel Team confirmed it as a bug of high importance for Natty (11.04) and Oneiric (11.10). It should also be acknowledged in the Ubuntu 11.04 release notes that there is a power issue.
It doesn't appear that they are devoting any resources to getting the issue resolved; rather, it looks like they will wait for a fix to appear upstream, in the stable series or in 2.6.39, and then have it back-ported to Ubuntu 11.04 as an SRU.
Besides the nearly three dozen independent confirmations on that power bug report, there are also many references to the power issue within the Phoronix Forums, Ubuntu Forums, Reddit, and other message boards. It's happening not only on notebooks but also on desktops, and it's not limited to a particular class of processors: Intel and AMD hardware are both affected. As the articles on Friday and Saturday showed, distinctly different hardware spanning several years and generations is also facing increased power consumption.
So what's causing this excessive power usage? I've spent the weekend looking into the issue and setting up Phoronix Test Suite / Phoromatic scripts to bisect and analyze it. I still don't have a definitive answer, as it looks like the issue may be affected by multiple commits. It appears the power regression starts with a merge into Linus' tree around 13-14 January and may be worsened by another commit on 15 January. So, while tracking this particular issue, I've also been working on improvements to the Phoronix Test Suite, Phoromatic, and other PTS Commercial scripts for bisecting multi-point regressions in a fully automated manner.
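To illustrate what bisecting a multi-point regression involves, here is a minimal Python sketch. It is not the Phoronix Test Suite or Phoromatic code; the function name, the `measure_watts` callback, and the `min_jump_watts` threshold are all hypothetical names made up for this example. The idea is to recurse into every half of the commit range whose endpoints differ in power draw by at least the threshold, rather than stopping at the first bad commit as a plain bisect would.

```python
from typing import Callable, Dict, List, Sequence


def find_power_regressions(
    commits: Sequence[str],
    measure_watts: Callable[[str], float],
    min_jump_watts: float,
) -> List[str]:
    """Return every commit whose power draw is at least min_jump_watts
    above the commit directly before it, using recursive bisection.

    Assumption: measure_watts(commit) builds and boots that kernel and
    returns a stable average power reading. That step is hypothetical here.
    """
    cache: Dict[int, float] = {}  # each commit is measured at most once

    def watts(index: int) -> float:
        if index not in cache:
            cache[index] = measure_watts(commits[index])
        return cache[index]

    found: List[str] = []

    def search(lo: int, hi: int) -> None:
        # Skip ranges whose endpoints do not differ by a significant amount.
        if watts(hi) - watts(lo) < min_jump_watts:
            return
        if hi - lo == 1:
            found.append(commits[hi])  # adjacent commits: hi introduced the jump
            return
        mid = (lo + hi) // 2
        search(lo, mid)   # unlike a plain bisect, examine both halves
        search(mid, hi)

    if len(commits) >= 2:
        search(0, len(commits) - 1)
    return found


# Synthetic readings for illustration only, not real measurements.
fake_readings = {
    "c0": 10.0, "c1": 10.0, "c2": 10.0,
    "c3": 13.0, "c4": 13.0, "c5": 13.0,
    "c6": 15.0, "c7": 15.0,
}
culprits = find_power_regressions(list(fake_readings), fake_readings.get, 1.5)
print(culprits)  # ['c3', 'c6']
```

This sketch has the usual bisection blind spots: it misses a regression that is cancelled out by a later improvement within the same range, and it misses several small increases that each fall below the threshold once the range is split. Real power readings are also noisy, so each measurement would need to be averaged over repeated runs.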
Most of the commits turning up point to the kernel's memory management subsystem. This makes some sense: given the vast array of hardware experiencing increased power consumption, the regression is likely within the core of the Linux kernel code and not in one of the hardware drivers. It doesn't appear to be a trivial bug.
There were also invasive changes to the memory management subsystem during the 2.6.38 cycle, among them the merging of transparent huge pages (note, though, that our 2.6.38 builds aren't building this feature, so it would just potentially be fallout related to it). There was also some controversy at the time over this work being merged when it was so invasive and had never even lived in -next: "This is insane. Having such a massively invasive change to the whole mm, barely tested on most architecture, and last I heard still generally controversial being merged like that without even some integration testing via -next makes no sense. Linus, wtf is going on?"
I am not yet confident that the 2.6.38 power regression is within that area, but for those who have been asking me, that's where it appears to be at this point. If it's not, there's some other major regression within that mm code that the PTS code is latching onto. I have some more tests running at the moment, so look for more information on Phoronix this afternoon. I've also been posting bits of information to my Twitter feed as the Linux testing continues.
I'm also still happy to provide more information to any ISV/IHV/developers interested in additional data, so contact me. Also happy to talk about better open-source continuous integration and improving benchmarking processes coming up soon in Munich, Nürnberg, Frankfurt, Berlin, or Budapest. [Or anyone simply wanting to buy some quality beer (Augustiner and Franziskaner are the favorites) to encourage further Linux testing expeditions.]