Speeding Up The Linux Kernel With Transparent Hugepage Support
Last month we reported on the 200 line Linux kernel patch that does wonders for improving the desktop responsiveness of the system. There was certainly much interest (over 100,000 views to both of our YouTube videos demonstrating the change), but this patch really didn't speed up the system per se. Rather, it improved desktop interactivity and reduced latency by creating task-groups per TTY so that processes had more equal access to the CPU. There is, though, an entirely different patch-set now beginning to generate interest among early adopters that does improve kernel performance itself in compute and memory intensive applications: the Transparent Hugepage Support patch-set. Here are our initial tests of the latest kernel patches, which will hopefully be finding their way into the mainline Linux kernel soon.
As was pointed out in our forums, the Transparent Hugepage Support patch-set has been updated against the Linux 2.6.37 kernel code-base and is shown to improve the performance of compute/memory intensive applications by a couple percent at this time. Transparent Hugepage Support works by reducing the number of TLB (Translation Lookaside Buffer) entries that such applications need while at the same time increasing the amount of memory that a TLB can cover, since each entry maps a much larger page. Transparent Hugepages have been in the works for some time now in the Linux kernel, and those interested in a more technical explanation of this support can find a write-up from last year at LWN.net. There is also its kernel documentation.
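To put the TLB point in numbers, here is a rough sketch of how much memory a TLB can cover with each page size. The 64-entry TLB is an assumed figure chosen only for illustration (it is not a measurement of any CPU tested here), and the 4k and 2M page sizes are the usual x86 values.

```python
# Illustrative arithmetic only: ENTRIES is an assumed TLB size,
# not a figure taken from the hardware used in this article.
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of virtual memory covered when every TLB entry maps one page."""
    return entries * page_size


def entries_needed(working_set: int, page_size: int) -> int:
    """TLB entries needed to map a working set, rounded up to whole pages."""
    return -(-working_set // page_size)


ENTRIES = 64  # hypothetical TLB size

print(tlb_reach(ENTRIES, 4 * KIB) // KIB, "KiB covered with 4k pages")
print(tlb_reach(ENTRIES, 2 * MIB) // MIB, "MiB covered with 2M pages")
print(entries_needed(1024 * MIB, 4 * KIB), "entries to map 1 GiB with 4k pages")
print(entries_needed(1024 * MIB, 2 * MIB), "entries to map 1 GiB with 2M pages")
```

Under these assumptions the same 64 entries go from covering 256 KiB to covering 128 MiB, which is why memory-heavy workloads should see fewer TLB misses.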
The latest Transparent Hugepage Support patch that applies to the Linux 2.6.37 kernel is just under 7,200 lines. Besides applying the patch, the kernel needs to be built with the new CONFIG_TRANSPARENT_HUGEPAGE option. The Transparent Hugepage support can then be toggled in the newly built kernel at boot-time with the transparent_hugepage option or via its sysfs interface /sys/kernel/mm/transparent_hugepage/enabled. The support can be set to always, never, or madvise, with the madvise setting limiting hugepages to the regions an application has marked with madvise(MADV_HUGEPAGE). The khugepaged defrag support can also be controlled via its sysfs node (/sys/kernel/mm/transparent_hugepage/khugepaged/defrag), as can how many pages to scan at each pass, how many milliseconds to wait between each pass, and how many milliseconds khugepaged should wait after an allocation failure before trying again. For our simple purposes we just tested this Linux kernel support at its defaults and then with it disabled. This was done on a Linux 2.6.37-rc4 kernel and we tossed in Canonical's stock Ubuntu 10.10 kernel based upon Linux 2.6.35 for additional reference.
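For those wanting to check which mode a running kernel is in, here is a minimal sketch that reads the sysfs node mentioned above. It assumes the file lists all choices on one line with the active one in square brackets (for example "always [madvise] never"), as mainline kernels do. The helper names are our own and not part of any kernel tooling.

```python
from pathlib import Path

# Location of the sysfs interface named in the article.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def parse_selected(text: str) -> str:
    """Return the bracketed choice from a line such as 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no bracketed value in {text!r}")


def current_thp_mode(base: Path = THP_DIR):
    """Return the active mode, or None if the sysfs file cannot be read."""
    try:
        return parse_selected((base / "enabled").read_text())
    except OSError:
        # Kernel built without CONFIG_TRANSPARENT_HUGEPAGE, or no sysfs access.
        return None


print("Transparent hugepage mode:", current_thp_mode())
```

Changing the mode is done by writing one of the listed words to the same file as root, or by passing transparent_hugepage= on the kernel command line.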
Worth noting is that user-land applications can also be optimized for Transparent Hugepage Support as a step towards taking greater advantage of the larger pages. There are some GCC patches floating around for such optimizations, but in our testing we have just used the stock Ubuntu 10.10 user-land.
Transparent Hugepage Support maximizes the usefulness of free memory compared to the reservation approach of hugetlbfs by allowing all unused memory to be used as cache or other movable (or even unmovable) entities. It does not require reservation to prevent hugepage allocation failures from being noticeable from user-land. It allows paging and all other advanced VM features to be available on the hugepages. It requires no modifications for applications to take advantage of it.
Applications, however, can be further optimized to take advantage of this feature, much as they have been optimized before to avoid a flood of mmap system calls for every malloc(4k). Optimizing userland is by no means mandatory, and khugepaged can already take care of long-lived page allocations even for hugepage-unaware applications that deal with large amounts of memory.
In certain cases when hugepages are enabled system-wide, an application may end up allocating more memory resources. An application may mmap a large region but only touch 1 byte of it, in which case a 2M page might be allocated instead of a 4k page for no good reason. This is why it's possible to disable hugepages system-wide and to only have them inside MADV_HUGEPAGE madvise regions.
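The cost of that single touched byte can be worked out with a simplified model in which every touched page is backed in full. This is only a sketch of the arithmetic, not of how the kernel actually accounts for memory, and the function name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def resident_bytes(touched_offsets, page_size):
    """Memory backed by pages when only the given byte offsets are touched.

    Simplified model: each distinct page that is touched is backed in full.
    """
    pages = {offset // page_size for offset in touched_offsets}
    return len(pages) * page_size


touched = [0]  # a single byte at the start of a large mapping
small = resident_bytes(touched, 4 * KIB)
huge = resident_bytes(touched, 2 * MIB)
print(f"4k pages: {small} bytes, 2M pages: {huge} bytes, ratio: {huge // small}x")
```

In this worst case the hugepage costs 512 times the memory of a regular page for the same one byte of useful data.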
Embedded systems should enable hugepages only inside madvise regions to eliminate any risk of wasting any precious byte of memory and to only run faster.
Applications that get a lot of benefit from hugepages and that don't risk losing memory by using hugepages should use madvise(MADV_HUGEPAGE) on their critical mmapped regions.
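As a rough illustration of what that looks like from an application, here is a minimal sketch using Python's mmap module (Python 3.8 or newer on Linux), which wraps the same madvise call. The function name and the fallback behaviour are our own choices. On a kernel without Transparent Hugepage Support the hint is rejected and the mapping simply stays on regular pages.

```python
import mmap

HUGE_PAGE = 2 * 1024 * 1024  # 2M hugepage size on x86


def map_with_hugepage_hint(length: int):
    """Create a private anonymous mapping and request hugepages for it.

    Returns (mapping, hinted). hinted is False when MADV_HUGEPAGE is not
    exposed by this Python build or the kernel rejects the hint.
    """
    region = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    hinted = False
    if advice is not None:
        try:
            region.madvise(advice)  # applies to the whole mapping
            hinted = True
        except OSError:
            pass  # e.g. kernel built without transparent hugepage support
    return region, hinted


region, hinted = map_with_hugepage_hint(4 * HUGE_PAGE)
region[0] = 1  # pages are only allocated once the memory is touched
print("hugepage hint accepted:", hinted)
region.close()
```

The hint is only a request: whether the region ends up backed by hugepages still depends on the system-wide setting and on the kernel finding contiguous free memory.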
Our basic testing of this feature was done with a Lenovo ThinkPad T61 notebook boasting an Intel Core 2 Duo T9300 "Penryn" CPU with 4GB of system memory, a 100GB Hitachi 7200RPM SATA HDD, and NVIDIA Quadro NVS 140M graphics. It was running Ubuntu 10.10 with GNOME 2.32.0, X.Org Server 1.9.0, NVIDIA 260.19.21 binary driver, GCC 4.4.5, and an EXT4 file-system.