The "Memory Folios" Work Continues - Improving Linux Performance, 7% Faster Kernel Builds
Matthew Wilcox of Oracle today sent out his latest patch series implementing the new "memory folios" type for the Linux kernel in an effort to improve Linux memory management and allow for better efficiency that ultimately translates into measurable performance gains.
For those who missed Wilcox's prior patch series on the memory folios concept, here is how he sums it up:
Managing memory in 4KiB pages is a serious overhead. Many benchmarks benefit from a larger "page size". As an example, an earlier iteration of this idea which used compound pages (and wasn't particularly tuned) got a 7% performance boost when compiling the kernel.
Using compound pages or THPs [transparent hugepages] exposes a weakness of our type system. Functions are often unprepared for compound pages to be passed to them, and may only act on PAGE_SIZE chunks. Even functions which are aware of compound pages may expect a head page, and do the wrong thing if passed a tail page.
We also waste a lot of instructions ensuring that we're not looking at a tail page. Almost every call to PageFoo() contains one or more hidden calls to compound_head(). This also happens for get_page(), put_page() and many more functions. There does not appear to be a way to tell gcc that it can cache the result of compound_head(), nor is there a way to tell it that compound_head() is idempotent.
This patch series uses a new type, the struct folio, to manage memory. It provides some basic infrastructure that's worthwhile in its own right, shrinking the kernel by about 6kB of text.
The full patch series is considerably larger (~200 patches), and enables XFS to use large pages...An earlier version of this patch set found it was worth about a 7% reduction of wall-clock time on kernel compiles.
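To make the point about hidden compound_head() calls concrete, here is a minimal user-space sketch in C. It is not kernel code: every `toy_` name, the struct layouts, and the lookup counter are assumptions made up for illustration, and the real struct folio is defined differently. The sketch only shows the general idea that a type which can never refer to a tail page lets helpers skip the repeated head lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page descriptor; NOT the kernel's struct page. */
struct toy_page {
    struct toy_page *head;  /* NULL on a head or single page, else the head page */
    unsigned long flags;    /* in this model, flags live only on the head page */
};

/* Toy folio: a handle that, by construction, never refers to a tail page. */
struct toy_folio {
    struct toy_page *head_page;
};

/* Counts head lookups so the two styles can be compared. */
static unsigned long head_lookups;

static struct toy_page *toy_compound_head(struct toy_page *p)
{
    head_lookups++;
    return p->head ? p->head : p;
}

/* Page-based helpers: each one must defensively resolve the head page. */
static void toy_page_set_flag(struct toy_page *p, unsigned long bit)
{
    toy_compound_head(p)->flags |= bit;
}

static int toy_page_test_flag(struct toy_page *p, unsigned long bit)
{
    return (toy_compound_head(p)->flags & bit) != 0;
}

/* Resolve the head once and record that fact in the type. */
static struct toy_folio toy_page_folio(struct toy_page *p)
{
    struct toy_folio f = { toy_compound_head(p) };
    return f;
}

/* Folio-based helpers: no lookup needed, the type already guarantees a head. */
static void toy_folio_set_flag(struct toy_folio f, unsigned long bit)
{
    assert(f.head_page->head == NULL);  /* invariant: never a tail page */
    f.head_page->flags |= bit;
}

static int toy_folio_test_flag(struct toy_folio f, unsigned long bit)
{
    assert(f.head_page->head == NULL);
    return (f.head_page->flags & bit) != 0;
}

/* Build a 4-page compound page: pages[0] is the head, the rest are tails. */
static void toy_make_compound(struct toy_page pages[4])
{
    pages[0].head = NULL;
    pages[0].flags = 0;
    for (int i = 1; i < 4; i++) {
        pages[i].head = &pages[0];
        pages[i].flags = 0;
    }
}

/* Set, then test, a flag via tail pages using the page-based helpers. */
static unsigned long lookups_with_pages(void)
{
    struct toy_page pages[4];
    toy_make_compound(pages);
    head_lookups = 0;
    toy_page_set_flag(&pages[2], 1UL);
    (void)toy_page_test_flag(&pages[3], 1UL);
    return head_lookups;
}

/* Same two operations, but after converting to a folio handle once. */
static unsigned long lookups_with_folio(void)
{
    struct toy_page pages[4];
    toy_make_compound(pages);
    head_lookups = 0;
    struct toy_folio f = toy_page_folio(&pages[2]);
    toy_folio_set_flag(f, 1UL);
    (void)toy_folio_test_flag(f, 1UL);
    return head_lookups;
}
```

In this toy model the page-based path performs one head lookup per helper call, while the folio path performs a single lookup at conversion time no matter how many helpers follow. Calling `lookups_with_pages()` and `lookups_with_folio()` from a small `main()` is enough to compare the two counts.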
Benchmark results are limited, but from the data so far memory folios look promising for the all-important code compilation speed. Unfortunately, this is a massive patch series and not all of the work is yet in shape for upstream inclusion, so it will likely be a while before memory folios are ready for possible mainlining, especially considering how fundamental a change this is to the Linux memory management code.
But for those interested in the memory folios concept, the v11 patches were sent out on Monday, comprising 33 of the 200+ patches in total.