Google Proposes Multi-Generational LRU For Linux To Yield Much Better Performance
Google engineer Yu Zhao has sent out patches proposing a "multigenerational LRU" implementation for the Linux kernel's least recently used (LRU) handling of memory page replacement.
The engineers working on multi-generational LRU found the Linux kernel's current page reclaim code to be too expensive in CPU usage and prone to making poor choices about what to evict. The new LRU implementation is described as more "performant, versatile, and straightforward" and shows promising results.
Those interested in all of the technical details of this multigenerational LRU code can see this patch series, but the results are the exciting part for end-users:
On Android, our most advanced simulation that generates memory pressure from realistic user behavior shows 18% fewer low-memory kills, which in turn reduces cold starts by 16%.
On Borg, a similar approach enables us to identify jobs that underutilize their memory and downsize them considerably without compromising any of our service level indicators.
On Chrome OS, our field telemetry reports 96% fewer low-memory tab discards and 59% fewer OOM kills from fully-utilized devices and no UX regressions from underutilized devices.
The patch comments also note, "The end result is generally a significant reduction in CPU usage, for most of the systems running cloud workloads. On Chrome OS, our real-world benchmark that browses popular websites in multiple tabs demonstrates 51% less CPU usage from kswapd and 52% (full) less PSI on v5.11... In addition, direct reclaim latency is reduced by 22% at 99th percentile and the number of refaults is reduced 7%. These metrics are important to phones and laptops as they are correlated to user experience."
Yes, please! The initial multi-generational LRU series currently amounts to 14 patches and, in a patched kernel, can be enabled via the LRU_GEN Kconfig switch. There is also an NR_LRU_GENS tunable for configuring the maximum number of generations depending upon the device. The behavior can also be controlled at run-time via /sys/kernel/mm/lru_gen.
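For those wanting to check whether a given kernel carries these patches, here is a minimal Python sketch. It assumes the run-time interface appears as a directory at /sys/kernel/mm/lru_gen, as the path above suggests, and that the Kconfig switch shows up as CONFIG_LRU_GEN=y in a kernel config file. The helper names are hypothetical, and no particular file names under that directory are assumed; the script simply lists whatever it finds.

```python
from pathlib import Path


def list_lru_gen_knobs(sysfs_dir="/sys/kernel/mm/lru_gen"):
    """Return {file name: contents} for files under sysfs_dir.

    Returns None when the directory is missing, which most likely means
    the running kernel was built without the multi-generational LRU patches.
    """
    root = Path(sysfs_dir)
    if not root.is_dir():
        return None
    knobs = {}
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            try:
                knobs[entry.name] = entry.read_text().strip()
            except OSError:
                # Some sysfs files may be write-only or need root to read.
                knobs[entry.name] = None
    return knobs


def config_has_lru_gen(config_text):
    """Return True if the kernel config text contains CONFIG_LRU_GEN=y."""
    return any(line.strip() == "CONFIG_LRU_GEN=y"
               for line in config_text.splitlines())


if __name__ == "__main__":
    knobs = list_lru_gen_knobs()
    if knobs is None:
        print("No /sys/kernel/mm/lru_gen here; kernel is probably unpatched.")
    else:
        for name, value in knobs.items():
            print(f"{name}: {value}")
```

The config check takes the file contents as a string, so it can be fed from wherever a distribution keeps its kernel config (the location varies and is not assumed here).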