Meta's Transparent Memory Offloading Saves 20% To 32% Of Memory Per Linux Server
Meta describes Transparent Memory Offloading as:
A new Linux kernel mechanism that measures the lost work due to resource shortage across CPU, memory, and I/O in real time. Guided by this information and without any prior application knowledge, TMO automatically adjusts the amount of memory to offload to a heterogeneous device, such as compressed memory or an SSD. It does so according to the device’s performance characteristics and the application’s sensitivity to slower memory accesses. TMO holistically identifies offloading opportunities from not only the application containers but also the sidecar containers that provide infrastructure-level functions.
TMO has been running in production for more than a year, and has saved 20 percent to 32 percent of total memory across millions of servers in our expansive data center fleet. We have successfully upstreamed TMO’s OS components into the Linux kernel.
On the kernel side, TMO builds on Pressure Stall Information (PSI), which is already upstream in the Linux kernel, while in user-space "Senpai" serves as the agent that acts on those pressure metrics.
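PSI exposes per-resource stall metrics under `/proc/pressure/` (and per-cgroup via files such as `memory.pressure` in cgroup v2). A minimal Python sketch of parsing that text format follows; the sample values are illustrative, not real measurements:

```python
# Parse Linux PSI (Pressure Stall Information) output, as found in
# /proc/pressure/memory or a cgroup v2 memory.pressure file.
def parse_psi(text):
    """Return {"some": {...}, "full": {...}} mapping metric names to numbers."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()          # "some" or "full", then k=v pairs
        metrics = {}
        for field in fields:
            key, value = field.split("=")
            # "total" is cumulative stall time in microseconds (integer);
            # the avg* fields are percentages (floats).
            metrics[key] = int(value) if key == "total" else float(value)
        result[kind] = metrics
    return result

# Illustrative sample in the kernel's PSI format:
sample = """\
some avg10=0.12 avg60=0.08 avg300=0.05 total=123456
full avg10=0.00 avg60=0.01 avg300=0.00 total=7890
"""
psi = parse_psi(sample)
print(psi["some"]["avg10"])  # → 0.12
```

In production one would read the live file (e.g. `open("/proc/pressure/memory").read()`) instead of a canned string.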
The offloading is typically done to NVMe solid-state drives, which are far cheaper per gigabyte than server memory. Upcoming server platforms supporting Compute Express Link (CXL) also hold a lot of potential for Transparent Memory Offloading.
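The Senpai agent drives that offloading by nudging each cgroup's `memory.high` limit down while measured memory pressure stays below a target, forcing the kernel to reclaim and offload cold pages. A simplified sketch of that proportional idea, assuming hypothetical constants and function names that are not Senpai's real ones:

```python
# Illustrative Senpai-style limit adjustment (NOT Meta's actual algorithm):
# tighten the cgroup memory.high limit while pressure is under target,
# relax it when pressure climbs too high.
def next_memory_limit(current_limit, pressure_avg10, target_pressure=0.1,
                      shrink_step=0.01, grow_step=0.05):
    """Return the next memory.high value in bytes for one control-loop tick."""
    if pressure_avg10 < target_pressure:
        # Application is not stalling: squeeze out more cold memory.
        return int(current_limit * (1 - shrink_step))
    # Pressure above target: back off so the workload is not slowed down.
    return int(current_limit * (1 + grow_step))

limit = 8 * 1024**3  # hypothetical 8 GiB starting memory.high
limit = next_memory_limit(limit, pressure_avg10=0.02)  # low pressure → shrink
```

A real agent would run this loop periodically, reading the cgroup's `memory.pressure` and writing the new value to `memory.high`.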
Those interested in learning more about the Facebook/Meta Transparent Memory Offloading (TMO) effort can find all the interesting technical details on the Meta engineering blog.