Linux 5.15 Has A Critical Improvement For Tiered Memory Servers
Currently, when system memory (RAM) fills up under memory pressure, Linux tosses out some of the DRAM contents. On recent and future servers with tiered memory, such as those using Intel Optane DC persistent memory, the kernel may eventually fall back to using that persistent memory for allocations if necessary, but not in any intelligent manner.
That status quo is less than desirable: new allocations may end up going to the slower persistent memory for lack of any other choice, and the kernel will wipe out pages from system RAM even when plenty of persistent memory is available.
Linux 5.15 introduces the notion of demoting pages during reclaim. This page migration on reclaim allows the kernel to move pages from the primary system RAM over to slower tiers of memory when the fast tier is under memory pressure. Demotion is attempted prior to any swapping to disk and should be more desirable than simply wiping out portions of system memory when persistent memory, albeit slower, is available.
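To make the ordering concrete, here is a toy Python model of that reclaim decision. It is only a sketch of the idea described above, not kernel code: the `Tier` class, the `reclaim_one` function, and the "coldest page is first in the list" policy are all hypothetical names and simplifications invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """A memory tier holding page identifiers, coldest page first (toy model)."""
    name: str
    capacity: int
    pages: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.pages) < self.capacity


def reclaim_one(fast: Tier, slow: Tier, demotion_enabled: bool) -> str:
    """Free one page from the fast tier and report what happened to it."""
    victim = fast.pages.pop(0)  # toy policy: the coldest page is first in the list
    if demotion_enabled and slow.has_room():
        slow.pages.append(victim)  # contents stay in memory, just in the slower tier
        return f"demoted {victim} to {slow.name}"
    # Fallback: the page leaves memory entirely (swapped out or discarded).
    return f"evicted {victim} from {fast.name}"


dram = Tier("DRAM", capacity=2, pages=["a", "b"])
pmem = Tier("PMEM", capacity=1)
print(reclaim_one(dram, pmem, demotion_enabled=True))  # slower tier has room
print(reclaim_one(dram, pmem, demotion_enabled=True))  # slower tier is now full
```

In this model the first reclaimed page is demoted because the slower tier has room, while the second is evicted because that tier has filled up. With `demotion_enabled=False` every reclaimed page is evicted, which mirrors the pre-5.15 behavior described earlier.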
Intel engineers have been working on this migration of pages to slower memory tiers for the past few months, and the work is now part of Linux 5.15. While the code can demote pages to slower memory tiers, there currently isn't any means of promoting pages back into faster DRAM when capacity is available. Other patches still in the works address this promotion handling.
Intel engineers found this functionality could improve PostgreSQL performance by as much as 22% on tiered memory servers with persistent memory.
This behavior can be controlled on tiered memory servers via the /sys/kernel/mm/numa/demotion_enabled file on Linux 5.15+. More details are in this patch, among the other patches for demoting pages during reclaim that just landed.
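As a rough illustration of working with that file, the Python sketch below reads and sets the knob. The path comes from the article; the helper names are hypothetical, and the accepted values (true/false or 1/0) are an assumption about the sysfs interface that should be checked against the kernel documentation for your version. Writing requires root.

```python
from pathlib import Path

# Path taken from the article; present only on kernels that ship this feature.
DEMOTION_KNOB = Path("/sys/kernel/mm/numa/demotion_enabled")


def parse_flag(text: str) -> bool:
    """Interpret the knob's contents (assumed to be true/false or 1/0)."""
    value = text.strip().lower()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"unexpected value: {text!r}")


def demotion_enabled(knob: Path = DEMOTION_KNOB):
    """Return the current setting, or None if the kernel lacks the knob."""
    if not knob.exists():
        return None
    return parse_flag(knob.read_text())


def set_demotion(enabled: bool, knob: Path = DEMOTION_KNOB) -> None:
    """Write the new setting; needs root privileges on a real system."""
    knob.write_text("1" if enabled else "0")


if __name__ == "__main__":
    # Read-only check, safe to run as an unprivileged user.
    print("demotion enabled:", demotion_enabled())
```

On a kernel without the feature, `demotion_enabled()` returns `None` rather than raising an error, so the script can be used to probe for support before attempting `set_demotion(True)`.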