Relaxed TLB Flushes Being Worked On For Linux As Another Performance Optimization

Written by Michael Larabel in Linux Kernel on 19 July 2022 at 05:35 AM EDT.

Nadav Amit has previously spearheaded work on reducing unnecessary TLB flushes, concurrent TLB flushes, and other low-level optimizations over the years. His latest work is on "relaxed" TLB flushes as another low-level performance improvement.

Nadav Amit of VMware has taken up work on "relaxed" TLB flushes for when permissions are added, as a follow-on to his earlier work on avoiding unnecessary TLB flushes. He explains:

This patch-set allows userfaultfd to map pages as writeable directly upon write-(un)protect ioctl, while addressing the undesired behaviors that occur when one uses userfaultfd write-unprotect or mprotect to add permissions. It also does some cleanup and micro-optimizations along the way.

The main change that is done in the patch-set - x86 specific, at the moment - is the introduction of "relaxed" TLB flushes when permissions are added. Upon a "relaxed" TLB flush, the mm's TLB generation is advanced and the local TLB is flushed, but no TLB shootdown takes place. If a spurious page-fault occurs and the local generation of the TLB is found to be out-of-sync with the mm generation, a full TLB flush is performed on the faulting core to prevent further spurious page-faults.

To a certain extent "relaxed flushes" are similar to the changes that were proposed some time ago for kernel mappings. However, it does not have any complicated interactions with NMI handlers.

The relaxed TLB flushing was further summed up in its patch message:

Introduce the concept of strict and relaxed TLB flushes. Relaxed TLB flushes are TLB flushes that can be skipped but might lead to degraded performance. It is down to arch code (in the next patches) to deal with relaxed flushes correctly. One such behavior is flushing the local TLB eagerly and remote TLBs lazily.
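To make the mechanism described in the two excerpts concrete, here is a toy user-space model in Python. It is only a sketch under stated assumptions and not the kernel code: the names Cpu, AddressSpace, change_protection and access are hypothetical, permissions are plain strings such as "r" and "rw", and the shootdown counter merely stands in for the inter-processor interrupts a real shootdown would send.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """One core in the toy model (hypothetical, not a kernel structure)."""
    cpu_id: int
    local_gen: int = 0                       # mm generation this TLB last synced to
    tlb: dict = field(default_factory=dict)  # page -> cached permissions, e.g. "r" or "rw"

    def flush(self, mm_gen: int) -> None:
        """Full flush of this core's TLB, then record the generation it caught up to."""
        self.tlb.clear()
        self.local_gen = mm_gen


class AddressSpace:
    """Toy stand-in for an mm: a page table, a generation counter and some cores."""

    def __init__(self, n_cpus: int) -> None:
        self.gen = 0
        self.page_table = {}
        self.cpus = [Cpu(i) for i in range(n_cpus)]
        self.shootdowns = 0  # remote flushes requested (stand-in for IPIs)

    def map_page(self, page: int, perms: str) -> None:
        self.page_table[page] = perms

    def change_protection(self, page: int, perms: str, initiator: int, relaxed: bool) -> None:
        old = self.page_table[page]
        # Assumption taken from the text: relaxed flushes are only used when
        # permissions are added, so a stale entry can cause a fault but never
        # grant access that was revoked.
        if relaxed and not set(old) <= set(perms):
            raise ValueError("relaxed flush is only safe when permissions are added")
        self.page_table[page] = perms
        self.gen += 1                          # advance the mm generation
        self.cpus[initiator].flush(self.gen)   # local TLB is flushed eagerly
        if not relaxed:
            # Strict flush: every other core is flushed right away.
            for cpu in self.cpus:
                if cpu.cpu_id != initiator:
                    cpu.flush(self.gen)
                    self.shootdowns += 1

    def access(self, cpu_id: int, page: int, want: str) -> str:
        """Return "ok", "spurious-fault" or "real-fault" for one access."""
        cpu = self.cpus[cpu_id]
        if page not in cpu.tlb:
            cpu.tlb[page] = self.page_table[page]  # TLB miss: walk the page table
        if want in cpu.tlb[page]:
            return "ok"
        if want not in self.page_table[page]:
            return "real-fault"                    # the page table really forbids it
        # Spurious fault: the page table allows the access, the cached entry is stale.
        if cpu.local_gen != self.gen:
            cpu.flush(self.gen)                    # lazy catch-up on the faulting core
        else:
            del cpu.tlb[page]                      # not expected in this model
        return "spurious-fault"
```

In this model, adding write permission with relaxed=True leaves the other core untouched: its next write takes one spurious fault, the core notices that its local generation lags the mm generation, flushes itself, and the retried write succeeds. With relaxed=False the same change costs one shootdown up front and no fault later.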

The performance experiments are looking quite positive, with savings of up to 44% in cycles measured for mprotect(PROT_READ|PROT_WRITE), or around 6% fewer CPU cycles with just mprotect(PROT_READ).

More details can be found in the LKML patch series.
