Linux 5.10 ARM64 Has An "8~20x" Performance Optimization Forgotten About For Two Years
Written by Michael Larabel in Arm on 22 October 2020 at 02:45 PM EDT.
The main set of ARM 64-bit architecture updates for Linux 5.10 landed last week, and today a second batch of changes was sent in for this kernel. The first round brought Memory Tagging Extension (MTE) and Pointer Authentication support among other improvements, while this second pull has two notable performance optimizations.

First up is a performance optimization that the Arm developers acknowledge was seemingly forgotten about for some two years. Back in 2018, a memory management patch sped up the mremap system call on large memory regions by around 20x. That work was merged, but the feature was never enabled for ARM64 Linux kernel builds until now.

That patch by a Google engineer optimized mremap because Android relies on it for large regions of memory during various operations. The mremap system call can be quite slow without transparent huge pages (THP), and the patch makes it faster by copying page-table entries at the PMD level when possible. Back in 2018 the engineer reported a speed-up of around 20x on x86 (x86_64), with a 1GB mremap taking just 144~160 microseconds rather than 3.4~3.6 milliseconds. For systems with THP support, though, there isn't likely to be much of a performance difference.
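To give a rough sense of why working at the PMD level helps, here is a minimal back-of-the-envelope sketch. It assumes 4 KiB base pages and 512 entries per page-table level, so that one PMD entry covers 2 MiB; those are common defaults rather than figures from the patch itself, and the function name is made up for illustration. The count of entries touched is only a proxy and is not the same thing as the measured wall-clock speed-up.

```python
# Assumed layout (not taken from the patch): 4 KiB pages, 512 entries per level.
PAGE_SIZE = 4 * 1024
ENTRIES_PER_TABLE = 512
PMD_SIZE = PAGE_SIZE * ENTRIES_PER_TABLE  # one PMD entry covers 2 MiB here


def entries_to_move(region_bytes, granule_bytes):
    """Page-table entries touched when moving a granule-aligned region."""
    if region_bytes % granule_bytes:
        raise ValueError("region must be a multiple of the granule")
    return region_bytes // granule_bytes


GIB = 1024 ** 3
pte_moves = entries_to_move(GIB, PAGE_SIZE)  # moving one PTE at a time
pmd_moves = entries_to_move(GIB, PMD_SIZE)   # moving one PMD entry at a time
print(pte_moves, pmd_moves, pte_moves // pmd_moves)  # 262144 512 512
```

Under these assumptions a 1GB move touches 512 PMD entries instead of 262,144 individual page entries, which fits the article's point that the gain shows up on large regions.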

The mremap system call is used for expanding or shrinking an existing memory mapping, optionally moving it in the process. It is used heavily on Android, hence Google's emphasis on making it faster. While the work was merged, taking the faster path for PMD-level remapping requires the HAVE_MOVE_PMD Kconfig option, and that never got enabled as part of the ARM64 Kconfig: initially it was delayed pending other improvements, but then it was forgotten about. A few days ago it was noticed that HAVE_MOVE_PMD was set for x86 but not ARM64, with no remaining blockers preventing it from being enabled. Tests carried out this month saw an 8x improvement for this system call on ARM64.
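For readers who want to see whether a given kernel build has the option set, the following is a small sketch that scans kernel config text for a symbol. The helper name and the sample text are hypothetical; where the config actually lives (often /boot/config-<version> or /proc/config.gz) varies by distribution and is an assumption here, not something stated in the pull request.

```python
def kconfig_enabled(config_text, symbol):
    """Return True if CONFIG_<symbol> is set to y or m in the given config text."""
    wanted = f"CONFIG_{symbol}="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(wanted):
            return line[len(wanted):] in ("y", "m")
    # Disabled options appear as comments ("# CONFIG_FOO is not set") or not at all.
    return False


# Hypothetical excerpt in the usual kernel .config format.
sample = """\
# CONFIG_EXAMPLE_OFF is not set
CONFIG_HAVE_MOVE_PMD=y
"""
print(kconfig_enabled(sample, "HAVE_MOVE_PMD"))  # True
```

HAVE_MOVE_PMD is selected by the architecture rather than chosen by the user, so on a kernel without this change the symbol would simply be absent from an ARM64 config.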

So today's pull request enables HAVE_MOVE_PMD for ARM64 on Linux 5.10. As the pull request puts it, "this has been shown to improve mremap() performance, which is used heavily by the Android runtime [garbage collection], and it seems we forgot to enable this upstream back in 2018."

That pull also has several fixes along with another optimization: better Spectre V2 mitigation on Qualcomm Centriq "Falkor" CPUs. But with Qualcomm having divested from its ARM server chip ambitions, the Spectre V2 optimization will likely benefit few. That optimization comes from Falkor being able to mitigate Spectre Variant Two either by calling into firmware or by issuing a magic sequence of branches. The magic sequence is faster but requires special conditions, and until now the ARM64 selection logic only enabled it if the firmware mitigation was unavailable.
