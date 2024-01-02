
Patches Updated To Tackle vmap/vmalloc Lock Contention That Can Yield ~12x Throughput
Uladzislau Rezki of Sony has been working for months to eliminate lock contention within the Linux kernel's vmap/vmalloc code. The contention, caused by a single spinlock protecting the global vmap space, leads to serious scalability issues on today's increasingly high core count systems.
The patch series, now up to its third iteration, aims to make this code more scalable:
"We introduce an effective vmap node logic. A node behaves as independent entity to serve an allocation request directly(if possible) from its pool. That way it bypasses a global vmap space that is protected by its own lock.
An access to pools are serialized by CPUs. Number of nodes are equal to number of CPUs in a system. Please note the high threshold is bound to 128 nodes.
Pools are size segregated and populated based on system demand. The maximum alloc request that can be stored into a segregated storage is 256 pages. The lazily drain path decays a pool by 25% as a first step and as second populates it by fresh freed VAs for reuse instead of returning them into a global space."
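To make the quoted design a bit more concrete, here is a small user-space model in Python. It is only an illustrative sketch, not the kernel code: the class and function names (`GlobalSpace`, `VmapNode`, `make_nodes`, `node_for_cpu`) are hypothetical, the CPU-to-node mapping by modulo is an assumption, and locking is left out entirely. Only the three constants (128 nodes, 256 pages, 25% decay) come from the cover letter.

```python
from collections import defaultdict

MAX_NODES = 128          # upper bound on node count, per the cover letter
MAX_POOL_PAGES = 256     # largest request size kept in a per-node pool
DECAY_FRACTION = 0.25    # share of each pool released on a lazy drain


class GlobalSpace:
    """Stand-in for the global vmap space (lock-protected in the kernel)."""

    def __init__(self):
        self.next_addr = 0
        self.free = defaultdict(list)   # size in pages -> free start addresses
        self.allocs = 0                 # how often the slow path was taken

    def alloc(self, pages):
        self.allocs += 1
        if self.free[pages]:
            return self.free[pages].pop()
        addr = self.next_addr
        self.next_addr += pages
        return addr

    def release(self, pages, addr):
        self.free[pages].append(addr)


class VmapNode:
    """Hypothetical node: size-segregated pools of reusable areas."""

    def __init__(self):
        self.pools = defaultdict(list)  # size in pages -> cached addresses
        self.lazy = []                  # freed (pages, addr) awaiting a drain

    def alloc(self, pages, global_space):
        # Fast path: serve the request from this node's pool if possible.
        pool = self.pools.get(pages)
        if pages <= MAX_POOL_PAGES and pool:
            return pool.pop()
        # Slow path: fall back to the global space.
        return global_space.alloc(pages)

    def free(self, pages, addr):
        # Freed areas are queued and handled lazily by drain().
        self.lazy.append((pages, addr))

    def drain(self, global_space):
        # Step 1: decay every pool by 25%, returning entries to global space.
        for pages, pool in self.pools.items():
            for _ in range(int(len(pool) * DECAY_FRACTION)):
                global_space.release(pages, pool.pop())
        # Step 2: repopulate pools with freshly freed areas for reuse.
        for pages, addr in self.lazy:
            if pages <= MAX_POOL_PAGES:
                self.pools[pages].append(addr)
            else:
                global_space.release(pages, addr)
        self.lazy.clear()


def make_nodes(nr_cpus):
    # One node per CPU, capped at MAX_NODES.
    return [VmapNode() for _ in range(min(nr_cpus, MAX_NODES))]


def node_for_cpu(nodes, cpu):
    # Assumed mapping for illustration: CPU index modulo node count.
    return nodes[cpu % len(nodes)]
```

In this model, an allocation that follows a free and a drain of the same size on the same node is served from the node's pool and never touches `GlobalSpace`, which is the path the patches aim to keep off the contended global lock.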
In a synthetic test stressing the vmalloc path, the Sony engineer found the throughput to be roughly 12x higher on an AMD Ryzen Threadripper 3970X test system.
The v3 patches for dealing with this vmap/vmalloc lock contention are out for review on the Linux kernel mailing list. Hopefully this is just the tip of the iceberg for Linux performance optimizations in 2024.