XFS To Enjoy Big Scalability Boost With Linux 5.14
A big patch series out of Red Hat has been queued into the XFS file-system development Git branch as new material for the upcoming Linux 5.14 cycle.
The big set of patches that was queued this week into the xfs-5.14 for-next code focuses on CIL (Committed Item List) and log scalability improvements.
This scalability work is yielding good performance numbers for the XFS file-system. The headline figures: the transaction rate rises from around 700k to 1.7M commits per second, and flush operations drop by 2~x orders of magnitude for metadata-heavy workloads that don't enforce fsync.
The merge message further explains the code that's been reworked in XFS for better scalability around the CIL and log:
The log write FUA/FLUSH optimisations reduce the number of cache flushes required to flush the CIL to the journal. It extends the old pre-delayed logging ordering semantics required by writing individual transactions to the iclogs out to cover the CIL checkpoint transactions rather than individual writes to the iclogs. In doing so, we reduce the cache flush requirements to once per CIL checkpoint rather than once per iclog write.
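To put rough numbers on that flush reduction, here is a toy Python model (not XFS code). It assumes each CIL checkpoint is carried by ceil(size / iclog size) iclog writes, and that the old scheme issued one cache flush per iclog write while the new one issues one per checkpoint. The 32 KiB iclog size, the helper names and the workload figures are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical model, not XFS code. Assumed iclog buffer size, for illustration only.
ICLOG_SIZE = 32 * 1024


def iclog_writes(checkpoint_bytes: int, iclog_size: int = ICLOG_SIZE) -> int:
    """Number of iclog writes needed to carry one checkpoint (at least one)."""
    return max(1, math.ceil(checkpoint_bytes / iclog_size))


def flushes_per_iclog(checkpoints: list[int]) -> int:
    """Old behaviour as modelled here: one cache flush per iclog write."""
    return sum(iclog_writes(c) for c in checkpoints)


def flushes_per_checkpoint(checkpoints: list[int]) -> int:
    """New behaviour as modelled here: one cache flush per CIL checkpoint."""
    return len(checkpoints)


if __name__ == "__main__":
    # Ten checkpoints of 8 MiB each (made-up workload figures).
    cps = [8 * 1024 * 1024] * 10
    print(flushes_per_iclog(cps), flushes_per_checkpoint(cps))
```

With these made-up sizes the script prints `2560 10`. Real savings depend on checkpoint sizes and the iclog size in use, so the model shows the shape of the change rather than XFS's measured numbers.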
The async CIL pushes fix a pipeline limitation that only allowed a single CIL push to be processed at a time. This was causing CIL checkpoint writing to become CPU bound as only a single CIL checkpoint could be pushed at a time. The checkpoint pipeline was designed to allow multiple pushes to be in flight at once and use careful ordering of the commit records to ensure correct recovery order, but the workqueue implementation didn't allow concurrent works to be run. The concurrent works now extend out to 4 CIL checkpoints running at a time, hence removing the CPU usage limitations without introducing new lock contention issues.
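The ordering rule behind async CIL pushes can be sketched like this: up to four pushes run at once, but each must wait for its predecessor's commit record before writing its own. This is a hypothetical Python model that uses a thread pool and a condition variable in place of the kernel workqueue. `CommitOrderer`, `push_checkpoint` and `run_pushes` are invented names, and because of Python's GIL the sketch illustrates the ordering, not the CPU scaling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# The merge message says up to 4 CIL checkpoints can be in flight at a time.
MAX_PUSHES_IN_FLIGHT = 4


class CommitOrderer:
    """Hypothetical model: checkpoint bodies may be written concurrently,
    but commit records must go out in checkpoint sequence order."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._next_seq = 0                # sequence allowed to write its commit record next
        self.commit_log: list[int] = []   # order in which commit records were "written"

    def write_commit_record(self, seq: int) -> None:
        with self._cond:
            # Block until every earlier checkpoint has written its commit record.
            self._cond.wait_for(lambda: self._next_seq == seq)
            self.commit_log.append(seq)
            self._next_seq += 1
            self._cond.notify_all()


def push_checkpoint(orderer: CommitOrderer, seq: int) -> int:
    # Stand-in for formatting and writing the checkpoint body; this part
    # may overlap with other in-flight pushes.
    body = sum(range(10_000 * (seq % 3 + 1)))
    orderer.write_commit_record(seq)
    return body


def run_pushes(n: int) -> list[int]:
    orderer = CommitOrderer()
    with ThreadPoolExecutor(max_workers=MAX_PUSHES_IN_FLIGHT) as pool:
        futures = [pool.submit(push_checkpoint, orderer, s) for s in range(n)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return orderer.commit_log
```

`run_pushes(12)` returns `[0, 1, ..., 11]`: commit records come out in sequence order however the bodies overlap. Because the pool hands out pushes in submission order, the oldest unfinished push is always running and can always proceed, so the ordering wait cannot deadlock.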
The xlog_write() rework is long overdue. The code is complex, difficult to understand, full of tricky, subtle corner cases and just generally really hard to modify. This patchset reworks the xlog_write() API to reduce the processing overhead of writing out long log vector chains, and factors the xlog_write() code into a simple, compact fast path along with a clearer slow path to handle the complex cases.
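A fast-path/slow-path split of that kind might look like this in miniature. It is a hypothetical Python model, not the real xlog_write(): it assumes the fast path applies when the whole log vector chain fits in the current iclog's free space, and it ignores op headers, record headers and iclog state transitions. `Iclog` and `log_write` are invented names, and regions are represented only by their byte lengths.

```python
from dataclasses import dataclass, field


@dataclass
class Iclog:
    """Hypothetical in-memory log buffer; records hold (region_id, nbytes, is_continuation)."""
    size: int
    used: int = 0
    records: list = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.size - self.used


def log_write(iclogs: list, chain: list, iclog_size: int) -> str:
    """Write a chain of region lengths into iclogs; return which path was taken."""
    if not iclogs or iclogs[-1].free == 0:
        iclogs.append(Iclog(iclog_size))
    cur = iclogs[-1]

    if sum(chain) <= cur.free:
        # Fast path: the whole chain fits, so copy every region in one pass.
        for rid, nbytes in enumerate(chain):
            cur.records.append((rid, nbytes, False))
        cur.used += sum(chain)
        return "fast"

    # Slow path: regions may be split across iclogs, marking continuation pieces.
    for rid, nbytes in enumerate(chain):
        remaining, cont = nbytes, False
        while remaining > 0:
            if cur.free == 0:
                cur = Iclog(iclog_size)
                iclogs.append(cur)
            chunk = min(remaining, cur.free)
            cur.records.append((rid, chunk, cont))
            cur.used += chunk
            remaining -= chunk
            cont = True
    return "slow"
```

Writing `[100, 200]` into an empty 1024-byte iclog takes the fast path; a following 1000-byte region no longer fits in the 724 bytes left, so the slow path splits it 724/276 across two iclogs.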
The CIL commit scalability patchset removes spinlocks from the transaction commit fast path. These spinlocks are the performance limiting bottleneck in the transaction commit path, so we apply a variety of different techniques to do either atomic, lockless or per-cpu updates of the CIL tracking structures during commits. This greatly increases the throughput of the transaction commit engine, moving the contention point to the log space tracking algorithms after doubling throughput on 32-way workloads.
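Of the techniques listed (atomic, lockless and per-CPU updates), the per-CPU one is the easiest to sketch. The hypothetical Python model below gives each worker its own counter slot so the commit hot path never takes a shared lock, and only folds the slots together on the slow path (for example when a CIL push needs the total). `PerCpuCounter` and `commit_many` are invented names, and a thread with a fixed index stands in for a CPU.

```python
import threading


class PerCpuCounter:
    """Hypothetical per-CPU counter: each 'CPU' (a worker thread with a fixed
    index) updates only its own slot, so updates take no shared lock."""

    def __init__(self, ncpus: int) -> None:
        self._slots = [0] * ncpus

    def add(self, cpu: int, delta: int) -> None:
        # Fast path: touch only this CPU's slot; no other thread writes it.
        self._slots[cpu] += delta

    def fold(self) -> int:
        # Slow path (e.g. at push time): aggregate every slot.
        return sum(self._slots)


def commit_many(ncpus: int, commits_per_cpu: int, bytes_per_commit: int) -> int:
    """Simulate commits on several 'CPUs' and return the folded total."""
    counter = PerCpuCounter(ncpus)

    def worker(cpu: int) -> None:
        for _ in range(commits_per_cpu):
            counter.add(cpu, bytes_per_commit)

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(ncpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.fold()
```

The trade-off is the usual one for per-CPU counters: updates become cheap and contention-free, while reading an exact total costs a walk over every slot.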
This patch series was led by Dave Chinner of Red Hat.