IO_uring FUTEX Support In Linux 6.7 For A Nice Performance/Efficiency Boost

Written by Michael Larabel in Linux Kernel on 7 November 2023 at 06:30 AM EST.

In addition to the continued FUTEX2 improvements that landed in Linux 6.7, another pull request merged last week for the new kernel brings FUTEX support to IO_uring.

The new code allows using futexes through IO_uring -- including the FUTEX vectored waits. This FUTEX IO_uring support can lead to some nice performance/efficiency wins. Jens Axboe explained in the pull request:

As far as I can recall, the first request for futex support with io_uring came from Andres Freund, working on postgres. His aio rework of postgres was one of the early adopters of io_uring, and futex support was a natural extension for that. This is relevant from both a usability point of view, as well as for efficiency and performance. In Andres's words, for the former:

"Futex wait support in io_uring makes it a lot easier to avoid deadlocks in concurrent programs that have their own buffer pool: Obviously pages in the application buffer pool have to be locked during IO. If the initiator of IO A needs to wait for a held lock B, the holder of lock B might wait for the IO A to complete. The ability to wait for a lock and IO completions at the same time provides an efficient way to avoid such deadlocks."

and in terms of efficiency, even without unlocking the full potential yet, Andres says:

"Futex wake support in io_uring is useful because it allows for more efficient directed wakeups. For some "locks" postgres has queues implemented in userspace, with wakeup logic that cannot easily be implemented with FUTEX_WAKE_BITSET on a single "futex word" (imagine waiting for journal flushes to have completed up to a certain point). Thus a "lock release" sometimes needs to wake up many processes in a row. A quick-and-dirty conversion to doing these wakeups via io_uring led to a 3% throughput increase, with 12% fewer context switches, albeit in a fairly extreme workload."

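A 3% throughput increase with 12% fewer context switches is a nice result for a quick-and-dirty conversion, though it was measured in PostgreSQL under a fairly extreme workload rather than across the kernel in general. More details and the code are available via this pull request, which has been in the mainline tree for the past few days.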
