IO_uring To Ring In Some Awesome Improvements With Linux 6.0
IO_uring lead developer and block subsystem maintainer Jens Axboe of Meta has submitted his various pull requests for Linux 6.0. The IO_uring updates for Linux 6.0 bring efficiency improvements to the task-work handling, provided-buffer improvements, improved cancel hash locking, multi-shot recv/recvmsg support for better efficiency with applications doing a lot of receives on an instantiated socket, efficiency improvements for poll handling, and a lot of other clean-ups / improvements.
Jens Axboe at Kernel Recipes 2022.
An additional pull adds efficient buffered write support to IO_uring. The IO_uring buffered writes are ready with support for XFS, while Btrfs file-system support is in progress. Axboe explains in that pull:
io_uring does support buffered writes on any file type, but since the buffered write path just always -EAGAIN (or -EOPNOTSUPP) any attempt to do so if IOCB_NOWAIT is set, any buffered write will effectively be handled by io-wq offload. This isn't very efficient, and we even have specific code in io-wq to serialize buffered writes to the same inode to avoid further inefficiencies with thread offload.
This is particularly sad since most buffered writes don't block, they simply copy data to a page and dirty it. With this pull request, we can handle buffered writes a lot more efficiently. If balance_dirty_pages() needs to block, we back off on writes as indicated.
This improves buffered write support by 2-3x.
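The pattern Axboe describes (try the write without blocking, and hand it to a worker only if that fails) can be loosely illustrated from user space. The following is a minimal sketch and not io_uring's actual implementation: it uses Python's `os.pwritev()` with `RWF_NOWAIT` for the non-blocking attempt and a one-thread pool as a stand-in for io-wq. The names `buffered_write`, `_write_all`, and `_offload_pool` are hypothetical, and treating `EINVAL` as a "not supported here" result is an assumption about how some older kernels reject `RWF_NOWAIT` on buffered files.

```python
import errno
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Results that mean "would block" or "NOWAIT not supported for this file".
# EINVAL is included as an assumption about older kernels.
_FALLBACK_ERRNOS = {errno.EAGAIN, errno.EOPNOTSUPP, errno.EINVAL}

# os.RWF_NOWAIT is only defined on Linux builds of Python 3.7+.
_RWF_NOWAIT = getattr(os, "RWF_NOWAIT", None)

# Single worker thread standing in for io-wq (hypothetical helper).
_offload_pool = ThreadPoolExecutor(max_workers=1)


def _write_all(fd, data, offset, flags=0):
    """Write all of `data` at `offset`, looping over short writes."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        written += os.pwritev(fd, [view[written:]], offset + written, flags)
    return written


def buffered_write(fd, data, offset):
    """Return (bytes_written, path), where path is 'inline' or 'offload'."""
    if _RWF_NOWAIT is not None:
        try:
            # Fast path: the write usually just copies into the page cache.
            return _write_all(fd, data, offset, _RWF_NOWAIT), "inline"
        except OSError as exc:
            if exc.errno not in _FALLBACK_ERRNOS:
                raise
    # Slow path: redo the whole positional write on a worker thread.
    # Rewriting from `offset` is safe because the write is idempotent.
    future = _offload_pool.submit(_write_all, fd, data, offset)
    return future.result(), "offload"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fd = os.open(os.path.join(tmp, "demo.bin"), os.O_CREAT | os.O_RDWR, 0o600)
        try:
            print(buffered_write(fd, b"hello io_uring", 0))
        finally:
            os.close(fd)
```

Which path is taken depends on the kernel and file-system in use. Where non-blocking buffered writes are rejected, every call lands on the worker thread, which is the inefficiency the pull request addresses inside the kernel.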
Another pull introduces zero-copy send support for IO_uring. The IO_uring zero-copy send is a big win for networking use-cases. This zero-copy send with IO_uring works for IPv4 and IPv6, both TCP and UDP.
Outside of IO_uring, the block changes have been submitted including various clean-ups, the new user-space block driver using IO_uring, and more.
Lastly are the block driver changes with the NVMe code now seeing support for in-band authentication, improved RAID5 lock contention in the MD code, and various other improvements/fixes/clean-ups.
Meanwhile Jens Axboe is teasing a new AMD EPYC server with 128 cores and 24 Optane drives (Dell PowerEdge R7525) he's been playing with:
122M IOPS in 2U, with > 80% of the system idle. Easy. #io_uring #linux pic.twitter.com/Ij9VzUrEhY
— Jens Axboe (@axboe) August 1, 2022
On the current Linux kernel he is pulling off 122M IOPS with the 2U server while more than 80% of the system sits idle. It will be fun to see what further I/O performance optimizations he'll be able to explore with that hardware.