Newest Linux Optimizations Can Achieve 10M IOPS Per-Core With IO_uring

  • #61
    Originally posted by BillBroadley View Post
    For anyone interested in IO_Uring updates there's a talk on it at kernel recipes 2022 about 32 minutes in: https://www.youtube.com/watch?v=v--rVT4RsCE
    Thanks for posting, but I wonder if anyone can answer this seemingly simple question:

    Does the kernel attempt to reorder operations to complete them in a more efficient manner? Or is there essentially just a single kernel thread reading the command queue and executing the commands serially? ...at least, the commands for a particular subsystem, like file I/O?

    One thing I liked about the idea of async I/O was the seeming potential for the I/O scheduler to reorder the commands to complete them more efficiently, based on things like the location of file fragments vs. the disk head's position. Or, at least, shipping a bunch of the commands off for the hard disk to potentially complete out-of-order. I don't know if Linux's AIO ever did that, but at least it seemed like it could.

    If any of these questions are addressed by that talk, let me know and I'll definitely watch it.



    • #62
      Originally posted by coder View Post
      Thanks for posting, but I wonder if anyone can answer this seemingly simple question:

      Does the kernel attempt to reorder operations to complete them in a more efficient manner? Or is there essentially just a single kernel thread reading the command queue and executing the commands serially? ...at least, the commands for a particular subsystem, like file I/O?

      One thing I liked about the idea of async I/O was the seeming potential for the I/O scheduler to reorder the commands to complete them more efficiently, based on things like the location of file fragments vs. the disk head's position. Or, at least, shipping a bunch of the commands off for the hard disk to potentially complete out-of-order. I don't know if Linux's AIO ever did that, but at least it seemed like it could.

      If any of these questions are addressed by that talk, let me know and I'll definitely watch it.
      io_uring itself does not guarantee any ordering of how requests are processed or how the I/O is issued, unless you use linking (IOSQE_IO_LINK) to enforce sequential order.

      Normally, io_uring acts as a batched, async I/O submitter: you make a syscall (io_uring_enter) to push new requests into the kernel's async executor, and the kernel places the results into the completion queue as CQEs (the same io_uring_enter call can also wait for a certain number of I/Os to finish).
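
      For illustration, a minimal sketch of that submit-and-wait flow using liburing (the file name "data.bin" is hypothetical, and error handling is mostly omitted):

      ```c
      /* Sketch of the default (non-SQPOLL) flow: queue one read SQE, then a
       * single io_uring_enter (via io_uring_submit_and_wait) both submits
       * the batch and waits for a completion. */
      #include <liburing.h>
      #include <fcntl.h>
      #include <stdio.h>

      int main(void)
      {
          struct io_uring ring;
          struct io_uring_sqe *sqe;
          struct io_uring_cqe *cqe;
          char buf[4096];

          if (io_uring_queue_init(8, &ring, 0) < 0)
              return 1;

          int fd = open("data.bin", O_RDONLY);      /* hypothetical input file */
          if (fd < 0)
              return 1;

          sqe = io_uring_get_sqe(&ring);
          io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
          /* Setting IOSQE_IO_LINK on an SQE would force the *next* SQE to start
           * only after this one completes, restoring ordering where needed. */

          io_uring_submit_and_wait(&ring, 1);       /* one io_uring_enter syscall */

          io_uring_wait_cqe(&ring, &cqe);           /* fetch the completion */
          printf("read returned %d\n", cqe->res);
          io_uring_cqe_seen(&ring, cqe);

          io_uring_queue_exit(&ring);
          return 0;
      }
      ```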

      If you enable IORING_SETUP_SQPOLL, the kernel launches a dedicated thread for your io_uring instance. It runs entirely in kernel space, ignores all signals sent to it, and constantly busy-polls the submission queue for new SQEs, placing the results into the completion queue.

      If the submission queue stays idle for too long, the polling thread goes to sleep and needs another io_uring_enter call to wake it up.
      This situation can be detected by checking the IORING_SQ_NEED_WAKEUP flag in the SQ ring state that io_uring_setup establishes (liburing's submit helpers check it for you).
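
      A hedged sketch of enabling that SQPOLL mode with liburing (the 2000 ms idle timeout is an arbitrary choice; older kernels, roughly pre-5.11, additionally required registered files and elevated privileges for SQPOLL):

      ```c
      /* Sketch: create a ring whose SQ is serviced by a kernel-side polling
       * thread. If the thread idles longer than sq_thread_idle ms, it sleeps
       * and sets IORING_SQ_NEED_WAKEUP; liburing's io_uring_submit() checks
       * that flag and issues the wakeup io_uring_enter when required. */
      #include <liburing.h>
      #include <string.h>

      int setup_sqpoll_ring(struct io_uring *ring)
      {
          struct io_uring_params p;

          memset(&p, 0, sizeof(p));
          p.flags = IORING_SETUP_SQPOLL;
          p.sq_thread_idle = 2000;   /* ms of idle before the poll thread sleeps */

          return io_uring_queue_init_params(64, ring, &p);
      }
      ```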



      • #63
        Originally posted by coder View Post
        Thanks for posting, but I wonder if anyone can answer this seemingly simple question:

        Does the kernel attempt to reorder operations to complete them in a more efficient manner? Or is there essentially just a single kernel thread reading the command queue and executing the commands serially? ...at least, the commands for a particular subsystem, like file I/O?
        IO_uring is actually a pretty simple setup. Generally, it allows async communication between user space and kernel space in a dramatically more efficient way than a normal syscall: one ring for user -> kernel and one ring for kernel -> user. The data structure allows multiple producers and multiple consumers without a bunch of lock contention, but for that same reason things are not ordered. So generally it's unordered, although what you are accessing might require an order: for instance, sending packets on a TCP connection, which has sequence numbers, or reading directory entries (which generally use opendir and traverse a list linearly). But if not, say you want to read N blocks, you can't assume you'll get them back in any particular order.

        Generally, IO_uring reduces CPU utilization and lets you make requests asynchronously; the kernel makes progress as best it can while you do other things with the CPU, and when you're ready you can efficiently collect the results. I can assure you that when making 13M-14M IOPS per core there is a big queue, many operations in flight, and the results are not ordered. Optanes do better at low queue depths than any other storage I know of, but for maximum throughput you still need a larger queue.
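
        As a rough illustration of that "big queue, unordered results" pattern, here is a sketch that submits a batch of reads in one go and matches completions back to requests via user_data (the file name, block size, and request count are arbitrary):

        ```c
        /* Sketch: submit several reads with one syscall, then reap completions
         * in whatever order they arrive, using user_data to tell which one it is. */
        #include <liburing.h>
        #include <fcntl.h>
        #include <stdio.h>

        #define NREQS 8
        #define BLKSZ 4096

        int main(void)
        {
            struct io_uring ring;
            static char bufs[NREQS][BLKSZ];

            if (io_uring_queue_init(NREQS, &ring, 0) < 0)
                return 1;

            int fd = open("data.bin", O_RDONLY);     /* hypothetical input file */
            if (fd < 0)
                return 1;

            for (unsigned long i = 0; i < NREQS; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, fd, bufs[i], BLKSZ, i * BLKSZ);
                io_uring_sqe_set_data(sqe, (void *)i);      /* tag the request */
            }
            io_uring_submit(&ring);                  /* one syscall for the whole batch */

            for (int done = 0; done < NREQS; done++) {
                struct io_uring_cqe *cqe;
                io_uring_wait_cqe(&ring, &cqe);
                /* Completions may arrive in any order; user_data says which block. */
                printf("block %lu done, res=%d\n",
                       (unsigned long)io_uring_cqe_get_data(cqe), cqe->res);
                io_uring_cqe_seen(&ring, cqe);
            }

            io_uring_queue_exit(&ring);
            return 0;
        }
        ```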




        • #64
          Originally posted by BillBroadley View Post
          IO_uring is actually a pretty simple setup. Generally, it allows async communication between user space and kernel space in a dramatically more efficient way than a normal syscall: one ring for user -> kernel and one ring for kernel -> user. The data structure allows multiple producers and multiple consumers without a bunch of lock contention, but for that same reason things are not ordered.
          Thanks for the reply! I should've mentioned that I did read the whitepapers on Axboe's site, a couple years ago:
          Therefore, I get the basic concept of what it is and how it works from a user's point of view. What I'm curious about is the kernel side of the implementation, specifically with regard to disk file I/O. That said, I just saw the new article about Axboe's talk at the Kernel Recipes conference and started paging through the slides. This talk seems relevant to my interests, so I plan to watch the video of his presentation (seems to be the same one you linked : )

          Article: https://www.phoronix.com/scan.php?pa...R2022-IO_uring

          Additional resource (also quoted Axboe's slides): https://blog.cloudflare.com/missing-...g-worker-pool/



          • #65
            Originally posted by NobodyXu View Post
            io_uring itself does not guarantee any ordering of how requests are processed or how the I/O is issued, unless you use linking (IOSQE_IO_LINK) to enforce sequential order.
            Yes, but the lack of ordering guarantees doesn't necessarily mean the implementation is trying to reorder them efficiently.

            A long time ago, I had the idea to write an RTSP server that stored media streams pre-packetized, with frames (or frame slices) aligned on disk block boundaries. You could issue async reads and, with some careful bookkeeping, send out data as the reads complete. You can take care to assign the packet sequence numbers so that the client will reorder them correctly upon reception, even though they're actually sent out of order. This assumes that, in actual practice, sequential reads from a single file would complete out of order. Otherwise, the whole scheme would be pointless.

            These days, I think a lot of streaming is TCP-based and clients don't expect even RTP packets to be out-of-order.

            Anyway, thanks for the details.



            • #66
              Originally posted by coder View Post
              Yes, but the lack of ordering guarantees doesn't necessarily mean the implementation is trying to reorder them efficiently.

              A long time ago, I had the idea to write an RTSP server that stored media streams pre-packetized, with frames (or frame slices) aligned on disk block boundaries. You could issue async reads and, with some careful bookkeeping, send out data as the reads complete. You can take care to assign the packet sequence numbers so that the client will reorder them correctly upon reception, even though they're actually sent out of order. This assumes that, in actual practice, sequential reads from a single file would complete out of order. Otherwise, the whole scheme would be pointless.

              These days, I think a lot of streaming is TCP-based and clients don't expect even RTP packets to be out-of-order.

              Anyway, thanks for the details.
              Reading the man page of io_uring_setup, I think the internals work like this:

              If polling mode (IORING_SETUP_IOPOLL) is enabled (it only allows direct access), the ring polls the first request for completion; if it's not ready, it polls the next one, and so on.
              However, this mode probably isn't widely used, as it disallows non-direct access.

              So the majority will probably use the default mode where, according to my understanding, the ring first polls (tries the operation without blocking); if that fails, it launches an async I/O operation and moves on to the next request.

              You can configure an individual request to skip the polling attempt and go straight to async I/O, though.
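
              If I understand correctly, that per-request knob is the IOSQE_ASYNC flag; a minimal, hedged sketch with liburing (the ring, fd, buffer, and offset are assumed to be set up by the caller):

              ```c
              #include <liburing.h>

              /* Sketch: queue one read that bypasses the inline non-blocking attempt
               * and is handed straight to the kernel's async worker path. */
              static int queue_forced_async_read(struct io_uring *ring, int fd,
                                                 void *buf, unsigned len, __u64 off)
              {
                  struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

                  if (!sqe)
                      return -1;                            /* submission queue is full */
                  io_uring_prep_read(sqe, fd, buf, len, off);
                  io_uring_sqe_set_flags(sqe, IOSQE_ASYNC); /* force the async path */
                  return io_uring_submit(ring);
              }
              ```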

