Announcement

**cl333r** · 30 October 2020, 02:31 AM

Jens Axboe notes in his tests that with this improvement the IO_uring performance in his test jumped from 1.38M requests/second up to 1.67M requests/s. Besides achieving 22% higher throughput, the CPU usage was also lower thanks to less locking.

I imagine it's a synthetic test; it doesn't magically make the threaded part of apps 22% faster, right? If so, the real-world benefit would be 1% or less, right?
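To put rough numbers on that question, here is a minimal sketch using Amdahl's law. The two throughput figures are the ones quoted above (they work out to about 1.21x); the shares of run time spent in the optimized path are purely hypothetical, since nobody in the thread has measured them for a real application.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only `fraction` of the
    run time gets faster by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Figures quoted from the article (requests per second).
before, after = 1.38e6, 1.67e6
local = after / before  # about 1.21x on the benchmark itself

# Hypothetical shares of run time spent in the optimized io_uring path.
for fraction in (0.01, 0.05, 0.25, 1.0):
    gain = (overall_speedup(fraction, local) - 1.0) * 100
    print(f"{fraction:>4.0%} of time in the optimized path -> {gain:.1f}% faster overall")
```

Under these assumptions an app would need to spend only about 5% of its time in that path for the overall gain to land just under 1%, so the guess above is plausible for anything that isn't dominated by IO submission.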

**indepe** · 30 October 2020, 02:46 AM

Besides achieving 22% higher throughput, the CPU usage was also lower thanks to less locking.

I'm not sure that lower CPU usage is a good thing in this case. Although it suggests that throughput could be optimized even further, on a computer that doesn't have anything else to do, the CPUs should not remain unused part of the time (on a test that runs on localhost, I assume).

**Volta** · 30 October 2020, 04:28 AM

Originally posted by indepe

I'm not sure that lower CPU usage is a good thing in this case. Although it suggests that throughput could be optimized even further, on a computer that doesn't have anything else to do, the CPUs should not remain unused part of the time (on a test that runs on localhost, I assume).

So higher CPU usage, more context switches and lower throughput is better for you?

**indepe** · 30 October 2020, 04:58 AM

Originally posted by Volta

So higher CPU usage, more context switches and lower throughput is better for you?

I kind of expected this would not go over easily....

What matters here is throughput, and the lower CPU usage is a sign that an even higher throughput may be possible. In other words, although the new version is better than the old version, it probably has some kind of problem.

The lower CPU usage is an indication that something is non-optimal, since it is not a good thing to leave CPUs inactive if/when they could do something useful.

**Volta** · 30 October 2020, 05:47 AM

Originally posted by indepe

I kind of expected this would not go over easily....

What matters here is throughput, and the lower CPU usage is a sign that an even higher throughput may be possible. In other words, although the new version is better than the old version, it probably has some kind of problem.

The lower CPU usage is an indication that something is non-optimal, since it is not a good thing to leave CPUs inactive if/when they could do something useful.

I knew what you were thinking. The CPU has more time now for throughput, but maybe it has hit some other (hardware?) limit.

Because of the shared ring buffers between the kernel and user space, io_uring can be a zero-copy system. Copying bytes around becomes necessary when system calls that transfer data between kernel and user space are involved. But since the bulk of the communication in io_uring is via buffers shared between the kernel and user space, this huge performance overhead is completely avoided.

With some clever use of shared ring buffers, io_uring performance is really memory-bound, since in polling mode, we can do away with system calls altogether.

What is io_uring? — Lord of the io_uring documentation

https://unixism.net/loti/what_is_io_uring.html
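To illustrate the shared-ring idea from that quote, here is a toy model in plain Python. It is only a sketch: the `Ring` class is hypothetical, the "kernel" is just a loop in the same process, and real io_uring rings live in memory mapped into both sides and rely on memory barriers that Python does not model. What it does show is the head/tail bookkeeping that lets a producer and a consumer exchange entries without a call per entry.

```python
class Ring:
    """Toy single-producer/single-consumer ring with a power-of-two size."""

    def __init__(self, entries: int) -> None:
        assert entries > 0 and entries & (entries - 1) == 0, "size must be a power of two"
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # advanced only by the consumer
        self.tail = 0  # advanced only by the producer

    def push(self, item) -> bool:
        if self.tail - self.head == len(self.slots):
            return False  # ring is full
        self.slots[self.tail & self.mask] = item
        self.tail += 1  # publish the entry to the consumer
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        item = self.slots[self.head & self.mask]
        self.head += 1  # hand the slot back to the producer
        return item


# Submission queue: the application produces, the "kernel" consumes.
# Completion queue: the "kernel" produces, the application consumes.
sq, cq = Ring(4), Ring(4)

for request in ("read A", "read B", "write C"):
    sq.push(request)

# Stand-in for the kernel side draining submissions and posting completions.
while (req := sq.pop()) is not None:
    cq.push(f"done: {req}")

results = []
while (res := cq.pop()) is not None:
    results.append(res)
print(results)
```

Each side only ever writes its own index, which is why this kind of queue needs so little locking.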

**indepe** · 30 October 2020, 07:30 AM

Originally posted by Volta

I knew what you were thinking. The CPU has more time now for throughput, but maybe it has hit some other (hardware?) limit.

I certainly wasn't thinking that the old version would be better.

Originally posted by Volta

https://unixism.net/loti/what_is_io_uring.html

The same paragraph that talks about io_uring performance being memory-bound also mentions that io_uring is capable of 1.7M 4k IOPS. If that's true, I'd be surprised if processing 1.7M echo packets ran into a memory bottleneck or any other hardware limit, but I wouldn't really know.
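A back-of-the-envelope comparison of the two workloads, as a sketch only. The 1.7M IOPS and 1.67M requests/s figures come from the linked page and the article; the 64-byte echo payload is my assumption, since the packet size isn't stated anywhere here.

```python
IOPS = 1.7e6            # figure from the linked documentation
BLOCK_BYTES = 4 * 1024  # "4k" taken as 4 KiB

ECHO_RATE = 1.67e6      # requests/s from the article
ECHO_BYTES = 64         # assumed payload size, not stated in the thread


def gb_per_second(rate: float, size_bytes: int) -> float:
    """Payload bandwidth in GB/s (10**9 bytes) for `rate` operations per second."""
    return rate * size_bytes / 1e9


print(f"4k IOPS payload: {gb_per_second(IOPS, BLOCK_BYTES):.2f} GB/s")
print(f"echo payload:    {gb_per_second(ECHO_RATE, ECHO_BYTES):.3f} GB/s")
```

That is roughly 7 GB/s of payload for the 4k case against about 0.1 GB/s for small echo packets. It only counts payload bytes once and ignores per-request bookkeeping, so it says nothing certain about where the real bottleneck is.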

**Volta** · 30 October 2020, 09:05 AM

Originally posted by indepe

I certainly wasn't thinking that the old version would be better.

What the article seems to say is that the less the CPU gets in the way, the higher the throughput. You're expecting the CPU to speed things up, but it seems the CPU is not what limits performance in this case.

**Jumbotron** · 30 October 2020, 10:19 AM

Well...I for one like ANY bit of uplift I can get, because it's synergistic with the other pieces of the kernel that are also getting optimized, and with things outside the kernel, such as the improvements to Mesa that Marek has just sent in.

Added together...2-5% here....2-5% there...the synergistic effect is, hopefully, greater than the individual parts.

**ix900** · 30 October 2020, 10:44 AM

Looks too much like IO urin. Someone picked a winner there.

Linux 5.11 To Land Optimization That Helps IO_uring Performance