Linux Kernel Getting io_uring To Deliver Fast & Efficient I/O


  • starshipeleven
    replied
    Originally posted by polarathene View Post
    I thought there was quite a bit of discussion about how FreeBSD handles network I/O better than Linux? (no links, or specifics that I can recall though)
    Lately, every time a certain someone has stated that, someone else has posted a benchmark where Linux either trades blows with or outright destroys BSD on networking with 10Gbit cards.



  • polarathene
    replied
    Originally posted by jabl View Post

    Efficient network I/O was already solved a long time ago with epoll. This patchset (io_uring) delivers efficient asynchronous file I/O, including buffered file I/O, which wasn't possible with the old file AIO interface (io_submit() etc.).
    I thought there was quite a bit of discussion about how FreeBSD handles network I/O better than Linux? (no links, or specifics that I can recall though)



  • rene
    replied
    Originally posted by oiaohm View Post

    The problem with your idea is that it has already been tried in the Linux kernel and did not provide anywhere near the expected performance boost. The Linux kernel is nowhere near as simple as you think it is. [...]
    A short note to a long story: the i386 and later have an I/O permission bitmap for supposedly fine-grained I/O permission control. Modern hardware mostly doesn't use classic I/O ports anyway; with (nearly) everything memory-mapped, ring 3 drivers can drive the hardware just fine without any extra I/O context switching. QNX is also quite fast, and was even a decade ago, so it is not as though it is impossible to do. More elegant architectures and algorithms can vastly improve performance too; look at the current graphics subsystem's performance, for example.
    Last edited by rene; 02-15-2019, 05:52 AM.



  • jabl
    replied
    Originally posted by polarathene View Post

    Oh? So this won't help improve network I/O much?
    Efficient network I/O was already solved a long time ago with epoll. This patchset (io_uring) delivers efficient asynchronous file I/O, including buffered file I/O, which wasn't possible with the old file AIO interface (io_submit() etc.).
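
    To make the contrast concrete, here is a minimal sketch of the readiness model epoll provides: register a descriptor, then block until the kernel reports it ready. It is demonstrated on a self-pipe, with error handling trimmed; the helper name `wait_readable` is made up for illustration, and this is Linux-only.

    ```c
    // Sketch of the epoll readiness model: add an fd to an epoll
    // instance, then block in epoll_wait() until it becomes readable.
    #include <assert.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    /* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout. */
    int wait_readable(int fd, int timeout_ms) {
        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
        epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
        struct epoll_event out;
        int n = epoll_wait(ep, &out, 1, timeout_ms);  /* blocks here */
        close(ep);
        return n > 0;
    }

    int main(void) {
        int p[2];
        pipe(p);
        write(p[1], "x", 1);  /* make the read end ready */
        printf("readable: %d\n", wait_readable(p[0], 100));
        return 0;
    }
    ```

    Note this only tells you *when* an fd is ready; the read/write itself is still a separate syscall, which is the per-operation overhead io_uring's submission/completion rings are designed to amortize.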



  • polarathene
    replied
    Originally posted by jpg44 View Post

    I've written servers before. Often network code uses a select loop and non-blocking I/O with buffering. When the buffer is full, writes will not succeed, so you keep the data queued and use select to watch for writability. AIO is also non-blocking and queues data in a buffer, and it notifies you when data hits the disk. That is useful for disk writes when you need to make sure the data actually reached the disk, which can be important for databases.
    Oh? So this won't help improve network I/O much?



  • oiaohm
    replied
    Originally posted by rene View Post
    And I was just discussing bundled system calls for improved multi-server microkernel performance on last night's livestream:
    The problem with your idea is that it has already been tried in the Linux kernel and did not provide anywhere near the expected performance boost. The Linux kernel is nowhere near as simple as you think it is.

    https://lwn.net/Articles/755919/

    bpfilter is one of the next generation of Linux kernel drivers. It mixes user-space code, kernel-mode code and in-kernel BPF in a single Linux .ko driver. The Linux kernel is turning into a strange form of hybrid.

    eBPF is an auditable, deliberately Turing-incomplete language that the kernel JITs to native code and runs in kernel space. It provides many times the performance boost that bundling syscalls can, as shown by eBPF and bundled syscalls both being used to try to speed up FUSE under Linux. The reason is that some basic logic can be performed kernel-side, so with BPF complete event runs can finish without any context switches.

    You also miss one of the big causes of context switching in microkernels. There are many operations a driver performs that genuinely need ring 0. This is not a want; it is a need. The IOPL (I/O privilege level) flag does not grant rings other than ring 0 the right to modify memory permissions and do the other things drivers must do with DMA-driven hardware, so those ring 0 transitions happen a lot.

    With a microkernel core at ring 0 and drivers at, say, ring 1/ring 2 under the IOPL flag, performance gets wrecked by memory-permission operations that must happen at ring 0, each forcing a mandatory context switch. That makes the Spectre performance losses look minor.

    The microkernel core could instead run as a hypervisor at ring -1, with each driver in its own ring 0 VM, but then the hypervisor transition overhead kills you. The arrangement that does perform is a microkernel acting as a watchdog at ring -1 over a big monolithic blob at ring 0.

    The CPUs we have today are not designed to run microkernels effectively. The Linux kernel's hybrid experiments might show a way out.

    https://access.redhat.com/articles/3311301
    The big thing you have missed is that the Linux kernel did add Spectre and other CPU-bug mitigations, but it also includes flags to turn them off for those who want speed.

    This is the hard bit of designing an OS: security and performance are at times mutually exclusive, so you have to let the user choose which one they need more of. The Spectre overhead as an argument for a microkernel therefore does not fly. Linus is willing to take a performance overhead for security as long as there is an off flag to reclaim the lost performance by decreasing security. That is where the microkernel fails: it is designed for security, so you are stuck with the overhead whether or not it suits your current problem.

    Last edited by oiaohm; 02-14-2019, 07:47 PM.



  • jpg44
    replied
    Originally posted by polarathene View Post
    So this is an I/O perf improvement for disk, memory and network? At least I think those three (and any other kinds) are all handled differently.

    Do applications have to specifically utilize it? I suppose if the application offloads I/O to a lib that handles it (which the dev may not need to do anything platform-specific to use), then it's a free improvement (as in no extra work required)? E.g. apps on KDE that use KIO, perhaps? (Assuming libs like KIO would first need to add support before their dependents benefit.)
    I've written servers before. Often network code uses a select loop and non-blocking I/O with buffering. When the buffer is full, writes will not succeed, so you keep the data queued and use select to watch for writability. AIO is also non-blocking and queues data in a buffer, and it notifies you when data hits the disk. That is useful for disk writes when you need to make sure the data actually reached the disk, which can be important for databases.
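
    The select-loop pattern described above can be sketched roughly as follows: a non-blocking descriptor, a write that may fail with EAGAIN when the kernel buffer is full, and select() to wait for writability before retrying. The helper name `write_all` is made up for illustration; error handling is minimal.

    ```c
    // Sketch of non-blocking writes with select()-based writability
    // waiting, as used in classic server event loops.
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/select.h>
    #include <unistd.h>

    /* Write all of buf, waiting with select() whenever the fd is full. */
    ssize_t write_all(int fd, const char *buf, size_t len) {
        size_t done = 0;
        while (done < len) {
            ssize_t n = write(fd, buf + done, len - done);
            if (n > 0) {
                done += n;
            } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                fd_set wfds;  /* kernel buffer full: block until writable */
                FD_ZERO(&wfds);
                FD_SET(fd, &wfds);
                select(fd + 1, NULL, &wfds, NULL, NULL);
            } else if (n < 0) {
                return -1;
            }
        }
        return (ssize_t)done;
    }

    int main(void) {
        int p[2];
        pipe(p);
        fcntl(p[1], F_SETFL, O_NONBLOCK);  /* non-blocking write end */
        printf("wrote %zd bytes\n", write_all(p[1], "hello", 5));
        return 0;
    }
    ```

    io_uring collapses this "try, fail, wait, retry" cycle into a single queued submission that completes asynchronously.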



  • polarathene
    replied
    So this is an I/O perf improvement for disk, memory and network? At least I think those three (and any other kinds) are all handled differently.

    Do applications have to specifically utilize it? I suppose if the application offloads I/O to a lib that handles it (which the dev may not need to do anything platform-specific to use), then it's a free improvement (as in no extra work required)? E.g. apps on KDE that use KIO, perhaps? (Assuming libs like KIO would first need to add support before their dependents benefit.)



  • discordian
    replied
    Originally posted by treba View Post
    This sounds really exciting. Is it meant for general-purpose use or rather for specific use cases? Like, would it make sense for a file manager to use it for copying? Or for Firefox for profile data / on-disk cache?
    Primarily for serving multiple outstanding I/Os with varying speeds/bottlenecks, which mostly means server workloads like sending files over the net.
    File managers should use `sendfile`; caches are ideally memory-mapped.
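
    As a concrete illustration of the `sendfile` suggestion: the kernel copies the data fd-to-fd with no userspace buffer in between. This is a Linux-only sketch with minimal error handling; the helper name `copy_with_sendfile` is made up for illustration.

    ```c
    // Sketch of an in-kernel file copy via sendfile(2): no read/write
    // round-trips through a userspace buffer.
    #include <assert.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Copy src to dst in-kernel; returns bytes copied, or -1 on error. */
    long copy_with_sendfile(const char *src, const char *dst) {
        int in = open(src, O_RDONLY);
        if (in < 0) return -1;
        struct stat st;
        fstat(in, &st);
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0) { close(in); return -1; }
        long total = 0;
        while (total < st.st_size) {  /* kernel moves the bytes */
            ssize_t n = sendfile(out, in, NULL, st.st_size - total);
            if (n <= 0) break;
            total += n;
        }
        close(in);
        close(out);
        return total;
    }

    int main(void) {
        int fd = open("/tmp/sf_src", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        write(fd, "hello sendfile\n", 15);
        close(fd);
        printf("copied %ld bytes\n", copy_with_sendfile("/tmp/sf_src", "/tmp/sf_dst"));
        return 0;
    }
    ```

    (sendfile into a regular file requires Linux 2.6.33 or later; older kernels only allowed a socket as the output fd.)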



  • danger
    replied
    The name of the ring sounds like a joke.

