X.Org Server Hit By New Local Privilege Escalation, Remote Code Execution Vulnerabilities

  • #81
    Originally posted by oiaohm View Post
    The fast IPC in the Linux kernel is not a few hundred cycles, it's 24 cycles. https://www.usenix.org/system/files/atc20-gu.pdf Yes, this 24 cycles is in figure 5, page 409. --- The reality is that when it comes to cross-server IPC, the Linux kernel is insanely fast.
    I sure do hope you realize that you're literally looking at a function call's overhead. This is not the cost of IPC, but rather, the advantage of not doing IPC. A function call doesn't cost any more (or less) to do within a microkernel.

    But oftentimes you have to do IPC instead of a function call. This is true even on a Linux system, which runs the kernel in supervisor mode so the kernel itself doesn't need IPC, but still runs everything else as user mode processes, and these will need to talk to each other.

    Multi-server systems will, by design, need to do a lot of IPC relative to non-multiserver systems. The effect this has on performance is a function of the increase in IPC and the cost of IPC.
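
    To make the distinction concrete, here is a toy measurement sketch (not the paper's methodology, and the absolute numbers are hypothetical and vary wildly by CPU and kernel) comparing a direct function call to a pipe-based IPC round trip between two processes:

```c
/* Toy comparison: a plain function call vs. a pipe-based IPC round trip
 * between two processes. Only the gap between the two numbers matters. */
#include <stdint.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100000

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* noinline so the compiler cannot optimize the call away entirely */
__attribute__((noinline)) static int work(int x) { return x + 1; }

int main(void) {
    int p2c[2], c2p[2];
    if (pipe(p2c) || pipe(c2p)) { perror("pipe"); return 1; }

    volatile int acc = 0;                 /* baseline: direct calls */
    uint64_t t0 = now_ns();
    for (int i = 0; i < ITERS; i++) acc = work(acc);
    uint64_t call_ns = now_ns() - t0;

    if (fork() == 0) {                    /* child: 1-byte echo server */
        char c;
        while (read(p2c[0], &c, 1) == 1) write(c2p[1], &c, 1);
        _exit(0);
    }

    char b = 'x';
    t0 = now_ns();
    for (int i = 0; i < ITERS; i++) {     /* each iteration: 2 context switches */
        write(p2c[1], &b, 1);
        read(c2p[0], &b, 1);
    }
    uint64_t ipc_ns = now_ns() - t0;
    close(p2c[1]);                        /* child sees EOF and exits */
    wait(NULL);

    printf("function call:   %6.1f ns/iter\n", (double)call_ns / ITERS);
    printf("pipe round trip: %6.1f ns/iter\n", (double)ipc_ns / ITERS);
    return 0;
}
```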

    Yes, the seL4 claim about being the fastest microkernel is also disproved in that 2020 paper I have referenced.
    Refer to https://sel4.systems/About/seL4-whitepaper.pdf , Chapter 6 for the actual claim.

    Notice that seL4 is a formally proven microkernel, whereas ChCore/UnderBridge is but a paper and PoC.

    I can be faster than seL4 too; it doesn't actually take much effort to make a PoC with a constrained scenario. However, designing a kernel that is fast in general and practical for the industry to use is not trivial.

    The microkernel world is very guilty of attempting to sweep its issues under the rug and hoping nobody notices. Sorry, a party like me notices.
    ...
    The reality is that Linus's 1992 statement on microkernel IPC being horribly slow is still just as true today.
    ...
    Yes, it's not well hidden when you know what you are looking at.
    Yes, the whole "microkernel world" is conspiring to hide the truth, but Linus Torvalds, our Lord and Savior, is always right, especially when it comes to microkernels, a topic he is an expert on.

    https://www.cosy.sbg.ac.at/~clausen/...-rebuttal.html

    Totally not true. What you just described is what the real-time patch set to the Linux kernel provided and what has been progressively merged.
    Uh, do read my claim again.

    Linux doesn't have any fast paths with guaranteed maximum latency. Everything is best effort.
    That is absolutely true. Even if you use the realtime patch (as I do), Linux, with or without the realtime patch, cannot guarantee latencies. It will allow soft realtime scenarios, but it is completely unsuitable for hard realtime, which requires these guarantees.

    ayumu, like it or not, you are writing a huge stack of stuff not based in fact.
    You're calling me a liar, but you're not providing even a hint of counter evidence to support your claim.

    But do you think that the Linux kernel and other monolithic kernels have not improved IPC over the years? This is the catch: Linux kernel IPC cost has decreased faster than microkernel IPC cost.
    Thanks to the addition of fine-grained locking, pretty much every path in the Linux kernel is now both significantly longer and dramatically less deterministic. Linux was way, way better at this before the huge SMP rework.
    Last edited by ayumu; 23 July 2022, 04:59 PM.



    • #82
      Originally posted by ayumu View Post
      I sure do hope you realize that you're literally looking at a function call's overhead. This is not the cost of IPC, but rather, the advantage of not doing IPC. A function call doesn't cost any more (or less) to do within a microkernel.
      Except that 24 cycles is not the function call price at all; 24 cycles is the IPC cost in Linux between kernel threads. There are methods like RCU in the mix as well.

      Function call overhead is 4 to 12 cycles, about 7 on x86. Sorry, Linux in-kernel IPC is not merely twice as expensive as a function call, as the 12-cycle figure might suggest; it's about 3 to 4 times more expensive than doing a function call. That's the thing: your microkernel IPC is not coming in at 3 to 4 times the cost of a function call.

      ayumu, really, no. The argument you just made is the common microkernel-person argument: that the 24 cycles has to be function call overhead because it cannot be IPC. The problem is that 24 cycles is Linux in-kernel-memory-space IPC. Remember that Linux kernel memory space has NUMA in it, so the monolithic kernel in kernel mode is not in fact operating on a single solid memory shared between CPUs. With SMP the monolithic kernel gained threads, and with NUMA it gained memory that is separated per CPU core. With SMP you have to have IPC in supervisor mode, like it or not.

      Originally posted by ayumu View Post
      But oftentimes you have to do IPC instead of a function call. This is true even on a Linux system, which runs the kernel in supervisor mode, and everything else in user mode.
      You have just completely forgotten that Linux kernel threads run in different NUMA domains on SMP systems. Rings (circular buffers) and RCU are very popular Linux kernel IPC systems, due to having very low overheads. Once you are SMP and NUMA, that supervisor mode is no longer a single piece.
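
      For illustration, a toy single-producer/single-consumer ring along the lines described here might look like the sketch below; the kernel's actual kfifo and the io_uring rings are considerably more involved:

```c
/* Toy single-producer/single-consumer ring buffer with C11 atomics,
 * sketching why ring-based IPC is cheap: the fast path is a couple of
 * loads and stores, with no syscall and no lock. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 256   /* power of two so masking replaces modulo */

struct ring {
    _Atomic size_t head;      /* advanced only by the producer */
    _Atomic size_t tail;      /* advanced only by the consumer */
    int slots[RING_SIZE];
};

static bool ring_push(struct ring *r, int v) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                        /* full */
    r->slots[head & (RING_SIZE - 1)] = v;
    /* release: the slot write is visible before the new head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static bool ring_pop(struct ring *r, int *v) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return false;                        /* empty */
    *v = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```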

      Originally posted by ayumu View Post
      Multi-server systems will, by design, need to do a lot of IPC relative to non-multiserver systems. The effect this has on performance is a function of the increase in IPC and the cost of IPC.
      Again, this has you presuming the Linux kernel is something it's not. Have you not noticed that kthreads appear in the Linux kernel process list (in [] brackets, of course)? The reality is that these are really processes, all running in supervisor mode memory. Remember what IPC stands for: inter-process communication. The fact that you have multiple processes in Linux supervisor mode all communicating with each other is IPC. The Linux kernel is monolithic in having a single memory domain for kernel space, but it is also not a non-multiserver system. There are in fact multiple processes in Linux supervisor mode, and many of those are what OS development calls servers.

      The reality here is that Linux kernel supervisor mode has a lot in common with a NOMMU multiserver system. It's basically the same shortcut the Windows hybrid kernel pulls. Of course, putting everything in a NOMMU/single memory domain basically removes the protections an MMU can give. The trade-off here is losing security for performance.

      Going across memory domains is expensive. By the way, being a multiserver system does not require you to have a microkernel. The horrible reality is that VMS is monolithic and also multiserver, with no common shared IPC. VMS development started on NOMMU systems.

      Yes, the multiserver idea did not start with microkernels; it started with monolithic kernels. The microkernel is an attempt to formalise/improve the monolithic kernel multiserver solutions. The major goals of the microkernel were to improve security and reduce coding complexity.

      Originally posted by ayumu View Post
      That is absolutely true. Even if you use the realtime patch (as I do), Linux, with or without the realtime patch, cannot guarantee latencies. It will allow soft realtime scenarios, but it is completely unsuitable for hard realtime, which requires these guarantees.
      This is you not having understood what the Linux kernel realtime patch set requires to work well. The Linux realtime patch set does not give you guaranteed latencies on all hardware, but there is hardware where you do get guaranteed latencies with the Linux kernel real-time patches.

      Originally posted by ayumu View Post
      Thanks to the addition of fine-grained locking, pretty much every path in the Linux kernel is now both significantly longer and dramatically less deterministic. Linux was way, way better at this before the huge SMP rework.
      There is one big problem with this statement: the fine-grained locking comes from the real-time work on the Linux kernel.

      Making the locking fine-grained is what allows preemption to do hard real-time. It's the fine-grained locking that allows background processes to be stalled intentionally in a lock. So the Linux kernel realtime patchset gets more deterministic the more fine-grained the locking is. Guess what a big cause of Linux real-time behaviour not being deterministic is: areas of coarse locking making the stall values larger. Yes, platform driver locking quality is very important to how well or how badly the Linux kernel behaves in real-time.
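
      A userspace analogue of what those fine-grained, priority-aware locks buy you, assuming a POSIX system (PREEMPT_RT does the equivalent for in-kernel locks):

```c
/* Sketch: a mutex with priority inheritance, so a low-priority holder
 * gets temporarily boosted while a high-priority task waits on the
 * lock, bounding the stall instead of allowing unbounded inversion. */
#include <pthread.h>

int main(void) {
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;

    pthread_mutexattr_init(&attr);
    /* Without PRIO_INHERIT, a medium-priority task can preempt the
     * low-priority lock holder indefinitely: classic priority inversion. */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&lock, &attr);

    pthread_mutex_lock(&lock);
    /* ... short, bounded critical section: the finer the locking, the
     * shorter the worst-case stall imposed on higher-priority tasks ... */
    pthread_mutex_unlock(&lock);

    pthread_mutex_destroy(&lock);
    pthread_mutexattr_destroy(&attr);
    return 0;
}
```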



      • #83
        Originally posted by oiaohm View Post
        RCU are very popular Linux kernel IPC systems.
        Shared memory is not exclusive to Linux or monolithic kernels. It is, indeed, available to microkernels, too.

        But the best way to use shared memory is to design so that you don't have to use it at all. There are a lot of problems with using shared memory. There's the need for synchronization, which is a source of non-determinism and does not scale. There is also the complexity it introduces. Shared memory is infamous, and it very much deserves to be. It literally breeds bugs.

        RCU is, sometimes, a clever use of shared memory. If you are reading, RCU is indeed fast. If you are reading, that is.
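
        As a heavily simplified userspace sketch of why the read side is so cheap (this is not real RCU: the grace-period machinery is elided, the old copy is deliberately leaked, and "struct config" is a made-up example structure):

```c
/* Simplified read-copy-update pattern with C11 atomics. The point is
 * only that a reader pays a single atomic pointer load; all the cost
 * (copying, publishing, reclamation) lands on the updater. */
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct config { int rate_limit; int verbose; };

static _Atomic(struct config *) current_config;

static void config_init(void) {
    struct config *c = calloc(1, sizeof(*c));
    c->rate_limit = 100;
    atomic_store(&current_config, c);
}

/* Read side: one atomic load, no lock, scales to any number of readers. */
static int get_rate_limit(void) {
    struct config *c = atomic_load_explicit(&current_config,
                                            memory_order_acquire);
    return c->rate_limit;
}

/* Update side: copy, modify, publish. Real RCU would now wait out all
 * pre-existing readers before freeing the old copy. */
static void set_rate_limit(int limit) {
    struct config *old = atomic_load(&current_config);
    struct config *fresh = malloc(sizeof(*fresh));
    memcpy(fresh, old, sizeof(*fresh));
    fresh->rate_limit = limit;
    atomic_store_explicit(&current_config, fresh, memory_order_release);
    /* synchronize_rcu(); free(old);  <- what real RCU does; leaked here */
    (void)old;
}

int main(void) {
    config_init();
    set_rate_limit(42);
    return get_rate_limit() == 42 ? 0 : 1;
}
```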

        The reality here is that Linux kernel supervisor mode has a lot in common with a NOMMU multiserver system. It's basically the same shortcut the Windows hybrid kernel pulls. Of course, putting everything in a NOMMU/single memory domain basically removes the protections an MMU can give. The trade-off here is losing security for performance.
        Yup. And Apple deals with Mach's slowness similarly. For Linux, it is even possible to go all the way and run everything in kernel mode. That's what Kernel Mode Linux does.

        In many cases, you might as well. Because in many scenarios, Linux turns out to be bigger than the whole of userspace you have, so you'd not be losing that much more security or reliability by doing so.

        But while that sort of raw speed you can get from disregarding isolation can be useful, it seldom is. Because in most practical scenarios, there's a need for security or reliability. Unfortunately, it's often both.

        Nevermind the high assurance world, we need the workstations and desktops we use to be secure and reliable. And the currently popular desktop operating systems are not up to the task. We are used to putting up with these shortcomings. But it could be done better. And it will be.

        Going across memory domains is expensive. By the way, being a multiserver system does not require you to have a microkernel. The horrible reality is that VMS is monolithic and also multiserver, with no common shared IPC. VMS development started on NOMMU systems.

        Yes, the multiserver idea did not start with microkernels; it started with monolithic kernels. The microkernel is an attempt to formalise/improve the monolithic kernel multiserver solutions. The major goals of the microkernel were to improve security and reduce coding complexity.
        Yes. They are separate things. That's why I am so verbose and say "Microkernel" and "multiserver" all the time.

        However, it is no coincidence that multiserver systems tend to be implemented on top of microkernels; when you're doing everything with IPC, you'll want a kernel that gives you the right tools for that, is optimized for the purpose, and doesn't force any other bloat upon you. Mach tried to do this, but it was slow.

        Liedtke's research found how to do that correctly and provided such a kernel (L4). Decades of industry use of microkernels provided several iterations to polish these ideas further. seL4 is the state of the art.

        This is you not having understood what the Linux kernel realtime patch set requires to work well. The Linux realtime patch set does not give you guaranteed latencies on all hardware, but there is hardware where you do get guaranteed latencies with the Linux kernel real-time patches.
        If you need hard realtime, you're going to need both deterministic hardware (like e.g. the ARM Cortex-R series or Imatech's recent RISC-V realtime cores) and a deterministic RTOS, such as seL4.

        The realtime patchset does not provide guarantees of any sort. It improves Linux behaviour dramatically, but it does not make Linux deterministic. It is thus not suitable for Hard Realtime.

        Making the locking fine-grained is what allows preemption to do hard real-time. It's the fine-grained locking that allows background processes to be stalled intentionally in a lock. So the Linux kernel realtime patchset gets more deterministic the more fine-grained the locking is. Guess what a big cause of Linux real-time behaviour not being deterministic is: areas of coarse locking making the stall values larger. Yes, platform driver locking quality is very important to how well or how badly the Linux kernel behaves in real-time.
        The realtime patchset abuses (that's the word) the SMP locking mechanisms to increase preemption points.

        This does help with latency. It helps a lot. It can be easily measured by using cyclictest from rt-tests, in the SCHED_FIFO or SCHED_RR scheduling class. The average and, more importantly, max columns drop dramatically relative to the mainline kernel's behavior. And a desktop benefits, because audio latency can now be reasonably low (2-3ms) without xruns, rather than outright bad (>20ms).
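
        A stripped-down sketch of what cyclictest measures, assuming root (or CAP_SYS_NICE) for the RT priority; cyclictest itself does far more:

```c
/* Minimal cyclictest-style wakeup latency probe: a SCHED_FIFO task
 * sleeps to an absolute deadline, then measures how late it woke up. */
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define INTERVAL_NS 1000000L   /* 1 ms period */
#define LOOPS 10000

int main(void) {
    struct sched_param sp = { .sched_priority = 80 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp))
        perror("sched_setscheduler");       /* needs root / CAP_SYS_NICE */
    mlockall(MCL_CURRENT | MCL_FUTURE);     /* avoid page-fault latency */

    struct timespec next, now;
    clock_gettime(CLOCK_MONOTONIC, &next);
    long max_ns = 0;

    for (int i = 0; i < LOOPS; i++) {
        next.tv_nsec += INTERVAL_NS;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        long lat = (now.tv_sec - next.tv_sec) * 1000000000L
                 + (now.tv_nsec - next.tv_nsec);
        if (lat > max_ns) max_ns = lat;     /* the column that matters */
    }
    printf("max wakeup latency: %ld us\n", max_ns / 1000);
    return 0;
}
```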

        But... that's just soft realtime.

        Unfortunately, hard realtime couldn't care less about measured averages or maximums. It only cares about having formal proofs of worst case execution time (WCET). The complexity of these proofs does not grow linearly with the complexity of the code; unfortunately, it explodes very quickly.

        Adding "soft realtime" support to Linux makes it more complex and less tractable from the hard realtime PoV. But yes, this really is a pointless statement, knowing that in Linux, the "providing proof" problem it is not tractable to begin with. Not even by a far stretch.

        The industry has mostly dealt with complex hard realtime scenarios in two ways: dedicated hardware (a CPU and entire system for each task) and strict time division (for 2 realtime tasks in the same system, they would get 50% reservation each minus system overhead, whether they use it or not... the system will just idle for the remainder of the slot). seL4 allows full utilization, and it also allows running critical and non-critical tasks in the same system (the new seL4 "MCS kernel", which will be the seL4 mainline soon). This is a game changer.

        In the desktop scenario, we're used to not having any of this, and to putting up with the limitations of the systems we use. For that matter, Linux with the realtime patchset is very survivable, and makes non-toy audio use possible; I've been using the patchset for a good decade now, with no major issues.
        Last edited by ayumu; 23 July 2022, 09:37 PM.



        • #84
          Originally posted by ayumu View Post
          Shared memory is not exclusive to Linux or monolithic kernels. It is, indeed, available to microkernels, too.

          But the best way to use shared memory is to design so that you don't have to use it at all. There are a lot of problems with using shared memory. There's the need for synchronization, which is a source of non-determinism and does not scale. There is also the complexity it introduces. Shared memory is infamous, and it very much deserves to be. It literally breeds bugs.

          RCU is, sometimes, a clever use of shared memory. If you are reading, RCU is indeed fast. If you are reading, that is.
          When talking security, not using shared memory kind of makes sense. When talking performance, you have to use shared memory; to be correct, zero-copy shared memory where possible. https://lwn.net/Articles/879724/ Yes, memory-based IPC.

          Also, io_uring, the userspace version, is slower than the equivalent kernel-to-kernel path on a CPU with an MMU. You know that page directory table thing that you have per process, plus one for kernel space? It has a performance cost in updating.
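
          A minimal sketch of the memory-based IPC setup being referenced, using POSIX shared memory (the name "/demo_ipc" is hypothetical, and all synchronization, the genuinely hard part, is omitted):

```c
/* Zero-copy shared memory setup with POSIX shm: two processes map the
 * same pages, so data is shared without being copied through the
 * kernel. Link with -lrt on older glibc. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/demo_ipc"   /* hypothetical name for this sketch */
#define SHM_SIZE 4096

int main(void) {
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, SHM_SIZE);

    char *buf = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* Another process opening "/demo_ipc" and mmap'ing it sees these
     * bytes directly: zero copies, but also zero built-in ordering. */
    strcpy(buf, "hello from producer");

    munmap(buf, SHM_SIZE);
    close(fd);
    shm_unlink(SHM_NAME);   /* cleanup for the demo */
    return 0;
}
```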

          Originally posted by ayumu View Post
          But while that sort of raw speed you can get from disregarding isolation can be useful, it seldom is. Because in most practical scenarios, there's a need for security or reliability. Unfortunately, it's often both.
          I like where you say most practical scenarios. The problem is there are a lot of cases where you need performance; not want it, but need it. The classic one is high-speed network processing. In these cases, if you are too slow in processing, you will basically be in a denial-of-service event.

          Yes, you just wrote security or reliability. The horrible point here, which those doing a lot of hard realtime lose sight of, is that there are use cases where performance is critical to getting reliability. Yes, the problem of a denial-of-service event from being too slow in processing.

          Originally posted by ayumu View Post
          Nevermind the high assurance world, we need the workstations and desktops we use to be secure and reliable. And the currently popular desktop operating systems are not up to the task. We are used to putting up with these shortcomings. But it could be done better. And it will be.
          Here you go again: secure and reliable. The reality is you cannot have 100 percent secure and 100 percent reliable. Reliable is this horrible thing where, depending on the case, you either need security to be reliable or performance to be reliable.


          Originally posted by ayumu View Post
          If you need hard realtime, you're going to need both deterministic hardware (like e.g. the ARM Cortex-R series or Imatech's recent RISC-V realtime cores) and a deterministic RTOS, such as seL4.
          https://trustworthy.systems/projects/TS/SMACCM/ Notice here that for the flight controller they are not using seL4 but eChronos. It turns out that when you need high-performance determinism, a microkernel does not work. Yes, eChronos is monolithic.

          Originally posted by ayumu View Post
          Unfortunately, hard realtime couldn't care less about measured averages or maximums. It only cares about having formal proofs of worst case execution time (WCET). The complexity of these proofs does not grow linearly with the complexity of the code; unfortunately, it explodes very quickly.

          Adding "soft realtime" support to Linux makes it more complex and less tractable from the hard realtime PoV. But yes, this really is a pointless statement, knowing that in Linux, the "providing proof" problem is not tractable to begin with. Not even by a far stretch.
          This is right and wrong. There are formal WCET proofs for the Linux kernel with real-time patches, but only on particular hardware with particular drivers. Soft realtime does in fact lower the WCET value of the hard realtime parts while increasing utilization.

          Originally posted by ayumu View Post
          The industry has mostly dealt with complex hard realtime scenarios in two ways: dedicated hardware (a CPU and entire system for each task) and strict time division (for 2 realtime tasks in the same system, they would get 50% reservation each minus system overhead, whether they use it or not... the system will just idle for the remainder of the slot). seL4 allows full utilization, and it also allows running critical and non-critical tasks in the same system (the new seL4 "MCS kernel", which will be the seL4 mainline soon). This is a game changer.
          This is no different from the plan to make SCHED_DEADLINE a cgroup option under the Linux kernel. Yes, to get full utilisation you need the soft realtime stuff, because like it or not, those tasks filling up the under-utilised CPU time cannot be deterministic.

          It's SCHED_DEADLINE under Linux that gives you the WCET. Yes, the Completely Fair Scheduler is used as the soft real-time filler, and that filler is dependent on fine locking so that it can fill in the small CPU time slices. Critical and non-critical are linked. Now, there is something useful here: if you know that a deadline task X that has completed before its WCET will need lock Y when it next starts up, you can place non-critical tasks that use that lock to run before the WCET is hit.
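
          For reference, entering SCHED_DEADLINE looks roughly like the example in the kernel's sched-deadline documentation; there is no glibc wrapper, so it goes through syscall(2). The runtime/deadline/period values below are made up for illustration:

```c
/* Sketch of entering SCHED_DEADLINE: the task declares a CPU budget
 * (runtime) per period, and the kernel admission-controls it. */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

struct sched_attr {          /* no glibc definition; laid out per uapi */
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;  /* ns of CPU time per period */
    uint64_t sched_deadline; /* relative deadline, ns */
    uint64_t sched_period;   /* period, ns */
};

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

int main(void) {
    struct sched_attr attr = {
        .size           = sizeof(attr),
        .sched_policy   = SCHED_DEADLINE,
        .sched_runtime  =  2 * 1000 * 1000,   /*  2 ms budget   */
        .sched_deadline = 10 * 1000 * 1000,   /* 10 ms deadline */
        .sched_period   = 10 * 1000 * 1000,   /* 10 ms period   */
    };
    /* The kernel runs an admission test: if the declared bandwidth
     * doesn't fit, this fails instead of silently overcommitting. */
    if (syscall(SYS_sched_setattr, 0, &attr, 0))
        perror("sched_setattr");

    /* ... periodic work, yielding at the end of each job ... */
    return 0;
}
```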

          See why fine locking is critical? The larger the code area a lock covers, the bigger the time slice required by items using that lock. So the coarser your locking, the lower your utilisation goes. There is a point where you have too much locking and you start losing utilisation again, but the reality is the Linux kernel is not at that point yet.

          Originally posted by ayumu View Post
          In the desktop scenario, we're used to not having any of this, and to putting up with the limitations of the systems we use. For that matter, Linux with the realtime patchset is very survivable, and makes non-toy audio use possible; I've been using the patchset for a good decade now, with no major issues.
          The reality here is that on particular hardware, the Linux kernel with the RT patch set truly does have a proper WCET for hard realtime. Yes, with formal proofs and all.

          Remember, before seL4 it was believed that it was not possible to formally prove even a microkernel in a time-effective way. So before that, all formally proved OSes had to lag years to decades behind the leading edge of OS development.

          ayumu, a lot of people like you pushing the idea of microkernels like to forget that the Linux kernel is at times a proper hard realtime OS. Yes, the Linux kernel is only a hard realtime OS with formal proofs on very select hardware. So the Linux kernel being a proper hard realtime OS with the correct formal proofs is absolutely possible. The problem is the same as what existed before seL4: at this stage it is too time-consuming for humans to do the formal proofs. Just like seL4 made tools to do formal proofs, the solution to the Linux problem here is tools. The open question is whether these tools can be made, and the reason it is an open question is the amount of processing time needed to perform the formal proof.

          The Linux/monolithic problem for hard realtime means we need better tools to perform the proofs on them. This explains the interest in Rust in the Linux kernel: Rust code reduces what you have to run formal proofs over, because the language forbids stacks of stupidity that C allows.

          A microkernel's lower complexity means less complex tools are needed to do formal proofs in a time-effective way.

          In fact, the Linux kernel's need for better programming languages and better tools can help seL4 and other microkernels do formal proofs more time-effectively.

          ayumu, when it comes to hard realtime, microkernels are not the be-all and end-all. A lot of RTOS kernels are in fact monolithic, particularly in cases where performance is absolutely critical. Yes, where data security is more critical than performance, you see microkernels with worse WCET.

          WCET values on RTOS kernels show the same thing: monolithic RTOSes have better WCET values than their microkernel counterparts. So the problem of how to validate a monolithic kernel does not go away. Of course, the smaller and hardware-restricted monolithic RTOS kernels are much simpler to validate/prove than the Linux kernel.

          When it comes to formally proving an OS, the Linux kernel is kind of the computer world's equivalent of Mount Everest. Remember, even climbing Mount Everest today you are not 100 percent sure to come back alive. At this stage, if we are truthful, we don't know if the formal proof problem of the Linux kernel is truly solvable, but we also don't know that it's not.

          Microkernel developers basically hold the belief that the formal proof problem of the Linux kernel cannot be solved in a time-effective way covering broad hardware. Of course, the problem is that the microkernel choice comes with a performance cost. Reducing the complexity is not a free ride.

