Intel Making Progress On Their "mOS" Modified Linux Kernel Running Lightweight Kernels


  • cyring
    replied
    Mercure Operating System



  • rolf
    replied
    Thanks chuckatkins. Fugaku runs another multi-kernel called McKernel, which is being developed by a team at RIKEN. We have collaborated with that team for five years and written papers together. The fundamental ideas of mOS and McKernel are the same: isolate some of the many cores available on a modern compute node and run a specialized kernel on them, tailored for a specific set of applications. The biggest difference between the two OSes is how we implemented them. mOS is compiled into the Linux kernel, while McKernel is a stand-alone lightweight kernel. A kernel module establishes communication between McKernel and Linux, while in mOS we get that for free. Both approaches have pros and cons.
    Accelerators are indeed a conundrum ;-) But even there, there might be things that we can do on the host side to help out. Particularly for applications that also make use of the host side capabilities, or don't benefit greatly from being offloaded.



  • chuckatkins
    replied
    Originally posted by rolf View Post
    The connection of mOS to ASCI Red and IBM's CNK is that a good number of people in the mOS team have come from Sandia Labs and IBM where they have worked on these operating systems in the past. There is no code from these earlier projects in mOS, but we are hoping to carry forward some of the spirit that went into these earlier efforts.
    Originally posted by rolf View Post
    In supercomputers with millions of cores it seems we don't need all of the OS functions on all these cores. That is where lightweight kernels come in.
    I figured as much. The IPDPS paper is great; thank you for that! I appreciate the balanced approach you took in it, really giving a deep dive into the shortcomings of mOS. The entire project is of course very reminiscent of the BlueGene OS environments, but those were always kind of a PITA to deal with, largely because they were a one-off thing. It's almost like you said "Hey, the BlueGene OS stuff was cool, but it kinda sucked too. What if we made a thing like it that didn't suck?"

    I expect this will be quite relevant on Fugaku, since it follows the same many-core, no-accelerator pattern with ARM as the KNL machines did. I'm curious, though, how these performance gains will translate to the upcoming DOE exascale systems like Aurora, Frontier, and El Capitan, which are all accelerator based, so the vast majority of memory and compute management is on the GPUs and possibly wouldn't really gain much from the optimized implementations in mOS running on the CPU. Guess you'll need to find out :-)

    Really interesting work!



  • rolf
    replied
    Hello Everyone,
    the paper mentioned by zxy_thf was not written in MS Word. Why would we do that? I feel insulted ;-) But, it is rather old and mOS was still in the early design phase at that time. More recent work is in various ROSS workshops (https://www.mcs.anl.gov/events/workshops/ross/2020/). Some performance numbers are in this 2018 IPDPS paper. A detailed description of the design is in chapter 18 of this book. The book documents the evolution of HPC operating systems over the last three decades.
    The connection of mOS to ASCI Red and IBM's CNK is that a good number of people in the mOS team have come from Sandia Labs and IBM where they have worked on these operating systems in the past. There is no code from these earlier projects in mOS, but we are hoping to carry forward some of the spirit that went into these earlier efforts.
    mOS is additional code and some modifications in Linux that come into play on a subset of the cores of a compute node. The remaining cores run just the Linux kernel. But, it is more than just some modifications to the scheduler. Jitter (OS noise) is very important, but so is physical memory management in systems with many NUMA domains, and process and thread placement.
    In essence, an OS is administrative overhead. It provides some important functions that we want most of the time, and we are willing to pay the cost for that. In supercomputers with millions of cores it seems we don't need all of the OS functions on all these cores. That is where lightweight kernels come in. The idea is to hand the raw hardware performance to the application. Especially in the past, HPC applications only needed to do floating point operations and send and receive messages. No other devices or services were needed. That is changing a little, especially with AI and ML applications, but it still seems to be the case that there is a lot of computation that needs to be carried out that doesn't need a lot of OS services.
    mOS is trying to preserve the desired Linux services but also provide the raw hardware performance to applications that need it.



  • GDJacobs
    replied
    Originally posted by chuckatkins View Post

    I'm wondering where this content actually came from. ASCI Red was the first machine of the DOE's major supercomputing initiative in the mid-90s, was decommissioned in 2006, and was no longer relevant long before that. I expect there's a miscommunication here, in that perhaps mOS is based on some design principles used in the OS that ran on ASCI Red.
    It is. The nodes on ASCI Red ran an OS called Cougar, which was a development of the software that ran on the Intel Paragon. A similar architecture was then used for Cray products like Red Storm and its descendants, beginning with Catamount, then Compute Node Linux (which was a little heavier and not as special purpose). The counterpart for Blue Gene is CNK.

    Presumably the guys at the big labs find something less than satisfactory with CNL and want to cut it down further to squeeze out some extra performance.

    Cgroups is actually less than optimal for this application. CNK, for instance, runs a single user process allocated to the compute CPU by SLURM without virtual memory or context switching at all.
    Last edited by GDJacobs; 08-12-2020, 09:57 PM.



  • indepe
    replied
    Originally posted by jabl View Post

    OS jitter is a well known problem for certain tightly synchronized massively parallel applications. LWK's are one way around the problem, though I suspect you could achieve much of the same with the various cpu isolation etc. approaches + cgroups.
    Yes, for supercomputers it may be a lot about that expensive last % of throughput, but often it is about latency, or not even latency but the possible consequences of a task getting interrupted at a point where other tasks will have to wait for it, possibly leading to a cascade of threads going into wait-conditions and taking so long to recover that they cause others to block in the meantime perhaps even in a circular way. And so on.

    isolcpus + cgroups seems a half-documented hack with lots of half-documented or undocumented obstacles. Last time I tried to remove OS jitter in order to make clean performance measurements, I got stuck trying to change the affinity of the NVMe driver interrupts, which refuse to accept the usual method of changing affinity because the driver doesn't like what the commonly used irqbalance daemon tries to do, and neither seems to have a working method of leaving isolated CPUs alone, or an alternate way to configure that. And I guess that wouldn't be the end of it. There seems to be a lack of willingness by the kernel people to provide a clean way to support isolating CPUs. The internal design seems to originate from single-CPU computers, stuck with the idea that fairness consists of distributing everything evenly onto all CPUs.



  • chuckula
    replied
    This is interesting but is probably more of a standardization of practices that are already being done on supercomputer clusters now instead of being completely new.

    In general, when you hear that a supercomputer "RUNS LINUX" it tends to give the false impression that there's a full Ubuntu distro* slapped onto every node in the system, running just like your desktop would. That's not what happens in real systems. Instead, the compute nodes run a very stripped-down OS with basic hardware support and just enough services to run whatever application code is assigned to the node. The management nodes that control the distribution of jobs and general system monitoring/management are closer to standard Linux systems, but there are far fewer of them than there are compute nodes.

    This mOS system seems to be a way to standardize the ultra stripped-down software that runs on the compute nodes, where the "kernels" are the actual compute jobs that run with minimal overhead to squeeze the last degree of performance out of the hardware.


    * Yes, I know that's an exaggeration but you get the point.



  • chuckatkins
    replied
    In fact, mOS can already be used on some supercomputers like ASCI Red, IBM Blue Gene, and others.
    I'm wondering where this content actually came from. ASCI Red was the first machine of the DOE's major supercomputing initiative in the mid-90s, was decommissioned in 2006, and was no longer relevant long before that. I expect there's a miscommunication here, in that perhaps mOS is based on some design principles used in the OS that ran on ASCI Red.



  • zxy_thf
    replied
    This technical paper may explain mOS better: https://dl.acm.org/doi/abs/10.1145/2612262.2612263
    (Although I've a feeling it was written with MS Word instead of LaTeX)



  • onicsis
    replied
    Lightweight kernels?
    How is it different from a μ-kernel?

