Intel Making Progress On Their "mOS" Modified Linux Kernel Running Lightweight Kernels

  • Intel Making Progress On Their "mOS" Modified Linux Kernel Running Lightweight Kernels

    Phoronix: Intel Making Progress On Their "mOS" Modified Linux Kernel Running Lightweight Kernels

    For a while now Intel has quietly been working on "mOS", the "multi-OS": a modified version of the Linux kernel that in turn runs lightweight kernels for high-performance computing purposes...

  • #2
    I don't see why you need to modify the kernel for that purpose?
    Linux already provides near-perfect CPU isolation.
    Squeezing out the last few cycles would mean that the kernel should be completely unaware of the cores?
    I.e. viewing them as accelerators? Sure, but cumbersome.

    CPU isolation still gets you all the normal functionality of userspace (if you need to use syscalls for opening files, etc.).
    Seems like a lot of work for 0.0001% more performance?
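
    For reference, the stock-Linux approach described here boils down to isolating cores at boot and pinning the compute work onto them. A minimal sketch, assuming core 3 was reserved with something like isolcpus=3 nohz_full=3 rcu_nocbs=3 on the kernel command line (the core number and boot parameters are illustrative assumptions, not taken from the article):

    ```c
    /* Minimal sketch: pin this process to a core that was isolated at boot
     * (e.g. isolcpus=3 nohz_full=3 rcu_nocbs=3). Core 3 is a made-up example. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);                    /* hypothetical isolated core */

        /* Bind to the isolated core; the scheduler won't migrate us, and with
         * nohz_full the timer tick on that core is mostly suppressed. */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return EXIT_FAILURE;
        }

        /* ... tight compute loop runs here with minimal kernel interference ... */
        return EXIT_SUCCESS;
    }
    ```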

    • #3
      Originally posted by milkylainen View Post
      I don't see why you need to modify the kernel for that purpose?
      Linux already provides near-perfect CPU isolation.
      Squeezing out the last few cycles would mean that the kernel should be completely unaware of the cores?
      I.e. viewing them as accelerators? Sure, but cumbersome.

      CPU isolation still gets you all the normal functionality of userspace (if you need to use syscalls for opening files, etc.).
      Seems like a lot of work for 0.0001% more performance?
      OS jitter is a well-known problem for certain tightly synchronized, massively parallel applications. LWKs are one way around the problem, though I suspect you could achieve much of the same with the various CPU isolation etc. approaches plus cgroups.
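
      To make "OS jitter" concrete: a minimal fixed-work probe (a sketch in the spirit of the usual FWQ/FTQ noise benchmarks, not their actual code) times the same amount of arithmetic over and over; the occasional much slower sample is the noise an LWK tries to eliminate.

      ```c
      /* Minimal sketch of a fixed-work jitter probe: on an idle, isolated core
       * the samples should be nearly identical; large outliers are OS noise
       * (timer ticks, kernel threads, IRQs). */
      #define _POSIX_C_SOURCE 199309L
      #include <stdio.h>
      #include <time.h>

      #define ITERATIONS 10000
      #define WORK       100000          /* arbitrary fixed work per sample */

      static double now_ns(void)
      {
          struct timespec ts;
          clock_gettime(CLOCK_MONOTONIC, &ts);
          return ts.tv_sec * 1e9 + ts.tv_nsec;
      }

      int main(void)
      {
          volatile double x = 1.0;
          double min = 1e18, max = 0.0;

          for (int i = 0; i < ITERATIONS; i++) {
              double t0 = now_ns();
              for (int j = 0; j < WORK; j++)
                  x = x * 1.0000001 + 1e-9;   /* fixed amount of FP work */
              double dt = now_ns() - t0;
              if (dt < min) min = dt;
              if (dt > max) max = dt;
          }

          /* A max/min ratio well above 1 means the work was interrupted. */
          printf("min %.0f ns  max %.0f ns  max/min %.2f\n", min, max, max / min);
          return 0;
      }
      ```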

      • #4
        Lightweight kernels?
        How is it different from a μ-kernel?

        • #5
          This technical paper may describe mOS better: https://dl.acm.org/doi/abs/10.1145/2612262.2612263
          (Although I have a feeling it was written in MS Word instead of LaTeX.)

          • #6
            In fact, mOS can already be used on some supercomputers like ASCI Red, IBM Blue Gene, and others.
            I'm wondering where this content actually came from. ASCI Red was the first machine of the DOE's major supercomputing initiative in the mid-90s; it was decommissioned in 2006 and had stopped being relevant long before that. I expect there's a miscommunication here, in that mOS is perhaps based on some design principles used in the OS that ran on ASCI Red.

            • #7
              This is interesting, but it's probably more a standardization of practices already being used on supercomputer clusters than something completely new.

              In general, when you hear that a supercomputer "RUNS LINUX" it tends to give the false impression that there's a full Ubuntu distro* slapped onto every node in the system, running just like your desktop would. That's not what happens in real systems. Instead, the compute nodes run a very stripped-down OS with basic hardware support and just enough services to run whatever application code is assigned to the node. The management nodes that control job distribution and general system monitoring/management are closer to standard Linux systems, but there are far fewer of them than there are compute nodes.

              This mOS system seems to be a way of standardizing the ultra-stripped-down software that runs on the compute nodes, where the "kernels" are the actual compute jobs running with minimal overhead to squeeze the last bit of performance out of the hardware.


              * Yes, I know that's an exaggeration but you get the point.

              • #8
                Originally posted by jabl View Post

                OS jitter is a well-known problem for certain tightly synchronized, massively parallel applications. LWKs are one way around the problem, though I suspect you could achieve much of the same with the various CPU isolation etc. approaches plus cgroups.
                Yes, for supercomputers it may be partly about that expensive last percent of throughput, but often it is about latency. Or not even latency as such, but the possible consequences of a task getting interrupted at a point where other tasks have to wait for it, possibly leading to a cascade of threads going into wait conditions and taking so long to recover that they cause others to block in the meantime, perhaps even in a circular way. And so on.

                isolcpus + cgroups seems like a half-documented hack with lots of half-documented or undocumented obstacles. Last time I tried to remove OS jitter in order to make clean performance measurements, I got stuck trying to change the affinity of the NVMe driver interrupts, which refuse the usual method of changing affinity: the driver doesn't like what the commonly used irqbalance daemon tries to do, and neither seems to have a working way of leaving isolated CPUs alone, or an alternate way to configure that. And I guess that wouldn't be the end of it. There seems to be a lack of willingness among the kernel people to provide a clean way to support isolating CPUs. The internal design seems to originate from single-CPU computers, and it is stuck with the idea that fairness consists of distributing everything evenly onto all CPUs.
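
                For context, the "usual method" referred to here is writing a CPU list to /proc/irq/<N>/smp_affinity_list. A minimal sketch of that write follows (the IRQ number is a made-up example; for kernel-managed interrupts such as NVMe queue IRQs the kernel typically rejects it):

                ```c
                /* Minimal sketch of the usual userspace way to steer an IRQ away from
                 * isolated cores: write a CPU list to /proc/irq/<N>/smp_affinity_list.
                 * IRQ 123 is a made-up example; for kernel-managed interrupts (such as
                 * NVMe queue IRQs) the kernel typically rejects the write with EIO. */
                #include <errno.h>
                #include <stdio.h>
                #include <string.h>

                int main(void)
                {
                    const char *path = "/proc/irq/123/smp_affinity_list"; /* hypothetical IRQ */
                    FILE *f = fopen(path, "w");

                    if (!f) {
                        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
                        return 1;
                    }

                    /* Ask the kernel to keep this interrupt on cores 0-2, away from the
                     * isolated compute cores. */
                    if (fprintf(f, "0-2\n") < 0 || fflush(f) == EOF) {
                        fprintf(stderr, "write %s: %s\n", path, strerror(errno));
                        fclose(f);
                        return 1;
                    }

                    fclose(f);
                    return 0;
                }
                ```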

                • #9
                  Originally posted by chuckatkins View Post

                  I'm wondering where this content actually came from. ASCI Red was the first machine of the DOE's major supercomputing initiative in the mid-90s; it was decommissioned in 2006 and had stopped being relevant long before that. I expect there's a miscommunication here, in that mOS is perhaps based on some design principles used in the OS that ran on ASCI Red.
                  It is. The nodes on ASCI Red ran an OS called Cougar, which was a development of the software that ran on the Intel Paragon. A similar architecture was then used for Cray products like Red Storm and its descendants, beginning with Catamount and then Compute Node Linux (which was a little heavier and not as special-purpose). The counterpart for Blue Gene is CNK.

                  Presumably the guys at the big labs find something less than satisfactory with CNL and want to cut it down further to squeeze out some extra performance.

                  Cgroups are actually less than optimal for this application. CNK, for instance, runs a single user process allocated to the compute CPU by SLURM, without virtual memory or context switching at all.
                  Last edited by GDJacobs; 12 August 2020, 09:57 PM.

                  • #10
                    Hello Everyone,
                    the paper mentioned by zxy_thf was not written in MS Word. Why would we do that? I feel insulted ;-) But it is rather old, and mOS was still in the early design phase at that time. More recent work is in various ROSS workshops (https://www.mcs.anl.gov/events/workshops/ross/2020/). Some performance numbers are in this 2018 IPDPS paper, and a detailed description of the design is in chapter 18 of this book, which documents the evolution of HPC operating systems over the last three decades.
                    The connection of mOS to ASCI Red and IBM's CNK is that a good number of people on the mOS team came from Sandia Labs and IBM, where they worked on those operating systems in the past. There is no code from these earlier projects in mOS, but we are hoping to carry forward some of the spirit that went into those earlier efforts.
                    mOS is additional code, plus some modifications to Linux, that comes into play on a subset of the cores of a compute node; the remaining cores run just the Linux kernel. But it is more than just some modifications to the scheduler. Jitter (OS noise) is very important, but so are physical memory management in systems with many NUMA domains, and process and thread placement.
                    In essence, an OS is administrative overhead. It provides some important functions that we want most of the time, and we are willing to pay the cost for that. In supercomputers with millions of cores, it seems we don't need all of the OS functions on all of those cores. That is where lightweight kernels come in: the idea is to hand the raw hardware performance to the application. Especially in the past, HPC applications only needed to do floating-point operations and send and receive messages; no other devices or services were needed. That is changing a little, especially with AI and ML applications, but it still seems to be the case that there is a lot of computation to be carried out that doesn't need many OS services.
                    mOS tries to preserve the desired Linux services while also providing the raw hardware performance to applications that need it.
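
                    As a very rough point of comparison (this is not how mOS itself is implemented), a stock kernel can approximate the split between dedicated compute cores and Linux "housekeeping" cores with a cgroup v2 cpuset partition. The sketch below assumes cgroup2 is mounted at /sys/fs/cgroup, root privileges, a reasonably recent kernel, and a made-up 64-core node where cores 4-63 are handed to compute:

                    ```c
                    /* Rough sketch (NOT how mOS itself works): carving out dedicated
                     * compute cores on a stock kernel with a cgroup v2 cpuset partition.
                     * Paths and the 4-63 core range are illustrative assumptions. */
                    #include <stdio.h>
                    #include <stdlib.h>
                    #include <sys/stat.h>

                    /* Write a single value to a cgroup control file, aborting on error. */
                    static void write_file(const char *path, const char *value)
                    {
                        FILE *f = fopen(path, "w");
                        if (!f || fprintf(f, "%s\n", value) < 0 || fclose(f) != 0) {
                            perror(path);
                            exit(EXIT_FAILURE);
                        }
                    }

                    int main(void)
                    {
                        /* Let child cgroups use the cpuset controller. */
                        write_file("/sys/fs/cgroup/cgroup.subtree_control", "+cpuset");

                        /* Create a "compute" cgroup that owns cores 4-63; everything else
                         * (and the full set of kernel services) stays on cores 0-3. */
                        mkdir("/sys/fs/cgroup/compute", 0755);
                        write_file("/sys/fs/cgroup/compute/cpuset.cpus", "4-63");
                        write_file("/sys/fs/cgroup/compute/cpuset.cpus.partition", "root");

                        /* A job launcher would then move its ranks into the partition by
                         * writing their PIDs to /sys/fs/cgroup/compute/cgroup.procs. */
                        return 0;
                    }
                    ```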
