
Is The Linux Kernel Scheduler Worse Than People Realize?


  • #51
    The issue with per-core runqueues is that at some point, at least one runqueue will not have any runnable threads while another has more than one. It's not an optimal design, but it's an easier one to implement that works "good enough" for most common tasks. For something like gaming, though, it's woefully inadequate, as I've been complaining about here for several years now.

    At the end of the day, if you have a runnable thread and you have an idle CPU core, there is ZERO reason to sit around doing nothing. Now yes, internal CPU headaches can be a pain here (per-core L1/L2 caches especially), but if you have a high-priority thread waiting to run and a CPU core on which to run it, the benefits of doing so almost always outweigh the negatives.
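
    To make the idea concrete, here is a minimal sketch (user-space C, with invented names; this is not kernel code) of per-core runqueues plus the kind of idle-time "work stealing" being argued for: a core whose own queue is empty pulls a runnable task from a busier core's queue instead of sitting there doing nothing.

    Code:
    /* Hypothetical sketch of per-core runqueues with idle-time work stealing.
     * Names and structures are invented for illustration; not kernel code. */
    #include <stddef.h>
    #include <pthread.h>

    struct task { struct task *next; };

    struct runqueue {
        pthread_spinlock_t lock;
        struct task *head;       /* runnable tasks queued on this core */
        size_t nr_running;
    };

    #define NR_CPUS 8
    static struct runqueue rq[NR_CPUS];    /* one runqueue per core */

    /* Called when a core finds its own runqueue empty: scan the others and
     * pull one runnable task instead of going idle. */
    static struct task *steal_task(int self)
    {
        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            if (cpu == self)
                continue;
            pthread_spin_lock(&rq[cpu].lock);
            if (rq[cpu].nr_running > 1) {          /* leave the busy core one task */
                struct task *t = rq[cpu].head;
                rq[cpu].head = t->next;
                rq[cpu].nr_running--;
                pthread_spin_unlock(&rq[cpu].lock);
                return t;                          /* run it here: no idle core */
            }
            pthread_spin_unlock(&rq[cpu].lock);
        }
        return NULL;                               /* nothing to steal: go idle */
    }

    int main(void)
    {
        for (int i = 0; i < NR_CPUS; i++)
            pthread_spin_init(&rq[i].lock, PTHREAD_PROCESS_PRIVATE);
        /* with empty runqueues, an "idle" core 0 finds nothing to steal */
        return steal_task(0) == NULL ? 0 : 1;
    }

    The real scheduler's idle balancing is far more involved (it has to respect the domain hierarchy, cache hotness and power state), but this is the basic trade-off the thread keeps circling back to.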



    • #52
      Originally posted by gamerk2 View Post
      The issue with per-core runqueues is that at some point, at least one runqueue will not have any runnable threads while another has more than one. It's not an optimal design, but it's an easier one to implement that works "good enough" for most common tasks. For something like gaming, though, it's woefully inadequate, as I've been complaining about here for several years now.

      At the end of the day, if you have a runnable thread and you have an idle CPU core, there is ZERO reason to sit around doing nothing. Now yes, internal CPU headaches can be a pain here (per-core L1/L2 caches especially), but if you have a high-priority thread waiting to run and a CPU core on which to run it, the benefits of doing so almost always outweigh the negatives.
      I mostly agree with you.

      In fact, the whole freakin point of local caches is that they're transparent: they reduce overall latency without software having to manage them. The real truth is, unless something really stupid happens, the scheduler doesn't have to worry about them.



      • #53
        Originally posted by gamerk2 View Post
        At the end of the day, if you have a runnable thread and you have an idle CPU core, there is ZERO reason to sit around doing nothing.
        Unless you're trying to save power by shutting down a core. Or it is more efficient for other threads to share hot cache. Or the idle core is a hyperthread sibling of a high priority thread. Or if the extra calculations in the scheduler to make "smart" decisions eat up all the time savings.

        There are a LOT of reasons that might end up leaving an idle core.



        • #54
          Well, I do happen to know something about this subject. My master's dissertation was related to it.

          First, there are the I/O schedulers, the process scheduler, cgroups/cpusets and the domain scheduler, plus power management on top of all that.

          What seems to be of concern here is the combination of everything, but particularly so the domain scheduler (quick reference https://lwn.net/Articles/80911/).

          This will become more of a concern as new power management mechanisms get tightly integrated into the processor itself instead of the OS, which might be a good thing. Also, and more importantly, the new architectures are becoming more hybrid, particularly considering the network-on-chip approach.

          We have relied for too long on manufacturing improvements; that is why we are stuck with these generic architectures. Recently, however, those improvements have become more and more difficult to achieve. There are already lots of hybrid techniques in use, from expanding the instruction set to dedicated integrated blocks, not only the GPU.

          It is inevitable that in the future we will have to care about the particularities of the tasks to be performed instead of just throwing more brute force at them, so we will obviously need to account seriously for proper resource management in each use case. Frankly, it's about time: computers are tools, and it is not reasonable to fit the task to the tool rather than use the proper tool for the task at hand.
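
          As a small, concrete example of that kind of per-task resource management from user space (a minimal sketch; the core numbers are arbitrary), a process can restrict itself to a subset of cores with sched_setaffinity(), which is roughly what cpusets do for whole groups of tasks at the cgroup level:

          Code:
          /* Minimal sketch: restrict the calling process to cores 0 and 1.
           * The core numbers are arbitrary examples; error handling is minimal. */
          #define _GNU_SOURCE
          #include <sched.h>
          #include <stdio.h>

          int main(void)
          {
              cpu_set_t set;

              CPU_ZERO(&set);
              CPU_SET(0, &set);              /* allow core 0 */
              CPU_SET(1, &set);              /* allow core 1 */

              /* pid 0 means "the calling process" */
              if (sched_setaffinity(0, sizeof(set), &set) != 0) {
                  perror("sched_setaffinity");
                  return 1;
              }
              printf("now restricted to cores 0-1\n");
              return 0;
          }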



          • #55
            Originally posted by Zan Lynx View Post

            Unless you're trying to save power by shutting down a core. Or it is more efficient for other threads to share hot cache. Or the idle core is a hyperthread sibling of a high priority thread. Or if the extra calculations in the scheduler to make "smart" decisions eat up all the time savings.

            There are a LOT of reasons that might end up leaving an idle core.
            Typically, cores get shut down when there's no work to assign to them, and wake up once there is. So that's not a valid point for this argument.

            CPU cache management is a concern, but very few apps are bound by CPU cache performance, and those that are should probably lock threads to specific cores. At most, the scheduler should try to keep threads running on the same core(s) if possible, but that's not an excuse to keep a thread waiting for an indefinite period of time. Further, given that some CPUs have shared L2 caches and others don't, you end up having to expose a lot of low-level CPU functionality in the scheduler that, frankly, shouldn't be there.

            As for Hyper-Threading, Intel has a CPUID bit to identify CPU cores with that functionality, and the scheduler should try to avoid putting multiple high-workload threads on shared cores whenever possible. But again, that's not an excuse to keep a runnable thread waiting for indefinite periods of time.

            Let's be clear here: per-core runqueues are not a particularly optimal scheduling solution. While they do solve a few problems that other schedulers (e.g. Windows) have [managing SMT cores, CPU cache management], they're not going to maximize performance once you have dozens of high-workload threads fighting for CPU time. A good example: games.
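
            For what it's worth, the SMT topology isn't only in CPUID; Linux also exposes it through sysfs, so it's easy to check which logical CPUs share a physical core. A minimal sketch, assuming the usual /sys/devices/system/cpu layout:

            Code:
            /* Minimal sketch: print which logical CPUs share a physical core,
             * using the sysfs topology files exposed by Linux. */
            #include <stdio.h>

            int main(void)
            {
                char path[128], siblings[128];

                for (int cpu = 0; ; cpu++) {
                    snprintf(path, sizeof(path),
                             "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                             cpu);
                    FILE *f = fopen(path, "r");
                    if (!f)
                        break;                  /* no more CPUs */
                    if (fgets(siblings, sizeof(siblings), f))
                        printf("cpu%d shares a core with: %s", cpu, siblings);
                    fclose(f);
                }
                return 0;
            }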



            • #56
              Originally posted by gamerk2 View Post
              they're not going to maximize performance once you have dozens of high-workload threads fighting for CPU time. A good example: games
              In games you care about both performance and latency, and they are often at odds with each other. The best bulk throughput is achieved if you just run code and never interrupt it, so fewer context switches happen, the cache stays hot, and so on. But guess what happens to latency? Everything else is stalled for a while, and latency suffers. Trying to get better latency can reduce bulk performance a bit. As an obvious example, the default Ubuntu kernel is not fully preemptible. That buys a bit more throughput, sure. But it hurts latency a lot, and the user experience can be crappy under load: if the kernel does something for a while and can't be interrupted, you get a really laggy system.
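
              A crude way to see the latency half of that trade-off is to measure how late a timed sleep actually wakes up, with and without background load; this is basically a stripped-down version of the idea behind cyclictest. A minimal sketch:

              Code:
              /* Crude scheduling-latency probe: request a 1 ms sleep and measure
               * how much later than requested we actually wake up. Run it on an
               * idle system and again under heavy load to see the difference. */
              #include <stdio.h>
              #include <time.h>

              static long long ns_between(struct timespec a, struct timespec b)
              {
                  return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
              }

              int main(void)
              {
                  const struct timespec req = { 0, 1000000 };   /* 1 ms */
                  struct timespec start, end;
                  long long worst = 0;

                  for (int i = 0; i < 1000; i++) {
                      clock_gettime(CLOCK_MONOTONIC, &start);
                      nanosleep(&req, NULL);
                      clock_gettime(CLOCK_MONOTONIC, &end);

                      long long late = ns_between(start, end) - req.tv_nsec;
                      if (late > worst)
                          worst = late;
                  }
                  printf("worst wakeup latency over 1000 sleeps: %lld us\n", worst / 1000);
                  return 0;
              }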



              • #57
                Originally posted by SystemCrasher View Post
                In games you care about both performance and latency, and they are often at odds with each other. The best bulk throughput is achieved if you just run code and never interrupt it, so fewer context switches happen, the cache stays hot, and so on. But guess what happens to latency? Everything else is stalled for a while, and latency suffers. Trying to get better latency can reduce bulk performance a bit. As an obvious example, the default Ubuntu kernel is not fully preemptible. That buys a bit more throughput, sure. But it hurts latency a lot, and the user experience can be crappy under load: if the kernel does something for a while and can't be interrupted, you get a really laggy system.
                That may be true in absolute terms, but 99.9% of people will never notice the difference between a preemptible kernel and a stock kernel as long as there is plenty of free RAM. A lot of people believe that if you have free RAM you should do something to fill it up, but really that's bullshit. Having available RAM is very, very important for highly responsive systems. (Only people with very bloated, low-RAM systems will notice it.)

                The attitude that because it's there it should be full is total nonsense.

                EDIT: Is a preemptible kernel even important on modern systems with many cores? I can't imagine any benefits.
                Last edited by duby229; 21 April 2016, 09:34 AM.



                • #58
                  It is interesting to see what Con Kolivas, the Linux kernel hacker who has written his own Linux schedulers, says about the Linux source code:
                  I've been asked in past interviews what I've thought about schedulers from other operating system kernels that I may have studied, and if I ...

                  TL;DR he says that the Linux scheduler code quality sucks.

                  Also, among Unix and mainframe sysadmins, Linux has always had a bad reputation for being unstable. During light loads, Linux is stable. Even Windows is stable if the server idles. But under very high load, Linux becomes very jerky and stuttery: some threads finish fast, while other threads take a very long time. Add in the "RAM overcommit syndrome", where Linux randomly kills processes when all of RAM is filled up (imagine Linux killing off the Oracle database process!), and it is understandable why Unix sysadmins would never let Linux into their high-end enterprise server halls. It is the same reason that enterprise companies who use Linux ALWAYS make sure their Linux servers are only lightly to moderately loaded. They know that if the load increases much, there is a high probability that Linux becomes unstable.

                  Some of the Linux stability problems that Unix sysadmins have talked about for decades might be explained by the Linux scheduler.
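
                  For reference, the "overcommit syndrome" mentioned above is governed by a single sysctl, vm.overcommit_memory. A minimal sketch (just reading a proc file from user-space C) that reports which policy a box is running with:

                  Code:
                  /* Minimal sketch: report the kernel's memory overcommit policy.
                   *   0 = heuristic overcommit (the default)
                   *   1 = always overcommit
                   *   2 = strict accounting (no overcommit beyond the configured ratio) */
                  #include <stdio.h>

                  int main(void)
                  {
                      FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
                      int mode;

                      if (!f || fscanf(f, "%d", &mode) != 1) {
                          perror("overcommit_memory");
                          return 1;
                      }
                      fclose(f);

                      const char *desc = mode == 0 ? "heuristic overcommit" :
                                         mode == 1 ? "always overcommit"    :
                                         mode == 2 ? "strict accounting"    : "unknown";
                      printf("vm.overcommit_memory = %d (%s)\n", mode, desc);
                      return 0;
                  }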



                  • #59
                    Originally posted by duby229 View Post
                    That may be true in absolute terms, but 99.9% of people will never notice the difference between a preemptible kernel and a stock kernel as long as there is plenty of free RAM.
                    Even if you have plenty of RAM, it is overly optimistic to assume the kernel will complete all requests really fast.

                    A lot of people believe that if you have free RAM you should do something to fill it up, but really that's bullshit. Having available RAM is very, very important for highly responsive systems. (Only people with very bloated, low-RAM systems will notice it.)
                    I would recommend they switch off swap on mechanical drives and try zram. SSDs also drastically cut the time of everything related to "disk" activity. However, it can still take a while to complete a request under heavy load, and the inability to interrupt the kernel leads to higher latency.

                    EDIT: Is a preemptible kernel even important on modern systems with many cores? I can't imagine any benefits.
                    The benefit is the ability to kick the kernel out of the way instead of waiting for it to finish its operations. Whether there is a benefit, and how large it is, really depends on the workload. Yet humans notice a jumpy mouse cursor and the like much more easily than a few percent of bulk performance either way. Even reliably measuring the difference between a low-latency and a non-low-latency kernel takes a lot of work, unless you find some strange corner cases (similar to the 137x difference mentioned earlier).



                    • #60
                      Originally posted by SystemCrasher View Post
                      Even if you have plenty of RAM, it is overly optimistic to assume the kernel will complete all requests really fast.

                      I would recommend they switch off swap on mechanical drives and try zram. SSDs also drastically cut the time of everything related to "disk" activity. However, it can still take a while to complete a request under heavy load, and the inability to interrupt the kernel leads to higher latency.

                      The benefit is the ability to kick the kernel out of the way instead of waiting for it to finish its operations. Whether there is a benefit, and how large it is, really depends on the workload. Yet humans notice a jumpy mouse cursor and the like much more easily than a few percent of bulk performance either way. Even reliably measuring the difference between a low-latency and a non-low-latency kernel takes a lot of work, unless you find some strange corner cases (similar to the 137x difference mentioned earlier).
                      Certainly SSDs have more bandwidth and lower latency than HDDs. That's true, but they are still horribly slow; in fact, they are the slowest component in your computer by a factor of thousands. The best way to build a highly responsive system is to reduce bloat by removing all the things you'll never use, and to make sure there is free RAM and there are available cores.

