Announcement

**HyperDrive** · 14 July 2021, 12:04 PM

Originally posted by intelfx

This hack (while still basically a hack) is significantly more precise and self-contained than your hack, which involves tuning several unrelated knobs with a very wide area of effect.

Unrelated, swappiness and page-cluster, really? From the documentation I've read, they couldn't be more related…

**intelfx** · 14 July 2021, 12:24 PM

Originally posted by HyperDrive

Unrelated, swappiness and page-cluster, really? From the documentation I've read, they couldn't be more related…

Yes, unrelated. vm.swappiness controls the relative cost of swapping out anonymous pages versus writing out file-backed pages, and vm.page-cluster basically controls readahead. Neither is related to out-of-memory behavior whatsoever.

Even if it is possible to monkey around with them until the system behavior resembles something you want, it's basically a coincidence.
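To make the swappiness half of that description concrete, here is a minimal sketch, assuming a simplified model in which reclaim starts from an anon:file weight of swappiness : (200 - swappiness), roughly the form used by get_scan_count() in mm/vmscan.c on recent kernels. The real kernel further scales both weights by recently observed reclaim cost (refaults), which the sketch leaves out; read_vm_knob and reclaim_weights are hypothetical helper names.

```python
from pathlib import Path
from typing import Tuple


def read_vm_knob(name: str, root: str = "/proc/sys/vm") -> int:
    """Read an integer vm sysctl, e.g. read_vm_knob("swappiness")."""
    return int(Path(root, name).read_text().strip())


def reclaim_weights(swappiness: int) -> Tuple[float, float]:
    """Return the starting (anon, file) reclaim weights implied by vm.swappiness.

    Simplified model: anon weight = swappiness, file weight = 200 - swappiness,
    normalised to sum to 1. The kernel's extra scaling by recent reclaim
    cost is not modelled here.
    """
    if not 0 <= swappiness <= 200:
        raise ValueError("vm.swappiness is expected to be in the range 0..200")
    return swappiness / 200, (200 - swappiness) / 200


if __name__ == "__main__":
    if Path("/proc/sys/vm/swappiness").exists():
        s = read_vm_knob("swappiness")
        anon, file_ = reclaim_weights(s)
        print(f"swappiness={s}: anon weight {anon:.2f}, file weight {file_:.2f}")
```

Nothing in this calculation decides when the OOM killer fires: the knob only tilts reclaim between the two kinds of pages.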

**M@GOid** · 14 July 2021, 12:25 PM

Originally posted by ezst036

There are corporations out there, on 2-3 year refresh cycles, beginning to throw out computers with 16-32 GB of memory in them as e-waste.

It's crazy.

And Windows 11 is going to accelerate this rush to new hardware (at least for the current generation of existing hardware), with a large number of machines going out to e-waste despite having plenty of usable life left in them. These are not sluggish or ancient machines.

These companies are not throwing them in the trash. First, the reason to renew your installed fleet of desktops is maintenance. Those machines are out of warranty, and the staff know by heart that after a predefined period things start to break left and right, so it is better for productivity to simply replace them from time to time.

Second, they don't throw them out; they sell those machines to companies that refurbish them en masse for resale. See all those sellers on eBay with hundreds of used enterprise machines for sale? Those machines only go to the trash once they are in the hands of lazy domestic owners, many years past their prime.

**hakavlad** · 14 July 2021, 12:25 PM

Originally posted by HyperDrive

So you hit a VM corner case which manifests itself when there's no swap. I use swap on zram on all of my systems (no swap to rust/SSD, ever, even on machines with less than 2 GiB of RAM) and I've never, ever, seen the issue you describe in that email (I've seen OOM kills, but hey, resources aren't infinite). To me, the solution is obvious: enable swap. No need to hack the kernel until a proper solution is found.

OK, but in the end you may still reach SwapFree=0.

In fact, when the swap is exhausted, the same problem will appear: the file cache will also be exhausted, and you will get thrashing.
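A minimal sketch for observing the situation described above, assuming only the standard /proc/meminfo format (values in kB): it reports SwapFree and the size of the file LRU lists (Active(file) + Inactive(file)), so you can watch the file cache shrink once swap is gone or was never there. parse_meminfo and pressure_summary are hypothetical names.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def pressure_summary(info: Dict[str, int]) -> Dict[str, int]:
    """Summarise the numbers relevant to swap exhaustion and cache depletion."""
    swap_free = info.get("SwapFree", 0)
    return {
        "swap_free_kb": swap_free,
        # File-backed pages on the LRU lists: the cache that gets squeezed
        "file_cache_kb": info.get("Active(file)", 0) + info.get("Inactive(file)", 0),
        # 1 when nothing is left to absorb anonymous memory, whether swap
        # is disabled entirely or has simply been used up
        "no_swap_left": int(swap_free == 0),
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(pressure_summary(parse_meminfo(meminfo.read_text())))
```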

**bple2137** · 14 July 2021, 12:29 PM

Originally posted by hakavlad

le9 maintainer here, ask any questions.

I remember writing something dumb in Python. I was using multiple threads and appending computed values to a list. The code was supposed to clean old values up, but by mistake it either didn't, or didn't do it quickly enough to free up space. The result was that my 16 GB machine ran out of RAM in seconds and became completely frozen. The HDD LED was constantly on, as if it were trying to use swap like crazy. Apart from that, there was no reaction to any sort of input. It was reproducible every time.

It was obviously something silly made for fun some time ago, so I can't tell exactly what it was doing, but I believe I was trying to write a script to compare the speed of some operations after I bought a new CPU. I was really surprised how easy it is to jam the Linux kernel completely with just a few lines of code. I was expecting such a process to be killed, but I probably missed some configuration, as it was a pretty clean installation of Arch. I can't remember which kernel version it was, but 5.x for sure.

Do you think that the kernel with your patches (with no extra security or tweaks) would be better prepared to handle cases like that? Or could you elaborate on what was going on (even if only as a guess based on my poor description)?
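The original script isn't available, so this is a hypothetical reconstruction of the pattern described: several threads append computed values to a shared container. With a plain list that is never pruned, memory grows without bound; a collections.deque created with maxlen performs the "clean old values up" step automatically by discarding the oldest item on each append once it is full. MAX_KEPT, produce and run are invented names and sizes.

```python
import random
import threading
from collections import deque
from typing import Deque

MAX_KEPT = 10_000  # hypothetical cap on how many recent values are kept


def produce(results: Deque[float], n_items: int) -> None:
    """Append n_items computed values to the shared buffer."""
    for _ in range(n_items):
        # deque.append is thread-safe; once len == maxlen, the oldest
        # value is dropped, so the buffer never grows past the cap.
        results.append(random.random())


def run(n_threads: int = 4, n_items: int = 100_000, max_kept: int = MAX_KEPT) -> Deque[float]:
    results: Deque[float] = deque(maxlen=max_kept)
    threads = [
        threading.Thread(target=produce, args=(results, n_items))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(len(run()))  # at most MAX_KEPT, however much was produced
```

Replacing deque(maxlen=...) with a plain list reproduces the unbounded growth; only try that with a small n_items.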

**hakavlad** · 14 July 2021, 12:30 PM

Originally posted by HyperDrive

So you hit a VM corner case which manifests itself when there's no swap. I use swap on zram on all of my systems (no swap to rust/SSD, ever, even on machines with less than 2 GiB of RAM) and I've never, ever, seen the issue you describe in that email (I've seen OOM kills, but hey, resources aren't infinite). To me, the solution is obvious: enable swap. No need to hack the kernel until a proper solution is found.

>No need to hack the kernel until a proper solution is found.

The problem is the depletion of the file cache. Protecting the cache is the proper solution.
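A toy model of the idea, assuming a single hypothetical threshold protected_file_kb; this is not the le9 patch's code or its configuration interface. Reclaim takes file pages only above the protected amount, falls back to swapping anonymous memory while swap has room, and otherwise returns an OOM verdict instead of evicting the last cached pages. Real reclaim also refaults evicted pages, which is what turns cache depletion into thrashing; the model ignores that.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemState:
    file_kb: int       # reclaimable page cache
    anon_kb: int       # anonymous memory resident in RAM
    swap_free_kb: int  # remaining swap space


def reclaim_step(state: MemState, need_kb: int, protected_file_kb: int) -> str:
    """Free need_kb, never shrinking the file cache below protected_file_kb.

    Returns "file" or "anon" for the pool reclaimed from, or "oom" when
    neither pool can supply need_kb without breaking the protection.
    """
    if state.file_kb - protected_file_kb >= need_kb:
        state.file_kb -= need_kb
        return "file"
    if state.anon_kb >= need_kb and state.swap_free_kb >= need_kb:
        state.anon_kb -= need_kb
        state.swap_free_kb -= need_kb
        return "anon"
    return "oom"


def simulate(protected_file_kb: int, steps: int = 20, need_kb: int = 10) -> List[str]:
    # Hypothetical starting point: 100 kB of cache and SwapFree=0
    state = MemState(file_kb=100, anon_kb=0, swap_free_kb=0)
    return [reclaim_step(state, need_kb, protected_file_kb) for _ in range(steps)]
```

With protected_file_kb=0 the cache is drained to zero before the OOM verdict; with a threshold, the verdict arrives while part of the cache is still resident.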

**HyperDrive** · 14 July 2021, 12:34 PM

Originally posted by intelfx

vm.page-cluster controls basically readahead.

It controls swap read-ahead exclusively, not any other type of read-ahead. It's quite explicit in the documentation I linked.
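Per the kernel's vm sysctl documentation, page-cluster is logarithmic: up to 2^page-cluster consecutive pages are read in from swap in a single attempt (0 means one page, i.e. no swap readahead; the default of 3 means 8 pages). A small sketch of the conversion, assuming a 4 KiB page unless another size is passed; the function names are hypothetical.

```python
import os


def swap_readahead_pages(page_cluster: int) -> int:
    """Upper bound on pages read from swap in one attempt."""
    if page_cluster < 0:
        raise ValueError("vm.page-cluster cannot be negative")
    return 1 << page_cluster  # 2 ** page_cluster


def swap_readahead_bytes(page_cluster: int, page_size: int = 4096) -> int:
    return swap_readahead_pages(page_cluster) * page_size


if __name__ == "__main__":
    # Use the real page size where the platform exposes it
    page_size = os.sysconf("SC_PAGE_SIZE") if hasattr(os, "sysconf") else 4096
    for pc in (0, 1, 3):
        print(pc, swap_readahead_pages(pc), swap_readahead_bytes(pc, page_size))
```

Setting it to 0 turns swap readahead off for every process on the system.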

**intelfx** · 14 July 2021, 12:43 PM

Originally posted by HyperDrive

It controls swap read-ahead exclusively, not any other type of read-ahead. It's quite explicit in the documentation I linked.

Yes, which is exactly what we are talking about, so it still controls read-ahead in the context of our discussion. It's still completely irrelevant (at best, incidental) to the problem of thrashing.

Maybe you will be able to delay the onset of thrashing (but not eliminate it) at the cost of disabling swap readahead system-wide, which is completely uncalled for. Hence "irrelevant knobs with wide effect".

**sb56637** · 14 July 2021, 01:03 PM

Originally posted by hakavlad

le9 maintainer here, ask any questions.

Hi there! Thank you so much for working on this, and especially for recognizing the existing deficiencies in the Linux kernel. I've been an almost exclusive Linux user on the desktop for about 20 years now, and I hate Windows, but I have to admit that Linux is embarrassingly bad under low-memory conditions, at least for desktop workloads. And I find it especially egregious that so much work has to go into userspace daemons to nudge the kernel OOM killer into doing its job; it just seems so wrong to delegate something as fundamental as memory management to a userspace process. I firmly believe that no condition should ever lead to a frozen system (even if the kernel is still technically running behind the scenes), and blaming the user for overloading it doesn't solve the fundamental issue of the kernel's poor handling of non-optimal conditions.

Unfortunately, although I depend on Linux for everything, I'm too busy, and still too technically limited after all these years, to roll my own kernel with patches (I run openSUSE). So, what can be done to help you get this mainlined ASAP?

Cheers!

**hakavlad** · 14 July 2021, 01:12 PM

Originally posted by bple2137

I remember writing something dumb in Python. I was using multiple threads and appending computed values to a list. The code was supposed to clean old values up, but by mistake it either didn't, or didn't do it quickly enough to free up space. The result was that my 16 GB machine ran out of RAM in seconds and became completely frozen. The HDD LED was constantly on, as if it were trying to use swap like crazy. Apart from that, there was no reaction to any sort of input. It was reproducible every time.

It was obviously something silly made for fun some time ago, so I can't tell exactly what it was doing, but I believe I was trying to write a script to compare the speed of some operations after I bought a new CPU. I was really surprised how easy it is to jam the Linux kernel completely with just a few lines of code. I was expecting such a process to be killed, but I probably missed some configuration, as it was a pretty clean installation of Arch. I can't remember which kernel version it was, but 5.x for sure.

Do you think that the kernel with your patches (with no extra security or tweaks) would be better prepared to handle cases like that? Or could you elaborate on what was going on (even if only as a guess based on my poor description)?

>Do you think that the kernel with your patches (with no extra security or tweaks) would be better prepared to handle cases like that?

Just watch this video https://youtu.be/1uhcZwuvXLI

8 threads in Python (extending lists by appending random.random() values) while playing SuperTux: no hang.
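The script from the video isn't shown; a hypothetical approximation of that load (8 threads, each extending its own list with random.random() values) might look like the sketch below. chunk_len and n_chunks are invented parameters that cap the growth so the sketch can be run harmlessly; raise them only on a machine you are prepared to push into OOM.

```python
import random
import threading
from typing import List


def grow(data: List[float], chunk_len: int, n_chunks: int) -> None:
    """Keep extending one list with fresh random floats; nothing is ever freed."""
    for _ in range(n_chunks):
        data.extend(random.random() for _ in range(chunk_len))


def stress(n_threads: int = 8, chunk_len: int = 100_000, n_chunks: int = 10) -> int:
    lists: List[List[float]] = [[] for _ in range(n_threads)]
    threads = [
        threading.Thread(target=grow, args=(data, chunk_len, n_chunks))
        for data in lists
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(len(data) for data in lists)  # total floats held in memory
```

On 64-bit CPython each stored float costs roughly 32 bytes (a 24-byte float object plus an 8-byte list slot), so stress(8, 1_000_000, 100) would try to hold at least about 25 GB.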

nohang runs in the background only to detect the OOM condition; it doesn't kill anything (the kernel OOM killer kills processes quickly).
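This is not how nohang itself is implemented; it's a minimal sketch of one way to notice kernel OOM kills from userspace without killing anything, assuming a kernel recent enough to expose the oom_kill counter in /proc/vmstat. oom_kill_count and watch_oom_kills are hypothetical names.

```python
import time
from pathlib import Path
from typing import Optional


def oom_kill_count(vmstat_text: str) -> Optional[int]:
    """Return the oom_kill counter from /proc/vmstat-style text, or None if absent."""
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name == "oom_kill":
            return int(value)
    return None


def watch_oom_kills(path: str = "/proc/vmstat", interval: float = 1.0) -> None:
    """Poll forever and report whenever the kernel OOM killer has fired."""
    last = oom_kill_count(Path(path).read_text())
    while True:
        time.sleep(interval)
        current = oom_kill_count(Path(path).read_text())
        if last is not None and current is not None and current > last:
            print(f"kernel OOM killer fired {current - last} time(s)")
        last = current
```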

"le9" Strives To Make Linux Very Usable On Systems With Small Amounts Of RAM
