**Xake** · 27 January 2023, 05:10 AM

Originally posted by _ReD_ View Post

No. No. No. Not OOMD, not even any of the existing userspace OOM-managers. (because I didn't even write OOMD)
I always talked about the whole situation of OOM-killing stuff on the desktop. (I will further explain the concept below.)

You are correct, it seems like there are barriers. Maybe in language. Maybe also in what was meant vs what was written.
Let me quote your first post:

Originally posted by _ReD_ View Post

While it can be a very good tool in tightly controlled environments such as servers, the current approach is a clusterfuck on the desktop.
Nobody likes it when their apps, or even their entire desktops, suddenly disappear with no warnings, no clues, no way to appeal, and no way to recover anything.

Here you talk about how systemd-oomd is a good tool on a server, and continue with saying the current approach is a clusterfuck.
I cannot parse this in any other way than you saying the current approach *of the tool* is a clusterfuck on the desktop.

To be clear, this quote is you answering someone who asked what systemd-oomd is and what it does that other OOM tools do not. So the context is that "systemd-oomd is the tool", not that "oom is the tool".
So there is no way to parse this as being about anything other than systemd-oomd.

That was what I had issues with.

As for reading meaning into it that is not outright written: it also says nothing about the kernel OOM routine. To me that actually makes it sound like you saw no problem until systemd-oomd came into the picture. In other words, it sounds like the Linux kernel OOM routine is not a clusterfuck.

That was the missing context for me: the only thing you bashed was systemd-oomd, without saying whether it was worse, better or the same as the kernel OOM routine (that is, the Linux OOM routine as in "the alternative to systemd-oomd"). The way you wrote that *the tool* was a clusterfuck made it sound like that was *the only* thing about OOM you had problems with.

Now you are saying that you all along meant the full OOM-experience is a clusterfuck on the Desktop.

That I can fully agree with.

To be honest, I see nothing really more to discuss here.
This has more or less just become versions of name-calling, distrusting "fake" information because I chose to describe memory-starved systems without telling all the gory details of how I, for different reasons, know how to trigger this memory starvation and so know how it acts, and guessing at what the other person means or why their system acted in a way that it later turned out it did not.

All of this just because you wrote that an oom-handling tool is good on a server but a clusterfuck on the desktop, and you do not seem to understand how that can mislead people into thinking you mean that the tool made the situation on the desktop into a clusterfuck.

**oiaohm** · 27 January 2023, 12:08 PM

Originally posted by Xake View Post

All of this just because you wrote that an oom-handling tool is good on a server but a clusterfuck on the desktop, and you do not seem to understand how that can mislead people into thinking you mean that the tool made the situation on the desktop into a clusterfuck.

Overview of oomd · oomd

https://facebookincubator.github.io/oomd/docs/overview

Out of memory (OOM) killing has historically happened inside kernel space—if a system runs out of physical memory, the Linux kernel is forced to OOM kill one or more processes. This is typically a slow and painful process because the kernel spends an unbounded amount of time swapping in and out pages and evicting the page cache. Furthermore, configuring policy is not very flexible while being somewhat complicated.

The OOM killer is unfriendly on both server and desktop. The kernel out-of-memory killer waits until the very last minute to do something, and then the system appears to freeze up because by the time it triggers the system is truly out of memory.

OOMD detects earlier that the system is on an unsustainable path of memory usage and starts killing processes. Core to both unsustainable cases is memory overcommit being on. With memory overcommit off, the kernel will start refusing to allocate memory, which normally results in random processes crashing as well.
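For anyone who wants to check which overcommit policy their own machine is using, here is a minimal Python sketch. It assumes a Linux system exposing `/proc/sys/vm/overcommit_memory`; the helper names `describe_overcommit` and `read_overcommit` are made up for this post and are not part of any tool.

```python
from pathlib import Path

# Meanings of the vm.overcommit_memory sysctl values on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond the commit limit",
}


def describe_overcommit(raw: str) -> str:
    """Translate the contents of the sysctl file into words."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the live setting; return a note instead of raising elsewhere."""
    try:
        return describe_overcommit(Path(path).read_text())
    except (OSError, ValueError):
        return "overcommit setting not readable on this system"


if __name__ == "__main__":
    print(read_overcommit())
```

With mode 2 the kernel refuses allocations past the commit limit, which is the "refusing to allocate memory" case described above.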

There is the simple matter of pushing your computer way too hard, causing bad things to happen. Desktop users can do this, and so can server users. The Linux kernel has controls on this stuff for a reason.

The desktop disappearing is a resilience problem. I am watching the KDE Wayland compositor restart work, which lets you kill the compositor and restart it without terminating applications. There are cases where the x.org X11 server is leaking memory and is the valid OOM target, but due to its lack of resilience, killing it loses all the user's work. Yes, this is a resilience problem that is going to affect you from time to time even with the OOM killer disabled and overcommit disabled, if you push the system too hard.

Yes, the Linux kernel also has kernel panic instead of the OOM killer as an option; of course this normally results in even more user data loss.

People don't think about this. Force kill firefox, chrome and libreoffice, then rerun them. Those programs have resilience: they attempt to recover where the user left off. Force kill the x.org X11 server and then rerun it, and in most cases everything is lost; it has no resilience. I don't see the problem as being OOM killers like the kernel's or oomd. The problem is a more fundamental one with resilience on the desktop. Please note that resilience features also mean that when the system loses power you lose less of your incomplete work as well.

OOM killers killing processes more or less at random and causing users major issues is really not a sign of a major problem with the OOM killers. It is a sign that the programs the OOM killer happened to kill, the ones that caused the user's problem, have a major problem with resilience.

The core Linux desktop software is not as resilient as it should be. A lot of the blame put on OOM killer solutions is like being pulled over by the police, having your car impounded for not being roadworthy, then blaming the police and arguing against repairing the car (yes, the police here would be the OOM killer solution and the car would be the user's chosen applications).

**ll1025** · 27 January 2023, 01:37 PM

I'm increasingly convinced that any english-language sentence that includes the word "systemd" is blatant trolling and should result in imprisonment and a fine.

**_ReD_** · 27 January 2023, 03:35 PM

Originally posted by ll1025 View Post

I'm increasingly convinced that any english-language sentence that includes the word "systemd" is blatant trolling and should result in imprisonment and a fine.

**_ReD_** · 27 January 2023, 04:27 PM

Originally posted by Xake View Post

You are correct, it seems like there are barriers. Maybe in language. Maybe also in what was meant vs what was written.
Let me quote your first post:

What you're doing is putting intentions on trial, while at the same time ascribing to me "implied" intentions that I never wrote or thought about and that don't even correspond to the general gist of the whole post.
You do it by artfully choosing fragments of sentence detached from the context. Single "offending" words whose meaning you continue to distort.

The truly shocking thing is that even if I explain to you that the message has a meaning that must be read en bloc and that individual words cannot be prosecuted, you come back here and insist on explaining to me (to me, the one who wrote those words) that I really didn't mean what I wrote, that in that specific excerpt you glimpse implications (which are not even reflected in the construction and meaning of the entire message) and that therefore, in reality I meant what you read in it and that your interpretation and not what I'm repeating ad nauseam, would be the true hidden meaning of my sentence...

Look, we're not talking about what Shakespeare meant in an aphorism. We're talking about what I mean. I, the author of the message!
And I'm not even dead. I am present here!

**Xake** · 28 January 2023, 11:33 AM

Originally posted by oiaohm View Post

Overview of oomd · oomd

https://facebookincubator.github.io/oomd/docs/overview

Out of memory (OOM) killing has historically happened inside kernel space—if a system runs out of physical memory, the Linux kernel is forced to OOM kill one or more processes. This is typically a slow and painful process because the kernel spends an unbounded amount of time swapping in and out pages and evicting the page cache. Furthermore, configuring policy is not very flexible while being somewhat complicated.

The OOM killer is unfriendly on both server and desktop. The kernel out-of-memory killer waits until the very last minute to do something, and then the system appears to freeze up because by the time it triggers the system is truly out of memory.

Yes? Nothing I said says anything different.
Any oom-killer is unfriendly because it is unfriendly by design. They are meant to be unfriendly, because a slap to an application can be what saves the filesystems and everything else that needs some form of orderly reboot/shutdown rather than yanking the power cable, unless you like really broken file systems and that kind of data loss.

Also, the thing you quoted me commenting on was a post (https://www.phoronix.com/forums/foru...e2#post1369406) that _ReD_ wrote to answer the following question:

Originally posted by jacob View Post

So what exactly does systemd-oomd do, other than oom-D?

And the answer from _ReD_ was:

Originally posted by _ReD_ View Post

Basically it intervenes sooner, it's aware of cgroups and has some further configurability.

While it can be a very good tool in tightly controlled environments such as servers, the current approach is a clusterfuck on the desktop.
Nobody likes it when their apps, or even their entire desktops, suddenly disappear with no warnings, no clues, no way to appeal, and no way to recover anything.

There's a missing link here. The tool is too blunt and there's no instrumentation in the desktop.

Now, for this or -any other- proactive oom-handler to be viable on the desktop, integration and more functionality are imperative.
I mean, currently It's criminal! There's not even a notification after the fact.

This is the full quote, to keep _ReD_ from accusing me of:

Originally posted by _ReD_ View Post

What you're doing is putting intentions on trial, while at the same time ascribing to me "implied" intentions that I never wrote or thought about and that don't even correspond to the general gist of the whole post.
You do it by artfully choosing fragments of sentence detached from the context. Single "offending" words whose meaning you continue to distort.

when I say his first post painted systemd-oomd and "-any other- proactive oom-handler" as a clusterfuck on the desktop.

But you know, I am apparently supposed to be a mind reader and supposed to know that he also meant the kernel OOM routine with "proactive oom-handler", as he later stated that his message was clearly targeting all OOM handlers, including the kernel's internal one.
Which I have a hard time getting from his post.

Yes, I agree with him on most of the points he made once he clarified himself.
But he seems hell-bent on misunderstanding me and accusing me of taking words out of context (when I quoted half of his post...).
So please do not do that as well.

**oiaohm** · 28 January 2023, 12:12 PM

Originally posted by _ReD_ View Post

While it can be a very good tool in tightly controlled environments such as servers, the current approach is a clusterfuck on the desktop.
Nobody likes it when their apps, or even their entire desktops, suddenly disappear with no warnings, no clues, no way to appeal, and no way to recover anything.

Is this problem a constant?

People don't think about this. Force kill firefox, chrome and libreoffice, then rerun them. Those programs have resilience: they attempt to recover where the user left off. Force kill the x.org X11 server and then rerun it, and in most cases everything is lost; it has no resilience. I don't see the problem as being OOM killers like the kernel's or oomd. The problem is a more fundamental one with resilience on the desktop. Please note that resilience features also mean that when the system loses power you lose less of your incomplete work as well.

It's not a constant.
Particular programs you can kill at any time, and when you rerun them you recover your data.

Originally posted by Xake View Post

Yes, I agree with him on most of the points he made once he clarified himself.

My problem is that the problem he is talking about is not really an OOM killer problem. If the OOM killer taking out a process causes you to lose data, then a power outage is also going to cause you to lose your work. This is a resilience problem.

OOM killers killing processes more or less at random and causing users major issues is really not a sign of a major problem with the OOM killers. It is a sign that the programs the OOM killer happened to kill, the ones that caused the user's problem, have a major problem with resilience.

I did state this in my previous post as well.

If you push the system hard enough to trigger OOMD or the kernel OOM killer, you have exceeded the system's limits.

Originally posted by _ReD_ View Post

Now, for this or -any other- proactive oom-handler to be viable on the desktop, integration and more functionality are imperative.
I mean, currently It's criminal! There's not even a notification after the fact.

This is foolish. More integration does not make this event better. OOM killers, userspace or kernel, trigger because the system is low on resources. Attempting to provide notifications could get you into an infinite loop. More software needs to have resilience like libreoffice, firefox and chrome do, so that brute-force killing the software by any means does not cause user data loss.

The OOM killer is a more common problem than power outages and the other problems that ruin your day with software that lacks resilience. Yes, doing all the integration of OOM killer controls into the desktop will not help users with power outages, or with "oops, I was in the task manager and ordered the wrong program killed", or "I rebooted the machine with something critical open". More resilience in software addresses a stack of different problems, of which OOM killers are only a small part.

**_ReD_** · 29 January 2023, 07:34 PM

Originally posted by Xake View Post

This is the full quote to keep _ReD_ from accusing me for:

No, you moron, that's not the full quote and you're still playing the same idiotic game.

**_ReD_** · 29 January 2023, 08:23 PM

Originally posted by oiaohm View Post

This is foolish. More integration does not make this event better. OOM killers, userspace or kernel, trigger because the system is low on resources. Attempting to provide notifications could get you into an infinite loop.

Naaaahh! That's easy: Just (pre)load the notifications handler and all the necessary bits in mlocked memory. Done!
(and perhaps don't shave it too close before engaging)
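To make the "mlocked memory" idea concrete, here is a rough Python sketch that pins a small buffer in RAM with `mlock(2)` through `ctypes`. It assumes a POSIX system where `ctypes.CDLL(None)` exposes libc; `lock_buffer` is a hypothetical helper, and a real notification handler would more likely be native code locking its whole address space.

```python
import ctypes


def lock_buffer(size: int = 4096):
    """Allocate a buffer and try to pin it in RAM with mlock(2).

    Returns (buffer, locked). locked is False when the call fails,
    for example because RLIMIT_MEMLOCK is too low for this process.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int

    buf = ctypes.create_string_buffer(size)
    # mlock() returns 0 on success and -1 on failure.
    rc = libc.mlock(ctypes.addressof(buf), size)
    return buf, rc == 0


if __name__ == "__main__":
    _buf, locked = lock_buffer()
    print("locked in RAM" if locked else "mlock failed, check RLIMIT_MEMLOCK")
```

Locked pages cannot be swapped out, but locking does nothing about the process leaking memory or being picked as a kill target.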

More software needs to have resilience like libreoffice, firefox and chrome do, so that brute-force killing the software by any means does not cause user data loss.

The OOM killer is a more common problem than power outages and the other problems that ruin your day with software that lacks resilience. Yes, doing all the integration of OOM killer controls into the desktop will not help users with power outages, or with "oops, I was in the task manager and ordered the wrong program killed", or "I rebooted the machine with something critical open". More resilience in software addresses a stack of different problems, of which OOM killers are only a small part.

On these points I agree 100%, every time, all the time.
I'd go even farther and argue for whole-desktop checkpointing (in addition to app resilience), which would allow restoring a full desktop session as it was before the crash/power-cut/panic/whatever event, including apps, window positions, cursor positions, undo buffers etc.

**oiaohm** · 29 January 2023, 09:04 PM

Originally posted by _ReD_ View Post

Naaaahh! That's easy: Just (pre)load the notifications handler and all the necessary bits in mlocked memory. Done!
(and perhaps don't shave it too close before engaging)

This is the problem. OOM killers trigger when resource usage has gone too high for some reason. Applications having memory leaks is one of those reasons. The notification handler is an application, so it can leak memory. So preload the notification handler, and its leak might be the very reason the OOM killer is going off. Yes, if the OOM killer has to provide notifications, the notification system most likely needs to be OOM-killer immune.

Notifications should not come from the OOM killer. An application killed by the OOM killer should be handled like any other application crash. Maybe the crash handler could be smart enough to tell the difference.

The big thing here is that preloading and immunity for anything the OOM killer has to directly interface with is basically a mistake every time you do it. Yes, even having the OOM killer start something on demand is a really bad idea.

This is my problem. When you look closer, the issues blamed on the OOM killer are not OOM killer issues 99.99% of the time.
1) The crash handler not providing the user with information. This applies to general application crashes as well as to applications taken down by the OOM killer and watched processes killed by the user.
2) The application not having the resilience it should.
3) The system not being configured for the types of application. Yes, most of these, like disabling overcommit, altering caching policies and so on, are not OOM killer settings.

With the OOM killer not talking directly to the crash handler, the OOM killer can kill the crash handler if it detects something has gone wrong with its memory usage. If the system has enough resilience, the crash handler will be restarted, will pick up that it was not shut down cleanly but was killed, and can then look into what killed it.

A graphical tool to configure either the kernel OOM killer or OOMD is about as far as the tooling here should go.
1) Notifications about applications being killed by the OOM killer are a job for the crash management/bug reporting tools that handle any crash. This should not be linked to OOM killers at all. An application looking at crash reports from the Linux kernel can in fact see that something was killed by an OOM killer, not by a user.
2) Notifications about running out of memory are the system monitor's role. This again should not be linked to the OOM killers.

I cannot think of a notification that should come from the OOM killers other than the ones the kernel already sends. Having OOM killers send notifications creates unkillable processes that come with the risk of leaking memory and triggering an OOM killer storm.

Kernel notifications about stuff being killed get queued if no application is running to pick them up. This is why you can kill a crash monitor and have a watchdog restart it without any problems. Yes, indirect messaging.
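As a rough illustration of that indirect route, a crash monitor could scan kernel log lines for OOM kill messages instead of being called by the OOM killer. The sketch below is Python and assumes the "Killed process" wording used by recent kernels, which differs on older ones. `parse_oom_kill` is a made-up helper, the sample lines are invented, and kills done by a userspace daemon such as oomd would not show up this way.

```python
import re
from typing import Optional, Tuple

# Matches both the global and the memory-cgroup variant of the message.
OOM_KILL_RE = re.compile(
    r"(?:Memory cgroup out of memory|Out of memory): "
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)"
)


def parse_oom_kill(line: str) -> Optional[Tuple[int, str]]:
    """Return (pid, command name) if the log line reports a kernel OOM kill."""
    match = OOM_KILL_RE.search(line)
    if match is None:
        return None
    return int(match.group("pid")), match.group("comm")


if __name__ == "__main__":
    # Invented sample lines, shaped like kernel log output.
    samples = [
        "[1234.567890] Out of memory: Killed process 4321 (firefox) "
        "total-vm:8123456kB, anon-rss:4000000kB",
        "[1240.000001] usb 1-1: new high-speed USB device number 2",
    ]
    for line in samples:
        print(parse_oom_kill(line))
```

A monitor built this way can be killed and restarted freely, since it only reads what the kernel has already logged.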

Get the catch here? You want OOM killers integrated into the system as little as possible. The more you integrate the OOM killer, the more often you will have issues.

Originally posted by _ReD_ View Post

On these points I agree 100%, every time, all the time.
I'd go even farther and argue for whole-desktop checkpointing (in addition to app resilience), which would allow restoring a full desktop session as it was before the crash/power-cut/panic/whatever event, including apps, window positions, cursor positions, undo buffers etc.

Restarting · Wiki · Plasma / KWin · GitLab

https://invent.kde.org/plasma/kwin/-/wikis/Restarting

Easy to use, but flexible, X Window Manager and Wayland Compositor

KDE is slowly heading down this route. Yes, this includes wrapping cgroups around applications, which in future could be used to checkpoint applications individually.

systemd 253 RC1 Released With New "ukify" Tool
