
Fedora 32 Looking At Using EarlyOOM By Default To Better Deal With Low Memory Situations


  • k1e0x
    replied
    Originally posted by timrichardson View Post

    It is hard for the kernel to know what to kill. You are doing a disservice to the kernel devs if you think it is easy. [...]
    Good, I'm glad to do them a disservice. It's broken.

    The problem isn't avoiding the situation before it happens.
    The problem is also not choosing what to kill. That might be nice, but that's not the issue here.

    It's that the existing kernel OOM killer doesn't work and hangs the system.



  • cjcox
    replied
    Originally posted by Britoid View Post

    systemd should remain operational as long as PID 1 is still there, which should never be killed or swapped out.

    Anything else sounds like a kernel bug.
    All I know is that it's "running" but you can't use it or communicate with it in any way, shape, or form. New services cannot start, existing services cannot be shut down. You can't "reboot". You have to hard-kill the box. Call that whatever you want.



  • timrichardson
    replied
    Originally posted by k1e0x View Post
    Why not just fix the kernel's OOM killer? Other OS's don't hang up like Linux does in that situation.. Wait.. I know.. systemd-oom-bandaid.
    there are some hints to your question on the earlyoom github page: https://github.com/rfjakob/earlyoom
    It is hard for the kernel to know what to kill. You are doing a disservice to the kernel devs if you think it is easy.

    As for other OSes: I have stress tested Windows 10 and Ubuntu + earlyoom (2 GB RAM, no swap, loading 100 tabs in Chrome), and Windows 10 does not do very well; most of the time the VM just crashed, and sometimes Chrome died (all of it). earlyoom kills tabs and the machine stays responsive. In other words, Linux with earlyoom is a much better experience than Windows 10, in my testing.

    Also, there is a newer project, nohang, which is more sophisticated. It provides desktop notifications out of the box, and it can use the new memory pressure metrics and messages from zram to give more warning about low memory. With default settings it acts very much like earlyoom (but with desktop notifications about low memory). Desktop Linux needs a userspace killer; the kernel devs have made that clear, and we have earlyoom and the newer nohang, plus GNOME is building in notification support. Right now, anyone who complains about OOM deadlocks should install earlyoom: problem solved.

    The kernel killer only fails sometimes, too; there is a lot of exaggeration about how bad it is. I run countless 2 GB Linux servers, and I never get OOM deadlocks. Obviously it can happen, otherwise Facebook wouldn't have added pressure stall information (PSI) to the kernel, but it is rare.
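    earlyoom's core idea is simple: poll how much memory is still available and act before the kernel is cornered. A minimal Python sketch of that check, assuming the /proc/meminfo text format and using a 10% threshold (earlyoom's default); the sample meminfo snapshot below is made up for illustration:

```python
# Sketch of an earlyoom-style check: is MemAvailable below a
# percentage threshold of MemTotal? (Parses /proc/meminfo format.)

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # value is in kB
    return values

def low_memory(meminfo, threshold_pct=10):
    """True if MemAvailable is below threshold_pct of MemTotal."""
    total = meminfo["MemTotal"]
    avail = meminfo["MemAvailable"]
    return avail * 100 < total * threshold_pct

# Hypothetical snapshot of a 2 GB machine under memory pressure:
sample = """MemTotal:        2048000 kB
MemAvailable:     150000 kB"""

info = parse_meminfo(sample)
print(low_memory(info))  # 150000/2048000 is ~7.3%, below 10% -> True
```

    The real earlyoom daemon also watches swap and sends SIGTERM before SIGKILL; this sketch only shows the threshold test it is built around.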



  • Spam
    replied

    vm.dirty_background_ratio should be low so that the kernel starts writing dirty memory to disk earlier.
    vm.dirty_ratio should be high so it doesn't stall the system when there is a lot of dirty memory. The difference between the two is the "buffer" the kernel has to work with. If it is unsuccessful in writing out the dirty memory before vm.dirty_ratio is reached, it will stall the system until it has done its task.

    Of course, setting vm.dirty_ratio too high will increase the risk of swapping.
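    To make the "buffer" between the two knobs concrete, here is a small arithmetic sketch; the RAM size and ratio values are just example numbers, not recommendations:

```python
# The gap between dirty_background_ratio and dirty_ratio is the headroom
# writeback has before the kernel starts stalling writers.

def dirty_thresholds(total_ram_bytes, background_ratio, dirty_ratio):
    """Return (background_bytes, hard_bytes, headroom_bytes)."""
    background = total_ram_bytes * background_ratio // 100
    hard = total_ram_bytes * dirty_ratio // 100
    return background, hard, hard - background

GiB = 1024 ** 3
bg, hard, headroom = dirty_thresholds(8 * GiB, 5, 20)  # example: 8 GiB RAM
print(f"background writeback starts at {bg / 2**20:.0f} MiB dirty, "
      f"writers stall at {hard / 2**20:.0f} MiB "
      f"({headroom / 2**20:.0f} MiB of headroom)")
```

    With a low background ratio and a high hard ratio, writeback gets going early while the stall point stays far away, which is exactly the behaviour described above.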



  • set135
    replied
    During these types of discussions, I tend to wonder why I don't seem to have similar problems; then I remember that I also have a set of sysctl settings to avoid them. I haven't done a fresh install on my main desktop for well over a decade, just migrating the old one to new machines and using a rolling-release distro. One would think that at least the distributions that target desktops would make an attempt at matching these settings to one's hardware? Do they? My settings are a little different from those already shared: (32 GB, 4 cores, NVMe root, 12 TB spinning rust)

    Code:
    # Using a ratio instead of a fixed value as described above (you can only use one or the other)
    vm.dirty_background_ratio = 2
    vm.dirty_ratio = 5
    
    # Try to keep things in memory
    vm.swappiness = 5
    vm.vfs_cache_pressure = 25
    https://wiki.archlinux.org/index.php...Virtual_memory

    It contains some discussion of these values and others. For those who have swapping issues, vm.min_free_kbytes as described there may help.
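    One way to check which values are actually in effect on a given box is to read them straight out of /proc/sys (equivalent to running `sysctl vm.swappiness` and friends). A small sketch, assuming a Linux system:

```python
# Read the live values of the vm sysctls discussed above from /proc/sys.
from pathlib import Path

def read_vm_sysctl(name):
    """Return the current integer value of /proc/sys/vm/<name>."""
    return int(Path("/proc/sys/vm", name).read_text().split()[0])

for knob in ("swappiness", "vfs_cache_pressure",
             "dirty_ratio", "dirty_background_ratio"):
    print(knob, "=", read_vm_sysctl(knob))
```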
    Last edited by set135; 01-04-2020, 09:34 PM.



  • pal666
    replied
    Originally posted by k1e0x View Post
    Why not just fix the kernel's OOM killer?
    Because this "fix" is really just a 10% reduction in available memory; not the best solution for memory-starved systems.
    Originally posted by k1e0x View Post
    Other OS's don't hang up like Linux does in that situation..
    Other OSes never reach that situation; they die much earlier. But if you prefer the inferior behaviour of other OSes, why not just use them?
    Last edited by pal666; 01-04-2020, 09:13 PM.



  • pal666
    replied
    Originally posted by tildearrow View Post
    Why not disable paging executable code out at all and making it resident on memory at almost all times?
    Because executable code is not stored in memory; it is stored in the filesystem, so it has to be paged in to run. Your clever idea should therefore be reworded to something like "read all executable code into memory at boot and fail immediately due to memory exhaustion".
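    The point that executable code is file-backed and only mapped into memory can be seen directly on Linux: /proc/self/maps lists every mapping of a process, and the executable ones (perms containing "x") with a filesystem path are exactly the code pages that can be dropped and paged back in from disk. A small sketch:

```python
# List the executable, file-backed mappings of this very process,
# showing that its code lives in files, not permanently in RAM.

def executable_file_mappings():
    mappings = []
    with open("/proc/self/maps") as f:
        for line in f:
            fields = line.split()
            # fields: address perms offset dev inode [pathname]
            if len(fields) >= 6 and "x" in fields[1] and fields[5].startswith("/"):
                mappings.append(fields[5])
    return mappings

for path in sorted(set(executable_file_mappings())):
    print(path)  # e.g. the interpreter binary and shared libraries
```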



  • 144Hz
    replied
    Britoid The new MR makes the RT priority the default, so the code and capability will be tested more extensively. I think most people would prefer a kernel fix, though...



  • 144Hz
    replied
    halo9en Yes. Problem solved.



  • F.Ultra
    replied
    Originally posted by Raka555 View Post

    Have you tried running a kernel with overcommit off ?
    I tried that already back in 2012: https://ewontfix.com/3/
    You start getting memory allocation failures long before you reach the actual RAM limit of your machine. There is a reason why overcommit exists.
    Overcommit is not bad. It is our ability to deal with it in our apps that is bad.
    Well, Linux is built for overcommit, so I would think that the no-overcommit scenario is not that well tested in practice. What matters here, though, is that Windows does not have overcommit; there you can only allocate as much memory as there is available swap space. Yes, that swap space can and does grow, but that is different from the overcommit that happens in, say, Linux. So the comparison people make between Windows and Linux is not really equal: when they tested an OOM scenario on Linux, they put the VM subsystem under a lot more pressure than when they did the same on Windows, because of Linux's overcommitting.
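    Whether and how Linux overcommits is controlled by vm.overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = never, where vm.overcommit_ratio plus swap determines the strict limit). A sketch that reads the current policy and the commit accounting the kernel keeps:

```python
# Inspect the current overcommit policy and commit accounting on Linux.
from pathlib import Path

MODES = {0: "heuristic (default)", 1: "always overcommit", 2: "never overcommit"}

mode = int(Path("/proc/sys/vm/overcommit_memory").read_text())
print("overcommit_memory =", mode, "->", MODES[mode])

# /proc/meminfo shows how much memory has been promised to processes
# (Committed_AS) versus the strict ceiling (CommitLimit):
for line in Path("/proc/meminfo").read_text().splitlines():
    if line.startswith(("CommitLimit", "Committed_AS")):
        print(line)
```

    Under modes 0 and 1, Committed_AS can exceed CommitLimit; under mode 2 (the Windows-like behaviour described above), allocations fail once the limit is reached.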

    Originally posted by Raka555 View Post

    1) Languages with garbage collection could run the GC and give some memory back to the OS.
    2) Free some memory and see if it helps.
    3) Programs can try to shutdown as clean as is in their power with the memory they have.
    4) Those on an allocation spree might be made aware that they are the culprit.
    #1 Yeah, I can give you that!
    #2 Why did your application allocate memory that it didn't need? This will only happen for special applications that use internal file buffering, and those should be a tiny minority of applications; totally changing the memory allocation routines for those few applications sounds like a waste of resources.
    #3 Then the problem is "faulty" to begin with; I'll explain a little more below.
    #4 And what should they do about it? As I've written over and over, the application allocated memory because it needed it, not just for the fun of it, so what should it do with this information? It's not like every single userspace application runs a full AI that can rewrite the entire application logic when this happens. So in reality we are left with a single solution, which is to kill the application outright, and that is what the OOM killer will do anyway, so we have not solved anything.

    What I mean by "faulty" in #3 above is that our entire software industry is sadly still operating in the floppy-and-tape paradigm that, hardware-wise, we left several decades ago. My claim here is that if your application has a "save" button, you're doing it wrong.

    In 2019 there does not exist a single reason why every change to a document/spreadsheet/whatever is not immediately committed to disk, with an infinite (where "infinite" is defined as the size of your total storage) undo buffer on disk. There should be no need for an orderly shutdown, nor should there be any "do you want to save?" prompt when the user quits the application. You should be able to yank the power cord, boot your machine, and be back exactly where you left off.
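    The "no save button" model described above is essentially an append-only journal: every edit is written and fsynced immediately, and after a crash the document state is rebuilt by replaying the journal. A toy sketch of the idea; the file name and operation format are made up for illustration:

```python
# Toy append-only journal: each change is committed to disk immediately,
# and the document state is recovered by replaying the journal.
import json, os, tempfile

class Journal:
    def __init__(self, path):
        self.path = path

    def append(self, op):
        """Write one edit and force it to stable storage right away."""
        with open(self.path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        """Rebuild the document from the journal (e.g. after a crash)."""
        doc = {}
        with open(self.path) as f:
            for line in f:
                op = json.loads(line)
                if op["kind"] == "set":
                    doc[op["key"]] = op["value"]
                elif op["kind"] == "delete":
                    doc.pop(op["key"], None)
        return doc

path = os.path.join(tempfile.mkdtemp(), "doc.journal")
j = Journal(path)
j.append({"kind": "set", "key": "title", "value": "draft"})
j.append({"kind": "set", "key": "title", "value": "final"})
print(j.replay())  # {'title': 'final'}
```

    Because every operation hits the disk before the call returns, pulling the power cord loses at most the edit in flight; the infinite undo falls out for free, since earlier states are just shorter replays of the same journal.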

    That we are not there in 2019 is the major problem with software development, and the real question to ask.
    Last edited by F.Ultra; 01-04-2020, 02:25 PM.

