Fedora 32 Looking At Using EarlyOOM By Default To Better Deal With Low Memory Situations


  • #61
    Originally posted by Raka555 View Post
    In general I hate that the amount of dirty buffers scales with the amount of memory that you have.
    I now always run with the following settings in my /etc/sysctl.conf:
    vm.dirty_bytes = 16777216
    vm.dirty_background_bytes = 4194304


    I never have any issues while copying large files any more.
    Thanks! It really made a huge difference while moving a 10 GB directory (before: frozen desktop, after: can use all sorts of programs while files are being moved).

    Btw beware of tlp's defaults: https://www.mail-archive.com/kernel-...msg359847.html
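
For a sense of why fixed byte limits behave so differently from the ratio-based settings, here is a rough Python sketch. It assumes the usual kernel defaults of vm.dirty_ratio = 20 and vm.dirty_background_ratio = 10, and it approximates "dirtyable" memory by total RAM (the kernel works from free plus reclaimable memory, so real limits are somewhat lower). The helper and constant names are made up for illustration.

```python
# Assumed kernel defaults; check `sysctl vm.dirty_ratio` on your own machine.
DEFAULT_DIRTY_RATIO = 20
DEFAULT_DIRTY_BACKGROUND_RATIO = 10

# The fixed values quoted above from /etc/sysctl.conf.
FIXED_DIRTY_BYTES = 16777216             # 16 MiB
FIXED_DIRTY_BACKGROUND_BYTES = 4194304   # 4 MiB

MIB = 1024 ** 2
GIB = 1024 ** 3


def ratio_limit_bytes(dirtyable_bytes: int, ratio_percent: int) -> int:
    """Approximate byte limit implied by a percentage-based dirty setting."""
    return dirtyable_bytes * ratio_percent // 100


if __name__ == "__main__":
    for ram_gib in (2, 8, 16, 64, 128):
        hard = ratio_limit_bytes(ram_gib * GIB, DEFAULT_DIRTY_RATIO)
        background = ratio_limit_bytes(ram_gib * GIB, DEFAULT_DIRTY_BACKGROUND_RATIO)
        print(
            f"{ram_gib:>4} GiB RAM: ratio limits ~{hard // MIB} MiB / "
            f"~{background // MIB} MiB, fixed limits "
            f"{FIXED_DIRTY_BYTES // MIB} MiB / {FIXED_DIRTY_BACKGROUND_BYTES // MIB} MiB"
        )
```

The ratio-based limits grow with RAM while the byte-based ones stay put, which is the scaling the quoted post complains about.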

    Comment


    • #62
      Originally posted by halo9en View Post

      Thanks! It really made a huge difference while moving a 10 GB directory (before: frozen desktop, after: can use all sorts of programs while files are being moved).

      Btw beware of tlp's defaults: https://www.mail-archive.com/kernel-...msg359847.html
      tlp should really be phased out in favour of tuned. tuned is a modern replacement: it has a D-Bus service, isn't built with shell scripts and can adapt to suggestions from powertop.

      Comment


      • #63
        Originally posted by Britoid View Post
        tlp should really be phased out in favour of tuned. tuned is a modern replacement: it has a D-Bus service, isn't built with shell scripts and can adapt to suggestions from powertop.
        Never heard of it before. I'm going to try it instead of tlp, thanks.

        Comment


        • #64
          Originally posted by polarathene View Post

          I wrote a naive program in Rust that read binary data into memory for processing. I wrote it at work on Windows 10 in 2017, and it filled up 128GB of RAM in a short time frame. I think at one point Windows properly terminated it, but another time it somehow caused Windows to end the session, closing all my open programs and losing any unsaved state.

          Rewrote the program to use a buffer and process the data as a stream (writing the processed version back to disk and appending to it); it didn't exceed 500MB.

          Windows (at least 10; pretty sure I had bad experiences on 7/8 and earlier) probably still does the best job by default for desktop users, compared to my experiences with macOS and Linux.
          AFAIK Windows still does not have an OOM killer, so what probably happened is that your application terminated when malloc() returned NULL: some library or Rust function may have an assert around malloc that simply calls exit(). But that is pure speculation on my part; as I've already said, my Windows experience is from a long time ago.
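
The buffered rewrite described in the quoted post can be sketched roughly as follows. This is Python rather than the Rust the original program used, and the chunk size, function name and transform callback are all hypothetical; the point is only that memory use is bounded by the buffer size, not by the input size.

```python
import io
from typing import BinaryIO, Callable

CHUNK_SIZE = 1024 * 1024  # 1 MiB buffer; an arbitrary illustrative value


def process_stream(
    src: BinaryIO,
    dst: BinaryIO,
    transform: Callable[[bytes], bytes],
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Read src in fixed-size chunks, transform each one, append it to dst.

    Only one chunk is held in memory at a time. Returns the number of
    input bytes consumed.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of input
            break
        dst.write(transform(chunk))
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"abc" * 1000)
    sink = io.BytesIO()
    consumed = process_stream(source, sink, bytes.upper, chunk_size=64)
    print(consumed, len(sink.getvalue()))
```

With real files you would pass objects opened with open(path, "rb") and open(path, "ab") instead of the in-memory buffers used here.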

          Comment


          • #65
            Originally posted by Raka555 View Post

            Have you tried running a kernel with overcommit off ?
            I tried that already back in 2012: https://ewontfix.com/3/
            You start getting memory allocation failures long before you reach the actual RAM limit in your machine. There is a reason why overcommit exists.
            Overcommit is not bad. It is our ability to deal with it in our apps that is bad.
            Well, Linux is built for overcommit, so I would think that the no-overcommit scenario is not that well tested in practice. What matters here, though, is that Windows does not have overcommit: there you can only allocate as much memory as there is swap space available. Yes, that swap space can and does grow, but that is different from the overcommit that happens in, say, Linux. So the comparison that people make between Windows and Linux is not really equal: when they test an OOM scenario on Linux, they put the VM system under a hell of a lot more pressure than when they do the same on Windows, because Linux overcommits.
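
One likely reason allocations fail "long before you reach the actual RAM limit" with overcommit off: in strict mode (vm.overcommit_memory = 2) the kernel documentation gives the commit limit as swap plus a percentage of RAM, and that percentage (vm.overcommit_ratio) defaults to 50. A small sketch of that arithmetic, assuming the ratio-based variant (not vm.overcommit_kbytes) and using a made-up function name:

```python
def commit_limit_kib(
    mem_total_kib: int,
    swap_total_kib: int,
    overcommit_ratio: int = 50,  # assumed default of vm.overcommit_ratio
    hugetlb_kib: int = 0,
) -> int:
    """Approximate CommitLimit for vm.overcommit_memory = 2.

    CommitLimit = (RAM - huge pages) * overcommit_ratio / 100 + swap
    """
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


if __name__ == "__main__":
    GIB_IN_KIB = 1024 * 1024
    no_swap = commit_limit_kib(16 * GIB_IN_KIB, 0)
    with_swap = commit_limit_kib(16 * GIB_IN_KIB, 4 * GIB_IN_KIB)
    print(f"16 GiB RAM, no swap:    {no_swap // GIB_IN_KIB} GiB committable")
    print(f"16 GiB RAM, 4 GiB swap: {with_swap // GIB_IN_KIB} GiB committable")
```

So on a swapless box with default settings, strict mode would refuse allocations once roughly half of RAM is committed, unless vm.overcommit_ratio is raised.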

            Originally posted by Raka555 View Post

            1) Languages with garbage collection could run the GC and give some memory back to the OS.
            2) Free some memory and see if it helps.
            3) Programs can try to shutdown as clean as is in their power with the memory they have.
            4) Those on an allocation spree might be made aware that they are the culprit.
            #1 Yeah I can give you that!
            #2 Why did your application allocate memory that it didn't need? This will only happen for special applications that use internal file buffering, and those should be a tiny minority of applications; totally changing the memory allocation routines for those few applications sounds like a waste of resources.
            #3 Then the program is "faulty" to begin with; I'll explain a little more below.
            #4 And what should they do about it? As I've written over and over, the application allocated memory because it needed it, not just for the fun of it, so what should it do with this information? It's not like every single userspace application runs a full AI that can rewrite the entire application logic when this happens. So in reality we are left with a single solution, which is to kill the application outright, and that is what the OOM killer will do anyway, so we have not solved anything.

            What I mean by "faulty" in #3 above is that our entire software industry is sadly still operating in the floppy+tape paradigm that we, hardware-wise, left several decades ago. My claim here is that if your application has a "save" button then you're doing it wrong.

            In 2019 there does not exist a single reason why every single change to a document/spreadsheet/whatever is not immediately committed to disk with an infinite (where infinite is defined as the size of your total storage) undo disk buffer. There should be no reason for an orderly shutdown, nor should there be any "do you want to save?" prompt when the user quits the application. You should be able to yank the power cord, boot your machine and be back exactly where you left off.

            That we are still not there in 2019 is the major problem with software development, and the real question to ask.
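
As a very rough illustration of the "commit every change, keep an undo buffer on disk" idea, here is a minimal append-only journal in Python. It is a sketch under several assumptions: changes are small JSON-serialisable dicts, one per line, and the Journal class, replay function and file layout are all invented for this example. A real application would also need compaction and some care about fsync cost.

```python
import json
import os


class Journal:
    """Append-only change log; every change is forced to disk immediately."""

    def __init__(self, path: str) -> None:
        self._fh = open(path, "a", encoding="utf-8")

    def record(self, change: dict) -> None:
        self._fh.write(json.dumps(change) + "\n")
        self._fh.flush()
        os.fsync(self._fh.fileno())  # do not return until the change is on disk

    def close(self) -> None:
        self._fh.close()


def replay(path: str, undo: int = 0) -> list:
    """Rebuild the change history after a restart, optionally undoing the last N."""
    if not os.path.exists(path):
        return []
    changes = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                changes.append(json.loads(line))
            except json.JSONDecodeError:
                break  # torn final record from an interrupted write; stop here
    return changes[: max(0, len(changes) - undo)]
```

Because the state is rebuilt purely from the journal, there is no "save" step: after a crash or power loss, replay() returns everything that reached the disk.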
            Last edited by F.Ultra; 04 January 2020, 02:25 PM.

            Comment


            • #66
              Originally posted by tildearrow View Post
              Why not disable paging executable code out at all and making it resident on memory at almost all times?
              because executable code is not stored in memory. it is stored in filesystem, so it has to be paged in to run. so your clever idea should be reworded to something like "read all executable code to memory on boot and fail immediately due to memory exhaustion"

              Comment


              • #67
                Originally posted by k1e0x View Post
                Why not just fix the kernel's OOM killer?
                because this "fix" is really just 10% reduction in available memory. not the best solution for memory-starved systems
                Originally posted by k1e0x View Post
                Other OS's don't hang up like Linux does in that situation..
                other oses never reach that situation, they die much earlier. but if you prefer inferior behaviour of other oses, why not just use them?
                Last edited by pal666; 04 January 2020, 09:13 PM.

                Comment


                • #68

                  vm.dirty_background_ratio should be low so that the kernel starts writing dirty memory to disk earlier.
                  vm.dirty_ratio should be high, so it doesn't stall the system when there is a lot of dirty memory. The difference between the two is the "buffer" the kernel has to work with. If it is unsuccessful in writing out the dirty memory before vm.dirty_ratio is reached, it will stall the system until it has done its task.

                  Of course, setting vm.dirty_ratio too high will increase the risk of swapping.
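
The two thresholds described above can be pictured as a small state function. This is a simplified model, not the kernel's actual logic (which throttles writers gradually rather than flipping a switch); the function name, state labels and default percentages are assumptions for illustration.

```python
def writeback_state(
    dirty_bytes: int,
    dirtyable_bytes: int,
    background_ratio: int = 10,  # stands in for vm.dirty_background_ratio
    dirty_ratio: int = 20,       # stands in for vm.dirty_ratio
) -> str:
    """Classify the amount of dirty memory against the two thresholds."""
    background_limit = dirtyable_bytes * background_ratio // 100
    hard_limit = dirtyable_bytes * dirty_ratio // 100
    if dirty_bytes >= hard_limit:
        return "throttled"             # writers stall until data is flushed
    if dirty_bytes > background_limit:
        return "background-writeback"  # kernel flushes without blocking writers
    return "idle"


if __name__ == "__main__":
    dirtyable = 8 * 1024 ** 3  # pretend 8 GiB is dirtyable
    for dirty_mib in (100, 1000, 2000):
        print(dirty_mib, "MiB dirty ->", writeback_state(dirty_mib * 1024 ** 2, dirtyable))
```

The gap between the two limits is the "buffer" mentioned in the post: the wider it is, the longer background writeback has to catch up before writers are stalled.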

                  Comment


                  • #69
                    Originally posted by k1e0x View Post
                    Why not just fix the kernel's OOM killer? Other OS's don't hang up like Linux does in that situation.. Wait.. I know.. systemd-oom-bandaid.
                    there are some hints to your question on the earlyoom github page: https://github.com/rfjakob/earlyoom
                    It is hard for the kernel to know what to kill. You are doing a disservice to the kernel devs if you think it is easy.

                    As for other OS: I have stress tested win 10 and ubuntu + earlyoom (2GB, no swap, load 100 tabs in Chrome), and windows 10 does not do very well; most of the time the VM just crashed, sometimes chrome died (all of it). earlyoom kills tabs, the machine stays responsive. In other words, Linux with earlyoom is a much better experience than Windows 10, in my testing.
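
For anyone wondering what a user-space killer like earlyoom is checking, the core decision can be sketched in a few lines of Python. This is not earlyoom's code: the function names are invented, and the 10% thresholds are my understanding of its documented defaults, so treat them as assumptions. The sketch only decides whether to act; picking and killing a victim is left out.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def should_kill(
    meminfo: dict,
    mem_min_percent: float = 10.0,   # assumed default threshold
    swap_min_percent: float = 10.0,  # assumed default threshold
) -> bool:
    """True when both available memory and free swap are at or below their minimums."""
    mem_avail_pct = 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
    swap_total = meminfo.get("SwapTotal", 0)
    # With no swap configured, treat free swap as 0% so only memory matters.
    swap_free_pct = 100.0 * meminfo.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_avail_pct <= mem_min_percent and swap_free_pct <= swap_min_percent


if __name__ == "__main__":
    with open("/proc/meminfo", encoding="utf-8") as fh:  # Linux only
        print("would act now:", should_kill(parse_meminfo(fh.read())))
```

Acting while some memory is still available is what keeps the machine responsive, at the cost of giving up that last slice of RAM.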

                    Also, there is a new project, nohang, which is more sophisticated. It provides desktop notifications out of the box, and it can use the new memory pressure (PSI) metrics and messages from zram to give more warning about low memory. In its default settings, it acts very much like earlyoom (but with desktop notifications about low memory). Desktop Linux needs a user-space killer, the kernel devs have made that clear, and we have earlyoom and a newer one, nohang, plus GNOME is building in notification support. Right now, anyone who complains about OOM deadlocks should install earlyoom, problem solved.

                    The kernel killer only fails sometimes, too; there is a lot of exaggeration about how bad it is. I run countless 2GB Linux servers, and I never get OOM deadlocks. Obviously it can happen, otherwise Facebook wouldn't have added pressure stall information (PSI) to the kernel, but it is rare.
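
The pressure stall numbers mentioned above are exposed as plain text in /proc/pressure/memory on kernels built with PSI support. A minimal Python sketch for reading them follows; the function names and the 10.0 threshold are made up for illustration and are not what nohang or any other tool actually uses.

```python
def parse_pressure(text: str) -> dict:
    """Parse PSI text such as:

    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {key: float(value) for key, value in (p.split("=", 1) for p in pairs)}
    return result


def under_pressure(pressure: dict, some_avg10_threshold: float = 10.0) -> bool:
    """True if tasks stalled on memory for more than the threshold percentage
    of the last 10 seconds (threshold is a hypothetical choice)."""
    return pressure.get("some", {}).get("avg10", 0.0) > some_avg10_threshold


if __name__ == "__main__":
    with open("/proc/pressure/memory", encoding="utf-8") as fh:  # needs PSI enabled
        print(under_pressure(parse_pressure(fh.read())))
```

Unlike a plain "free memory below X%" check, these figures measure time actually lost to memory stalls, which is why they can give earlier warning.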

                    Comment


                    • #70
                      Originally posted by Britoid View Post

                      systemd should remain operational as long as pid 1 is still there, which should never be killed or swapped out.

                      anything else sounds like a kernel bug.
                      All I know is that it's "running" but you can't use it or communicate with it in any way, shape or form. New services cannot start, existing services cannot be shut down. You can't "reboot". You have to hard kill the box. Call that whatever you want.

                      Comment
