
Facebook Developing "OOMD" For Out-of-Memory User-Space Linux Daemon


  • Facebook Developing "OOMD" For Out-of-Memory User-Space Linux Daemon

    Phoronix: Facebook Developing "OOMD" For Out-of-Memory User-Space Linux Daemon

    While the Linux kernel has its own out-of-memory (OOM) killer when system memory becomes over-committed, Facebook developers have been developing their own user-space based solution for handling this situation...

    http://www.phoronix.com/scan.php?pag...ok-OOMD-Memory
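For context on how a user-space OOM daemon can act before the kernel's killer does: oomd bases its decisions on the kernel's pressure stall information (PSI) files under /proc/pressure/ (available since Linux 4.20) rather than on raw free-memory counters. A minimal sketch of reading memory pressure the way such a daemon might; the parser is split out so it can be exercised on a sample line, and the function names are illustrative, not oomd's actual API:

```python
import os

def parse_psi_line(line):
    """Parse one PSI line, e.g.
    'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'
    into ('some', {'avg10': 0.0, 'avg60': 0.0, 'avg300': 0.0, 'total': 0.0})."""
    kind, *fields = line.split()
    values = dict(field.split("=", 1) for field in fields)
    return kind, {key: float(val) for key, val in values.items()}

def memory_pressure(path="/proc/pressure/memory"):
    """Return {'some': {...}, 'full': {...}}, or None if PSI is unavailable."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return dict(parse_psi_line(line) for line in f)
```

A daemon built on this would poll memory_pressure() and start killing when, say, the 10-second "full" average stays above some threshold, which signals that every runnable task is stalled waiting on memory.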

  • #2
    I think the OOM analogy which Andries Brouwer came up with in 2004 is still the best one:
    An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.
    https://lwn.net/Articles/104179/




    • #3
Can confirm. The kernel OOM killer sucks. My workstation has 32GB of RAM, but I manage to run it into the ground sometimes; simple programming errors do wonders. Once the OS runs out of memory it starts swapping like crazy, filling up a 16GB swap partition, and then the system becomes unresponsive. Sometimes the OOM killer manages to do the right thing and bring the system back after several minutes, but sometimes the system remains locked up. Why can't the OS just nuke the process that is clearly consuming the bigger part of the machine's resources? :| Why can't the OS refuse to hand out the last of its memory, leaving itself some room to breathe and avoiding lockups? :|



      • #4
Originally posted by bitman
I wish desktop environments would show a pop-up: "This application is slowing down your computer due to abnormally high memory use. Would you like to (a) wait or (b) kill it?"



        • #5
Originally posted by bitman
As usual, it's not that easy. Processes typically reserve a lot of address space that is not backed by physical memory:
• mapping large files into memory
• copy-on-write regions after a fork
• practically everything that lives on the heap (malloc)
These things are used all the time, and the only thing you can reliably measure is the space reserved, not the memory actually in use.
That last part is especially important: it means malloc will almost never fail, and you could write a program that allocates several times your physical memory without running into issues until you touch that memory (the kernel settings overcommit_memory and overcommit_ratio control this behaviour).
This usually means that your application's scheme for dealing with memory exhaustion (if it has one) will not work, which is grave enough that there is a C++ committee paper proposing to remove the user-visible error handling (exceptions during allocation). (http://www.open-std.org/jtc1/sc22/wg...18/p0709r0.pdf)
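The reserve-versus-touch distinction is easy to demonstrate: an anonymous mapping costs almost no physical pages until it is written to. A small sketch (assumes Linux or macOS; note ru_maxrss is reported in different units on each):

```python
import mmap
import resource
import sys

SIZE = 1 << 30     # reserve 1 GiB of address space...
TOUCH = 64 << 20   # ...but actually write to only 64 MiB of it
PAGE = mmap.PAGESIZE

def max_rss_kib():
    # ru_maxrss is in KiB on Linux but in bytes on macOS
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss // 1024 if sys.platform == "darwin" else rss

before = max_rss_kib()
region = mmap.mmap(-1, SIZE)       # anonymous private mapping: succeeds instantly
after_map = max_rss_kib()
for offset in range(0, TOUCH, PAGE):
    region[offset] = 1             # faulting in one byte commits the whole page
after_touch = max_rss_kib()

print(f"mapped 1 GiB, resident memory grew by ~{after_map - before} KiB")
print(f"touched 64 MiB, resident memory grew by ~{after_touch - after_map} KiB")
```

Mapping the gigabyte barely moves resident memory; only the touched pages count, which is exactly why the reserved size is a poor basis for an OOM decision.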

This is not the only reason the kernel's OOM killer is bad at what it is supposed to do, but it is a fundamental one. I doubt there will ever be a generic solution that works well out of the box for everyone.



          • #6
Originally posted by bitman
For what it's worth, I found that increasing vm.min_free_kbytes helps noticeably with system responsiveness during out-of-memory conditions, at least in desktop usage. With the default setting used in most distributions (64 MB, which may be too low for modern desktop workloads) the system hangs for minutes, as you noticed too, but after increasing it to, say, 384-512 MB (with 32GB of RAM that should not be a problem) swapping no longer appears to lock up the computer.

I am not claiming this is a definitive or particularly elegant solution, but it works for me. I looked into it some time ago when I needed the PC to remain responsive during complex 3D rendering tasks and Linux would hang, whereas under similar conditions Windows remained usable.

            Some more information:

            https://github.com/torvalds/linux/bl.../sysctl/vm.txt

            min_free_kbytes:

            This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.

            Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.

            Setting this too high will OOM your machine instantly.
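A persistent version of that tweak can be written as a sysctl drop-in. This is a sketch: the filename is arbitrary, and the 384 MiB value is the commenter's suggestion rather than a recommended default, so tune it to your RAM size and workload:

```
# /etc/sysctl.d/99-min-free.conf  (filename is arbitrary)
# 384 MiB expressed in kilobytes; apply with `sysctl --system` or reboot
vm.min_free_kbytes = 393216
```

Per the kernel documentation quoted above, err on the low side: setting this too high eats memory permanently and can itself trigger the OOM killer.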



            • #7
My hope is that Amazon will develop an OOMD that orders (and installs) RAM for you when it detects a low-memory scenario.



              • #8
OOM handling is the worst thing on Linux. With my poor 8GB of RAM it is quite easy to hit that limit.

IMO swapping is the worst thing to do in an OOM scenario. It has never happened to me that swapping freed enough memory, in a sane amount of time, for the system to become responsive again.
Actually, if the system halted, showed a dialog with the top 10 RAM-consuming processes, and let me choose what to kill, that would be ideal for my workstation scenario. Right now what I do in an OOM scenario is use the Magic SysRq OOM killer (Alt+PrintScreen+F) to kill a process; 70% of the time it kills the correct one, and 30% of the time it's Firefox, which at least gives me a few seconds to kill the rogue process myself.
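That SysRq trick only works if the kernel allows it, and many distributions restrict the SysRq mask by default. A sketch of enabling just the process-signalling functions (64 is the documented bitmask value for sending signals to processes, which includes the manual OOM kill; 1 would enable everything):

```
# /etc/sysctl.d/99-sysrq.conf  (filename is arbitrary)
kernel.sysrq = 64
```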



                • #9
Sounds like a more featureful version of the earlyoom daemon I use to keep leaks, while I'm developing, from essentially locking up my 16GiB system. ("Essentially" because I don't notice until the system starts thrashing and, once it's begun thrashing, it can go for hours without becoming responsive again on its own.)

                  (Basically, I configured it so that, if memory consumption passes 90%, it picks one of the largest processes and kills it, preferring Firefox and Chrome content processes in the event there are several qualifying processes to choose from. That leaves me 10% of 16GiB guaranteed for disk cache on my system where all of the SATA ports plus one USB 3.0 port are occupied by high-capacity rotating platter drives.)
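A configuration along those lines can be expressed with earlyoom's command-line flags. This is a sketch, so check `earlyoom --help` for the exact spelling on your version; `-m` sets the minimum available-memory percentage before it starts killing, and `--prefer` takes a regex of process names to kill first (the regex below is illustrative):

```
# Kill the largest qualifying process when available memory drops below 10%,
# preferring browser content processes over everything else.
earlyoom -m 10 --prefer '(Web Content|chrome|firefox)'
```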
                  Last edited by ssokolow; 10-22-2018, 06:24 AM.



                  • #10
                    I gave a lot of thought to that problem. Here are my two cents:
• Didn't distributions investigate increasing the default swappiness recently? Preemptively swapping out unused programs could make the system a lot more responsive.
• I would like to see a few memory-management-related signals introduced: about to swap, under memory pressure, etc. These would let applications drop some data when it helps, and I believe a similar mechanism exists on Android. Sometimes deleting cached data is cheaper than swapping it out.
• The shell (desktop environment or otherwise) should handle these signals and dispatch them accordingly (the default handler should broadcast to children). It could decide to send them to arbitrary processes.
• A way to reliably estimate memory usage per application. As mentioned in this thread, it would be nice if the DE could detect high memory usage by an application, send it a SIGSTOP, and prompt the user whether to continue (best used together with the above, as a last resort after notifying the other applications doesn't improve the situation).
• The desktop environment should be able to send hints about which programs to swap or not to swap. Please stop swapping out the compositor, for one. DEs likely have a good idea which programs are being used actively, which are background tasks, and which are required to retain interactivity.
• Edit: DEs could already set the OOM priority among their child processes, couldn't they?
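They can: a process may adjust the OOM badness of itself or its children by writing to /proc/<pid>/oom_score_adj, which ranges from -1000 (never kill) to +1000 (kill first); unprivileged processes can raise the value but need CAP_SYS_RESOURCE to lower it. A sketch, with the procfs root as a parameter so the write path can be tested against a fake tree (the helper name is made up):

```python
import os

OOM_SCORE_ADJ_MIN, OOM_SCORE_ADJ_MAX = -1000, 1000

def set_oom_score_adj(pid, adj, procfs="/proc"):
    """Make `pid` a more (positive adj) or less (negative adj)
    attractive target for the kernel OOM killer."""
    if not OOM_SCORE_ADJ_MIN <= adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"adj must be in [{OOM_SCORE_ADJ_MIN}, {OOM_SCORE_ADJ_MAX}]")
    with open(os.path.join(procfs, str(pid), "oom_score_adj"), "w") as f:
        f.write(str(adj))

# A DE could, for example, mark a spawned child as expendable:
# set_oom_score_adj(child_pid, 500)
```

A compositor could conversely write a strongly negative value for itself (given the capability) so the kernel kills its children before it.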
                    Last edited by [email protected]; 10-22-2018, 07:16 AM.

