Facebook Developing "OOMD" For Out-of-Memory User-Space Linux Daemon


  • #11
    Sounds like a good idea if implemented properly, where processes to kill can be chosen. Kernel OOM is one of the most bonkers parts of Linux. Yes, I can see there are ways to disable it, or select some processes that shouldn't be killed, but it shouldn't be possible to get in that situation in the first place. The default should be to only kill processes that are identified as disposable, not select one at random.

    As far as I'm aware, the OOM killer is also a hard kill, and what's really needed is something that will allow for an orderly process shutdown.


    • #12
      I agree with others that one of the things Linux is especially bad at is figuring out how to handle OOM situations. It's one of the very few things Windows seemed to have got right a looong time ago. That being said, the Windows approach isn't exactly perfect either, but at least the system [might] remain usable in the event a program has a memory leak.

      What I think the Linux default behavior should be is to SIGSTOP a process that is about to push total memory usage (not including buffers) past 99% and that ranks among the top 10 memory-consuming processes. That way, your whole system should still remain usable, you don't lose everything the process was doing (though any new incoming data might be lost), and you get a chance to recover while still seeing what's going on. Then, if you feel like risking it, you can SIGCONT the process whenever you're ready.
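A rough sketch of the selection rule proposed above, in user space. This is only an illustration under assumptions: the `Proc` tuple, the function names, and the idea of a "requester" PID are made up for the example, and nothing here is how the kernel or oomd actually decides. Only `os.kill`, `signal.SIGSTOP`, and `signal.SIGCONT` are real APIs.

```python
import os
import signal
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    rss_kib: int  # resident memory of the process, in KiB


def pick_stop_candidate(procs, used_kib, total_kib, requester_pid,
                        threshold=0.99, top_n=10):
    """Return requester_pid if it should be paused, else None.

    A process is paused only when total usage exceeds the threshold AND
    the process is among the top_n consumers by resident memory.
    """
    if total_kib <= 0 or used_kib / total_kib <= threshold:
        return None
    top = sorted(procs, key=lambda p: p.rss_kib, reverse=True)[:top_n]
    if any(p.pid == requester_pid for p in top):
        return requester_pid
    return None


def pause(pid):
    # SIGSTOP cannot be caught or ignored; the process simply stops running.
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    # SIGCONT lets a stopped process continue where it left off.
    os.kill(pid, signal.SIGCONT)
```

A stopped process still holds its memory, so this only buys time to look around; it does not free anything by itself.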


      • #13
        What's the point when we can always download more ram?

        But seriously, things like default settings better tuned to your hardware (like the min_free_kbytes suggested above), putting swap on an SSD, or setting memory limits with cgroups or chpst, node limiting with numactl, etc. before running high-usage programs seem like a better solution to me -- set yourself up so that you're covered if something fails, and you won't need failure mechanisms.
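As a small illustration of "set a limit before running the hungry program", here is a sketch that caps a child's address space with an rlimit. Note that this swaps in `RLIMIT_AS` via Python's `resource` module rather than a cgroup, so it limits virtual address space per process, not resident memory for a whole process tree. The function name `run_with_memory_cap` is made up for the example; Linux is assumed.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd with its address space capped at max_bytes; return its exit code."""

    def apply_cap():
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and it is never set above an already existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            soft = max_bytes
        else:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_cap).returncode


if __name__ == "__main__":
    # A deliberately greedy child: it asks for 8 GiB under a 2 GiB cap.
    hog = [sys.executable, "-c", "x = bytearray(8 * 1024**3)"]
    print("exit code:", run_with_memory_cap(hog, 2 * 1024**3))
```

With the cap in place the allocation fails inside the child instead of dragging the whole machine into swap.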


        • #14
          My experience with OOM and low-RAM scenarios: my primary system used to be a Haswell-based Xeon with 16 GB of RAM and an 8 GB swap file, and I used to lock that system up very easily by opening a bunch of Firefox tabs while starting a video encoding job and trying to play a 4K video at the same time. I can't remember how many hard reboots I had to do to recover the system.

          When I went with my R5 1600, because I was on a tight budget and only bought 8 GB of DDR4, I set Ubuntu up with a 20 GB swap file on a fast SSD and a 20 GB swap file on a spinning rust drive. Guess what? No more lockups or system freezes. Even when it's using all available system RAM, 10 GB of swap and nearly 2 GB of VRAM (out of 2 GB total), the system still stays responsive, and even if one app freezes, it recovers in less than a minute.

          I don't think there's anything wrong with the way the Linux kernel handles memory; I think the problem rests with the way people configure their systems.


          • #15
            Originally posted by chithanh
            I think the OOM analogy which Andries Brouwer came up with in 2004 is still the best one:
            https://lwn.net/Articles/104179/

            That's especially funny since airlines routinely sell more tickets for a flight than they have seats for, because on average enough people cancel late that everyone fits anyway. If the plane is full, people get bumped to the next plane (though not terminated).


            • #16
              Originally posted by Spooktra
              When I went with my R5 1600, because I was on a tight budget and only bought 8 GB of DDR4, I set Ubuntu up with a 20 GB swap file on a fast SSD and a 20 GB swap file on a spinning rust drive. Guess what? No more lockups or system freezes. Even when it's using all available system RAM, 10 GB of swap and nearly 2 GB of VRAM (out of 2 GB total), the system still stays responsive, and even if one app freezes, it recovers in less than a minute.

              I don't think there's anything wrong with the way the Linux kernel handles memory; I think the problem rests with the way people configure their systems.
              Well let's put it this way - have you run out of swap space? Because if your swap ever gets maxed out, your system will be even more unresponsive than if you didn't have it at all. Keep in mind that the rate at which RAM fills up seems to be correlated to how unresponsive your system becomes, especially if you're using a swap drive. I assume you would probably get by just fine with 16GB of RAM, no swap, and a discrete GPU. Furthermore, if you increase your swappiness, that may help with long-term performance. By default, Linux only uses swap when there's insufficient RAM, but you can tell it to use swap sooner, much like how Windows does it. This is good if you expect to run out of RAM, but it obviously hurts short-term performance.
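To check how close swap actually is to being maxed out, the numbers are in `/proc/meminfo` (`SwapTotal` and `SwapFree`, in kB), and the current swappiness is in `/proc/sys/vm/swappiness`. A minimal sketch, with made-up function names, that parses meminfo-style text so it can be tried without touching the real files:

```python
def swap_usage(meminfo_text):
    """Return (used_kib, total_kib) parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total - free, total


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the current vm.swappiness value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        used, total = swap_usage(f.read())
    print(f"swap used: {used} of {total} KiB, swappiness={read_swappiness()}")
```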

              Also I'm a little bit confused - if you have a 1600, how exactly are you dedicating 2GB of RAM to the GPU? I ask because the 1600 doesn't have an IGP, and GPUs aren't used for system memory*. Or am I just misunderstanding what you meant there?


              * There is one exception:
              https://wiki.archlinux.org/index.php/swap_on_video_ram
              Last edited by schmidtbag; 10-22-2018, 09:39 AM.


              • #17
                So why not just contribute to the kernel?


                • #18
                  Is there anything like a `nice` command for the OOM score, so it's possible to run a build system safely?
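On the `nice`-for-OOM question: Linux exposes `/proc/<pid>/oom_score_adj`, which takes a value from -1000 to 1000 (higher means "kill me first") and is inherited by child processes. util-linux also ships a `choom` tool for this. Below is a small wrapper sketch; the function names are made up for the example, and it assumes Linux with `/proc` mounted. Raising the value works unprivileged, while lowering it generally needs extra privileges.

```python
import subprocess


def clamp_oom_score_adj(value):
    """oom_score_adj accepts integers from -1000 to 1000."""
    return max(-1000, min(1000, int(value)))


def run_with_oom_score_adj(cmd, adj):
    """Run cmd so that it and its children start with the given oom_score_adj."""
    adj = clamp_oom_score_adj(adj)

    def set_adj():
        # Runs in the child just before exec (Linux only).
        with open("/proc/self/oom_score_adj", "w") as f:
            f.write(str(adj))

    return subprocess.run(cmd, preexec_fn=set_adj).returncode


if __name__ == "__main__":
    # Make the build the preferred OOM victim instead of the desktop session.
    run_with_oom_score_adj(["make", "-j8"], 500)
```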

                  I think the typical way I run out of memory is that I (momentarily) get a process tree a bit like this

                  Code:
                  make -j8
                  ├─ ninja -j8
                  │  └─ ...
                  ├─ ninja -j8
                  │  └─ ...
                  ├─ ninja -j8
                  │  └─ ...
                  ├─ ninja -j8
                  │  └─ ...
                  ├─ ninja -j8
                  │  └─ ...
                  ├─ ninja -j8
                  │  └─ ...
                  ├─ ninja -j8
                  │  └─ ...
                  └─ ninja -j8
                     └─ ...
                  ↑ That's 64 jobs instead of the intended 8.

                  And somebody needs to land a patch in make or ninja that relaxes the number of jobs when they start getting swapped out or something. There is no point in running more jobs than you have RAM for.
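The "no more jobs than you have RAM for" idea can be sketched as a simple cap on the job count. The per-job memory estimate is an assumption you would have to supply yourself (it is not something make or ninja report), and the function names are made up for the example; `MemAvailable` is the real `/proc/meminfo` field.

```python
def mem_available_kib(meminfo_text):
    """Extract MemAvailable (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return 0


def pick_job_count(requested_jobs, available_kib, per_job_kib):
    """Cap parallel jobs so their estimated memory fits in available RAM."""
    if per_job_kib <= 0:
        return max(1, requested_jobs)
    fits = available_kib // per_job_kib
    # Always allow at least one job so the build can make progress.
    return max(1, min(requested_jobs, fits))


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        available = mem_available_kib(f.read())
    # Hypothetical estimate: each compile job needs about 1 GiB.
    print(pick_job_count(64, available, 1024 * 1024))
```

With 8 GiB available and 1 GiB per job, the 64 jobs from the tree above would be cut back to 8.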
                  Last edited by andreano; 10-22-2018, 01:15 PM.


                  • #19
                    Originally posted by bitman
                    Can confirm. Kernel OOM killer sucks. My workstation has 32 GB of RAM, but I manage to run it into the ground sometimes. Simple programming errors do wonders sometimes.
                    Make your errors inside a memory-limited cgroup.
                    Originally posted by bitman
                    Sometimes the OOM killer manages to do the right thing and bring the system back after several minutes, but sometimes the system remains locked up. Why can't the OS just nuke the process that clearly consumes the bigger part of the resources of the entire OS :|
                    That's what the OOM killer does. As you can see, it is not always clear-cut.
                    Originally posted by bitman
                    Why can't the OS just refuse to hand out all of the memory, to leave some room to breathe for itself and avoid lockups :|
                    It can, but you have to specify limits.
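For the "memory-limited cgroup" suggestion, here is a sketch of what specifying such a limit can look like. It assumes cgroup v2 mounted at `/sys/fs/cgroup`, the memory controller enabled for the parent group, and enough privileges to create and write the group; the group name `demo` and both function names are made up. `memory.max` and `cgroup.procs` are the real cgroup v2 interface files.

```python
import os


def cgroup_memory_plan(cgroup_dir, max_bytes, pid):
    """Return the (path, value) writes that cap memory for pid under cgroup v2."""
    return [
        # Hard limit: allocations beyond this trigger reclaim, then OOM in the group.
        (os.path.join(cgroup_dir, "memory.max"), str(max_bytes)),
        # Move the process into the group; its future children follow it.
        (os.path.join(cgroup_dir, "cgroup.procs"), str(pid)),
    ]


def apply_plan(plan, cgroup_dir):
    """Create the group and perform the writes (needs suitable privileges)."""
    os.makedirs(cgroup_dir, exist_ok=True)
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    group = "/sys/fs/cgroup/demo"  # hypothetical group name
    plan = cgroup_memory_plan(group, 2 * 1024**3, os.getpid())
    for path, value in plan:
        print(path, "<-", value)
    # apply_plan(plan, group)  # uncomment on a system where this is permitted
```

A runaway program started from inside such a group gets killed within the group when it hits the limit, and the rest of the system keeps its memory.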


                    • #20
                      I wonder how much memory oomd uses to run itself.
