Clear Linux Set To Begin Offering EarlyOOM For Better Dealing With Memory Pressure


  • #11
    Originally posted by tomas

    And what is the solution? Why do you see this as a workaround?
    The solution is fixing the root cause of this problem in the kernel.

    I see this as a workaround since it does not solve the actual issue but is only a prevention measure. It feels like patching a broken window with tape instead of replacing the window.
    Last edited by tildearrow; 08 January 2020, 11:04 AM.



    • #12
      Originally posted by tildearrow

      The solution is fixing the root cause of this problem in the kernel.

      I see this as a workaround since it does not solve the actual issue but is only a prevention measure. It feels like patching a broken window with tape instead of replacing the window.
      I'm afraid you do not seem to understand what "this problem" is.
      All systems can run out of memory when applications keep requesting more and more of it.
      Either it's happening because the system is simply overloaded with too many applications requesting memory, or an application is buggy and has gone wild requesting more and more memory. So the problem cannot easily be "fixed" in the kernel as if it were some known flaw. It's simply a situation that can occur and that a system must somehow handle. The OOM killer in the kernel is one attempt to handle such a situation. This new EarlyOOM is another.



      • #13
        Originally posted by tomas

        I'm afraid you do not seem to understand what "this problem" is.
        All systems can run out of memory when applications keep requesting more and more of it.
        Either it's happening because the system is simply overloaded with too many applications requesting memory, or an application is buggy and has gone wild requesting more and more memory. So the problem cannot easily be "fixed" in the kernel as if it were some known flaw. It's simply a situation that can occur and that a system must somehow handle. The OOM killer in the kernel is one attempt to handle such a situation. This new EarlyOOM is another.
        True, but it would be nice for the kernel to be able to kill processes when just RAM (not RAM + swap) is full and no caches/buffers can be freed.
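
For what it's worth, the condition described here (RAM exhausted with nothing reclaimable left) is roughly what the kernel's `MemAvailable` estimate in /proc/meminfo is meant to capture. A minimal Python sketch of a user-space check follows. The function names and the 10% threshold are illustrative assumptions, not anything the kernel or earlyoom defines.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_percent(meminfo):
    """Percentage of RAM the kernel estimates is usable without swapping."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


def ram_is_exhausted(meminfo, min_available_percent=10.0):
    # Hypothetical threshold: treat RAM as exhausted below 10% available.
    return available_percent(meminfo) < min_available_percent
```

To try it on a live system, pass the contents of /proc/meminfo to `parse_meminfo` and feed the result to `ram_is_exhausted`.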



        • #14
          Originally posted by tomas

          I'm afraid you do not seem to understand what "this problem" is.
          All systems can run out of memory when applications keep requesting more and more of it.
          Either it's happening because the system is simply overloaded with too many applications requesting memory, or an application is buggy and has gone wild requesting more and more memory. So the problem cannot easily be "fixed" in the kernel as if it were some known flaw. It's simply a situation that can occur and that a system must somehow handle. The OOM killer in the kernel is one attempt to handle such a situation. This new EarlyOOM is another.
          The problem is not running out of memory.
          The problem is having the computer freeze for like 10 minutes BEFORE we run out of memory and the OOM killer kicks in.
          It should be kicking in immediately, with little to no freezes.



          • #15
            Originally posted by birdie

            That's funny and sad simultaneously. Before the earlyoom proposal no one in Fedora gave an f about this issue and no one worked on including FB's oomd in systemd. Now, when we do have a working solution without any ifs, Lennart starts opposing it: "If it's not from me, it's bad".

            And what's wrong with a 100ms interval in earlyoom? There are systems and situations where this interval is 100% warranted and anything bigger than that will make the system unresponsive before earlyoom has enough time to react.
            Lennart is not developing it; FB is. They do explain why they think it is better, and Lennart is paraphrasing them:

            "then also determine what to kill taking the swap use into account and little else (which it apparently does not). This doesn't make any sense to have though if there is no swap."

            "Don't bother with the OOM score the kernel calculates for processes, it doesn't take the swap use into account. That said, do take the configurable OOM score *adjustment* into account, so that processes which set that are respected, i.e. journald, udevd, and such. (or in other words, ignore /proc/$PID/oom_score, but respect /proc/$PID/oom_score_adj)."

            "they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees."

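The selection rule quoted above (rank by swap use, ignore `oom_score`, respect `oom_score_adj`) could look roughly like this in Python. It is only a sketch: the `Proc` record, the `pick_victim` function and the choice to skip every process with a negative adjustment are my assumptions, not oomd's or systemd's actual logic. `VmSwap` is the per-process swap field in /proc/$PID/status.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    name: str
    swap_kb: int        # VmSwap from /proc/$PID/status
    oom_score_adj: int  # from /proc/$PID/oom_score_adj, range -1000..1000


def parse_vmswap_kb(status_text):
    """Extract VmSwap (in kB) from /proc/$PID/status text; 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0


def pick_victim(procs):
    """Return the largest swap user among processes that did not ask for protection."""
    # Assumption: a negative oom_score_adj means "please spare me".
    candidates = [p for p in procs if p.oom_score_adj >= 0]
    if not candidates:
        return None
    # Ties on swap use are broken by the higher adjustment.
    return max(candidates, key=lambda p: (p.swap_kb, p.oom_score_adj))
```

With this rule a journald that set a negative adjustment is never chosen, even if it happens to be the biggest swap user.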


            • #16
              Originally posted by tomas

              And what is the solution? Why do you see this as a workaround?

              He's pointing out that there shouldn't be a workaround. This is a problem in the Linux kernel itself, and it should be addressed at the level of the problem (kernel space) and not in user space with yet another daemon. The fact that systemd is going to integrate a competing system bugs me as well since this is the wrong place to be addressing the problem, even as a workaround. Workarounds have particularly LONG half-lives.



              • #17
                Originally posted by stormcrow

                He's pointing out that there shouldn't be a workaround. This is a problem in the Linux kernel itself, and it should be addressed at the level of the problem (kernel space) and not in user space with yet another daemon. The fact that systemd is going to integrate a competing system bugs me as well since this is the wrong place to be addressing the problem, even as a workaround. Workarounds have particularly LONG half-lives.
                Did you read my follow-up post?
                In order for something to be labeled a "workaround" there must be some notion of what a "proper solution" would be and what the "root cause" of the "problem" is, at least on a conceptual level. What is your perception of what the "problem" is and what a "proper" solution to it would be? How can the "problem" be solved by the kernel? From my viewpoint this is about user space allocating too much of a finite resource, i.e. memory. The first remedy is for user space to start releasing memory it does not need (caches etc.), and hopefully that will be enough for the system to continue functioning. But if user space continues requesting more and more memory anyway, the only option left will eventually be to start killing processes, preferably "the offending ones" if that is possible to decide, and hopefully that will be enough for the system to continue functioning.

                Finally, if this problem were easy to solve, don't you think it would have been solved by now? I mean, it's not like other operating systems such as Windows or macOS handle this significantly better, do they?



                • #18
                  My main worry with whatever systemd comes up with is whether it'll support an equivalent to -m 10 --avoid '(^|/)(init|X|sshd|leafpad|python(\d(.\d)?)?)$' --prefer '(^|/)firefox .*-contentproc( |$)'

                  (I have 16GiB of RAM and no SSD, so -m 10 to have it kick in when memory usage hits 90% before it evicts too much of my disk cache. The rest just causes it to avoid killing things that don't auto-save and prefer killing browser tabs that have gotten rude about their memory usage. If it is one of my Python creations leaking, better to notify me with a big "this tab died" message so I can manually decide what to do.)
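
For anyone wanting to sanity-check patterns like these before handing them to earlyoom, a small Python sketch follows. Two caveats, both assumptions on my side: Python's `re` dialect is not necessarily the regex dialect earlyoom uses, and the `classify` helper (including checking "avoid" before "prefer") is made up for illustration, so what earlyoom actually matches against and in which order may differ.

```python
import re

# The two patterns from the command line above, copied verbatim.
AVOID = re.compile(r"(^|/)(init|X|sshd|leafpad|python(\d(.\d)?)?)$")
PREFER = re.compile(r"(^|/)firefox .*-contentproc( |$)")


def classify(cmd):
    """Return 'avoid', 'prefer' or 'neutral' for a process string."""
    if AVOID.search(cmd):
        return "avoid"
    if PREFER.search(cmd):
        return "prefer"
    return "neutral"
```

Feeding it a handful of strings from `ps` output is a quick way to spot a pattern that matches more (or less) than intended.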

                  It used to be much more necessary before I realized that having some swap was critical to the kernel's RAM defragmentation strategy and enabled zram. Now, it rarely triggers but, when it does, it's far preferable to letting things slow to a crawl as disk cache and the runaway process fight for what remains in the lead-up to actually entering an OOM state.

                  I do wish there was a simple, intuitive way to request that things be pinned in RAM so I could go through and request pinning for everything that gets accessed when I fire up a terminal window and run htop. As-is, prior to having earlyoom, I bought a drive-bay LCD specifically so I could have a "top memory consumers" readout that couldn't get covered up by other windows and printed a reference for triggering the OOM killer via SysRq to tape to the bottom of my center monitor.
                  Last edited by ssokolow; 08 January 2020, 04:22 PM.



                  • #19
                    Originally posted by grigi
                    I say perceived because Linux manages memory much better than other desktop OSes.
                    Especially once transparent memory compression is enabled, which can give one 10-20% more effective RAM; that really makes a difference on a lowly 4G system.
                    10-20%?

                    For me, zram with zstd usually compresses RAM to better than half the size, so I would say it's closer to 70-80% more effective RAM.
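
The two figures are easier to compare with a back-of-envelope model. The Python sketch below is my own simplification, not a measurement: it assumes a fraction of physical RAM ends up holding compressed pages at a fixed compression ratio, and it ignores zram's metadata overhead and incompressible pages.

```python
def effective_ram_gib(physical_gib, zram_fraction, compression_ratio):
    """Estimate usable memory when part of RAM stores compressed pages.

    zram_fraction: share of physical RAM occupied by compressed data (0..1).
    compression_ratio: uncompressed size / compressed size (2.0 = half size).
    """
    uncompressed = physical_gib * (1.0 - zram_fraction)
    compressed = physical_gib * zram_fraction * compression_ratio
    return uncompressed + compressed


def gain_percent(physical_gib, zram_fraction, compression_ratio):
    """Extra effective memory relative to physical RAM, in percent."""
    effective = effective_ram_gib(physical_gib, zram_fraction, compression_ratio)
    return 100.0 * (effective / physical_gib - 1.0)
```

Under this model, a 2:1 ratio with 15% of RAM holding compressed pages gives about a 15% gain, while a 2.5:1 ratio with half of RAM compressed gives about 75%, so both numbers are plausible depending on how much of RAM ends up compressed.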



                    • #20
                      The Facebook solution is dependent on a yet-to-be-found default configuration that works well across all kinds of different situations. This is the hard part, of course. Killing stuff is easy; killing stuff intelligently is hard. Otherwise this debate would not exist (all the people who think this problem is due to negligence or arrogance of the kernel developers just look stupid in my eyes).

                      earlyoom stops the deadlocking. A new project, nohang, can be tweaked to use memory pressure stats (Facebook work which is in recent kernels) and it can use zram stats too, both of which can provide more sophisticated warning of pending memory problems. It's important to remember that if your system becomes unusable because you ran out of RAM, you already know what the fundamental problem is: you need more RAM (or better code). The responsibility of the OS/user space is not to fix this, it is to fail gracefully. earlyoom does this, for a rough approximation of 'gracefully'. nohang is a bit more elegant (you can get desktop notifications out of the box, both for pending problems and for what was killed).
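
The memory pressure stats mentioned above are exposed by recent kernels in /proc/pressure/memory (PSI). A minimal Python sketch of reading them follows. The `under_pressure` helper and its 10% threshold are illustrative assumptions, not how nohang or oomd actually decide.

```python
def parse_psi(text):
    """Parse PSI text such as /proc/pressure/memory into nested dicts."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]  # kind is "some" or "full"
        result[kind] = {}
        for field in fields:
            key, _, value = field.partition("=")
            result[kind][key] = float(value)
    return result


def under_pressure(psi, threshold=10.0):
    # Hypothetical rule: some tasks were stalled on memory for more than
    # `threshold` percent of the last 10 seconds.
    return psi["some"]["avg10"] > threshold
```

The appeal of this signal over a plain "free memory" percentage is that it measures time actually lost to memory stalls, which is much closer to the freeze people are complaining about.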
                      Last edited by timrichardson; 08 January 2020, 06:59 PM.

