Fedora's FESCo Has Deferred Any Decision On EarlyOOM By Default

  • Fedora's FESCo Has Deferred Any Decision On EarlyOOM By Default

    Phoronix: Fedora's FESCo Has Deferred Any Decision On EarlyOOM By Default

    One of the changes planned for Fedora 32 has been to enable EarlyOOM by default to better handle low memory situations either due to the system running with minimal RAM or under memory pressure. But the Fedora Engineering and Steering Committee has yet to reach a decision over this default...

    http://www.phoronix.com/scan.php?pag...efers-EarlyOOM

  • #2
    Yep, I am not convinced either. My suggestion is to find a way to implement a solution to the near-OOM lockups in the kernel.


    • #3
      I'm not convinced an OOM killer is the solution at all.

      With the recent discussions in mind my colleague built me a version of htop that enables delayacct. With this enabled it is possible to use htop to measure swapin% under memory pressure. I then forced memory pressure by starting a VirtualBox VM, qgit on the Linux kernel git repository, and Chromium with 20 or so tabs open.

      I expected Chromium and qgit to compete for physical memory, each getting swapped in as they get scheduled.

      To my surprise only one application had a large percentage of time wasted on swapping in: X11. I found the same on other machines with different amounts of RAM / swap space too. Apparently X11 needs work to prevent swap-ins causing lockups and stutter.

      In any case, if an OOM killer had chosen to kill off the biggest memory hog (VirtualBox in this case), I certainly would not have been happy with that.
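
A rough cross-check for this kind of observation, as a minimal sketch: it ranks processes by the `VmSwap` field of `/proc/<pid>/status`. Note that this is swap occupancy, not the delayacct swap-in delay that the patched htop shows, so it is a different and coarser metric. The helper names `vmswap_kib` and `top_swapped` are hypothetical.

```python
def vmswap_kib(status_text: str) -> int:
    """Return the VmSwap value (in kB) from /proc/<pid>/status text.

    Kernel threads have no VmSwap line; they are reported as 0.
    """
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0


def top_swapped(statuses: dict, count: int = 5) -> list:
    """Rank {label: status_text} entries by swapped-out memory, largest first."""
    usage = [(label, vmswap_kib(text)) for label, text in statuses.items()]
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:count]
```

On a live system the `statuses` dictionary could be filled by reading each `/proc/<pid>/status` file; processes that exit while being read would need to be skipped.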


      • #4
        Is there a good "benchmark" or utility to assist with reproducing the issue, rather than manually forcing enough memory allocation by opening applications as ferry describes?

        Originally posted by ferry
        I expected Chromium and qgit to compete for physical memory, each getting swapped in as they get scheduled.

        To my surprise only one application had a large percentage of time wasted on swapping in: X11.
        Just to clarify, you are saying X11 was the problem and not qgit or Chromium allocating memory? How much was X11 swapping? Did you try with Wayland instead (with no XWayland)?


        • #5
          Originally posted by ferry
          I'm not convinced an OOM killer is the solution at all.

          With the recent discussions in mind my colleague built me a version of htop that enables delayacct. With this enabled it is possible to use htop to measure swapin% under memory pressure. I then forced memory pressure by starting a VirtualBox VM, qgit on the Linux kernel git repository, and Chromium with 20 or so tabs open.

          I expected Chromium and qgit to compete for physical memory, each getting swapped in as they get scheduled.

          To my surprise only one application had a large percentage of time wasted on swapping in: X11. I found the same on other machines with different amounts of RAM / swap space too. Apparently X11 needs work to prevent swap-ins causing lockups and stutter.

          In any case, if an OOM killer had chosen to kill off the biggest memory hog (VirtualBox in this case), I certainly would not have been happy with that.
          Well, you cannot really blame the application. If you are in a situation where the system is running out of memory, and *something* has to get paged out, the kernel uses heuristics to determine what are the best candidates. In this case X probably just had pages that were less 'hot' than the other applications you were running. But if you reach a situation where pages that were written out to swap are getting paged back in again, then back out, things are getting pathological. This is when things get tricky: do you want X to remain responsive? Who suffers in return (i.e. gets paged out, or killed)? Someone might say I want my desktop to always be the priority; others might value their long-running compute pig. It is a difficult problem, but unless an application is doing something 'wrong', the blame and the solution are probably in things like a better heuristic in the kernel, application hints, an omniscient OOM killer, and tunables.


          • #6
            Originally posted by set135

            Well, you cannot really blame the application. If you are in a situation where the system is running out of memory, and *something* has to get paged out, the kernel uses heuristics to determine what are the best candidates. In this case X probably just had pages that were less 'hot' than the other applications you were running. But if you reach a situation where pages that were written out to swap are getting paged back in again, then back out, things are getting pathological. This is when things get tricky: do you want X to remain responsive? Who suffers in return (i.e. gets paged out, or killed)? Someone might say I want my desktop to always be the priority; others might value their long-running compute pig. It is a difficult problem, but unless an application is doing something 'wrong', the blame and the solution are probably in things like a better heuristic in the kernel, application hints, an omniscient OOM killer, and tunables.
            I do like the idea of a "Yo, release any memory you can" message being sent to applications. This is a rather good first step.

            For the desktop use case, how about integration with an OOM daemon and a prompt for the user to decide what gets killed?

            There are times where I have done something silly and would love to see a dialog that says "It appears that APP_NAME_HERE is using all your memory, want to kill it?"

            Then, finally, let the kernel OOM killer do its thing, with heuristics and tunables taken into account.
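
The escalation described in this post could be sketched roughly as follows. This is only an illustration of the ordering of the steps; the `Action` names, the `choose_action` function, and the 15/10/5 percent thresholds are all hypothetical and do not come from any existing daemon.

```python
from enum import Enum


class Action(Enum):
    NONE = "nothing to do"
    ASK_APPS_TO_RELEASE = "ask applications to release memory"
    PROMPT_USER = "ask the user what to kill"
    KERNEL_OOM = "leave it to the kernel OOM killer"


def choose_action(available_pct: float, interactive: bool) -> Action:
    """Pick the mildest step that matches the current memory pressure.

    available_pct is the share of memory still available, in percent.
    The thresholds below are made up for illustration only.
    """
    if available_pct > 15.0:
        return Action.NONE
    if available_pct > 10.0:
        return Action.ASK_APPS_TO_RELEASE
    # A dialog only makes sense if somebody is there to answer it.
    if available_pct > 5.0 and interactive:
        return Action.PROMPT_USER
    return Action.KERNEL_OOM
```

A headless session would skip the prompt and fall straight through to the last step, which is why `interactive` is a separate input.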


            • #7
              Originally posted by set135

              Well, you cannot really blame the application. If you are in a situation where the system is running out of memory, and *something* has to get paged out, the kernel uses heuristics to determine what are the best candidates. In this case X probably just had pages that were less 'hot' than the other applications you were running. But if you reach a situation where pages that were written out to swap are getting paged back in again, then back out, things are getting pathological.
              X had swapin% of > 30% continuously. I'm sure if I had increased memory pressure it would have gone up further. Other applications had no visible swapin% (< 1%) or just short spikes.

              It looks like X swaps out its own pages to swap in others.

              Originally posted by set135
              This is when things get tricky: do you want X to remain responsive? Who suffers in return (i.e. gets paged out, or killed)? Someone might say I want my desktop to always be the priority; others might value their long-running compute pig. It is a difficult problem, but unless an application is doing something 'wrong', the blame and the solution are probably in things like a better heuristic in the kernel, application hints, an omniscient OOM killer, and tunables.
              No, it's not difficult. Nobody wants their UI or GUI to become unresponsive, while everybody will accept that the memory-hogging process is allowed less swapin% to protect other processes / users on the system.


              • #8
                Originally posted by polarathene
                Is there a good "benchmark" or utility to assist with reproducing the issue, rather than manually forcing enough memory allocation by opening applications as ferry describes?
                No, and you shouldn't do this on your main system.
                If you don't want to have a dedicated test rig you can test this inside a VM. The kernel inside the VM will act the same as one on bare metal.
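
For testing inside a VM as suggested here, a deliberately capped allocator might look like the sketch below. It is not an existing benchmark; the function name `allocate_in_steps` is hypothetical and the 4096-byte page size is an assumption (common on x86-64, not universal). It stops at `limit_mib` on its own, so pick a limit relative to the VM's RAM.

```python
import time

PAGE_SIZE = 4096          # assumed page size in bytes
MIB = 1024 * 1024


def allocate_in_steps(step_mib: int, limit_mib: int, pause_s: float = 0.0) -> list:
    """Allocate memory in step_mib chunks without exceeding limit_mib.

    Every page of each chunk is written once so that the memory is
    really backed, not just reserved. The caller must keep the
    returned list alive for as long as the pressure should last.
    """
    chunks = []
    allocated_mib = 0
    while allocated_mib + step_mib <= limit_mib:
        chunk = bytearray(step_mib * MIB)
        for offset in range(0, len(chunk), PAGE_SIZE):
            chunk[offset] = 1
        chunks.append(chunk)
        allocated_mib += step_mib
        if pause_s:
            time.sleep(pause_s)  # ramp up slowly so the stall can be observed
    return chunks
```

For example, `hog = allocate_in_steps(64, 3072, pause_s=0.5)` would ramp up to 3 GiB in 64 MiB steps; dropping the `hog` reference releases the memory again.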


                • #9
                  Originally posted by set135
                  If you are in a situation where the system is running out of memory, and *something* has to get paged out, the kernel uses heuristics to determine what are the best candidates.
                  The main reason the OOM happens in his examples is because the page swapping to disk heuristic is bullshit and everyone sane has disabled it on desktops.

                  You can reliably lock up a system if you have swap enabled and fill up the RAM.

                  In this case X probably just had pages that were less 'hot' than the other applications you were running. But if you reach a situation where pages that were written out to swap are getting paged back in again, then back out, things are getting pathological.
                  Which is another reason I said it's bullshit. This blind "let's look at pages only" approach is very meh. If you look at how the userspace OOM daemons work you see that they use smarter heuristics than that.
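
As a minimal sketch of the system-wide view a userspace daemon can take, the snippet below parses `/proc/meminfo` and checks available memory and free swap together. It does not reproduce the heuristics of any particular daemon; the function names and the 10 percent thresholds are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        values[name.strip()] = int(fields[0])  # kB for most fields
    return values


def memory_is_low(meminfo: dict, mem_min_pct: float = 10.0,
                  swap_min_pct: float = 10.0) -> bool:
    """True when both available RAM and free swap are below their limits.

    A system without swap is treated as having no free swap at all.
    """
    mem_pct = 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
    swap_total = meminfo.get("SwapTotal", 0)
    swap_pct = 100.0 * meminfo.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_pct <= mem_min_pct and swap_pct <= swap_min_pct
```

Typical use would be `memory_is_low(parse_meminfo(open("/proc/meminfo").read()))`, polled at some interval before deciding which process, if any, to act on.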

                  do you want X to remain responsive?
                  User interface must have top priority in any desktop/laptop device. X isn't the only issue here.

                  In heavy swapping situations (or with low RAM) even the fucking command line was unresponsive (switching to tty2 with Ctrl+Alt+F2, for example, would take 5 minutes to actually print the login prompt in the friggin console), which is absolutely NOT acceptable; there is no significant resource use for a console tty screen.

                  others might value their long running compute pig
                  They should learn how to change the (sane) default settings that give GUI max priority then. (or upgrade their hardware)


                  • #10
                    Originally posted by starshipeleven
                    The main reason the OOM happens in his examples is because the page swapping to disk heuristic is bullshit and everyone sane has disabled it on desktops.

                    You can reliably lock up a system if you have swap enabled and fill up the RAM.

                    Which is another reason I said it's bullshit. This blind "let's look at pages only" approach is very meh. If you look at how the userspace OOM daemons work you see that they use smarter heuristics than that.

                    User interface must have top priority in any desktop/laptop device. X isn't the only issue here.

                    In heavy swapping situations (or with low RAM) even the fucking command line was unresponsive (switching to tty2 with Ctrl+Alt+F2, for example, would take 5 minutes to actually print the login prompt in the friggin console), which is absolutely NOT acceptable; there is no significant resource use for a console tty screen.

                    They should learn how to change the (sane) default settings that give GUI max priority then. (or upgrade their hardware)
                    Fuckin eh, man
