Yes, Linux Does Bad In Low RAM / Memory Pressure Situations On The Desktop

  • Originally posted by Neraxa
    Many have suggested this is caused by disabling swap, but I have seen it with gigabytes of swap enabled and most of the swap space unused. What brings it on is coming within 200 MB or so of running out of RAM. It's usually Firefox, and killing Firefox unlocks the system (which can take hours to actually get done, since the system is virtually locked up). The OOM killer is obviously not getting rid of Firefox itself, and it would not, since there are gigabytes of swap still free. So it looks like the OOM killer is not even involved here, since plenty of swap space is available. It could be a problem involving I/O scheduling, allocation, process scheduling or something else.
    On my 16 GB laptop without swap, where I do everything from multimedia to software development to web browsing with Firefox and 20+ tabs open, I haven't experienced any of the issues mentioned here. But if the default OOM killer is too slow, what about the already mentioned earlyoom or nohang (both with support for PSI), or Facebook's oomd? Has anybody tried them in these situations?
    Last edited by halo9en; 12 August 2019, 09:31 AM.
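For anyone curious what the PSI support mentioned above boils down to: the kernel (4.20+, with CONFIG_PSI) exposes memory pressure in /proc/pressure/memory as lines like "some avg10=0.00 avg60=0.00 avg300=0.00 total=0". Below is a minimal sketch of parsing one such line in C. The struct, the function names and the "act when avg10 exceeds a limit" policy are my own assumptions for illustration, not how earlyoom, nohang or oomd actually decide.

```c
#include <stdbool.h>
#include <stdio.h>

/* One line of /proc/pressure/memory, e.g.
 * "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" */
struct psi_sample {
    char kind[8];                 /* "some" or "full" */
    double avg10, avg60, avg300;  /* percent of time stalled */
    unsigned long long total_us;  /* cumulative stall time in microseconds */
};

/* Returns true only if all five fields were parsed. */
bool parse_psi_line(const char *line, struct psi_sample *out)
{
    return sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
                  out->kind, &out->avg10, &out->avg60, &out->avg300,
                  &out->total_us) == 5;
}

/* Hypothetical policy: react once the 10-second average exceeds a limit. */
bool psi_over_limit(const struct psi_sample *s, double limit_percent)
{
    return s->avg10 > limit_percent;
}
```

A daemon built on this would read the file periodically with fgets(), feed each line to parse_psi_line(), and pick its own limit; any particular limit value is a tuning guess.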


    • I just tested this on FreeBSD 12. When a process hits the memory cap, it instantly prints the pid number, process name and uid number, followed by "was killed: out of swap space". Moving the mouse around (via moused), it doesn't even hiccup. It took me less than 10 minutes to test this.

      FreeBSD's OOM killer isn't very advanced, but there is value in simplicity. This just works. I know someone out there is writing a huge userland daemon to fix this, but you don't need systemd-oombandaid. In fact, the reason Linux is thrashing the disk is probably systemd-journald.


      • Well, I have something that works for me (i.e. no disk thrashing), but I only made it today (so it's not tested much): a kernel patch (le9g.patch) that does not evict `Active(file):` pages (see /proc/meminfo) when they are below 256 MiB (the right value should depend on your workload). I've tested it with linux-stable 5.2.4, because on linuxgit 5.3.0-rc4-gd45331b00ddb there's a yet-to-be-found regression that freezes the whole system (without disk thrashing, though) whether I use the patch or not, apparently before the OOM killer would trigger.

        Code:
        diff --git a/mm/vmscan.c b/mm/vmscan.c
        index dbdc46a84f63..7a0b7e32ff45 100644
        --- a/mm/vmscan.c
        +++ b/mm/vmscan.c
        @@ -2445,6 +2445,13 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
                     BUG();
                 }
         
        +    if (NR_ACTIVE_FILE == lru) {
        +      long long kib_active_file_now=global_node_page_state(NR_ACTIVE_FILE) * MAX_NR_ZONES;
        +      if (kib_active_file_now <= 256*1024) {
        +        nr[lru] = 0; //don't reclaim any Active(file) (see /proc/meminfo) if they are under 256MiB
        +        continue;
        +      }
        +    }
                 *lru_pages += size;
                 nr[lru] = scan;
             }
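One caveat on the units in the patch above, as far as I can tell: global_node_page_state() returns a page count, while MAX_NR_ZONES is the number of memory zone types and depends on the kernel configuration. Multiplying the two gives KiB only when there happen to be 4 zones and the page size is 4 KiB. Inside the kernel the usual page-to-KiB idiom is a shift by (PAGE_SHIFT - 10). A userspace sketch of the same conversion follows; the function name is hypothetical.

```c
#include <assert.h>

/* Convert a page count to KiB for a given page size in bytes. */
unsigned long pages_to_kib(unsigned long pages, unsigned long page_size_bytes)
{
    /* Page sizes are whole multiples of 1 KiB on the systems discussed here. */
    assert(page_size_bytes >= 1024 && page_size_bytes % 1024 == 0);
    return pages * (page_size_bytes / 1024);
}
```

In userspace the page size would come from sysconf(_SC_PAGESIZE); with 4096-byte pages the result matches "pages * 4", which is why the patch works on a typical x86-64 build.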


        • Originally posted by howaboutsynergy
          Well, I have something that works for me (i.e. no disk thrashing), but I only made it today (so it's not tested much): a kernel patch (le9g.patch) that does not evict `Active(file):` pages (see /proc/meminfo) when they are below 256 MiB (the right value should depend on your workload). I've tested it with linux-stable 5.2.4, because on linuxgit 5.3.0-rc4-gd45331b00ddb there's a yet-to-be-found regression that freezes the whole system (without disk thrashing, though) whether I use the patch or not, apparently before the OOM killer would trigger.

          Code:
          diff --git a/mm/vmscan.c b/mm/vmscan.c
          index dbdc46a84f63..7a0b7e32ff45 100644
          --- a/mm/vmscan.c
          +++ b/mm/vmscan.c
          @@ -2445,6 +2445,13 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
                       BUG();
                   }
           
          +    if (NR_ACTIVE_FILE == lru) {
          +      long long kib_active_file_now=global_node_page_state(NR_ACTIVE_FILE) * MAX_NR_ZONES;
          +      if (kib_active_file_now <= 256*1024) {
          +        nr[lru] = 0; //don't reclaim any Active(file) (see /proc/meminfo) if they are under 256MiB
          +        continue;
          +      }
          +    }
                   *lru_pages += size;
                   nr[lru] = scan;
               }
          Noice!

          Now it would be even greater if it either were a tunable parameter (via sysctl) or computed based on the amount of installed RAM.
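To make the "computed based on the amount of installed RAM" idea concrete, here is a rough sketch in C. Everything about the policy is an assumption for illustration: the 1/64 divisor and the 64 MiB floor are made up, and only the 256 MiB ceiling comes from the value hard-coded in the patch above.

```c
/* Hypothetical bounds and divisor, chosen only for illustration. */
#define RESERVE_MIN_KIB (64UL * 1024)   /* 64 MiB floor (assumption) */
#define RESERVE_MAX_KIB (256UL * 1024)  /* 256 MiB, the value hard-coded in the patch */
#define RESERVE_RAM_DIVISOR 64UL        /* reserve 1/64 of RAM (assumption) */

/* Scale the Active(file) reserve with installed RAM, clamped to the bounds. */
unsigned long reserve_kib_for_ram(unsigned long total_ram_kib)
{
    unsigned long reserve = total_ram_kib / RESERVE_RAM_DIVISOR;

    if (reserve < RESERVE_MIN_KIB)
        return RESERVE_MIN_KIB;
    if (reserve > RESERVE_MAX_KIB)
        return RESERVE_MAX_KIB;
    return reserve;
}
```

From userspace, the total RAM figure could be taken from the MemTotal line of /proc/meminfo or from sysinfo(2) (totalram multiplied by mem_unit). Whether a simple fraction of RAM is the right policy at all is an open question, since the useful reserve depends more on the workload than on the machine size.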


          • "..Linux Does BadLY ...". Love ye Michael, but your grammar is better than that.


            • Originally posted by birdie

              Noice!

              Now it would be even greater if it either were a tunable parameter (via sysctl) or computed based on the amount of installed RAM.
              Here you go my friend (up to date here: le9h.patch):

              Code:
              le9h.patch
              
              this is licensed under all/any of:
              Apache License, Version 2.0
              MIT license
              0BSD
              CC0
              UNLICENSE
              
              diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
              index 64aeee1009ca..d0f3f7080f03 100644
              --- a/Documentation/admin-guide/sysctl/vm.rst
              +++ b/Documentation/admin-guide/sysctl/vm.rst
              @@ -68,6 +68,7 @@ Currently, these files are in /proc/sys/vm:
               - numa_stat
               - swappiness
               - unprivileged_userfaultfd
              +- unevictable_activefile_kbytes
               - user_reserve_kbytes
               - vfs_cache_pressure
               - watermark_boost_factor
              @@ -848,6 +849,69 @@ privileged users (with SYS_CAP_PTRACE capability).
               The default value is 1.
               
               
              +unevictable_activefile_kbytes
              +=============================
              +
              +How many kilobytes of `Active(file)` to never evict during high-pressure
              +low-memory situations. ie. never evict active file pages if under this value.
              +This will help prevent disk thrashing caused by Active(file) being close to zero
              +in such situations, especially when no swap is used.
              +
              +As 'nivedita' (phoronix user) put it:
              +"Executables and shared libraries are paged into memory, and can be paged out
              +even with no swap. [...] The kernel is dumping those pages and [...] immediately
              +reading them back in when execution continues."
              +^ and that's what's causing the disk thrashing during memory pressure.
              +
              +unevictable_activefile_kbytes=X will prevent X kbytes of those most used pages
              +from being evicted.
              +
              +The default value is 65536. That's 64 MiB.
              +
              +Set it to 0 to keep the default behaviour, as if this option was never
              +implemented, so you can see the disk thrashing as usual.
              +
              +To get an idea what value to use here for your workload(eg. xfce4 with idle
              +terminals) to not disk thrash at all, run this::
              +
              +    $ echo 1 | sudo tee /proc/sys/vm/drop_caches; grep -F 'Active(file)' /proc/meminfo
              +    1
              +    Active(file):     203444 kB
              +
              +so, using vm.unevictable_activefile_kbytes=203444 would be a good idea here.
              +(you can even add a `sleep` before the grep to get a slightly increased value,
              +which might be useful if something is compiling in the background and you want
              +to account for that too)
              +
              +But you can probably go with the default value of just 65536 (aka 64 MiB)
              +as this will eliminate most disk thrashing anyway, unless you're not using
              +an SSD, in which case it might still be noticeable (I'm guessing?).
              +
              +Note that `echo 1 | sudo tee /proc/sys/vm/drop_caches` can still cause
              +Active(file) to go under the vm.unevictable_activefile_kbytes value.
              +It's not an issue and this is how you know how much the value for
              +vm.unevictable_activefile_kbytes should be, at the time/workload when you ran it.
              +
              +The value of `Active(file)` can be gotten in two ways::
              +
              +    $ grep -F 'Active(file)' /proc/meminfo
              +    Active(file):    2712004 kB
              +
              +and::
              +
              +    $ grep nr_active_file /proc/vmstat
              +    nr_active_file 678001
              +
              +and multiply that with MAX_NR_ZONES (which is 4), ie. `nr_active_file * MAX_NR_ZONES`
              +so 678001*4=2712004  kB
              +
              +MAX_NR_ZONES is 4 as per:
              +`include/generated/bounds.h:10:#define MAX_NR_ZONES 4 /* __MAX_NR_ZONES */`
              +and is unlikely to change in the future.
              +
              +The hub of disk thrashing tests/explanations is here:
              +https://gist.github.com/constantoverride/84eba764f487049ed642eb2111a20830
              +
               user_reserve_kbytes
               ===================
               
              diff --git a/kernel/sysctl.c b/kernel/sysctl.c
              index 078950d9605b..c2726324a176 100644
              --- a/kernel/sysctl.c
              +++ b/kernel/sysctl.c
              @@ -110,6 +110,15 @@ extern int core_uses_pid;
               extern char core_pattern[];
               extern unsigned int core_pipe_limit;
               #endif
              +#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
              +unsigned long sysctl_unevictable_activefile_kbytes __read_mostly =
              +#if CONFIG_RESERVE_ACTIVEFILE_KBYTES < 0
              +#error "CONFIG_RESERVE_ACTIVEFILE_KBYTES should be >= 0"
              +#else
              +  CONFIG_RESERVE_ACTIVEFILE_KBYTES
              +#endif
              +;
              +#endif
               extern int pid_max;
               extern int pid_max_min, pid_max_max;
               extern int percpu_pagelist_fraction;
              @@ -1691,6 +1701,15 @@ static struct ctl_table vm_table[] = {
                       .extra1        = SYSCTL_ZERO,
                       .extra2        = SYSCTL_ONE,
                   },
              +#endif
              +#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
              +    {
              +        .procname    = "unevictable_activefile_kbytes",
              +        .data        = &sysctl_unevictable_activefile_kbytes,
              +        .maxlen        = sizeof(sysctl_unevictable_activefile_kbytes),
              +        .mode        = 0644,
              +        .proc_handler    = proc_doulongvec_minmax,
              +    },
               #endif
                   {
                       .procname    = "user_reserve_kbytes",
              diff --git a/mm/Kconfig b/mm/Kconfig
              index 56cec636a1fc..d21b737ca32e 100644
              --- a/mm/Kconfig
              +++ b/mm/Kconfig
              @@ -63,6 +63,39 @@ config SPARSEMEM_MANUAL
               
               endchoice
               
              +config RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
              +    bool "Reserve some `Active(file)` to prevent disk thrashing"
              +    depends on IKCONFIG_PROC && SYSCTL
              +    def_bool y
              +    help
              +      Keep `Active(file)`(/proc/meminfo) pages in RAM so as to avoid system freeze
              +      due to the disk thrashing(disk reading only) that occurs because the running
              +      executables' code is being evicted during low-mem conditions which is
              +      why it also prevents oom-killer from triggering until 10s of minutes later
              +      on some systems.
              +    
              +      Please see the value of CONFIG_RESERVE_ACTIVEFILE_KBYTES to set how many
              +      KiloBytes of Active(file) to keep by default in the sysctl setting
              +      vm.unevictable_activefile_kbytes
              +      see Documentation/admin-guide/sysctl/vm.rst for more info
              +
              +config RESERVE_ACTIVEFILE_KBYTES
              +    int "Set default value for vm.unevictable_activefile_kbytes"
              +    depends on RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
              +    default "65536"
              +    help
              +      This is the default value(in KiB) that vm.unevictable_activefile_kbytes gets.
              +      A value of at least 65536 or at most 262144 is recommended for users
              +      of xfce4 to avoid disk thrashing on low-memory/memory-pressure conditions,
              +      ie. mouse freeze with constant disk activity (but you can still sysrq+f to
              +      trigger oom-killer though, even without this mitigation)
              +    
              +      You can still sysctl set vm.unevictable_activefile_kbytes to a value of 0
              +      to disable this whole feature at runtime.
              +    
              +      see Documentation/admin-guide/sysctl/vm.rst for more info
              +      see also CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
              +
               config DISCONTIGMEM
                   def_bool y
                   depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
              diff --git a/mm/vmscan.c b/mm/vmscan.c
              index dbdc46a84f63..0dcd4e2dc02d 100644
              --- a/mm/vmscan.c
              +++ b/mm/vmscan.c
              @@ -2445,6 +2445,16 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
                           BUG();
                       }
               
              +#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
              +    extern unsigned long sysctl_unevictable_activefile_kbytes; //FIXME: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
              +    if (NR_ACTIVE_FILE == lru) { //FIXME: warning: comparison between ‘enum node_stat_item’ and ‘enum lru_list’ [-Wenum-compare]
              +      long long kib_active_file_now=global_node_page_state(NR_ACTIVE_FILE) * MAX_NR_ZONES;
              +      if (kib_active_file_now <= sysctl_unevictable_activefile_kbytes) {
              +        nr[lru] = 0; //ie. don't reclaim any Active(file) (see /proc/meminfo) if they are under sysctl_unevictable_activefile_kbytes see Documentation/admin-guide/sysctl/vm.rst and CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING and CONFIG_RESERVE_ACTIVEFILE_KBYTES
              +        continue;
              +      }
              +    }
              +#endif
                       *lru_pages += size;
                       nr[lru] = scan;
                   }
              Last edited by howaboutsynergy; 05 November 2019, 09:32 AM. Reason: using archive org url because github account is deleted
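To see from userspace how close a system is to the reserve, the check the patch performs in get_scan_count() can be approximated against /proc/meminfo, which already reports Active(file) in kB (so no page-count conversion is needed). This is only a sketch: the function names are hypothetical, and the comparison mirrors the patch's "less than or equal" test rather than anything in the mainline kernel.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Find "<key>: <value> kB" in meminfo-style text; returns true on success. */
bool meminfo_kib(const char *text, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Require the colon so that "Active" does not match "Active(file)". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", out) == 1;
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return false;
}

/* Mirrors the patch's test: no reclaim while Active(file) <= reserve. */
bool keep_active_file(unsigned long active_kib, unsigned long reserve_kib)
{
    return active_kib <= reserve_kib;
}
```

With the numbers from the documentation in the patch, an Active(file) of 203444 kB is above the default reserve of 65536 kB, so keep_active_file() returns false and reclaim of those pages would still be allowed. A reserve of 0 only matches when Active(file) is itself 0, which is why that value effectively disables the feature.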


              • howaboutsynergy

                This patch looks like it could be merged with mainline. Why don't you try sending it to linux-mm?


                • (The Quote button sometimes doesn't appear, and on the next refresh I'm logged out too (seems like a 1-minute auto-logout? Probably Firefox's fault, as usual), so I'm using my imagination.)
                  birdie replied:
                  This patch looks like it could be merged with mainline. Why don't you try sending it to linux-mm?
                  Multiple reasons:
                  * this patch is really just a proof of concept, and it does not meet the quality I'd expect of myself before sending it upstream (have you read that help text? lol)
                  * sending patches to the mailing list requires having read and knowing all the rules for submitting patches (i.e. me lazy)
                  * they require a real name and I don't want/care to provide one (did it in the past, though)
                  * they will want changes to the patch that I won't want to make while still keeping my name attached to it (an example from my previous time: moving a define whose place was clearly inside a .h near its siblings (CPU stuff) into the .c, right above the code using it at the top of the function, just because that was the only place the define was used)
                  * lazy
                  * the kernel is so buggy that I've learned not to care anymore

                  But hey, if anyone else wants to send it, be my guest, but use your own name (it's OK, you can pretend that you wrote it, you have my permission, or you can even modify it).
                  I don't care; I consider the patch to be in the public domain (and/or under all the other licenses, for ease of use).

                  (I've decided to resume using this account as of 3 Sept 2019, maybe because I'm too lazy to create yet another one everywhere, or I simply want to synergize on this one.) EDIT: never mind, I deleted everything at the end of Oct. 2019, but my gists are still available on archive.org.

                  Also, thanks to latalante in the next comment for linking those patches, which I was totally unaware of and haven't tested yet, but I noticed that one of them also has an `Inactive(file)` threshold besides the `Active(file)` one.
                  Last edited by howaboutsynergy; 05 November 2019, 09:30 AM.


                  • Originally posted by birdie
                    howaboutsynergy

                    This patch looks like it could be merged with mainline. Why don't you try sending it to linux-mm?
                    ChromeOS developers shipped it 9 years ago.

                    It has not been accepted, but it is probably still used in ChromeOS.
                    Total number of patches against baseline: 367 Merge log: ---------------------------------------------------------------- Ben Cheng (1): CHROMIUM: low_mem: fix file_size underflow Cheng-Yu Lee (1): CHROMIUM: Consider min_free_kbytes in low memory notification. Guenter Roeck...


                    In the days of memcg (cgroups) it doesn't seem necessary (at least to me). I always use cgroups.
                    However, no GNU/Linux distribution in 2019 offers such a default configuration to less experienced users, and none uses zRAM by default. Horror.


                    • It has been this way for a long time.
