Announcement

Collapse
No announcement yet.

Yes, Linux Does Bad In Low RAM / Memory Pressure Situations On The Desktop

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Originally posted by birdie View Post

    Noice!

    Now it would be even greater if it either were a tunable parameter (via sysctl) or computed based on the amount of installed RAM.
    Here you go my friend (le9h.patch):

    Code:
    le9h.patch
    
    this is licensed under all/any of:
    Apache License, Version 2.0
    MIT license
    0BSD
    CC0
    UNLICENSE
    
    diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
    index 64aeee1009ca..d0f3f7080f03 100644
    --- a/Documentation/admin-guide/sysctl/vm.rst
    +++ b/Documentation/admin-guide/sysctl/vm.rst
    @@ -68,6 +68,7 @@ Currently, these files are in /proc/sys/vm:
     - numa_stat
     - swappiness
     - unprivileged_userfaultfd
    +- unevictable_activefile_kbytes
     - user_reserve_kbytes
     - vfs_cache_pressure
     - watermark_boost_factor
    @@ -848,6 +849,69 @@ privileged users (with SYS_CAP_PTRACE capability).
     The default value is 1.
     
     
    +unevictable_activefile_kbytes
    +=============================
    +
    +How many kilobytes of `Active(file)` to never evict during high-pressure
    +low-memory situations. ie. never evict active file pages if under this value.
    +This will help prevent disk thrashing caused by Active(file) being close to zero
    +in such situations, especially when no swap is used.
    +
    +As 'nivedita' (phoronix user) put it:
    +"Executables and shared libraries are paged into memory, and can be paged out
    +even with no swap. [...] The kernel is dumping those pages and [...] immediately
    +reading them back in when execution continues."
    +^ and that's what's causing the disk thrashing during memory pressure.
    +
    +unevictable_activefile_kbytes=X will prevent X kbytes of those most used pages
    +from being evicted.
    +
    +The default value is 65536. That's 64 MiB.
    +
    +Set it to 0 to keep the default behaviour, as if this option was never
    +implemented, so you can see the disk thrashing as usual.
    +
    +To get an idea what value to use here for your workload(eg. xfce4 with idle
    +terminals) to not disk thrash at all, run this::
    +
    +    $ echo 1 | sudo tee /proc/sys/vm/drop_caches; grep -F 'Active(file)' /proc/meminfo
    +    1
    +    Active(file):     203444 kB
    +
    +so, using vm.unevictable_activefile_kbytes=203444 would be a good idea here.
    +(you can even add a `sleep` before the grep to get a slightly increased value,
    +which might be useful if something is compiling in the background and you want
    +to account for that too)
    +
    +But you can probably go with the default value of just 65536 (aka 64 MiB)
    +as this will eliminate most disk thrashing anyway, unless you're not using
    +an SSD, in which case it might still be noticeable (I'm guessing?).
    +
    +Note that `echo 1 | sudo tee /proc/sys/vm/drop_caches` can still cause
    +Active(file) to go a under the vm.unevictable_activefile_kbytes value.
    +It's not an issue and this is how you know how much the value for
    +vm.unevictable_activefile_kbytes should be, at the time/workload when you ran it.
    +
    +The value of `Active(file)` can be gotten in two ways::
    +
    +    $ grep -F 'Active(file)' /proc/meminfo
    +    Active(file):    2712004 kB
    +
    +and::
    +
    +    $ grep nr_active_file /proc/vmstat
    +    nr_active_file 678001
    +
    +and multiply that with MAX_NR_ZONES (which is 4), ie. `nr_active_file * MAX_NR_ZONES`
    +so 678001*4=2712004  kB
    +
    +MAX_NR_ZONES is 4 as per:
    +`include/generated/bounds.h:10:#define MAX_NR_ZONES 4 /* __MAX_NR_ZONES */`
    +and is unlikely the change in the future.
    +
    +The hub of disk thrashing tests/explanations is here:
    +https://gist.github.com/constantoverride/84eba764f487049ed642eb2111a20830
    +
     user_reserve_kbytes
     ===================
     
    diff --git a/kernel/sysctl.c b/kernel/sysctl.c
    index 078950d9605b..c2726324a176 100644
    --- a/kernel/sysctl.c
    +++ b/kernel/sysctl.c
    @@ -110,6 +110,15 @@ extern int core_uses_pid;
     extern char core_pattern[];
     extern unsigned int core_pipe_limit;
     #endif
    +#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
    +unsigned long sysctl_unevictable_activefile_kbytes __read_mostly =
    +#if CONFIG_RESERVE_ACTIVEFILE_KBYTES < 0
    +#error "CONFIG_RESERVE_ACTIVEFILE_KBYTES should be >= 0"
    +#else
    +  CONFIG_RESERVE_ACTIVEFILE_KBYTES
    +#endif
    +;
    +#endif
     extern int pid_max;
     extern int pid_max_min, pid_max_max;
     extern int percpu_pagelist_fraction;
    @@ -1691,6 +1701,15 @@ static struct ctl_table vm_table[] = {
             .extra1        = SYSCTL_ZERO,
             .extra2        = SYSCTL_ONE,
         },
    +#endif
    +#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
    +    {
    +        .procname    = "unevictable_activefile_kbytes",
    +        .data        = &sysctl_unevictable_activefile_kbytes,
    +        .maxlen        = sizeof(sysctl_unevictable_activefile_kbytes),
    +        .mode        = 0644,
    +        .proc_handler    = proc_doulongvec_minmax,
    +    },
     #endif
         {
             .procname    = "user_reserve_kbytes",
    diff --git a/mm/Kconfig b/mm/Kconfig
    index 56cec636a1fc..d21b737ca32e 100644
    --- a/mm/Kconfig
    +++ b/mm/Kconfig
    @@ -63,6 +63,39 @@ config SPARSEMEM_MANUAL
     
     endchoice
     
    +config RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
    +    bool "Reserve some `Active(file)` to prevent disk thrashing"
    +    depends on IKCONFIG_PROC && SYSCTL
    +    def_bool y
    +    help
    +      Keep `Active(file)`(/proc/meminfo) pages in RAM so as to avoid system freeze
    +      due to the disk thrashing(disk reading only) that occurrs because the running
    +      executables's code is being evicted during low-mem conditions which is
    +      why it also prevents oom-killer from triggering until 10s of minutes later
    +      on some systems.
    +     
    +      Please see the value of CONFIG_RESERVE_ACTIVEFILE_KBYTES to set how many
    +      KiloBytes of Active(file) to keep by default in the sysctl setting
    +      vm.unevictable_activefile_kbytes
    +      see Documentation/admin-guide/sysctl/vm.rst for more info
    +
    +config RESERVE_ACTIVEFILE_KBYTES
    +    int "Set default value for vm.unevictable_activefile_kbytes"
    +  depends on RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
    +    default "65536"
    +    help
    +      This is the default value(in KiB) that vm.unevictable_activefile_kbytes gets.
    +      A value of at least 65536 or at most 262144 is recommended for users
    +      of xfce4 to avoid disk thrashing on low-memory/memory-pressure conditions,
    +      ie. mouse freeze with constant disk activity (but you can still sysrq+f to
    +      trigger oom-killer though, even without this mitigation)
    +     
    +      You can still sysctl set vm.unevictable_activefile_kbytes to a value of 0
    +      to disable this whole feature at runtime.
    +     
    +      see Documentation/admin-guide/sysctl/vm.rst for more info
    +      see also CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
    +
     config DISCONTIGMEM
         def_bool y
         depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
    diff --git a/mm/vmscan.c b/mm/vmscan.c
    index dbdc46a84f63..0dcd4e2dc02d 100644
    --- a/mm/vmscan.c
    +++ b/mm/vmscan.c
    @@ -2445,6 +2445,16 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
                 BUG();
             }
     
    +#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
    +    extern unsigned int sysctl_unevictable_activefile_kbytes; //FIXME: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
    +    if (NR_ACTIVE_FILE == lru) { //FIXME: warning: comparison between ‘enum node_stat_item’ and ‘enum lru_list’ [-Wenum-compare]
    +      long long kib_active_file_now=global_node_page_state(NR_ACTIVE_FILE) * MAX_NR_ZONES;
    +      if (kib_active_file_now <= sysctl_unevictable_activefile_kbytes) {
    +        nr[lru] = 0; //ie. don't reclaim any Active(file) (see /proc/meminfo) if they are under sysctl_unevictable_activefile_kbytes see Documentation/admin-guide/sysctl/vm.rst and CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING and CONFIG_RESERVE_ACTIVEFILE_KBYTES
    +        continue;
    +      }
    +    }
    +#endif
             *lru_pages += size;
             nr[lru] = scan;
         }

    Comment


    • howaboutsynergy

      This patch looks like it could be merged with mainline. Why don't you try sending it to linux-mm?

      Comment


      • (the Quote button doesn't appear sometimes and on the next refresh I'm logged out too(seems like 1min auto-logout? probably Firefox's fault as usual), so using my imagination
        birdie replied
        This patch looks like it could be merged with mainline. Why don't you try sending it to linux-mm?
        multiple reasons
        * this patch is just a proof of concept really, and does not meet the quality I'd accept of myself for sending it upstream (have you read that help text? lol)
        * sending patches to ML requires having read and knowing all the rules for submitting patches - <s>yuck </s>(ie. me lazy)
        * they require real name and I don't want/care to provide one(did it in the past tho)
        * they will want changes to the patch that I won't like to do while still keeping my name attached to the patch (as a example from my prev. time: moving a define whose place was clearly inside a .h near its siblings(CPU stuff), into the .c right above and in the <s>middle</s>(actually top) of the function of the code using it, just because it was the only place this define was used)
        * lazy
        * kernel is so bugged that I learned to not care anymore

        But hey if anyone else wants to send it, be my guest, but use your own name (it's ok, you can pretend that you wrote it, you've my permission, or you can even modify it)
        I don't care, I consider the patch in the public domain(and/or all other licenses, for ease of use).

        <s>/me out</s>(actually I've decided to resume using this account(since today 03sept2019) - maybe because I'm too lazy to create yet another one everywhere, or I simply want to synergize on this one)
        Also thanks to latalante in the next comment for linking those patches which I was totally unaware of and haven't yet tested them but noticed that one of them also has an `Inactive(file)` threshold too, besides the `Active(file)` one.
        Last edited by howaboutsynergy; 09-03-2019, 02:06 PM. Reason: strikethrough attempt of some foolishness and fixed misremembering, also getting back to using this named account everywhere

        Comment


        • Originally posted by birdie View Post
          howaboutsynergy

          This patch looks like it could be merged with mainline. Why don't you try sending it to linux-mm?
          ChromeOS developers shipped it 9 years ago.
          https://lore.kernel.org/patchwork/patch/222042/
          It has not been accepted, but it is probably still used in ChromeOS.
          https://gitlab.freedesktop.org/seanp...5b97cf41e2ec6d

          In the days of memcg (cgroup) it doesn't seem necessary (at least to me). I always use cgroup.
          Only no GNU distribution in 2019 offers such a default configuration to less experienced users. None in 2019 uses zRAM by default. Horror.

          Comment


          • It's been the same since so many days.

            Comment


            • Not surprised. At my workplace (where we work with low cost embedded devices), we have to do some crazy tweaks to Linux to make it work nicely.

              Comment


              • +As 'nivedita' (phoronix user) put it:
                +"Executables and shared libraries are paged into memory, and can be paged out +even with no swap. [...] The kernel is dumping those pages and [...] immediately +reading them back in when execution continues." +^ and that's what's causing the disk thrashing during memory pressure.
                I wonder, what it takes to totally disable this attitude? Would it have some side effects or something, beyond failing to save pennies such an awkward way?

                Comment

                Working...
                X