"Nest" Is An Interesting New Take On Linux Kernel Scheduling For Better CPU Performance


  • "Nest" Is An Interesting New Take On Linux Kernel Scheduling For Better CPU Performance

    Phoronix: "Nest" Is An Interesting New Take On Linux Kernel Scheduling For Better CPU Performance

    There have been a number of different efforts in recent times to further enhance the Linux kernel's scheduler to better adapt to modern hardware architectures, whether for Intel hybrid CPU designs, adapting to new CPU cache configurations, or just better scaling with today's ever-increasing core counts. Another scheduler effort detailed this week is "Nest", which aims to keep tasks on "warm cores" in the hope of lower latency, since those cores are already at higher clock/performance states and ideally operating at an optimal turbo/boost frequency. The Nest developers find that their scheduler "improves performance 10%-2x and can reduce energy usage" with modern hardware...

    https://www.phoronix.com/news/Nest-L...ling-Warm-Core

  • #2
    I was hoping for 6.1/6.2 to be the 'ultimate' kernel with features like RT, MGLRU and IO_uring_spawn, but now this... Another great feature worth waiting for.



    • #3
      Volta but is it coming to upstream?



      • #4
        I've been thinking about that for years; it's strange no one has raised the issue earlier. The Linux kernel is notorious for its penchant for juggling tasks between CPU cores for no reason. That throws away whatever the task had in the previous core's L1/L2 caches and adds non-zero delays, since the new core could be at its absolute lowest power setting when it is given a task to execute.

        You could simply run:

        7z b -mmt1

        And see in top or any graphical process manager how the task is thrown between CPU cores.
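For anyone who would rather log this than eyeball top, here is a minimal Python sketch. It assumes Linux, where field 39 of /proc/<pid>/stat records the CPU the task last ran on. The helper names (last_cpu_from_stat, count_migrations, sample_cpus) are made up for illustration, and polling only sees the CPU at each sample, so the count is a lower bound on the real number of migrations.

```python
import time
from pathlib import Path


def last_cpu_from_stat(stat_line: str) -> int:
    """Return the CPU a task last ran on, parsed from a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' before tokenizing the rest.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 (processor) is rest[36].
    return int(rest[36])


def count_migrations(cpus: list[int]) -> int:
    """Count how often consecutive samples show a different CPU."""
    return sum(1 for prev, cur in zip(cpus, cpus[1:]) if prev != cur)


def sample_cpus(pid: int, samples: int = 50, interval_s: float = 0.1) -> list[int]:
    """Poll /proc/<pid>/stat and collect the last-run CPU at each poll."""
    stat_path = Path(f"/proc/{pid}/stat")
    seen = []
    for _ in range(samples):
        seen.append(last_cpu_from_stat(stat_path.read_text()))
        time.sleep(interval_s)
    return seen
```

Start the benchmark in one terminal, then call sample_cpus() with its PID and pass the result to count_migrations(). Note that /proc/<pid>/stat describes the main thread only; if the benchmark does its work in a worker thread, the per-thread files under /proc/<pid>/task/<tid>/stat are the ones to poll.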



        • #5
          Basically they found out that you get better performance by tweaking the scheduling of the workload in a way that makes use of higher boost frequencies when using fewer cores. Wow, who would have thought?!

          I am not surprised that this mostly affects low to mid-level workloads either, as your frequencies fall off a cliff on core-heavy workloads. And yes, as I have an undervolted 18-core Xeon with an unlocked Turbo Boost, that CPU easily hits its TDP limit, mind you. So I've made use of this behavior to get better performance out of the CPU for a long time now (and am not the only one doing so). It would be great if the Linux scheduler were aware of this behavior and made more optimal choices, of course. I only wonder why this is not the case yet, as things like TDP limitations and different base/turbo frequencies are nothing new.
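A rough way to watch the frequency drop-off yourself is to read the per-core clocks from the cpufreq sysfs files while loading a different number of cores. This is only a sketch: it assumes a Linux system whose cpufreq driver exposes scaling_cur_freq (values in kHz), and read_core_freqs_khz is a made-up helper name.

```python
from pathlib import Path


def read_core_freqs_khz(sysfs_root="/sys/devices/system/cpu") -> dict[int, int]:
    """Map CPU index -> current frequency in kHz, as reported by cpufreq."""
    freqs = {}
    # Only cpuN directories match; siblings such as 'cpufreq' or 'cpuidle'
    # under the same root are skipped by the [0-9] in the pattern.
    for path in Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu_index = int(path.parent.parent.name[3:])  # "cpu12" -> 12
        freqs[cpu_index] = int(path.read_text().strip())
    return freqs


if __name__ == "__main__":
    for cpu, khz in sorted(read_core_freqs_khz().items()):
        print(f"cpu{cpu}: {khz / 1000:.0f} MHz")
```

Run it once with a single-threaded load and once with all cores busy; on a TDP-limited CPU the reported clocks should be visibly lower in the second case, though the exact numbers depend on the driver and governor.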

          Volta Maybe some of the many custom schedulers are already more clever than CFS? I use BORE as of late, but I don't know if that already accounts for this.
          Last edited by ms178; 15 September 2022, 06:40 AM.



          • #6
            I haven't tested it much but on my first run taskset -c 0 7z b -mmt1 is faster than just running 7z b -mmt1 (almost completely idle Ryzen 5800X here).
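The same pinned-versus-unpinned comparison can be sketched in Python, assuming Linux, where os.sched_setaffinity is available. The busy_work loop is just a hypothetical stand-in for the 7z benchmark, and time_run is a made-up helper; single runs are noisy, so repeat each variant a few times before reading anything into the numbers.

```python
import os
import time


def busy_work(n: int = 2_000_000) -> int:
    """CPU-bound stand-in for a single-threaded benchmark."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def time_run(pinned_cpu=None, n: int = 2_000_000) -> float:
    """Time busy_work(), optionally pinned to one CPU (like taskset -c)."""
    original = os.sched_getaffinity(0)  # remember the allowed CPU set
    try:
        if pinned_cpu is not None:
            os.sched_setaffinity(0, {pinned_cpu})
        start = time.perf_counter()
        busy_work(n)
        return time.perf_counter() - start
    finally:
        # Always restore the original affinity mask.
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    cpu = min(os.sched_getaffinity(0))  # first CPU this process may use
    print(f"unpinned:        {time_run():.3f} s")
    print(f"pinned to cpu{cpu}: {time_run(cpu):.3f} s")
```

Picking the lowest CPU from the current affinity set instead of hardcoding 0 keeps the sketch working inside containers or cpusets where CPU 0 is not allowed.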



            • #7
              Originally posted by birdie
              I've been thinking about that for years; it's strange no one has raised the issue earlier. The Linux kernel is notorious for its penchant for juggling tasks between CPU cores for no reason. That throws away whatever the task had in the previous core's L1/L2 caches and adds non-zero delays, since the new core could be at its absolute lowest power setting when it is given a task to execute.
              Interesting.. What you described here is exactly the same as the behavior of the Windows scheduler someone described in another forum. In that forum I've even seen people calling the Windows scheduler "dumb", when it turns out the Linux scheduler actually works in a similar way.



              • #8
                Originally posted by bezirg
                Volta but is it coming to upstream?
                Who knows? We'll see.



                • #9
                  Originally posted by ms178
                  Maybe some of the many custom schedulers are already more clever than CFS? I use BORE as of late, but I don't know if that already accounts for this.
                  Maybe there are, but it depends on the workload. From what I'm seeing, Nest is just a CFS enhancement when it comes to task placement decisions.



                  • #10
                    But this implies that the die layout has to be hardcoded for every CPU, or at least every generation, especially for the large server CPUs... or maybe it's given by staying within a NUMA node (for the larger ones)? Maybe it is less problematic than I think.

                    Edit: There is another problem as well.

                    Let's imagine the scheduler picks the highest-clocking core for performance. Statistically, one out of n is the best, and it always will be. This would result in potential overuse of that one core and its neighbours, which means more thermal wear on that particular core (and its neighbours).
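To make the concern concrete, here is a toy Python model of an "always pick the fastest idle core" policy. It is purely illustrative and not how Nest or CFS actually place tasks; the per-core boost clocks, pick_fastest_idle and simulate are all invented for the example.

```python
from collections import Counter


def pick_fastest_idle(max_freq_mhz: list[int], busy: set[int]) -> int:
    """Return the idle core with the highest maximum boost clock."""
    idle = [core for core in range(len(max_freq_mhz)) if core not in busy]
    return max(idle, key=lambda core: max_freq_mhz[core])


def simulate(max_freq_mhz: list[int], rounds: int, concurrent: int = 1) -> Counter:
    """Count placements per core when `concurrent` tasks arrive each round.

    All tasks finish before the next round, so every round starts with all
    cores idle. `concurrent` must not exceed the number of cores.
    """
    usage = Counter()
    for _ in range(rounds):
        busy = set()
        for _ in range(concurrent):
            core = pick_fastest_idle(max_freq_mhz, busy)
            busy.add(core)
            usage[core] += 1
    return usage


if __name__ == "__main__":
    freqs = [4700, 4850, 4600, 4750]  # hypothetical per-core boost limits
    print(simulate(freqs, rounds=100))
    print(simulate(freqs, rounds=100, concurrent=2))
```

With these made-up clocks, a light load of one task at a time puts every placement on core 1, and two tasks at a time only ever touch cores 1 and 3, so the wear really would concentrate on the best-binned cores unless the policy rotates or accounts for temperature.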
                    Last edited by CochainComplex; 15 September 2022, 07:25 AM.

