Looking At An Early Performance Regression In Linux 5.13 - Scheduler Related


  • #11
    But yes it would be trivial to turn back on more daily/weekly testing if I had the resources.
    Well, given that in my short time following Phoronix I have seen Michael find regressions in the kernel more than once, it would make sense to me that the Linux Foundation should be a sponsor for that work.



    • #12
      Originally posted by Michael View Post

      Higher core count systems help with running things faster but not with all bugs... Really need a mix of systems. Like the CPPC freq invariance being spotted in part by its impact on smaller systems. Especially with many of the 'corporate' developers testing mostly on such high core count servers and less so on smaller or desktop systems.

      But yes it would be trivial to turn back on more daily/weekly testing if I had the resources.
      Sadly, that was my assumption.

      Do you or anyone else know anything about some form of crowdsourced testing program? Like what digital coins or [email protected] do, but for the kernel and potentially other open source programs. Even if you did have a lot of resources it seems like a multitude of different tests on a wide variety of hardware and software configurations would help and that a crowdsourcing program would be a great way to achieve that.

      I have an idea in my head, but doing that is way beyond the scope of what I can achieve from hacking together Bash snippets from Stack Overflow posts. About all I can do is suggest it and participate in it if it ever exists.

      Is that something that could be done with the PTS in the short-term? We all just run a certain config on a certain commit once a week? Like every Tuesday becomes Phoronix Crowdsource Day?



      • #13
        Originally posted by Michael View Post
        I used to run my kernel benchmarks on various systems daily with PTS+Phoromatic at LinuxBenchmarking.com but ultimately too expensive and no corporate support that I had to quit the effort.
        Well, you are the only tech website that really does important development work.
        I already have lifetime premium and more...

        Solar power has become very cheap per kWh, so maybe you should build a small solar plant of your own to cut the energy costs of your operation. Maybe more people would help fund such an improvement.
        Phantom circuit Sequence Reducer Dyslexia



        • #14
          Commit
          [c722f35b513f807629603bbf24640b1a48be21b5] sched/fair: Bring back select_idle_smt(), but differently
          merged to stable 5.12.3.



          • #15
            Originally posted by nrndda View Post
            Commit merged to stable 5.12.3.
            The commit message of c722f35b513f807629603bbf24640b1a48be21b5 spells out the 30% increase in task migrations in plain sight, but it also says that checking the idle SMT sibling first brings L2 cache misses back down, which results in a roughly 10% improvement in p95/p99 response times on workloads that actually count.

            Context switch microbenchmarks are not that useful by themselves.

            I would like to see MariaDB and PostgreSQL results before and after, i.e. 5.12.2 vs 5.12.3, or 5.12.0 vs 5.13-rc1.

            Code:
            sched/fair: Bring back select_idle_smt(), but differently
            
            Mel Gorman did some nice work in 9fe1f127b913 ("sched/fair: Merge
            select_idle_core/cpu()"), resulting in the kernel being more efficient
            at finding an idle CPU, and in tasks spending less time waiting to be
            run, both according to the schedstats run_delay numbers, and according
            to measured application latencies. Yay.
            
            The flip side of this is that we see more task migrations (about 30%
            more), higher cache misses, higher memory bandwidth utilization, and
            higher CPU use, for the same number of requests/second.
            
            This is most pronounced on a memcache type workload, which saw a
            consistent 1-3% increase in total CPU use on the system, due to those
            increased task migrations leading to higher L2 cache miss numbers, and
            higher memory utilization. The exclusive L3 cache on Skylake does us
            no favors there.
            
            On our web serving workload, that effect is usually negligible.
            
            It appears that the increased number of CPU migrations is generally a
            good thing, since it leads to lower cpu_delay numbers, reflecting the
            fact that tasks get to run faster. However, the reduced locality and
            the corresponding increase in L2 cache misses hurts a little.
            
            The patch below appears to fix the regression, while keeping the
            benefit of the lower cpu_delay numbers, by reintroducing
            select_idle_smt with a twist: when a socket has no idle cores, check
            to see if the sibling of "prev" is idle, before searching all the
            other CPUs.
            
            This fixes both the occasional 9% regression on the web serving
            workload, and the continuous 2% CPU use regression on the memcache
            type workload.
            
            With Mel's patches and this patch together, task migrations are still
            high, but L2 cache misses, memory bandwidth, and CPU time used are
            back down to what they were before. The p95 and p99 response times for
            the memcache type application improve by about 10% over what they were
            before Mel's patches got merged.
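
For anyone who wants the "twist" from the commit message above in concrete form, here is a minimal sketch in Python. It is only a toy model of the ordering the message describes (idle core first, then the SMT sibling of "prev", then everything else), not the kernel's C code. The function name `pick_cpu`, the `cores` list layout, and the `idle` set are all made up for illustration.

```python
from typing import List, Set


def pick_cpu(prev: int, cores: List[List[int]], idle: Set[int]) -> int:
    """Toy model: pick a CPU for a waking task, or return -1 if none is idle.

    cores is a list of SMT sibling groups, e.g. [[0, 4], [1, 5]].
    idle is the set of currently idle CPU numbers.
    Both are hypothetical inputs, not kernel data structures.
    """
    # Step 1: prefer a core whose SMT siblings are all idle.
    for core in cores:
        if all(cpu in idle for cpu in core):
            return core[0]

    # Step 2 (the twist): no idle core exists, so look at the SMT
    # siblings of the CPU the task ran on last before scanning further.
    for core in cores:
        if prev in core:
            for cpu in core:
                if cpu != prev and cpu in idle:
                    return cpu

    # Step 3: fall back to any idle CPU in scan order.
    for core in cores:
        for cpu in core:
            if cpu in idle:
                return cpu
    return -1


# Four cores with two SMT threads each (assumed topology).
cores = [[0, 4], [1, 5], [2, 6], [3, 7]]
# No core is fully idle; CPU 4 comes first in scan order,
# but CPU 5 shares a core (and its L2) with prev = 1.
print(pick_cpu(1, cores, {4, 5}))  # prints 5
```

Without step 2 the same call would land on CPU 4, which is the extra migration and lost L2 locality the commit message complains about.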
            The quoted commit in the above commit message was part of 5.12.0:
            Code:
            commit 9fe1f127b913318c631d0041ecf71486e38c2c2d
            Author: Mel Gorman <[email protected]>
            Date: Wed Jan 27 13:52:03 2021 +0000
            
            sched/fair: Merge select_idle_core/cpu()
            
            Both select_idle_core() and select_idle_cpu() do a loop over the same
            cpumask. Observe that by clearing the already visited CPUs, we can
            fold the iteration and iterate a core at a time.
            
            All we need to do is remember any non-idle CPU we encountered while
            scanning for an idle core. This way we'll only iterate every CPU once.
            
            Signed-off-by: Mel Gorman <[email protected]>
            Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
            Signed-off-by: Ingo Molnar <[email protected]>
            Reviewed-by: Vincent Guittot <[email protected]>
            Link: https://lkml.kernel.org/r/[email protected]
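
The folded iteration described in that commit message can be sketched the same way. This is a rough Python model under my own assumptions, not the kernel implementation: it walks one core at a time, marks the whole core as visited so no CPU is inspected twice, and keeps the first idle CPU it saw as a fallback in case no fully idle core turns up (that is how I read the "remember" step). The names `scan_cores_once`, `siblings`, and `inspected` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def scan_cores_once(
    cpus: List[int], siblings: Dict[int, List[int]], idle: Set[int]
) -> Tuple[int, List[int]]:
    """Toy model of a single pass that looks for an idle core.

    Returns (chosen CPU or -1, list of CPUs inspected in order).
    siblings maps each CPU to the CPUs sharing its core (hypothetical input).
    """
    visited: Set[int] = set()
    inspected: List[int] = []
    fallback = -1  # first idle CPU seen on a core that was not fully idle

    for cpu in cpus:
        if cpu in visited:
            continue  # already covered when its sibling's core was scanned
        core_is_idle = True
        for sib in siblings[cpu]:
            inspected.append(sib)
            if sib in idle:
                if fallback == -1:
                    fallback = sib
            else:
                core_is_idle = False
        # "Clear" the whole core so the outer loop skips its other threads.
        visited.update(siblings[cpu])
        if core_is_idle:
            return cpu, inspected

    return fallback, inspected


# Assumed topology: four cores, two SMT threads each.
groups = [[0, 4], [1, 5], [2, 6], [3, 7]]
siblings = {cpu: group for group in groups for cpu in group}

choice, seen = scan_cores_once(list(range(8)), siblings, {5})
print(choice, len(seen))  # prints: 5 8
```

In the example no core is fully idle, so the pass ends with the remembered CPU 5 after touching each of the eight CPUs exactly once, instead of running one loop for idle cores and a second loop for idle CPUs.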
            Last edited by zboszor; 12 May 2021, 05:32 AM.



            • #16
              Originally posted by skeevy420 View Post
              Do you or anyone else know anything about some form of crowdsourced testing program? Like what digital coins or [email protected] do, but for the kernel and potentially other open source programs. Even if you did have a lot of resources it seems like a multitude of different tests on a wide variety of hardware and software configurations would help and that a crowdsourcing program would be a great way to achieve that.
              Just create another Proof of Work cryptocurrency, where the 'work' is testing for regressions in the Linux kernel.



              • #17
                Originally posted by Teggs View Post

                Just create another Proof of Work cryptocurrency, where the 'work' is testing for regressions in the Linux kernel.
                Like I said, that isn't something I could hack together from Stack Overflow posts. I know my abilities and limitations.



                • #18
                  Affects AMD too - https://twitter.com/phoronix/status/1392412076086341632
                  Michael Larabel
                  http://www.michaellarabel.com/



                  • #19
                    Originally posted by Michael View Post
                    I liked that, but, well, no sir I don't like that.

