Linux Schedutil Governor's Quirky Behavior Persists In 2023


    Phoronix: Linux Schedutil Governor's Quirky Behavior Persists In 2023

    Earlier this week I posted benchmarks looking at how AMD Ryzen Threadripper 3990X performance has evolved in the three years to the day since that 64-core / 128-thread HEDT chip launched. While Threadripper 3990X performance has improved nicely overall under Linux since 2020, the video encoding tests in particular performed worse. As I raised in that earlier article and have now elaborated on with some follow-up tests, that regression is driven by the "schedutil" frequency scaling governor being used by default.

  • #2
    Ah THX Michael, yes that looks much more like it. The rest probably comes down to some new mitigations.

    Linuxxx rant coming in 3, 2, 1, ...


    • #3
      Michael remember this as the simple days of phoronix benchmarking...
      Coming up, it will be a complicated battle: Intel Xeon Max with 64GB of on-CPU memory vs Xeon (Platinum/Gold/Silver/Bronze) vs Xeon using the AMX accelerator vs AMD vs AMD using Xilinx in-CPU accelerators.


      • #4
        I hope that the combination of AMD-Pstate in "guided" mode and schedutil will remove this performance discrepancy. Preliminary numbers in that regard are looking good so far.


        • #5
          Schedutil seems so great on paper. I haven't yet seen a case where it comes out first in performance, power usage, or power efficiency. Maybe it wins in certain hardware environments. IIRC, Android uses it as the default governor. I trust Google and Android devs have tested it against other options and prefer it for good reason:


          • #6
            I've been running only the performance governor for a very long time.
            Wonky frequency scaling has affected me everywhere, regardless of brand or CPU generation.
            It has been constant fiddling, never perfect, with a lot of caveats and whatnot.

            So I decided the best approach was max frequency and then back to CPU hardware idle ASAP.
            I found very little power difference between idling in performance and idling in anything else.
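
For anyone wanting to check which governor their own cores are actually using, here is a minimal sketch. It assumes the standard cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function name `governor_counts` is made up for illustration, and on systems without cpufreq support it simply returns an empty dict.

```python
from collections import Counter
from pathlib import Path


def governor_counts(cpu_root="/sys/devices/system/cpu"):
    """Count how many CPUs are currently using each cpufreq governor.

    cpu_root is a parameter only so the function can be pointed at a
    test directory; the default is the usual sysfs location.
    """
    counts = Counter()
    # "cpu[0-9]*" matches cpu0, cpu1, ... but skips cpufreq/ and cpuidle/.
    for gov_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[gov_file.read_text().strip()] += 1
    return dict(counts)


if __name__ == "__main__":
    print(governor_counts())
```

Switching governors means writing to the same files as root, which this sketch deliberately does not do.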


            • #7

              You would think schedutil simply isn't properly clocking up the cores, but power draw is higher as well! What could possibly be going wrong here?

              Maybe it's moving threads around when it shouldn't. Those video encoders are lightly threaded enough to fit onto a single compute die (maybe even a single CCX), so perhaps `performance` is keeping them on a single die (making maximum use of cache, speeding up cross-thread communication and letting the other dies stay powered down), while schedutil is shuffling stuff around.

              This would not be surprising, since schedutil was initially designed for monolithic CPUs.

              Michael, you might want to try av1an in the same test setup. That will fully saturate all the cores, meaning the task energy might not regress so much. Another thing you could try is pinning the benchmark to 8 cores/threads on a single compute die.
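
A rough sketch of the pinning idea, assuming Linux and Python's `os.sched_setaffinity`. The helper name `run_pinned` is hypothetical, and so is the assumption that logical CPUs 0-7 sit on one compute die: the real mapping depends on the topology (check `lscpu -e` or similar first).

```python
import os
import subprocess


def run_pinned(cmd, cpus, **run_kwargs):
    """Run cmd with its CPU affinity restricted to the given CPU ids.

    Linux only. The affinity is set in the child just before exec, so
    the parent process keeps its own affinity mask.
    """
    cpu_set = set(cpus)

    def pin():
        # pid 0 means "the calling process", i.e. the forked child here.
        os.sched_setaffinity(0, cpu_set)

    return subprocess.run(cmd, preexec_fn=pin, check=True, **run_kwargs)


# Hypothetical usage; "your-benchmark" is a placeholder, not a real command:
# run_pinned(["your-benchmark", "--some-args"], range(8))
```

Threads spawned by the benchmark inherit the mask, so the whole run stays on the chosen CPUs regardless of governor.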
              Last edited by brucethemoose; 09 February 2023, 02:02 PM.


              • #8

                Originally posted by phoronix
                acpi-cpufrwq schedutil (Default)


                • #9
                  Originally posted by Mitch
                  IIRC, Android uses it as the default governor. I trust Google and Android devs have tested it against other options and prefer it for good reason:
                  Good link, it contains the possible explanation:
                  For example, in the scenario shown below, schedutil sees that RenderThread only requires 50% of a CPU's capacity, so it sets the CPU frequency to 50% of the maximum. But RenderThread cannot run until the UI thread has done its work — the two tasks cannot run in parallel — so it misses its deadline.

                  Android currently implements a workaround called "TouchBoost" to deal with this misbehavior. When the user interacts with the device, TouchBoost sets the minimum frequency that the governor can choose to a higher value for a given amount of time. This brute-force solution successfully provides the resources required by the display pipeline when the user interacts with the device. The disadvantage of this approach is that, when the display pipeline has a low workload, the minimum frequency forced by TouchBoost may be much higher than the demand of the pipeline. This possible overprovisioning of the frequency results in some waste of energy that gives no improvement to the user experience and should be limited, or possibly eliminated.
                  We would need a mechanism to boost the frequency for uneven multi-threaded loads.
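
The quoted scenario can be worked through with a toy model. This is only a sketch under stated assumptions: the governor is modelled as picking a frequency fraction equal to per-thread utilization (as in the quote, ignoring any headroom the real schedutil adds), work is measured in milliseconds at maximum frequency, and the 16 ms frame budget and all function names are made up for illustration.

```python
FRAME_MS = 16.0  # hypothetical frame deadline


def pick_fraction(util, min_fraction=0.0):
    """Toy governor: frequency fraction tracks utilization.

    min_fraction models a TouchBoost-style floor on the frequency.
    """
    return min(1.0, max(util, min_fraction))


def frame_time_ms(ui_work_ms, render_work_ms, freq_fraction):
    """Time for the UI thread followed by RenderThread, run serially.

    Work is given in ms at maximum frequency, so it stretches by
    1 / freq_fraction when the clock is lowered.
    """
    return (ui_work_ms + render_work_ms) / freq_fraction


# Each thread needs 8 ms at max frequency, i.e. 50% of one frame.
ui_work, render_work = 8.0, 8.0
per_thread_util = render_work / FRAME_MS  # 0.5

no_boost = frame_time_ms(ui_work, render_work, pick_fraction(per_thread_util))
boosted = frame_time_ms(ui_work, render_work, pick_fraction(per_thread_util, 1.0))

print("without floor:", no_boost, "ms, deadline met:", no_boost <= FRAME_MS)
print("with floor:   ", boosted, "ms, deadline met:", boosted <= FRAME_MS)
```

In this model each thread looks half-loaded in isolation, but because the two run back to back the chain needs the full clock to fit in the frame. The floor fixes that, and for a light pipeline (say 1 ms per thread) the same floor would hold the clock far above what the frame needs, which is the overprovisioning the quote describes.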


                  • #10
                    maybe one day lol