Linux 5.16's New Cluster Scheduling Is Causing Regression, Further Hurting Alder Lake


  • #11
    Originally posted by lethalwp View Post
    I wonder, since it's a desktop, shouldn't we just ignore the E-cores and go all P-cores?
    I think these are a good idea on laptops, but a gimmick on desktops to artificially increase core count, when Intel cannot keep up with AMD's cores?
    I want my desktop to consume as little power as possible since it is on basically 24/7.

    Comment


    • #12
      Originally posted by lethalwp View Post
      I wonder, since it's a desktop, shouldn't we just ignore the E-cores and go all P-cores?
      I think these are a good idea on laptops, but a gimmick on desktops to artificially increase core count, when Intel cannot keep up with AMD's cores?
      They are not gimmicky, for Christ's sake. Their power efficiency is a lot higher than that of the P-cores, which makes them an extremely good fit for heavy MT tasks. In fact, Intel is rumored to want more of them in Raptor Lake, and AMD is rumored to have them in Zen 5.

      Of course, if you don't care about MT performance you may not want them at all, and Intel has you covered: the 12500, 12400 and other ADL CPUs won't have E-cores at all.
      Last edited by avem; 14 November 2021, 03:04 PM.

      Comment


      • #13
        Originally posted by F.Ultra View Post

        AFAIK all big.LITTLE schedulers are proprietary, and the stock Linux scheduler works just as badly on those as it does on Alder Lake, since AFAIK no one uses that architecture outside of smartphones. At the moment Intel hasn't released any info on how to talk to Thread Director, so it's not possible to build support in Linux until they do.
        The support has been there for ages:

        Code:
        linux-5.15.2]# grep -R big.LITTLE .
        ./kernel/sched/rt.c:     * systems like big.LITTLE.
        ./drivers/perf/arm_pmu.c:         * configuration (e.g. big.LITTLE). This is not an uncore PMU,
        ./drivers/cpuidle/Kconfig.arm:    bool "Support for ARM big.LITTLE processors"
        ./drivers/cpuidle/Kconfig.arm:      Select this option to enable CPU idle driver for big.LITTLE based
        ./drivers/cpufreq/vexpress-spc-cpufreq.c:MODULE_DESCRIPTION("Vexpress SPC ARM big LITTLE cpufreq driver");
        ./drivers/cpufreq/Kconfig.arm:      big.LITTLE platforms using SPC for power management.
        ./arch/arm64/kernel/proton-pack.c: * It's not unlikely for different CPUs in a big.LITTLE system to fall into
        ./arch/arm64/kernel/proton-pack.c: * being stale when re-entering the kernel. The usual big.LITTLE caveats apply,
        ./arch/arm64/kernel/cpufeature.c:     * Even in big.LITTLE, processors should be identical instruction-set
        ./arch/arm/mm/Kconfig:      Some big.LITTLE systems have I-Cache line size mismatch between
        ./arch/arm/mach-vexpress/Kconfig:      on RTSM implementing big.LITTLE.
        ./arch/arm/mach-vexpress/Kconfig:      with a TC2 (A15x2 A7x3) big.LITTLE core tile.
        ./arch/arm/include/asm/topology.h:/* big.LITTLE switcher is incompatible with frequency invariance */
        ./arch/arm/common/bL_switcher_dummy_if.c:MODULE_DESCRIPTION("big.LITTLE switcher dummy user interface");
        ./arch/arm/common/bL_switcher.c: * arch/arm/common/bL_switcher.c -- big.LITTLE cluster switcher core driver
        ./arch/arm/common/bL_switcher.c:    pr_info("big.LITTLE switcher initializing\n");
        ./arch/arm/common/bL_switcher.c:    pr_info("big.LITTLE switcher initialized\n");
        ./arch/arm/common/bL_switcher.c:    pr_warn("big.LITTLE switcher initialization failed\n");
        ./arch/arm/Kconfig:      for (multi-)cluster based systems, such as big.LITTLE based
        ./arch/arm/Kconfig:    bool "big.LITTLE support (Experimental)"
        ./arch/arm/Kconfig:      This option enables support selections for the big.LITTLE
        ./arch/arm/Kconfig:    bool "big.LITTLE switcher support"
        ./arch/arm/Kconfig:      The big.LITTLE "switcher" provides the core functionality to
        ./arch/arm/Kconfig:      and a cluster of A7's in a big.LITTLE system.
        ./arch/arm/Kconfig:    tristate "Simple big.LITTLE switcher user interface"
        ./arch/arm/Kconfig:      the big.LITTLE switcher core code.  It is meant for
        ./Documentation/scheduler/sched-energy.rst:EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
        ./Documentation/scheduler/sched-capacity.rst:Arm big.LITTLE systems are an example of both. The big CPUs are more
        ./Documentation/scheduler/sched-capacity.rst:To draw the parallel with Arm big.LITTLE, CPU0 would be a big while CPU1 would
        ./Documentation/devicetree/bindings/arm/cpu-capacity.txt:(e.g., ARM big.LITTLE systems) or maximum frequency at which CPUs can run
        ./Documentation/devicetree/bindings/arm/arm,vexpress-juno.yaml:          CPU cores and 3 Cortex A7 cores in a big.LITTLE MPCore configuration
        ./Documentation/devicetree/bindings/arm/arm,vexpress-juno.yaml:          V2M-Juno) was introduced as a vehicle for evaluating big.LITTLE on
        ./Documentation/devicetree/bindings/arm/arm,vexpress-juno.yaml:          cores in a big.LITTLE configuration. It also features the MALI T624
        ./Documentation/arm64/asymmetric-32bit.rst:Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset
        ./Documentation/arm/vlocks.rst:use in ARM-based big.LITTLE platforms, with review and input gratefully

        Comment


        • #14
          Originally posted by lethalwp View Post
          I wonder, since it's a desktop, shouldn't we just ignore the E-cores and go all P-cores?
          I think these are a good idea on laptops, but a gimmick on desktops to artificially increase core count, when Intel cannot keep up with AMD's cores?
          As said above, they are indeed better for heavy MT workloads.

          Comment


          • #15
            Originally posted by avem View Post
            Lastly, Linux must not suck as well, as we've had big-little ARM cores for ages now and Linux supports them perfectly. Probably there are some systemctl or/and boot variables which can let the kernel know which cores are which to work properly - strangely no one has researched this.
            On the ARM side, you have to describe the hardware to the kernel through a device tree file. Each CPU core is listed as a node in the tree, and its relative speed can be declared there (the capacity-dmips-mhz property). The Linux kernel automatically takes care of everything from there.

            For x86, I think the same sort of device tree is built internally, but it's done by probing the hardware at boot, since x86 doesn't have predefined systems the way that ARM does. Presumably the whole big.LITTLE stuff isn't supported there yet, but I would assume it would be fairly easy to port over.
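
To see what the kernel ended up with after reading those per-core values, here is a minimal Python sketch that groups CPUs by the capacity reported in sysfs. It assumes the per-CPU cpu_capacity files under /sys/devices/system/cpu, which ARM kernels expose; the file may well be absent on x86, in which case the result is simply empty. The function name cpus_by_capacity is made up for this illustration.

```python
from collections import defaultdict
from pathlib import Path


def cpus_by_capacity(sysfs_root="/sys/devices/system/cpu"):
    """Group CPU numbers by the value found in their cpu_capacity file.

    Assumption: each cpuN directory may contain a cpu_capacity file holding
    one integer. CPUs without that file are skipped.
    """
    groups = defaultdict(list)
    # cpu[0-9]* matches cpu0, cpu12, ... but not cpufreq or cpuidle.
    for cap_file in Path(sysfs_root).glob("cpu[0-9]*/cpu_capacity"):
        cpu = int(cap_file.parent.name[3:])  # strip the leading "cpu"
        capacity = int(cap_file.read_text().strip())
        groups[capacity].append(cpu)
    return {cap: sorted(cpus) for cap, cpus in groups.items()}


if __name__ == "__main__":
    # Print the biggest cores first.
    for cap, cpus in sorted(cpus_by_capacity().items(), reverse=True):
        print(f"capacity {cap}: CPUs {cpus}")
```

On a big.LITTLE board this should print one line per core type; on a machine without the files it prints nothing.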
            Last edited by smitty3268; 14 November 2021, 03:58 PM.

            Comment


            • #16
              Originally posted by avem View Post

              The support has been there for ages:

              Support for the arch, yes, but that does not mean that the scheduler does things right. You can, e.g., configure big.LITTLE so that a low frequency from the governor means run on LITTLE and a high frequency means run on big, so the hardware does the switching automatically. I'm not sure there is support yet for running on both types of cores at the same time with correct scheduling.
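
For what "correct scheduling" across both core types could look like, Documentation/scheduler/sched-capacity.rst (listed in the grep output earlier in the thread) describes placing a task on a CPU whose capacity fits the task's utilization. Below is a toy Python sketch of that idea only. It is not kernel code: the names fits and pick_cpu, the example capacities, and the 1.25 headroom margin are assumptions chosen for illustration, and a real scheduler also weighs load, idleness and energy.

```python
def fits(util, capacity, margin=1.25):
    """True if a task with this utilization fits on a CPU, with headroom.

    Assumption: the 1.25 margin is an illustrative headroom factor.
    """
    return util * margin < capacity


def pick_cpu(task_util, cpu_capacities):
    """Pick the lowest-capacity CPU that fits the task.

    cpu_capacities maps CPU number -> capacity (1024 = biggest core).
    If nothing fits, fall back to the highest-capacity CPU.
    """
    candidates = [
        (cap, cpu) for cpu, cap in cpu_capacities.items() if fits(task_util, cap)
    ]
    if candidates:
        return min(candidates)[1]
    return max((cap, cpu) for cpu, cap in cpu_capacities.items())[1]


if __name__ == "__main__":
    # Hypothetical system: CPUs 0-1 are big, CPUs 2-3 are little.
    caps = {0: 1024, 1: 1024, 2: 446, 3: 446}
    for util in (100, 400, 900):
        print(f"task util {util} -> CPU {pick_cpu(util, caps)}")
```

Light tasks land on a little core and heavy ones on a big core, with both types in use at the same time, which is the behavior the either/or switcher setup cannot give you.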

              Comment


              • #17
                Originally posted by avem View Post

                You can still use taskset and avoid any growing pains. It's currently the recommended solution for Windows 10 as well (Process Lasso and the like), though Windows 10 has not been so bad at utilizing the full performance of ADL in the first place. It's the first time I've seen Linux demonstrate such poor performance.

                Lastly, Linux must not suck as well, as we've had big-little ARM cores for ages now and Linux supports them perfectly. Probably there are some systemctl or/and boot variables which can let the kernel know which cores are which to work properly - strangely no one has researched this.
                As I've tried to point out a couple of times, the answer here is utilization clamping (uclamp). The tool is "uclampset". It's also useful for use with schedutil to give a hint not to ramp frequency for a task, or conversely to request a frequency boost.

                https://lwn.net/Articles/762043/
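
For comparison, the taskset approach from the quote can also be done from inside a program. This is a rough Python sketch using os.sched_setaffinity (Linux only). It assumes the P-core list can be read from /sys/devices/cpu_core/cpus, which is where hybrid Intel kernels are expected to publish it; treat that path as an assumption and check it on your own machine. The helper names parse_cpu_list and pin_to are made up for this example. It only restricts where the process may run and does not touch uclamp.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def pin_to(cpus, pid=0):
    """Restrict pid (0 = the calling process) to the given CPUs.

    Roughly what "taskset -p" does. Returns the resulting affinity set.
    """
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Assumed path: list of P-core CPU numbers on a hybrid Intel system.
    with open("/sys/devices/cpu_core/cpus") as f:
        p_cores = parse_cpu_list(f.read())
    print("now allowed on:", sorted(pin_to(p_cores)))
```

Child processes inherit the affinity mask, so pinning a launcher script this way also confines whatever it starts.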

                Comment


                • #18
                  Originally posted by lethalwp View Post
                  I wonder, since it's a desktop, shouldn't we just ignore the E-cores and go all P-cores?
                  I think these are a good idea on laptops, but a gimmick on desktops to artificially increase core count, when Intel cannot keep up with AMD's cores?
                  What we want is for these early pains to be resolved, and then that E-core count to be scaled up indefinitely.

                  Raptor Lake will have 16 small cores, a successor Lake will have 32 small cores. As long as the number of large cores (8 of them for now, maybe 10-16 in a few years) is sufficient to handle all of the single-thread/IPC-sensitive tasks that will be running, the best way to improve multi-threaded performance is to add more small cores. Meaning dozens, hundreds, or even thousands of them (server).

                  Comment


                  • #19
                    Originally posted by avem View Post
                    Lastly, Linux must not suck as well, as we've had big-little ARM cores for ages now and Linux supports them perfectly. Probably there are some systemctl or/and boot variables which can let the kernel know which cores are which to work properly - strangely no one has researched this.
                    Afaik, big.LITTLE is an either/or affair (i.e. you can use the big part or the LITTLE part of a core, not both at the same time). Alder Lake is different, the E cores run alongside P cores. Scheduling logic will be quite different.

                    Comment


                    • #20
                      Originally posted by bug77 View Post
                      Afaik, big.LITTLE is an either/or affair (i.e. you can use the big part or the LITTLE part of a core, not both at the same time). Alder Lake is different, the E cores run alongside P cores. Scheduling logic will be quite different.
                      It's exactly the same. ARM is even more complicated because they often use a big.medium.little approach, while Intel only has big and medium cores.

                      Comment
