Intel Making Cluster Scheduling Configurable, Disabled For Alder Lake Hybrid CPUs

  • Intel Making Cluster Scheduling Configurable, Disabled For Alder Lake Hybrid CPUs

    Phoronix: Intel Making Cluster Scheduling Configurable, Disabled For Alder Lake Hybrid CPUs

    Added to the in-development Linux 5.16 kernel was cluster-aware scheduling, designed to improve performance on systems where groups of CPU cores share a cache or similar resource, so that the scheduler can use that information for better task placement. But as I pointed out early in the Linux 5.16 cycle, cluster scheduling hurts Intel Alder Lake performance on the new kernel. Intel is now working to correct this by making cluster scheduling configurable and disabling it by default for hybrid CPUs such as Alder Lake...
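To see what "cluster" means on a given machine, the groups the kernel reports can be listed from sysfs. The following is a minimal sketch, assuming a kernel that exposes `topology/cluster_cpus_list` under `/sys/devices/system/cpu/cpuN/` (added alongside the cluster topology work in 5.16); the function names `parse_cpu_list` and `read_clusters` are illustrative and not part of any kernel tooling.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or trailing comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_clusters(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct CPU clusters found under sysfs_root as sorted tuples."""
    clusters = set()
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        path = cpu_dir / "topology" / "cluster_cpus_list"
        if path.exists():  # missing on kernels without cluster topology
            clusters.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(clusters)


if __name__ == "__main__":
    for cluster in read_clusters():
        print(cluster)
```

On a system without cluster information the script simply prints nothing, since the file is absent there.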


  • #2
    Any idea if AMD have looked into utilising this for their CCD/CCX's?



    • #3
      Originally posted by FireBurn
      Any idea if AMD have looked into utilising this for their CCD/CCX's?
      I have little doubt that most manufacturers have investigated how best to schedule their CPUs, since it makes their systems look and work better. (Some of the classic NUMA designs are the poster-child examples, but newer architectures have similar issues.) Historically, Intel and IBM have had the largest teams looking at these issues for their processors, while big.LITTLE on Arm has introduced new opportunities. AMD, while apparently hiring the right teams, has not yet shown that it is in the same league as those early vendors. It should be noted that "optimal" scheduling is still considered a very hard problem (it is typically difficult to know the future), and heuristics are still sometimes just a WAG. The good (?????(*)) news is that some hyperscalers care enough to have invested significant resources from their teams to get the most they can from their systems. We can hope the results will be positive for most.

      (*) It is good in that it means more (often very smart) eyes on the problem. It is potentially bad news in that the priority is a class of use that may not apply to anyone who is not a hyperscaler, although sometimes the dregs have value too.
      Last edited by CommunityMember; 04 December 2021, 08:25 PM.



      • #4
        Originally posted by FireBurn
        Any idea if AMD have looked into utilising this for their CCD/CCX's?
        I don't think it's actually needed for Zen. Only the L3 is special there: on larger desktop and many server CPUs it may be split between CCDs.
        L1 and L2 are still per core, and L3 latency is already high enough that the relatively small additional delay of accessing a 'foreign' L3 slice can be disregarded.
        So preferring to schedule a task on the same core normally does a good job there, with no need to diverge. The poor Windows 11 experience clearly shows this as well.
        Last edited by Alex/AT; 05 December 2021, 02:57 AM.



        • #5
          Originally posted by Alex/AT
          I don't think it's actually needed for Zen. Only the L3 is special there: on larger desktop and many server CPUs it may be split between CCDs.
          L1 and L2 are still per core, and L3 latency is already high enough that the relatively small additional delay of accessing a 'foreign' L3 slice can be disregarded.
          So preferring to schedule a task on the same core normally does a good job there, with no need to diverge. The poor Windows 11 experience clearly shows this as well.
          Home slice vs foreign slice latency is only half the problem. The other half is that if you schedule two unrelated threads on the same CCX, they will create cache pressure for each other.

          Looking at comparison tests between the Ryzen 3 3100 (8+8 MiB L3) and the 3300X (16+0), I expect it could make a significant difference.
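The cache-pressure point can be illustrated with a small placement sketch: given the sets of CPUs that share an L3, hand out CPUs by cycling over those sets, so that unrelated threads land on different L3 domains before any domain is reused. This is only an illustration of the idea, not how the kernel scheduler works; the function name `spread_across_l3` and the 2-CCX CPU numbering below are assumptions made for the example. On a real system the groups could be read from `cache/indexN/shared_cpu_list` for the index whose `level` file reads 3.

```python
from itertools import zip_longest


def spread_across_l3(l3_groups, n_threads):
    """Pick n_threads CPUs, taking one from each L3 group in turn."""
    picked = []
    # zip_longest walks the groups column by column, padding short ones with None
    for row in zip_longest(*l3_groups):
        for cpu in row:
            if cpu is not None and len(picked) < n_threads:
                picked.append(cpu)
    return picked


# Hypothetical layout: two CCXs, each with two SMT cores sharing one L3
HYPOTHETICAL_2CCX = [[0, 1, 4, 5], [2, 3, 6, 7]]

print(spread_across_l3(HYPOTHETICAL_2CCX, 2))  # -> [0, 2]
```

Each chosen CPU could then be applied to a thread or process with `os.sched_setaffinity` on Linux. A single-CCX part such as the 3300X would have only one group, so no spreading is possible or needed there.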

