
CPU Cluster Scheduler Continues To Be Worked On For Linux With Promising Results


  • CPU Cluster Scheduler Continues To Be Worked On For Linux With Promising Results

    Phoronix: CPU Cluster Scheduler Continues To Be Worked On For Linux With Promising Results

    HiSilicon engineers continue working on a cluster scheduler that could help the performance of certain x86 and ARM platforms on Linux...


  • #2
    My Haswell-EP 12-core Xeon 2678V3 also supports a Cluster-on-Die mode (CoD mode), so this work might improve the scheduling quite a bit!

    The thing is that the core layout is not optimal. In CoD mode, eight cores, eight L3 slices, one memory controller, the QPI interface, and the PCIe controller are connected to one bi-directional ring. The remaining four cores, their L3 slices, and the second memory controller are connected to another bi-directional ring. Both rings are connected via two bi-directional queues. When put into CoD mode, the clusters contain an equal number of cores and are exposed to the operating system as two NUMA nodes. However, the software view of the NUMA topology does not actually match the hardware configuration: the 12-core chip (8-core ring + 4-core ring) presents itself as two 6-core nodes. This has massive performance implications, because cache latencies are lower when work items are kept within one cluster.

    If anyone is interested in more details, there is a university paper on this topic with benchmarks and core layout diagrams: https://tu-dresden.de/zih/forschung/...on.pdf?lang=de
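
To see the "software view" mentioned above on your own machine, here is a minimal Python sketch that lists which CPUs Linux assigns to each NUMA node. It assumes the standard sysfs layout under /sys/devices/system/node; the function names parse_cpulist and numa_nodes are just illustrative helpers, not part of any tool. It only shows what the kernel reports, not the physical ring layout.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-5,12-17' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (e.g. a memory-only node) yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids, as exposed by the kernel (assumed sysfs layout)."""
    nodes = {}
    for node_dir in sorted(Path(sysfs_root).glob("node[0-9]*")):
        node_id = int(node_dir.name[len("node"):])
        nodes[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in numa_nodes().items():
        print(f"node{node_id}: {len(cpus)} CPUs -> {cpus}")
```

On a CoD-mode system like the one described, this would be expected to report two nodes of equal size, even though the underlying rings are not equal.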


    • #3
      AMD EPYC CPUs also have 4 quadrants, each with 1 or 2 blocks of L3 cache shared among the cores. I suspect it will help there as well.
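
As a rough way to check how cores group around shared L3 on a given box, the following Python sketch collects the distinct L3 sharing sets from sysfs. It assumes the usual Linux cache directories (cpuN/cache/indexM with level and shared_cpu_list files); l3_groups is a hypothetical helper name, and the output format is simply whatever cpulist strings the kernel reports.

```python
from pathlib import Path


def l3_groups(cpu_root="/sys/devices/system/cpu"):
    """Return the distinct sets of CPUs sharing an L3 cache, as cpulist strings."""
    groups = set()
    for index_dir in Path(cpu_root).glob("cpu[0-9]*/cache/index[0-9]*"):
        level_file = index_dir / "level"
        shared_file = index_dir / "shared_cpu_list"
        if not (level_file.is_file() and shared_file.is_file()):
            continue
        # Check the reported cache level instead of assuming index3 is always L3.
        if level_file.read_text().strip() == "3":
            groups.add(shared_file.read_text().strip())
    return sorted(groups)


if __name__ == "__main__":
    for group in l3_groups():
        print("L3 shared by CPUs:", group)
```

Each printed line corresponds to one block of L3 and the cores behind it, which is the kind of grouping a cluster-aware scheduler would want to respect.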


      • #4
        Originally posted by kieffer:
        AMD EPYC CPUs also have 4 quadrants, each with 1 or 2 blocks of L3 cache shared among the cores. I suspect it will help there as well.

        Interestingly, Microsoft configures its EPYCs in Azure as NPS2 instead of NPS4, with Cache as NUMA disabled, which is not what AMD advises (page 25).


        • #5
          I'm honestly surprised these sorts of improvements weren't pursued years ago. I could imagine the scheduling code is pretty hairy by this point.

          BTW, I was thrown off by the title. I'd refer to this as "core cluster-aware scheduling".
