BLD Kernel Scheduler Updated For Linux 3.19


  • BLD Kernel Scheduler Updated For Linux 3.19

    Phoronix: BLD Kernel Scheduler Updated For Linux 3.19

    The Barbershop Load Distribution (BLD) CPU load distribution technique has been updated for the mainline Linux 3.19 kernel...


  • #2
    If anyone wonders what it does...

    It only tries to distribute the load properly, by tracking the lowest- and highest-loaded runqueues (rq) in the system. This technique never tries to balance the system load at idle context, which the current scheduler does. The motivation behind this technique is to distribute load properly amongst the CPUs, as if a load balancer weren't needed, and to make load distribution easier.
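For anyone curious what that idea looks like in practice, here is a toy sketch (illustrative Python, not the actual kernel patch; all names are made up): tasks are always placed on the least-loaded runqueue at placement time, so no separate balancer pass has to move them later.

```python
# Toy model (illustrative, not the real BLD code): keep per-CPU
# runqueues and always place new work on the least-loaded one, so the
# load is distributed at placement time and no balancer pass is needed.

class RunQueue:
    def __init__(self, cpu):
        self.cpu = cpu
        self.load = 0

class ToyBLD:
    def __init__(self, ncpus):
        self.rqs = [RunQueue(c) for c in range(ncpus)]

    def place_task(self, weight):
        # Real BLD keeps the rqs ordered by load so the lowest- and
        # highest-loaded ones are known without scanning; min() gives
        # the same result for a toy.
        rq = min(self.rqs, key=lambda r: r.load)
        rq.load += weight
        return rq.cpu

sched = ToyBLD(4)
cpus = [sched.place_task(10) for _ in range(8)]
print(cpus)  # → [0, 1, 2, 3, 0, 1, 2, 3]: tasks spread evenly
```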



    • #3
      BLD is still plagued by an issue where using it on some systems can cause significantly reduced throughput, due to L2 cache misses when no L3 cache is available.
      What does this mean? Do L2 and L3 cache mean the CPU cache? So if I have a CPU with L2 and L3 cache, should I use BLD or not?

      Maybe Phoronix will do some tests of BLD on a regular desktop? It would be nice to compare it to BFS and CFS.



      • #4
        Originally posted by dragonn
        What does this mean? Do L2 and L3 cache mean the CPU cache? So if I have a CPU with L2 and L3 cache, should I use BLD or not?

        Maybe Phoronix will do some tests of BLD on a regular desktop? It would be nice to compare it to BFS and CFS.
        It means what it says: that it could still be improved.
        The other comment highlighted the main idea: it tries to distribute everything across cores, which obviously results in a less power-efficient kernel (e.g. probably not suitable for laptops).

        Offtopic: is this the same guy who came up with the ~100-line patch that grouped scheduler priorities by the TTY (terminal) they were spawned from?



        • #5
          Originally posted by dragonn
          What does this mean? Do L2 and L3 cache mean the CPU cache? So if I have a CPU with L2 and L3 cache, should I use BLD or not?

          Maybe Phoronix will do some tests of BLD on a regular desktop? It would be nice to compare it to BFS and CFS.
          A cache miss occurs when the CPU checks to see if the data it needs is located in the L2 cache. If it isn't there, it needs to grab it from L3 or main memory.

          This would make sense. If you are trying to load balance, you are going to be constantly moving threads around to different CPU cores. As a result, at any point in time, the odds that a CPU's local L2 cache (assuming the L2 isn't shared chip-wide on that architecture) holds the data a thread needs decrease as the number of CPU cores increases. Absolute performance would likely DROP as you added more CPU cores, due to a higher rate of cache misses.

          Seriously, there is no technical reason you need to load balance CPU cores. As long as none of them is individually bottlenecked, you are fine.
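A rough back-of-the-envelope for the cache-miss argument above (all latencies and hit rates here are made-up, illustrative numbers, not measurements of any real CPU):

```python
# Illustrative only: effective memory-access time when a task stays on
# one core (warm private L2) vs. being migrated constantly (cold L2 on
# arrival, and no L3 to fall back on, so a miss goes to main memory).

L2_HIT_NS = 4     # assumed L2 hit latency
MEM_NS = 100      # assumed main-memory latency (no L3 in this scenario)

def avg_access_ns(l2_hit_rate):
    return l2_hit_rate * L2_HIT_NS + (1 - l2_hit_rate) * MEM_NS

warm = avg_access_ns(0.95)  # task kept on the same core
cold = avg_access_ns(0.30)  # task migrated every scheduling tick
print(round(warm, 1), round(cold, 1))  # → 8.8 71.2
```

With these assumed numbers, constant migration makes the average access roughly eight times slower, which is the throughput drop the article warns about.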



          • #6
            Originally posted by tpruzina
            It means what it says: that it could still be improved.
            The other comment highlighted the main idea: it tries to distribute everything across cores, which obviously results in a less power-efficient kernel (e.g. probably not suitable for laptops).

            Offtopic: is this the same guy who came up with the ~100-line patch that grouped scheduler priorities by the TTY (terminal) they were spawned from?
            Thanks, but it could be good for smartphones. I think running more cores at a lower frequency could be more power efficient than fewer cores at a high frequency (the core voltage also goes up as the frequency goes up).
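That intuition can be sanity-checked with the usual dynamic-power relation for CMOS, P ≈ C·V²·f: since voltage has to rise roughly with frequency, power grows roughly cubically with clock speed. The constants below are made up purely for illustration:

```python
# Illustrative sketch: dynamic power P = C * V^2 * f, with a crude
# linear voltage-frequency assumption. Not measurements of any real chip.

def dyn_power(f_ghz, v_per_ghz=0.8, c=1.0):
    v = v_per_ghz * f_ghz        # assume V scales linearly with f
    return c * v * v * f_ghz

one_fast = dyn_power(2.0)        # one core at 2 GHz
two_slow = 2 * dyn_power(1.0)    # two cores at 1 GHz, same total throughput
print(round(one_fast / two_slow, 6))  # → 4.0: the fast core burns ~4x the power
```

Under these assumptions the "more cores, lower clock" strategy wins on power, which is exactly the big.LITTLE-style argument for phones; real chips complicate this with static leakage and idle-state power, as the next reply points out.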



            • #7
              Originally posted by dragonn
              Thanks, but it could be good for smartphones. I think running more cores at a lower frequency could be more power efficient than fewer cores at a high frequency (the core voltage also goes up as the frequency goes up).
              There are a lot of factors at work here. For one, some processors deliver much better performance per watt at a higher voltage, while others do at a lower one.

              It also depends on how well something is multi-threaded and whether any other tasks are being done at the same time. Sometimes running devices like Android phones at a lower stock clock actually decreases battery life, because tasks don't finish and the workload builds up (too much playing catch-up). Let's say full load is 5 watts, minimum workload is 1 watt, and idle is about 0.1 watts. At 1 watt the phone can do a given workload in 40 seconds; at 2 watts, 15 seconds; at 3 watts, 9 seconds; at 4 watts, 5 seconds; at 5 watts, 3 seconds.

              So, over a 40-second window (energy in joules, i.e. watt-seconds):

              0.1 W × 40 s = 4 J (pure idle)
              1 W × 40 s = 40 J
              2 W × 15 s = 30 J + 2.5 J idle
              3 W × 9 s = 27 J + 3.1 J idle
              4 W × 5 s = 20 J + 3.5 J idle
              5 W × 3 s = 15 J + 3.7 J idle

              Now of course I'm pulling numbers out of my butt, but this should explain a few things. The problem with x86 in phones has mostly been that they do things really fast, but they spend more time idling, they are the worst at power draw while idling, and they have to pull a certain amount of power just to perform simple tasks, leading to overall poor battery life and adoption.
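The arithmetic above can be redone as total energy over the fixed 40-second window (watts × seconds = joules), using the same made-up numbers, which is the classic "race to idle" calculation:

```python
# Race-to-idle illustration with the numbers from the post: finish the
# task at some power level, then idle at 0.1 W for the rest of a
# 40-second window.  Energy = active power * active time + idle power * idle time.

IDLE_W = 0.1
WINDOW_S = 40
scenarios = {1: 40, 2: 15, 3: 9, 4: 5, 5: 3}  # active watts -> seconds to finish

energies = []
for watts, secs in scenarios.items():
    joules = watts * secs + IDLE_W * (WINDOW_S - secs)
    energies.append(round(joules, 1))
    print(f"{watts} W for {secs:2d} s -> {joules:.1f} J total")

print(energies)  # → [40.0, 32.5, 30.1, 23.5, 18.7]
```

With these numbers, finishing fast and idling is the most energy-efficient option, which is why "race to idle" often beats running slowly; it only holds if the idle power really is that low.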



              • #8

                I would really like to see how it performs on the Cavium ThunderX architecture (ARMv8), as it seems well suited: one L2 cache shared between all cores and a bigger L1 cache.
