Linux 2.6.38 Kernel Multi-Core Scaling

  • #16
    Originally posted by V!NCENT View Post
    Errrr.... http://en.wikipedia.org/wiki/Giant_lock

    Kernel lock: the kernel locks out all threads except one, so only one thread runs in the kernel at a time. Removing this kernel lock means that threads still need to lock, but there is no totally serialized thread management going on. Now if you have a single core, then no matter what you might hack together, only one process runs at a time anyway.
    David is right; the kernel has been progressively removing macro-locks over the past several years. A few years ago, I know that SGI was looking at the BKL being taken during ioctls on their multi-pipe GPU systems.

    As David has said, different subsystems now have a broad collection of finer-grained locks around the kernel calls made into those subsystems. Removing the BKL will only affect some types of workloads, and the workloads that may be affected would absolutely need to be multi-threaded (which further reduces the likelihood of seeing a benefit).

    As Michael shows in the benchmark results in this article, the impact to the CPU centric benchmarks is virtually nothing between these kernels.

    Comment


    • #17
      Originally posted by V!NCENT View Post
      Now if that's not the case then Linux really sucks balls at scaling...
      Interestingly, it looks like the CPU topology handling allows Linux to scale better with HyperThreading. PC-BSD (FreeBSD) and Illumos (OpenSolaris) both consistently showed performance decreases going from 6 cores to 6 cores + HyperThreading.

      But a broad statement about scaling needs to have a context to get some meat. What workloads are you talking about scaling?

      Comment


      • #18
        Originally posted by V!NCENT View Post
        Errrr.... http://en.wikipedia.org/wiki/Giant_lock

        Kernel lock: the kernel locks out all threads except one, so only one thread runs in the kernel at a time. Removing this kernel lock means that threads still need to lock, but there is no totally serialized thread management going on. Now if you have a single core, then no matter what you might hack together, only one process runs at a time anyway.

        Now onto multiple cores; multiple threads at once.

        Seems like a very simple conclusion to me?

        Now if that's not the case then Linux really sucks balls at scaling...
        Linux hasn't had a single giant lock in a long, long time; it's had fine-grained locking since 2.2, and the BKL was only taken in a few places. Though some of them were bad, they've been removed over the last few years. For example, all ioctls used to take the BKL, and that was slowly fixed. The GPU drivers were one of the areas that lagged behind, but since we didn't really have much userspace parallelism going on, it wasn't that noticeable.

        Dave.

        Comment


        • #19
          Originally posted by mtippett View Post
          But a broad statement about scaling needs to have a context to get some meat. What workloads are you talking about scaling?
          Well, more CPUs -> more compute power. Having multiple processes to schedule should result in:
          (total_amount_of_processes + kernel_resources_per_kind_of_syscall) / (total_amount_of_cores +- (0.25 * extra_threads_per_core)) = good scaling.

          0.25 means 25% efficiency with crap like HT and that's generous...

          Comment


          • #20
            Originally posted by airlied View Post
            Linux hasn't had a single giant lock in a long, long time; it's had fine-grained locking since 2.2, and the BKL was only taken in a few places. Though some of them were bad, they've been removed over the last few years. For example, all ioctls used to take the BKL, and that was slowly fixed. The GPU drivers were one of the areas that lagged behind, but since we didn't really have much userspace parallelism going on, it wasn't that noticeable.

            Dave.
            It seems like ReiserFS still takes the BKL.

            Comment


            • #21
              Originally posted by V!NCENT View Post
              It seems like ReiserFS still takes the BKL.
              Nobody uses it for serious computing, I guess.

              Btw. https://bbs.archlinux.org/viewtopic.php?id=102119&p=1 #25

              2.6.35 with Nick Piggin's patches shows a 22% improvement in sysbench on a 4-core machine.

              Comment


              • #22
                From looking at those benchmarks, I'd say the Linux kernel scales very well. It's the tools being used that can't scale too well. Look at c-ray and PTS; they scale very well. I guess it all depends on how much work/locking the tool has to do. More complicated ones scale worse, but it is not the kernel that is the bottleneck here.

                Comment


                • #23
                  Is there no way to benchmark the kernel by running all the benchmarks at once?

                  Comment


                  • #24
                    These benchmarks show nothing. They are not testing the kernel at all. And what do you mean by "4 cores", "6 cores", etc.? Were they disabled in the BIOS somehow? I also assume that when you enabled/disabled them in the BIOS, you also changed the options for all of the above benchmarks to use more threads, right?

                    Comment


                    • #25
                      I would be interested to see if transparent hugepage support in 2.6.38 makes a difference in multi-core performance. It should reduce bus traffic, and thus contention for the bus, improving performance.

                      Comment


                      • #26
                        Originally posted by baryluk View Post
                        These benchmarks show nothing. They are not testing the kernel at all. And what do you mean by "4 cores", "6 cores", etc.? Were they disabled in the BIOS somehow? I also assume that when you enabled/disabled them in the BIOS, you also changed the options for all of the above benchmarks to use more threads, right?
                        Could you explain this further?

                        The 1/2/4/6/6+HT configurations are set in the BIOS, although I would expect the cores could be hot-unplugged in the kernel as well.

                        There is a set of benchmarks that are single-threaded and ones that are multi-threaded. I believe that all of the multi-threaded ones are configured relative to the number of cores (either explicitly or implicitly within the code).

                        Note that the numbers are normalized, so you can see that there is reasonable scalability (somewhere between 60-90% of the number of CPU cores). For the highly parallelized benchmarks, it's nearly 100%. HyperThreading doesn't give the same gain because the cores are already maxed out, so there isn't as much to be gained by putting a second thread on an already-busy core.

                        If you contrast this Linux-on-Linux comparison against the Linux-versus-other-*nix results at http://www.phoronix.com/scan.php?pag...lti_os_scaling, you will see that Linux scales reasonably well compared to the other OSes. However, the delta between the .35 and .38 kernels was minimal with respect to scalability.

                        Note that the link at the bottom of the article also points to the absolute performance numbers, which realistically show minimal change between kernels.

                        Comment


                        • #27
                          Originally posted by mtippett View Post
                          Could you explain this further?
                          Oh. For some reason I hadn't seen "Timed PHP Compilation"; that one actually shows something - the scalability of the filesystem and VFS layer. Indeed, we see big differences there. The rest just test the CPU scheduler, which is essentially very scalable by design, even in the most simple implementation. And particularly in these benchmarks, where everything just computes something using one thread per core, the scheduler essentially assigns threads once and then does almost nothing. In such cases, sublinear scalability is purely an effect of the application side, not the kernel - you cannot parallelize everything to get linear scalability.

                          For a real scalability benchmark you need to test more than just CPU-intensive applications. It is more about spawning lots of threads, using lots of network connections, forwarding packets, having lots of files opened by multiple processes and threads (or a single file opened by multiple threads and processes), and mixes of these under high load (with the number of threads much greater than the number of cores), etc. - anywhere there is potential for a problem in resource sharing. In this test we do not have any resource sharing at the kernel level: each thread uses a different core and shares nothing that would prevent it from running at full speed (from the kernel's perspective; in userland it will still have some mutexes and barriers, which should be minimized by a properly designed parallel program).

                          Comment
