GCC's Potential GSoC Projects Include Better Parallelizing The Compiler


  • GCC's Potential GSoC Projects Include Better Parallelizing The Compiler

    Phoronix: GCC's Potential GSoC Projects Include Better Parallelizing The Compiler

    While in some areas it's still an extremely cold winter, many open-source projects are already preparing for their participation in Google's annual Summer of Code initiative. The GNU Compiler Collection (GCC) crew that always tends to see at least a few slots for interested student developers has begun formulating some potential project ideas...

    http://www.phoronix.com/scan.php?pag...al-Parallelize

  • #2
    So this is about parallelizing compilation of a single TU? How would this interact with parallel build systems?



    • #3
This speedup will always be sublinear, while compiling multiple files in parallel can scale linearly (assuming no bottlenecks like I/O, shared caches, or memory).
The related cleanups are still useful, and might help if there are plans for a clangd-like compile server.



      • #4
        Originally posted by AsuMagic View Post
        So this is about parallelizing compilation of a single TU?
        Yes (as far as I know).

        Originally posted by AsuMagic View Post
        How would this interact with parallel build systems?
One option: proper, efficient interaction of multi-threaded executables with parallel build systems would require the Linux kernel to have a global task scheduler that manages, for example, all C++ std::async tasks created across all running processes. The kernel would need to handle sub-thread-granularity work items similar to fibers, i.e. a single process could post many fibers to the operating system, which would automatically be multiplexed onto the available CPU cores. The number of Linux threads existing at any one time would then equal the number of CPU cores. Without OS-level support for fibers, the maximum thread count for e.g. "make -j12" on a 12-thread CPU (e.g. a Ryzen 5 1600) is the much higher 12*(number of GCC threads), which would most likely peak at 12*12 = 144. Also, the peak memory consumption of thousands of fibers multiplexed onto 12 CPU cores is potentially much lower than the peak memory consumption of 144 threads, because at most 12 fibers run at any one time and the remaining 1000-12 = 988 fibers haven't been started yet (assuming a fiber which hasn't been started consumes less memory than a running fiber, which might not be achievable in some cases).

        https://en.cppreference.com/w/cpp/thread/async
        https://en.wikipedia.org/wiki/Fiber_(computer_science)



        • #5
          Originally posted by discordian View Post
This speedup will always be sublinear, while compiling multiple files in parallel can scale linearly (assuming no bottlenecks like I/O, shared caches, or memory).
The related cleanups are still useful, and might help if there are plans for a clangd-like compile server.
What about the case where multiple C/C++ files compiled in parallel #include a common header, such as &lt;QString&gt;? In that case, parsing &lt;QString&gt; a single time instead of 12 times on a 12-core CPU reduces the total wall-clock time of the parallel build, because the 12-1 = 11 other processes could be doing something else while &lt;QString&gt; is being parsed.
          Last edited by atomsymbol; 02-05-2019, 09:59 AM. Reason: Fix typo



          • #6
            Originally posted by phoronix View Post
            Phoronix: GCC's Potential GSoC Projects Include Better Parallelizing The Compiler

            While in some areas it's still an extremely cold winter, many open-source projects are already preparing for their participation in Google's annual Summer of Code initiative. The GNU Compiler Collection (GCC) crew that always tends to see at least a few slots for interested student developers has begun formulating some potential project ideas...

            http://www.phoronix.com/scan.php?pag...al-Parallelize
            Multi-threading a single GCC process is a very good idea. It would speed up C++ development in my case because I am often waiting for a single C++ file to get compiled.



            • #7
              Originally posted by atomsymbol View Post

What about the case where multiple C/C++ files compiled in parallel #include a common header, such as &lt;QString&gt;? In that case, parsing &lt;QString&gt; a single time instead of 12 times on a 12-core CPU reduces the total wall-clock time of the parallel build, because the 12-1 = 11 other processes could be doing something else while &lt;QString&gt; is being parsed.
I would say that the parsing of include files is dwarfed by the actual compilation of the source file, so much so that optimizing it would probably not save much real runtime.



              • #8
                Originally posted by discordian View Post
This speedup will always be sublinear, while compiling multiple files in parallel can scale linearly (assuming no bottlenecks like I/O, shared caches, or memory).
The related cleanups are still useful, and might help if there are plans for a clangd-like compile server.
                How does the clangd compile-server compare with the age old ccache? Or can they not be compared at all?



                • #9
                  Originally posted by atomsymbol View Post

What about the case where multiple C/C++ files compiled in parallel #include a common header, such as &lt;QString&gt;? In that case, parsing &lt;QString&gt; a single time instead of 12 times on a 12-core CPU reduces the total wall-clock time of the parallel build, because the 12-1 = 11 other processes could be doing something else while &lt;QString&gt; is being parsed.
That's not happening; build systems invoke GCC once per file. Threading up the compilation of a single file will take more cumulative time than just running over it serially.

                  Originally posted by F.Ultra View Post

                  How does the clangd compile-server compare with the age old ccache? Or can they not be compared at all?
Not at all; clangd is currently a server for code completion in editors and the like. For GCC to be turned into a server you would need to rip out global state (that's part of the GSoC task, as far as I understand).

ccache runs per file; a compile server could do much more, as atomsymbol implied. The build system would just queue up a lot of stuff to build, and a single server would then be able to sort common includes, template instantiations, and code snippets for the whole project (and even previous runs), then intelligently cache common constructs at various levels (files, preprocessed and pre-optimized code, classes, ...).



                  • #10
                    Originally posted by discordian View Post
                    This speedup will always be sublinear, while compiling multiple files in parallel can scale linearly (assuming no bottlenecks like IO/shared Caches/Memory).
That depends. One case I recently hit: I needed to build a binary from a bunch of header-only libraries.
"make -j" finishes the other parts quickly, but then I have to wait ~1 minute for this single binary.

