GCC "-fparallel-jobs" Sent Out For Compiling Individual Files In Parallel - Up To ~1.9x Speedup

    Phoronix: GCC "-fparallel-jobs" Sent Out For Compiling Individual Files In Parallel - Up To ~1.9x Speedup

    For the past two summers, student developer Giuliano Belinassi has been working under Google Summer of Code to address GCC parallelization bottlenecks, with the ultimate goal of allowing single source files to be split up for compilation in parallel by GCC - in particular, splitting the compilation of large source files across multiple CPU cores. The latest patches for this "-fparallel-jobs=" option were sent out today as we approach the end of GSoC 2020...
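
    For reference, a rough sketch of what an invocation might look like, going only by the "-fparallel-jobs=" spelling in the patch series (the job count and file name here are purely illustrative, and the exact accepted values are whatever the final patches define):

        # Hypothetical example: ask GCC to split the compilation of one large
        # translation unit across up to 4 jobs. Flag value and paths are placeholders.
        gcc -O2 -fparallel-jobs=4 -c big_generated_file.c -o big_generated_file.o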


  • #2
    I'm trying to think of the practical impact of this. Compiling large software like the Linux kernel won't benefit from this as they're already using all of the CPU for different programs. Maybe for a recompile after making a change to a single file?

    Is there a type of coding style that generates huge C/C++ files, such that this kind of speedup would have a meaningful impact on their users' workflow?

    Or is this not for the 'compile' part but just for the LTO bit at the end--after all the .o files are generated and they're combined into the final binary?



    • #3
      A speedup of 0.95x is then a slowdown, I guess. So in some cases it is slower?



      • #4
        Originally posted by willmore View Post
        I'm trying to think of the practical impact of this.
        Once in a while I test different compiler flags on the kernel (among other software packages), so in those cases I recompile the kernel quite often. I like it.



        • #5
          Originally posted by CochainComplex View Post

          Once in a while I test different compiler flags on the kernel (among other software packages), so in those cases I recompile the kernel quite often. I like it.
          So you're recompiling the whole kernel with new flags? Then this won't help you at all--and will likely slow you down. Large jobs like kernel compiles involve many independent compiles which can easily be made to use every single thread on a system. It doesn't matter how fast each program compiles as long as the whole collection of them gets done quickly. Throughput of the task is what matters. This optimization is to speed up the compilation of *one* large file.
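
          In other words, a full kernel build already saturates the machine with nothing more exotic than (illustrative command line):

              # One independent compile job per hardware thread; each job is its own
              # gcc process, so splitting individual files adds nothing to throughput here.
              make -j"$(nproc)"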



          • #6
            Originally posted by willmore View Post
            Is there a type of coding style that generates huge C/C++ files, such that this kind of speedup would have a meaningful impact on their users' workflow?
            Yes, if you include a fair amount of libraries, especially ones that make heavy use of template code, compilation time per translation unit can go up significantly. In my current C++ project some files take 20-30 seconds each to compile, and since I'm often making small iterative changes that I then need to test, this can get really annoying.

            However, I highly doubt that this feature would bring any speedup in cases like mine. I imagine you would get the best case 1.9x speedup when you have a few hundred simple functions in a translation unit and most of them don't even need to call each other. Some C libraries would fit into that category.
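
            (If you want to see where those 20-30 seconds actually go, GCC can already print a per-pass time breakdown; the file name below is just a stand-in for one of my slow translation units:)

                # Report how compile time is spent across the compiler's passes for one TU.
                time g++ -std=c++17 -O2 -ftime-report -c slow_template_heavy.cpp -o slow_template_heavy.o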



            • #7
              Originally posted by willmore View Post
              I'm trying to think of the practical impact of this. Compiling large software like the Linux kernel won't benefit from this as they're already using all of the CPU for different programs. Maybe for a recompile after making a change to a single file?
              I don't think it necessarily has to be that big, just big enough to be a bottleneck. I've had builds where everything but one file had finished, and that one file held back the whole process by several seconds.

              To me, the really tricky part is deciding when to use fparallel. Compiling multiple different single-threaded files simultaneously is indisputably more efficient than splitting a single file across threads, but there are cases where you have one chonky file, and multi-threading it might make the overall compilation finish sooner. So perhaps if the file is bigger than, for example, 1000 lines of code (not including comments or whitespace), that's when fparallel would really come in handy.
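
              Something like this purely hypothetical wrapper (which only counts non-blank lines rather than truly excluding comments) is roughly what I mean; drop it in as CC to experiment with that threshold:

                  #!/bin/sh
                  # cc-maybe-parallel: pass -fparallel-jobs= only for "chonky" sources.
                  # Illustrative only; threshold, job count and tool name are made up.
                  src=""
                  for arg in "$@"; do
                      case "$arg" in *.c|*.cc|*.cpp) src="$arg" ;; esac
                  done
                  if [ -n "$src" ] && [ "$(grep -c . "$src")" -gt 1000 ]; then
                      exec gcc -fparallel-jobs=2 "$@"
                  fi
                  exec gcc "$@"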



              • #8
                Originally posted by willmore View Post

                So you're recompiling the whole kernel with new flags? Then this won't help you at all--and will likely slow you down. Large jobs like kernel compiles involve many independent compiles which can easily be made to use every single thread on a system. It doesn't matter how fast each program compiles as long as the whole collection of them gets done quickly. Throughput of the task is what matters. This optimization is to speed up the compilation of *one* large file.
                You are right. I was thinking it might help to fill the small gaps. But yes, it doesn't - except maybe if you have a large batch of large files and one process is waiting for the last one to finish before consolidating...*1

                edit: *1 this was explained/posted by schmidtbag at the same time I posted mine. Consider his/her answer the more descriptive one. It was not intended as a double post.
                Last edited by CochainComplex; 21 August 2020, 08:29 AM.



                • #9
                  I think we're in agreement on this, CochainComplex and schmidtbag. I'm afraid that determining a priori whether this will benefit a specific compile may be halting-problem levels of difficulty. But there's no reason the build system couldn't figure out from experience which big compiles are on the critical path: just keep track of how long each compile took last time and where it falls in the dependency graph, and use that to determine whether a specific program needs this optimization. And as schmidtbag said, compiling one big program multi-threaded will negatively impact throughput. But a large project compile may not be purely bulk-throughput limited; there may be a critical chain of compiles which could benefit from this.
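
                  As a crude sketch of the "keep track of how long each compile took" half (the wrapper name and log path are made up, and the dependency-graph side is left entirely to the build system):

                      #!/bin/sh
                      # timed-cc: log wall-clock seconds per compiler invocation so a later
                      # build could decide which big compiles sit on the critical path.
                      start=$(date +%s)
                      gcc "$@"
                      status=$?
                      end=$(date +%s)
                      echo "$((end - start))s  $*" >> compile-times.log
                      exit "$status"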

                  For sure the last link/optimize step could, as at that point all the file-level parallelism is gone. It just takes a bunch of .o and .a files and makes a binary. That's somewhere that could clearly benefit from some threading (if possible). If that's what this optimization addresses, then I do see it as a win.
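
                  (For what it's worth, the LTO side already has a knob for that stage today: the LTRANS phase can be run as parallel jobs, and it can even borrow GNU make's jobserver. The job count below is just an example.)

                      # Existing GCC behaviour: run the LTRANS stage of LTO with up to 8 jobs,
                      # or let it take job slots from GNU make's jobserver when run under make.
                      gcc -O2 -flto=8 -o app main.o util.o
                      # gcc -O2 -flto=jobserver -o app main.o util.o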
                  Last edited by willmore; 21 August 2020, 08:58 AM. Reason: Move a comma for clarity.



                  • #10
                    The fact that the only build system integration is with the GNU Make job server is...not great.
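
                    For anyone who does want to try that integration, the usual GNU make caveat applies: make only hands its jobserver to recipes it treats as recursive, so the compile rule has to be marked with a leading '+'. (The exact -fparallel-jobs value that selects jobserver mode is whatever the patches define; "jobserver" below is my assumption.)

                        # GNU makefile fragment, illustrative only. The '+' prefix tells make to
                        # pass the jobserver to this recipe so GCC can borrow job slots from it.
                        # (In a real makefile the recipe line must start with a literal tab.)
                        %.o: %.c
                        	+$(CC) $(CFLAGS) -fparallel-jobs=jobserver -c $< -o $@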

