GCC "-fparallel-jobs" Sent Out For Compiling Individual Files In Parallel - Up To ~1.9x Speedup

    Phoronix: GCC "-fparallel-jobs" Sent Out For Compiling Individual Files In Parallel - Up To ~1.9x Speedup

    For the past two summers, student developer Giuliano Belinassi has been working under Google Summer of Code to address GCC parallelization bottlenecks, with the ultimate goal of allowing single source files to be split up for compilation in parallel by GCC - in particular, splitting the compilation of large source files across multiple CPU cores. The latest patches for this "-fparallel-jobs=" option were sent out today as we approach the end of GSoC 2020...
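
    For reference, a rough sketch of what an invocation might look like, going only by the "-fparallel-jobs=" spelling in the patch series (the job count and file name here are purely illustrative, and the exact accepted values are whatever the final patches define):

        # Hypothetical example: ask GCC to split the compilation of one large
        # translation unit across up to 4 jobs. Flag value and paths are placeholders.
        gcc -O2 -fparallel-jobs=4 -c big_generated_file.c -o big_generated_file.o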


  • #2
    I'm trying to think of the practical impact of this. Compiling large software like the Linux kernel won't benefit from this as they're already using all of the CPU for different programs. Maybe for a recompile after making a change to a single file?

    Is there a type of coding style that generates huge C/C++ files, such that this kind of speedup would have a meaningful impact on their users' workflow?

    Or is this not for the 'compile' part but just for the LTO bit at the end--after all the .o files are generated and they're combined into the final binary?



    • #3
      A speedup of 0.95x is then a slowdown, I guess. So in some cases it is slower?



      • #4
        Originally posted by willmore View Post
        I'm trying to think of the practical impact of this.
        Once in a while I test different compiler flags on the kernel (among other software packages), so in those cases I recompile the kernel quite often. I like it.



        • #5
          Originally posted by CochainComplex View Post

          Once in a while I test different compiler flags on the kernel (among other software packages), so in those cases I recompile the kernel quite often. I like it.
          So you're recompiling the whole kernel with new flags? Then this won't help you at all--and will likely slow you down. Large jobs like kernel compiles involve many independent compiles which can easily be made to use every single thread on a system. It doesn't matter how fast each program compiles as long as the whole collection of them gets done quickly. Throughput of the task is what matters. This optimization is to speed up the compilation of *one* large file.
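
          In other words, a full kernel build already saturates the machine with nothing more exotic than (illustrative command line):

              # One independent compile job per hardware thread; each job is its own
              # gcc process, so splitting individual files adds nothing to throughput here.
              make -j"$(nproc)"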



          • #6
            Originally posted by willmore View Post
            Is there a type of coding style that generates huge C/C++ files, such that this kind of speedup would have a meaningful impact on their users' workflow?
            Yes, if you include a fair amount of libraries, especially ones that make heavy use of template code, compilation time per translation unit can go up significantly. In my current C++ project some files take 20-30 seconds each to compile, and since I'm often making small iterative changes that I then need to test, this can get really annoying.

            However, I highly doubt that this feature would bring any speedup in cases like mine. I imagine you would get the best case 1.9x speedup when you have a few hundred simple functions in a translation unit and most of them don't even need to call each other. Some C libraries would fit into that category.
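
            (If you want to see where those 20-30 seconds actually go, GCC can already print a per-pass time breakdown; the file name below is just a stand-in for one of my slow translation units:)

                # Report how compile time is spent across the compiler's passes for one TU.
                time g++ -std=c++17 -O2 -ftime-report -c slow_template_heavy.cpp -o slow_template_heavy.o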



            • #7
              Originally posted by willmore View Post
              I'm trying to think of the practical impact of this. Compiling large software like the Linux kernel won't benefit from this as they're already using all of the CPU for different programs. Maybe for a recompile after making a change to a single file?
              I don't think it necessarily has to be that big, just big enough to be a bottleneck. I've had builds where everything but one file had finished, and that one file held back the whole process by several seconds.

              To me, the really tricky part is deciding when to use fparallel. Compiling multiple different single-threaded files simultaneously is indisputably more efficient than splitting a single file across threads, but there are cases where you have one chonky file, and multi-threading it might make the overall compilation finish sooner. So perhaps if the file is bigger than, for example, 1000 lines of code (not including comments or whitespace), that's when fparallel would really come in handy.
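
              Something like this purely hypothetical wrapper (which only counts non-blank lines rather than truly excluding comments) is roughly what I mean; drop it in as CC to experiment with that threshold:

                  #!/bin/sh
                  # cc-maybe-parallel: pass -fparallel-jobs= only for "chonky" sources.
                  # Illustrative only; threshold, job count and tool name are made up.
                  src=""
                  for arg in "$@"; do
                      case "$arg" in *.c|*.cc|*.cpp) src="$arg" ;; esac
                  done
                  if [ -n "$src" ] && [ "$(grep -c . "$src")" -gt 1000 ]; then
                      exec gcc -fparallel-jobs=2 "$@"
                  fi
                  exec gcc "$@"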



              • #8
                Originally posted by willmore View Post

                So you're recompiling the whole kernel with new flags? Then this won't help you at all--and will likely slow you down. Large jobs like kernel compiles involve many independent compiles which can easily be made to use every single thread on a system. It doesn't matter how fast each program compiles as long as the whole collection of them gets done quickly. Throughput of the task is what matters. This optimization is to speed up the compilation of *one* large file.
                You are right. I was thinking it might help to fill the small gaps. But yes, it doesn't - except maybe if you have a large batch of large files and one process is waiting for the last one to finish before consolidating...*1

                edit: *1 this was explained/posted by schmidtbag at the same time I posted mine. Consider his/her answer the more descriptive one. It was not intended as a double post.
                Last edited by CochainComplex; 21 August 2020, 08:29 AM.



                • #9
                  I think we're in agreement on this, CochainComplex and schmidtbag. I'm afraid that determining a priori whether this will benefit a specific compile may be halting-problem levels of difficulty. But there's no reason the build system couldn't figure out from experience which big compiles are on the critical path: just keep track of how long each compile took last time and where it falls in the dependency graph, and use that to determine whether a specific program needs this optimization. And as schmidtbag said, compiling one big program multi-threaded will negatively impact throughput. But a large project compile may not be purely bulk-throughput limited; there may be a critical chain of compiles which could benefit from this.
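
                  As a crude sketch of the "keep track of how long each compile took" half (the wrapper name and log path are made up, and the dependency-graph side is left entirely to the build system):

                      #!/bin/sh
                      # timed-cc: log wall-clock seconds per compiler invocation so a later
                      # build could decide which big compiles sit on the critical path.
                      start=$(date +%s)
                      gcc "$@"
                      status=$?
                      end=$(date +%s)
                      echo "$((end - start))s  $*" >> compile-times.log
                      exit "$status"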

                  For sure the last link/optimize step could, as at that point all the file-level parallelism is gone. It just takes a bunch of .o and .a files and makes a binary. That's somewhere that could clearly benefit from some threading (if possible). If that's what this optimization addresses, then I do see it as a win.
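
                  (For what it's worth, the LTO side already has a knob for that stage today: the LTRANS phase can be run as parallel jobs, and it can even borrow GNU make's jobserver. The job count below is just an example.)

                      # Existing GCC behaviour: run the LTRANS stage of LTO with up to 8 jobs,
                      # or let it take job slots from GNU make's jobserver when run under make.
                      gcc -O2 -flto=8 -o app main.o util.o
                      # gcc -O2 -flto=jobserver -o app main.o util.o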
                  Last edited by willmore; 21 August 2020, 08:58 AM. Reason: Move a comma for clarity.



                  • #10
                    The fact that the only build system integration is with the GNU Make job server is...not great.
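
                    For anyone who does want to try that integration, the usual GNU make caveat applies: make only hands its jobserver to recipes it treats as recursive, so the compile rule has to be marked with a leading '+'. (The exact -fparallel-jobs value that selects jobserver mode is whatever the patches define; "jobserver" below is my assumption.)

                        # GNU makefile fragment, illustrative only. The '+' prefix tells make to
                        # pass the jobserver to this recipe so GCC can borrow job slots from it.
                        # (In a real makefile the recipe line must start with a literal tab.)
                        %.o: %.c
                        	+$(CC) $(CFLAGS) -fparallel-jobs=jobserver -c $< -o $@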

