GCC's Potential GSoC Projects Include Better Parallelizing The Compiler


  • GCC's Potential GSoC Projects Include Better Parallelizing The Compiler

    Phoronix: GCC's Potential GSoC Projects Include Better Parallelizing The Compiler

    While in some areas it's still an extremely cold winter, many open-source projects are already preparing for their participation in Google's annual Summer of Code initiative. The GNU Compiler Collection (GCC) crew, which always tends to see at least a few slots for interested student developers, has begun formulating some potential project ideas...


  • #2
    So this is about parallelizing compilation of a single TU? How would this interact with parallel build systems?
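    (For what it's worth, GCC already has one precedent for this kind of cooperation: with -flto=jobserver, the parallel link-time-optimization jobs take their slots from GNU make's jobserver, so LTO shares the -j budget with the rest of the build instead of oversubscribing the cores; per the GCC docs, the Makefile rule has to be marked recursive with a leading + for the jobserver to be visible. A parallel front end would presumably need a similar handshake.)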

    • #3
      This speedup will always be sublinear, while compiling multiple files in parallel can scale linearly (assuming no shared bottlenecks like I/O, caches, or memory).
      The related cleanups are still useful, though, and might help if there are plans for a clangd-like compile server.
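      As a rough, made-up illustration of that ceiling: if a fraction s of a single TU's compile time is inherently serial, Amdahl's law caps the speedup at 1 / (s + (1 - s)/N). With s = 0.3 and N = 12 threads, that is 1 / (0.3 + 0.7/12) ≈ 2.8x, versus the ~12x a build system can approach by compiling 12 independent files at once.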

      • #4
        Originally posted by atomsymbol

        What about the case when multiple C/C++ files being compiled in parallel #include a common include file, such as <QString>? In such a case, parsing <QString> a single time instead of 12 times on a 12-core CPU is faster, in total elapsed wall-clock time of the parallel build, than parsing <QString> 12 times in parallel, because the 12-1=11 other processes could be doing something else while <QString> is being parsed.
        I would say that the parsing of include files pales in comparison with the actual compilation of the source file, so much so that optimizing it would probably not save much real runtime.
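        (This is measurable rather than a matter of opinion: compiling with something like g++ -c -ftime-report foo.cpp makes GCC print a per-phase time breakdown, so you can check for your own code whether parsing or the later optimization and code-generation phases dominate.)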

        • #5
          Originally posted by discordian View Post
          This speedup will always be sublinear, while compiling multiple files in parallel can scale linearly (assuming no shared bottlenecks like I/O, caches, or memory).
          The related cleanups are still useful, though, and might help if there are plans for a clangd-like compile server.
          How does the clangd compile server compare with the age-old ccache? Or can they not be compared at all?

          • #6
            Originally posted by atomsymbol

            What about the case when multiple C/C++ files being compiled in parallel #include a common include file, such as <QString>? In such a case, parsing <QString> a single time instead of 12 times on a 12-core CPU is faster, in total elapsed wall-clock time of the parallel build, than parsing <QString> 12 times in parallel, because the 12-1=11 other processes could be doing something else while <QString> is being parsed.
            That's not happening; build systems invoke GCC once per file. Threading the compilation of a single file will take more cumulative time than just running over it serially.

            Originally posted by F.Ultra View Post

            How does the clangd compile server compare with the age-old ccache? Or can they not be compared at all?
            Not at all; clangd is currently a server for code completion in editors and the like. For GCC to be turned into a server, you would need to rip out the global state (which is part of the GSoC task, as far as I understand).

            ccache runs per file; a compile server could do much more, as Guest implied. The build system would just queue up a lot of stuff to build, and a single server would then be able to sort common includes, template instantiations, and code snippets for the whole project (and even previous runs), then intelligently cache common constructs at various levels (files, preprocessed and pre-optimized code, classes, ...).
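            A rough sketch of one redundancy such a server could cache away, and the manual workaround that already exists today (all names hypothetical): every TU that uses a template instantiates it again, and the linker later discards the duplicates. An explicit instantiation moves that cost into a single TU:

                // big_table.h (hypothetical): a template that is expensive to instantiate
                template <typename T>
                struct BigTable {
                    T cells[1024];
                    T sum() const { T s{}; for (const T& c : cells) s += c; return s; }
                };

                // Without this declaration, every TU using BigTable<int> re-instantiates
                // it, and the linker throws away all but one copy of the generated code:
                extern template struct BigTable<int>;

                // big_table.cpp: the one TU that pays the instantiation cost for everyone
                #include "big_table.h"
                template struct BigTable<int>;

            A compile server could in principle perform this kind of de-duplication automatically, project-wide, without header authors opting in.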

            • #7
              Originally posted by discordian View Post
              This speedup will always be sublinear, while compiling multiple files in parallel can scale linearly (assuming no shared bottlenecks like I/O, caches, or memory).
              This depends. One case I hit recently is a binary built from a bunch of header-only libraries:
              "make -j" finishes the other parts quickly, but then I have to wait ~1 minute for this single binary.

              • #8
                Originally posted by atomsymbol

                What about the case when multiple C/C++ files being compiled in parallel #include a common include file, such as <QString>? In such a case, parsing <QString> a single time instead of 12 times on a 12-core CPU is faster, in total elapsed wall-clock time of the parallel build, than parsing <QString> 12 times in parallel, because the 12-1=11 other processes could be doing something else while <QString> is being parsed.
                That is already a solved problem with precompiled headers.
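                For anyone who hasn't used them with GCC, a minimal sketch (file names hypothetical): precompile the big shared header once, and later compiles that include it pick up the binary version instead of re-parsing the text.

                    // common.h (hypothetical): the expensive header shared by many TUs,
                    // playing the role of <QString> in the discussion above
                    #include <string>
                    #include <vector>

                    // Precompile once:  g++ -x c++-header common.h   (writes common.h.gch)
                    // Afterwards:       g++ -c a.cpp                 (a.cpp does #include "common.h")
                    // GCC loads common.h.gch automatically, but only if the compile options
                    // and predefined macros match the ones used when the .gch was built.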

                • #9
                  Originally posted by zxy_thf View Post
                  This depends. One case I hit recently is a binary built from a bunch of header-only libraries:
                  "make -j" finishes the other parts quickly, but then I have to wait ~1 minute for this single binary.
                  You need more CPU time (all cores combined) than doing everything naively on one core, hence sublinear scaling.
                  The assumption is that you can do something else, like compiling other files, at the same time, which should be true most of the time.

                  In your specific example, you can't easily parallelize processing the headers, as the order in which they are included matters (or could matter; the compiler has to assume it matters and guarantee correct results).
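                  A contrived illustration of that ordering problem (hypothetical headers), where the second header's meaning depends on preprocessor state left behind by the first:

                      // config.h
                      #define USE_DOUBLE 1

                      // math_types.h: what this declares depends on what came before it
                      #if USE_DOUBLE
                      typedef double real_t;
                      #else
                      typedef float real_t;
                      #endif

                      // main.cpp: swapping the two #include lines silently changes real_t,
                      // so the compiler has to process the headers in order to stay correct
                      #include "config.h"
                      #include "math_types.h"
                      int main() { real_t x = 0; return (int)x; }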

                  • #10
                    Originally posted by carewolf View Post
                    That is already a solved problem with precompiled headers.
                    It's far from transparent to the build system, compiler-specific, and easy to break (differing macros or settings), and barely used because of that.

                    C++ modules ought to tackle this issue. I hope for the best, but I somewhat expect it to take a long time until all the kinks are known and solved for build systems.
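                    For reference, a minimal sketch of the modules version of the shared-header scenario (C++20 syntax; in GCC this was still experimental behind -fmodules-ts, and file naming conventions vary, so treat the details as illustrative):

                        // string_utils.cc (hypothetical): compiled once into a binary module
                        // interface that importers consume instead of re-parsing header text
                        export module string_utils;

                        export int count_words(const char* s) {
                            int n = 0;
                            bool in_word = false;
                            for (; *s; ++s) {
                                if (*s != ' ' && !in_word) { ++n; in_word = true; }
                                else if (*s == ' ') in_word = false;
                            }
                            return n;
                        }

                        // consumer.cc: the import reads the precompiled interface; the
                        // importer's own macros cannot change the module's meaning
                        import string_utils;
                        int main() { return count_words("hello world") == 2 ? 0 : 1; }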
