
Parallelizing GCC's Internals Continues To Be Worked On & Showing Promising Potential


  • Parallelizing GCC's Internals Continues To Be Worked On & Showing Promising Potential

    Phoronix: Parallelizing GCC's Internals Continues To Be Worked On & Showing Promising Potential

    One of the most interesting Google Summer of Code projects this year was the student effort to better parallelize GCC's internals, improving performance particularly when dealing with very large source files. Fortunately -- given that today's desktop CPUs keep ramping up their core counts -- this parallel GCC effort is being continued...

    http://www.phoronix.com/scan.php?pag...-Cauldron-2019

  • atomsymbol
    replied
    Originally posted by nanonyme View Post

    Yeah, these days it's mostly in the linker where the LLVM compilation stack runs circles around the GNU stack.
    Originally posted by atomsymbol View Post

    ld.gold is several times faster than ld.bfd, but unfortunately ld.gold is making some packages fail to build.
    Originally posted by flashmozzg View Post

    And lld is even faster still.
    Originally posted by atomsymbol View Post

    New technologies always require time to become widespread. I am using gcc 8.3 at the moment which does not support -fuse-ld=lld, will evaluate lld when I switch to gcc 9.2+ which does support lld.
    I switched to gcc 9.2 for my projects. -fuse-ld=lld is about 2 times faster than -fuse-ld=gold, with the option -Wl,--threads passed in both cases.

    Due to a small number of compatibility issues, I am still compiling all Gentoo packages with the default linker (ld.bfd). This usually runs in the background, so it mostly doesn't matter whether it takes 0.5 or 2 seconds to link executables/libraries.
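    The linker comparison above comes down to one gcc driver flag. A minimal sketch of the three invocations being compared, assuming a simple two-object build (the file names app/main.o/util.o are hypothetical; the flags are the ones quoted in the thread, and -fuse-ld=lld requires gcc 9+):

    ```shell
    # Default GNU linker (ld.bfd) -- slowest, but most compatible:
    g++ -O2 -o app main.o util.o

    # GNU gold, with threaded linking enabled:
    g++ -O2 -fuse-ld=gold -Wl,--threads -o app main.o util.o

    # LLVM lld -- fastest of the three in the poster's measurements:
    g++ -O2 -fuse-ld=lld -Wl,--threads -o app main.o util.o
    ```
    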

    Leave a comment:


  • pal666
    replied
    Originally posted by ms178 View Post
    From the article: "When taking it further with parallelized RTL, there was a 1.6x speed-up in compile time. That is also "without much optimization" as there is a lot of opportunity left." - that 60% speed-up is the impressive one.
    i already said that article is wrong. 1.6 was an extrapolation from 1.09. opportunity is left to parallelize a larger part of the compiler and maybe get 1.6x (on 4 cores, instead of 4x). and fixing the bugs preventing passing of the testsuite can lower those numbers

    Leave a comment:


  • ms178
    replied
    Originally posted by pal666 View Post
    speedup achieved was 1.09 and it still doesn't pass testsuite
    at the cost of some other benefits which they are developing instead, like just faster compilation on one thread. and benefits are not enough to replace parallel invocation, they can only augment it with future work
    From the article: "When taking it further with parallelized RTL, there was a 1.6x speed-up in compile time. That is also "without much optimization" as there is a lot of opportunity left." - that 60% speed-up is the impressive one. Is there still work to do? Sure! But it shows me that not much effort had previously gone into this area.

    Leave a comment:


  • pal666
    replied
    Originally posted by ms178 View Post
    And the example of GCC shows that a GSoC student can achieve these speed ups
    speedup achieved was 1.09 and it still doesn't pass testsuite
    Originally posted by ms178 View Post
    Hence if a concentrated effort had been made earlier, they could have unlocked these benefits way sooner.
    at the cost of some other benefits which they are developing instead, like just faster compilation on one thread. and benefits are not enough to replace parallel invocation, they can only augment it with future work

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by Zan Lynx View Post

    There are some good tricks that do let you put the template body in a cc or tcc or whatever you want to call it.

    In the .h file declare the templates, but don't give any implementation details. Then for the types you know you will be using declare extern template specializations.

    Then in a cpp file in your project include the actual template implementation from the tcc and define the template specializations.

    That builds one single copy of the templates and links it.

    And if you do need to use it with unknown types you can include the tcc and live with the extra compile time.
    Yes, but the programmer would need to enable LTO in order to maintain runtime performance when using an optimization level that enables inlining, and LTO negatively affects compilation time during software development compared to non-LTO builds. All optimization levels except -O0 and -Og have inlining enabled (I am not sure about -Og).

    There is also the issue of manual removal of template specializations that are no longer in use. The maintenance cost of template specializations of basic datatypes like lists/vectors, sets and maps is high.

    Leave a comment:


  • Zan Lynx
    replied
    Originally posted by atomsymbol View Post
    Except that some C++ header files tend to be quite large because the bodies of templates have to be in .h files and cannot be in .cc files. In such cases the .cc file corresponding to the header file is basically empty. Quite a lot of the optimization work performed on template code is redundant across translation units and could be avoided.
    There are some good tricks that do let you put the template body in a cc or tcc or whatever you want to call it.

    In the .h file declare the templates, but don't give any implementation details. Then for the types you know you will be using declare extern template specializations.

    Then in a cpp file in your project include the actual template implementation from the tcc and define the template specializations.

    That builds one single copy of the templates and links it.

    And if you do need to use it with unknown types you can include the tcc and live with the extra compile time.
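    The three-file layout described above can be sketched in one translation unit; the section comments mark where each part would live in a real project (the Stack class and the file names stack.h/stack.tcc/stack.cpp are hypothetical):

    ```cpp
    #include <cassert>
    #include <vector>

    // ---- stack.h: declare the template, no implementation details ----
    template <typename T>
    class Stack {
    public:
        void push(const T& v);
        T pop();
        bool empty() const;
    private:
        std::vector<T> data_;
    };

    // Explicit instantiation declaration: tells every includer NOT to
    // instantiate Stack<int> itself -- one real copy is built elsewhere.
    extern template class Stack<int>;

    // ---- stack.tcc: the template bodies ----
    template <typename T> void Stack<T>::push(const T& v) { data_.push_back(v); }
    template <typename T> T Stack<T>::pop() {
        T v = data_.back();
        data_.pop_back();
        return v;
    }
    template <typename T> bool Stack<T>::empty() const { return data_.empty(); }

    // ---- stack.cpp: include stack.tcc, then define the specialization ----
    // This emits the single copy of Stack<int> that everyone links against.
    template class Stack<int>;

    int main() {
        Stack<int> s;
        s.push(41);
        s.push(42);
        assert(s.pop() == 42);  // LIFO order
        assert(!s.empty());     // 41 is still on the stack
        return 0;
    }
    ```

    Any other .cpp in the project only sees the declarations in stack.h, so it compiles fast; a type not covered by a specialization would instead include the .tcc and pay the instantiation cost, exactly as described above.
    
    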

    Leave a comment:


  • edwardbailey
    replied
    With the recent rise in computer science classes across all grades, we’re starting to grow our personal vocabularies in ways that make the average person uncomfortable. To complicate matters, many of these “new” words seem to have such intimidating histories that we don’t take the time to properly understand their usage — instead, we repeat them blindly, whether we know what they really mean or not.

    One such example of complex wordsmithing is the synonymous use of the terms “coding” and “programming”.

    You can read more on it here: https://www.goodcore.co.uk/blog/coding-vs-programming/

    Leave a comment:


  • ms178
    replied
    Originally posted by pal666 View Post
    gcc was in the multi-core world with parallel invocations. multithreading a compiler isn't easy and isn't as efficient (you can see, they never got a 4x speedup on 4 threads). it's nice that it is being done, but you are acting like it's the last compiler to do it
    You read way too much into that quoted part; the context of the rest of the post shows that I see it as a general problem, not just a GCC one: software that could use the resources we already have available today doesn't, due to a lack of developer effort. And the example of GCC shows that a GSoC student can achieve these speed-ups - apparently no ninja programmer was needed. Hence, if a concentrated effort had been made earlier, they could have unlocked these benefits way sooner. The same goes for other software, e.g. video rendering and other areas where parallelism and vectorization could be used better even today; for instance, ISPC exists to make use of both in an easier and more portable way than intrinsics or assembly.

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by pal666 View Post
    it is irrelevant for subj. includes are processed by the frontend, subj is about optimizations in the middle end. even with c++20 modules the compiler will skip parsing sources, but will still have to do optimizations. and (surprise) optimizations take more time than parsing
    Except that some C++ header files tend to be quite large because the bodies of templates have to be in .h files and cannot be in .cc files. In such cases the .cc file corresponding to the header file is basically empty. Quite a lot of the optimization work performed on template code is redundant across translation units and could be avoided.

    Take a look at this 4K (4077 bytes) ZX Spectrum demo for inspiration about how far software optimization can be pushed:

    Leave a comment:
