
Parallelizing GCC's Internals Continues To Be Worked On & Showing Promising Potential


  • Parallelizing GCC's Internals Continues To Be Worked On & Showing Promising Potential

    Phoronix: Parallelizing GCC's Internals Continues To Be Worked On & Showing Promising Potential

    One of the most interesting Google Summer of Code projects this year was the student effort to better parallelize GCC's internals, improving performance particularly when dealing with very large source files. Fortunately -- given that today's desktop CPUs keep ramping up their core counts -- this parallel GCC effort is being continued...

    http://www.phoronix.com/scan.php?pag...-Cauldron-2019

  • atomsymbol
    replied
    Originally posted by nanonyme View Post

    Yeah, these days it's mostly in the linker where the LLVM compilation stack runs circles around the GNU stack.
    Originally posted by atomsymbol View Post

    ld.gold is several times faster than ld.bfd, but unfortunately ld.gold is making some packages fail to build.
    Originally posted by flashmozzg View Post

    And lld is even faster still.
    Originally posted by atomsymbol View Post

    New technologies always require time to become widespread. I am using gcc 8.3 at the moment which does not support -fuse-ld=lld, will evaluate lld when I switch to gcc 9.2+ which does support lld.
    I switched to gcc 9.2 for my projects. -fuse-ld=lld is about 2 times faster than -fuse-ld=gold, with the option -Wl,--threads passed in both cases.

    Due to a small number of compatibility issues, I am still compiling all Gentoo packages with the default linker (ld.bfd). This usually runs in the background, so it mostly doesn't matter whether it takes 0.5 or 2 seconds to link executables/libraries.
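    The linker comparison above comes down to one gcc driver flag. A minimal sketch of the three invocations being compared, assuming a simple two-object build (the file names app/main.o/util.o are hypothetical; the flags are the ones quoted in the thread, and -fuse-ld=lld requires gcc 9+):

    ```shell
    # Default GNU linker (ld.bfd) -- slowest, but most compatible:
    g++ -O2 -o app main.o util.o

    # GNU gold, with threaded linking enabled:
    g++ -O2 -fuse-ld=gold -Wl,--threads -o app main.o util.o

    # LLVM lld -- fastest of the three in the poster's measurements:
    g++ -O2 -fuse-ld=lld -Wl,--threads -o app main.o util.o
    ```
    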

    Leave a comment:


  • pal666
    replied
    Originally posted by ms178 View Post
    From the article: "When taking it further with parallelized RTL, there was a 1.6x speed-up in compile time. That is also "without much optimization" as there is a lot of opportunity left." - that 60% speed-up is the impressive one.
    i already said that article is wrong. 1.6 was an extrapolation from 1.09. opportunity is left to parallelize a larger part of the compiler and maybe get 1.6x (on 4 cores, instead of 4x). and fixing the bugs preventing passing of the testsuite can lower those numbers

    Leave a comment:


  • ms178
    replied
    Originally posted by pal666 View Post
    speedup achieved was 1.09 and it still doesn't pass testsuite
    at the cost of some other benefits which they are developing instead, like just faster compilation on one thread. and benefits are not enough to replace parallel invocation, they can only augment it with future work
    From the article: "When taking it further with parallelized RTL, there was a 1.6x speed-up in compile time. That is also "without much optimization" as there is a lot of opportunity left." - that 60% speed-up is the impressive one. Is there still work to do? Sure! But it shows me that not much effort had previously gone into this area.

    Leave a comment:


  • pal666
    replied
    Originally posted by ms178 View Post
    And the example of GCC shows that a GSoC student can achieve these speed ups
    speedup achieved was 1.09 and it still doesn't pass testsuite
    Originally posted by ms178 View Post
    Hence if a concentrated effort had been made earlier, they could have unlocked these benefits way sooner.
    at the cost of some other benefits which they are developing instead, like just faster compilation on one thread. and benefits are not enough to replace parallel invocation, they can only augment it with future work

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by Zan Lynx View Post

    There are some good tricks that do let you put the template body in a cc or tcc or whatever you want to call it.

    In the .h file declare the templates, but don't give any implementation details. Then for the types you know you will be using declare extern template specializations.

    Then in a cpp file in your project include the actual template implementation from the tcc and define the template specializations.

    That builds one single copy of the templates and links it.

    And if you do need to use it with unknown types you can include the tcc and live with the extra compile time.
    Yes, but the programmer would need to enable LTO in order to maintain runtime performance when using an optimization level that enables inlining, and LTO negatively affects compilation time during software development compared to non-LTO builds. All optimization levels except -O0 and -Og have inlining enabled (I am not sure about -Og).

    There is also the issue of manual removal of template specializations that are no longer in use. The maintenance cost of template specializations of basic datatypes like lists/vectors, sets and maps is high.

    Leave a comment:


  • Zan Lynx
    replied
    Originally posted by atomsymbol View Post
    Except that some C++ header files tend to be quite large because the bodies of templates have to be in .h files and cannot be in .cc files. In such cases the .cc file corresponding to the header file is basically empty. Quite a lot of the optimization work performed on template code is redundant across translation units and could be avoided.
    There are some good tricks that do let you put the template body in a cc or tcc or whatever you want to call it.

    In the .h file declare the templates, but don't give any implementation details. Then for the types you know you will be using declare extern template specializations.

    Then in a cpp file in your project include the actual template implementation from the tcc and define the template specializations.

    That builds one single copy of the templates and links it.

    And if you do need to use it with unknown types you can include the tcc and live with the extra compile time.
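    The three-file layout described above can be sketched in one translation unit; the section comments mark where each part would live in a real project (the Stack class and the file names stack.h/stack.tcc/stack.cpp are hypothetical):

    ```cpp
    #include <cassert>
    #include <vector>

    // ---- stack.h: declare the template, no implementation details ----
    template <typename T>
    class Stack {
    public:
        void push(const T& v);
        T pop();
        bool empty() const;
    private:
        std::vector<T> data_;
    };

    // Explicit instantiation declaration: tells every includer NOT to
    // instantiate Stack<int> itself -- one real copy is built elsewhere.
    extern template class Stack<int>;

    // ---- stack.tcc: the template bodies ----
    template <typename T> void Stack<T>::push(const T& v) { data_.push_back(v); }
    template <typename T> T Stack<T>::pop() {
        T v = data_.back();
        data_.pop_back();
        return v;
    }
    template <typename T> bool Stack<T>::empty() const { return data_.empty(); }

    // ---- stack.cpp: include stack.tcc, then define the specialization ----
    // This emits the single copy of Stack<int> that everyone links against.
    template class Stack<int>;

    int main() {
        Stack<int> s;
        s.push(41);
        s.push(42);
        assert(s.pop() == 42);  // LIFO order
        assert(!s.empty());     // 41 is still on the stack
        return 0;
    }
    ```

    Any other .cpp in the project only sees the declarations in stack.h, so it compiles fast; a type not covered by a specialization would instead include the .tcc and pay the instantiation cost, exactly as described above.
    
    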

    Leave a comment:


  • edwardbailey
    replied
    With the recent rise in computer science classes across all grades, we’re starting to grow our personal vocabularies in ways that make the average person uncomfortable. To complicate matters, many of these “new” words seem to have such intimidating histories that we don’t take the time to properly understand their usage — instead, we repeat them blindly, whether we know what they really mean or not.

    One such example of complex wordsmithing is the synonymous use of the terms “coding” and “programming”.

    You can read more on it here: https://www.goodcore.co.uk/blog/coding-vs-programming/

    Leave a comment:


  • ms178
    replied
    Originally posted by pal666 View Post
    gcc was in the multi-core world with parallel invocations. multithreading a compiler isn't easy and isn't as efficient (you can see, they never got a 4x speedup on 4 threads). it's nice that it is being done, but you are acting like it's the last compiler to do it
    You read way too much into that quoted part; the context of the rest of the post shows that I see it as a general problem, not just a GCC one: software that could use the resources we already have available today doesn't, due to a lack of developer effort. And the example of GCC shows that a GSoC student can achieve these speed-ups - apparently no ninja programmer was needed. Hence, if a concentrated effort had been made earlier, they could have unlocked these benefits way sooner. The same goes for other software, e.g. video rendering and other areas where parallelism and vectorization could be used better even today; for instance, ISPC exists to make use of both in an easier and more portable way than intrinsics or assembly.

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by pal666 View Post
    it is irrelevant for subj. includes are processed by the frontend, subj is about optimizations in the middle end. even with c++20 modules the compiler will skip parsing sources, but will still have to do optimizations. and (surprise) optimizations take more time than parsing
    Except that some C++ header files tend to be quite large because the bodies of templates have to be in .h files and cannot be in .cc files. In such cases the .cc file corresponding to the header file is basically empty. Quite a lot of the optimization work performed on template code is redundant across translation units and could be avoided.

    Take a look at this 4K (4077 bytes) ZX Spectrum demo for inspiration about how far software optimization can be pushed:

    Leave a comment:
