Thanks for answering.
Originally Posted by MWisBest
Last edited by Azrael5; 04-01-2014 at 03:20 PM.
It takes even more time if it wasn't supported from the ground up and had to rely on linker plugins, which is what I was getting at. GCC 4.8 is actually quite decent, but earlier versions had quite a few problems.
Originally Posted by Zan Lynx
Still, there is a lot of work, changes and improvements in 4.9, which implies it's still not that mature in 4.8.
Compilers traditionally optimize only within each function, not across functions. There are two reasons for this, which are kinda linked. To optimize across functions means you need to know all the functions, which means you need a lot more memory (or very smart data structures) and you need to have all the functions at hand.
Originally Posted by Azrael5
LTO means applying optimizations at the point of link time --- where you DO now have all the functions at hand. There are a variety of optimizations that can be performed, and I personally would be interested in knowing quite what GCC and LLVM implement.
Dead code stripping is obvious --- remove functions that are never called. Linkers have done this before, but with more compilation knowledge available they can do a better job. E.g., in the past, if f() called g() and g() called f(), but no one else ever called f() or g(), the linker might not detect that f() and g() are dead if it was doing a very simple "is used or not" check.
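To make the f()/g() case concrete, here is a rough sketch in Python (not how GCC or LLVM actually implement it, just the idea). The call graph, the function names and the two helper functions are all made up for illustration, and it ignores things like exported symbols or functions whose address is taken.

```python
from collections import deque

# Hypothetical call graph: caller -> set of callees. All names are made up.
CALL_GRAPH = {
    "main": {"parse", "run"},
    "parse": set(),
    "run": {"parse"},
    "f": {"g"},   # f and g only ever call each other
    "g": {"f"},
}

def dead_by_use_count(graph, roots):
    """Simple check: a function is dead only if nothing references it."""
    referenced = set()
    for callees in graph.values():
        referenced |= callees
    return {fn for fn in graph if fn not in referenced and fn not in roots}

def dead_by_reachability(graph, roots):
    """Whole-program check: a function is dead if no root can reach it."""
    live = set(roots)
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, ()):
            if callee not in live:
                live.add(callee)
                queue.append(callee)
    return set(graph) - live

print(sorted(dead_by_use_count(CALL_GRAPH, {"main"})))     # []
print(sorted(dead_by_reachability(CALL_GRAPH, {"main"})))  # ['f', 'g']
```

The simple check sees that f() and g() each have a caller and keeps both; walking the graph from main shows that neither can ever be reached.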
More interesting are things like call optimization --- if a certain function is only ever called by another certain function, rather than generic marshaling of parameters on the stack or in registers, the callee can just use the registers already in use by the caller --- or maybe even can be inlined, even from another file.
Even more interesting IMHO is code and data reorganization. Code is laid out in an attempt to ensure that functions that are called together lie on the same page. A more aggressive version of this attempts to detect code that is rarely called (e.g. error handling code) and move it to different pages, so that what's packed into each active page is as much commonly used code as possible, making your TLB entries and cache lines that much denser. These can do an OK job just with heuristics, but can do a rather better job with profile-directed feedback, which, as I understand it, is one of the areas LLVM is trying hard to make work better. (PDF was one of the original goals of LLVM, but it fell by the wayside when there were so many other things to do. I'm guessing one reason it's coming back to prominence is that Apple finally has enough of the essentials in Xcode to feel it's at parity with Dev Studio, and can move on to adding this. PDF and code rearrangement were available on PPC MacOS before OSX, so Apple actually already has code in-house for the basic algorithms; it's just a matter of integration.)
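As a toy illustration of the hot/cold idea (again just a sketch, not what any real linker does): assume a 4096-byte page and a made-up profile giving each function's code size and call count. The function names, sizes, counts and the cold threshold are all hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed page size

# Hypothetical profile: function name -> (code size in bytes, call count),
# listed in original source order.
PROFILE = {
    "parse":        (1800, 90_000),
    "report_error": (1500, 3),
    "eval":         (2000, 120_000),
    "dump_debug":   (2200, 0),
    "lookup":       (900, 75_000),
}

def layout(profile, cold_threshold=10):
    """Order functions hottest-first, with rarely called ones at the end."""
    hot = [f for f, (_, calls) in profile.items() if calls > cold_threshold]
    cold = [f for f in profile if f not in hot]
    hot.sort(key=lambda f: profile[f][1], reverse=True)
    cold.sort(key=lambda f: profile[f][1], reverse=True)
    return hot + cold

def pages_touched(order, profile, hot_set):
    """Count the pages that contain at least one byte of hot code."""
    pages = set()
    offset = 0
    for fn in order:
        size = profile[fn][0]
        if fn in hot_set:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
        offset += size
    return len(pages)

hot = {"parse", "eval", "lookup"}
print(pages_touched(list(PROFILE), PROFILE, hot))    # 3 (source order)
print(pages_touched(layout(PROFILE), PROFILE, hot))  # 2 (hot code packed first)
```

With the cold functions pushed to the end, the same hot code fits in two pages instead of being spread over three. A real implementation would also try to place functions that call each other next to one another, which this sketch does not attempt.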
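You can also attempt to apply the usual sort of optimizations between functions. For example, if a condition is known to hold in f() at the point where it calls g(), the same condition will hold on entry to g(), so a test for it in g() is redundant and can be removed (provided that is true for every caller of g(), or g() gets specialized for that call).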
Oh, I remember the joys of developing on Alpha and having to run the profiler every time after you built the code so you could rebuild it with decent performance after it analyzed the profile. Fortunately I managed to get back to developing on SPARC pretty quickly.
Originally Posted by name99
Hopefully they can make it work better than that, but any time you require profiling for performance, you're adding a step that developers won't want to do.