Still, there is a lot of work, changes and improvements in 4.9, which implies it's still not that mature in 4.8.
LTO means applying optimizations at link time --- where you DO now have all the functions at hand. There are a variety of optimizations that can be performed, and I personally would be interested in knowing exactly which ones GCC and LLVM implement.
Dead code stripping is the obvious one --- remove functions that are never called. Linkers have done this before, but with more compilation knowledge available they can do a better job. E.g., in the past, if f() called g(), and g() called f(), but no-one else ever called f() or g(), the linker might not detect that f() and g() are dead if it was doing a very simple check for "is used or not".
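To make the f()/g() case concrete, here is a minimal sketch in C of the two checks. The `CallGraph` type and the function names are made up for illustration and are not how any real linker represents things. The naive check asks "does anyone call this?", while the reachability check marks only what can be reached from a root such as main().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 16

/* Hypothetical call graph: calls[i][j] is true when function i calls function j. */
typedef struct {
    size_t count;
    bool calls[MAX_FUNCS][MAX_FUNCS];
} CallGraph;

/* Naive check: a function counts as "used" if anything at all calls it. */
bool naive_is_used(const CallGraph *g, size_t fn)
{
    assert(fn < g->count);
    for (size_t caller = 0; caller < g->count; caller++) {
        if (g->calls[caller][fn]) {
            return true;
        }
    }
    return false;
}

/* Reachability check: mark every function that can be reached from root. */
void mark_live(const CallGraph *g, size_t root, bool live[MAX_FUNCS])
{
    assert(root < g->count);
    if (live[root]) {
        return; /* already visited, which also stops cycles like f <-> g */
    }
    live[root] = true;
    for (size_t callee = 0; callee < g->count; callee++) {
        if (g->calls[root][callee]) {
            mark_live(g, callee, live);
        }
    }
}
```

With main() calling only a helper, and f() and g() calling each other, `naive_is_used` reports f() and g() as used, while `mark_live` starting from main() never marks them, so they can be stripped.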
More interesting are things like call optimization --- if a certain function is only ever called by one other function, then rather than generic marshaling of parameters on the stack or in registers, the callee can just use the registers already in use by the caller --- or maybe it can even be inlined, even from another file.
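A rough sketch of the kind of code where this matters, in C. The function names and the split into two files are assumptions for illustration; everything is shown in one block so it compiles as-is. If `scale()` and `sum_scaled()` really lived in separate source files, a normal per-file build would have to emit a real call, whereas a build with `-flto` (a flag both GCC and Clang accept) at least gives the optimizer the chance to inline it. Whether it actually does so is up to the compiler's heuristics.

```c
#include <stddef.h>

/* Imagine this definition lives in scale.c. */
int scale(int value, int factor)
{
    return value * factor;
}

/* Imagine this caller lives in sum.c and only sees a declaration of scale(). */
long sum_scaled(const int *values, size_t n, int factor)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Without whole-program knowledge this is an opaque call per element. */
        total += scale(values[i], factor);
    }
    return total;
}
```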
Even more interesting IMHO are code and data reorganization. Code is laid out in an attempt to ensure that functions that are called together lie on the same page. A more aggressive version of this attempts to detect code that is rarely called (e.g. error handling code) and move it to different pages, so that what's packed into each active page is as much commonly used code as possible, making your TLB entries and cache lines that much denser. These can do an OK job just with heuristics, but can do a rather better job with profile-directed feedback, which, as I understand it, is one of the areas LLVM is trying hard to make work better. (PDF was one of the original goals of LLVM, but it fell by the wayside when there were so many other things to do. I'm guessing one reason it's coming back to prominence is that Apple finally has enough of the essentials in Xcode to feel it's at parity with Dev Studio, and can move on to adding this. PDF and code rearrangement were available on PPC MacOS before OSX, so Apple actually already has code in-house for the basic algorithms; it's just a matter of integration.)
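For a feel of what separating hot code from cold code looks like at the source level, here is a small C sketch using the `cold` function attribute and `__builtin_expect`, both of which GCC and Clang provide. This is manual annotation standing in for what a profile would tell the compiler automatically. The `COLD` and `UNLIKELY` macros and the function names are my own, not anything from either toolchain.

```c
#include <stddef.h>

#if defined(__GNUC__)
/* cold: hint that the function is rarely executed; noinline keeps it out of the hot path. */
#define COLD __attribute__((cold, noinline))
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define COLD
#define UNLIKELY(x) (x)
#endif

static int error_count = 0;

/* Rarely executed error path, a candidate for being placed away from hot code. */
COLD static int report_bad_input(void)
{
    error_count++;
    return -1;
}

/* Hot path: the common case falls straight through to the division. */
int checked_div(int num, int den, int *out)
{
    if (UNLIKELY(den == 0)) {
        return report_bad_input();
    }
    *out = num / den;
    return 0;
}
```

GCC's documentation says cold functions are grouped into their own subsection of the text section, which is the same idea as the page packing described above, only driven by hand-written hints rather than by measurement.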
You can also attempt to apply the usual sort of optimizations between functions. For example, if a condition holds in f(), the same condition will hold in the called function g(), so a test for it in g() is redundant and can be removed.
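A tiny C illustration of that f()/g() situation, with made-up function names. Here `count_nonzero_checked()` plays the role of f() and `count_nonzero()` the role of g(). Once the optimizer can see both bodies together, the NULL test inside the callee is provably redundant on this call path. Whether a given compiler actually removes it is not something I'd promise.

```c
#include <stddef.h>

/* g(): defensively checks its argument, as a standalone function should. */
size_t count_nonzero(const int *values, size_t n)
{
    if (values == NULL) { /* redundant when the caller has already checked */
        return 0;
    }
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] != 0) {
            count++;
        }
    }
    return count;
}

/* f(): establishes values != NULL before calling g(). */
size_t count_nonzero_checked(const int *values, size_t n)
{
    if (values == NULL) {
        return 0;
    }
    return count_nonzero(values, n);
}
```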
Hopefully they can make it work better than that, but any time you require profiling for performance, you're adding a step that developers won't want to do.