Announcement

**Azrael5** · 01 April 2014, 03:16 PM

Originally posted by MWisBest

... x86 doesn't mean only 32-bit, it technically includes 64-bit as well.

Thanks for answering.

**discordian** · 01 April 2014, 03:19 PM

Originally posted by Zan Lynx

LTO in GCC is supported just as well as anything else. Or do you consider code such as the automatic vectorization optimizer or hidden symbol support "rudely patched in?"

It takes some time for new features to mature and have all of their problems discovered and fixed, that's all.

It takes even more time if it wasn't supported from the ground up but had to use linker plugins, which is what I was getting at. GCC 4.8 is actually quite decent, but earlier versions had quite a few problems.

Still, there is a lot of work, changes and improvements in 4.9, which implies it's still not that mature in 4.8.

**name99** · 07 April 2014, 06:38 PM

Originally posted by Azrael5

So are 64-bit Linux systems optimized at link time or not?

Compilers traditionally optimize only within each function (or at most within a single source file), not across the whole program. There are two reasons for this, which are kinda linked. To optimize across functions means you need to know all the functions, which means you need a lot more memory (or very smart data structures) and you need to have all the functions at hand, which a compiler working on one file at a time does not.

LTO means applying optimizations at the point of link time --- where you DO now have all the functions at hand. There are a variety of optimizations that can be performed, and I personally would be interested in knowing quite what GCC and LLVM implement.

Dead code stripping is obvious --- remove functions that are never called. Linkers have done this before, but with more compilation knowledge available they can do a better job. E.g., in the past, if f() called g(), and g() called f(), but no one else ever called f() or g(), the linker might not detect that f() and g() are dead if it was doing a very simple check for "is used or not".
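To make the f()/g() case concrete, here is a toy sketch of the two checks. It is not how GCC, LLVM, or any real linker implements dead code stripping; the call graph, the function names (`FN_USED` and so on) and both helper functions are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical program with four functions: main, used, f, g. */
enum { FN_MAIN, FN_USED, FN_F, FN_G, N_FUNCS };

/* calls[a][b] is true when function a contains a call to function b.
 * main calls used; f and g only call each other. */
static const bool calls[N_FUNCS][N_FUNCS] = {
    [FN_MAIN] = { [FN_USED] = true },
    [FN_F]    = { [FN_G] = true },
    [FN_G]    = { [FN_F] = true },
};

/* Naive check: a function counts as used if anything at all calls it. */
bool naive_is_used(int fn) {
    if (fn == FN_MAIN)
        return true;
    for (int caller = 0; caller < N_FUNCS; caller++)
        if (calls[caller][fn])
            return true;
    return false;
}

/* Depth-first walk that marks everything reachable from fn. */
static void mark(int fn, bool live[N_FUNCS]) {
    if (live[fn])
        return;
    live[fn] = true;
    for (int callee = 0; callee < N_FUNCS; callee++)
        if (calls[fn][callee])
            mark(callee, live);
}

/* Reachability check: a function is used only if main can reach it. */
bool reachable_is_used(int fn) {
    bool live[N_FUNCS] = { false };
    mark(FN_MAIN, live);
    return live[fn];
}
```

With this graph the naive check keeps f and g alive because each has a caller, while the reachability walk from main never visits either of them.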

More interesting are things like call optimization --- if a certain function is only ever called by one other function, then rather than generic marshaling of parameters on the stack or in registers, the callee can just use the registers already in use by the caller --- or it may even be inlined, even from another file.
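A rough sketch of the "inlined from another file" case. The two file names and both functions are hypothetical, and the two halves are shown in one listing only so it compiles as is; in a real project they would be separate translation units.

```c
/* --- util.c (hypothetical file) --- */
int clamp_to_byte(int v) {
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* --- image.c (hypothetical file) --- */
/* Without LTO, the compiler working on image.c sees only this
 * declaration, so every loop iteration below is a real call. */
int clamp_to_byte(int v);

int brighten_sum(const int *pixels, int n, int delta) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += clamp_to_byte(pixels[i] + delta);
    return sum;
}
```

Built as separate files with something like `gcc -O2 -flto util.c image.c`, the optimizer gets to see the body of `clamp_to_byte()` at link time and is free to inline it into the loop. Whether it actually does is up to its own heuristics.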

Even more interesting IMHO are code and data reorganization. Code is laid out in an attempt to ensure that functions that are called together lie on the same page. A more aggressive version of this attempts to detect code that is rarely called (e.g. error handling code) and move it to different pages, so that what's packed into each active page is as much commonly used code as possible, making your TLB entries and cache lines that much denser. These can do an OK job just with heuristics, but can do a rather better job with profile-directed-feedback, which, as I understand it, is one of the areas LLVM is trying hard to make work better. (PDF was one of the original goals of LLVM, but it fell by the wayside when there were so many other things to do. I'm guessing one reason it's coming back to prominence is that Apple finally has enough of the essentials in Xcode to feel it's at parity with Dev Studio, and can move on to adding this. PDF and code rearrangement were available on PPC MacOS before OSX, so Apple actually already has code in-house for the basic algorithms; it's just a matter of integration.)
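For comparison, this is roughly what telling the compiler by hand looks like when there is no profile to work from. It relies on the GCC/Clang extensions `__attribute__((cold))` and `__builtin_expect`, so it is not standard C; the `unlikely` macro and both functions are made up for the example.

```c
#include <stdio.h>

/* Branch hint: tell the compiler the condition is usually false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Rarely executed error path. The cold attribute asks GCC/Clang to
 * optimize it for size and keep it apart from the hot code. */
__attribute__((cold, noinline))
static int report_bad_digit(char c) {
    fprintf(stderr, "not a digit: '%c'\n", c);
    return -1;
}

/* Hot path: the common case falls straight through. */
int parse_digit(char c) {
    if (unlikely(c < '0' || c > '9'))
        return report_bad_digit(c);
    return c - '0';
}
```

Profile feedback aims to produce the same kind of hot/cold split automatically from measured runs instead of from annotations like these.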

You can also attempt to apply the usual sort of optimizations between functions. For example, if a condition is known to hold in f() at the point where it calls g(), the same condition will hold on entry to g(), so a test for it in g() is redundant on that path and can be removed.
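A small sketch of that situation, with made-up function names. Here `total_length()` plays the role of f() and `length_or_zero()` the role of g().

```c
#include <stddef.h>

/* g(): written defensively, checks its argument on every call. */
size_t length_or_zero(const char *s) {
    if (s == NULL)
        return 0;
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* f(): has already ruled out NULL before it calls g(). */
size_t total_length(const char *const *strings, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        const char *s = strings[i];
        if (s == NULL)
            continue;
        /* s is known to be non-NULL here, so the NULL test inside
         * length_or_zero() can never fire for this call. */
        total += length_or_zero(s);
    }
    return total;
}
```

If the two functions live in different files, a per-file compiler has to keep the NULL test in the callee. With the whole program visible, an optimizer could inline or specialize `length_or_zero()` for this call site and drop the test there, while other callers that might pass NULL still get the checked version.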

**movieman** · 07 April 2014, 07:13 PM

Originally posted by name99

These can do an OK job just with heuristics, but can do a rather better job with profile-directed-feedback, which, as I understand it, is one of the areas LLVM is trying hard to make work better.

Oh, I remember the joys of developing on Alpha and having to run the profiler every time after you built the code so you could rebuild it with decent performance after it analyzed the profile. Fortunately I managed to get back to developing on SPARC pretty quickly.

Hopefully they can make it work better than that, but any time you require profiling for performance, you're adding a step that developers won't want to do.

Link-Time Optimizations Near Reality For x86 Linux Kernel
