With the recent interest in GCC Link-Time Optimization support for the Linux kernel, here are some benchmarks of the latest stable release of GCC (v4.7.1) on several common open-source projects, each built with and without the performance-enhancing LTO support.
This article grew out of the many LTO comments on the Linux kernel link-time optimization article and presents a few benchmarks conducted since then on an Intel Ivy Bridge system. More GCC LTO benchmarks will come in a future article. GCC's Link-Time Optimization support can produce faster binaries at the cost of longer compile times and greater system memory (RAM) usage. With LTO enabled, recent versions of the GNU Compiler Collection optimize across the whole program rather than one source file at a time: functions can be inlined across different source files, optimizations are applied to the binary as a whole, dead code is dropped, and checks run across the entire binary rather than on individual files.
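To make the cross-file inlining point concrete, here is a minimal C sketch. The function names `scale` and `sum_scaled`, and the split into files named util.c and main.c, are hypothetical. Both functions are shown in one block for brevity, so the comments describe where each would live in a real multi-file build.

```c
#include <stddef.h>

/* Hypothetical helper. In a real project this would live in its own
 * translation unit (e.g. util.c) and be declared in a header, so a
 * normal per-file compile of the caller only sees its declaration. */
int scale(int x) {
    return 3 * x + 1;
}

/* Hypothetical hot loop that would live in a different translation unit
 * (e.g. main.c). Compiled file by file without LTO, GCC must emit a real
 * call to scale() on every iteration because its body is not visible. */
long sum_scaled(const int *values, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += scale(values[i]);
    }
    return total;
}
```

With the split build, passing the flag at every step, for example `gcc -O2 -flto -c util.c`, `gcc -O2 -flto -c main.c`, then `gcc -O2 -flto util.o main.o -o app`, lets the link-time step see both function bodies and consider inlining `scale()` into the loop. Whether it actually inlines is left to GCC's usual heuristics.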
As explained on the GCC Wiki, "Link Time Optimization (LTO) gives GCC the capability of dumping its internal representation (GIMPLE) to disk, so that all the different compilation units that make up a single executable can be optimized as a single module. This expands the scope of inter-procedural optimizations to encompass the whole program (or, rather, everything that is visible at link time)...The fundamental mechanism used by the compiler to delay optimization until link time is to write the GIMPLE representation of the program on special sections in the output file. For the initial implementation on the branch, ELF was chosen as the container format for these sections, but in GCC-4.6 support has been added on the trunk for PE-COFF and Mach-O. In order to use LTO the target must support one of these binary formats."
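The mechanism described in the quote, GIMPLE written into special sections of the output file, can be observed directly. The following is a minimal C sketch, assuming a Linux system with glibc's `<elf.h>` and 64-bit ELF object files, that counts sections whose names begin with ".gnu.lto_", the prefix GCC uses for its LTO sections. The function names are hypothetical, and the sketch deliberately skips edge cases such as objects using extended section numbering.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prefix of the ELF sections in which GCC stores its LTO (GIMPLE) data. */
#define LTO_PREFIX ".gnu.lto_"

int is_lto_section(const char *name) {
    return strncmp(name, LTO_PREFIX, strlen(LTO_PREFIX)) == 0;
}

/* Count LTO sections in a 64-bit ELF file.
 * Returns the count, or -1 if the file cannot be read or is not ELF64.
 * Simplification: extended section numbering is not handled. */
int count_lto_sections(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    /* Read and validate the ELF header. */
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shoff == 0 || eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum) {
        fclose(f);
        return -1;
    }

    /* Read the whole section header table. */
    Elf64_Shdr *sh = calloc(eh.e_shnum, sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum) {
        free(sh);
        fclose(f);
        return -1;
    }

    /* Load the section-name string table and NUL-terminate it. */
    Elf64_Shdr *strtab = &sh[eh.e_shstrndx];
    char *names = malloc(strtab->sh_size + 1);
    if (!names || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size) {
        free(names);
        free(sh);
        fclose(f);
        return -1;
    }
    names[strtab->sh_size] = '\0';

    /* Check each section name against the LTO prefix. */
    int count = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < strtab->sh_size &&
            is_lto_section(names + sh[i].sh_name)) {
            count++;
        }
    }

    free(names);
    free(sh);
    fclose(f);
    return count;
}
```

For example, an object built with `gcc -O2 -flto -c foo.c` should report a nonzero count, while the same file compiled without "-flto" should report zero.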
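Using GCC LTO requires passing the "-flto" compiler flag to enable the main Link-Time Optimization features, both when compiling each source file and when linking. Older GCC releases also offered a separate "-fwhopr" option, similar to "-flto" but splitting the link-time work into partitions so that very large code-bases that cannot fit in memory all at once could still be optimized. Since GCC 4.6, this partitioned WHOPR mode is the default behavior of "-flto" and the separate option has been removed.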
For this quick GCC 4.7.1 LTO benchmarking, the Intel Core i7 3517UE "Ivy Bridge" system was running Ubuntu 12.10 x86_64 with GCC 4.7.1 from the Ubuntu Quantal repository, and several common computational test profiles were run with and without the "-flto" compiler flag. All other CFLAGS/CXXFLAGS options were kept the same throughout testing. The benchmarking was fully automated using the open-source, multi-platform Phoronix Test Suite.