Dehao Chen of Google published an AutoFDO implementation to the GCC mailing list on Friday.
This patch implements the fine-grained AutoFDO optimizations for GCC. It uses Linux perf to collect sample profiles, and uses debug info to represent the profile. In GCC, it uses the profile to annotate the CFG to drive FDO. This can bring 50% to 110% of the speedup derived by traditional instrumentation-based FDO. (The average is between 70% and 80% for many CPU-intensive applications.) Compared with traditional FDO, AutoFDO does not require instrumentation. It just needs an optimized binary with debug info to collect the profile. Since no instrumented build is required, collecting the profile is also faster than with the traditional Feedback Directed Optimization (FDO) support currently found in GCC.
This patch has passed bootstrap and GCC regression tests, and has also been tested with crosstool. Okay for google branches?
If people upstream find this feature interesting, I'll spend some time porting this to trunk and try to open-source the tool that generates the profile data file.
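To put the quoted percentages in concrete terms, here is a small arithmetic sketch. The 20% baseline is purely a hypothetical figure chosen for illustration (it does not come from the patch announcement), and `autofdo_gain` is a made-up helper name; only the 50%, 70%, 80%, and 110% fractions are taken from the quoted message.

```python
def autofdo_gain(fdo_gain_pct, fraction_pct):
    """Speedup (in percent) AutoFDO would deliver if it captures
    fraction_pct percent of the traditional FDO speedup."""
    return fdo_gain_pct * fraction_pct / 100


# Hypothetical baseline: assume traditional FDO makes a program 20% faster.
FDO_GAIN_PCT = 20

# Fractions quoted in the announcement: the 50%-110% range and the 70%-80% average.
for fraction in (50, 70, 80, 110):
    gain = autofdo_gain(FDO_GAIN_PCT, fraction)
    print(f"{fraction}% of the FDO gain -> {gain:.1f}% speedup")
```

Under that assumed baseline, the typical 70% to 80% case would work out to roughly a 14% to 16% speedup without ever building an instrumented binary.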
Until now, traditional FDO has required building a specially instrumented binary and running it manually to collect training data, which is then passed back to the GNU Compiler Collection to produce a binary better optimized for that application's workload. Among the optimizations that benefit from the profile are instruction scheduling, basic block reordering, function splitting, and register allocation.
Let's hope it makes it into GCC trunk! Until then, the patch is on gcc-patches.