Google Using AutoFDO On Linux Meant Up To 12% Fewer Cycles Spent Within The Kernel
While a Microsoft engineer was at Linux Plumbers Conference this week talking up their LTO and PGO optimization work for the Linux kernel, Google engineers have now one-upped that work by also shipping kernels with AutoFDO optimizations.
Google is not only leveraging link-time optimizations (LTO) and profile-guided optimizations (PGO) to maximize the compiler-based efficiency of their kernel images but also automatic feedback-directed optimization (AutoFDO). AutoFDO relies upon a sampling-based profile gathered from CPU hardware counters to drive feedback to the compiler. The samples are collected through Linux's perf subsystem, and the compiler uses that information to make more informed optimization decisions when building the binaries. More details on AutoFDO can be found via this Wiki page.
AutoFDO has the benefit of not needing specialized builds in the first place, unlike PGO, for collecting the profile information. But AutoFDO obviously still requires a run of the program in order to collect the samples.
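As a rough illustration of that workflow, here is a minimal sketch of one AutoFDO cycle for an ordinary userspace program built with Clang. It is not Google's kernel build flow, which was not spelled out in the talk summary. The file names app.c, app, and app.afdo and the helper autofdo_commands are hypothetical, and the script only prints the commands rather than running them. It assumes perf can record branch stacks on the test machine and that the create_llvm_prof converter from the AutoFDO project is installed.

```python
import shlex


def autofdo_commands(source="app.c", binary="app", profile="app.afdo"):
    """Return the four command lines of one userspace AutoFDO cycle.

    All file names are placeholders chosen for this example.
    """
    # 1. Ordinary optimized build with line-table debug info.
    #    No instrumentation is added, unlike instrumentation-based PGO.
    build = ["clang", "-O2", "-gline-tables-only", source, "-o", binary]

    # 2. Run the program under perf, sampling with branch stacks (-b).
    record = ["perf", "record", "-b", "-o", "perf.data", "--", f"./{binary}"]

    # 3. Convert the perf samples into a profile the compiler can read.
    convert = [
        "create_llvm_prof",
        f"--binary=./{binary}",
        "--profile=perf.data",
        f"--out={profile}",
    ]

    # 4. Rebuild, letting the sampled profile guide optimization.
    rebuild = [
        "clang", "-O2", "-gline-tables-only",
        f"-fprofile-sample-use={profile}",
        source, "-o", binary,
    ]
    return [build, record, convert, rebuild]


if __name__ == "__main__":
    for cmd in autofdo_commands():
        print(shlex.join(cmd))
```

Note that step 1 is just a normal release-style build: the profile comes from sampling the running binary, which is why no specialized build is needed up front.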
Google's Sami Tolvanen, Bill Wendling, and Nick Desaulniers talked at LPC 2020 about their LTO/PGO/AutoFDO adventures for the kernels they are now shipping at Google. One of the most interesting tidbits they shared concerned the AutoFDO results: when making use of AutoFDO, they found up to a 12% reduction in cycles spent in the kernel. That 12% appears to be for x86_64 hardware, while other microarchitectures did not fare quite as well but still benefited. How well AutoFDO can optimize the software being built may in part also be a function of the accuracy of the collected hardware counters.
In any case, those interested in Google's adventures with LTO + PGO + AutoFDO for the Linux kernel can see the slide deck for all the details.