LLVM Merges Machine Function Splitter For ~32% Reduction In TLB Misses
At the beginning of August we reported on Google engineers proposing the Machine Function Splitter for LLVM, a code generation optimization pass that splits functions into hot and cold portions and can make binaries up to a few percent faster. That work has now been merged for LLVM 12.0 with very promising results.
The LLVM Machine Function Splitter was merged prior to the weekend into the Git code-base for what will be LLVM 12.0 early next year. By grouping the hot code paths together and moving the cold code paths out of the way, this optimization pass makes it more likely that hot code stays in the CPU's instruction cache while cold code no longer competes for that space.
The Google engineers found a 2.33% runtime improvement with a ~32% reduction in iTLB and sTLB misses. The L1 iCache misses were down by 9.5% while L2 instruction misses dropped by 20%. For SPECInt, the Clang performance improved by 0.6~1.6%.
The code is merged and I'll be working on some Machine Function Splitter benchmarks soon. The Machine Function Splitter does rely upon profile information to determine which paths of the program are hot and which are cold.
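To make the hot/cold idea concrete, here is a minimal hand-written sketch of the kind of split the pass is meant to perform automatically. It is not the LLVM implementation: the function names are hypothetical, and the GCC/Clang `cold` and `noinline` attributes plus `__builtin_expect` stand in for the profile data the real pass relies on.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Cold path (hypothetical example): rarely executed error handling.
// The attributes ask GCC/Clang to keep it out of line and treat it as
// unlikely, so it does not sit in the middle of the hot loop's code.
[[gnu::cold, gnu::noinline]]
static long report_negative(std::size_t index, long value) {
    std::fprintf(stderr, "negative value %ld at index %zu\n", value, index);
    return 0;  // treat invalid input as zero
}

// Hot path: the loop body stays small and contiguous.
long sum_non_negative(const std::vector<long>& values) {
    long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        long v = values[i];
        // Manual hint that this branch is rare; the Machine Function
        // Splitter would derive this from profile information instead.
        if (__builtin_expect(v < 0, 0)) {
            v = report_negative(i, v);
        }
        total += v;
    }
    return total;
}
```

The difference with the merged pass is that no source annotations are needed: the compiler decides which blocks are cold from the collected profile and moves them out of the hot portion of the function on its own.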