Intel's Newest Software Effort For Achieving Greater Performance: Thin Layout Optimizer
Intel's software team is today sharing their newest innovation for achieving greater performance on Linux systems: the Thin Layout Optimizer. Intel's Thin Layout Optimizer is inspired by the likes of the Meta/LLVM BOLT optimizer and Google's Propeller but aims to be much easier to use while still delivering measurable performance gains for optimized binaries.
LLVM's BOLT has been around for a few years and optimizes the code layout of binaries for healthy performance gains. Meta/Facebook upstreamed this Binary Optimization Layout Tool to LLVM, and it has shown the ability to deliver nice wins. BOLT'ing a binary is a bit of a chore: first the codebase must be compiled, then a performance profile generated, and finally BOLT run to optimize the code layout of the binary. The Thin Layout Optimizer GitHub page lists BOLT's drawbacks as being unable to guarantee correctness and having greater runtime and memory requirements. Propeller's drawbacks are described as requiring Clang/LLD, demanding significant changes to a program's build process, and not being well suited for general use or package distributions.
Intel's Thin Layout Optimizer still relies on a performance profile, but Intel engineers put much greater emphasis on ease of use and on making it portable for integration into different build processes. The Thin Layout Optimizer also doesn't require any intermediate binaries. It relies on perf for recording a profile, Python scripts conveniently package up the result, and the thin-layout-optimizer command then carries out the optimization of the binaries.
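A minimal sketch of that profile-recording step, assuming a placeholder binary ./my_app (the binary name and workload are invented for illustration; only perf's own flags are real):

```shell
# Record a profile with LBR (last branch record) sampling; perf's -b
# flag captures branch stacks, which code-layout optimizers rely on.
# Guarded so this is a no-op on systems without perf available.
if command -v perf >/dev/null 2>&1; then
    perf record -b -o profile.data -- ./my_app || \
        echo "perf record failed (LBR-capable hardware required)"
fi
# From here, the project's bundled Python scripts package up
# profile.data and the thin-layout-optimizer command applies the
# reordering; see the GitHub README for those exact invocations.
```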
The Thin Layout Optimizer Wiki explains the design approach:
"Thin-Layout-Optimizer is a new code-layout optimizer that primarily emphasizes ease of use and ease of adoption while remaining competitive in performance compared to BOLT/Propeller.
Like BOLT/Propeller, Thin-Layout-Optimizer operates on profiles generated with Linux perf and LBR.
Thin-Layout-Optimizer does not disassemble the binary, but rather works as a section reorderer via linker scripts, similar to Propeller.
Unlike Propeller however, it does not require basic-block sections, and works with any section granularity. An effective granularity is function-sections (-ffunction-sections) which is both near universally supported and provides a reasonable basis for reordering optimizations.
Further, it does not require any changes to the linker command and instead operates transparently by use of environment variables.
Finally, it transparently scales to an arbitrary number of packages and requires little to no incremental changes."
The Thin Layout Optimizer is already being used by Intel's in-house Linux distribution, Clear Linux, which is known for delivering the best out-of-the-box Linux x86_64 performance. Clear Linux packages have already quietly begun leveraging the Thin Layout Optimizer for achieving even better performance in 2024.
There's also this graphic from the Intel Wiki providing a high-level overview of the Thin Layout Optimizer process:
The Wiki page closes with this summary:
"Thin-Layout-Optimizer is a new code-layout optimizer we are happy to be releasing. It is still in its infancy and there is a lot of room for improvement, but we believe its emphasis on usability and transparent scalability will be valuable, and hope that if you have been wary of integrating prior code-layout optimization tools into your workflow, this might change your mind."
I'm still diving into the Thin Layout Optimizer now that I have access, and hopefully I'll be able to set up some of my own benchmarks soon. In Intel's own tests they are finding around 4% better performance when optimizing LLVM as a test example. That's less than the ~7.5% achieved with BOLT, but the Thin Layout Optimizer has the benefit of being more adaptable and easier to deploy.
Those wishing to learn more about the Thin Layout Optimizer or try out the open-source (MIT-licensed) code can find it via intel/thin-layout-optimizer on GitHub.