2025-01-06

NVIDIA compiler engineers have spent the past several months working on a proposed GCC option that has the compiler optimize code layout for locality between callees and callers as part of the link-time optimization (LTO) process. For some workloads, NVIDIA is finding that this -flto-partition=locality option significantly improves CPU performance.

The proposed improvement to the GNU Compiler Collection places callers and callees close together to minimize the branch distance between frequently called functions. NVIDIA reports that the patch can be of significant benefit for large applications, but no actual benchmark numbers were provided.

"With this optimization we are seeing good performance gains on some large internal workloads that stress the parts of the processor that is sensitive to code locality, but we'd appreciate wider performance evaluation."