LLVM 10 Adds Option To Help Offset Intel JCC Microcode Performance Impact
Written by Michael Larabel in Intel on 16 January 2020 at 09:06 AM EST.
Disclosed back in November was the Intel Jump Conditional Code Erratum, which necessitated updated CPU microcode to mitigate, and that update came with a nearly across-the-board performance impact. Intel developers have since been working on assembler patches to help reduce that performance hit. The GNU Assembler patches were merged back in December, and now, ahead of LLVM 10.0, that alternative toolchain has an option for helping to recover some of the lost performance.

On the GNU side, the exposed option is "-mbranches-within-32B-boundaries", which alters the handling of jump instructions to aid in reducing the performance hit from the Intel CPU microcode update for Skylake through Cascade Lake. (More details in the original JCC article, which includes early benchmarks of the JCC impact and of the mitigated support that has been available within Intel's Clear Linux since the disclosure date.)
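
As the option name suggests, the idea is to keep each jump inside a single 32-byte aligned block. The following is a minimal sketch of that condition, assuming (per Intel's public description of the erratum) that the affected jumps are those that cross or end on a 32-byte boundary. The helper name `violates_32b_boundary` is hypothetical and is not part of any toolchain.

```python
BOUNDARY = 32  # bytes; the alignment window named in the option


def violates_32b_boundary(start: int, length: int) -> bool:
    """Return True if an instruction occupying bytes [start, start + length)
    crosses a 32-byte boundary or ends exactly on one.

    Illustrative check only; real assemblers also have to consider
    macro-fused instruction pairs and other details not modeled here.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    end = start + length                      # address of the byte after the instruction
    crosses = start // BOUNDARY != (end - 1) // BOUNDARY
    ends_on = end % BOUNDARY == 0
    return crosses or ends_on
```

For example, a 4-byte jump starting at offset 0x1e spans bytes 30 through 33 and so crosses the boundary at 32, while a 2-byte jump at offset 0x10 sits comfortably inside its block.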

LLVM developers, meanwhile, have been discussing their equivalent handling since November to help reduce the performance hit on affected processors from this microcode update. They began landing changes in late December, and the necessary LLVM and Clang bits are now exposed.

This week, prior to the LLVM 10.0 feature freeze, the necessary option became available: --x86-branches-within-32B-boundaries. The commit adding --x86-branches-within-32B-boundaries explains, "Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults. (*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach. Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The later will use all of the suppression mechanism and should always generate correct code."
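
To give a feel for what nop padding means here, the sketch below computes how many padding bytes would push a jump to the start of the next 32-byte block when it would otherwise cross or end on a boundary. This is only an illustration under that assumption, not LLVM's actual implementation; the function name `nop_padding_needed` is hypothetical.

```python
BOUNDARY = 32  # bytes; the alignment window named in the option


def nop_padding_needed(start: int, length: int) -> int:
    """Return the number of nop bytes to insert before a jump at `start`
    so that it neither crosses nor ends on a 32-byte boundary.

    Assumes the simple strategy of moving the jump to the start of the
    next 32-byte block; returns 0 when no padding is required.
    """
    if not 0 < length < BOUNDARY:
        raise ValueError("length must be between 1 and 31 bytes")
    end = start + length                      # address of the byte after the jump
    crosses = start // BOUNDARY != (end - 1) // BOUNDARY
    ends_on = end % BOUNDARY == 0
    if crosses or ends_on:
        return BOUNDARY - start % BOUNDARY    # distance to the next block start
    return 0


if __name__ == "__main__":
    # A 6-byte jump at offset 26 ends exactly on the boundary at 32.
    print(nop_padding_needed(26, 6))
    # A 2-byte jump at offset 16 is already inside one block.
    print(nop_padding_needed(16, 2))
```

Every padding byte enlarges the code, which hints at why the commit message treats nop padding as an initial default to be revisited once prefix padding support lands.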

For Clang 10, the option is -mbranches-within-32B-boundaries, the same name as on the GNU side. That Clang driver option just landed as well.

This LLVM/Clang 10.0 support for helping to offset the JCC Erratum microcode performance impact is not enabled by default.

LLVM 10.0 entered its feature freeze / code branching yesterday and should be out as stable at the end of February.