Benchmarks Of JCC Erratum: A New Intel CPU Bug With Performance Implications On Skylake Through Cascade Lake
Intel is today making public the Jump Conditional Code (JCC) erratum. This is a bug involving the CPU's Decoded ICache on Skylake and derived CPUs, where unpredictable behavior could happen when jump instructions cross cache lines. Unfortunately, addressing this error in software comes with a performance penalty, but Intel engineers are working to offset that through a toolchain update. Here are the exclusive benchmarks out today of the JCC erratum performance impact, as well as of the attempt to recover that performance through the updated GNU Assembler.
The Jump Conditional Code Erratum
The issue at hand comes down to jump instructions that cross cache lines: on Skylake through Cascade Lake there is the potential for "unpredictable behavior" related to the Decoded ICache / Decoded Streaming Buffer.
The microcode update prevents jump instructions from being cached in the Decoded ICache when those instructions cross a 32-byte boundary or end on a 32-byte boundary. Due to that change there will be more misses from the Decoded ICache and more switches back to the legacy decode pipeline, resulting in a new performance penalty. The Decoded ICache / Decoded Streaming Buffer has been around since Sandy Bridge, but only Skylake and its derivatives are affected by this erratum. Cascade Lake is affected, but Ice Lake and future iterations appear unaffected. The erratum notice officially lists Amber Lake, Cascade Lake, Coffee Lake, Comet Lake, Kaby Lake, Skylake, and Whiskey Lake as affected generations for the JCC bug.
While this microcode mitigation yields a performance hit for many workloads, Intel is trying to offset that hit through a compiler toolchain update that changes how the assembler places jump instructions.
Intel was kind enough to brief us in advance on the JCC issue so that we could carry out our own independent performance tests on both the microcode impact and the (at least partially) recovered performance when making use of the updated compiler toolchain bits. This erratum isn't being described as a security issue, just one potentially yielding "unpredictable behavior" when jump instructions cross cache lines.
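To make the 32-byte rule concrete, here is a minimal Python sketch of the condition as described above: a jump is no longer cached if its bytes cross a 32-byte boundary or if it ends exactly on one. The function name `touches_boundary` and its parameters are made up for illustration; this models only the placement rule stated in the text, not Intel's microcode or any other detail of the erratum.

```python
BOUNDARY = 32  # bytes, per the erratum description


def touches_boundary(start: int, length: int, boundary: int = BOUNDARY) -> bool:
    """Return True if an instruction occupying [start, start + length)
    crosses a `boundary`-byte boundary or ends exactly on one."""
    if length <= 0:
        raise ValueError("instruction length must be positive")
    end = start + length  # address of the first byte after the instruction
    # Crosses: first and last byte fall in different 32-byte blocks.
    crosses = start // boundary != (end - 1) // boundary
    # Ends on: the byte right after the instruction starts a new block.
    ends_on = end % boundary == 0
    return crosses or ends_on


if __name__ == "__main__":
    for start, length in [(0x10, 2), (0x1E, 2), (0x1F, 2), (0x20, 6)]:
        print(hex(start), length, touches_boundary(start, length))
```

A two-byte jump at offset 0x1E ends on the boundary and one at 0x1F straddles it, so both would be excluded from the Decoded ICache under this rule, while the same jump at 0x10 or a six-byte jump at 0x20 would not.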
Intel's Performance Guidance
Intel's official guidance coming out today states their observed performance effects from this microcode update to be in the range of 0~4% but with some "outliers higher than the 0~4% range."
Unlike the Spectre/Meltdown/L1TF/MDS speculative execution vulnerabilities, where the mitigations mostly impact workloads interacting with the kernel and switching between user and kernel space, the JCC erratum's microcode update can affect purely user-space workloads, depending on how many jumps span 32-byte boundaries. So while the performance impact may often just be in the single digits, more workloads are affected than we have seen with some of the recent speculative execution security vulnerabilities.
Helping Reduce The Performance Impact
To help offset the impact of the updated microcode, Intel engineers have been working on toolchain updates that try to ensure jump instructions do not cross a 32-byte boundary or end on one. Intel has patches to the GNU Assembler for aligning jumps within a 32-byte boundary, along with various flags for toggling the behavior. Of course, these patches still need to work their way upstream into released versions of the GNU Assembler and other assemblers, and from there be picked up by the various operating systems, etc.
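The idea behind such an assembler change can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the GNU Assembler's actual algorithm: instructions are represented as hypothetical `(length, is_jump)` pairs, and whenever a jump would cross or end on a 32-byte boundary, padding bytes are placed in front of it so that it starts at the next boundary instead. How the real patches choose and emit their padding is not covered here.

```python
from typing import List, Tuple

BOUNDARY = 32  # bytes


def pad_jumps(
    instrs: List[Tuple[int, bool]], base: int = 0, boundary: int = BOUNDARY
) -> List[Tuple[int, int]]:
    """Lay out (length, is_jump) instructions starting at `base`.

    Returns one (address, padding_before) pair per instruction, where
    padding is added only in front of jumps that would otherwise cross
    or end on a `boundary`-byte boundary.
    """
    addr = base
    layout = []
    for length, is_jump in instrs:
        if length <= 0 or length >= boundary:
            raise ValueError("instruction length must be in 1..boundary-1")
        pad = 0
        # A jump crosses or ends on a boundary exactly when its start and
        # the byte after its end fall in different blocks.
        if is_jump and addr // boundary != (addr + length) // boundary:
            pad = boundary - addr % boundary  # push jump to the next block
        addr += pad
        layout.append((addr, pad))
        addr += length
    return layout


if __name__ == "__main__":
    # 30 bytes of ordinary code, a 2-byte jump, then a 4-byte instruction.
    print(pad_jumps([(30, False), (2, True), (4, False)]))
```

In this example the two-byte jump would have ended on the boundary at offset 32, so the model inserts two padding bytes and the jump starts at 32 instead. The cost is visible in the same output: the code grows by the padding, which is one reason the recovered performance can be partial rather than complete.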
Given how we usually see such updates roll out, rolling-release distributions aside, it will likely not be until the next round of Linux distribution releases that software is built with the patched assembler. Even if an updated assembler makes it as a stable release update to non-rolling-release distributions, there is still the matter of all software needing to be recompiled with this workaround to avoid jumps across 32-byte boundaries. So at least there is the ability to partially or fully recover from the microcode performance drop, but it will likely take some time before users at large have the updated assembler on their systems unless they proactively install it. On the other hand, distributions like Ubuntu tend to ship microcode updates as stable release updates more actively than they do key pieces of their toolchain, so it's quite possible some distributions will begin shipping this updated microcode to address the potential unpredictable behavior months before seeing any updated assembler.
There is also the matter of when proprietary software vendors will ship software rebuilt with the updated toolchain, as well as cases like games or other performance-minded software that might never see such rebuilds.
Our Performance Tests
For your viewing pleasure I already have some initial benchmarks carried out over the past few days. The tests are looking at the existing (pre-mitigated) performance, the performance when just updating the CPU microcode, and the performance of the new microcode but also having a patched assembler. Additional tests including gaming benchmarks will be on Phoronix shortly.