Zombieload V2 TAA Performance Impact Benchmarks On Cascade Lake
This week we have posted a number of benchmarks on the JCC Erratum and its CPU microcode workaround, which introduces new potential performance hits. Also announced this week as part of Intel's security disclosures was "Zombieload Variant Two," the TSX Async Abort (TAA) vulnerability, which received same-day Linux kernel mitigations. I've been benchmarking the Linux kernel's TAA mitigations since the moment they hit the public Git tree, and here are the initial benchmark results on an Intel Cascade Lake server.
While Intel's latest-generation Cascade Lake server processors have hardware protections against other MDS vulnerabilities like RIDL and Fallout, they still require software mitigations for Zombieload V2 / TAA. Researchers disclosed this Zombieload variant to Intel earlier in the year, but it was placed under an extended embargo and was not revealed during the original May disclosures.
Besides Cascade Lake, other Intel CPUs requiring the extra TAA mitigations are Whiskey Lake and Coffee Lake-R processors, at least those where Intel TSX (Transactional Synchronization Extensions) is supported. Those wanting to learn more about all of the intricacies of Zombieload V2 / TSX Async Abort can see ZombieloadAttack.com and the Intel Deep Dive. For your viewing pleasure, this article presents the initial Cascade Lake benchmarks following the landing of Linux's TAA mitigations. Details on the Linux kernel's TAA mitigations can be found via this documentation.
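For those wanting to check what their own system reports, the kernel's TAA documentation describes a sysfs file, /sys/devices/system/cpu/vulnerabilities/tsx_async_abort, that exposes the current mitigation state. The following is only a minimal sketch: it assumes the status strings listed in that documentation ("Not affected", "Vulnerable...", "Mitigation: Clear CPU buffers", "Mitigation: TSX disabled"), possibly followed by an SMT note after a semicolon. The helper names read_taa_status and classify are hypothetical, not part of any kernel or library API.

```python
from pathlib import Path
from typing import Optional

# sysfs file documented by the kernel's TSX Async Abort admin guide.
TAA_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/tsx_async_abort")


def read_taa_status(path: Path = TAA_SYSFS) -> Optional[str]:
    """Return the raw TAA status string, or None if the file is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        # Missing file: older kernel without TAA support, or a non-Linux host.
        return None


def classify(status: Optional[str]) -> str:
    """Map a status string to a coarse label (hypothetical helper).

    Assumes the formats listed in the kernel documentation; anything after a
    ';' (such as an SMT note) is ignored here.
    """
    if status is None:
        return "unknown"
    head = status.split(";")[0].strip()
    if head == "Not affected":
        return "not affected"
    if head.startswith("Mitigation: TSX disabled"):
        return "tsx disabled"
    if head.startswith("Mitigation:"):
        return "mitigated"
    if head.startswith("Vulnerable"):
        return "vulnerable"
    return "unrecognized"


if __name__ == "__main__":
    raw = read_taa_status()
    print(raw, "->", classify(raw))
```

Matching on prefixes rather than whole strings keeps the sketch tolerant of suffixes the kernel may append, at the cost of lumping together the different "Vulnerable" variants.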
For this Cascade Lake testing, which is also believed to be the first public benchmarking of the TAA Linux mitigations anywhere, tests were done on a dual Intel Xeon Platinum 8280 server. The server platform in use was the Gigabyte S451-3R0 Xeon Scalable server, kindly provided by Gigabyte.
During this benchmarking the server was running Ubuntu 19.10 with the Linux 5.4 Git kernel. Compared in this article were the new default TAA mitigations with TSX enabled, the performance with the mitigation disabled (using the new tsx_async_abort=off switch), and the performance with Intel TSX simply disabled using the new tsx=off switch.
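The three configurations above differ only in the kernel boot parameters. As a rough illustration, a script could read /proc/cmdline and report which of the three cases a given boot falls into. This is a sketch under stated assumptions: the helper names parse_cmdline and describe are hypothetical, a repeated parameter is assumed to keep its last value, and when neither switch is given the effective behavior depends on how the kernel was built, so it is only labeled "defaults".

```python
from pathlib import Path
from typing import Dict, Optional

# Boot parameters used for the configurations compared in this article.
TAA_PARAMS = ("tsx", "tsx_async_abort")


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Extract tsx= and tsx_async_abort= values; absent parameters map to None.

    Assumption: if a parameter is repeated, the last occurrence is kept.
    """
    found: Dict[str, Optional[str]] = {name: None for name in TAA_PARAMS}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in found:
            found[key] = value
    return found


def describe(params: Dict[str, Optional[str]]) -> str:
    """Name the article configuration a set of parameters corresponds to."""
    # With TSX disabled there is no TSX for TAA to abuse, so tsx=off is
    # checked first (assumption based on the kernel's documented behavior).
    if params["tsx"] == "off":
        return "tsx=off"
    if params["tsx_async_abort"] == "off":
        return "tsx_async_abort=off"
    return "defaults"


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        params = parse_cmdline(cmdline_path.read_text())
        print(params, "->", describe(params))
```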
This article isn't comparing the combined impact of the other speculative execution mitigations, the JCC Erratum, or any other combinations. Follow-up articles will look at those combinations, while today's focus is solely on what the new TSX Async Abort code in the kernel costs. Also keep in mind that SMP/HT was left enabled for all of today's tests; performance with HT disabled will be revisited in the future.
When running different benchmarks found to be impacted by the TAA mitigations, the geometric mean of those results showed the Cascade Lake server running just under 8% slower with this week's new kernel mitigation on affected workloads. Meanwhile, disabling TSX outright and running with TSX enabled but without the mitigation yielded similar performance to one another.
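A geometric mean is the usual way to summarize relative results across benchmarks that report different units, some where higher is better and some where lower is better. The sketch below shows one way such a summary can be computed; it is not necessarily how the figure above was produced. Each benchmark is normalized to a ratio oriented so that values below 1.0 mean "slower with the mitigation", the ratios are combined with a geometric mean, and the result is expressed as a percent slowdown. The function names are hypothetical, and the numbers in the example are placeholders, not results from this article.

```python
import math
from typing import Iterable, Sequence, Tuple


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def slowdown_percent(results: Iterable[Tuple[float, float, bool]]) -> float:
    """Percent slowdown of mitigated runs relative to baseline runs.

    Each entry is (baseline, mitigated, higher_is_better). Ratios are oriented
    so that a value below 1.0 always means the mitigated run was slower.
    """
    ratios = [
        mitigated / baseline if higher_is_better else baseline / mitigated
        for baseline, mitigated, higher_is_better in results
    ]
    return (1.0 - geometric_mean(ratios)) * 100.0


# Placeholder numbers for illustration only, not results from this article.
example = [
    (1000.0, 920.0, True),  # throughput-style metric, e.g. operations/sec
    (12.0, 13.1, False),    # run time in seconds
]
print(f"{slowdown_percent(example):.1f}% slower")
```

Averaging logarithms instead of multiplying the ratios directly avoids overflow or underflow when many benchmarks are combined, and the geometric mean keeps one outlier benchmark from dominating the summary the way an arithmetic mean of percentages can.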
Now let's look at the individual benchmark results.