VMware: ESXi VM Performance Tanks Up To 70% Due To Intel Retbleed Mitigation
VMware's performance engineering team today reported a performance regression in Linux 5.19 that cuts compute performance by up to 70%, networking by up to 30%, and storage by up to 13%. Unfortunately, the heavy-hitting regressions are a known side effect of the Intel Retbleed mitigation for older processors.
VMware engineers were concerned by the performance regression when running Linux virtual machines on VMware ESXi and bisected it to the Intel Retbleed vulnerability handling. They confirmed that the regression goes away when the mitigation is disabled with the spectre_v2=off kernel option.
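For those wanting to see whether a Linux guest is running with these mitigations active, here is a minimal sketch in Python. It assumes the standard sysfs directory /sys/devices/system/cpu/vulnerabilities, where recent kernels report a status line per vulnerability, and /proc/cmdline for the boot options. The function names are my own for illustration and are not taken from VMware's report.

```python
from pathlib import Path

# Standard sysfs location where Linux reports CPU vulnerability status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(names=("retbleed", "spectre_v2"), vuln_dir=VULN_DIR):
    """Return {name: status string} for each requested vulnerability file."""
    status = {}
    for name in names:
        path = Path(vuln_dir) / name
        try:
            status[name] = path.read_text().strip()
        except OSError:
            # Older kernels (or non-Linux systems) may not expose this file.
            status[name] = "not reported"
    return status


def has_boot_option(option, cmdline_path="/proc/cmdline"):
    """Check whether an option such as 'spectre_v2=off' is on the kernel command line."""
    try:
        return option in Path(cmdline_path).read_text().split()
    except OSError:
        return False


if __name__ == "__main__":
    for name, text in mitigation_status().items():
        print(f"{name}: {text}")
    print("spectre_v2=off set:", has_boot_option("spectre_v2=off"))
```

The exact status text varies by kernel version and CPU, so the sketch only prints what the kernel reports rather than trying to interpret it.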
The hardware they were testing was an Intel Xeon "Skylake" server with 112 threads and 2TB of RAM. Retbleed on the Intel side is known to affect Intel Core 6th through 8th Gen client CPUs and the associated Xeon processors. With Skylake Xeons being very popular and many still in use, Retbleed adds a further performance cost, especially for VMs, on top of Spectre, Meltdown, and the other CPU speculative execution vulnerabilities of recent years. Due to Retbleed, Intel Indirect Branch Restricted Speculation (IBRS) is flipped on by default for affected CPUs.
Most severe in VMware's tests was thread creation time rising from 16 to 27 ms, along with kernel build times taking seconds longer, longer start-up and shutdown times for VMs, VM networking performance dropping from ~11.932 Gbps to ~8.56 Gbps, and storage being hit noticeably too.
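As a quick sanity check on how those raw figures relate to the headline percentages, the relative change can be worked out directly. This is only a sketch using the numbers quoted in this article; the helper name is my own, and matching the thread creation figure to the "up to 70%" compute number is my reading rather than something VMware spelled out.

```python
def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# Thread creation time in milliseconds (higher is worse).
thread_creation = percent_change(16.0, 27.0)

# VM networking throughput in Gbps (lower is worse).
networking = percent_change(11.932, 8.56)

print(f"thread creation time: {thread_creation:+.1f}%")
print(f"networking throughput: {networking:+.1f}%")
```

That works out to roughly 69% longer thread creation and about 28% less network throughput, which lines up with the "up to 70%" and "up to 30%" figures.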
See this kernel mailing list post for the notes from VMware.
So far the only response has been from Intel engineer Peter Zijlstra, who commented on the spectre_v2=off comparison with, "Well, duh.. :-)" (Peter previously referred to the Retbleed/IBRS situation as a "performance horror show" and has been working on call depth tracking as a possible alternative mitigation technique.)
I haven't run any Retbleed VM-impact comparison benchmarks yet, but those wondering about the bare metal performance impact on affected Intel and AMD CPUs can see my other Retbleed articles for those benchmarks.