Lenovo Discovers Situation Of Linux Dropping PCIe Gen 5 NVMe SSDs To Gen 1 Speeds

Written by Michael Larabel in Linux Storage on 10 January 2025 at 10:33 AM EST.
A change made to the Linux kernel in June 2023 has led to a situation where PCIe Gen5 NVMe solid state drives can drop down to Gen1 speeds. Lenovo engineers spotted the issue, bisected it, and came up with a fix.

Lenovo engineers discovered PCIe Gen5 NVMe drives dropping from 32 GT/s down to 2.5 GT/s on Linux kernels from the past year and a half. This disastrous hit to throughput shows up during quick hot-add/hot-remove of the storage drives, so fortunately it is not a very common occurrence, especially on desktop/mobile-class hardware. Presumably it was Lenovo engineers on the server side who noticed the issue when quickly hot-adding and removing drives on Lenovo server platforms.
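
For readers wanting to check whether a link on their own system is running below its rated speed, here is a minimal Python sketch. It assumes a kernel that exposes the `current_link_speed` and `max_link_speed` attributes under `/sys/bus/pci/devices/`; the helper names (`parse_gts`, `is_degraded`, `degraded_links`) are made up for this illustration. A link below its maximum is not necessarily this bug, since slot limits, the upstream port, or power management can also lower the negotiated speed.

```python
import re
from pathlib import Path

# Standard sysfs location for PCI devices on Linux.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def parse_gts(text):
    """Return the GT/s number from a string like '32.0 GT/s PCIe', else None."""
    match = re.match(r"\s*([0-9]+(?:\.[0-9]+)?)\s*GT/s", text)
    return float(match.group(1)) if match else None


def is_degraded(current, maximum):
    """True when both speeds parse and the current speed is below the maximum."""
    cur, top = parse_gts(current), parse_gts(maximum)
    return cur is not None and top is not None and cur < top


def degraded_links(root=SYSFS_PCI):
    """List (device, current, maximum) for devices running below max speed."""
    found = []
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            current = (dev / "current_link_speed").read_text().strip()
            maximum = (dev / "max_link_speed").read_text().strip()
        except OSError:
            # Not a PCIe device, or the attribute is unavailable.
            continue
        if is_degraded(current, maximum):
            found.append((dev.name, current, maximum))
    return found


if __name__ == "__main__":
    for name, current, maximum in degraded_links():
        print(f"{name}: running at {current}, capable of {maximum}")
```

Running it after a hot-add cycle and seeing an NVMe drive or its bridge report 2.5 GT/s against a 32.0 GT/s maximum would be consistent with the behavior Lenovo describes.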


Jiwei Sun of Lenovo explained in a new patch series today on the Linux kernel mailing list:
"When we do the quick hot-add/hot-remove test with a PCIE Gen 5 NVMe disk, there is a possibility that the PCIe bridge will decrease to 2.5GT/s from 32GT/s.

The issue is caused by commit a89c82249c37 ("PCI: Work around PCIe link training failures"). Although the commit 712e49c96706 ("PCI: Correct error reporting with PCIe failed link retraining") and the commit f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining") have tried to fix the similar issue. However, there is still a window for triggering the issue within 1-second hot-add/hot-remove test.

Besides, the commit de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() introduces two potential issues might cause that the removing 2.5GT/s downstream link speed restriction works fail."

The original commit in June 2023 that introduced the problem was itself a patch trying to work around PCIe link training failures on hardware such as the ASMedia ASM2824 PCIe switch, but it in turn opened up the can of worms now being dealt with by Lenovo.

The fixes involve correcting the Linux PCIe hot-plug testing code as well as fixing the reading of the wrong register fields during the PCIe link re-training function.
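
To illustrate why reading the wrong register field matters, below is a small Python sketch of decoding a PCIe link speed. It is not the Lenovo patch, and it assumes the fields in question are the 4-bit speed encodings in the Link Capabilities, Link Status, and Link Control 2 registers. The mask names mirror the Linux `pci_regs.h` constants; `decode_speed` and the sample register values are hypothetical.

```python
# 4-bit speed fields (bits 3:0) of three PCIe capability registers,
# named after the corresponding Linux pci_regs.h masks.
PCI_EXP_LNKCAP_SLS = 0x0000000F   # Link Capabilities: Supported Link Speeds
PCI_EXP_LNKSTA_CLS = 0x000F       # Link Status: Current Link Speed
PCI_EXP_LNKCTL2_TLS = 0x000F      # Link Control 2: Target Link Speed

# Speed encoding defined by the PCIe specification, in GT/s.
SPEED_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0}


def decode_speed(register_value, field_mask):
    """Mask out the speed field first, then map it to GT/s (None if unknown)."""
    return SPEED_GTS.get(register_value & field_mask)


# Made-up sample values: other bits in the register are set as well.
link_status = 0x1041      # speed field = 1, i.e. 2.5 GT/s
link_control2 = 0x0045    # speed field = 5, i.e. 32 GT/s

print(decode_speed(link_status, PCI_EXP_LNKSTA_CLS))
print(decode_speed(link_control2, PCI_EXP_LNKCTL2_TLS))

# Looking up the raw register value without masking finds no valid speed.
print(SPEED_GTS.get(link_status))
```

The last line shows the failure mode in miniature: comparing a whole register against a speed encoding, without isolating the field, yields no match even though the link speed is perfectly valid.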

The patch series is now out for review on the LKML.