Google Finds Linux Server Reboots Too Slow Due To Too Many NVMe Drives
Google engineers are proposing an asynchronous shutdown interface for the Linux kernel. Currently the kernel's shutdown APIs at the bus level are synchronous, which becomes a problem, as Google reports, when a single server holds many NVMe storage drives. Because shutdown handling is synchronous, the drives are shut down one after another, and each NVMe drive can take about 4.5 seconds to shut down. With Google servers now having 16 or more NVMe devices, that can add more than a minute to shutting down and going through the reboot phase (16 drives at 4.5 seconds each is 72 seconds). With the asynchronous shutdown interface and the NVMe driver adapted to use it, their reboot time, and ultimately the amount of server downtime, can be reduced by about one minute.
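As a rough back-of-the-envelope model of those numbers, the sketch below compares the two cases. It assumes every drive takes the same 4.5 seconds and that asynchronous shutdowns overlap completely, which real hardware may not do; the function names are illustrative and do not come from the patches.

```python
def sync_shutdown_seconds(num_drives: int, per_drive_seconds: float) -> float:
    """Drives are shut down one after another, so the delays add up."""
    return num_drives * per_drive_seconds


def async_shutdown_seconds(num_drives: int, per_drive_seconds: float) -> float:
    """Assumption: all shutdowns start together and overlap fully,
    so the total is bounded by a single drive's delay."""
    return per_drive_seconds if num_drives > 0 else 0.0


if __name__ == "__main__":
    drives, delay = 16, 4.5  # figures quoted in the article
    slow = sync_shutdown_seconds(drives, delay)
    fast = async_shutdown_seconds(drives, delay)
    print(f"synchronous: {slow} s, asynchronous: {fast} s, saved: {slow - fast} s")
```

Under these assumptions, 16 drives cost 72 seconds when handled synchronously and 4.5 seconds when handled in parallel, which lines up with the roughly one minute of savings described above.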
The proposed patches from Google allow for an optional asynchronous shutdown interface at the bus level. The new interface maintains backwards compatibility with the synchronous implementation. The patches move all PCI Express based devices over to the async interface, implement the changes at the PCIe level, and then change the NVMe driver to exploit the async shutdown interface.
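To illustrate how an optional asynchronous interface can coexist with synchronous drivers, here is a minimal simulation in Python rather than kernel C. It is a sketch under stated assumptions, not the code from the patches: the `Device` class, the `supports_async` flag, and the two-phase "start, then wait" loop are all hypothetical, and time is tracked with a simulated clock instead of real hardware delays.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    shutdown_seconds: float        # simulated time the device needs
    supports_async: bool = False   # hypothetical opt-in flag
    done_at: Optional[float] = None  # simulated completion time


def bus_shutdown(devices: List[Device]) -> float:
    """Return the simulated total shutdown time for one bus."""
    clock = 0.0
    pending: List[Device] = []

    # Phase 1: kick off async shutdowns without waiting; devices that
    # have not opted in keep the old blocking behavior.
    for dev in devices:
        if dev.supports_async:
            dev.done_at = clock + dev.shutdown_seconds
            pending.append(dev)
        else:
            clock += dev.shutdown_seconds
            dev.done_at = clock

    # Phase 2: wait for every async shutdown that is still in flight.
    for dev in pending:
        clock = max(clock, dev.done_at)

    return clock


if __name__ == "__main__":
    legacy = [Device(f"nvme{i}", 4.5) for i in range(16)]
    adapted = [Device(f"nvme{i}", 4.5, supports_async=True) for i in range(16)]
    print("all synchronous:", bus_shutdown(legacy))
    print("all asynchronous:", bus_shutdown(adapted))
```

In this model a driver that never opts in behaves exactly as before, which is one way to read the backwards compatibility claim, while adapted drivers only pay for the slowest device instead of the sum of all of them.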
In its current form, the proposed async shutdown interface is only around one hundred lines of new code, though granted just one driver makes use of it at the moment. But with modern high-performance Linux servers continuing to add more NVMe drives and other PCIe devices, where the kernel's synchronous shutdown interface can mean extra downtime, hopefully these patches will move ahead and be mainlined in short order, along with more drivers being adapted to make use of the interface.