How Cloudflare Updates The BIOS & Firmware Across Thousands Of Servers
For those wondering how Cloudflare keeps their thousands of servers around the world up-to-date with the latest BIOS and firmware, Cloudflare's engineering blog has put out an interesting post outlining how they handle system BIOS updates as well as various other firmware updates.
With BIOS/firmware updates often being done in the name of security, they are dedicated to ensuring their servers are on the latest versions. While there exist the likes of the Linux Vendor Firmware Service (LVFS) and fwupd for handling firmware updates, sadly, they aren't using those but rather their own set of solutions.
It's too bad they haven't thrown their weight behind the likes of LVFS/fwupd to encourage more vendor adoption; instead, Cloudflare's current approach amounts to maintaining a set of homegrown scripts.
System BIOS/firmware updates are done via an iPXE boot script. Cloudflare reboots their servers monthly (or sooner if needed for security reasons), and on each boot the iPXE script queries the current UEFI BIOS version. If that BIOS version is out of date, the script downloads the latest specified version and flashes it using UEFI. If the system BIOS is up-to-date, it proceeds to boot the Linux kernel.
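The decision the boot script makes can be sketched in a few lines. This is not Cloudflare's iPXE script, just a minimal Python illustration of the same "flash if older, otherwise boot" logic. The target version string, `TARGET_BIOS_VERSION`, and the helper names are hypothetical, and it assumes versions are dotted integers, which real vendor BIOS version strings often are not.

```python
from typing import Tuple

# Hypothetical target version; a real setup would fetch this from central config.
TARGET_BIOS_VERSION = "2.4.1"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '2.4.1' into a comparable tuple.

    Assumes purely numeric, dot-separated components (an illustration-only
    simplification; vendor version formats vary).
    """
    return tuple(int(part) for part in version.split("."))


def boot_action(current_bios: str, target_bios: str = TARGET_BIOS_VERSION) -> str:
    """Decide what the boot script should do next.

    Returns 'flash' when the installed BIOS is older than the target,
    otherwise 'boot' to continue into the Linux kernel.
    """
    if parse_version(current_bios) < parse_version(target_bios):
        return "flash"
    return "boot"


if __name__ == "__main__":
    print(boot_action("2.3.9"))  # older than target, so "flash"
    print(boot_action("2.4.1"))  # matches target, so "boot"
```

Comparing tuples rather than raw strings avoids the classic pitfall where "2.10" would sort before "2.9" lexically.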
Keeping the BMC, network cards, and other component firmware up-to-date is where things get a bit messier. Cloudflare leverages a "pre-flight" systemd service that runs on boot, checks the various firmware versions, and then takes the appropriate steps to flash the component firmware when necessary.
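The core of such a pre-flight check is comparing what each component reports against a desired version and building a list of components to flash. The sketch below is a hypothetical Python illustration, not Cloudflare's service: the component names, the `installed`/`desired` dictionaries, and `plan_flashes` are all assumptions, and the actual per-vendor flashing steps are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a dotted numeric version (assumed format) into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class FlashStep:
    """One component that needs its firmware updated (hypothetical structure)."""
    component: str
    current: str
    target: str


def plan_flashes(installed: Dict[str, str], desired: Dict[str, str]) -> List[FlashStep]:
    """Return the components whose installed firmware is older than desired.

    Components missing from `installed` are skipped rather than guessed at;
    a real service would likely log or alert on them instead.
    """
    steps: List[FlashStep] = []
    for component, target in sorted(desired.items()):
        current = installed.get(component)
        if current is None:
            continue
        if parse_version(current) < parse_version(target):
            steps.append(FlashStep(component, current, target))
    return steps


if __name__ == "__main__":
    installed = {"bmc": "1.10.0", "nic": "22.31.6", "nvme": "3.1"}
    desired = {"bmc": "1.12.0", "nic": "22.31.6", "nvme": "3.2"}
    for step in plan_flashes(installed, desired):
        print(f"{step.component}: {step.current} -> {step.target}")
```

Separating "plan" from "flash" keeps the version check easy to test, while the vendor-specific flashing tools, which differ per BMC, NIC, and drive, stay behind a narrow boundary.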
Those interested in learning more about Cloudflare's firmware flashing adventures across their thousands of servers can see this blog post.