Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives


  • cynic
    replied
    Originally posted by caligula View Post
    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.
    That's YOUR use case.

    If someone is investing resources to solve what they believe is an issue, then maybe for their use case it IS an issue.

    Stop thinking that your PC and your servers are the whole universe and that nothing else exists.

    Leave a comment:


  • chithanh
    replied
    Originally posted by caligula View Post
    servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.
    Depends. If there is some urgent kernel update that runs into the limitations of kernel live patching, then you have to reboot all of your servers in short order. And while servers are rebooting, your server capacity is either much reduced, or you have to draw out your update process over a long time.

    Originally posted by paulocoghi View Post
    60 seconds are not a "few seconds" for important services that must be online.
    Eh, for important services you usually avoid SPOF by running multiple servers anyway, and can reboot them one after another.
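
    A minimal sketch of what "reboot them one after another" can look like in practice (the hostnames, SSH access, and ping as a liveness check are all my own assumptions for illustration, not anything from the article):

    #!/usr/bin/env python3
    # Rolling-reboot sketch: take down one server at a time and wait for
    # it to come back before touching the next, so capacity never drops
    # by more than one machine at a time.
    import subprocess
    import time

    SERVERS = ["node01.example.com", "node02.example.com", "node03.example.com"]

    def is_up(host: str) -> bool:
        # One ICMP ping as a crude liveness check.
        return subprocess.run(["ping", "-c", "1", "-W", "2", host],
                              stdout=subprocess.DEVNULL).returncode == 0

    for host in SERVERS:
        subprocess.run(["ssh", host, "sudo", "reboot"])  # connection may drop here
        time.sleep(30)              # give the machine time to actually go down
        while not is_up(host):      # then poll until it answers again
            time.sleep(5)
        print(f"{host} is back up, moving on to the next one")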

    Leave a comment:


  • paulocoghi
    replied
    Originally posted by caligula View Post
    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.
    60 seconds is not "a few seconds" for important services that must stay online.

    With fast reboot times we get services back online sooner. The more uptime, the better. Simple, no? Even in clustered scenarios, faster reboots speed up resynchronization as well.

    The question I put to you is: why would we want a slower restart if we can make it faster?

    Leave a comment:


  • garyb
    replied
    Originally posted by caligula View Post
    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.


    Each write is sharded onto many D servers ... imagine you are running maintenance on a D server and it takes a few extra seconds. Those extra seconds mean more parity reconstruction is happening to 'fill in the blanks', which means more energy is being used. Multiply that by millions of reads/writes and you can see how a server being down, even for a few extra seconds, starts to cost money and energy.

    Now imagine there's a critical vulnerability that needs to be patched on ALL the D servers: quicker reboots = quicker deployment, since you won't patch them all at once.
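
    To make the 'fill in the blanks' point concrete, here is a toy XOR-parity example (my own illustration of generic RAID-style parity, not how Google's D servers actually store data): while one shard's server is down, every read of its data costs reads of all surviving shards plus XOR work, instead of a single direct read.

    # Toy XOR parity: parity = shard0 ^ shard1 ^ shard2, byte by byte.
    shards = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
    parity = bytes(a ^ b ^ c for a, b, c in zip(*shards))

    # The server holding shard 1 is rebooting: rebuild its bytes from
    # the surviving shards plus parity -- 3 reads instead of 1.
    rebuilt = bytes(a ^ c ^ p for a, c, p in zip(shards[0], shards[2], parity))
    assert rebuilt == shards[1]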

    Leave a comment:


  • Mark Rose
    replied
    I wonder if this is why EC2 supports a max of 40 volumes with Linux. Perhaps there is a similar issue when booting as well.

    The maximum number of Amazon EBS volumes that you can attach to an instance depends on the instance type and instance size. When considering how many volumes to attach to your instance, you should consider whether you need increased I/O bandwidth or increased storage capacity.

    Leave a comment:


  • milkylainen
    replied
    Umm. But WHY is it taking 4.5 seconds?
    There is nothing that should take 4.5 seconds on an NVMe drive.
    Even if it has a power hold-up cache, it would be flushed rather immediately.
    A buggy drive, or a buggy specification?
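
    For what it's worth, even taking the 4.5 seconds per drive as a given: as I read the article, the kernel waits on each drive's shutdown one after another, so the fix is to issue the waits in parallel. A toy timing model of that difference (the 16-drive count is my own assumption, and this is obviously not the actual kernel code):

    # 16 drives that each take 4.5 s to acknowledge shutdown: done
    # serially that is 72 s, done in parallel it is roughly the 4.5 s
    # of a single drive.
    from concurrent.futures import ThreadPoolExecutor
    import time

    DRIVES, SHUTDOWN_SECS = 16, 4.5

    def shutdown(drive: int) -> None:
        time.sleep(SHUTDOWN_SECS)   # stand-in for the drive's shutdown handshake

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=DRIVES) as pool:
        pool.map(shutdown, range(DRIVES))
    print(f"parallel: {time.monotonic() - start:.1f} s "
          f"(serial would be {DRIVES * SHUTDOWN_SECS:.0f} s)")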

    Leave a comment:


  • caligula
    replied
    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.

    Leave a comment:


  • partcyborg
    replied
    Originally posted by cjcox View Post
    I think we need more info. I'll admit, I don't have a gazillion NVMe drives, so maybe it is a problem with that specific driver (?). In which case, Google seems to be on top of it.
    What? Linux only has one nvme driver. It's called (shocker!) nvme.

    Leave a comment:


  • abufrejoval
    replied
    That reminds me of a question I asked myself only a few days ago, when I was considering what to do with a Linux box that showed the symptoms of "locked-in syndrome": while it had lost network connectivity and wasn't reacting to any keyboard input, some parts of the kernel must have still been alive, because I could see the regular blips from the disk activity indicator hinting at sync() being called every couple of seconds.

    The question was: is it better to force a power-off or to press the reset button, since the machine clearly wasn't responding to a short press of the power button?

    For SATA HDDs and SSDs the choice used to be clear: AFAIK there is no reset line on the SATA bus, so a reset would ensure that there wasn't going to be any device-internal data corruption from in-flight buffers being partially written or similar.

    But with these NVMe devices I was thinking that there is a reset line on the PCIe bus, which could wreak havoc if the SSD didn't "manage" it intelligently. In that case a power-down might be better, provided there was at least some kind of power-fail protection on the device (not that I noticed any caps on the PCB).

    I'd be happy to be enlightened and get some background on how reset is dealt with by NVMe drives and hardware RAID controllers on the PCIe bus (yeah, I still have some of those to manage the spinning rust).

    Leave a comment:


  • kescherPh
    replied
    It seems that a lot of work has been going on lately to make things like this async anyway, so this fits right in.

    Leave a comment:
