Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives


    Phoronix: Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives

    Hyperscaler problems these days: Linux servers taking too long to reboot because they have too many NVMe drives. Thankfully, Google is working on an improvement to address this. Some of their many-drive servers can take more than a minute for the Linux kernel to carry out its shutdown tasks, and the work may benefit other users too, albeit less notably...
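
    A rough userspace analogy of the change being described (not the actual kernel patch; the drive count and per-drive delay below are made-up illustration values): if each drive's shutdown handshake takes a fixed wait, tearing drives down one after another costs N waits, while dispatching all the waits concurrently and joining them costs roughly one.

    ```c
    /* Userspace analogy only: dispatch every per-drive shutdown wait in
     * parallel, then join, so total wall time is ~one wait instead of N. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NUM_DRIVES 16            /* hypothetical many-drive server */

    static void *shutdown_one_drive(void *arg)
    {
        int id = *(int *)arg;
        sleep(1);                    /* stand-in for the shutdown handshake */
        printf("drive %d: shutdown complete\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t workers[NUM_DRIVES];
        int ids[NUM_DRIVES];

        /* Start every shutdown, then wait for all of them. */
        for (int i = 0; i < NUM_DRIVES; i++) {
            ids[i] = i;
            pthread_create(&workers[i], NULL, shutdown_one_drive, &ids[i]);
        }
        for (int i = 0; i < NUM_DRIVES; i++)
            pthread_join(workers[i], NULL);

        puts("all drives down in ~1s of wall time instead of ~16s");
        return 0;
    }
    ```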


  • #2
    Hey Google, I can help you!
    I can send you some 1GB 5400RPM spinning rust to solve your problem, and you can give me back some NVMe drives...

    thank me later...



  • #3
    I think we need more info. I'll admit, I don't have a gazillion NVMe drives, so maybe it is a problem with that specific driver (?). In which case, Google seems to be on top of it.



  • #4
    It seems that lately a lot of work is going into making things like this asynchronous anyway, so this fits right in.
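
    For what it's worth, this kind of conversion usually leans on the kernel's in-tree async helpers (linux/async.h). A minimal kernel-style sketch, assuming a hypothetical per-device teardown helper; this is not the actual patch, and it only compiles in a kernel tree:

    ```c
    /* Kernel-style sketch; nvme_dev and nvme_dev_shutdown() are hypothetical
     * stand-ins for the driver's real types and teardown helper. */
    #include <linux/async.h>

    struct nvme_dev;                                      /* opaque here */
    extern void nvme_dev_shutdown(struct nvme_dev *dev);  /* hypothetical */

    static void async_shutdown_one(void *data, async_cookie_t cookie)
    {
        nvme_dev_shutdown(data);   /* per-device work now runs concurrently */
    }

    static void shutdown_all(struct nvme_dev **devs, int n)
    {
        /* Queue every teardown on the async infrastructure... */
        for (int i = 0; i < n; i++)
            async_schedule(async_shutdown_one, devs[i]);
        /* ...and wait once for all of them instead of once per device. */
        async_synchronize_full();
    }
    ```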



  • #5
    That reminds me of a question I asked myself only some days ago, when I was considering what to do with a Linux box that showed the symptoms of "locked-in syndrome": it had lost network connectivity and wasn't reacting to any keyboard input, but some parts of the kernel must have still been alive, because I could see the regular blips from the disk activity indicator that hinted at sync() being called every couple of seconds.

    The question was: is it better to force a power-off or to use the reset button, as it clearly wasn't responding to a short press of the power button?

    For SATA HDDs and SSDs the choice used to be clear: AFAIK there is no reset line on the SATA bus, so a reset would ensure that there wasn't going to be any device-internal data corruption from in-flight buffers being partially written or similar.

    But with these NVMe devices I was thinking that there is a reset line on the PCIe bus, which could wreak havoc if the SSD didn't "manage" that intelligently. In that case a power-down might be better, provided there was at least some kind of power-fail protection on the device (not that I noticed any caps on the PCB).

    I'd be happy to be enlightened and get some background on how reset is dealt with on NVMe drives and hardware RAID controllers on a PC bus (yeah, I still have some to manage the spinning rust).
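
    Not a full answer, but when the kernel is still partly alive like that, there is a middle ground between the reset button and cutting power: Magic SysRq. A minimal sketch, assuming SysRq is enabled (the kernel.sysrq sysctl) and you can still run something as root; the same 's'/'u'/'b' sequence also works from the keyboard as Alt+SysRq+S, U, B, since SysRq is handled at interrupt level even when the console appears dead:

    ```c
    /* Minimal sketch: ask the kernel for an emergency sync ('s'), remount
     * everything read-only ('u'), then reboot ('b') via /proc/sysrq-trigger.
     * The sleeps are arbitrary grace periods, not spec-mandated values. */
    #include <stdio.h>
    #include <unistd.h>

    static void sysrq(char c)
    {
        FILE *f = fopen("/proc/sysrq-trigger", "w");
        if (!f) { perror("/proc/sysrq-trigger"); return; }
        fputc(c, f);
        fclose(f);
    }

    int main(void)
    {
        sysrq('s');   /* emergency sync of all filesystems */
        sleep(5);
        sysrq('u');   /* remount all filesystems read-only */
        sleep(5);
        sysrq('b');   /* immediate reboot, no further sync */
        return 0;
    }
    ```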



  • #6
    Originally posted by cjcox View Post
    I think we need more info. I'll admit, I don't have a gazillion NVMe drives, so maybe it is a problem with that specific driver (?). In which case, Google seems to be on top of it.

    What? Linux only has one NVMe driver. It's called (shocker!) nvme.



  • #7
    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it took a few hours to boot.



  • #8
    Umm. But WHY is it taking 4.5 seconds?
    There is nothing that should take 4.5 seconds on an NVMe drive.
    Even if it has a power hold-up cache, it would be flushed almost immediately.
    Buggy drive or buggy specification?
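
    For context, the spec-defined shutdown is a register handshake: the host sets the shutdown-notification field in the controller configuration register, then polls the controller status register until it reports shutdown complete, and the drive's advertised RTD3 entry latency is the spec's hint for how long that may legally take, which is where multi-second waits can come from. A sketch of that handshake, per my reading of the NVMe base spec (double-check offsets and bit positions against the spec; readl/writel are stand-ins for MMIO accessors):

    ```c
    #include <stdint.h>
    #include <stdbool.h>

    #define NVME_REG_CC         0x14              /* controller configuration */
    #define NVME_REG_CSTS       0x1c              /* controller status */
    #define NVME_CC_SHN_MASK    (0x3u << 14)
    #define NVME_CC_SHN_NORMAL  (0x1u << 14)      /* request normal shutdown */
    #define NVME_CSTS_SHST_MASK (0x3u << 2)
    #define NVME_CSTS_SHST_DONE (0x2u << 2)       /* shutdown complete */

    /* Stand-ins for MMIO accessors into the controller's BAR0. */
    extern uint32_t readl(volatile void *addr);
    extern void writel(uint32_t val, volatile void *addr);

    static bool nvme_shutdown_controller(volatile uint8_t *bar0,
                                         unsigned long budget_loops)
    {
        uint32_t cc = readl(bar0 + NVME_REG_CC);

        cc &= ~NVME_CC_SHN_MASK;
        cc |= NVME_CC_SHN_NORMAL;
        writel(cc, bar0 + NVME_REG_CC);           /* notify: normal shutdown */

        /* Poll until the drive says it is done or our budget runs out;
         * a real driver bounds this by the advertised RTD3 latency. */
        while (budget_loops--) {
            uint32_t csts = readl(bar0 + NVME_REG_CSTS);
            if ((csts & NVME_CSTS_SHST_MASK) == NVME_CSTS_SHST_DONE)
                return true;
        }
        return false;
    }
    ```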



  • #9
    I wonder if this is why EC2 supports a max of 40 volumes with Linux. Perhaps there is a similar issue when booting as well.

    The maximum number of Amazon EBS volumes that you can attach to an instance depends on the instance type and instance size. When considering how many volumes to attach to your instance, you should consider whether you need increased I/O bandwidth or increased storage capacity.



  • #10
    Originally posted by caligula View Post
    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it took a few hours to boot.

    Each write is sharded onto many D servers... imagine you are running maintenance on a D server and it takes a few seconds longer. Those seconds mean more parity reconstruction is happening to 'fill in the blanks', which means more energy is being used. Multiply that by millions of reads/writes and you can see how a server being down even a few extra seconds starts to cost money and energy.

    Now imagine there's a critical vulnerability that needs to be patched on ALL the D servers: quicker reboots mean quicker deployment, as you won't patch them all at once.
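
    To put rough numbers on that (every figure below is made up for illustration): under a (k, m) erasure code, a read whose shard lives on the down server has to fetch k surviving shards and reconstruct instead of doing one read, so the extra backend I/O scales with how long the server stays down:

    ```c
    /* Back-of-the-envelope illustration with made-up numbers: each degraded
     * read costs k reads instead of 1 under a (k, m) erasure code. */
    #include <stdio.h>

    int main(void)
    {
        const double reads_per_sec = 1e6;  /* hypothetical fleet read rate */
        const double affected_frac = 0.01; /* reads hitting the down server */
        const int    k             = 6;    /* data shards in a (6,3) code */
        const double extra_secs    = 5.0;  /* reboot running 5 s longer */

        double degraded = reads_per_sec * affected_frac * extra_secs;
        double extra_io = degraded * (k - 1);  /* k reads replace 1 read */

        printf("degraded reads in the window: %.0f\n", degraded);
        printf("extra reconstruction I/Os:    %.0f\n", extra_io);
        return 0;
    }
    ```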

