Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives


  • #1

    Phoronix: Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives

    Hyperscaler problems these days? Linux servers taking too long to reboot because they have too many NVMe drives. Thankfully, Google is working on an improvement to address this: some of their many-drive servers can take more than one minute for the Linux kernel to carry out its shutdown tasks. This work may benefit other users too, albeit less notably...

    https://www.phoronix.com/scan.php?pa...-Too-Many-NVMe
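    A minimal sketch of why overlapping per-drive shutdown work helps, assuming the improvement amounts to letting the kernel wait on many NVMe drives concurrently instead of one after another (comment #4 below mentions the general push toward async work). The real change would live in the kernel's C code; this Python model only illustrates the arithmetic: with N drives that each need d seconds, a sequential shutdown takes about N × d, while a concurrent one takes roughly d. `shutdown_drive`, `SHUTDOWN_DELAY` and the helper functions are hypothetical stand-ins, and the delay is illustrative, not measured.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-drive shutdown latency in seconds (illustrative, not measured).
SHUTDOWN_DELAY = 0.05


def shutdown_drive(drive_id: int) -> int:
    """Hypothetical stand-in for waiting on one drive to finish its shutdown."""
    time.sleep(SHUTDOWN_DELAY)
    return drive_id


def shutdown_sequential(drives):
    """Shut drives down one after another: total time is roughly N * delay."""
    start = time.monotonic()
    done = [shutdown_drive(d) for d in drives]
    return done, time.monotonic() - start


def shutdown_concurrent(drives):
    """Start every drive's shutdown at once: total time is roughly one delay."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(drives))) as pool:
        # pool.map returns results in input order, so 'done' matches 'drives'.
        done = list(pool.map(shutdown_drive, drives))
    return done, time.monotonic() - start


if __name__ == "__main__":
    drives = list(range(16))
    _, t_seq = shutdown_sequential(drives)
    _, t_par = shutdown_concurrent(drives)
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_par:.2f}s")
```

    With many drives the sequential total grows linearly, which is the kind of scaling that could push a shutdown past a minute on a machine with dozens of slow-to-quiesce devices.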

  • #2
    Hey Google, I can help you!
    I can send you some 1GB 5400RPM spinning rust and solve your problem, and you give me back some NVMe drives...

    thank me later...



    • #3
      I think we need more info. I'll admit, I don't have a gazillion NVMe drives, so maybe it is a problem with that specific driver (?). In which case, Google seems to be on top of it.



      • #4
        It seems that lately, a lot of work on making things like this async is going on anyway, so this fits right in.



        • #5
          That reminds me of a question I asked myself only a few days ago, when I was considering what to do with a Linux box that showed the symptoms of "locked-in syndrome": while it had lost network connectivity and wasn't reacting to any keyboard input, some parts of the kernel must still have been alive, because I could see regular blips from the disk activity indicator that hinted at sync() being called every couple of seconds.

          The question was: is it better to force a power-off or to use the reset button, since it clearly wasn't responding to a short press of the power button?

          For SATA HDDs and SSDs the choice used to be clear: AFAIK there is no reset line on the SATA bus, so a reset would ensure that there wasn't going to be any device-internal data corruption from in-flight buffers being partially written or similar.

          But with these NVMe devices I was thinking that there is a reset line on the PCIe bus, which could wreak havoc if the SSD didn't "manage" that intelligently. In that case a power-down might be better, if there was at least some kind of power-fail protection on the device (not that I noticed any caps on the PCB).

          I'd be happy to be enlightened and get some background on how reset is being dealt with on NVMe and hardware RAID controllers on a PC bus (yeah, I still have some to manage the spinning rust).



          • #6
            Originally posted by cjcox
            I think we need more info. I'll admit, I don't have a gazillion NVMe drives, so maybe it is a problem with that specific driver (?). In which case, Google seems to be on top of it.
            What? Linux only has one nvme driver. It's called (shocker!) nvme.



            • #7
              Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.



              • #8
                Umm. But WHY is it taking 4.5 seconds?
                There is nothing that should take 4.5 seconds on an NVMe drive.
                Even if it has a power hold-up cache, it would be flushed almost immediately.
                Buggy drive or buggy specification?



                • #9
                  I wonder if this is why EC2 supports a max of 40 volumes with Linux. Perhaps there is a similar issue when booting as well.

                  https://docs.aws.amazon.com/AWSEC2/l...me_limits.html



                  • #10
                    Originally posted by caligula
                    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.
                    https://cloud.google.com/blog/produc...es-file-system

                    Each write is sharded onto many D servers. Imagine you are running maintenance on a D server and it takes a few seconds longer: those extra seconds mean more parity reconstruction is happening to 'fill in the blanks', which means more energy is being used. Multiply that by millions of reads/writes and you can see how a server being down, even for just a few extra seconds, starts to cost money and energy.

                    Now imagine there's a critical vulnerability that needs to be patched on ALL the D servers: quicker reboots mean quicker deployment, since you won't patch them all at once.

