
Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives


  • #21
    Rebooting my home server takes about 5 minutes. Anything to speed that up is welcome...

    Originally posted by mdedetrich View Post
    I notice this somewhat frequently. One example off the top of my head is NetworkManager: when it's connecting to or disconnecting from different networks, it appears to block/freeze the UI. Being predominantly written in C also doesn't help in this regard, because doing this kind of programming in C is really hard (languages like C++/Rust let you provide higher-level abstractions in libraries that simplify this a lot).
    That would be a problem of the UI... NetworkManager is a daemon and communicates asynchronously over D-Bus. They are different programs.

    Comment


    • #22
      Originally posted by Espionage724 View Post
      I don't know what Google does for their servers, but my servers reboot daily for updates and I'd gladly take any free speed improvements.
      Guess I would be interested in knowing just what OS you are using that requires daily updates because even Windows 2003 could go a full week before required reboots from the Windows Updates.

      Comment


      • #23
        Originally posted by chithanh View Post
        Eh, for important services you usually avoid SPOF by running multiple servers anyway, and can reboot them one after another.
        I agree, and this is the reason I mentioned clustered scenarios as well: it's still better for the health of any cluster to have a member back online earlier, even when all services are clustered and highly available.

        If we can reboot faster without compromising safety, why on earth would someone choose the slow option?

        Originally posted by bug77 View Post

        Those services are not tied to a physical machine, they just move to another running instance.
        I am aware of this and still, at the same time, in favor of faster reboot times. Again, why not?

        Comment


        • #24
          Originally posted by dekernel View Post

          Guess I would be interested in knowing just what OS you are using that requires daily updates because even Windows 2003 could go a full week before required reboots from the Windows Updates.
          Any rolling release will push updates requiring a reboot on an almost daily basis.

          Comment


          • #25
            Originally posted by bug77 View Post

            Any rolling release will push updates requiring a reboot on an almost daily basis.
            Good point. I guess I just don't equate "servers" and "rolling releases" in the same sentence for just such reasons.

            Comment


            • #26
              Originally posted by caligula View Post
              Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.
              I reboot my desktop only after a kernel update or if otherwise deemed necessary.

              Comment


              • #27
                Originally posted by DRanged View Post
                Ah remember the old days IBM p43 servers booting just within 7 minutes. Those fond memories.
                How about an IBM RS/6000 J40 (Micro Channel, PowerPC)? A full boot with 8 processors took about 45 minutes. A quick boot was 15 minutes. :-)

                Comment


                • #28
                  Google almost certainly have more NVMe drives per server than they're letting on. 16 is probably just enough to cross the 60-second threshold. They're a hyperscaler; they pack things as densely as possible.

                  At the same time, this may well be a holdover from their previous generation of servers. It's just only now finally working its way from their prod-kernel into the mainline.
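
As a rough check of the numbers in this post, here is a minimal sketch. It assumes the 4.5 seconds per drive quoted elsewhere in this thread, and it assumes drives are shut down strictly one after another. The function names are made up for illustration.

```python
import math

PER_DRIVE_SECONDS = 4.5   # assumption: per-drive figure quoted in this thread
THRESHOLD_SECONDS = 60.0  # threshold mentioned in the post above


def serial_shutdown_seconds(drive_count, per_drive=PER_DRIVE_SECONDS):
    """Total wait if every drive is shut down one after another."""
    return drive_count * per_drive


def drives_to_exceed(threshold=THRESHOLD_SECONDS, per_drive=PER_DRIVE_SECONDS):
    """Smallest drive count whose serial shutdown time exceeds the threshold."""
    return math.floor(threshold / per_drive) + 1


print(serial_shutdown_seconds(16))  # 72.0
print(drives_to_exceed())           # 14
```

Under those assumptions, 14 drives would already push a serial shutdown past 60 seconds, and 16 drives would take about 72 seconds.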

                  Comment


                  • #29
                    Originally posted by milkylainen View Post
                    Umm. But WHY is it taking 4.5 seconds?
                    There is nothing that should take 4.5 seconds on an NVMe drive.
                    Even if it has a power hold-up cache, it would be flushed rather immediately.
                    Buggy drive or bugged specification?
                    AFAIK it can't be anything other than a write-back cache that causes the NVMe driver to wait. So the question you should ask yourself is how large a write-back cache Google must be using for it to take 4.5 seconds to flush on an NVMe drive...

                    Comment


                    • #30
                      I just realized that with this patch, if I buy a second NVMe, my PC will shut down 4.5 seconds faster. So it's not just about cloud servers.
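
To illustrate the idea, here is a small user-space simulation, not kernel code. It assumes the patch lets drives shut down in parallel instead of one after another, and it uses a scaled-down sleep as a stand-in for each drive's shutdown wait. The drive names and helper functions are hypothetical.

```python
import asyncio
import time


async def shut_down_drive(name, delay):
    # Stand-in for waiting on one drive to finish shutting down.
    await asyncio.sleep(delay)
    return name


async def shut_down_serial(names, delay):
    # One drive at a time: total wait grows with the number of drives.
    for name in names:
        await shut_down_drive(name, delay)


async def shut_down_concurrent(names, delay):
    # All drives at once: total wait is roughly one drive's delay.
    await asyncio.gather(*(shut_down_drive(name, delay) for name in names))


def timed(coro):
    """Run a coroutine to completion and return the elapsed wall time."""
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start


if __name__ == "__main__":
    drives = ["nvme0", "nvme1"]  # hypothetical two-drive desktop
    delay = 0.045                # 4.5 s scaled down by 100 for the demo
    print("serial:    ", round(timed(shut_down_serial(drives, delay)), 3))
    print("concurrent:", round(timed(shut_down_concurrent(drives, delay)), 3))
```

With two simulated drives, the serial run waits for roughly two delays and the concurrent run for roughly one, which is the saving the post describes.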

                      Comment
