Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives


  • #41
    Originally posted by juarezr

    It looks to me like an issue that is very complex and hard to get right.

    I'm also unsure that async alone is the right approach in the long term. Some questions about priority and concurrency remain:
    • What about a driver/task that is using data on an NVMe drive that is being shut down asynchronously? Will it fail? Will it force the shutdown process back to a serial one?
    • What is the correct order of shutdown on every Linux box?
      • Apps, users, networks, filesystems, then keyboard/user-input and for last display?
      • Apps, users, keyboard/user-input, filesystems, networks, display, and finally filesystems?
      • Or maybe it will depend if:
        • the system is a smaller Raspberry Pi or a larger weather forecast cluster node?
        • the architecture is ARM, x86, MIPS, or RISC-V?
        • the system is optimized to sleep/hibernate rather than shutdown/restart?
    Maybe at some time in the future, people will have a passionate discussion about async shutdown vs dependency-based shutdown, exactly as was the case with "systemd" vs "upstart".

    Commonly the order of deactivation is the inverse of the order of activation. But in a modern OS like Linux, that has become way too complex because many drivers, hardware, and dependencies can be plugged in at any time by users and processes.

    Also, the myriad of architectures and use cases that Linux supports, ranging from the smallest IoT and embedded devices to enormous servers and clusters, means that ordering and concurrency may need to be tackled in different ways, depending on the setup.

    I won't be surprised if people further propose a "micro-systemd" or a "mini-upstart" inside the kernel for handling concurrent boot and shutdown before the kernel delegates control to PID 1.
    Async and dependency-based are complementary AFAICT. You do have dependency chains in systemd, for example. But because things are async, when one service needs to wait for another service to go offline first, you can continue working on the services you can shut down now.
    Regarding "micro-systemd", I don't really see that happening. The kernel is more or less self-contained, which means you can handle that in simpler, less flexible ways internally, and can still rely on PID 1 to shut down the services themselves.
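
To illustrate the point that async and dependency-based shutdown can coexist, here is a minimal Python sketch. It is not how systemd or the kernel implements anything: the service names, the `STOP_AFTER` table, and the `shutdown_all` helper are all hypothetical, and `asyncio.sleep` stands in for real shutdown work. Each service waits only for the services that must go offline before it, and everything else proceeds concurrently.

```python
import asyncio

# Hypothetical dependency table: name -> services that must be stopped first.
STOP_AFTER = {
    "apps": set(),
    "network": {"apps"},
    "filesystems": {"apps"},
    "nvme0": {"filesystems"},
    "nvme1": {"filesystems"},
}


async def shutdown_all(stop_after, stop_time=0.01):
    """Stop every service concurrently while honouring the dependency table.

    Returns the order in which the services finished stopping.
    """
    done = {name: asyncio.Event() for name in stop_after}
    order = []

    async def stop(name):
        # Wait only for the services that must go offline before this one.
        for dep in stop_after[name]:
            await done[dep].wait()
        await asyncio.sleep(stop_time)  # stand-in for the real shutdown work
        order.append(name)
        done[name].set()

    # All stop tasks start at once; independent ones overlap in time.
    await asyncio.gather(*(stop(name) for name in stop_after))
    return order


if __name__ == "__main__":
    print(asyncio.run(shutdown_all(STOP_AFTER)))
```

In this sketch "network" and "filesystems" stop in parallel once "apps" is down, and the two hypothetical NVMe entries stop in parallel once "filesystems" is down, so no step waits on anything it does not depend on.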


    • #42
      Originally posted by logical

      How about an IBM RS/6000 J40 (Micro Channel, PowerPC)? A full boot with 8 processors took about 45 minutes. A quick boot was 15 minutes. :-)
      Still have an RS/6000 320 and an RS/6000 380. Haven't booted them for a while now. Might give it a try around Easter to see if they still work. Those are about 30 years old now. Will sound like a starfighter taking off.


      • #43
        Originally posted by milkylainen
        Umm. But WHY is it taking 4.5 seconds?
        There is nothing that should take 4.5 seconds on an NVMe drive.
        Even if it has a power hold-up cache, it would be flushed rather immediately.
        Buggy drive or bugged specification?
        My guess is that initialization is done sequentially and you won't see the problem unless you have tons of NVMe drives on a single OS instance. But, I'm just guessing.

        There's a lot of that sort of stuff in the kernel, though maybe they've gotten around or avoided those things.
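
The guess quoted above, that the per-drive work runs one drive at a time, can be sketched with a small Python simulation. Only the 4.5 seconds figure comes from the thread; the drive count of 16, the `SCALE` factor, and every function name are assumptions made for illustration, and `asyncio.sleep` stands in for whatever the drive is actually doing.

```python
import asyncio
import time

PER_DRIVE_SECONDS = 4.5  # per-drive figure quoted in the thread
SCALE = 0.001            # assumption: simulate 1000x faster than real time


async def handle_drive(index):
    # Stand-in for the per-drive work; no real device is touched.
    await asyncio.sleep(PER_DRIVE_SECONDS * SCALE)


async def sequential(n):
    """Handle n drives one after another and return the elapsed time."""
    start = time.perf_counter()
    for i in range(n):
        await handle_drive(i)
    return time.perf_counter() - start


async def parallel(n):
    """Handle n drives concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(handle_drive(i) for i in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    drives = 16  # hypothetical drive count
    seq = asyncio.run(sequential(drives))
    par = asyncio.run(parallel(drives))
    print(f"sequential: {seq / SCALE:.1f} s (unscaled), parallel: {par / SCALE:.1f} s (unscaled)")
```

Under these assumptions the sequential total grows linearly with the number of drives (16 drives at 4.5 seconds each is 72 seconds), while the concurrent total stays close to the time of a single drive.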