Google Has A Problem With Linux Server Reboots Too Slow Due To Too Many NVMe Drives


  • CommunityMember
    replied
    Originally posted by dekernel
    Guess I would be interested in knowing just what OS you are using
    In the hyperscale cloud space, providers reboot servers (including bare-metal servers) all the time, and every minute lost can mean substantial lost revenue at that scale (along with annoyed customers who want their systems available now, now, now).


  • CommunityMember
    replied
    Originally posted by F.Ultra

    AFAIK it can't be anything other than a write-back cache that causes the NVMe driver to wait, so the question you should ask yourself is what crazy-sized write-back caches Google uses that take 4.5 seconds to flush on an NVMe drive...
    Hyperscalers have hyperscaler issues (such as servers with around a terabyte of memory). While my desktop is not going to have a terabyte of memory any time soon, the problems HPC and hyperscalers face today will show up on workstations in a decade or less, so improvements made today are beneficial.


  • onlyLinuxLuvUBack
    replied
    echostream packs 108 NVMe drives in a 2U server...
    Storage server supports a single-socket AMD EPYC CPU; up to 2TB DDR4. 2U with 108x hot-swappable Intel® EDSFF (E1.L) NVMe SSD bays. Supports 2x PCIe3 x16 slots and 1x PCIe3 x16 OCP 2.0 card. Capable of sustaining up to 300Gbps of bandwidth and 7.8M IOPS.


  • cl333r
    replied
    I just realized that with this patch, if I buy a second NVMe drive, my PC will shut down 4.5 seconds faster. So it's not just about cloud servers.


  • F.Ultra
    replied
    Originally posted by milkylainen
    Umm. But WHY is it taking 4.5 seconds?
    There is nothing that should take 4.5 seconds on an NVMe drive.
    Even if it has a power hold-up cache, it would be flushed rather immediately.
    Buggy drive or bugged specification?
    AFAIK it can't be anything other than a write-back cache that causes the NVMe driver to wait, so the question you should ask yourself is what crazy-sized write-back caches Google uses that take 4.5 seconds to flush on an NVMe drive...


  • Developer12
    replied
    Google almost certainly have more NVMe drives per server than they're letting on. 16 is probably just enough to cross the 60-second threshold. They're a hyperscaler; they pack things as densely as possible.

    At the same time, this may well be a holdover from their previous generation of servers. It's only now finally working its way from their prod-kernel into the mainline.
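
For anyone who wants to check the arithmetic behind the drive-count guess, here is a rough sketch. It assumes each drive takes the 4.5 seconds quoted elsewhere in this thread and that drives are shut down strictly one after another; the function names are made up for illustration and are not taken from any kernel code or from the patch itself.

```python
import math

# Assumption: per-drive shutdown delay, taken from the figure quoted in this thread.
PER_DRIVE_SECONDS = 4.5
# Assumption: the 60-second threshold mentioned in the post above.
THRESHOLD_SECONDS = 60.0


def serial_shutdown_seconds(drive_count, per_drive=PER_DRIVE_SECONDS):
    """Total delay if every drive is shut down one after another."""
    return drive_count * per_drive


def parallel_shutdown_seconds(drive_count, per_drive=PER_DRIVE_SECONDS):
    """Idealized delay if all drives are shut down at the same time."""
    return per_drive if drive_count > 0 else 0.0


def drives_to_exceed(threshold=THRESHOLD_SECONDS, per_drive=PER_DRIVE_SECONDS):
    """Smallest drive count whose serial shutdown time is above the threshold."""
    return math.floor(threshold / per_drive) + 1


if __name__ == "__main__":
    for count in (1, 2, 16, 108):
        print(count, serial_shutdown_seconds(count), parallel_shutdown_seconds(count))
    print("drives needed to exceed threshold:", drives_to_exceed())
```

Under these assumptions, 16 drives come to 72 seconds when handled serially, and the 60-second mark is already passed at 14 drives. The parallel figure is a best case that ignores any shared bottleneck.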


  • logical
    replied
    Originally posted by DRanged
    Ah remember the old days IBM p43 servers booting just within 7 minutes. Those fond memories.
    How about an IBM RS/6000 J40 (microchannel, PowerPC)? A full boot with 8 processors took about 45 minutes. A quick boot was 15 minutes. :-)


  • Vistaus
    replied
    Originally posted by caligula
    Wow, so many comments and nobody is wondering why a few seconds matter. After all, you'll reboot a desktop only once per day, and servers have uptimes of months. So it shouldn't matter even if it takes a few hours to boot.
    I reboot my desktop only after a kernel update or if otherwise deemed necessary.


  • dekernel
    replied
    Originally posted by bug77

    Any rolling release will push updates requiring a reboot on an almost daily basis.
    Good point. I guess I just don't equate "servers" and "rolling releases" in the same sentence for just such reasons.


  • bug77
    replied
    Originally posted by dekernel

    Guess I would be interested in knowing just what OS you are using that requires daily updates because even Windows 2003 could go a full week before required reboots from the Windows Updates.
    Any rolling release will push updates requiring a reboot on an almost daily basis.
