FreeDesktop.org GitLab Down Due To Drive Failures

  • #31
    Originally posted by partcyborg

    In this bug the user overrode sane defaults and wound up setting an odd memory limit of 1MB.... How is this ceph's fault? Also, if you actually read through to the bottom the user got their cluster working again. Where is this expensive data recovery necessary?
    Well, that's why I said "numerous duplicates". Another one when a bad OSD eats all the RAM: https://lists.ceph.io/hyperkitty/lis...MNDVI5ORKWGVY/ https://tracker.ceph.com/issues/53729, and in that case it was not due to bad ceph.conf settings.

    Anyway I am glad that the service has been restored without the need to hire a data recovery specialist.

    Regarding the money, well, because of customer protection, I cannot fully disclose where I have seen the demand for this amount (all I know is that the customer didn't pay that amount to the company that asked, we looked but couldn't fix the cluster, and that 42on eventually fixed the customer's Ceph cluster for less). But the real problem is that various Ceph courses cover only recovery after storage failures or server failures, not recovery after Ceph bugs.

    Full disclosure: I worked for Croit GmbH in the past.



    • #32
      Originally posted by sinepgib

      I heard a different one that goes about "Unix systems are really quick to boot, which is great because you'll be doing that often".
      I have heard that about Windows... except it can't be true, since Windows systems are so slow to boot that you can't possibly do that often simply due to time restrictions. Besides, Windows sometimes doesn't finish booting either :P

      http://www.dirtcellar.net



      • #33
        Originally posted by waxhead
        I have heard that about Windows... except it can't be true, since Windows systems are so slow to boot that you can't possibly do that often simply due to time restrictions. Besides, Windows sometimes doesn't finish booting either :P
        Well, the "don't finish booting" has happened to me on Linux as well. Not as often, mostly back when I used Ubuntu with a few PPAs and distro upgrades would break it.
        Note, however, that the phrase I alluded to was coined back when Unix was a new thing: Lisp machine users would gloat about how their systems didn't need reboots even for changes to the OS. Windows set a different bar for how often you need to reboot lol



        • #34
          Originally posted by Paradigm Shifter
          They should, you're right. Sadly, that has not been my experience. The SSDs I've had die on me (both personally and at work) have just gone. There one second and gone the next. Not a specific manufacturer, either; Crucial, Sandisk, OCZ, a couple of never-heard-of-them-before super-cheap 128GB drives
          Yet another vote here for "SSDs go read-only before failing" being a myth. My experience is the same as yours: multiple manufacturers, models, etc., and every one of them that died did so completely and without warning.

          In fact, I've actually had *more* SSDs die on me than HDDs, *already*, despite using far fewer of them and over a much shorter period than the decades of HDDs. I'm completely in agreement that SSDs are not to be trusted, at all, and if you don't have backups of anything important on them you're almost infinitely more likely to lose it than you are with an HDD.

