FreeDesktop.org GitLab Down Due To Drive Failures

patrakov

Phoronix Member

Join Date: Mar 2015

Posts: 98
#31

14 June 2022, 01:42 AM

Originally posted by partcyborg

In this bug the user overrode sane defaults and wound up setting an odd memory limit of 1MB.... How is this ceph's fault? Also, if you actually read through to the bottom the user got their cluster working again. Where is this expensive data recovery necessary?

Well, that's why I said "numerous duplicates". Here is another one, where a bad OSD eats all the RAM: https://lists.ceph.io/hyperkitty/lis...MNDVI5ORKWGVY/ https://tracker.ceph.com/issues/53729, and in that case it was not due to bad ceph.conf settings.

Anyway I am glad that the service has been restored without the need to hire a data recovery specialist.

Regarding the money, well, because of customer protection, I cannot fully disclose where I saw this amount demanded (all I know is that the customer didn't pay that amount to the company that asked for it, that we looked but couldn't fix the cluster, and that 42on eventually fixed the customer's Ceph cluster for less). But the real problem is that the various Ceph courses only cover recovering after storage or server failures, not recovering after Ceph bugs.

Full disclosure: I worked for Croit GmbH in the past.
waxhead

Premium For Life

Join Date: Jul 2014

Posts: 1003
#32

15 June 2022, 05:37 PM

Originally posted by sinepgib

I heard a different one that goes about "Unix systems are really quick to boot, which is great because you'll be doing that often".

I have heard that about Windows... except it can't be true, since Windows systems are so slow to boot that you can't possibly do that often simply due to time restrictions. Besides, Windows sometimes doesn't finish booting either :P

http://www.dirtcellar.net
Likes 1
sinepgib

Senior Member

Join Date: Aug 2021

Posts: 1093
#33

15 June 2022, 05:54 PM

Originally posted by waxhead

I have heard that about Windows... except it can't be true, since Windows systems are so slow to boot that you can't possibly do that often simply due to time restrictions. Besides, Windows sometimes doesn't finish booting either :P

Well, the "don't finish booting" has happened to me on Linux as well. Not as often, mostly back when I used Ubuntu with a few PPAs and distro upgrades would break it.
Note, however, that the phrase I alluded to was coined back when Unix was a new thing; Lisp machine users would gloat about how their systems didn't need reboots even for changes to the OS. Windows set a different bar for how often you need to reboot lol
arQon

Senior Member

Join Date: Sep 2019

Posts: 940
#34

16 June 2022, 05:58 PM

Originally posted by Paradigm Shifter

They should, you're right. Sadly, that has not been my experience. The SSDs I've had die on me (both personally and at work) have just gone. There one second and gone the next. Not a specific manufacturer, either; Crucial, Sandisk, OCZ, a couple of never-heard-of-them-before super-cheap 128GB drives

Yet another vote here for "SSDs go read-only before failing" being a myth. My experience is the same as yours: multiple manufacturers, models, etc., and every one of them that died did so completely and without warning.

In fact, I've actually had *more* SSDs die on me than HDDs, *already*, despite using far fewer of them and over a much shorter period than the decades of HDDs. I'm completely in agreement that SSDs are not to be trusted, at all, and if you don't have backups of anything important on them you're almost infinitely more likely to lose it than you are with an HDD.