KDE Almost Lost All Of Their Git Repositories


  • #11
    http://jefferai.org/2013/03/24/screw-the-mirrors/ another update (talks about backup strategies in place), should also have been linked from the original article.

    Oh, and reading those blog posts isn't enough, UNDERSTANDING them is important (I didn't).



    • #12
      Originally posted by ryao
      Unfortunately, proper backups would not have helped as much as one would think without a way to know whether or not the data was bad prior to doing them. It is not clear when the repositories became corrupted, although the blog post suggests that fsck was the trigger.
      Proper backups would have been a Plan B, in case all the mirroring that they were doing got corrupted. Yes, the backups might have been corrupted as well, for the last month. That still would have given them a proper backup from a month ago, and then they could have just updated the files from the latest tarballs and just lost the last month's git history.

      That would have been pretty bad - but not as bad as losing the entire git history.

      Anyway, the sysadmin whose blog was linked discussed zfs, and wants to use it.



      • #13
        Originally posted by smitty3268
        Proper backups would have been a Plan B, in case all the mirroring that they were doing got corrupted. Yes, the backups might have been corrupted as well, for the last month. That still would have given them a proper backup from a month ago, and then they could have just updated the files from the latest tarballs and just lost the last month's git history.

        That would have been pretty bad - but not as bad as losing the entire git history.

        Anyway, the sysadmin whose blog was linked discussed zfs, and wants to use it.
        What if the corruption occurred 2 months ago? Honestly, there is usually no generic way to tell if a traditional backup is sane. If you do have a way to tell that a backup is sane and you need to go back months to fix things, then you really are losing quite a bit. Just like ZFS is not a substitute for backups, backups are not a substitute for early detection. With that said, I question why anyone who has an integrity check that could detect bad backups would not run it at the time of the backup.

        Also, I just saw that Jeff Mitchell plans to deploy ZFS to prevent a recurrence of this. That is cool.
        Last edited by ryao; 03-25-2013, 03:17 PM.



        • #14
          Originally posted by ryao
          What if the corruption occurred 2 months ago?
          Which is why your backups should go back farther. Use daily backups for a couple of weeks, then weekly beyond that, then monthly, and so on. It doesn't take that much backup space to keep yearly backups going back as far as when the project started. You just have to throw away the intervening backups at sane intervals.
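
A rough sketch of the tiered rotation described above, in Python. The function name `backups_to_keep` and all of the thresholds (14 days of dailies, weeklies out to 70 days, monthlies out to a year, yearlies after that) are assumptions made up for illustration; nothing in the thread or in KDE's setup specifies them.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today,
                    daily_days=14, weekly_days=70, monthly_days=365):
    """Pick which backup dates to retain under a tiered schedule.

    All thresholds are illustrative assumptions, not values from the thread.
    """
    kept = {}
    for d in sorted(backup_dates):            # oldest first
        age = (today - d).days
        if age < 0:
            continue                           # ignore dates in the future
        if age <= daily_days:
            bucket = ("day", d.toordinal())    # keep every recent daily backup
        elif age <= weekly_days:
            iso = d.isocalendar()
            bucket = ("week", iso[0], iso[1])  # one per ISO week
        elif age <= monthly_days:
            bucket = ("month", d.year, d.month)
        else:
            bucket = ("year", d.year)          # one per calendar year
        kept.setdefault(bucket, d)             # oldest backup in each bucket wins
    return sorted(kept.values())
```

Anything not returned by the function is a candidate for deletion. With one backup per day, the retained set stays small even after years, which is the point about backup space made above.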

          Honestly, there is usually no generic way to tell if a traditional backup is sane. If you do have a way to tell that a backup is sane and you need to go back months to fix things, then you really are losing quite a bit.
          Indeed. But less than if you don't have any backups at all. It is simply a mitigation strategy, to be used if all else has failed.

          Just like ZFS is not a substitute for backups, backups are not a substitute for early detection. With that said, I question why anyone who has an integrity check that could detect bad backups would not run it at the time of the backup.
          Yes. Integrity checks can potentially take too long to run at every backup point (depending on the data and software/hardware available), but you should at least run them occasionally. Once a week on those backups, if you can't afford to do it more often.
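
One generic way to do an occasional check is a checksum manifest: record a hash of every file when the backup is written, then re-hash later and compare. The sketch below is only that, with hypothetical helper names (`sha256_of`, `build_manifest`, `verify_manifest`). It can tell you that a backup has changed since it was taken; it cannot tell you whether the data was already bad when it was backed up, which is the early-detection problem raised earlier in the thread. For git repositories specifically, a repository-level check such as `git fsck` on the source would be needed for that.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(backup_dir):
    """Map each file's path (relative to backup_dir) to its SHA-256 digest."""
    root = Path(backup_dir)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(backup_dir)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return sorted(bad)
```

The manifest would be built once at backup time and stored separately from the backup itself; the weekly check is then just a call to `verify_manifest` and an alert if the returned list is not empty.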

          I guess the issue here is that they thought they were running integrity checks but it turned out that wasn't happening.
          Last edited by smitty3268; 03-25-2013, 03:31 PM.



          • #15
            Originally posted by ryao
            Unfortunately, proper backups would not have helped as much as one would think without a way to know whether or not the data was bad prior to doing them. It is not clear when the repositories became corrupted, although the blog post suggests that fsck was the trigger.
            While true, it's still better than the current setup. Even if they'd been getting corruption for a few months before it was noticed, a good backup setup would still let them go back that far, and they can work out how to reconstruct subsequent activity on top of the restored repository. And that's better than the current setup, where they survived only out of pure luck, apparently in the belief that having redundancy from the mirror system is good enough. Wrong!



            • #16
              Originally posted by Delgarde
              While true, it's still better than the current setup. Even if they'd been getting corruption for a few months before it was noticed, a good backup setup would still let them go back that far, and they can work out how to reconstruct subsequent activity on top of the restored repository. And that's better than the current setup, where they survived only out of pure luck, apparently in the belief that having redundancy from the mirror system is good enough. Wrong!
              From what I have read, they actually do regular tarball backups of the repository. Still, losing days of commits is not fun. They were extremely lucky that one server had a glitch that kept it from updating from master during the window that it was serving corrupted updates. That being said, the entire incident would have been avoided had they put master on ZFS.



              • #17
                Originally posted by Pawlerson
                This is bunch of some morons bullshit. Furthermore, Linux is the most stable and reliable OS (that matters, I don't care about some casio watch "operating systems"). Winblows just blows up.
                Solaris is more reliable. This would not have happened had the master mirror been running a recent installation of Solaris.



                • #18
                  The Punishment, Hodja Nasreddin

                  Hodja told his son to go get some water from the well.
                  Before the son left, Hodja slapped him and shouted, ''And make sure you don’t break the jug!''

                  The boy began crying, and a bystander noticed this and said,
                  ''Why did you hit him? He hasn’t done anything wrong.''

                  Hodja replied, ''Well, better to hit him now
                  than to hit him afterwards if he does end up breaking it. That would be too late.''




                  I would ask Hodja Nasreddin to slap birdie and ryao right now.

                  Twice.

                  One time - for not acting before.
                  Second time - for posting "wisdom from the manhole" when it's too late.
                  Last edited by brosis; 03-25-2013, 06:23 PM.



                  • #19
                    Originally posted by Pawlerson
                    You've got to be kidding me. It's a dead cow. Tell me why nearly nobody is using it? Btw. what are you doing for Gentoo? Last time you were trolling for bsd and now you're trolling for slowlaris.
                    I am not sure how you determined Solaris' market share. Anyway, the open source version of Solaris is fairly popular at various data centers. From a reliability standpoint, its kernel is by far better written than the others that I have seen.

                    As for Gentoo, I tend to be all over the place, although the majority of the things that I do involve the kernel in some way. Most of the time that I spend on kernel work goes to ZFSOnLinux, although I touch other areas too. Yesterday, I spent some time with another Gentoo developer on Nouveau reclocking support. Specifically, we are now able to reclock both desktop and laptop versions of the NV92. The patch is not ready for upstream yet, though.



                    • #20
                      Originally posted by Pawlerson
                      Simply, by netcraft, for example. It's kernel is real mess and it's bloated as hell. As I have shown it's not reliable at all. So called open source version of solaris is at even worse state than solaris itself, so stop that bullshit right now. Nobody serious is using "open source" slowlaris and there are few (who didn't switch to Linux yet) that are using Oracle's Solaris.
                      Netcraft surveys webservers. It would not catch the servers powering the Joyent cloud or any of the data analytics servers out there. It also would not catch things like Netflix's CDN, which runs FreeBSD.

                      Originally posted by Pawlerson
                      You've got to be kidding me. It's a dead cow. Tell me why nearly nobody is using it? Btw. what are you doing for Gentoo? Last time you were trolling for bsd and now you're trolling for slowlaris. Get the facts till you write another bullshit next time:

                      http://unixetc.co.uk/2012/01/22/zfs-...nlinked-files/
                      https://forums.oracle.com/forums/thr...art=0&tstart=0 unreliable slowlaris and zfs (somebody should tell KDE devs to not use it).
                      It looks like you amended your post. Honestly, the latter link does not show any problem whatsoever. The former does talk about a problem, but it does not involve data loss and in all likelihood, it has been fixed.

                      If you think that a few random posts on the internet reflect the quality of an operating system, then you really should not use Linux. Google can provide you with numerous posts complaining about Linux-based operating systems. In my Gentoo work, I regularly find problems in code common to all GNU/Linux operating systems.
                      Last edited by ryao; 03-25-2013, 06:43 PM.
