KDE Almost Lost All Of Their Git Repositories


  • #11
    http://jefferai.org/2013/03/24/screw-the-mirrors/ is another update (it covers the backup strategies in place) and should also have been linked from the original article.

    Oh, and reading those blog posts isn't enough; UNDERSTANDING them is important (I didn't).



    • #12
      Originally posted by ryao
      Unfortunately, proper backups would not have helped as much as one would think without a way to know whether or not the data was bad prior to doing them. It is not clear when the repositories became corrupted, although the blog post suggests that fsck was the trigger.
      Proper backups would have been a Plan B, in case all the mirroring that they were doing got corrupted. Yes, the backups might have been corrupted as well, for the last month. That still would have given them a proper backup from a month ago, and then they could have just updated the files from the latest tarballs and just lost the last month's git history.

      That would have been pretty bad - but not as bad as losing the entire git history.

      Anyway, the sysadmin whose blog was linked discussed ZFS, and wants to use it.



      • #13
        Originally posted by smitty3268
        Proper backups would have been a Plan B, in case all the mirroring that they were doing got corrupted. Yes, the backups might have been corrupted as well, for the last month. That still would have given them a proper backup from a month ago, and then they could have just updated the files from the latest tarballs and just lost the last month's git history.

        That would have been pretty bad - but not as bad as losing the entire git history.

        Anyway, the sysadmin whose blog was linked discussed ZFS, and wants to use it.
        What if the corruption occurred 2 months ago? Honestly, there is usually no generic way to tell if a traditional backup is sane. If you do have a way to tell that a backup is sane and you need to go back months to fix things, then you really are losing quite a bit. Just like ZFS is not a substitute for backups, backups are not a substitute for early detection. With that said, I question why anyone who has an integrity check that could detect bad backups would not run it at the time of the backup.

        Also, I just saw that Jeff Mitchell plans to deploy ZFS to prevent a recurrence of this. That is cool.
        Last edited by ryao; 03-25-2013, 03:17 PM.



        • #14
          Originally posted by ryao
          What if the corruption occurred 2 months ago?
          Which is why your backups should go back further. Use daily backups for a couple of weeks, then weekly beyond that, then monthly, and so on. It doesn't take that much backup space to keep yearly backups going as far back as when the project started. You just have to throw away the intervening backups at sane intervals.
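
          A rough sketch of that kind of rotation in Python, in case it helps. The function name and every threshold here (14 days of dailies, 8 weeks of weeklies, a year of monthlies) are made up for illustration; nothing in this thread specifies them.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily_days=14, weekly_days=56, monthly_days=365):
    """Pick which backups survive a tiered daily/weekly/monthly/yearly rotation.

    All thresholds are illustrative assumptions, not values from the thread.
    """
    keep = set()
    seen_weeks, seen_months, seen_years = set(), set(), set()
    # Walk oldest-first so the earliest backup in each bucket is the one kept.
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 0:
            continue  # ignore dates in the future
        if age <= daily_days:
            keep.add(d)  # recent: keep every daily backup
            continue
        if age <= daily_days + weekly_days:
            seen, key = seen_weeks, tuple(d.isocalendar())[:2]  # (ISO year, ISO week)
        elif age <= daily_days + weekly_days + monthly_days:
            seen, key = seen_months, (d.year, d.month)
        else:
            seen, key = seen_years, d.year
        if key not in seen:
            seen.add(key)
            keep.add(d)
    return keep


# Example: 1000 consecutive daily backups shrink to a few dozen.
today = date(2013, 3, 25)
all_backups = [today - timedelta(days=i) for i in range(1000)]
kept = backups_to_keep(all_backups, today)
print(len(all_backups), "backups ->", len(kept), "kept")
```

          Anything not in the returned set is a candidate for deletion. A real setup would probably lean on an existing backup tool's retention settings rather than hand-rolled code like this.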

          Honestly, there is usually no generic way to tell if a traditional backup is sane. If you do have a way to tell that a backup is sane and you need to go back months to fix things, then you really are losing quite a bit.
          Indeed. But less than if you don't have any backups at all. It is simply a mitigation strategy, to be used if all else has failed.

          Just like ZFS is not a substitute for backups, backups are not a substitute for early detection. With that said, I question why anyone who has an integrity check that could detect bad backups would not run it at the time of the backup.
          Yes. Integrity checks can potentially take too long to run at every backup point (depending on the data and software/hardware available), but you should at least run them occasionally. Once a week on those backups, if you can't afford to do it more often.

          I guess the issue here is that they thought they were running integrity checks but it turned out that wasn't happening.
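
          To make the "check at backup time" idea concrete, here is a minimal sketch in Python. It assumes a bare git repository on local disk and uses `git fsck --full` as the integrity check; the function names and the tarball layout are hypothetical, and this is not a description of what the KDE sysadmins actually run.

```python
import subprocess
import tarfile
from pathlib import Path


def git_fsck_ok(repo: Path) -> bool:
    """Return True if `git fsck --full` exits cleanly for the repository."""
    result = subprocess.run(
        ["git", "-C", str(repo), "fsck", "--full"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def backup_if_sane(repo: Path, dest: Path, check=git_fsck_ok) -> bool:
    """Write a gzipped tarball of `repo` to `dest` only if the check passes.

    `check` is injectable so that a cheaper or different test can be swapped in.
    Returns True if a backup was written, False if the check failed.
    """
    if not check(repo):
        return False  # refuse to archive data that already looks corrupt
    with tarfile.open(str(dest), "w:gz") as tar:
        tar.add(str(repo), arcname=repo.name)
    return True
```

          If a full fsck is too slow to run on every backup, the same function could be called with a lighter `check` on most runs and the full one weekly, which is roughly the trade-off described above.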
          Last edited by smitty3268; 03-25-2013, 03:31 PM.



          • #15
            Originally posted by ryao
            Unfortunately, proper backups would not have helped as much as one would think without a way to know whether or not the data was bad prior to doing them. It is not clear when the repositories became corrupted, although the blog post suggests that fsck was the trigger.
            While true, it's still better than the current setup. Even if they'd been getting corruption for a few months before it was noticed, a good backup setup would still let them go back that far, and they can work out how to reconstruct subsequent activity on top of the restored repository. And that's better than the current setup, where they survived only out of pure luck, apparently in the belief that having redundancy from the mirror system is good enough. Wrong!



            • #16
              Originally posted by Delgarde
              While true, it's still better than the current setup. Even if they'd been getting corruption for a few months before it was noticed, a good backup setup would still let them go back that far, and they can work out how to reconstruct subsequent activity on top of the restored repository. And that's better than the current setup, where they survived only out of pure luck, apparently in the belief that having redundancy from the mirror system is good enough. Wrong!
              From what I have read, they actually do regular tarball backups of the repository. Still, losing days of commits is not fun. They were extremely lucky that one server had a glitch that kept it from updating from master during the window in which master was serving corrupted updates. That being said, the entire incident would have been avoided had they put master on ZFS.



              • #17
                Originally posted by birdie
                Yeah, Linux and Open Source are shaky.

                In fact I had an ext4 corruption on a partition I mount RO daily and remount RW maybe once a week to write a file or two.
                This is a bunch of moronic bullshit. Furthermore, Linux is the most stable and reliable OS (that matters; I don't care about some Casio watch "operating systems"). Winblows just blows up. Linux will have btrfs and Linux can use ZFS as well, while you can only dream about them on Windows.
                Last edited by Pawlerson; 03-25-2013, 06:10 PM.



                • #18
                  Originally posted by Pawlerson
                  This is a bunch of moronic bullshit. Furthermore, Linux is the most stable and reliable OS (that matters; I don't care about some Casio watch "operating systems"). Winblows just blows up.
                  Solaris is more reliable. This would not have happened had the master mirror been running a recent installation of Solaris.



                  • #19
                    Originally posted by ryao
                    Solaris is more reliable. This would not have happened had the master mirror been running a recent installation of Solaris.
                    You've got to be kidding me. It's a dead cow. Tell me, why is almost nobody using it? Btw, what are you doing for Gentoo? Last time you were trolling for BSD and now you're trolling for slowlaris. Get the facts before you write more bullshit next time:

                    http://unixetc.co.uk/2012/01/22/zfs-...nlinked-files/
                    https://forums.oracle.com/forums/thr...art=0&tstart=0 unreliable slowlaris and zfs (somebody should tell KDE devs to not use it).
                    Last edited by Pawlerson; 03-25-2013, 06:27 PM.



                    • #20
                      The Punishment, Hodja Nasreddin

                      Hodja told his son to go get some water from the well.
                      Before the son left, Hodja slapped him and shouted, "And make sure you don't break the jug!"

                      The boy began crying, and a bystander noticed this and said,
                      "Why did you hit him? He hasn't done anything wrong."

                      Hodja replied, "Well, better to hit him now
                      than to hit him afterwards if he does end up breaking it. That would be too late."




                      I would ask Hodja Nasreddin to slap birdie and ryao right now.

                      Twice.

                      One time - for not acting before.
                      Second time - for posting "wisdom from the manhole" when it's too late.
                      Last edited by brosis; 03-25-2013, 06:23 PM.

