KDE Almost Lost All Of Their Git Repositories

  • KDE Almost Lost All Of Their Git Repositories

    Phoronix: KDE Almost Lost All Of Their Git Repositories

    There was almost "The Great KDE Disaster Of 2013" when the KDE project almost lost all of their 1,500+ Git repositories...

    http://www.phoronix.com/vr.php?view=MTMzNTc

  • #2
    Poor funkSTAR

    Originally posted by phoronix View Post
    Phoronix: KDE Almost Lost All Of Their Git Repositories

    There was almost "The Great KDE Disaster Of 2013" when the KDE project almost lost all of their 1,500+ Git repositories...

    http://www.phoronix.com/vr.php?view=MTMzNTc
    funkSTAR was THIS CLOSE to getting everything he ever wanted!

    Actually, while it would have been a huge PITA, there are enough out-of-repository copies of the source floating around so that I'm sure they could have rebuilt the repos to at least the last-released state of the code base. Still not good, but not lost forever either.

    • #3
      Yeah, Linux and Open Source are shaky.

      In fact, I had ext4 corruption on a partition that I mount RO daily and remount RW maybe once a week to write a file or two.

      • #4
        Go suck horse cocks with your mother, birdie!

        DO NOT FEED THE TROLL PLEASE

        • #5
          This is the silent corruption that ZFS was created to prevent. Unfortunately, KDE's servers were not using it.
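
          For anyone wondering what "silent corruption" actually looks like, here is a minimal Python sketch - not ZFS, just an illustration of the idea behind it - showing how recording a checksum (the kind of thing ZFS does per block) turns a quietly flipped byte into a loud, detectable error:

              import hashlib
              from pathlib import Path

              def sha256(path: Path) -> str:
                  """Return the SHA-256 hex digest of a file's contents."""
                  return hashlib.sha256(path.read_bytes()).hexdigest()

              # Write some data and record its checksum, as a checksumming filesystem would.
              obj = Path("object.bin")
              obj.write_bytes(b"packfile data" * 1024)
              recorded = sha256(obj)

              # Simulate bit rot: flip one byte behind the application's back.
              damaged = bytearray(obj.read_bytes())
              damaged[100] ^= 0xFF
              obj.write_bytes(bytes(damaged))

              # Without a checksum the damaged bytes get served up silently;
              # with one, the corruption is caught on the very next read.
              if sha256(obj) != recorded:
                  print("checksum mismatch: silent corruption detected")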

          • #6
            Originally posted by ryao View Post
            This is the silent corruption that ZFS was created to prevent. Unfortunately, KDE's servers were not using it.
            The real problem the KDE sysadmins overlooked is that mirroring is not backing up. ZFS wouldn't have saved them if their HDDs had totally failed - ZFS is good at spotting hardware errors, that's true - but ZFS is not a magic bullet.
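
            To make that distinction concrete, here is a rough sketch (the paths and file names are made up for illustration): a mirror faithfully copies whatever the master currently holds, corruption included, while dated backups keep older, still-good copies around:

                import shutil
                from datetime import date
                from pathlib import Path

                master, mirror, backups = Path("master"), Path("mirror"), Path("backups")
                for d in (master, mirror, backups):
                    d.mkdir(exist_ok=True)
                (master / "repo.tar").write_bytes(b"good data")

                def sync_mirror() -> None:
                    # A mirror replicates the *current* state of the master -
                    # if the master is corrupt, the corruption is replicated too.
                    shutil.copy2(master / "repo.tar", mirror / "repo.tar")

                def take_backup() -> None:
                    # A backup is a dated copy that never gets overwritten, so an
                    # old, known-good version survives later corruption on the master.
                    shutil.copy2(master / "repo.tar", backups / f"repo-{date.today():%Y%m%d}.tar")

                sync_mirror()
                take_backup()
                (master / "repo.tar").write_bytes(b"garbage")   # silent corruption strikes
                sync_mirror()                                   # ...and the mirror follows along
                # The dated copy under backups/ still holds "good data".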

            • #7
              Originally posted by birdie View Post
              The real problem the KDE sysadmins overlooked is that mirroring is not backing up. ZFS wouldn't have saved them if their HDDs had totally failed - ZFS is good at spotting hardware errors, that's true - but ZFS is not a magic bullet.
              It is possible that the backups themselves would have been corrupted, had the administrators been doing proper backups and had these issues predated the fsck. In this situation, silent corruption caused a problem precisely because there was no way to recover once its effects had been felt.

              ZFS is not a replacement for proper backups, but it makes doing them easier. End-to-end checksumming and self-healing would have prevented this mess before it happened had ZFS been deployed.
              Last edited by ryao; 03-25-2013, 02:44 PM.
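
              To illustrate what "self-healing" means here (a toy Python sketch, not how ZFS is actually implemented): keep redundant copies plus a known-good checksum, and whenever a read fails verification, repair the bad copy from one that still checks out:

                  import hashlib
                  from pathlib import Path

                  def digest(data: bytes) -> str:
                      return hashlib.sha256(data).hexdigest()

                  def self_healing_read(copies, expected: str) -> bytes:
                      """Return verified data, rewriting any copy that fails its checksum."""
                      good = None
                      for copy in copies:
                          data = copy.read_bytes()
                          if digest(data) == expected:
                              good = data
                              break
                      if good is None:
                          raise IOError("all copies are corrupt - time to reach for real backups")
                      for copy in copies:                      # heal the replicas that went bad
                          if digest(copy.read_bytes()) != expected:
                              copy.write_bytes(good)
                      return good

                  # Two "mirrored" copies of the same data, one of which quietly rots.
                  a, b = Path("copy_a.bin"), Path("copy_b.bin")
                  a.write_bytes(b"repo data")
                  b.write_bytes(b"repo data")
                  checksum = digest(b"repo data")
                  a.write_bytes(b"repo dXta")                  # silent corruption on one side
                  assert self_healing_read([a, b], checksum) == b"repo data"
                  assert a.read_bytes() == b"repo data"        # the bad copy was repaired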

              • #8
                Originally posted by phoronix View Post
                Phoronix: KDE Almost Lost All Of Their Git Repositories

                There was almost "The Great KDE Disaster Of 2013" when the KDE project almost lost all of their 1,500+ Git repositories...

                http://www.phoronix.com/vr.php?view=MTMzNTc
                DO NOT COMMENT ON THIS THREAD UNLESS YOU READ BOTH THE BLOG POST AND THE FOLLOWUP. A lot of questions that Michael skimmed over or didn't mention get answered in both.

                • #9
                  Originally posted by Ericg View Post
                  DO NOT COMMENT ON THIS THREAD UNLESS YOU READ BOTH THE BLOG POST AND THE FOLLOWUP. A lot of questions that Michael skimmed over or didn't mention get answered in both.
                  I've read both - they didn't have proper backups. Period.

                  • #10
                    Originally posted by birdie View Post
                    I've read both - they didn't have proper backups. Period.
                    Unfortunately, proper backups would not have helped as much as one would think without a way to know whether or not the data was bad prior to doing them. It is not clear when the repositories became corrupted, although the blog post suggests that fsck was the trigger.
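
                    For the record, git itself gives you a cheap way to know whether the data is bad before backing it up. A rough sketch (the /srv/git path is just an assumed layout) that runs git fsck --full over every bare repository and refuses to take a backup if any of them fail:

                        import subprocess
                        from pathlib import Path

                        GIT_ROOT = Path("/srv/git")   # hypothetical location of the bare repositories

                        def repo_is_healthy(repo: Path) -> bool:
                            """Run git's own object-level integrity check on a bare repository."""
                            result = subprocess.run(
                                ["git", "--git-dir", str(repo), "fsck", "--full"],
                                capture_output=True, text=True,
                            )
                            return result.returncode == 0

                        bad = [r.name for r in sorted(GIT_ROOT.glob("*.git")) if not repo_is_healthy(r)]
                        if bad:
                            # Better to skip a run than to overwrite a good backup with corrupt data.
                            raise SystemExit("refusing to back up, corrupt repos: " + ", ".join(bad))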

                    • #11
                      http://jefferai.org/2013/03/24/screw-the-mirrors/ is another update (it talks about the backup strategies that were in place) and should also have been linked from the original article.

                      Oh, and reading those blog posts isn't enough; UNDERSTANDING them is what matters (I didn't).

                      • #12
                        Originally posted by ryao View Post
                        Unfortunately, proper backups would not have helped as much as one would think without a way to know whether or not the data was bad prior to doing them. It is not clear when the repositories became corrupted, although the blog post suggests that fsck was the trigger.
                        Proper backups would have been a Plan B, in case all the mirroring that they were doing got corrupted. Yes, the backups might have been corrupted as well, for the last month. That still would have given them a proper backup from a month ago, and then they could have just updated the files from the latest tarballs and just lost the last month's git history.

                        That would have been pretty bad - but not as bad as losing the entire git history.

                        Anyway, the sysadmin whose blog was linked discussed ZFS and wants to use it.

                        • #13
                          Originally posted by smitty3268 View Post
                          Proper backups would have been a Plan B, in case all the mirroring that they were doing got corrupted. Yes, the backups might have been corrupted as well, for the last month. That still would have given them a proper backup from a month ago, and then they could have just updated the files from the latest tarballs and just lost the last month's git history.

                          That would have been pretty bad - but not as bad as losing the entire git history.

                          Anyway, the sysadmin whose blog was linked discussed ZFS and wants to use it.
                          What if the corruption occurred 2 months ago? Honestly, there is usually no generic way to tell if a traditional backup is sane. If you do have a way to tell that a backup is sane and you need to go back months to fix things, then you really are losing quite a bit. Just like ZFS is not a substitute for backups, backups are not a substitute for early detection. With that said, I question why anyone who has an integrity check that could detect bad backups would not run it at the time of the backup.

                          Also, I just saw that Jeff Mitchell plans to deploy ZFS to prevent a recurrence of this. That is cool.
                          Last edited by ryao; 03-25-2013, 03:17 PM.
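
                          For what it's worth, giving a "traditional" backup a generic sanity check isn't hard if a checksum manifest is written at the moment the backup is taken. A rough sketch of the idea (the file names and layout are illustrative):

                              import hashlib
                              import json
                              from pathlib import Path

                              def write_manifest(backup_dir: Path) -> None:
                                  """Record a SHA-256 for every file at the time the backup is made."""
                                  manifest = {
                                      str(p.relative_to(backup_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
                                      for p in backup_dir.rglob("*")
                                      if p.is_file() and p.name != "MANIFEST.json"
                                  }
                                  (backup_dir / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))

                              def backup_is_sane(backup_dir: Path) -> bool:
                                  """Later (weekly, say), confirm the backup still matches its manifest."""
                                  manifest = json.loads((backup_dir / "MANIFEST.json").read_text())
                                  return all(
                                      hashlib.sha256((backup_dir / name).read_bytes()).hexdigest() == digest
                                      for name, digest in manifest.items()
                                  )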

                          • #14
                            Originally posted by ryao View Post
                            What if the corruption occurred 2 months ago?
                            Which is why your backups should go back farther. Use daily backups for a couple of weeks, then weekly beyond that, then monthly, and so on. It doesn't take that much backup space to keep a yearly backup going back as far as when the project started. You just have to throw away the intervening backups at sane intervals (a rough pruning sketch follows at the end of this post).

                            Honestly, there is usually no generic way to tell if a traditional backup is sane. If you do have a way to tell that a backup is sane and you need to go back months to fix things, then you really are losing quite a bit.
                            Indeed. But less than if you don't have any backups at all. It is simply a mitigation strategy, to be used if all else has failed.

                            Just like ZFS is not a substitute for backups, backups are not a substitute for early detection. With that said, I question why anyone who has an integrity check that could detect bad backups would not run it at the time of the backup.
                            Yes. Integrity checks can potentially take too long to run at every backup point (depending on the data and software/hardware available), but you should at least run them occasionally. Once a week on those backups, if you can't afford to do it more often.

                            I guess the issue here is that they thought they were running integrity checks but it turned out that wasn't happening.
                            Last edited by smitty3268; 03-25-2013, 03:31 PM.
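
                            Here is the pruning sketch mentioned above - a rough Python pass over dated backup tarballs (the directory layout and naming scheme are assumptions) that keeps dailies for two weeks, then roughly one per week, then one per month, and finally one per year:

                                from datetime import date
                                from pathlib import Path

                                BACKUP_DIR = Path("/backups/git")      # assumed layout: repo-YYYYMMDD.tar

                                def keep(stamp: date, today: date) -> bool:
                                    """Decide whether a dated backup survives this pruning pass."""
                                    age = (today - stamp).days
                                    if age <= 14:
                                        return True                               # every daily for two weeks
                                    if age <= 90:
                                        return stamp.isoweekday() == 7            # then one per week (Sundays)
                                    if age <= 365:
                                        return stamp.day == 1                     # then one per month
                                    return stamp.month == 1 and stamp.day == 1    # one per year, forever

                                def prune(today: date) -> None:
                                    for tarball in sorted(BACKUP_DIR.glob("repo-*.tar")):
                                        digits = tarball.stem.split("-", 1)[1]    # "repo-20130325" -> "20130325"
                                        stamp = date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))
                                        if not keep(stamp, today):
                                            tarball.unlink()

                                prune(date.today())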

                            • #15
                              Originally posted by ryao View Post
                              Unfortunately, proper backups would not have helped as much as one would think without a way to know whether or not the data was bad prior to doing them. It is not clear when the repositories became corrupted, although the blog post suggests that fsck was the trigger.
                              While true, proper backups would still have been better than the current setup. Even if they'd been getting corruption for a few months before anyone noticed, a good backup regime would still let them go back that far, and they could work out how to reconstruct the subsequent activity on top of the restored repository. That beats what actually happened, where they survived only out of pure luck, apparently in the belief that the redundancy from the mirror system was good enough. Wrong!
