Btrfs Getting RAID 5/6 Fixes In Linux 4.12 Kernel


  • #21
    Originally posted by pcxmac View Post

    lz4 works great when you have plenty of RAM, but on smaller machines like embedded systems, with slower processors, lzo is the better choice.
    lzo doesn't make sense on low-end machines; it will just slow down all I/O. Its main benefit is saving up to 50% of disk space, but it rarely does that well. On the other hand, low-end machines probably have lower disk space requirements, so the drives are already big enough. BTW, the 'plenty' you're referring to is probably something like < 20 MB per thread, if you got that figure from some page comparing the lz4 utility? Some of that space is just large buffers that can be tuned down. So yes, lzo is better on some 50 MHz smart-home beacon, but on a $9 Orange Pi and other "high end machines" lz4 always wins.
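
    For anyone who wants numbers rather than assertions, both algorithms are easy to benchmark from the command line. A minimal sketch, assuming the standard lz4 and lzop utilities are installed (the test file path is a placeholder):

        lz4 -b1 /path/to/testfile                        # lz4's built-in benchmark mode, level 1
        time lzop -1 -c /path/to/testfile > /dev/null    # rough lzo equivalent
        # On btrfs itself the point is moot for now: only zlib and lzo
        # are supported as of this kernel series, e.g.:
        mount -o compress=lzo /dev/sdX1 /mnt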

    • #22
      Originally posted by RussianNeuroMancer View Post
      Btrfs used in Jolla phones and tablets.
      That's actually no longer true - it was used in the first Sailfish OS device (the "Jolla 1" smartphone), but there were serious issues with it, and newer devices (Jolla Tablet, Jolla C, Intex Aquafish, community ports, etc.) all run ext4 on top of LVM.

      AFAIK, these were the reasons why Jolla dropped btrfs:
      • metadata block exhaustion resulting in filesystem write operations failing at random
        • The Jolla 1 internal storage was about 16 GB. Btrfs allocates data and metadata blocks on demand, starting with one data and one metadata block allocated. It would then often happen that all the remaining free space got allocated to data blocks (as the user adds big files to the device, records video, etc.). Things would still work for a while, until the single metadata block filled up and there was no unallocated space left for a new one. Result: filesystem operations failing at random. Oops! :P
      • Bogus free space reporting - due to the issue above (there is enough space for data, just not for the metadata associated with new files) and due to data "preserved" by snapshots, free space reporting was very unreliable.
      • Btrfs balance issues.
        • balancing is run periodically on the Jolla 1 in an attempt to make things a bit more bearable (e.g. deallocate some data blocks so that new metadata blocks can be written; see the sketch after this list)
        • this still often fails to free any data blocks, even when btrfs reports multiple GB of free space
        • the btrfs balancing tool sometimes crashes or runs indefinitely
      • Btrfs snapshots have been used for factory reset, which has not turned out to be a good idea.
        • the Jolla 1 originally shipped at the end of 2013, and the factory reset snapshot dates from that time
        • any user who performs a factory reset then has to perform multiple consecutive upgrades to get to the 2017 "patch level"
        • there is no robust way to update the factory reset btrfs snapshot to a more recent firmware over the network on all devices
        • all storage is allocated to btrfs, so it's not even possible to drop the filesystem, recreate it and fill it from some recovery partition
        • this has been solved on newer devices by adding a recovery partition that holds squashfs recovery images of the rootfs and home folder; factory reset just drops the primary storage (ext4 on LVM), recreates it and fills it from the squashfs images; the squashfs recovery images on a separate partition can easily be updated as needed
      • Btrfs also does not work particularly well with the ancient Android kernels Sailfish OS has to use in order to reuse the Android hardware adaptation.
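
      For reference, the allocation situation described above can be inspected, and sometimes remedied, with stock btrfs-progs. A minimal sketch, assuming a mounted filesystem (the mount point and the usage filter value are just examples):

        btrfs filesystem usage /        # data vs. metadata chunks, plus unallocated space
        btrfs filesystem df /           # per-type totals and usage
        # reclaim data chunks that are less than 10% used, so the freed
        # space can be allocated as new metadata chunks:
        btrfs balance start -dusage=10 /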

      • #23
        Originally posted by starshipeleven View Post
        I have had Snapper enabled since I installed Leap 42.1, and it has yet to eat all my drive space.
        It's clearly auto-deleting snapshots: apart from the first one made on install, I see snapshots 959 through 1092 in the list (many in between have been deleted; it's a dozen or so in total).

        I heard that Snapper had these issues due to the snapshot-deletion algorithms, but I thought they were fixed.
        That is fine on Leap, where your updates are typically small. On Tumbleweed I often get updates of several thousand packages, and when I use something like 60% of my root partition just for packages, have another 10% or so in temp files, and then have to deal with snapper backups eating another 30%, a big update's package cache combined with the snapper backup results in zero disk space left.

        It is easier to just keep a list of installed packages and do a 2-hour clean install every 6 months or so if anything goes wrong; it takes less time overall than spending an hour every couple of weeks dealing with a big update under snapper.
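
        For what it's worth, on a stock openSUSE setup retention is controlled per Snapper config. A minimal sketch of reining it in (the values are examples, not the shipped defaults):

          # /etc/snapper/configs/root
          NUMBER_CLEANUP="yes"          # prune numbered (pre/post-update) snapshots
          NUMBER_LIMIT="5"              # keep at most 5 of them
          NUMBER_LIMIT_IMPORTANT="3"
          TIMELINE_CREATE="no"          # no hourly timeline snapshots

          # apply the number policy right away instead of waiting for the timer:
          snapper cleanup number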

        • #24
          Originally posted by waxhead View Post
          1. Stay with RAID1 mode only.
          RAID10 is safe also.

          • #25
            Originally posted by Zucca View Post
            RAID10 is safe also.
            Yes and no. It works great, but it is very easy to get stuck in irreversible read-only mode. If you for example have a 4-disk raid10 and lose one disk, you are essentially in the same situation as with a 2-disk raid1 setup where one disk is lost. Until BTRFS actually learns to understand failed devices, this is risky. Let's pretend that your /dev/sde drops out for a few seconds and comes back as /dev/sdf. BTRFS will still try to write to /dev/sde even though it no longer exists; BTRFS does NOT look at your device ID. The same applies to RAID1, but if you have a bunch of disks you are much safer.
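
            For completeness, the manual recovery path after a disk drops out looks roughly like this today. A sketch only - the device names and the devid are placeholders:

              btrfs filesystem show /mnt             # note the devid of the missing disk
              mount -o degraded /dev/sda1 /mnt       # mount without the failed device
              btrfs replace start 4 /dev/sdf /mnt    # rebuild devid 4 onto the new disk
              btrfs replace status /mnt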

            http://www.dirtcellar.net

            • #26
              Originally posted by DrYak View Post

              1. I second that: RAID5/6 still aren't completely ready; they still can't handle bitrot.
              2. --- not necessarily. Bugs have been fixed, and distros have tools to automate the maintenance to avoid that.
              3. --- that's the whole point of BTRFS - if you actually maintain your BTRFS properly (or use tools that do it for you), compression and especially snapshots are nice tools.
              I would rather say: stay within known parameters.
              Thus do not use RAID5/6 (yet), and do not tweak it unnecessarily (e.g. no need to modify nodesize - it makes scrubbing compressed data less reliable).
              4. --- or use a distro which has tools for that (e.g. SUSE's btrfs-maintenance, Sailfish's btrfs-balancer)
              5. --- which should be caught by the above tools too.
              6. I second that. Keep backups no matter what. And BTRFS on your fileserver actually makes backups easier (thanks to rsync+snapshots instead of playing with rsync+hardlinks).
              7. --- well, nowadays btrfs has stabilized quite a bit. You aren't in for bad surprises if you stay within the stable features (no RAID5/6 yet).

              Again, I have no problems, BUT I tend to use distros which have tools (on SUSE and Sailfish) or roll my own solution (on Debian).
              2. I disagree. Are we sure we are talking about the same thing? Read-only mode happens if btrfs is not able to produce a second mirror. For RAID1 you need at least 3 disks to stay on the safe side - BTRFS does not use device IDs either, so if /dev/sdy disappears it will still try to write to it. Are you saying this is fixed? It was not a few months ago, and the btrfs status page also confirms this.

              3. Depends on what you are looking for. For me, bit-rot detection and repair are the most important BTRFS features.

              7. Yes, BTRFS is stabilizing, but you need to cherry-pick which features you use if you are going to call it stable.

              BTRFS is getting better every day, and it is a great filesystem. I would assume that in 3-4 more years it will perhaps even be considered the default in Debian.
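
              For anyone wanting to exercise the point-3 feature, a periodic scrub is what actually performs the detection and repair. A minimal sketch (the mount point is a placeholder):

                btrfs scrub start -B /mnt    # -B stays in the foreground until the scrub finishes
                btrfs scrub status /mnt      # summary, incl. corrected and uncorrectable errors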


              http://www.dirtcellar.net

              • #27
                Originally posted by waxhead View Post
                Yes and no. It works great, but it is very easy to get stuck in irreversible read-only mode. If you for example have a 4-disk raid10 and lose one disk, you are essentially in the same situation as with a 2-disk raid1 setup where one disk is lost. Until BTRFS actually learns to understand failed devices, this is risky. Let's pretend that your /dev/sde drops out for a few seconds and comes back as /dev/sdf. BTRFS will still try to write to /dev/sde even though it no longer exists; BTRFS does NOT look at your device ID. The same applies to RAID1, but if you have a bunch of disks you are much safer.
                I had a different experience with my raid10, but it's on 6 drives, so there's much more flexibility in various ways.
                Anyway, when I was trying to find out what was causing all kinds of errors emerging from ata1 to ata6 (it eventually turned out to be a kernel or hardware bug on the SATA controller), I started to unplug drives one after another - of course plugging one back in before pulling the second out (a hotswap cage made this easy). While one drive was unplugged, btrfs knew it couldn't access it anymore and "marked it" as failed, and everything kept working without a problem, no read timeouts etc. I watched dmesg, and there was a clear event when the drive was dropped from the pool. After I replugged a drive, I ran commands to add it back to the btrfs pool, replacing itself.
                This however isn't analogous to the case where one (or more) drives start to slowly die. I don't have experience with how many errors, and of what kind, need to occur before btrfs decides to mark a drive dead - I swap my drives too often (for bigger ones). :P
                What I've read and understood is that the read-only nightmare usually comes when you lose one drive and don't replace it, but instead balance the data over the remaining drives to preserve redundancy --> not enough space --> lockup.
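
                For the record, btrfs does at least expose per-device error counters, even though it doesn't yet act on them automatically. A minimal sketch (the mount point is a placeholder):

                  btrfs device stats /mnt      # read/write/flush/corruption/generation error counters
                  btrfs device stats -z /mnt   # print the counters, then reset them to zero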
                Last edited by Zucca; 25 April 2017, 06:12 AM. Reason: Typos; begone!

                • #28
                  Thank you all for the answers.

                  • #29
                    Originally posted by pcxmac View Post

                    ... on spinning rust.
                    It seems pretty fashionable nowadays to use this “rust” pejorative when referring to hard drives.

                    Let me just point out that rust isn’t magnetic.

                    • #30
                      Originally posted by MartinK View Post
                      Btrfs also does not work particularly well with the ancient Android kernels Sailfish OS has to use in order to reuse the Android hardware adaptation.
                      This is and always will be the problem.

