Btrfs Updates Sent In For The Linux 4.17 Kernel


  • #41
    Originally posted by geearf View Post
    I've been using btrfs for probably close to a decade with no major issue.
    My only real concern is the mount time of big partitions, in the TB range on HDD. I've worked with a dev on this and had hope; I was told to wait for the analysis, and in the end there's nothing much that will help anytime soon :/
    Since I don't restart my computers much it's no big deal, but it still annoys me.
    How large are your partitions? I have 24x10TB disks in a BTRFS Raid 1 and it mounts within seconds:

    Code:
    root@fileserver-lon:~# dmesg | grep -i btrfs
    [   11.946091] Btrfs loaded
    [   12.759310] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 5 transid 25142 /dev/sde
    [   12.759718] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 24 transid 25142 /dev/sdx
    [   12.760121] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 11 transid 25142 /dev/sdk
    [   12.776776] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 8 transid 25142 /dev/sdh
    [   12.777255] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 13 transid 25142 /dev/sdm
    [   12.777708] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 4 transid 25142 /dev/sdd
    [   12.778190] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 10 transid 25142 /dev/sdj
    [   12.778662] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 9 transid 25142 /dev/sdi
    [   12.779124] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 18 transid 25142 /dev/sdr
    [   12.779580] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 12 transid 25142 /dev/sdl
    [   12.780037] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 7 transid 25142 /dev/sdg
    [   12.780473] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 14 transid 25142 /dev/sdn
    [   12.780910] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 1 transid 25142 /dev/sda
    [   12.781354] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 15 transid 25142 /dev/sdo
    [   12.781802] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 16 transid 25142 /dev/sdp
    [   12.782228] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 22 transid 25142 /dev/sdw
    [   12.782654] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 2 transid 25142 /dev/sdb
    [   12.783076] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 3 transid 25142 /dev/sdc
    [   12.783509] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 21 transid 25142 /dev/sdu
    [   12.783942] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 23 transid 25142 /dev/sdv
    [   12.784364] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 19 transid 25142 /dev/sds
    [   12.784774] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 20 transid 25142 /dev/sdt
    [   12.785174] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 17 transid 25142 /dev/sdq
    [   12.785574] BTRFS: device fsid e89a8a07-138e-412a-ab46-6606633f2e8a devid 6 transid 25142 /dev/sdf
    [   14.871372] BTRFS info (device sdf): disk space caching is enabled
    [   14.871373] BTRFS: has skinny extents

    Comment


    • #42
      Originally posted by waxhead View Post

      While BTRFS "RAID"1 resembles "normal" RAID1, it is not the same thing. I'll take a shot at explaining:

      In BTRFS terms the same data is duplicated on two drives (in a 2-disk setup), but the layout of the data, i.e. where it resides on each disk, may be very different. In a three-disk BTRFS "RAID"1 you will still only have two copies of the data, and in a four-disk setup you will still only have two copies of your data, on different drives.

      But let's concentrate on a two-disk setup.
      Let's pretend that you HAD three disks in "RAID"1 and rebalanced it to a two-disk "RAID"1. On a "normal" RAID1 you would expect the data to be mirrored, i.e. if you seek to block 123456789 on disk A you will find the same data as on disk B. For BTRFS this may be very different. While it probably is mostly the same if you start off with two disks, you have no guarantee. A corruption, for example, may cause the data to be relocated elsewhere on the disk.

      The main confusion is that while BTRFS "RAID"1 works very much like a "normal" RAID1 from a user's perspective (it duplicates the data), it will not MIRROR the data. Think of this as having two puzzles where one is assembled and the other one is in a box. You still have the same pieces, and they may take up the same space, but their layout is very different.

      BTRFS "RAID"1 also (until kernel 4.14 I think) required at least three storage devices to not enter a permanent read only mode. This because it had to be able to create two copies of the data on two different devices always.

      Think of BTRFS different data/metadata profiles roughly like so:

      SINGLE: Just like your average filesystem. No redundancy.
      DUP: Two copies of the data on the same device, which makes it possible to survive/recover from corruption on a single device.
      RAID1: Should really be called 2COPIES: Keeps two copies of the data on two different devices.
      RAID10: Should really be called 2COPIES mirrored: Keeps two copies of the data, each copy spread over half of the available storage devices.
      RAID5: Should really be called STRIPE + single parity: Spreads the data over all but one of the available devices and writes parity information on the leftover device.
      RAID6: Should really be called STRIPE + dual parity: Spreads the data over all but two of the available devices and writes parity information on the two leftover devices.
      TY for explaining this in a clearer fashion, English (and temperament) is not my strong point. I lose patience with millennials very quickly because, whereas they were born with technology all around them, they fail to grasp a lot of the abstractions. Your suggestions are good btw, they should adopt them as it would be acceptable and clear terminology for what they are doing.
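
      For reference, this is roughly how those profiles get selected in practice - just a sketch, with example device names and mount point, adjust to your own setup:

      Code:
      # data (-d) and metadata (-m) profiles are chosen at mkfs time
      mkfs.btrfs -d single -m dup /dev/sdb                      # no data redundancy, duplicated metadata
      mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc            # two copies, always on two different devices
      mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
      # an existing filesystem can also be converted with a rebalance
      btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool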

      Comment


      • #43
        Originally posted by ssorgatem View Post
        The thing is, nowhere in the definition of RAID1 does it specify that the data has to be mapped in the same way on both disks; only that it be on both disks.
        Also, the definition of RAID1 talks about 2 disks, not "one copy of the data on every disk". So Btrfs' approach of still having 2 copies of the data on different devices, regardless of the number of devices, does not contradict the definition of RAID1 in any way.
        This is wrong; you seem to confuse this with RAID0.
        The SNIA Common RAID DDF specification, versions 1.4 and 2.0, both define RAID1 like so:

        "4.2.2 RAID-1 Simple Mirroring (PRL=01, RLQ=00) A VD with PRL=01 and RLQ=00 MUST have two and only two extents. Each extent MUST be equal to the size of the VD. Each block of the VD, virtual_block(x), MUST be duplicated on both extents at the same offset. Figure 3 gives an example of simple mirroring."

        Reference:



        So therefore, hence, ergo, vis-à-vis, concordantly and so forth: BTRFS "RAID"1 is not true RAID1, and it is not meant to be either. The naming is not that bad, I admit, as it attracts users because they instantly know what it is, but at the same time it adds to the confusion when people expect it to work like "normal" RAID.

        Originally posted by ssorgatem View Post
        Your problem is that you are trying to compare a block-level RAID1, that can only work in one way because it doesn't have knowledge of the data or the files in it, and a filesystem-level RAID1, which does not need to do useless and wasteful "mirroring" of data layouts, since it's aware of the data.

        So it is not "normal" RAID1 because it's not a block-level RAID, but a filesystem level one. Still RAID1, though.
        Absolutely, it is a filesystem-level RAID1, but it is NOT RAID1 from a definition point of view. It looks like RAID1, it smells like RAID1 and it behaves MOSTLY like RAID1, so yes, it is almost RAID1, but not quite.

        Originally posted by ssorgatem View Post
        Also, RAID1 has never required more than 2 devices that I can remember, and I've been using it for years.
        You could (can?) lock it into read-only mode if you mount it degraded, read-write, with only one device and start writing data to it... but if you got to that point... what did you expect?
        "Normal" RAID1 will read/write in degraded mode. BTRFS will not allow this until at least kernel 4.14 I think, and from memory I think it also require a degraded mount option. E.g. it will not automatically allow writes without the degraded mount option. So yes, it is even closer to "proper RAID1" now , but by definition it is still not RAID1. And in my view BTRFS' version of "RAID"1 it is actually better and more reliable anyway.

        http://www.dirtcellar.net

        Comment


        • #44
          It's not just splitting hairs though, and it's not so much of a minor difference. It is only a minor difference when talking about two devices (and that is a common level of redundancy in the home). In enterprise you'll see RAID1 configs typically with 3 or more drives in them, and in that situation, say 4 disks in a RAID1, that means something considerably different from what btrfs calls raid1. By the standard definition you could fail 3 drives and still run degraded without interruption; by the btrfs definition of raid1 you can fail one and only one. (Also, does it panic or mount read-only when this happens?)

          What I was getting at before also is that the number of spindles increases your failure rate. Many more spindles without more redundancy is a no-no. Disks do fail at the same time; a common case for this is two disks next to each other in an array that are subject to the same heat/vibration conditions, and what you will run into in a large array is when you replace one and resilver. Resilvering puts a lot of load on the disks, so during the time it takes, the other one will go, leaving your array toast (and most likely your job too). This is why you have to have more redundancy the more spindles you have.

          The largest storage array I've run on a single system was a 48 disk system with Solaris and ZFS. https://www.youtube.com/watch?v=lwT3Hrk4BSo No, not the biggest in the world by any mark but pretty impressive for its day and would still be quite useful even today. That was over 10 years ago and even then it was a rock solid, bet the business on, storage solution.

          Comment


          • #45
            Originally posted by k1e0x View Post
            It's not just splitting hairs though, and it's not so much of a minor difference. It is only a minor difference when talking about two devices (and that is a common level of redundancy in the home). In enterprise you'll see RAID1 configs typically with 3 or more drives in them, and in that situation, say 4 disks in a RAID1, that means something considerably different from what btrfs calls raid1. By the standard definition you could fail 3 drives and still run degraded without interruption; by the btrfs definition of raid1 you can fail one and only one. (Also, does it panic or mount read-only when this happens?)
            Oh, keep in mind, it's not my definition, but yes - you can only fail one disk in a two-disk system after kernel 4.14. Before that, you could fail one drive once and have one chance of recovering the filesystem back to read/write. If you failed this, it would go into irreversible read-only mode. Therefore, in the BTRFS world, you would run with more than two disks to have something similar to RAID1. And yes, the functionality of BTRFS "RAID"1 is very similar to normal RAID1, but implementation-wise it is vastly different.

            Originally posted by k1e0x View Post
            What I was getting at before also is that the number of spindles increases your failure rate. Many more spindles without more redundancy is a no-no. Disks do fail at the same time; a common case for this is two disks next to each other in an array that are subject to the same heat/vibration conditions, and what you will run into in a large array is when you replace one and resilver. Resilvering puts a lot of load on the disks, so during the time it takes, the other one will go, leaving your array toast (and most likely your job too). This is why you have to have more redundancy the more spindles you have.
            Absolutely correct; the only exception is if you have a large "RAID"1-like array in BTRFS with not that much data on it. It will distribute reads and writes well (in particular if all disks are equally sized and enough threads are accessing the array at once), but you have to take the failure rate into consideration for certain workloads, in case your storage device goes full 'unga-bunga' with complications.

            Originally posted by k1e0x View Post
            The largest storage array I've run on a single system was a 48 disk system with Solaris and ZFS. https://www.youtube.com/watch?v=lwT3Hrk4BSo No, not the biggest in the world by any mark but pretty impressive for its day and would still be quite useful even today. That was over 10 years ago and even then it was a rock solid, bet the business on, storage solution.
            Yeah, nice one, but in general - the larger the storage is, the higher the risk of ending up needing that backup.

            http://www.dirtcellar.net

            Comment


            • #46
              Disks failing at the same time, or near it, is only common because arrays are often made with same-model disks purchased at the same time.
              It's normal that they die at about the same time when subjected to the same workload...


              But in a heterogeneous btrfs RAID1 where no two disks share the same size, make, model or age?
              That is much less likely than with a 2-disk array of identical disks.

              Comment


              • #47
                Originally posted by nazar-pc View Post
                One thing is to compare btrfs+lzo (which I'm using) to btrfs+zstd; comparing btrfs to ext4, f2fs or zfs is something completely different. Btrfs performance is not as good as I'd like it to be. I still prefer its feature set, but I would also really like to see at least 90% of the performance of the fastest filesystems (with CoW disabled), otherwise there is a huge space for a new player, with no apparent candidate.
                I'm not sure what to reply because it sounds a bit odd (or, rather, vague). BTRFS, compared to ext4, might not be as efficient in CPU-intensive tasks, e.g. searching over a cached filesystem tree (but maybe it is - I pulled that example out of thin air). However, in everyday desktop/laptop usage the bottleneck becomes the IO of the storage device (perhaps less so in the case of an SSD, but I don't know by how much). Storage devices are much, much slower than CPU or RAM, so what one can do is shift a bit of the load from the HDD to the CPU cores by compressing the files. As a result, the amount of IO gets dramatically reduced.

                So, btrfs+zstd *for typical PC usage* (as in booting the system, starting apps, reading documents, viewing photos, playing games) ends up being faster than ext4 simply because the latter does not support compression.
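
                In case anyone wants to try it, this is roughly how compression is turned on (device, mount point and UUID are just examples; zstd needs kernel 4.14 or newer):

                Code:
                # compress everything written from now on with zstd
                mount -o compress=zstd /dev/sdb /home
                # or persistently via /etc/fstab
                # UUID=<your-uuid>   /home   btrfs   compress=zstd,noatime   0 0
                # recompress already existing files in place
                btrfs filesystem defragment -r -czstd /home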

                Comment


                • #48
                  Originally posted by waxhead View Post
                  4. BTRFS is not safe (reliable)

                  Really? That is utterly wrong. BTRFS is designed to be reliable.
                  ...
                  As long as you stay away from the "RAID"5/6-like features and quotas, and do a bit of research, BTRFS is one of the better filesystems out there, depending on your use case. I care less about performance; I want to trust my data, and BTRFS does an absolutely stellar job at that compared to the alternatives.
                  Eh? You say that BTRFS is reliable as long as you do not use all these features? Have you not heard about all the data losses people have had with BTRFS? If BTRFS were reliable, these data corruption reports would not exist. But those reports do exist.

                  (BTW Facebook uses BTRFS for read-only data on clusters - they don't use BTRFS for mission-critical data).

                  Comment


                  • #49
                    Originally posted by F.Ultra View Post

                    How large are your partitions? I have 24x10TB disks in a BTRFS Raid 1 and it mounts within seconds:

                    [dmesg output snipped; identical to the listing quoted in post #41 above]
                    Not that large, around 4TB. Do you have many files in there or a few big ones?
                    It does mount in about 5 seconds max when it's fresh, and up to 30 seconds when it's not.

                    Comment


                    • #50
                      Originally posted by geearf View Post

                      Not that large, around 4TB. Do you have many files in there or a few big ones?
                      It does mount in about 5 seconds max when it's fresh, and up to 30 seconds when it's not.
                      The files that I store are quite big, so there are only 14544 files right now according to "find ./ | wc"
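
                      (If anyone wants to count only regular files, something like the line below is a bit more precise, since plain find also lists directories and wc without -l prints lines, words and bytes:)

                      Code:
                      find . -type f | wc -l    # count regular files only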

                      Comment
