Btrfs With Linux 5.10 Brings Some Sizable FSync Performance Improvements


  • S.Pam
    replied
    Originally posted by ElectricPrism View Post
    Heads up. I've been using BTRFS for 4-5 years with no issues. With this 5.10 linux-git [AUR] my system froze and now I am having major tree read errors. I am on day 3 of data recovery. On day 1 I dd'd the broken btrfs NVMe to an image.

    I switched to linux-git after linux-zen started to show some dmesg errors. This is the 3rd Intel 660p NVMe drive I have noted dmesg errors on. I am keeping the rest of my Linux workstations and servers on Linux 5.9 for the foreseeable future.

    Anyway, the recovery instructions here: https://ownyourbits.com/2019/03/03/h...rfs-partition/ haven't yielded much so far.

    My latest backup is a few months old. It's been 10 years since I've had a major data issue -- my last one was related to Ubuntu 10.04 MDADM.

    I have been running bleeding edge for years, so just a heads up to anyone reading: DO NOT UPGRADE TO 5.10 without making your backup current and monitoring your dmesg.
    Sounds more like you have faulty hardware.



  • ElectricPrism
    replied
    Heads up. I've been using BTRFS for 4-5 years with no issues. With this 5.10 linux-git [AUR] my system froze and now I am having major tree read errors. I am on day 3 of data recovery. On day 1 I dd'd the broken btrfs NVMe to an image.

    I switched to linux-git after linux-zen started to show some dmesg errors. This is the 3rd Intel 660p NVMe drive I have noted dmesg errors on. I am keeping the rest of my Linux workstations and servers on Linux 5.9 for the foreseeable future.

    Anyway, the recovery instructions here: https://ownyourbits.com/2019/03/03/h...rfs-partition/ haven't yielded much so far.

    My latest backup is a few months old. It's been 10 years since I've had a major data issue -- my last one was related to Ubuntu 10.04 MDADM.

    I have been running bleeding edge for years, so just a heads up to anyone reading: DO NOT UPGRADE TO 5.10 without making your backup current and monitoring your dmesg.
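
    A minimal sketch of the image-first-then-recover approach described above, assuming a hypothetical failing device /dev/nvme0n1 and a healthy scratch disk mounted at /mnt/rescue (the paths are illustrative, not taken from the post):

      # Image the failing drive first, skipping unreadable sectors instead of aborting
      # (GNU ddrescue is often preferred for failing media, but plain dd also works)
      dd if=/dev/nvme0n1 of=/mnt/rescue/broken-btrfs.img bs=1M conv=noerror,sync status=progress

      # Attach the image read-only and try to pull files out without mounting the damaged filesystem
      losetup --find --show --read-only /mnt/rescue/broken-btrfs.img   # prints e.g. /dev/loop0
      btrfs restore -i -v /dev/loop0 /mnt/rescue/restored/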



  • S.Pam
    replied
    space_cache=v2 will soon be the default. It is long overdue. It has really only been held back because some really old kernels can only mount v2 filesystems read-only.
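
    A minimal sketch of switching an existing filesystem to the v2 free-space cache, assuming a hypothetical filesystem on /dev/sdb1 normally mounted at /data (the clear_cache and space_cache mount options are documented in btrfs(5)):

      umount /data
      mount -o clear_cache,space_cache=v2 /dev/sdb1 /data   # first mount with v2 builds the free space tree
      # subsequent plain mounts keep using v2; adding space_cache=v2 to fstab just makes it explicit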



  • F.Ultra
    replied
    Originally posted by S.Pam View Post

    You need to use noatime. That implies nodiratime too. You can't do only nodiratime like in ext4.

    You should really use space_cache=v2. Really a very big difference if you have an FS of over a TiB.
    Thanks, I used both noatime and nodiratime in fstab, so that should have covered it then. I'll experiment with space_cache later; I'll be installing a similar server soon, so I'll test it out there.
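
    A minimal fstab sketch combining the options discussed above, assuming a hypothetical UUID and mount point (noatime already implies nodiratime on btrfs, so listing both is redundant but harmless):

      # /etc/fstab
      UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /data  btrfs  defaults,noatime,space_cache=v2  0  0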



  • S.Pam
    replied
    Originally posted by F.Ultra View Post

    ok so BTRFS does not honor the nodiratime mount option?
    You need to use noatime. That implies nodiratime too. You can't do only nodiratime like in ext4.

    You should really use space_cache=v2. Really a very big difference if you have an FS of over a TiB.



  • waxhead
    replied
    Originally posted by F.Ultra View Post

    ok so BTRFS does not honor the nodiratime mount option?
    I have no clue.



  • F.Ultra
    replied
    Originally posted by waxhead View Post

    OK, BTRFS has two cache mechanisms for free space. You are probably using v1 if you are just using the defaults, and on a multi-terabyte filesystem the performance may be degraded. When you list your directories the access time will be updated (which may result in a write), so it may just be that this was the reason you were seeing delays. You can try to switch to space_cache=v2, which is not as straightforward as it may seem from the manpage ( https://btrfs.wiki.kernel.org/index....e/btrfs%285%29 ). Another thing you can do is to try putting large directories in subvolumes. A couple of years ago I did this on a server with about 7.5 million files that was a bit slow on lots of (heavy) small file operations.
    ok so BTRFS does not honor the nodiratime mount option?



  • waxhead
    replied
    Originally posted by kreijack View Post

    What I understood is that it is possible to switch to the V2 space cache quite easily; however, some bits of (unneeded) V1 data will survive the switch.

    ...

    In RAID 1/10 you can lose up to half of the disks in the best scenario. However, in the worst one you lose your filesystem when only two disks fail. It depends on which pair of disks you lose.

    When you lose a disk, you cannot lose the disks that hold the other half of the copies of the data.

    E.g. if you have the following setup

    DISK1 DISK2
    DISK3 DISK4
    DISK5 DISK6

    where disk2, disk4 and disk6 are the mirrors of disk1, disk3 and disk5, you can lose disk1, disk4 and disk6 and everything works well. However, if you lose disk1 and disk2 the filesystem is gone.
    BTRFS complicates things in the sense that mirroring is done per pair of chunks (== slices of a disk) and not per pair of disks. So one chunk of disk1 may be mirrored on disk2, and the next chunk of disk1 may be mirrored on another disk...

    With RAID1/10 it is guaranteed that the filesystem will survive the loss of *one* (any) disk. But if you lose another disk it is *not guaranteed* that the filesystem will survive (it may or may not).
    For example, RAID6 guarantees that the filesystem will survive even if two disks are lost.
    First of all, the space cache. What I understood is that if there is still something left of the V1 space cache, it will continue to be used even if V2 is present. I may be totally or partly wrong about this, but it was an interesting and confusing discussion. It is not as easy as clearing v1 and then mounting with V2, *as I understood it*, even though the manual page does not indicate this at all.

    I do understand how BTRFS RAID10 works. Until now, losing two disks has always been a problem for BTRFS RAID10. There was a patch proposed a while ago ( https://patchwork.kernel.org/[email protected]/ ), but as you can see the author asked to discard that patch until a problem with not being able to create degraded chunks is solved (and I have no clue whether it is solved now).

    Actually, there is in theory a much higher chance of recovering at least some data from a BTRFS filesystem, due to the way data is stored (in chunks, slices or even "partitions" if you like), compared to traditional RAID10/5/6. As long as your metadata is safe, you could in theory reconstruct whatever is readable even if you are missing more disks than would otherwise be possible in a traditional RAID.
    Last edited by waxhead; 14 October 2020, 01:53 PM. Reason: typo
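
    A minimal sketch of removing the old v1 cache completely offline before switching, which addresses the leftover-v1-data concern above; assumes a hypothetical unmounted filesystem on /dev/sdb1 mounted at /data (the --clear-space-cache option is documented in btrfs-check(8)):

      umount /data
      btrfs check --clear-space-cache v1 /dev/sdb1   # drops all v1 free space cache blocks; filesystem must be unmounted
      mount -o space_cache=v2 /dev/sdb1 /data        # first mount with v2 builds the free space tree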



  • kreijack
    replied
    Originally posted by waxhead View Post
    Are you using space_cache v2? I was under the impression that you just had to clear the v1 space cache and then enable the v2 cache, but this is not the case. It was a rather confusing and complex discussion on IRC a month or two back, but all I got out of it was that simply switching the space cache was not that easy after all.
    What I understood is that it is possible to switch to the V2 space cache quite easily; however, some bits of (unneeded) V1 data will survive the switch.

    Originally posted by waxhead View Post
    Depending on how many storage devices you use and what kind of HBAs you use, I would suggest rebalancing data to raid10. If I remember correctly there was a patch posted a while ago (that I think was merged) that allowed btrfs' raid10 to potentially handle losing more than one drive. If that is true you **may** have a slightly better chance of surviving two dropped devices if you are both unlucky and lucky at once. Of course you would need your metadata to be in raid10 or raid1c3 or raid1c4 to benefit from that.
    In RAID 1/10 you can lose up to half of the disks in the best scenario. However, in the worst one you lose your filesystem when only two disks fail. It depends on which pair of disks you lose.

    When you lose a disk, you cannot lose the disks that hold the other half of the copies of the data.

    E.g. if you have the following setup

    DISK1 DISK2
    DISK3 DISK4
    DISK5 DISK6

    where disk2, disk4 and disk6 are the mirrors of disk1, disk3 and disk5, you can lose disk1, disk4 and disk6 and everything works well. However, if you lose disk1 and disk2 the filesystem is gone.
    BTRFS complicates things in the sense that mirroring is done per pair of chunks (== slices of a disk) and not per pair of disks. So one chunk of disk1 may be mirrored on disk2, and the next chunk of disk1 may be mirrored on another disk...

    With RAID1/10 it is guaranteed that the filesystem will survive the loss of *one* (any) disk. But if you lose another disk it is *not guaranteed* that the filesystem will survive (it may or may not).
    For example, RAID6 guarantees that the filesystem will survive even if two disks are lost.
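
    A minimal sketch of the rebalancing discussed above, assuming a hypothetical multi-device filesystem mounted at /data (the -dconvert/-mconvert filters and the raid10/raid1c3 profiles are documented in btrfs-balance(8) and mkfs.btrfs(8)):

      btrfs filesystem usage /data                                   # show current data/metadata profiles and per-device usage
      btrfs balance start -dconvert=raid10 -mconvert=raid1c3 /data   # convert data to raid10 and metadata to raid1c3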



  • waxhead
    replied
    Originally posted by F.Ultra View Post

    No, I'm not using space_cache (unless it's on by default); by cold cache I meant the Linux buffer cache. Strangely enough, ls was fast today, 24h later, even though files have been added to the directories, but I guess the Linux VFS simply cached that as well; the machine has 64GB of free RAM after all. I have 24 SAS drives in that setup with an LSI 9207-8i as the HBA.
    OK, BTRFS has two cache mechanisms for free space. You are probably using v1 if you are just using the defaults, and on a multi-terabyte filesystem the performance may be degraded. When you list your directories the access time will be updated (which may result in a write), so it may just be that this was the reason you were seeing delays. You can try to switch to space_cache=v2, which is not as straightforward as it may seem from the manpage ( https://btrfs.wiki.kernel.org/index....e/btrfs%285%29 ). Another thing you can do is to try putting large directories in subvolumes. A couple of years ago I did this on a server with about 7.5 million files that was a bit slow on lots of (heavy) small file operations.
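
    A minimal sketch of the "large directory in a subvolume" idea above, assuming a hypothetical directory /data/archive on the same btrfs filesystem (the reflink copy shares data blocks, so it is quick and does not duplicate data):

      btrfs subvolume create /data/archive.subvol
      cp -a --reflink=always /data/archive/. /data/archive.subvol/   # copy contents; extents are shared, not duplicated
      mv /data/archive /data/archive.old && mv /data/archive.subvol /data/archive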

