Improved Btrfs Scrub Code Readied For Linux 6.4, ~10% Faster

  • vladpetric
    replied
    Originally posted by fitzie View Post

    I agree, being able to grow the RAID by adding new disks of a different size is a key feature for home users. I do like how Unraid handles this, but of course their solution is proprietary.

    I use OpenZFS (and btrfs and bcachefs), but I do think licensing is indirectly a hindrance. First, it prevents the filesystem from being worked on by experienced filesystem developers (at least in Linux land), and it also means that you have to delay following upstream kernels. I suspect it also deters companies that might otherwise fund features like raidz expansion from getting involved. Btw - I remember when ZFS came out and it had significant deficiencies, but it made it through them pretty well. Sun was lucky to have brilliant developers, and put significant investment into it to see it through those early days. For better or worse, Linux filesystems have to take a radically different evolutionary path, and we may never see an investment like this again, because the cloud prefers investing in redundancy over the network rather than within a single host. I'm fairly pessimistic, but that doesn't keep me from funding bcachefs development.
    Yeah, RAID offers redundancy at the controller/box level, whereas object stores (S3, GCS, etc.) achieve redundancy at the cluster/network level ... for a highly distributed system that's typically more useful ... (I do understand where they're coming from).

    Anyway, thanks for your contributions!



  • fitzie
    replied
    Originally posted by vladpetric View Post

    That's a decent video, thanks!

    I don't see issues with licenses if you:

    1. Are either willing to build the module from source (DKMS works reasonably well), or just use Ubuntu (ZFS in Ubuntu is totally fine; this is what I use)
    2. Aren't bothered by the possible OSS licensing incompatibility yourself (I would note that the restrictions of the licenses apply to stuff that is, well, distributed, in source code or binary form; doing this mix at home without distribution doesn't count at all).

    Not accommodating mixed disk sizes is perhaps the one thing I don't like about ZFS (btrfs can do that).
    I agree, being able to grow the RAID by adding new disks of a different size is a key feature for home users. I do like how Unraid handles this, but of course their solution is proprietary.

    I use OpenZFS (and btrfs and bcachefs), but I do think licensing is indirectly a hindrance. First, it prevents the filesystem from being worked on by experienced filesystem developers (at least in Linux land), and it also means that you have to delay following upstream kernels. I suspect it also deters companies that might otherwise fund features like raidz expansion from getting involved. Btw - I remember when ZFS came out and it had significant deficiencies, but it made it through them pretty well. Sun was lucky to have brilliant developers, and put significant investment into it to see it through those early days. For better or worse, Linux filesystems have to take a radically different evolutionary path, and we may never see an investment like this again, because the cloud prefers investing in redundancy over the network rather than within a single host. I'm fairly pessimistic, but that doesn't keep me from funding bcachefs development.



  • vladpetric
    replied
    Originally posted by fitzie View Post

    That article does talk about a major deficiency in btrfs, which is its RAID support, but fortunately you can use other layers for RAID and still use btrfs for its other features and capabilities. I don't think there's any perfect filesystem for Linux, but I think people running scared from btrfs are a bit behind the times and missing out on some great features like snapshots and compression. Of course, you can rock ZFS, but that's not perfect either, unfortunately, mostly due to licensing.

    Anyway, there are some documents that Facebook put out on their btrfs usage, but I found this video to be the most informative:

    https://youtu.be/U7gXR2L05IU
    That's a decent video, thanks!

    I don't see issues with licenses if you:

    1. Are either willing to build the module from source (DKMS works reasonably well), or just use Ubuntu (ZFS in Ubuntu is totally fine; this is what I use)
    2. Aren't bothered by the possible OSS licensing incompatibility yourself (I would note that the restrictions of the licenses apply to stuff that is, well, distributed, in source code or binary form; doing this mix at home without distribution doesn't count at all).

    Not accommodating mixed disk sizes is perhaps the one thing I don't like about ZFS (btrfs can do that).
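
    For anyone curious about the Ubuntu route mentioned in point 1, here is a minimal sketch. The pool name and device paths are placeholders for illustration; this needs root and a couple of spare disks:

    ```shell
    # Ubuntu packages the ZFS kernel module, so no DKMS build is needed:
    sudo apt install zfsutils-linux

    # Create a mirrored pool from two spare disks (placeholder device names):
    sudo zpool create tank mirror /dev/sdb /dev/sdc

    # Verify pool health:
    zpool status tank
    ```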



  • fitzie
    replied
    Originally posted by vladpetric View Post

    Sure, if Facebook published some report on larger-scale btrfs deployments, that'd be great (no, I don't think such a technical report exists, though if you know of one, not necessarily from FB, do share). In general, it's easy to measure performance (not that btrfs is great at that to begin with), but much harder to measure reliability. But yeah, if you have 1k boxes in a cluster and they all run btrfs, and perhaps another cluster in the same DC runs ext4, then that's a great start for comparing reliability and maintenance effort.

    Where I strongly disagree is on the weight of anecdotes - when the failure rate is estimated at, say, between 1 in 10 and 1 in 1000 (a pretty wide range, admittedly), a negative weighs a lot more than a positive. And yes, one does typically want that kind of reliability from a filesystem. Sure, one could argue that negatives are much more likely to be published.

    As for ancient - ummm, less than two years ago is not that ancient ... https://arstechnica.com/gadgets/2021...-filesystem/2/
    That article does talk about a major deficiency in btrfs, which is its RAID support, but fortunately you can use other layers for RAID and still use btrfs for its other features and capabilities. I don't think there's any perfect filesystem for Linux, but I think people running scared from btrfs are a bit behind the times and missing out on some great features like snapshots and compression. Of course, you can rock ZFS, but that's not perfect either, unfortunately, mostly due to licensing.

    Anyway, there are some documents that Facebook put out on their btrfs usage, but I found this video to be the most informative:
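
    The layering described above (RAID handled by a lower layer, btrfs on top) can be sketched like this, assuming two spare disks with placeholder device names; btrfs features like compression and snapshots still work as usual on the resulting filesystem:

    ```shell
    # Build a RAID1 array with mdadm instead of btrfs's own RAID code:
    sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

    # Put a single-device btrfs on top of the md device, mounted with compression:
    sudo mkfs.btrfs /dev/md0
    sudo mount -o compress=zstd /dev/md0 /mnt

    # btrfs snapshots remain available on top of the mdadm layer:
    sudo btrfs subvolume create /mnt/data
    sudo btrfs subvolume snapshot -r /mnt/data /mnt/data-snap
    ```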



  • vladpetric
    replied
    In general, when dealing with a large scale deployment - at least hundreds of nodes - then hardware failure is more or less a given. Do you need backups? Absolutely.

    What you don't want happening is the filesystem creating more problems on top of that ... and yes, it does happen, and it can create significant maintenance overhead.



  • vladpetric
    replied
    Originally posted by fitzie View Post

    Positive anecdotes are just as helpful as negative ones, especially since most of the btrfs horror stories are really old (although there was a bad bug causing corruption in the 5.2/5.3 kernels). As far as widespread datacenter deployments go, Facebook seems to be a heavy user and funds a lot of development. If people are avoiding btrfs out of concern about losing data, they should have backups, because there's no filesystem that will preserve data under all conditions.
    Sure, if Facebook published some report on larger-scale btrfs deployments, that'd be great (no, I don't think such a technical report exists, though if you know of one, not necessarily from FB, do share). In general, it's easy to measure performance (not that btrfs is great at that to begin with), but much harder to measure reliability. But yeah, if you have 1k boxes in a cluster and they all run btrfs, and perhaps another cluster in the same DC runs ext4, then that's a great start for comparing reliability and maintenance effort.

    Where I strongly disagree is on the weight of anecdotes - when the failure rate is estimated at, say, between 1 in 10 and 1 in 1000 (a pretty wide range, admittedly), a negative weighs a lot more than a positive. And yes, one does typically want that kind of reliability from a filesystem. Sure, one could argue that negatives are much more likely to be published.

    As for ancient - ummm, less than two years ago is not that ancient ... https://arstechnica.com/gadgets/2021...-filesystem/2/



  • fitzie
    replied
    Originally posted by vladpetric View Post
    Not my intent to start a flame war (though probably I am), but using one or two successful data points and implying/pretending that it will be good for everyone else is not a good idea. It could very well be that you were relatively lucky and nothing bad happened. Or your workloads are simply not stressful enough in terms of I/O to trigger bad situations.

    To make an extreme (yes, extreme) comparison: one could play Russian roulette and, after firing two blanks, conclude that it's totally fine.

    Or to quote a bad joke: 5 out of 6 scientists agree, Russian roulette is perfectly safe!

    Ummm, failures and failure models really don't work like that.

    Why do I even say this? Because there are plenty of horror stories with btrfs as well, especially when it comes to device failure, rebuilding the array, and so on. A datacenter deploying btrfs at scale is a much better data point than "it worked for me (and I'm the most important person in the world anyway ...), therefore it will work for you".
    Positive anecdotes are just as helpful as negative ones, especially since most of the btrfs horror stories are really old (although there was a bad bug causing corruption in the 5.2/5.3 kernels). As far as widespread datacenter deployments go, Facebook seems to be a heavy user and funds a lot of development. If people are avoiding btrfs out of concern about losing data, they should have backups, because there's no filesystem that will preserve data under all conditions.



  • fitzie
    replied
    Originally posted by guglovich View Post

    Yes, I remember. I wish they had prioritized this; it's a significant drawback.
    As posted elsewhere in the thread, this improvement was separate from extent tree v2 and was merged into 6.1 as the block group tree feature.
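
    For reference, with a new enough btrfs-progs (6.1+, if I recall correctly) the feature can be enabled at mkfs time, and an existing filesystem can be converted while unmounted; the device name below is a placeholder:

    ```shell
    # Enable the block group tree when creating a new filesystem:
    sudo mkfs.btrfs -O block-group-tree /dev/sdb

    # Or convert an existing (unmounted) filesystem in place:
    sudo btrfstune --convert-to-block-group-tree /dev/sdb
    ```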



  • vladpetric
    replied
    Not my intent to start a flame war (though probably I am), but using one or two successful data points and implying/pretending that it will be good for everyone else is not a good idea. It could very well be that you were relatively lucky and nothing bad happened. Or your workloads are simply not stressful enough in terms of I/O to trigger bad situations.

    To make an extreme (yes, extreme) comparison: one could play Russian roulette and, after firing two blanks, conclude that it's totally fine.

    Or to quote a bad joke: 5 out of 6 scientists agree, Russian roulette is perfectly safe!

    Ummm, failures and failure models really don't work like that.

    Why do I even say this? Because there are plenty of horror stories with btrfs as well, especially when it comes to device failure, rebuilding the array, and so on. A datacenter deploying btrfs at scale is a much better data point than "it worked for me (and I'm the most important person in the world anyway ...), therefore it will work for you".



  • guglovich
    replied
    Originally posted by cynic View Post

    they're working on it. Extent tree v2 should solve the issue, but it won't arrive anytime soon.
    Yes, I remember. I wish they had prioritized this; it's a significant drawback.

