Bcachefs File-System Plans To Try Again To Land In Linux 6.6


  • timofonic
    replied
    Originally posted by mdedetrich

    I spent about an hour reading all of the LKML threads, and there are definitely politics/ego factors at play there.

    Also, I meant bus factor; that was a typo.
    Something smells bad on LKML, it seems.
    Last edited by timofonic; 15 July 2023, 05:09 PM.

  • mdedetrich
    replied
    Originally posted by timofonic

    Politics? What kind of politics? Politics shouldn't influence Linux kernel development; only technical merit should.

    I don't get what you mean by "bud factor". If it's about weed, that's not so bad. I consider it a personal choice, and it shouldn't influence Bcachefs's technical merits. It shouldn't even if it were something worse, anyway.

    If the filesystem code in the Linux kernel is a mess, it should be improved instead of complaining about it because another filesystem is being added.

    What other kernel changes are needed that are so important right now?
    I spent about an hour reading all of the LKML threads, and there are definitely politics/ego factors at play there.

    Also, I meant bus factor; that was a typo.

  • mdedetrich
    replied
    Originally posted by kreijack

    What you wrote is incorrect. There is interest in merging bcachefs. Even a BTRFS developer (Josef Bacik) showed interest in this merge.

    "concerns about Kent being bud factor"
    Yes, Kent was sometimes too combative, but this is not the real problem (or at least he is not alone in this behavior). The real problem was that he fought the "wrong battles". More below.

    "The current fs code in Linux apparently being a bit of a mess..."
    To me it seems reasonable to ask not to increase the complexity of the existing code.

    "For an optimal bcachefs merge some other kernel specific changes are needed"
    Generally speaking, this was not the problem. Of course there are concerns about some specific changes, and it was suggested that Kent first work on getting the preparatory patches accepted and THEN send the pull request for bcachefs, which at that point should be self-contained.
    The "wrong battle" above is related to the preparatory patches. Even reiserfs-v4 had the same fate: it had to leave some features out of the pull request to allow the merge. It was very promising, but when the developer disappeared there was nobody (with the exception of a few people) working on it. Ted Ts'o recalled that even btrfs was self-contained when it was merged into Linux.

    The real points are:
    - Linux is no longer a hacking camp where everyone can do whatever they want; it is an industrial asset on which multi-million dollar businesses depend. So it is no longer enough for a project to be merely interesting to justify touching some areas. And to me it is sane to be a bit conservative in some areas.
    - bcachefs has to prove its value: yes, it attracts interest, but in its current state it is far from being a killer application. Nobody uses it, so it is unknown what will happen when it is used by thousands of people. What everyone expects is that a lot of bugs will bubble up (this is not a problem; it is just the natural path of development). To me this is the major concern. The landscape is now very different from when zfs appeared: at that time, having an integrated volume manager supporting different raid profiles plus snapshot capabilities was enough. Today these are the minimum requirements for a new filesystem. Moreover, zfs and btrfs proved to be very complex and to require a dedicated team to maintain, and even so their development speed has decreased a lot, with the focus shifting to improving reliability.
    The widespread adoption of SSDs decreases the importance of tiering (which is one of the biggest advantages of bcachefs). And bcachefs still doesn't have a valid equivalent of a raid5/6 implementation like btrfs.
    All of this means that bcachefs is promising, but it is not far better than btrfs or zfs (from a capabilities point of view), and it has yet to prove that it can meet expectations.
    For better or worse, btrfs and zfs have teams that work on them, and it is not clear whether anyone will pay for a bcachefs team.

    None of the points above prevents the merging of bcachefs; they only make the merge less interesting from the other developers' point of view. It is up to Kent to work with them to lower the barrier to the merge. It is possible, and a "lower profile" will simplify the process. Then, once bcachefs is merged and has been able to prove its capabilities, Kent will be allowed to make more invasive changes.
    I think you mistook the intention of my post and its context. My post was responding to someone else who claimed that what I said earlier was wrong, and who used the current concerns about merging bcachefs as evidence.

    The whole point of my post was that while there may be concerns, they are largely not technical and are not that hard to address.

  • kreijack
    replied
    Originally posted by mdedetrich

    If you actually read the mailing list, it's not being rejected because of the technical points being outlined.

    Rather, the reasons for rejecting this boil down to:
    • politics
    • concerns about Kent being bud factor
    • The current fs code in Linux apparently being a bit of a mess and this merge request is triggering people's concerns about the mess of this generic fs code
    • For an optimal bcachefs merge some other kernel specific changes are needed
    What you wrote is incorrect. There is interest in merging bcachefs. Even a BTRFS developer (Josef Bacik) showed interest in this merge.

    "concerns about Kent being bud factor"
    Yes, Kent was sometimes too combative, but this is not the real problem (or at least he is not alone in this behavior). The real problem was that he fought the "wrong battles". More below.

    "The current fs code in Linux apparently being a bit of a mess..."
    To me it seems reasonable to ask not to increase the complexity of the existing code.

    "For an optimal bcachefs merge some other kernel specific changes are needed"
    Generally speaking, this was not the problem. Of course there are concerns about some specific changes, and it was suggested that Kent first work on getting the preparatory patches accepted and THEN send the pull request for bcachefs, which at that point should be self-contained.
    The "wrong battle" above is related to the preparatory patches. Even reiserfs-v4 had the same fate: it had to leave some features out of the pull request to allow the merge. It was very promising, but when the developer disappeared there was nobody (with the exception of a few people) working on it. Ted Ts'o recalled that even btrfs was self-contained when it was merged into Linux.

    The real points are:
    - Linux is no longer a hacking camp where everyone can do whatever they want; it is an industrial asset on which multi-million dollar businesses depend. So it is no longer enough for a project to be merely interesting to justify touching some areas. And to me it is sane to be a bit conservative in some areas.
    - bcachefs has to prove its value: yes, it attracts interest, but in its current state it is far from being a killer application. Nobody uses it, so it is unknown what will happen when it is used by thousands of people. What everyone expects is that a lot of bugs will bubble up (this is not a problem; it is just the natural path of development). To me this is the major concern. The landscape is now very different from when zfs appeared: at that time, having an integrated volume manager supporting different raid profiles plus snapshot capabilities was enough. Today these are the minimum requirements for a new filesystem. Moreover, zfs and btrfs proved to be very complex and to require a dedicated team to maintain, and even so their development speed has decreased a lot, with the focus shifting to improving reliability.
    The widespread adoption of SSDs decreases the importance of tiering (which is one of the biggest advantages of bcachefs). And bcachefs still doesn't have a valid equivalent of a raid5/6 implementation like btrfs.
    All of this means that bcachefs is promising, but it is not far better than btrfs or zfs (from a capabilities point of view), and it has yet to prove that it can meet expectations.
    For better or worse, btrfs and zfs have teams that work on them, and it is not clear whether anyone will pay for a bcachefs team.

    None of the points above prevents the merging of bcachefs; they only make the merge less interesting from the other developers' point of view. It is up to Kent to work with them to lower the barrier to the merge. It is possible, and a "lower profile" will simplify the process. Then, once bcachefs is merged and has been able to prove its capabilities, Kent will be allowed to make more invasive changes.

  • timofonic
    replied
    Originally posted by mdedetrich

    If you actually read the mailing list, it's not being rejected because of the technical points being outlined.

    Rather, the reasons for rejecting this boil down to:
    • politics
    • concerns about Kent being bud factor
    • The current fs code in Linux apparently being a bit of a mess and this merge request is triggering people's concerns about the mess of this generic fs code
    • For an optimal bcachefs merge some other kernel specific changes are needed
    Politics? What kind of politics? Politics shouldn't influence Linux kernel development; only technical merit should.

    I don't get what you mean by "bud factor". If it's about weed, that's not so bad. I consider it a personal choice, and it shouldn't influence Bcachefs's technical merits. It shouldn't even if it were something worse, anyway.

    If the filesystem code in the Linux kernel is a mess, it should be improved instead of complaining about it because another filesystem is being added.

    What other kernel changes are needed that are so important right now?

  • mdedetrich
    replied
    Originally posted by woddy

    What you say is blatantly at odds with what the article claims, since they are trying once again to add it to the kernel tree after it has been repeatedly rejected.
    However, I will only give my opinion at the end, when it is available and stable; at the moment it is not, regardless of the geniuses who are developing it.
    If you actually read the mailing list, it's not being rejected because of the technical points being outlined.

    Rather, the reasons for rejecting this boil down to:
    • politics
    • concerns about Kent being bud factor
    • The current fs code in Linux apparently being a bit of a mess and this merge request is triggering people's concerns about the mess of this generic fs code
    • For an optimal bcachefs merge some other kernel specific changes are needed
    Last edited by mdedetrich; 14 July 2023, 07:07 AM.

  • waxhead
    replied
    Originally posted by ehansin
    Interesting as well. More for me to look into and try to understand. I keep digging deeper and deeper into storage stuff. If I am reading this correctly, using the 2 + 2 disk example, if one disk fails, Btrfs could rebuild the duplicate blocks/chunks by copying them onto the remaining three disks, so that there would again be two copies of each block/chunk spread across those three disks. Of course, that depends on there being enough space available to do so. Good or bad, that probably depends on your take. But different, that is for sure.
    Yes, for BTRFS you only need 2 disks working, regardless of whether it's configured as "RAID"1 or "RAID"10 ("RAID"10 will degrade to "RAID"1 in BTRFS if needed).
    Let's say you have a RAID10 with 2x 4TB drives and 2x 500GB drives. If one of the 500GB drives goes "kebab", you may still have enough space on the remaining drives to keep your array healthy with only 3 drives.
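
As a rough illustration of the "enough space" question, the sketch below estimates usable capacity for a two-copy mirror ("RAID"1-style) layout across drives of different sizes. The rule of thumb it uses (the smaller of half the total, or the total minus the largest drive) is an assumption added here, not something stated in the post, and it ignores metadata overhead and chunk allocation granularity. The function name `raid1_usable_gb` is hypothetical.

```shell
#!/bin/sh
# Rough usable capacity of a two-copy mirror across mixed-size drives.
# Assumption: every block needs two copies on two different drives, so the
# usable space is limited both by half the total capacity and by how much
# the other drives can mirror for the largest drive.
raid1_usable_gb() {
    total=0
    largest=0
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then
            largest=$size
        fi
    done
    half=$((total / 2))
    others=$((total - largest))
    if [ "$half" -lt "$others" ]; then
        echo "$half"
    else
        echo "$others"
    fi
}

# Drive sizes in GB, matching the 2x 4TB + 2x 500GB example.
echo "all 4 drives:      $(raid1_usable_gb 4000 4000 500 500) GB"
echo "one 500GB missing: $(raid1_usable_gb 4000 4000 500) GB"
```

Under these assumptions, the array could stay fully mirrored on three drives as long as the stored data fits within the second figure.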

    Now, the biggest problem I see with BTRFS is that there is no automation for failure cases, which means the filesystem does not automatically drop (or mark as failed) any disk from the array. It is not possible to have spare devices either, which really makes sense when you think about it, but what is really missing is the ability to reserve spare space, which would make sense for any automated recovery. That is partly mitigated by the "RAID"1c3 and "RAID"1c4 profiles, which could be considered "active" spare space with the benefit of an additional read boost. That being said, recovery is a manual operation in BTRFS as of now. I would also think it is debatable whether recovery policies should be handled by a user space daemon or by the filesystem itself.

    And to try to keep this slightly on-topic: as far as I know, BcacheFS does not (yet) support multiple mirrors or any automated recovery in case of a failed drive. ZFS, on the other hand, I believe does.

  • tesfabpel
    replied
    Originally posted by EphemeralEft
    I hadn't heard about "same_cpu_crypt" and "submit_from_crypt_cpus", so thank you for mentioning that.
    I discovered the flags by reading this blog article https://blog.cloudflare.com/speeding...sk-encryption/ and then visiting the parameters page of the official documentation https://gitlab.com/cryptsetup/crypts.../wikis/DMCrypt .

    Frankly, I haven't tested another FS on top of LUKS, but I'd find it strange if BTRFS had such an impact on dm-crypt...

  • fitzie
    replied
    Originally posted by kreijack

    IIRC the discussion was not about this specific code, but about the fact that this code required allocating a page that was both writable and executable. To get this kind of page, Kent *extended* an existing kernel API. The kernel developers complained about this part. This is related to the "preparatory patches". At the time there were a lot of NACKs against allowing a page to be writable and executable at the same time in filesystem code.

    I don't know how Kent solved this. IMHO it is better to get rid of this kind of optimization *now* to simplify the inclusion of the code.
    This is the thread where Kent discovers there's a module_alloc function that eliminates the need for bcachefs to bring back vmalloc_exec.

    Here is a thread where there seems to be a generic JIT facility being developed in the kernel, which is what Kent plans on using.

    There was some discussion that bcachefs shouldn't worry about optimizing things so soon, but Kent has responded with two facts: 1) this filesystem isn't brand new, it's been around for many years now; 2) performance is actually important.

    Of course no one expects bcachefs to be fully optimized when it first lands in the kernel, but if bcachefs performance were terrible, no one would give it much attention. Obviously what will make or break a filesystem is its reliability, but the interest in bcachefs is that it will provide the features that mostly exist inside zfs or btrfs, so performance is an obvious differentiator, and I'm glad there's a path forward for him to do this.

  • EphemeralEft
    replied
    Originally posted by tesfabpel

    Sorry for the off-topic, but I'd like to ask you a question...

    I'm using BTRFS on top of LUKS 2 on two (surely not weak) PCs (one running Fedora 38 and the other Arch Linux). On both, when the system is doing a lot of I/O on the main disk (both have an SSD), there are sometimes hiccups (even of several seconds!) in the UI and also in the responsiveness of apps (so it's not just the UI that's I/O starved)...

    I've read about some cryptsetup options that tweak performance by disabling features that were needed when they were introduced but shouldn't be needed anymore. Sadly, they didn't fix the problem 100%...

    These are the flags I've used: discards same_cpu_crypt submit_from_crypt_cpus no_read_workqueue no_write_workqueue

    Code:
    # cryptsetup status luks-b8d62e8e-...
    /dev/mapper/luks-b8d62e8e-... is active and is in use.
    type: LUKS2
    cipher: aes-xts-plain64
    keysize: 512 bits
    key location: keyring
    device: /dev/sda3
    sector size: 512
    offset: 32768 sectors
    size: 465514496 sectors
    mode: read/write
    flags: discards same_cpu_crypt submit_from_crypt_cpus no_read_workqueue no_write_workqueue
    Have you tried anything? Are you affected?

    Thanks.
    I have that problem too. For me, IO occasionally hangs, and any process doing anything with the disk (reading or writing) also freezes until IO resumes. It hangs for 1-5 minutes and usually happens twice a day. I've been trying to solve this problem for about a year and unfortunately still haven't found a solution. I only use the CLI, so I'm not sure what it does to GUI applications.

    I use CgroupsV2 extensively, and I've also found that (what should be) low-priority processes still slow down (what should be) high-priority processes. Maybe they're related, maybe not, but switching to Linux 6.3 reduced the hanging (the changelog mentions something about propagating priorities through device-mapper, which all dm-* technologies use).

    In addition to using Cgroups (I create them with systemd slices; that's basically what they are), I'd recommend switching the IO scheduler on all applicable block devices to BFQ. That's needed for ionice to work, and probably for Cgroups too. The change doesn't persist across reboots, so I'd add a script that runs on every boot. If you don't want to take the Cgroup plunge, at least start using nice and especially ionice. I also tried disabling the DM-Crypt workqueues and it didn't help, so I changed it back. If you don't find that it helps, I'd re-enable them. I'd also recommend switching to 384 (192) bit AES. 512 (256) bit AES is way overkill. That's probably not the cause of your issue, but it certainly isn't helping.
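
A minimal sketch of what such a boot script might look like, assuming the usual sysfs layout where each block device exposes its scheduler at `/sys/block/<device>/queue/scheduler`. The function name `set_bfq_scheduler` and its optional directory argument are hypothetical, added here only so the logic can be tried against a scratch directory first.

```shell
#!/bin/sh
# Select BFQ on every block device found under a sysfs block directory.
# The directory argument is an illustrative addition; it defaults to
# /sys/block, which is where the kernel normally exposes block devices.
set_bfq_scheduler() {
    sys_block=${1:-/sys/block}
    for sched in "$sys_block"/*/queue/scheduler; do
        # Skip devices without a writable scheduler file
        # (also covers the case where the glob matched nothing).
        [ -w "$sched" ] || continue
        # The file lists the schedulers the kernel offers for this device;
        # only switch when bfq is among them.
        if grep -qw bfq "$sched"; then
            echo bfq > "$sched"
        fi
    done
}
```

Calling `set_bfq_scheduler` with no arguments as root from whatever mechanism runs scripts at boot would apply it to the real devices. If `bfq` does not appear in the scheduler list, the module may need to be loaded first (for example with `modprobe bfq`), which this sketch does not attempt.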

    There was also a kernel bug involving TPM and hanging. I'm not sure if it's related, and disabling it (TPM and fTPM) didn't help, but it's worth a try. I hadn't heard about "same_cpu_crypt" and "submit_from_crypt_cpus", so thank you for mentioning that.
