Bcachefs Reining In Bugs: Test Dashboard Failures Drop By 40% Over Last Month

  • noigai
    replied
    Originally posted by lyamc
    I like it how every time someone tries to demonstrate some form of data eating, all they do is demonstrate how reliable/robust the filesystem is.
    Yes, it's my opinion too, we shouldn't mix the two.

    I've had all my home storage in ZFS since 2008, and after reading about actual disk corruption rates, I thought that checksumming everything was overkill.

    However, over all these years, I've had 3 cases of silent corruption on different computers, which were detected by ZFS and reported to me by email, and scrubs would always find the bad data.

    As I was away for a few months, I was unable to troubleshoot the problem, and ZFS mirroring and checksumming kept everything working. When I finally got to troubleshoot the issue, it was fixed by replacing the SATA cables. There were no errors in the kernel logs.

    At work, we used to have a backup on our NTFS SAN for "corruption prevention". I always asked the tech team how we would detect corruption, and the answer was "if some client reports wrong data". At the end of the project, as I was archiving, I saw two XML files of 12 and 33 MB each, when they should have been a few KB. The files were corrupt and unreadable, and yet nothing in the SAN, NTFS, or Windows had detected anything.

    For me the situation is clear: even if the disks work fine, there's always the possibility of a failure somewhere else, which is more common and is detected by checksumming. So *my* data will always be in a checksummed file system.

    I really look forward to bcachefs being production ready and tested in a few years' time.


  • Quackdoc
    replied
    Originally posted by curfew
    Weird that you say btrfs is worse than that, but I've never encountered these issues. Last time I had an issue with btrfs was a couple years ago when I heavily undervolted my laptop and the system would crash during initrd rebuild, which resulted in total loss of all files modified during the system upgrade. Won't blame the fs because essentially it was a hardware failure.
    The worst one I had the unfortunate experience of was a bad root btree issue that corrupted the entire drive minutes before a backup was scheduled, causing me to lose a day's worth of very productive work. But I have had many devices running btrfs, and all of them, without fail, had data corruption in the end; the last one was 2 years ago now, I think. After that I was done. Btrfs has marinated too long and I don't trust it.


  • lyamc
    replied
    I like it how every time someone tries to demonstrate some form of data eating, all they do is demonstrate how reliable/robust the filesystem is.


  • varikonniemi
    replied
    Originally posted by curfew
    Weird that you say btrfs is worse than that, but I've never encountered these issues. Last time I had an issue with btrfs was a couple years ago when I heavily undervolted my laptop and the system would crash during initrd rebuild, which resulted in total loss of all files modified during the system upgrade. Won't blame the fs because essentially it was a hardware failure.
    Weird that you take your experience as some standard.

    The point here was that there is no documented case where bcachefs has eaten a user's data. Btrfs used to be notorious for eating data, even for trivial things like running out of disk space.

    And no, hardware failure should not eat the filesystem. At most the data that is being written when the failure happens. That's why critical data structures have multiple copies, like the superblock. So that you can always recover.
    Last edited by varikonniemi; 03 November 2024, 01:37 PM.


  • curfew
    replied
    Originally posted by Quackdoc
    Great work with bcachefs. I'm okay with downtime bugs; I'm not okay with losing data. On that score, bcachefs has somehow already managed to be better than btrfs has been for me. I have had a few issues on my Artix install: for some reason bcachefs fails to remount until I get to the "continue system startup anyways" stage, and then it works perfectly fine. No bloody clue what that is about, but that's no more than a nuisance.

    I did have a real downtime bug the other day, however, when I upgraded kernels and had an unclean power-off. For some reason fsck failed to kick in and would just hang. I booted an Arch Linux ISO I had hanging around (maybe 2-3 kernels old now) and ran bcachefs fsck, which ran fine. I rebooted the PC, the boot-time fsck ran fine, and I booted with no problems. A bit weird, but hey, no data loss.
    Weird that you say btrfs is worse than that, but I've never encountered these issues. Last time I had an issue with btrfs was a couple years ago when I heavily undervolted my laptop and the system would crash during initrd rebuild, which resulted in total loss of all files modified during the system upgrade. Won't blame the fs because essentially it was a hardware failure.


  • Raka555
    replied
    Originally posted by Quackdoc
    Great work with bcachefs. I'm okay with downtime bugs; I'm not okay with losing data. On that score, bcachefs has somehow already managed to be better than btrfs has been for me. I have had a few issues on my Artix install: for some reason bcachefs fails to remount until I get to the "continue system startup anyways" stage, and then it works perfectly fine. No bloody clue what that is about, but that's no more than a nuisance.

    I did have a real downtime bug the other day, however, when I upgraded kernels and had an unclean power-off. For some reason fsck failed to kick in and would just hang. I booted an Arch Linux ISO I had hanging around (maybe 2-3 kernels old now) and ran bcachefs fsck, which ran fine. I rebooted the PC, the boot-time fsck ran fine, and I booted with no problems. A bit weird, but hey, no data loss.
    I am not okay with downtime or eating data, but I am okay with waiting another year.


  • Quackdoc
    replied
    Great work with bcachefs. I'm okay with downtime bugs; I'm not okay with losing data. On that score, bcachefs has somehow already managed to be better than btrfs has been for me. I have had a few issues on my Artix install: for some reason bcachefs fails to remount until I get to the "continue system startup anyways" stage, and then it works perfectly fine. No bloody clue what that is about, but that's no more than a nuisance.

    I did have a real downtime bug the other day, however, when I upgraded kernels and had an unclean power-off. For some reason fsck failed to kick in and would just hang. I booted an Arch Linux ISO I had hanging around (maybe 2-3 kernels old now) and ran bcachefs fsck, which ran fine. I rebooted the PC, the boot-time fsck ran fine, and I booted with no problems. A bit weird, but hey, no data loss.


  • KernelCrasher
    replied
    Originally posted by Siuoq
    Ext2-3-4 work great for me
    I like all 3 of them except ext2 and ext3.


  • Siuoq
    replied
    Originally posted by varikonniemi

    Everyone laughed at the "provocative" catchphrase but as of now it seems very possible to be the first filesystem that has never and will never eat data while being mainline.
    Ext2-3-4 work great for me


  • varikonniemi
    replied
    Originally posted by niner

    Can we please bury this myth?


    Version Kernel: 2620221 Tools: 7af94e14b5a9945c28a3c34e58b920e81c696a53 Description If you enable prjquota at the format time and then try to make a snapshot, you would not be able to mount the fil...


    I mean yeah, if you define "not eaten any data" as "files don't tend to magically disappear from directories -- they show up, but then you get failures trying to stat() them" then maybe yes:
    https://www.reddit.com/r/bcachefs/co..._data_without/
    Nothing in there is a demonstration of "eating your data"; it's mainly just speculation from users who don't know what they are doing.

    As I said in my initial message, there is a complete difference between "a bug was hit, data is unavailable until fsck is extended to recover it" and "the data is gone / eaten". The second has not yet happened on bcachefs since it has been mainline.
    Last edited by varikonniemi; 01 November 2024, 11:26 AM.
