Originally posted by gotar View Post
Bcachefs Changes Rejected Reportedly Due To CoC, Kernel Future "Uncertain"
well, this section is on loop, see y'all in the next bcachefs post lol, hopefully it's one with better news
Originally posted by clipcarl View Post: There are a huge number of people running EXT4 and XFS on their personal computers, and the consensus over the years has been that while these filesystems don't have that new sexy filesystem smell, they are incredibly reliable and keep users' data reasonably safe even with crashes, bad hardware and power outages.
Sorry, but on "personal computers" you could just as well use FAT16 and see no problems for years.
What good is that potential theoretical reliability when so many people routinely disable CoW on their critical files (especially VM images and databases) because the performance impact is too much to bear?
Is your argument people doing stupid things, or a weakly performing CoW filesystem (btrfs)? First of all, you shouldn't put VM images on any filesystem at all; there's already a (usually different) filesystem inside the image. Databases (and I'm not talking about data bags like MySQL) have their own safety features and expect the admin to understand the stack. E.g. PostgreSQL WALs are a kind of journaling, and you can mount ext4 with data=writeback (see the sketch below).
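To make that last point concrete, here's a rough mount(2)-level sketch; the device path and mount point are made-up examples, while data=writeback is the actual ext4 option in question:
Code:
/* Sketch only: mounting ext4 with data=writeback for a database volume.
 * The device path and mount point below are hypothetical.
 */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* The last argument of mount(2) carries filesystem-specific options.
	 * For ext4, data=writeback skips journaling of file data (metadata is
	 * still journaled), trading crash consistency of data blocks for
	 * throughput - acceptable when the database's own WAL already provides
	 * the durability guarantees.
	 */
	if (mount("/dev/mapper/pgdata", "/var/lib/postgresql",
	          "ext4", MS_NOATIME, "data=writeback") != 0) {
		perror("mount");
		return 1;
	}
	return 0;
}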
The rest has already been said by mdedetrich; I won't repeat it.
Originally posted by waxhead View Post: kmalloc() with the __GFP_NOFAIL flag set means that the memory allocation CANNOT FAIL. I am no expert on the bcachefs source code at all, but this to me sounds like a rather sketchy choice that instead should have been handled with a proper error handler.
Originally posted by waxhead View Post: Yeah, I agree with you 100% - no memory allocation can be guaranteed to succeed, but despite this there is a flag for kmalloc() that will, if necessary, nuke other things to get that precious bit of memory requested.
And keep in mind that kmalloc() is for objects smaller than page size anyway (on my system that is 4K). If kmalloc() with __GFP_NOFAIL fails, there are in other words fewer than 4096 bytes available on (most?) x86-64 based systems, so what the heck are you going to do if it DID return zero? This stuff is for critical things, so the way I see it, if you need to call kmalloc() with __GFP_NOFAIL you are pretty screwed to have reached that critical state anyway.
But out of curiosity I ran a search through the kernel tree, and it seems a lot of filesystems actually use __GFP_NOFAIL: XFS, UDF, reiserfs, gfs2, f2fs, ext4 and erofs.
So why is this apparently a problem only for Kent and his bcachefs?!
Kent does not have a problem with GFP_NOFAIL. The entire flag is there because other filesystems used to "handle out-of-memory errors" by calling kmalloc in a loop, and it was decided that adding a flag that would cause kmalloc() to, essentially, run such a loop within itself is less ugly than open-coding that loop in every callsite.
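For illustration, a minimal kernel-style sketch of the two patterns being compared - this is not taken from bcachefs or any particular filesystem:
Code:
/* Illustrative only: not from bcachefs or any specific filesystem. */
#include <linux/slab.h>

/* The old pattern: every call site open-codes its own retry loop. */
static void *alloc_retry_open_coded(size_t size)
{
	void *p;

	do {
		p = kmalloc(size, GFP_KERNEL);
	} while (!p);	/* keep retrying; reclaim runs inside kmalloc() */

	return p;
}

/* The replacement: let the allocator run that loop internally. */
static void *alloc_nofail(size_t size)
{
	/* __GFP_NOFAIL makes the allocator retry indefinitely; the call does
	 * not return NULL, so no error path is needed at the call site.
	 */
	return kmalloc(size, GFP_KERNEL | __GFP_NOFAIL);
}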
Kent in fact needed a different flag, basically the opposite of GFP_NOFAIL (do not attempt direct reclaim), which was shot down by other maintainers because it would have conflicted with GFP_NOFAIL as used elsewhere in the kernel.
Originally posted by mdedetrich View Post: This is not what the issue is; obviously having backups is a good idea. What I am talking about is that if you have a ZFS partition on non-ECC memory and someone reads from it and gets the data from memory, the chance of delivering bad data is significant enough to note. Usually people don't see these issues with non-ECC memory because
Originally posted by mdedetrich View Post: 1. Data that is read from non-ZFS filesystems is often done with zero copying, so it doesn't go via RAM (which is where ECC matters)
Originally posted by clipcarl View Post: While bcachefs is definitely interesting and has some nice things about it, virtually all of its advantages can be replicated right now with the right combination of logical block layer devices and tried-and-true older filesystems. For example, taking the old-school layered approach of dm-verity + MD RAID + LUKS + LVM2 + XFS can do almost everything bcachefs can do now (and has planned for the future), and they can currently do it a lot faster and more reliably than bcachefs.
Yes, I currently run that 5-7 layer stack. No, it isn't at feature parity. I have to make serious compromises, including compression (VDO won't work under crypt; sometimes it's possible to replace it with user-level compression, e.g. ZSTD for dovecot Maildirs), CPU load (when double-encrypting is the only option), storage usage (when btrfs is the only solution and the write patterns cause extent fragmentation), encryption compromises (damn! enabling TRIM over thinly provisioned devices), RAID profiles and so on.
The biggest problem is that the layout needs to be considered carefully for each and every container host. I have literally hundreds of block devices in my systems due to the different demands for different storage usage patterns (MSSQL databases and package builders on XFS, mail and various permanent archives on btrfs, PostgreSQL/TSDB on ext4 or XFS).
About the only thing bcachefs (and ZFS) can do that's not as easy to implement efficiently using the layered approach is compression and that's more of a niche thing for enthusiasts like us than something that's really needed widely. And if you really, really want that you could add Red Hat's VDO to the layers (but it's very bloated and slow).
No one stores logs uncompressed (think of an enterprise-wide centralized logging server), and compressed ones are not handled natively by tools (sure, just zstdcat everything...).
VDO is not only slow but unusable below crypt. And for crypt not to waste CPU with N-way RAID (encrypting every replica separately), it needs to sit above RAID.
And MD/DM RAID is set for an entire volume; you cannot set it per file/directory, not even per btrfs subvolume, so for temporary data (caches, source code builds) you just need to set up another dozen devices (LV x 2, maybe without integrity or crypt, RAID0 - at least 3 new block devices).
And while having a one-stop shop for everything the way bcachefs and ZFS do can be convenient, it can also be a negative when you go all-in on this approach, as I found out professionally when we ran into performance issues with ZFS that couldn't be solved without replacing it.
You don't have to use all of the features at once, just the combination you actually need bound together. For example, I would keep LVM at the bottom layer, but use the FS combo for integrity, encryption, compression and RAID profiles (settable per file/directory, setfacl-style).
What's the difference between encryption and compression? Both of them just transform the data in transit. And encryption can provide integrity (full authentication actually).
Compression must obviously be done before encryption, and RAID should be below... but bit-rot prevention needs the RAID layer to understand integrity. Therefore:
integrity (authenticated by crypt) -> RAID (integrity-aware) -> encryption -> compression -> fs
Now, let's say I want some cache-like subvolume to be fast, i.e. RAID-0, not/weakly encrypted and uncompressed:
integrity(authenticated by crypt) -> RAID (integrity-aware and getfacl-aware) -> encryption (getfacl-aware) -> compression (getfacl-aware) -> fs
Good luck simulating this with LVM, VDO, MD, DM and current filesystems.
Remember that encryption must be per-volume (different keys and credentials), so often you cannot simply use FDE (well, you can, by just encrypting a second time somewhere above).
Originally posted by varikonniemi View Post
My sources are what Kent has written on the LKML.
Clueless is my judgement of how they stalled memory profiling for a year because they did not understand the difficulties of making it work, and thought they had a better way to do it, only to come to realize their way could not work. And the two instances where an mm developer went and changed Kent's code in mm without consulting him, which ended up breaking mm in subtle, hard-to-debug ways.
Originally posted by Old Grouch View Post
I'm not saying you are wrong, but people are likely to give more credence to what you write if you include references for the assertions you are making. If you don't, your writing resembles that of a 'clueless fanboy', which is to say, making denigratory assertions without backing justification or evidence.
I doubt that any kernel maintainers could be described as clueless, given the technically involved work they do.
Clueless is my judgement of how they stalled memory profiling for a year because they did not understand the difficulties of making it work, and thought they had a better way to do it, only to come to realize their way could not work. And the two instances where an mm maintainer went and changed Kent's code in mm without consulting him, which ended up breaking mm in subtle, hard-to-debug ways. Certainly worse than Kent's much-criticized breaking of the build on a rare arch.
Originally posted by fotomar View Post: I hope the whole diseased temple falls to the ground