Another Look At The Bcachefs Performance on Linux 6.7
Originally posted by waxhead:
I understand what you mean, but I still think it would be valuable to see how each filesystem copes when running on "ideal hardware", without any latency introduced by the hardware itself (SSD/HDD). I agree that your workload is the only thing that matters, but why exactly do you think it is useless? If filesystem X takes 10 seconds doing operation A and filesystem Y only takes 5 seconds, that tells you something about the efficiency of the algorithms being used. Yes, the results may be completely reversed on real hardware depending on access patterns, but since we are moving more and more towards storage devices that mostly behave like "regular RAM", I still think it would be a good test. I would love to hear more about why you think such a test is useless?
I do think tests in isolation are useless, because they will never be indicative of what a filesystem will actually be put through; there are thousands upon thousands of potential variables.
A few questions / points:
- Why was a 512-byte block size used for bcachefs but 4k for all the other filesystems? SSDs virtually always perform much better with a 4k block size (though I haven't personally tested the difference on bcachefs).
- This article doesn't mention whether the bcachefs `discard` option was enabled. For this to be a fair test, that option should probably be disabled. (With that option enabled, which it is by default, bcachefs essentially does a "trim" on its own as blocks are freed. Trim operations are usually pretty slow, especially on consumer SSDs. The other filesystems don't automatically trim blocks; the administrator needs to manually run something like `fstrim`, so the trim penalty wouldn't be included in their results.)
- This article doesn't mention what was done in between filesystems to reset the SSD back to its "factory" state. Without this, the filesystems tested first would have a huge advantage over the ones tested later, and you can't fix that just by waiting an hour or a day between tests. Something like `nvme format /dev/nvme0n1 -s 2` might work (but double-check for the specific drive). For SATA drives a security erase usually does the trick, and that `nvme format` command does something similar for NVMe drives. Or simply use a different, identical, brand-new SSD for each filesystem's tests. If you're not sure how to do a complete reset of the drive's state, something like `blkdiscard /dev/nvme0n1` would be much better than nothing. (But in any case, see the caveat below.)
- Because SSDs do garbage collection in the background, to get the most accurate results the tests should be run in the same order for all the filesystems, with no filesystem operations besides the tests themselves. The amount of time before and between each test should also be the same. I'd recommend using a script that includes the reset and every test so there's zero idle time between tests. Was that done here? Manually starting individual tests whenever you get around to it is a good way to get non-meaningful results (not that I have any reason to believe that happened here).
- The Flexible IO Tester (fio) results (up to 1.5M IOPS) suggest a test setup I wouldn't consider useful, because it only measures how quickly the SSD's controller can talk to its DRAM cache for very short bursts, not what the SSD can actually sustain over time. No SSD can really do 1.5M random 4k IOPS while actually writing to the flash. (SSD manufacturers advertise the same largely useless kind of number for their consumer SSDs, I guess because they assume regular consumers and reviewers don't know better. Enterprise SSD spec sheets usually show much more useful sustained IOPS, which is why enterprise SSD numbers can look a lot slower at first glance even though they're often much faster in reality.) IMHO, if you're going to go through the trouble of running fio tests, they should be set up with a random dataset large enough to completely overwhelm the SSD's DRAM and SLC caches, so they actually measure what the SSD can sustain over long periods of time.
- To make sure the benchmark numbers for each filesystem are as accurate as possible, the entire test run for all filesystems should be repeated multiple times, for example three times on three successive days. If your results for a particular filesystem aren't reasonably close on each day, then you know your test environment and controls need more work.
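Taken together, the reset-then-fixed-order advice above could be sketched as a single driver script. Everything concrete here is an assumption to adapt: the device path (`/dev/nvme0n1`), the mount point, the filesystem list, and the fio parameters (`--size=200G` is just meant to be large enough to blow past the drive's DRAM and SLC caches). By default it only echoes the commands (dry run), since they are destructive:

```shell
#!/bin/sh
# Hypothetical benchmark driver: same reset and same test order for every
# filesystem, with no idle gaps between steps.
DEV=/dev/nvme0n1            # assumption -- adjust for your hardware
RUN=${RUN:-echo}            # dry run by default; set RUN= (empty) to execute

for FS in ext4 xfs btrfs bcachefs; do
    # 1. Reset the SSD to a known state (secure format; discard as fallback).
    $RUN nvme format "$DEV" -s 2 || $RUN blkdiscard "$DEV"

    # 2. Create the filesystem, with a 4k block size across the board.
    case "$FS" in
        ext4)     $RUN mkfs.ext4 -b 4096 "$DEV" ;;
        xfs)      $RUN mkfs.xfs -b size=4096 "$DEV" ;;
        btrfs)    $RUN mkfs.btrfs "$DEV" ;;
        bcachefs) $RUN bcachefs format --block_size=4096 "$DEV" ;;
    esac

    # 3. Mount and run an identical, sustained test sequence every time.
    $RUN mount "$DEV" /mnt
    $RUN fio --directory=/mnt --name=sustained --rw=randwrite --bs=4k \
             --size=200G --time_based --runtime=1800 \
             --ioengine=libaio --iodepth=32 --direct=1
    $RUN umount /mnt
done
```

Running the whole thing from one script also makes the repeat-on-successive-days comparison meaningful, since the timing between steps stays constant.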
Originally posted by waxhead:
Personally I think filesystem tests should be performed on a block device in RAM.
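For reference, a RAM-backed block device like the one suggested above can be set up with the kernel's `brd` module; the size, filesystem, and mount point below are assumptions (note that `rd_size` is in KiB). The script echoes the commands by default since they need root:

```shell
#!/bin/sh
# Hypothetical setup for benchmarking a filesystem on a RAM-backed block
# device, so no SSD/HDD latency enters the picture.
RUN=${RUN:-echo}      # dry run by default; set RUN= (empty) to execute as root
SIZE_KIB=8388608      # brd's rd_size is in KiB, so this is an 8 GiB RAM disk

$RUN modprobe brd rd_nr=1 rd_size="$SIZE_KIB"   # creates /dev/ram0
$RUN mkfs.ext4 -b 4096 /dev/ram0
$RUN mount /dev/ram0 /mnt
# ... run the benchmark suite against /mnt here ...
$RUN umount /mnt
$RUN rmmod brd        # frees the RAM again
```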
Originally posted by Berniyh:
Really? I can think of many things that made Linux popular, but this isn't one of them.
I wouldn't even agree that it's really part of the core development technique for most tools.
I mean … one of the core components in the Linux ecosystem was/is X11 and that fails hugely in terms of KISS.
Last edited by cj.wijtmans; 30 November 2023, 03:35 PM.
Originally posted by hyperchaotic:
And by a monolithic beast you mean a highly modular init and runtime management/admin system comprising many optional components, each with their own purpose.
I'm a bit older than probably most of you here so I actually know and remember from my own professional experience what made Unix (and now Linux) such a force to be reckoned with over the last 40+ years. It really was in large part due to the KISS (Keep It Simple, Sir) philosophy. For Unix and Linux that boiled down to the idea that each program does exactly one thing and does it well. In practice this allowed vendors of integrated solutions to customize their systems to their specific needs in a way that's not as easy with more monolithic, tightly integrated software stacks. Need a `login` with different features? Just replace it with a different one without having to worry too much about it breaking half the system. Same thing for `syslogd`, the initrd system and lots of other things that have been and are being consumed by Systemd.
Personally I haven't figured out what problem(s) Systemd is supposed to solve that no other system can in the more traditional way. Which is part of the reason why I run Alpine Linux (with OpenRC) on my personal servers and Artix Linux (with OpenRC) on my personal workstations. Obviously professionally I do have to deal with Systemd, though.
It's a matter of personal preference and people can run whatever they want. But it does annoy me a little that pro-Systemd people often appear to go out of their way to make it much more difficult for useful software to be used without Systemd seemingly for no real benefit to the users. Obviously, that could simply be a matter of erroneous perception but it does sometimes seem that way.
Originally posted by Berniyh:
Well, it failed me 3 times in the last 10-15 years. btrfs, so far, has not yet failed me in roughly the last 10 years. So now what?
Of course, nothing can actually replace backups other than more backups.
But, as an ext4 user, how do you even know that you have faulty data and that you need your backup?
tbh, (data) checksumming support should be a standard thing in filesystems today.
(Edit: and yes, that's what happened to me. Faulty data I noticed by pure coincidence.)
Though to be fair, both losses were like 10 years ago. I haven't had any problems with btrfs in 10 years.
Last edited by carewolf; 30 November 2023, 04:47 PM.
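As a rough sketch of what the "how would an ext4 user even notice faulty data" point asks for: on filesystems without data checksums, a userspace manifest of SHA-256 hashes can be built after known-good writes and re-checked later to catch silent corruption. The directory and manifest paths here are placeholders:

```shell
#!/bin/sh
# Hypothetical userspace stopgap for filesystems without data checksumming
# (e.g. ext4): record hashes once, re-verify later to detect bit rot.
DATA=${DATA:-/mnt/data}             # directory to protect -- an assumption
MANIFEST=${MANIFEST:-manifest.sha256}

# Build (or rebuild) the manifest after writes you trust:
build() { (cd "$DATA" && find . -type f -exec sha256sum {} +) > "$MANIFEST"; }

# Verify: exits non-zero and names any file whose contents have changed.
check() { (cd "$DATA" && sha256sum --quiet -c) < "$MANIFEST"; }
```

This only detects corruption rather than preventing it, and only for data at rest since the last `build`; a checksumming filesystem like btrfs or bcachefs does the equivalent on every read.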