Originally posted by drag
And of course, if the benchmark isn't testing with the same size files as what you plan to use (small files? typical file sizes as seen by Open Office? MP3 sized files? Video files?), the results may also not reflect what you'll actually see.
Finally, for all file systems, how full the file system is during the test will also make a huge difference. It's very easy for file systems to exhibit good performance if you're only using the first 10-20% of the file system. But how does the file system perform when it's 50% full? 75% full?
I can tell you that ext4's performance can be quite bad if you take an ext4 file system which is 1 or 2TB and fill it to 95% with files that are mostly 8-64 megabytes in size, especially if the disk fills up gradually while many 8-64 meg files are being deleted and replaced with other 8-64 meg files along the way. Since this workload (using ext4 as a backend store for cluster file systems) is one that I'm paid to care about at $WORK, I'm currently working on an extension to ext4 to help address this particular case.
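As a rough illustration of the kind of aging workload described above (this is not Ted's actual test harness), the sketch below fills a scratch file system with 8-64 MB files, deletes and replaces a fraction of them on each pass, and reports write throughput as utilization climbs toward 95%. The mount point, churn fraction, and reporting thresholds are assumptions chosen purely for illustration.

#!/usr/bin/env python3
# Sketch of an aging workload: fill a scratch file system with 8-64 MB files,
# periodically delete and replace some of them, and record write throughput
# as utilization climbs. The paths and thresholds below are assumptions.

import os
import random
import shutil
import time

MOUNT_POINT = "/mnt/scratch"                  # assumed: a dedicated test file system
TARGET_DIR = os.path.join(MOUNT_POINT, "aging-test")
MIN_MB, MAX_MB = 8, 64                        # file sizes described in the post
CHURN_FRACTION = 0.10                         # delete/replace 10% of files per pass
REPORT_STEPS = [0.50, 0.75, 0.90, 0.95]       # utilization points to report at


def utilization(path):
    """Fraction of the file system at `path` currently in use."""
    st = shutil.disk_usage(path)
    return st.used / st.total


def write_file(path, size_bytes, chunk=1 << 20):
    """Write `size_bytes` of data, fsync it, and return elapsed seconds."""
    buf = os.urandom(chunk)
    start = time.monotonic()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = f.write(buf[:min(chunk, remaining)])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())                  # force the data out so we time real I/O
    return time.monotonic() - start


def main():
    os.makedirs(TARGET_DIR, exist_ok=True)
    files = []
    next_report = 0
    seq = 0

    while utilization(MOUNT_POINT) < 0.95:
        # Churn: delete a slice of existing files so the free space the
        # allocator sees becomes progressively more fragmented.
        victims = random.sample(files, int(len(files) * CHURN_FRACTION)) if files else []
        for v in victims:
            os.unlink(v)
            files.remove(v)

        wrote = 0
        elapsed = 0.0
        for _ in range(len(victims) + 32):    # replace victims plus some net growth
            size = random.randint(MIN_MB, MAX_MB) * (1 << 20)
            path = os.path.join(TARGET_DIR, f"f{seq:08d}.dat")
            seq += 1
            elapsed += write_file(path, size)
            wrote += size
            files.append(path)

        util = utilization(MOUNT_POINT)
        if next_report < len(REPORT_STEPS) and util >= REPORT_STEPS[next_report]:
            mbps = (wrote / (1 << 20)) / elapsed if elapsed else 0.0
            print(f"{util:5.1%} full: {mbps:8.1f} MB/s write throughput")
            next_report += 1


if __name__ == "__main__":
    main()

Run against a file system you can afford to fill; the throughput numbers at 50%, 75%, and 95% utilization are the interesting comparison, not the absolute values.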
Measuring how a file system's performance falls off as disk utilization increases is hard, yes. But the question is whether the benchmarking is being done primarily for entertainment's sake (i.e., to drive advertising dollars, like a NASCAR race without the car crashes), or to help users make valid decisions about which file system to use, or to help drive improvements in one or more file systems. Depending on your goals, how you approach the file system benchmarking task will be quite different.
-- Ted