I don't have the ability to mkfs a btrfs filesystem unless I go in search of the utilities and compile them myself.
I pulled the RHEL6 btrfs-progs source rpm, rebuilt it for CentOS 5, and made myself a btrfs filesystem on an LVM logical volume on an Intel SSD, then discovered that I didn't have the kernel module either. One rebuild and reboot later, I mounted my LV with default options and it just reported rw,relatime. I explicitly gave it -o ssd and that appeared in the list, but no discard. It only got mounted with the discard option when I explicitly passed that to mount. Maybe LVM is confusing it, but it didn't seem to pick up automatically that it was on an SSD here.
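For anyone who wants to repeat that check without eyeballing the mount output, here is a rough Python sketch that reads text in /proc/mounts format and reports which of the options you expected are missing. The device and mount point in the sample line are made up for illustration, and the helper names are mine, not part of any tool.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of option names for mount_point, or None if it is not listed."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last matching entry; strip "=value" from options like "commit=30".
            found = {opt.split("=", 1)[0] for opt in fields[3].split(",")}
    return found


def missing_options(mounts_text, mount_point, wanted=("ssd", "discard")):
    """List the wanted options that are not active on mount_point."""
    opts = mount_options(mounts_text, mount_point)
    if opts is None:
        raise LookupError(f"{mount_point} is not mounted")
    return [w for w in wanted if w not in opts]


if __name__ == "__main__":
    # Hypothetical sample line; on a real system read open("/proc/mounts").read() instead.
    sample = "/dev/mapper/vg0-test /mnt/test btrfs rw,relatime,ssd 0 0\n"
    print(missing_options(sample, "/mnt/test"))
```

With the sample line, which mirrors the case where only -o ssd was passed, the sketch reports discard as the missing option.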
Try an older kernel from before the semaphore removal, otherwise known as the big kernel lock removal. It's difficult to ascertain whether a performance loss comes from scheduler changes or from file-system kernel modifications.
I don't know if you'll see this, Michael, because there have been so many posts on the topic already, but for graphs such as these, would it be difficult to put the names within the bars? I find it somewhat frustrating going back and forth matching names to colors, especially when there are multiple datasets on the same plot.
I also find it hard to work out which is the best solution. I think it would be best to order results with best performance at the top. The current disorder is infinitely harder to analyse.
This is multi-dimensional data. You have disk type on one dimension and fs type on the other. Since it's being presented in a one dimensional form, you can't order by a value effectively.
Arguably a bubble plot would give information in a slightly more consistent way, but I think we'd spend more time explaining the interpretation. Michael and I are always talking visualization in the background.
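To illustrate the two-dimensional point, here is a small Python sketch that lays results out as a grid with disk type across the top and fs type down the side, instead of one flat list of bars. The labels and numbers in the example are placeholders, not measurements from the article.

```python
def pivot(rows):
    """rows: iterable of (disk, fs, value). Returns (disks, filesystems, grid)."""
    disks, filesystems, grid = [], [], {}
    for disk, fs, value in rows:
        if disk not in disks:
            disks.append(disk)          # keep first-seen order for columns
        if fs not in filesystems:
            filesystems.append(fs)      # keep first-seen order for rows
        grid[(disk, fs)] = value
    return disks, filesystems, grid


def render(rows):
    """Render the rows as a plain-text table, one row per file system."""
    disks, filesystems, grid = pivot(rows)
    width = max(len(s) for s in disks + filesystems + ["fs"]) + 2
    lines = ["".join(s.ljust(width) for s in ["fs"] + disks)]
    for fs in filesystems:
        # A missing (disk, fs) combination is shown as "-".
        cells = [str(grid.get((d, fs), "-")) for d in disks]
        lines.append("".join(s.ljust(width) for s in [fs] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only, chosen to show the layout.
    example = [("hdd", "ext4", 1), ("ssd", "ext4", 2), ("hdd", "btrfs", 3)]
    print(render(example))
```

You still can't sort the whole thing by one value, but each row or column can be compared on its own.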
I can understand that using default options is a fair policy, but in this case it is very misleading: a reader with little background, or one who skims the graphs without reading, could get the wrong idea about filesystem performance. It's totally unfair to compare filesystem performance while mixing barrier=0 and barrier=1 options.
It's true that devs enable some features for some filesystems while others disable them, so a totally fair benchmark is difficult, but you should at least be consistent with the barrier option because it impacts the performance numbers by a lot.
Basically each of those benchmarks means very different things. Good benchmarks for file systems should include some specific microbenchmarks that measure certain characteristics like 'I/O Operations Per Second', 'Throughput', and 'Random Access'. Preferably with a mixture of single-thread versus multithread performance.
One of the things you need to be very careful about is that the data set size is correct for the test.
Like, for example, if I am measuring raw I/O speeds for read/write and I have 4GB of RAM and the dataset I am working with is only 4-5GB, then I'm not really measuring the file system as much as measuring the file system cache.
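A quick sanity check along those lines, sketched in Python: compare the dataset size against MemTotal from /proc/meminfo before trusting a raw I/O number. The 2x factor is my own rule-of-thumb assumption for the sketch, not a figure from any benchmark suite, so adjust it to taste.

```python
GIB = 1024 ** 3


def mem_total_bytes(meminfo_text):
    """Parse MemTotal out of text in /proc/meminfo format and return bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:  4194304 kB"; the unit is 1024 bytes.
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def dataset_large_enough(dataset_bytes, ram_bytes, factor=2.0):
    """True if the dataset is at least `factor` times RAM (factor is an assumed rule of thumb)."""
    return dataset_bytes >= factor * ram_bytes


if __name__ == "__main__":
    with open("/proc/meminfo") as f:  # Linux only
        ram = mem_total_bytes(f.read())
    dataset = 5 * GIB  # the 4-5GB case from the post
    print("RAM bytes:", ram)
    print("dataset large enough:", dataset_large_enough(dataset, ram))
```

On the 4GB machine from the example, a 5GB dataset fails the check, which is the point: most of it fits in the page cache.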
It's extremely easy to get these sorts of benchmarks wrong, and extremely difficult for people to tell if you did them right.
Most of the Phoronix file system benchmarks are like this. From the data on the website it's really impossible to even know what they mean. Based on data sizes, specific options for the benchmarks, and a hundred other variables you could be measuring IOps or random access or kernel file system cache or whatever. It's really difficult to tell.
Then, beyond the micro benchmarks that are designed to exercise specific aspects of file systems, you want a number of 'general purpose' application-centric file system benchmarks.
This is the sort of thing that readers here would be more interested in. How long does it take for games to load? How many seconds does it take to go from cold boot to having a browser open and pointing at Google? How long does it take for a large spreadsheet to get loaded into OpenOffice? How long does it take to do it 300 times with a script?
Then you will probably want to see some latency benchmarks. If you're reading audio from a file while doing transcoding, how hard can you hit the file system before you start having xruns in JACK? Can the file system allow heavy loads, handle multitasking well, give you good performance and yet be responsive? If the max performance of the drive for a single-threaded read is 150MB/s and I start reading 30 huge files from the drive and dumping them into /dev/null... does the file system keep chugging along at 150MB/s or does it go into meltdown as it can't handle the load and starts thrashing?
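The "do it 300 times with a script" part is easy enough to sketch. Here is a minimal Python timing loop; the task you pass in (loading a spreadsheet, starting a game, whatever) is up to you, and the file path in the example is hypothetical. Keep in mind that after the first run the page cache is warm, so this measures repeated loads unless you drop caches between runs.

```python
import statistics
import time


def time_runs(task, runs=300):
    """Call task() `runs` times and return the wall-clock duration of each call in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce the samples to a few numbers that are harder to misread than a single average."""
    return {
        "runs": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    def read_file():
        # Hypothetical workload: read one file end to end.
        with open("/etc/hostname", "rb") as f:
            f.read()

    print(summarize(time_runs(read_file, runs=300)))
```

Reporting min, median and max rather than just a mean makes it easier to spot the odd run that stalled.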
What happens when I throw 4 cpus, software raid, and 7 drives at it... does it actually scale any?
Or for server stuff...
With an Apache benchmark backed by MySQL with an average configuration... how many clients can it support? How long does it take to render a page, how many connections can it handle? Does it scale well? Like, if I bump the connections up to an insane level, does the file system keep chugging along or does it go into meltdown and not be able to handle all the random I/O in an efficient manner?
Small files, big files, databases.
All this stuff is extremely difficult to do right, time consuming, and worse: hugely expensive.
Which is why nobody really does it. It would take months to put together something proper.
Now what Michael has done is pretty good for a simple article. The file system devs have more interesting benchmarks, corporate sponsorship, and automated tests, but it's going to be even more difficult for the average user to even understand what is going on.
The thing is that if you're asking for a 'summary' of 'what is best', it's really going to be impossible to tell you.
What are you doing? Video encoding, game playing? Server systems? Are you hosting a moderate site on a VPS with slow storage and MySQL... or are you hosting large files? What is your application? What are your goals?
You can't just average all the numbers together and expect to have any meaningful answer. The benchmarks are not all equal... some are better than others, some are more relevant than others. What is important to me may be worthless to you!
Trying to add up all the numbers and giving them different weights and trying to graph out the 'winner' is just silly.
You know how I deal with file systems?
I don't. I just use the defaults and buy gobs of RAM.
Because I know that with a desktop I am not going to be using more than 6-8GB for pretty much anything I'd care to do.
So I buy 16GB. After a month of being up I'll have the entire storage pretty much cached in RAM and it'll be faster than the fastest SSD. :P
What I care about then is things like sync, write speeds, and that sort of thing.
For my netbook, however, I only have about 8GB worth of storage. So you know what the best FS for me is on that system? Btrfs. Speed be damned. Why? Because it supports transparent online compression, which works perfectly.
Plus it's not Reiserfs.
If you want a summary of what is the best FS for you to use... use Ext4. It's a safe file system.
JFS is effectively unsupported. It's a port of a file system from OS/2 Warp... the AIX JFS is an entirely different beast. It was interesting when it was new, but besides a few fixes here and there it has essentially been unmaintained for years.
XFS is good if you need big datasets. If you have multiple TB-large file systems then XFS is a good choice. It's fast, it scales well, and it behaves well when dealing with large amounts of data. You'll want to have very good hardware for it... it's not nearly as robust as Ext4 is.
BTRFS is good if you want something to play around with. Otherwise leave it alone until distros start using it by default.