
Large HDD/SSD Linux 2.6.38 File-System Comparison


  • #16
    Originally posted by locovaca
    Sounds like a bug report is in order for them? Btrfs has no problem adapting to a SSD by default.
    I don't have the ability to mkfs a btrfs filesystem unless I go in search of the utilities and compile them myself but I did have a quick read of the btrfs source from 2.6.37 and there are several options there relating to SSDs. There's ssd, ssd_spread, nossd and discard. These all seem to be set independently and the info in Documentation/filesystems/btrfs.txt says that discard is not default. Running `mount` would get it to tell you what options were actually in effect though and I would expect it to say both ssd and discard if both were in effect - ssd does not seem to imply or set discard. I also looked at Documentation/filesystems/nilfs.txt and that also says that nodiscard is default.
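
As a rough illustration of the "run `mount` and see what is actually in effect" check, here is a minimal sketch that parses text in the `/proc/mounts` format (device, mount point, type, comma-separated options). The function name `mount_options` and the sample line are made up for the example; on a real system you would feed it the contents of `/proc/mounts`.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options in effect for mount_point, or None if it is not listed.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match, since later mounts shadow earlier ones.
            found = set(fields[3].split(","))
    return found


# Hypothetical sample line, not output captured from a real machine.
sample = "/dev/mapper/vg0-test /mnt/test btrfs rw,relatime,ssd 0 0\n"
opts = mount_options(sample, "/mnt/test")
print("ssd" in opts, "discard" in opts)
```

If `ssd` shows up but `discard` does not, that would match the reading of the source that the two options are set independently.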


    • #17
      NTFS would have been a nice addition. I think many people use it on mass storage for compatibility with Windows.


      • #18
        Originally posted by energyman
        does jfs support barriers? are they turned on?
        if not, you can disregard the results.
        JFS does not have an option for turning on barriers, and it does not use any barriers at all, as near as I can tell from searching the code for WRITE_FLUSH, or blkdev_issue_flush().

        That's my main problem with the Phoronix benchmarks. They don't compare apples to apples, but instead use the "default options", which aren't the same across file systems. The fact that it is apparently running with the garbage collector off for nilfs2 will also give very misleading results. File systems that use copy-on-write also have a tendency to fragment their free space very badly, but that's something that doesn't show up if you just do a fresh mkfs of the file system between each benchmark run and don't try to simulate any kind of file system aging before starting the test.

        As with all benchmarks, you need to take them with a huge grain of salt.


        • #19
          Originally posted by tytso
          That's my main problem with the Phoronix benchmarks. They don't compare apples to apples, but instead use the "default options", which aren't the same across file systems.
          That's done because most people just use the defaults given to them by upstream or their distribution, so it's meant to be a real-world comparison.

          If you would like to recommend a particular set of mount options for each file-system, I would be happy to carry out such tests under those conditions as well to complement the default options.

          Originally posted by tytso
          The fact that it is apparently running with the garbage collector off for nilfs2 also will give very misleading results.
          The cleaner was running, but not quickly enough for dbench; I assume that's a bug in NILFS2.
          Michael Larabel


          • #20
            Originally posted by Michael
            That's done because most people just use the defaults given to them by upstream or their distribution, so it's meant to be a real-world comparison.
            The problem with that is that with barrier off, you can lose data --- especially if the system is very busy, and you crash while the disk is thrashing. Disks can sometimes delay writing blocks for full seconds; if you're sending lots of disk traffic to the end of the disk (i.e., a very high block number), and then you update a critical file system metadata block at the beginning of the disk (i.e., a very low block number), and then go back to sending lots of disk writes to the end of the disk, the hard drive can decide that it will avoid seeking to the beginning of the disk, keep it in its volatile RAM cache on the disk, and then focus on handling the writes at the end of the disk. If power drops, the critical file system metadata update could be lost forever. This is why barriers are important.
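
To make the reordering scenario concrete, here is a toy model only, not how any real drive firmware or the kernel block layer works. It assumes a drive that always services the cached write closest to the current head position, and treats a barrier as "everything issued before this point must be on the platter before anything after it is serviced". The names `surviving_writes` and `"BARRIER"`, the block numbers, and the starting head position are all invented for the sketch.

```python
def surviving_writes(requests, serviced_before_power_loss, use_barrier):
    """Return labels of the writes that reached the platter before power was lost.

    requests: list of (block_number, label) in the order the file system issued
    them. A label of "BARRIER" marks a flush point (its block number is ignored).
    """
    # Split the stream into groups at each barrier; with barriers disabled the
    # drive is free to reorder across the whole stream.
    groups = [[]]
    for block, label in requests:
        if label == "BARRIER":
            if use_barrier:
                groups.append([])
            continue
        groups[-1].append((block, label))

    on_disk = []
    head = 1000  # hypothetical starting head position, near the end of the disk
    budget = serviced_before_power_loss
    for group in groups:
        pending = list(group)  # the drive's volatile write cache
        while pending and budget > 0:
            # Shortest-seek-first: pick the cached write nearest the head.
            nearest = min(pending, key=lambda w: abs(w[0] - head))
            pending.remove(nearest)
            head = nearest[0]
            on_disk.append(nearest[1])
            budget -= 1
    return on_disk


requests = [
    (1000, "data0"), (1001, "data1"),
    (5, "metadata"),            # critical update at a very low block number
    (None, "BARRIER"),
    (1002, "data2"), (1003, "data3"), (1004, "data4"),
]
print(surviving_writes(requests, 4, use_barrier=False))
print(surviving_writes(requests, 4, use_barrier=True))
```

In this model, without the barrier the drive keeps servicing the nearby high-numbered blocks and the metadata write is still sitting in the cache when power drops. With the barrier, the metadata write is forced out before any later data.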

            Given that very few people are using reiserfs and JFS from distributions, and those file systems are effectively unmaintained, no one has bothered to fix them so they use barriers either by default, or at all. JFS doesn't have the ability to use barriers at all. In the case of reiserfs, Chris Mason submitted a patch 4 years ago to turn on barriers by default, but Hans Reiser vetoed it. Apparently, to Hans, winning the benchmark demolition derby was more important than his users' data. (It's a sad fact that sometimes the desire to win benchmark competitions will cause developers to cheat, sometimes at the expense of their users.)

            In the case of ext3, it's actually an interesting story. Both Red Hat and SuSE turn on barriers by default in their Enterprise kernels. SuSE, to its credit, did this earlier than Red Hat. We tried to get the default changed in ext3, but it was overruled by Andrew Morton, on the grounds that it would represent a big performance loss, and he didn't think the corruption happened all that often --- despite the fact that Chris Mason had developed a python program that would reliably corrupt an ext3 file system if you ran it and then pulled the power plug. I suppose we should try again to have the default changed, now that Jan Kara is the official ext3 maintainer, and he works at SuSE.

            So when you say, "it's the default as it comes from the distribution", that can be a bit misleading. The Enterprise distros change the defaults, for safety's sake. I'm not sure, but I wouldn't be surprised if SuSE forced Reiserfs to use barriers by default when shipped in their Enterprise product, given that they were going to support it and paying customers generally get upset when their file systems get fragged. (Also because Chris Mason worked at SuSE when he submitted the reiserfs barrier patch that got vetoed by Hans.) So just because it is one way as shipped by a community distribution, or from upstream, doesn't necessarily mean that is the way enterprise distros will ship things when paying customers have real money on the line.

            If people are just simply reading your reports for entertainment's sake, that's one thing. But if you don't warn them about the dangers of the default options, and they make choices to use a file system based on your report --- should you feel any concern? I suppose that's between you and your conscience and/or your sense of journalistic integrity....

            - Ted


            • #21
              Originally posted by drag
              Hopefully never going to happen. Michael should have better sense than that.

              Such things would be so blindingly worthless and counterproductive that it would counter any sort of positive benefit PTS file system benchmarks can offer.
              Care to explain to the _blind_?


              • #22
                Originally posted by TrevorPH
                I don't have the ability to mkfs a btrfs filesystem unless I go in search of the utilities and compile them myself
                I pulled the RHEL6 btrfs-progs source rpm and rebuilt it for CentOS 5 and made myself a btrfs filesystem on an LVM logical volume on an Intel SSD then discovered that I didn't have the kernel module either. One rebuild and reboot later, I mounted my LV with default options and it just reported rw,relatime. I explicitly gave it -o ssd and that appeared in the list but no discard. It only got mounted with the discard option when I explicitly passed that to mount. Maybe LVM is confusing it but it didn't seem to pick up the fact that it was on an SSD automatically here.
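
One thing that may be worth checking in a setup like this is what the block layer itself reports for the device, since a device-mapper/LVM volume does not necessarily report the same thing as the SSD underneath it. The sketch below reads the `queue/rotational` flag from sysfs (`0` means non-rotating). Whether btrfs in this particular kernel keys its SSD detection off that flag is an assumption here, and the helper name `is_rotational` and the device name `dm-0` are just for illustration.

```python
import os


def is_rotational(device, sysfs_root="/sys/block"):
    """Return True/False from the block layer's rotational flag, or None if unavailable.

    device is a kernel block device name such as "sda" or "dm-0".
    """
    path = os.path.join(sysfs_root, device, "queue", "rotational")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # Device missing, or the kernel does not expose the flag.
        return None


if __name__ == "__main__":
    # "dm-0" is a guess at what an LVM logical volume might be called.
    print(is_rotational("dm-0"))
```

If the logical volume reports `1` while the underlying SSD reports `0`, that would be consistent with the guess that LVM is hiding the SSD from the file system.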


                • #23

                  Reiser3 is still the best all around choice.
                  * Fault Tolerant
                  * Efficient
                  * Static


                  • #24
                    Performance test

                    Try an older kernel from before the semaphore removal, otherwise known as the big kernel lock removal. It's difficult to ascertain whether the performance loss comes from scheduler changes or from file-system kernel modifications.


                    • #25
                      I don't know if you'll see this, Michael, because there have been so many posts on the topic already, but for graphs such as these, would it be difficult to put the names within the bars? I find it somewhat frustrating going back and forth matching names to colors, especially when there are multiple datasets on the same plot.


                      • #26
                        Originally posted by Xanikseo
                        I also find it hard to work out which is the best solution. I think it would be best to order results with best performance at the top. The current disorder is infinitely harder to analyse.
                        This is multi-dimensional data. You have disk type on one dimension and fs type on the other. Since it's being presented in a one dimensional form, you can't order by a value effectively.

                        Arguably a bubble plot would give information in a slightly more consistent way, but I think we'd spend more time explaining the interpretation. Michael and I are always talking visualization in the background.


                        • #27
                          Why couldn't they have named it MILFS?

                          I'd be reformatting right now.


                          • #28
                            I can understand that using the default options is a fair policy, but in this case it is very misleading: a reader with little background knowledge, or one who skims the graphs without reading the text, could get the wrong idea about file-system performance. It's totally unfair to compare file-system performance while mixing barrier=0 and barrier=1 options.

                            It's true that devs enable some features for some file systems while others disable them, so a totally fair benchmark would be difficult, but at least with the barrier option you should be consistent, because it affects the performance numbers a lot.


                            • #29

                              As usual, reiser4 is missing. A shame.


                              • #30
                                Originally posted by cruiseoveride
                                Care to explain to the _blind_?
                                Yeah sure. Didn't know if anybody cared. :P

                                Basically each of those benchmarks means very different things. Good benchmarks for file systems should include some specific microbenchmarks that measure certain characteristics like 'I/O Operations Per Second', 'Throughput', and 'Random Access', preferably with a mixture of single-threaded versus multithreaded performance.

                                One of the things you need to be very careful about is that the data set size is correct for the test.

                                For example, if I am measuring raw I/O speeds for read/write, I have 4 GB of RAM, and the dataset I am working with is only 4-5GB, then I'm not really measuring the file system so much as the file system cache.
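
A quick sanity check along those lines could look like the sketch below. The 2x threshold is only a rule-of-thumb assumption, not something stated in this thread, and `dataset_defeats_cache` and `physical_ram_bytes` are made-up helper names. `os.sysconf` with these keys works on Linux; other platforms may differ.

```python
import os

GIB = 1024 ** 3


def physical_ram_bytes():
    """Total physical RAM as reported by the OS (Linux; other platforms may differ)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def dataset_defeats_cache(dataset_bytes, ram_bytes, factor=2.0):
    """True if the dataset is at least `factor` times RAM.

    The default factor of 2.0 is an assumed rule of thumb, chosen so that most
    of the working set cannot be served from the page cache.
    """
    return dataset_bytes >= factor * ram_bytes


# The 4 GB RAM / 5 GB dataset case from above, then a larger dataset.
print(dataset_defeats_cache(5 * GIB, 4 * GIB))
print(dataset_defeats_cache(8 * GIB, 4 * GIB))
```

On a real machine you would pass `physical_ram_bytes()` as the second argument instead of a hard-coded figure.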

                                It's extremely easy to get these sorts of benchmarks wrong, and extremely difficult for people to tell if you did them right.

                                Most of the Phoronix file system benchmarks are like this. From the data on the website it's really impossible to even know what they mean. Based on data sizes, specific options for the benchmarks, and a hundred other variables you could be measuring IOps or random access or kernel file system cache or whatever. It's really difficult to tell.

                                Then, beyond the microbenchmarks that are designed to exercise specific aspects of file systems, you want a number of 'general purpose' application-centric file system benchmarks.

                                This is the sort of thing that readers here would be more interested in. How long does it take for games to load? How many seconds does it take to go from cold boot to having a browser open and pointing at Google? How long does it take for a large spreadsheet to get loaded into OpenOffice? How long does it take to do it 300 times with a script?

                                Then you will probably want to see some latency benchmarks. If you're reading audio from a file while doing transcoding, how hard can you hit the file system before you start having xruns in JACK? Can the file system allow heavy loads, handle multitasking well, give you good performance and yet be responsive? If the max performance of the drive for a single-threaded read is 150MB/s and I start reading 30 huge files from the drive and dumping them into /dev/null... does the file system keep chugging along at 150MB/s, or does it go into meltdown because it can't handle the load and starts thrashing?

                                What happens when I throw 4 cpus, software raid, and 7 drives at it... does it actually scale any?

                                Or for server stuff...

                                With an Apache benchmark backed by MySQL with an average configuration... how many clients can it support? How long does it take to render a page? How many connections can it handle? Does it scale well? If I bump the connections up to an insane level, does the file system keep chugging along, or does it go into meltdown and fail to handle all the random I/O in an efficient manner?

                                Small files, big files, databases.

                                All this stuff is extremely difficult to do right, time consuming, and worse: hugely expensive.

                                Which is why nobody really does it. It would take months to put together something proper.

                                Now what Michael has done is pretty good for a simple article. The file system devs have more interesting benchmarks, corporate sponsorship, and automated tests, but those are going to be even more difficult for the average user to understand.

                                The thing is that if you are asking for a 'summary' of 'what is best', it's really going to be impossible to tell you.

                                What are you doing? Video encoding, game playing? Server systems? Are you hosting a moderate site on a VPS with slow storage and MySQL... or are you hosting large files? What is your application? What are your goals?

                                You can't just average all the numbers together and expect to have any meaningful answer. The benchmarks are not all equal... some are better than others, some are more relevant than others. What is important to me may be worthless to you!

                                Trying to add up all the numbers and giving them different weights and trying to graph out the 'winner' is just silly.
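
A tiny sketch of why a single weighted "winner" is fragile: with the same scores, two different but equally defensible sets of weights pick different file systems. Every number below is invented for the illustration, not measured. `fs_a`, `fs_b`, the workload names and the weights are all hypothetical.

```python
# Made-up normalized scores (higher is better). These are NOT measurements.
scores = {
    "fs_a": {"throughput": 0.9, "iops": 0.4, "latency": 0.5},
    "fs_b": {"throughput": 0.5, "iops": 0.9, "latency": 0.6},
}


def winner(scores, weights):
    """Return the file system with the highest weighted sum of its scores."""
    totals = {
        fs: sum(weights[metric] * value for metric, value in results.items())
        for fs, results in scores.items()
    }
    return max(totals, key=totals.get)


# Two hypothetical users who care about different things.
media_weights = {"throughput": 0.8, "iops": 0.1, "latency": 0.1}
database_weights = {"throughput": 0.1, "iops": 0.8, "latency": 0.1}

print(winner(scores, media_weights))
print(winner(scores, database_weights))
```

The "winner" flips from one file system to the other purely because the weights changed, which is the problem with publishing one aggregate ranking for everybody.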

                                You know how I deal with file systems?

                                I don't. I just use the defaults and buy gobs of RAM.

                                Because I know that with a desktop I am not going to be using more than 6-8GB for pretty much anything I'd care to do.

                                So I buy 16GB. After a month of being up I'll have pretty much the entire storage cached in RAM and it'll be faster than the fastest SSD. :P

                                What I care about then is things like sync, write speeds, and that sort of thing.

                                For my netbook, however, I only have about 8GB worth of storage. So you know what the best FS for me is on that system? Btrfs. Speed be damned. Why? Because it supports transparent online compression, which works perfectly.

                                Plus it's not Reiserfs.

                                If you want a summary of what is the best FS for you to use... use Ext4. It's a safe file system.

                                JFS is effectively unsupported. It's a port of a file system from OS/2 Warp... the AIX JFS is an entirely different beast. It was interesting when it was new, but besides a few fixes here and there it has essentially been unmaintained for years.

                                XFS is good if you need big datasets. If you have multiple TB-large file systems then XFS is a good choice. It's fast, it scales well, and it behaves well when dealing with large amounts of data. You'll want to have very good hardware for it... it's not nearly as robust as Ext4 is.

                                BTRFS is good if you want something to play around with. Otherwise leave it alone until distros start using it by default.