Benchmarks Of ZFS-FUSE On Linux Against EXT4, Btrfs


  • #31
    Oh, I just wanted to chime in and say that, for my own dedicated server, I'm running a hardware RAID controller (Adaptec 5405 if you're interested) with four 1.5TB Seagate SATA disks, on Ubuntu 10.04.1 Server. I'm using the XFS filesystem due to the way it is tuned for smooth I/O performance and parallel access; I don't need the raw throughput of ext4, but I do need data safety and "fair" scheduling of I/O across processes, two things XFS is very good at.

    I was using ext4 on Software RAID5 before, but I realized my mistake when I was able to quadruple my write performance by moving to Hardware RAID10 and XFS.
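    A migration like the one above might look roughly like this sketch. The stripe values are assumptions for a 4-disk RAID10 with a 256 KiB controller stripe, not the poster's actual settings; they must be matched to the real controller configuration:

```shell
# Hypothetical: create XFS aligned to a 4-disk hardware RAID10.
# su = controller stripe unit (assumed 256 KiB here), sw = number of
# data-bearing stripes (2 for a 4-disk RAID10). Check the controller's
# actual geometry before copying these values.
mkfs.xfs -d su=256k,sw=2 /dev/sda1

# noatime avoids a metadata write on every read; logbufs=8 gives the
# journal more in-memory buffers, which helps parallel workloads.
mount -o noatime,logbufs=8 /dev/sda1 /srv
```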

    I don't think I will be upgrading anything as low-level as the filesystem on my server for at least a year (I tolerated the ext4 for a year before I tossed it), but if I ever do, I will definitely have to re-evaluate my options and see if btrfs has matured or if native ZFS on Linux is a reality.

    Comment


    • #32
      Another small typo in the article. In the text below the IOzone 8GB write test chart (second chart on page 4), judging by the image it's meant to read:
      When carrying out an 8GB write test with a 64KB block size in IOzone, EXT4 and Btrfs were 1.64~1.67x faster than ZFS-FUSE.

      Comment


      • #33
        Originally posted by locovaca View Post
        What's your business case for a transfer that is going to take 30 minutes but may have been altered from when you started? If you're looking to back up something like a transactional database, making copies of open files is not the way to go.
        Guess what: it is doable with ZFS. Just because your filesystem of choice can't do it doesn't mean it is impossible.

        Comment


        • #34
          Originally posted by edogawaconan View Post
          Guess what: it is doable with ZFS. Just because your filesystem of choice can't do it doesn't mean it is impossible.
          That it's doable doesn't mean it's good practice. It's doable to strip PAM out of Linux and run everything as root, too. Making and trusting backups of open files is very bad business. There is no guarantee that the application has those files in any sort of usable state.

          Comment


          • #35
            Originally posted by locovaca View Post
            That it's doable doesn't mean it's good practice. It's doable to strip PAM out of Linux and run everything as root, too. Making and trusting backups of open files is very bad business. There is no guarantee that the application has those files in any sort of usable state.
            Then guess why it's on the MySQL page...

            Comment


            • #36
              Originally posted by krogy View Post
              ext4 + lvm2 on top of your RAID configuration of choice and you are done, sir.
              And this way also protects you from screw-ups of the filesystem itself.
              Well, apparently that was not allowed according to the OP.

              Comment


              • #37
                Originally posted by korpenkraxar View Post
                Well, apparently that was not allowed according to the OP.
                At least one person claimed that LVM in snapshot mode killed performance. A benchmark to confirm this would be nice.
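                A rough way to test that claim, assuming an LVM2 setup with a volume group vg0 and a logical volume data mounted at /mnt/data (all names are placeholders):

```shell
# Baseline write throughput on the origin volume, no snapshot active.
dd if=/dev/zero of=/mnt/data/bench1.tmp bs=1M count=2048 oflag=direct

# Create a snapshot; from now on, the first write to any chunk of the
# origin incurs a copy-on-write into the snapshot's allocated space.
lvcreate --size 10G --snapshot --name data-snap /dev/vg0/data

# Same write test with the snapshot active; compare the two rates.
dd if=/dev/zero of=/mnt/data/bench2.tmp bs=1M count=2048 oflag=direct

lvremove -f /dev/vg0/data-snap
```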

                Comment


                • #38
                  Originally posted by edogawaconan View Post
                  Yes, it says it is possible, but nothing about whether it is safe. I guess the objection would be that a database consists of disk state plus CPU/RAM state at any point during operation. You can back up a snapshot of the disk but not the other activity at a given point in time.

                  Comment


                  • #39
                    Originally posted by edogawaconan View Post
                    w00t, a cron script to create a backup of a database and copy it over ssh. Totally innovative and automatic. Thanks ZFS!

                    Of course, if you were to page forward, you'd see stuff like this:

                    3.

                    Start up mysqld on the slave. If you are using InnoDB, Falcon or Maria you should get auto-recovery, if it is needed, to make sure the table data is correct, as shown here when I started up from our mid-INSERT snapshot:

                    InnoDB: The log sequence number in ibdata files does not match
                    InnoDB: the log sequence number in the ib_logfiles!
                    081109 15:59:59 InnoDB: Database was not shut down normally!
                    InnoDB: Starting crash recovery.
                    InnoDB: Reading tablespace information from the .ibd files...
                    InnoDB: Restoring possible half-written data pages from the doublewrite
                    InnoDB: buffer...
                    081109 16:00:03 InnoDB: Started; log sequence number 0 1142807951
                    081109 16:00:03 [Note] /slavepool/mysql-5.0.67-solaris10-i386/bin/mysqld: ready for connections.
                    Version: '5.0.67' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)

                    On MyISAM, or other tables, you may need to run REPAIR TABLE, and you might even have lost some information. You should use a recovery-capable storage engine and a regular synchronization schedule to reduce the risk for significant data loss.
                    Uhm, no thanks. I don't shut down my machines by flicking the switch on the power strip, and I don't backup my databases by doing the same.

                    Comment


                    • #40
                      Originally posted by locovaca View Post
                      Uhm, no thanks. I don't shut down my machines by flicking the switch on the power strip, and I don't backup my databases by doing the same.
                      If you actually knew how databases work, you'd know that this is normal. Unless you're using a crappy database, that is (MySQL with the MyISAM engine).

                      Comment


                      • #41
                        Originally posted by edogawaconan View Post
                        If you actually knew how databases work, you'd know that this is normal. Unless you're using a crappy database, that is (MySQL with the MyISAM engine).
                        I do. I would be fired within 12 hours of making backups like this at my job. Obviously you've never heard of differential/incremental/transaction log backups.

                        Comment


                        • #42
                          Thanks for the benchmark. I have a few questions.

                          Why was only a single disk used for the test? It is much more common to use more disks in some RAID configuration with ZFS.

                          Did you compile zfs-fuse manually? Was debugging disabled?

                          What were the zfs-fuse options? I'm currently using
                          Code:
                          DAEMON_OPTS="${DAEMON_OPTS} --fuse-attr-timeout 60.0" # caching timeout of attributes in the kernel
                          DAEMON_OPTS="${DAEMON_OPTS} --fuse-entry-timeout 60.0" # caching timeout of entries in the kernel
                          DAEMON_OPTS="${DAEMON_OPTS} --max-arc-size 1024" # maximum size of the ARC in zfs-fuse
                          DAEMON_OPTS="${DAEMON_OPTS} --vdev-cache-size 100" # maximum cache for device blocks in zfs-fuse
                          #DAEMON_OPTS="${DAEMON_OPTS} --disable-block-cache" # uses O_DIRECT for accessing the block device, so no caching of device blocks in the kernel; setting this disables mmap support for files
                          #DAEMON_OPTS="${DAEMON_OPTS} --disable-page-cache" # disables caching of file pages in the kernel
                          and this still can be tuned.

                          As for FUSE performance: the main problem currently is xattr support. Extended attributes are not cached on the kernel side, so each read makes a context switch to zfs-fuse to read them again, which kills performance. Be sure to have xattrs and ACLs disabled. For me this is the most important problem, as I really need xattrs.

                          Beyond that, you can see that read performance is pretty good. Write performance is much worse. Part of the reason (in my opinion) is that locking between threads is killing it (zfs-fuse currently uses about 20 threads, and each write request goes to a random one, which isn't a very good idea).

                          As for the comparison to ext4, I also have to ask: was data=journal used? I think it should be, for a fair comparison.


                          A FUSE-based filesystem CAN be fast (see the commercial version of ntfs-3g), but zfs-fuse is not there yet.

                          Comment


                          • #43
                            Originally posted by locovaca View Post
                            I do. I would be fired within 12 hours of making backups like this at my job. Obviously you've never heard of differential/incremental/transaction log backups.
                            A zfs snapshot completes in an instant, and the way it is done guarantees consistency of the on-disk database state (i.e. no lost or duplicated log entries due to slight time differences during the backup; the only issue is an incomplete transaction, which gets rolled back or forward).

                            Also, from the PostgreSQL documentation:

                            An alternative file-system backup approach is to make a "consistent snapshot" of the data directory, if the file system supports that functionality (and you are willing to trust that it is implemented correctly). The typical procedure is to make a "frozen snapshot" of the volume containing the database, then copy the whole data directory (not just parts, see above) from the snapshot to a backup device, then release the frozen snapshot. This will work even while the database server is running. However, a backup created in this way saves the database files in a state where the database server was not properly shut down; therefore, when you start the database server on the backed-up data, it will think the previous server instance had crashed and replay the WAL log. This is not a problem, just be aware of it (and be sure to include the WAL files in your backup).
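                            On ZFS, the "frozen snapshot" procedure the documentation describes might look like this sketch. The dataset name tank/pgdata and the backup host are placeholders, and the whole data directory (WAL files included) must live on the snapshotted dataset so the copy is atomic:

```shell
# Atomic, near-instant snapshot of the dataset holding the data dir.
SNAP="backup-$(date +%F)"
zfs snapshot tank/pgdata@"$SNAP"

# Copy the frozen state (visible under .zfs/snapshot) to a backup
# device while the database keeps running, then release the snapshot.
rsync -a /tank/pgdata/.zfs/snapshot/"$SNAP"/ backuphost:/backups/pgdata/
zfs destroy tank/pgdata@"$SNAP"
```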

                            Comment


                            • #44
                              Originally posted by edogawaconan View Post
                              A zfs snapshot completes in an instant, and the way it is done guarantees consistency of the on-disk database state (i.e. no lost or duplicated log entries due to slight time differences during the backup; the only issue is an incomplete transaction, which gets rolled back or forward).

                              Also, from the PostgreSQL documentation:
                              As I said, it's exactly the same as powering down your computer with the switch.

                              Just because the fire extinguisher in my house should do a good job doesn't mean I burn every bag of trash in the can and put it out with the extinguisher.

                              If you're going through all of the effort to set up a cron job, there's no point to ZFS snapshots for database backups. Cron an incremental backup to a remote device every hour. It's the same amount of effort, at worst the same amount of drive space, and it is 100% guaranteed to give you a sane database file.
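                              The approach being argued for here might be sketched as follows, using MySQL's own dump and binary-log facilities; the paths and schedule are illustrative assumptions, not a prescription:

```shell
# Weekly full backup: --single-transaction gives a consistent InnoDB
# view without locking tables, and --flush-logs starts a fresh binary
# log so the incrementals line up with the dump.
mysqldump --single-transaction --flush-logs --all-databases \
    | gzip > /backups/full-$(date +%F).sql.gz

# Hourly (via cron): close out the current binary log and ship the
# finished logs off-host for incremental / point-in-time recovery.
mysqladmin flush-logs
rsync -a /var/lib/mysql/mysql-bin.* backuphost:/backups/binlogs/
```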

                              Comment


                              • #45
                                Originally posted by locovaca View Post
                                As I said, it's exactly the same as powering down your computer with the switch.
                                Powering down a computer with the switch risks corrupting partially written data. That doesn't happen with a ZFS snapshot. How is that exactly the same?

                                Comment
