
FreeBSD ZFS vs. Linux EXT4/Btrfs RAID With Twenty SSDs


  • #21
    Originally posted by Beherit View Post
    I take it hardware RAID is a thing of the past?
    I guess that's the only meaningful conclusion from these tests. Not just hardware RAID: any kind of RAID. If you have fast storage devices, RAID will just slow them down. Remember, the D in RAID stands for (spinning) disks, and I have yet to see a sensible implementation of RA(I)F.

    So from now on my approach is to set up SSDs or NVMe drives as independent devices, each with its own filesystem, and take care of reliability / redundancy in a layer above them. Think things like MooseFS, Ceph or BeeGFS.

    What I'm really curious about, though, is whether it makes sense to put an LVM cache on an Optane NVMe drive in front of a TLC / QLC capacity SSD. Will need to make a large(ish) purchase decision on that soon ...
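A setup like that can be sketched with lvmcache. The device paths and volume names below are hypothetical, and the cache mode is a judgment call, so treat this as a sketch rather than a recipe:

```shell
# Hypothetical devices: /dev/nvme0n1 is the Optane drive,
# /dev/sda is the TLC/QLC capacity SSD.
pvcreate /dev/sda /dev/nvme0n1
vgcreate fastvg /dev/sda /dev/nvme0n1

# Main LV on the capacity SSD, cache pool on the Optane device.
lvcreate -n data -l 100%PVS fastvg /dev/sda
lvcreate --type cache-pool -n optcache -l 100%PVS fastvg /dev/nvme0n1

# Attach the cache; writeback favors speed, writethrough favors safety.
lvconvert --type cache --cachepool fastvg/optcache \
          --cachemode writeback fastvg/data
```

Whether the Optane layer pays off will depend on how much of the working set fits in the cache, which is exactly what a purchase-sized benchmark should measure.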



    • #22
      I would love to see a test of ZFS on Linux vs. FreeBSD. You should also contact the authors of the ZFS books to get the best optimisations, so the test doesn't end up biased. I know they work with the Postgres people to get better performance. For example, not having Postgres AND ZFS do the same job twice, but choosing one to do it.

      https://twitter.com/allanjude
      https://twitter.com/mwlauthor

      That could help all your ZFS tests in the future, especially when you compare FreeBSD to anything, as they are FreeBSD admins.



      • #23
        Originally posted by pegasus View Post
        What I'm really curious about, though, is whether it makes sense to put an LVM cache on an Optane NVMe drive in front of a TLC / QLC capacity SSD. Will need to make a large(ish) purchase decision on that soon ...
        Or just use Optane for the whole storage: http://www.linuxsystems.it/2018/05/o...t4-benchmarks/
        ## VGA ##
        AMD: X1950XTX, HD3870, HD5870
        Intel: GMA45, HD3000 (Core i5 2500K)



        • #24
          Originally posted by ryao View Post
          Also, he is still using compilebench, which is an utterly useless benchmark because it does not tell us what would be faster on a filesystem. Compilation takes about the same time on any filesystem because it is CPU bound, not IO bound.
          How would you then explain the huge measured differences in compilebench across different filesystems?
          If "compilation takes about the same time", then surely these benchmarks must be measuring something related to the filesystem, mustn't they...



          • #25
            It's neat to see how each of these stacks up with the defaults. That might not be useful in the field, though. This article has some good advice and configuration benchmarks for setting up ZFS for performance:
            https://calomel.org/zfs_raid_speed_capacity.html
            The ZFS tuning guide has some PostgreSQL advice too. A rough mantra is that more vdevs mean more performance, and within a vdev you're roughly limited to a single drive's speed. That's not exactly true, but sorta true.
            It would be neat if we had "profiles" to make it easier for sysadmins to apply tuning for known quantities like PostgreSQL, but, again, I suppose that comes back to defaults and workloads. Anyway, poorly tuned for pgsql is probably much better than not tuned for pgsql at all. There's also the Evil Tuning Guide for voiding the warranty.
            Somebody above asked about ashift on FreeBSD: it's controlled by a sysctl rather than a zpool create option, but at least on FreeNAS the minimum is 12. Since it's 2^n, a max of 13 or 14 is plenty.
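The vdev and ashift points above can be sketched as follows; the pool and device names are made up for illustration:

```shell
# On FreeBSD, ashift is governed by a sysctl rather than a
# 'zpool create' option (2^12 = 4096-byte sectors).
sysctl vfs.zfs.min_auto_ashift=12

# More vdevs, more parallelism: a pool of three mirror vdevs
# stripes writes across all three, so throughput scales with
# the vdev count rather than with a single drive's speed.
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5
```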



            • #26
              Originally posted by ryao View Post

              I have given up on expecting Michael to benchmark meaningful configurations.

              Also, he is still using compilebench, which is an utterly useless benchmark because it does not tell us what would be faster on a filesystem. Compilation takes about the same time on any filesystem because it is CPU bound, not IO bound.
              compilebench does not benchmark compilation; it's a benchmark program that simulates all the I/O that heavy compilation does, so it's 100% I/O bound.

              Compilebench tries to age a filesystem by simulating some of the disk IO common in creating, compiling, patching, stating and reading kernel trees. It indirectly measures how well filesystems can maintain directory locality as the disk fills up and directories age. Thanks to Matt Mackall for the idea of simulating kernel compiles to achieve this.
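For reference, a typical invocation looks something like the following; the flags are as I recall them from the compilebench README, so double-check against your copy:

```shell
# Point compilebench at the filesystem under test; it creates,
# patches and reads simulated kernel trees there.
#   -D  working directory on the filesystem being measured
#   -i  number of initial kernel trees to create
#   -r  number of simulated operations (create/patch/compile/read)
./compilebench -D /mnt/test-filesystem -i 10 -r 30
```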



              • #27
                Originally posted by chilinux View Post
                With 12TB hard drives available, it is not hard to build a petabyte array. According to the ZFS rule of thumb of providing 1GB of RAM for every 1TB of disk, that petabyte array should be used with a system that has 1,000 gigabytes of RAM?!?! The majority of server motherboards I have worked with top out at below a fifth of that!
                That's your own problem. Servers that can take 512GB or even 1TB of RAM exist (usually dual or quad CPU boards), and if you want any semblance of performance on an array of that size in a server that isn't just cold storage, you will probably want them.
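The arithmetic behind that rule of thumb is straightforward; the drive count here is hypothetical, chosen to land near a petabyte:

```shell
# 84 hypothetical 12 TB drives ~= 1 PB raw; the 1 GB RAM per
# 1 TB of disk rule of thumb then suggests ~1 TB of RAM.
DRIVES=84
TB_PER_DRIVE=12
RAW_TB=$((DRIVES * TB_PER_DRIVE))   # 1008 TB raw
RAM_GB=$RAW_TB                      # 1 GB RAM per TB of disk
echo "${RAW_TB} TB raw -> ${RAM_GB} GB RAM suggested"
```

Note the rule of thumb is about ARC sizing for good cache hit rates, not a hard requirement; a cold-storage pool runs fine with far less.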



                • #28
                  Originally posted by Beherit View Post
                  I take it hardware RAID is a thing of the past?
                  At these scales, maybe, but hardware RAID cards are still a thing in many servers. They are most common on Windows servers, of course.



                  • #29
                    Originally posted by pegasus View Post
                    I guess that's the only meaningful conclusion from these tests. Not just hardware RAID: any kind of RAID. If you have fast storage devices, RAID will just slow them down. Remember, the D in RAID stands for (spinning) disks, and I have yet to see a sensible implementation of RA(I)F.
                    Technically speaking, the "D" in RAID stands for the storage device, which at the time meant only disks, so they used "disk".
                    There is nothing in the RAID spec that says the algorithms should work better with spinning drives than with flash.

                    So from now on my approach is to set up SSDs or NVMe drives as independent devices, each with its own filesystem, and take care of reliability / redundancy in a layer above them. Think things like MooseFS, Ceph or BeeGFS.
                    I'd like to see some numbers on this type of approach too. I don't have much first-hand experience with these types of deployments, but I'm interested.



                    • #30
                      Originally posted by starshipeleven View Post
                      There is nothing in the RAID spec that says the algorithms should work better with spinning drives than with flash.
                      Technically, true. But with rotational media, the algorithms have tens of milliseconds to do whatever they're doing, and that is why RAID pays off. With solid-state media you only have microseconds to do your magic, which makes it much more difficult for the algorithms to add value to the whole setup. You can observe a similar situation with I/O schedulers: those that are best for disks are typically not the best for flash.
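The scheduler point can be checked directly on Linux; the device name below is illustrative:

```shell
# Show the available I/O schedulers; the active one is bracketed,
# e.g. "[none] mq-deadline kyber bfq".
cat /sys/block/nvme0n1/queue/scheduler

# Rotational media usually benefit from a reordering scheduler
# such as mq-deadline or bfq; fast NVMe often does best with
# 'none', which skips the reordering work entirely (needs root).
echo none > /sys/block/nvme0n1/queue/scheduler
```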

