FreeBSD ZFS vs. Linux EXT4/Btrfs RAID With Twenty SSDs


  • F.Ultra
    replied
    Originally posted by ryao View Post

    I have given up on expecting Michael to benchmark meaningful configurations.

    Also, he is still using compilebench, which is an utterly useless benchmark because it does not tell us what would be faster on a filesystem. Compilation takes about the same time on any filesystem because it is CPU bound, not IO bound.
    compilebench does not benchmark compilation; it is a benchmark program that simulates all the IO that heavy compilation generates, so it is 100% IO bound (a rough sketch of that kind of workload is included below).

    Compilebench tries to age a filesystem by simulating some of the disk IO common in creating, compiling, patching, stating and reading kernel trees. It indirectly measures how well filesystems can maintain directory locality as the disk fills up and directories age. Thanks to Matt Mackall for the idea of simulating kernel compiles to achieve this.

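    A minimal Python sketch of the kind of purely filesystem-bound workload compilebench simulates might look like the following. It is not compilebench itself; the tree shape, file sizes and the /tmp/compile_sim path are made-up values for illustration. Note that nothing in it ever invokes a compiler, which is why the results depend so heavily on the filesystem.

    ```python
    # Hypothetical sketch, NOT compilebench itself: simulate the kind of IO
    # a heavy compile generates -- create a kernel-tree-like set of small
    # files, then stat, "patch" and re-read them. All sizes/paths are made up.
    import os
    import time

    ROOT = "/tmp/compile_sim"      # assumed scratch directory
    DIRS, FILES_PER_DIR = 50, 40   # arbitrary tree shape
    FILE_SIZE = 8 * 1024           # small, source-file-sized writes

    def create_tree():
        for d in range(DIRS):
            path = os.path.join(ROOT, f"dir{d:03d}")
            os.makedirs(path, exist_ok=True)
            for f in range(FILES_PER_DIR):
                with open(os.path.join(path, f"src{f:03d}.c"), "wb") as fh:
                    fh.write(os.urandom(FILE_SIZE))

    def patch_and_read_tree():
        for d in range(DIRS):
            path = os.path.join(ROOT, f"dir{d:03d}")
            for name in os.listdir(path):
                full = os.path.join(path, name)
                os.stat(full)                  # the "stating" part
                with open(full, "r+b") as fh:  # the "patching" part
                    fh.write(os.urandom(1024))
                with open(full, "rb") as fh:   # the "reading" part
                    fh.read()

    for label, fn in [("create", create_tree), ("patch+read", patch_and_read_tree)]:
        start = time.time()
        fn()
        print(f"{label}: {time.time() - start:.2f}s")
    ```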


  • bill_mcgonigle
    replied
    It's neat to see how each of these stacks up with the defaults. That might not be useful in the field, though. This article has some good advice and configuration benchmarks for setting up ZFS for performance:

    The ZFS tuning guide has some PostgreSQL advice too. A rough mantra is that more vdevs mean more performance, while within a single vdev you're more or less limited to a single drive's speed. That's not exactly true, but sorta true.
    It would be neat if we had "profiles" to make it easier for sysadmins to apply tuning for known quantities like PostgreSQL, but, again, I suppose that would be tied to defaults and workloads. Anyway, poorly tuned for pgsql is probably much better than not tuned for pgsql at all. There's the Evil Tuning Guide for voiding the warranty.
    Somebody above asked about ashift on FreeBSD: it's controlled by a sysctl rather than a zpool create option, but at least on FreeNAS the minimum is 12. Since the block size is 2^ashift, 13 or 14 is as high as you'd ever need (see the small sketch below).

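    As a side note on the 2^n point: ashift is just the base-2 logarithm of the sector size the pool assumes, so the mapping can be worked out directly. Illustrative snippet only, not a ZFS tool:

    ```python
    # Illustrative only: 2**ashift is the block size ZFS assumes for a vdev,
    # so ashift is simply log2 of the sector size.
    import math

    for sector_size in (512, 4096, 8192, 16384):
        ashift = int(math.log2(sector_size))
        print(f"sector size {sector_size:>6} B -> ashift {ashift}")
    # 512 B  -> ashift 9
    # 4 KiB  -> ashift 12  (the FreeNAS minimum mentioned above)
    # 8 KiB  -> ashift 13
    # 16 KiB -> ashift 14
    ```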


  • pkese
    replied
    Originally posted by ryao View Post
    Also, he is still using compilebench, which is an utterly useless benchmark because it does not tell us what would be faster on a filesystem. Compilation takes about the same time on any filesystem because it is CPU bound, not IO bound.
    How would you then explain the huge measured differences in compilebench across different filesystems?
    If "compilation takes about the same time" on any filesystem, then surely these benchmarks must be measuring something related to the filesystem, mustn't they...



  • darkbasic
    replied
    Originally posted by pegasus View Post
    What I'm really curious about, though, is whether it makes sense to put an LVM cache on an Optane NVMe drive in front of TLC/QLC capacity SSDs. I will need to make a large(ish) purchase decision on that soon...
    Or just use Optane for the whole storage: http://www.linuxsystems.it/2018/05/o...t4-benchmarks/



  • gadnet
    replied
    I would love to see a test of ZFS on Linux vs. FreeBSD. You should also contact the writers of the ZFS books for the best optimisations, so the test doesn't end up biased. I know they work with the Postgres people to get better performance, for example by not having Postgres and ZFS do the same job twice: just choose one to do it.

    That could help all your ZFS tests in the future, especially if you compare FreeBSD to anything, as they are FreeBSD admins.



  • pegasus
    replied
    Originally posted by Beherit View Post
    I take it hardware RAID is a thing of the past?
    I guess that's the only meaningful conclusion from these tests. Not just hardware RAID, any kind of RAID: if you have fast storage devices, RAID will just slow them down. Remember, the D in RAID stands for (spinning) disks, and I have yet to see a sensible implementation of RA(I)F.

    So from now on my approach is to set up SSDs and NVMe drives as independent devices, each with its own filesystem, and take care of reliability/redundancy in a layer above them. Think things like MooseFS, Ceph or BeeGFS.

    What I'm really curious about, though, is whether it makes sense to put an LVM cache on an Optane NVMe drive in front of TLC/QLC capacity SSDs. I will need to make a large(ish) purchase decision on that soon...



  • untore
    replied
    Originally posted by chilinux View Post
    I'm disappointed by the number of ZFS comparison benchmarks that get published without discussing the filesystem implementation's use of RAM. Phoronix is not the only one that has done this, but I expected Phoronix to know better.

    Try setting up a server dedicated to PostgreSQL and optimize the database's RAM usage (upping max_connections, shared_buffers, effective_cache_size, etc.) on a system running ext4 or xfs. Once you get that tuning to take full advantage of the RAM in the database application, move the same configuration over to a ZFS setup. The result I get is a system that thrashes, because ZFS takes a great deal of the RAM for itself and PostgreSQL's attempts to use the same RAM push the system into swapping. If you reduce that impact by lowering the PostgreSQL tuning parameters, you end up with a system that doesn't provide the same performance as the ext4 or xfs configuration. ZFS's demand that memory be used for filesystem caching instead of application caching ultimately results in a poorly tuned database server configuration (a rough sketch of that RAM split follows below).

    Even worse is if you need a large amount of storage for the database server. ZFS stands for "Zettabyte file system", which is ironic given how poorly it actually scales in real-world terms. With 12TB hard drives available, it is not hard to build a petabyte array. According to the ZFS rule of thumb of providing 1GB of RAM for every 1TB of disk, that petabyte array should be paired with a system that has 1,000 gigabytes of RAM?!?! The majority of server motherboards I have worked with top out at below a fifth of that!

    Lastly, with the RHEL 8 release candidates it seems that Red Hat is strongly pushing XFS with a Btrfs-like configuration interface provided by Stratis Storage. When doing filesystem comparisons, it would be nice if XFS were also included in the benchmarking. And again, it would be nice to see how much RAM is left available for application services and how much RAM is monopolized by the filesystem's kernel module.
    https://github.com/openzfs/zfs (OpenZFS on Linux and FreeBSD)

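    To make the memory contention described above concrete, here is a back-of-the-envelope sketch. The numbers (64 GiB of RAM, a 16 GiB ARC cap via zfs_arc_max, the common 25%/75% rules of thumb for shared_buffers and effective_cache_size) are assumptions for illustration, not recommendations; the point is only that whatever the ARC keeps is RAM the PostgreSQL tuning can no longer count on.

    ```python
    # Back-of-the-envelope sketch with assumed numbers, not a recommendation:
    # whatever RAM the ZFS ARC may keep (zfs_arc_max) has to be subtracted
    # from what PostgreSQL tuning (shared_buffers, effective_cache_size)
    # can assume is available.
    GiB = 1024 ** 3

    total_ram    = 64 * GiB   # assumed server size
    os_and_other = 4 * GiB    # assumed OS / connection overhead
    arc_max      = 16 * GiB   # assumed cap for zfs_arc_max

    available_to_pg = total_ram - os_and_other - arc_max

    # Common ext4/xfs-era rules of thumb, applied to what is actually left:
    shared_buffers       = available_to_pg // 4      # ~25% of usable RAM
    effective_cache_size = available_to_pg * 3 // 4  # ~75% of usable RAM

    print(f"zfs_arc_max          = {arc_max // GiB} GiB")
    print(f"shared_buffers       = {shared_buffers // GiB} GiB")
    print(f"effective_cache_size = {effective_cache_size // GiB} GiB")
    ```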


  • Beherit
    replied
    I take it hardware RAID is a thing of the past?



  • waxhead
    replied
    To try to explain why BTRFS is slow here: BTRFS's implementation of "RAID" 1 and 10 is NOT yet optimized for parallel workloads. It simply uses a scheme where the storage device to read from is selected based on the PID of the process, so a set of all-even or all-odd PIDs may hug the same disk, and there is not (yet) any optimization to balance the workload based on each device's queue length (a rough illustration is sketched below).

    Note that patches have been posted on the mailing list multiple times to address this (by someone called Timofey Titovets), but for some reason they have not been merged, as far as I can tell by looking at the source code (Ref: https://git.kernel.org/pub/scm/linux...4.20-rc6#n5187).

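    A rough illustration of the difference between the two read policies described above; plain Python, not the actual btrfs kernel code, and the PID values and read counts are made up:

    ```python
    # Rough illustration, NOT the btrfs kernel code: compare a pid-parity
    # mirror choice (roughly how btrfs RAID1 reads pick a device today)
    # with a pick-the-shorter-queue policy for a burst of reads.
    NUM_MIRRORS = 2

    def pid_parity_policy(pid, queues):
        return pid % NUM_MIRRORS              # ignores device load entirely

    def shortest_queue_policy(pid, queues):
        return min(range(NUM_MIRRORS), key=lambda i: queues[i])

    def simulate(policy, pids, reads_per_pid=100):
        queues = [0] * NUM_MIRRORS            # reads sent to each mirror
        for pid in pids:
            for _ in range(reads_per_pid):
                queues[policy(pid, queues)] += 1
        return queues

    # Worst case for pid parity: a batch of workers that all got even PIDs.
    even_pids = [1000 + 2 * i for i in range(8)]
    print("pid parity    :", simulate(pid_parity_policy, even_pids))      # [800, 0]
    print("shortest queue:", simulate(shortest_queue_policy, even_pids))  # [400, 400]
    ```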


  • linner
    replied
    With all the computing power available these days, it's a shame we still have to manually tune something as basic as a filesystem.

    Now if I could just keep my Linux servers from freezing the whole damn machine every once in a while when doing very heavy disk write activity... Makes me miss Solaris.

