FreeBSD ZFS vs. Linux EXT4/Btrfs RAID With Twenty SSDs


  • torsionbar28
    replied
    Originally posted by ryao View Post
    Also, many enterprise CPUs are limited to 256GB of RAM, which is 1/4 of 1TB.
    If you bought a lesser chip that maxes out at 256 GB, and your requirement was for 1 TB, you bought the wrong server. Better enterprise CPUs can do 1 TB per socket. Heck, our old Dell R815 (4-socket Opteron) has 512 GB in it.



  • ryao
    replied
    Originally posted by edenist View Post

    You mentioned tuning Postgresql to optimize for memory usage. Likewise with ZFS. There are many parameters which can help optimize if you're operating in a memory-constrained [or memory contended] system, notably arc_max, which will limit how much memory ZFS can use for its caching. I don't think you can talk about tuning postgres, then complain when you haven't done the same for ZFS.

    That memory 'rule-of-thumb' with ZFS applies when using deduplication, which isn't something a lot of people need or use. If you want to use de-dup on a petabyte worth of storage, on a system with 84 hard drives in a single vdev, I'd say 1TB of memory isn't exactly crazy.

    Using edge-cases to argue against mainstream use of something seems like grasping at straws to me. If you just don't like ZFS, then that's fine I suppose. Just state it as it is.
    Unfortunately, that rule of thumb was always wrong even for deduplication. The correct way of calculating it is a mathematical formula that varies based on your data’s deduplicability, your recordsize and some ARC parameters, not a constant number. I have posted it enough times that I am not going to post it again unless asked. I do not keep it on hand and it is moderately annoying to derive.

    By the way, my standard advice is to use primarycache=metadata on the application’s dataset if you are doing caching in userspace, but I imagine that adjusting the maximum ARC size could potentially give better results when the machine is dedicated to a single application such as postgresql. You would have potentially more limited direct reclaim, and ARC would be able to act as a second-level cache for anything on the dataset that the postgresql cache would not cache. Nice tip.
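
    A minimal sketch of both knobs, assuming a hypothetical dataset named tank/pgdata and a hypothetical 8 GiB cap:

        # Cache only metadata in ARC for the PostgreSQL dataset; let
        # PostgreSQL's shared_buffers handle data caching.
        zfs set primarycache=metadata tank/pgdata

        # Or cap the ARC so it cannot compete with the database for RAM.
        # OpenZFS on Linux (runtime module parameter), 8 GiB:
        echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
        # FreeBSD equivalent, set in /boot/loader.conf:
        #   vfs.zfs.arc_max="8589934592"
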
    Last edited by ryao; 14 December 2018, 10:41 PM.



  • ryao
    replied
    Originally posted by some_canuck View Post
    20 disks in a single vdev is suboptimal
    I have given up on expecting Michael to benchmark meaningful configurations.

    Also, he is still using compilebench, which is an utterly useless benchmark because it does not tell us which filesystem would be faster. Compilation takes about the same time on any filesystem because it is CPU bound, not IO bound.
    Last edited by ryao; 14 December 2018, 10:29 PM.



  • ryao
    replied
    Originally posted by chilinux View Post
    According to the ZFS rule of thumb of providing 1GB of RAM for every 1TB of disk, that petabyte array should be used with a system that has 1,000 gigabytes of RAM?!?! The majority of server motherboards I have worked with top out at less than a fifth of that!
    Every word of this is false. There is no such rule. I could have 1EB of storage on ZFS on a RPi and it would work just about as well as you can imagine it would with any filesystem. There is no penalty for having less memory as cache (and ZFS does release memory as the kernel requests it). Also, many enterprise CPUs are limited to 256GB of RAM, which is 1/4 of 1TB.

    By the way, for postgresql, set primarycache=metadata if you want ZFS’ cache to get out of your way. Also, set the recordsize to 8KB to avoid read-modify-write and put the PostgreSQL transaction log on its own dataset. Also, add a small SLOG device. That will make it perform really well.
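
    A hedged sketch of that layout, with a hypothetical pool name (tank) and a hypothetical log device:

        # Data files: 8K records to match PostgreSQL's page size, metadata-only ARC.
        zfs create -o recordsize=8K -o primarycache=metadata tank/pgdata

        # Transaction log (WAL) on its own dataset so it can be tuned separately.
        zfs create tank/pgwal

        # Small, fast SLOG device for synchronous writes.
        zpool add tank log /dev/nvme0n1p1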



  • ryao
    replied
    “With the basic SQLite embedded database benchmark, ZFS on FreeBSD 12 was faster than Linux with either EXT4 or Btrfs. Btrfs with its default copy-on-write behavior led to noticeably slower performance.”
    Michael, are you trying to say that ZFS is not copy-on-write? ZFS is only copy-on-write. There is no option to turn it off.



  • Zan Lynx
    replied
    Michael, forgive me if I didn't see it anywhere, but could you run a straight dd read of every drive at the same time in parallel? That would give us a baseline for where the IO read tops out from SATA, controller, or NVMe lane limits. That's always interesting to compare to the filesystem numbers, I think.
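
    Something along these lines would do it (device names are hypothetical and would need to match the twenty drives in the test system):

        # Read every drive in parallel; each dd reports its own throughput,
        # and the sum gives the raw ceiling below the filesystem layer.
        for dev in /dev/sd{a..t}; do
            dd if="$dev" of=/dev/null bs=1M count=16384 &
        done
        wait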



  • edenist
    replied
    Originally posted by chilinux View Post
    I'm disappointed in the number of ZFS comparison benchmarks that get published without discussing the FS implementation's use of RAM. Phoronix is not the only one that has done this but I expected Phoronix to know better.

    Try setting up a server dedicated to Postgresql and try to optimize RAM usage of the database (upping max_connections, shared_buffers, effective_cache_size, etc) on a system running ext4 or xfs. Once you get that tuning to take full advantage of the RAM in the database application, move the same configuration over to a ZFS setup. The result I get is a system that thrashes, because ZFS takes a great deal of the RAM for itself and Postgresql's attempt to use the same RAM pushes the system into swapping. If you reduce that impact by lowering the Postgresql optimization parameters, you end up with a system that doesn't provide the same performance as the ext4 or xfs configuration. ZFS's demand that memory be used for file system caching instead of application caching ultimately results in a poorly tuned database server configuration.

    Even worse is if you need a large amount of storage for the database server. ZFS stands for "Zettabyte file system", which is ironic given how poorly it actually scales in real-world terms. With 12TB hard drives available, it is not hard to build a petabyte array. According to the ZFS rule of thumb of providing 1GB of RAM for every 1TB of disk, that petabyte array should be used with a system that has 1,000 gigabytes of RAM?!?! The majority of server motherboards I have worked with top out at less than a fifth of that!

    Lastly, it seems from the release candidates of RHEL 8 that Red Hat is strongly pushing XFS with a Btrfs-like configuration interface provided by Stratis Storage. When doing FS comparisons, it would be nice if XFS were also included in the benchmarking. And again, it would be nice to see the amount of RAM left available for application services to take advantage of and how much RAM is monopolized by the FS kernel module.
    You mentioned tuning Postgresql to optimize for memory usage. Likewise with ZFS. There are many parameters which can help optimize if you're operating in a memory-constrained [or memory contended] system, notably arc_max, which will limit how much memory ZFS can use for its caching. I don't think you can talk about tuning postgres, then complain when you haven't done the same for ZFS.

    That memory 'rule-of-thumb' with ZFS applies when using deduplication, which isn't something a lot of people need or use. If you want to use de-dup on a petabyte worth of storage, on a system with 84 hard drives in a single vdev, I'd say 1TB of memory isn't exactly crazy.

    Using edge-cases to argue against mainstream use of something seems like grasping at straws to me. If you just don't like ZFS, then that's fine I suppose. Just state it as it is.



  • chilinux
    replied
    I'm disappointed in the number of ZFS comparison benchmarks that get published without discussing the FS implementation's use of RAM. Phoronix is not the only one that has done this but I expected Phoronix to know better.

    Try setting up a server dedicated to Postgresql and try to optimize RAM usage of the database (upping max_connections, shared_buffers, effective_cache_size, etc) on a system running ext4 or xfs. Once you get that tuning to take full advantage of the RAM in the database application, move the same configuration over to a ZFS setup. The result I get is a system that thrashes, because ZFS takes a great deal of the RAM for itself and Postgresql's attempt to use the same RAM pushes the system into swapping. If you reduce that impact by lowering the Postgresql optimization parameters, you end up with a system that doesn't provide the same performance as the ext4 or xfs configuration. ZFS's demand that memory be used for file system caching instead of application caching ultimately results in a poorly tuned database server configuration.
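
    For reference, the tuning being described looks roughly like this in postgresql.conf; the values below are hypothetical and would normally be sized against the machine's actual RAM:

        max_connections = 200
        shared_buffers = 16GB          # PostgreSQL's own buffer cache
        effective_cache_size = 48GB    # planner's estimate of buffers plus OS page cache

    effective_cache_size in particular assumes the remaining RAM is available as a filesystem cache for the database, which is exactly where an uncapped ARC collides with it.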

    Even worse is if you need a large amount of storage for the database server. ZFS stands for "Zettabyte file system", which is ironic given how poorly it actually scales in real-world terms. With 12TB hard drives available, it is not hard to build a petabyte array. According to the ZFS rule of thumb of providing 1GB of RAM for every 1TB of disk, that petabyte array should be used with a system that has 1,000 gigabytes of RAM?!?! The majority of server motherboards I have worked with top out at less than a fifth of that!

    Lastly, it seems from the release candidates of RHEL 8 that Red Hat is strongly pushing XFS with a Btrfs-like configuration interface provided by Stratis Storage. When doing FS comparisons, it would be nice if XFS were also included in the benchmarking. And again, it would be nice to see the amount of RAM left available for application services to take advantage of and how much RAM is monopolized by the FS kernel module.



  • supertin
    replied
    I get why people like to do these comparisons with the latest, fastest stuff they have available... But when you want to know how lower-spec systems will perform, it's almost impossible to find a decent comparison.

    I'd be interested in seeing a similar comparison test using an i3 or similar, about 4GB of RAM, and 2 or 4 basic everyday 4-8TB HDDs. Most home users looking at home-brew NASes would probably like to know how to save on CPU and RAM while maximizing performance for the storage cost.



  • Shnatsel
    replied
    I wonder how XFS would perform. Red Hat has been investing heavily in it for server workloads that assume RAID, and now they're also building Stratis on top of it, which is also RAID-oriented.

