A Quick Look At EXT4 vs. ZFS Performance On Ubuntu 19.10 With An NVMe SSD


  • jrch2k8
    replied
    Originally posted by make_adobe_on_Linux! View Post
    Hopefully some ZFS devs and experts can agree on what precisely qualifies as a properly configured and optimized ZFS setup and let Phoronix know before their next benchmarks. I really want to see ZFS on Optane and other top SSDs with encryption enabled! It definitely needs to be tested in raidz2 as well.
    I can give you some base rules (and I have a bunch more in older posts in other threads as well):

    Hardware:
    • If possible, always prefer ECC over fast RAM; all technicalities aside, it protects you from having garbage written from RAM to your pools.
    • ZFS uses a lot of RAM by default (unless fine-tuned), so if you don't want to spend hours with arcstat, just set your minimum at 32GB of RAM for regular home usage.
    • If you want encryption, get a CPU modern enough to have at least 4 cores with AES-NI; Zen-based CPUs are great choices.
    • If you plan to use NVMe drives as data disks rather than as ZIL/SLOG devices, set your minimum platform requirement to at least Threadripper (or X299, if you love wasting money). ZFS can handle RAID on NVMe on any system and requires no BIOS or dongle extras, but regular desktop boards lack PCIe bandwidth, so you will always be limited to one drive's speed or worse, depending on the motherboard.
    • Never use ZFS as RAID 0 or with a single drive: you get all the downsides of CoW with literally zero of the benefits.
    General rules:
    • Know your data; depending on it, performance can be great or unbearably bad.
    • Not every pool property is for you; enable only what you need.
    • Never, ever use ZFS on a bare pool; datasets exist for a reason, and if you don't use them you should ask yourself why you are using ZFS in the first place (see the sketch after this list).
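    A minimal sketch of that layout (the pool/dataset names are hypothetical, and the two-disk mirror is just one possible redundant layout):

        # Create a mirrored pool instead of a bare single-drive one, then make
        # per-use datasets rather than writing into the pool root.
        zpool create tank mirror /dev/sda /dev/sdb
        zfs create tank/documents
        zfs create tank/media
        zfs create tank/vm
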
    Basic Optimization:
    • Compression is a great tool but a deceptive one. On a dataset with lots of office files or other highly compressible files (say, your documents folder), compression works great and boosts your transfer rate because you save a lot of bandwidth. On a dataset full of incompressible files, like videos or already-compressed archives, enabling compression will skyrocket your latency with zero bandwidth savings; you are wasting CPU cycles for no reason.
    • Deduplication is a nice feature, but it requires huge amounts of RAM/CPU and can make latency spike harshly if misused. It should only be used on datasets with lots of small files you know to be redundant and compressible (like a Samba share where people save office files), or on datasets of big binaries with a lot in common, like ISOs or virtual machine images when you run several instances of the same OS.
    • Large dnodes should always be set to auto unless you have a very specific reason not to (like Solaris compatibility).
    • atime=off and relatime=on; this one doesn't need much explanation.
    • recordsize is a tricky one. My rule is 16K for certain databases, 128K for a general mix of small compressible files, and 1M for datasets where most files are bigger than 1M (videos, ISOs, etc.). If you don't get this right you will see low performance and/or very high fragmentation.
    • sync always stays at standard unless you really know what you are doing.
    • xattr=sa, acltype=posixacl, and aclinherit=(up to you) have worked great for me over the years, but as always, check the documentation first.
    • Encryption requires testing: depending on your data, compression, and recordsize it can be great or a slow dog, so run some tests before you blindly start encrypting. (A sketch of these settings as commands follows this list.)
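    A minimal sketch of the properties above as commands; the dataset names are hypothetical and the values follow my rules of thumb, not universal defaults:

        zfs set compression=lz4 tank/documents   # compressible data only
        zfs set recordsize=16K tank/db           # database dataset
        zfs set recordsize=1M tank/media         # mostly files larger than 1M
        zfs set atime=off tank
        zfs set relatime=on tank
        zfs set dnodesize=auto tank              # large dnodes on auto
        zfs set xattr=sa tank
        zfs set acltype=posixacl tank
        zfs set sync=standard tank               # the default, shown for completeness
        # Native encryption (OpenZFS 0.8+) is chosen at dataset creation:
        zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secure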

    FAT WARNING:
    • Never use ZFS as RAID 0 or with a single drive: you get all the downsides of CoW with literally zero of the benefits.
    • Most property changes on a dataset affect only new or modified files, so be careful to make most of your changes before adding data to your datasets/pools.
    • A lot of fine-tuning can also be done through kernel module parameters, but that is far more complex than a simple post can cover (one small example below).
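    As one small example of such a parameter, capping the ARC size (the 8 GiB value is purely illustrative, not a recommendation):

        # /etc/modprobe.d/zfs.conf -- cap the ARC at 8 GiB (value is illustrative)
        options zfs zfs_arc_max=8589934592

        # Takes effect when the zfs module is next loaded; verify with:
        #   cat /sys/module/zfs/parameters/zfs_arc_max
        #   arcstat 1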



  • deusexmachina
    replied
    Hopefully some ZFS devs and experts can agree on what precisely qualifies as a properly configured and optimized ZFS setup and let Phoronix know before their next benchmarks. I really want to see ZFS on Optane and other top SSDs with encryption enabled! It definitely needs to be tested in raidz2 as well.



  • DanglingPointer
    replied
    Originally posted by jrch2k8 View Post

    The irony in this phrase, coming after that statement, is amazing. […]
    Meh...

    I expected the diatribe from the fanboy...



  • jrch2k8
    replied
    Originally posted by DanglingPointer View Post

    Meh...
    If you're on Facebook, you're on BTRFS! Last I checked, they're one of the largest enterprises in the world, and their business is data. If you've been with them for years, then your data has been on BTRFS for years!

    Fanboys will be fanboys and judge from afar.
    The irony in this phrase, coming after that statement, is amazing.

    Your statement makes no sense, especially the dumb analogy of "if Facebook uses it, that means it's cool, duh!!!"

    1.) Facebook uses a myriad of filesystems, same as any enterprise, because they have millions of dollars to keep focused engineering teams on each use case and even more teams on integration. So yes, nobody doubts that Facebook or any other enterprise uses BTRFS; the issue is where, for what, and under which conditions.

    For example, nobody in their sane mind would use BTRFS (and the likes) for a cluster's hot backend (insert whatever service makes you happy), but BTRFS (and the likes) could be awesome for massive RAID 1/10 (or 5+0/6+0 in the case of ZFS) arrays of spinning disks plus cache drives as a nice cold backend. Why? Simply because no CoW filesystem can match the read-burst latency of a journaled filesystem, but on the other hand no journaled filesystem can do data integrity checks (properly).

    2.) Facebook probably uses BTRFS (and I am taking a huge leap of faith in believing they don't also use ZFS for other things, since the two are not mutually exclusive) because it is already included in the kernel, and for their use case I'm pretty sure they debugged and optimized the living crap out of it (I would almost bet money they have their own in-house implementation, or at least hyper-specific patches for their workloads, since this is common practice at huge enterprises and the main reason they go FOSS in the first place).

    So, in summary, everything in my post is accurate, including the part you cherry-picked, because I never said BTRFS is worse than ZFS in every scenario and condition (I gave a very specific set of problems that are even acknowledged by the upstream developers, by the way), and in fact there are scenarios where BTRFS could be better than ZFS.

    My point was simply this: as a generally available, battle-tested-in-most-scenarios, fully featured (as in, all claimed features work properly) CoW filesystem, ZFS is superior to BTRFS in most scenarios, and this is factually true. But again, this does not mean BTRFS is 100% bad for you; it has its good points as well (it's not black and white).

    Also, please take into account that when you bring up enterprises as huge as Facebook, Microsoft, Google, etc., normal common sense goes out the window, since they have enough engineering power to field massive teams that can make anything work with anything, because their control over their data is ultra fine-grained and ultra focused. The fact that they can make VFAT work with PostgreSQL (for example) doesn't mean the same will hold for people at regular run-of-the-mill enterprises/SMBs, nor does "this FS is better than that FS because X mega-enterprise uses it" hold, because you don't know the exact conditions of use in either case. For example, VFAT/Ext4/NILFS/etc. could be better than anything under the sun for one very specific table setup at Facebook, with very controlled data types, for one operation that will probably only ever be run at Facebook; that still qualifies as "Facebook uses VFAT/Ext4/NILFS/etc., duh!!", and it in no way means your databases won't run like crap on VFAT/Ext4/NILFS/etc. when you don't meet those specific conditions.



  • foobaz
    replied
    ZFS never comes across well in benchmarks. It's one of the slowest filesystems I've used, and yet it's also one of the best. ZFS prioritizes data integrity above all else. It's possible to corrupt an Ext4 filesystem and lose data. Features like journaling and fsck mitigate this to a huge degree, so ext4 is very safe. But not as safe as ZFS.

    The three main features that make ZFS so safe are block checksumming, copy-on-write, and plugging the RAID write hole. Block checksums protect against bit rot and hard drive errors, though corrupted data is only repairable on pools with redundancy: a single-disk ZFS pool can detect bit rot, but not repair it.
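    A minimal sketch of how you surface this in practice, assuming a hypothetical pool named tank: a scrub rereads every block and verifies it against its checksum, and status reports what was found (and repaired, where redundancy exists):

        zpool scrub tank       # re-read and verify every block against its checksum
        zpool status -v tank   # READ/WRITE/CKSUM error counters, plus any affected files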

    Copy-on-write means ZFS never modifies data in place. It writes a new block with the modified data and then updates the block pointers to reference the new block. This is bad for performance with frequently modified files, most notably databases. But it means that if you lose power or experience a hardware failure in the middle of a write, the old data is still there on disk.

    The RAID write hole is an effect that can cause data loss if a write operation is interrupted in the middle, leaving a stripe's data and parity inconsistent. The classic solution is a RAID controller with battery backup, so that in the event of a power failure it can finish writing the data to disk. However, these cards are expensive and don't protect against other events that can cause data loss, like a hardware failure in the card itself. ZFS sidesteps the hole entirely: RAID-Z writes full, variable-width stripes as part of a copy-on-write transaction, so a stripe is never partially overwritten in place.

    The only other filesystem with the ingredients to compete with ZFS is Btrfs. It supports block checksumming and is also a copy-on-write filesystem. Unfortunately it was plagued with bugs in its early versions, most recently in the parity (RAID 5/6) modes in 2016. Since data integrity is supposed to be the main feature of Btrfs, it's not very appealing if you don't trust it.

    So although ZFS is slow, it's the most reliable filesystem available. It's possible to mitigate the speed issues by throwing money at the problem - use flash storage, or add more disks. It's not so simple to mitigate reliability issues.



  • cjcox
    replied
    Originally posted by DrYak View Post

    Yes and no.
    Yes, periodic scrubbing on a new-gen filesystem (BTRFS, ZFS, and BcacheFS once that hits mainstream) is "good enough".
    And no, it's not available on most RAID subsystems.
    Really? Shoot, even el-cheapo subsystems like Nexsan support scrubbing. I just figured that if they did it, most did. Good to know, though.

    Scrubbing is designed to be "rot" prevention (refresh the bits so that they aren't so musty and dusty).

    Not a checksumming thing like with ZFS and Btrfs... do those scrub? I guess they must. How else do you prevent rot?
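    For reference, both do scrub on demand, and distributions typically schedule it. A minimal sketch for ZFS, assuming a hypothetical pool named tank (Ubuntu's zfsutils package ships a similar periodic cron job):

        # /etc/cron.d/zfs-scrub -- verify every block against its checksum weekly
        0 2 * * 0   root   /sbin/zpool scrub tank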




  • DanglingPointer
    replied
    Originally posted by jrch2k8 View Post
    Don't get me wrong, I do believe that for desktops/small servers BTRFS is sufficient and stable enough, especially considering it is a lot simpler than ZFS and gives similar features. But for enterprise stuff ...
    Meh...
    If you're on Facebook, you're on BTRFS! Last I checked, they're one of the largest enterprises in the world, and their business is data. If you've been with them for years, then your data has been on BTRFS for years!

    Fanboys will be fanboys and judge from afar.



  • jrch2k8
    replied
    Originally posted by kcmichaelm View Post

    I'm going to support these benchmarks because I think they provide an excellent service that is needed right now. […]
    That is a fair point. Maybe, as a middle-ground solution, benchmarks touching ZFS should carry a very visible caveat, so that those who pick ZFS on Ubuntu at least notice that ZFS is going to need intervention (via the CLI as of today, or some future tool Canonical develops) for optimal performance/security.

    I guess the bigger issue is Ubuntu bringing ZFS to the desktop version (for some reason I thought it was only for the server images), because ZFS was never designed for desktop usage and basically has zero sane defaults for that scenario. Honestly, I'm not even sure how they are going to handle this beyond an "it's just an option; if you pick it, you should know what you are doing!!!" kind of attitude.



  • kcmichaelm
    replied
    Originally posted by jrch2k8 View Post
    Please, Michael, again: stop using ZFS in these benchmarks if you don't have the time to set it up properly, or at least bearably. You are only hurting ZFS, because the average Phoronix reader doesn't have enough context to understand why these results are so horrible or why this setup is so hilariously wrong, and it will never show any real-world performance or benefit of using ZFS in the first place.
    I'm going to support these benchmarks because I think they provide an excellent service that is needed right now. The word "default" is under-appreciated throughout your response.

    This is ZFS going into a shipping, standard desktop distribution. By the numbers, FAR more people are going to be using this default "hilariously wrong setup" than a properly tuned version, and I think that needs exploration.

    It is perfectly true that no one should pick single-disk-ZFS for a performance benefit over Ext4 - and therefore it's very important to measure that difference so people are informed.

    There could be many such users (let's pick photographers, given the example in this thread) who run Linux out of appreciation for the open-source tools, and now they see there's a supported filesystem option which we all tend to agree is pretty darn good against bitrot, and they want to try it. However, if Canonical isn't providing them tools to tune it the way you say it must be done, then their installation won't be tuned unless they really feel like digging into it.



  • jrch2k8
    replied
    Originally posted by DrYak View Post
    Sorry, but wasn't one of the big advantages that ZFS touted over BTRFS that it can auto-detect and auto-handle some corner cases? (Like a big quantity of random writes, e.g. databases, VM images, torrents, and auto-tune the CoW?)
    I'm more of a BTRFS guy, so my impressions might be wrong.


    because that's what is available in his tools? The benchmarking tools are open source, by the way: feel free to help carve decent tests for ZFS...


    I think by now people have more or less learned that CoW and log-structured filesystems are a different beast.
    You either go for EXT4 if you care only about raw speed,
    or go for BTRFS and co. if you want the extra features.


    And, with very few exceptions, it isn't built in by default on most Linux distributions.
    Meanwhile BTRFS is available even on smartphones (Sailfish OS).
    Well, I have used ZFS since the Solaris 10 days and I have never heard of auto-tuning. I know you can tune ZFS for any scenario in a ridiculously fine-grained fashion, but never automatically; I think you may be wrong about the AUTO part.

    Automating reproducible tests for ZFS is near impossible (unless a top-notch dev offers some insight I don't have), and my issue is not with his tools but with using them on ZFS regardless (this is not my first post about this, and I've gone to great lengths before; I'm just tired of keeping at it).

    Well, Phoronix is the land of people who are fast to run their mouths about things they don't understand but glacially slow to actually try to understand, or to accept why they are wrong. So I wouldn't bet money on it, but OK.

    This is a fair point, but I did give BTRFS a few tries over the years (and I still sometimes do), and it always fails me one way or another for my use cases. For example:
    • RAID 5/50/6/60 is still very much radioactive on BTRFS: it may work, or it may eat your data and kill your kitten.
    • It is very slow on big LUNs, especially with PostgreSQL (last tried with PG 10 on Linux 5.1), at least compared to ZFS, but that may be related to the first issue, since I didn't test with RAID 1 (which I think is the strongest mode on BTRFS at the moment).
    • I don't think it works well on NVMe (as in, with several drives). I can't prove it, since the logs say nothing, but on NVMe I noticed some services getting random huge latency spikes, and the only difference was ZFS vs. BTRFS. Then again, I may be wrong (I also didn't bother to go the extra mile; I just nuked the server and went back to ZFS to test other things; it was a test server, of course).
    • In general I believe BTRFS lacks flexibility in the volume/snapshot department as well, but this may be subjective, depending on what you do.
    • BTRFS and virtualization are not very good friends.
    Don't get me wrong, I do believe that for desktops/small servers BTRFS is sufficient and stable enough, especially considering it is a lot simpler than ZFS and gives similar features. But for enterprise stuff, or really important stuff, I do believe ZFS is without peer; I simply trust it with my life (which does not mean I skip proper backups). Since the Solaris 10 era I have never had a failure with data/hardware loss that ZFS couldn't recover from completely, nor a workload I couldn't find a way to optimize the living bejesus out of. Damn, even today I have a client with an old server that has lost 23 of its original 24 hard drives over the last 10 years (meaning, of course, that I've been replacing the damaged drives with new ones and resilvering), yet it has never lost a single bit of data.
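    For reference, that replace-and-resilver cycle is one command per failed drive (pool and device names are hypothetical):

        zpool replace tank /dev/sdb /dev/sdc   # swap the failed disk for the new one; ZFS resilvers automatically
        zpool status tank                      # watch resilver progress and pool health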

