Linux 5.5 SSD RAID 0/1/5/6/10 Benchmarks Of Btrfs / EXT4 / F2FS / XFS


  • lsatenstein
    replied
I am an XFS fan!! I do not have to do shrinking. If your XFS partition is close to full, beware: you can run into issues with 10% free space or less.

    By the way, use of the word lastly bothers me. Something can be last, but then lastly should mean not last or finally, but close to last. If close to the last, how close?
    Just as last is a bother, in and into bother me. Put something into the system. Where is the something? It is in the system.

    Back to XFS and tests. Would data centers be using terabyte SSDs or spinners? RAID on disk may favour btrfs.

    Leave a comment:


  • S.Pam
    replied
XFS does NOT do data checksums, so do not compare it to Btrfs or ZFS. Without data checksums a filesystem cannot guarantee the integrity of stored data.

    Leave a comment:


  • oiaohm
    replied
    Originally posted by DrYak View Post
    Oh, I didn't realise the feature got finally declared stable now. I was still remembering it as an experimental feature only.
    Is it enabled by default, or is it still at the "must be configured" phase ?
https://blogs.oracle.com/linux/xfs-d...haring-reflink per that link, on January 6, 2020 reflink was declared production ready on XFS, and it is on by default when you create a new XFS file system using xfsprogs 5.1 and newer (released 19 Jul 2019, https://lwn.net/Articles/794288/). It's only now getting out to distributions.

Reflinks and filesystem metadata checksumming are on by default on XFS in a growing number of distributions now, so at this point XFS is part CoW due to reflinks, just a different way of doing copy-on-write. There is still the weakness of not checksumming data at this stage, and you don't have a snapshot feature yet.

    The XFS feature roadmap still has quite a few things coming.

    Originally posted by DrYak View Post
Also F2FS is log-structured, which shares some of the benefits (no in-place overwrite, possibility to always recover by reverting to an older version, friendlier on append-mostly / overwrite-averse media such as Flash, shingled magnetic, etc.) that CoW also provides. It is NOT checksumming its data though.
    I forgot about log-structured file systems; they are a different beast.


    Leave a comment:


  • starshipeleven
    replied
    Originally posted by Spam View Post
    To be more fair, checksums do help as you'd discover bad data before it migrates over to your backups. This is a perfectly valid use case.

When you do not need high availability / uptime you may as well run data=single and use hourly snapshots with frequent backups. This has the advantage of file versioning AND data integrity without the need for triple space usage (RAID mirror + backup). In other words, typical home and small office users.
    Yeah, this is another valid use case, given different requirements.

If you scale it a little bigger you can use btrfs with data=single as the "lower part" of a cluster filesystem in a SAN, and do away with RAID entirely, as now you have multiple servers anyway and they are your availability. If an entire server's drives blow up, you just replace them and the SAN restores the data from the other nodes.

    The main issue with btrfs is that these usecases are what brings most $$$ so they are the functions that work well and are used in production, while stuff used by home users and enthusiasts like RAID5/6 is somewhat neglected still.

Meanwhile, filesystems like ZFS that were developed for the reality of a decade ago can do that, and they are fine.


    Personally, for most home-user systems (and also my own NAS if I wasn't a fucking nerd) I would go with something far simpler like SnapRaid and a scheduled scrub, as the feature set is much more aligned with their usecase than ZFS and btrfs are, and it's also much simpler to understand, set up, maintain and recover from disk failures. https://www.snapraid.it/compare
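For anyone curious what that looks like in practice, here is a minimal sketch of a SnapRaid routine with a scheduled scrub. This assumes the array is already described in /etc/snapraid.conf; the times and scrub percentage are arbitrary example choices, not recommendations.

```shell
# Example crontab fragment (schedule and scrub percentage are arbitrary):
0 3 * * *   snapraid sync           # update parity to cover the day's changes
0 5 * * 0   snapraid scrub -p 12    # verify roughly 12% of the array each week
```

`snapraid sync` recomputes parity for changed files; `scrub -p` checks a rolling slice of the array so the whole thing gets verified over a few weeks.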

    Leave a comment:


  • S.Pam
    replied
    To be more fair, checksums do help as you'd discover bad data before it migrates over to your backups. This is a perfectly valid use case.

When you do not need high availability / uptime you may as well run data=single and use hourly snapshots with frequent backups. This has the advantage of file versioning AND data integrity without the need for triple space usage (RAID mirror + backup). In other words, typical home and small office users.
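As a sketch of what "hourly snapshots" can look like on btrfs (the paths here are made-up examples; run this from cron or a systemd timer):

```shell
# Take a read-only, timestamped snapshot of the data subvolume:
btrfs subvolume snapshot -r /srv/data "/srv/.snapshots/data-$(date +%Y%m%d-%H%M)"
# Backups can then be shipped incrementally with btrfs send/receive, e.g.:
# btrfs send -p <previous-snapshot> <new-snapshot> | ssh backuphost btrfs receive /backup
```

The `-r` (read-only) flag matters: only read-only snapshots can be used as a source for `btrfs send`.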
    Last edited by S.Pam; 28 January 2020, 02:47 PM.

    Leave a comment:


  • starshipeleven
    replied
    Originally posted by GreenReaper View Post
    Recommended by the way it's been used by Facebook and Synology
    Recommended? Where did they recommend this?

    That's how they use it for their own specific usecase, with their own specific decisions on tradeoffs.

    Facebook isn't using it for data storage but for easy restore and snapshot of frontline, expendable containers and VMs for webservers and other computing services that don't store data inside themselves (there are database servers and storage servers for that). Their approach to data integrity issues is "terminate the container/VM and restore a backup".

    I'm not aware of them using it on top of mdadm.

    Synology (and others in the NAS sector) are using it only for its snapshot capabilities, not for its data integrity ability, see below why.

    as checksumming and snapshot layer over the top of the block storage
    this is NOT recommended by btrfs developers and it's useless for data integrity as btrfs without any parity can't fix any of the problems the checksumming will detect.

    Running a btrfs volume with data=dup (so that you have two copies of the data and can therefore fix data integrity issues) is a RAID1, and running a RAID1 on top of a RAID5/6 is nonsense, you are wasting space for no reason.

    In our case we expect to use to store original media files, so the checksumming is important to us
You really should have done your homework and not have random people on the internet correct you.

    Btrfs cannot do what you ask at the level of reliability you require. Layering it on top of mdadm is a tradeoff where you agree that you don't need some of its features (self-healing data integrity).

    The only filesystem that can do what you ask is ZFS.

    Leave a comment:


  • S.Pam
    replied
I would not suggest using nocow for databases. Instead, you can turn off double writes (MariaDB/MySQL), since double writes are not needed on CoW filesystems. Also, a nocow file will still be CoWed if you do snapshots and the like.
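If it helps, here is a sketch of what turning off the doublewrite buffer looks like for MariaDB/MySQL. The config file path and section are examples that vary by distribution; verify the variable against your server version before relying on it.

```shell
# In e.g. /etc/mysql/my.cnf (sketch, path may differ on your distro):
#   [mysqld]
#   innodb_doublewrite = 0
# Check the value currently in effect:
mysql -e "SHOW VARIABLES LIKE 'innodb_doublewrite';"
```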

With regard to RAID5 I think the write hole "worries" are exaggerated. You need two faults for it to be problematic: both a crash AND a disk failure. If you scrub after unclean shutdowns you should be safe. Further, since Btrfs supports different RAID modes (aka profiles) for metadata and data, it is recommended to run RAID1 or RAID1c3 for metadata together with RAID5 for data. The write hole exists for other RAID implementations as well (mdadm, BIOS RAID, HW RAID cards...) unless you take specific precautions.
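A sketch of such a mixed-profile setup (device names are placeholders; the raid1c3 profile needs kernel 5.5+ and a matching btrfs-progs):

```shell
# Three-copy metadata, RAID5 data, across three example devices:
mkfs.btrfs -m raid1c3 -d raid5 /dev/sdb /dev/sdc /dev/sdd
# After an unclean shutdown, scrub to catch any write-hole damage early
# (-B runs in the foreground so you see the result):
btrfs scrub start -B /mnt/pool
```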

    Though, I must admit, I almost always use RAID1. Disk space is not that often an issue, and it is usually easier and quicker to rebuild.

    https://btrfs.wiki.kernel.org/index.php/Status has some stats

    Leave a comment:


  • DrYak
    replied
Michael : Given that BCacheFS is slowly nearing upstream inclusion, it would be good to have it start appearing in filesystem benchmarks.

    Having ZFS (to have another point of comparison with modern CoW / snapshotting / checksumming filesystems) would be great.

    Originally posted by oiaohm View Post
Your closing "XFS isn't a CoW filesystem" is wrong. XFS is a part CoW file system.
    Oh, I didn't realise the feature got finally declared stable now. I was still remembering it as an experimental feature only.
    Is it enabled by default, or is it still at the "must be configured" phase ?

    -----

    Also other detail regarding the tech behind filesystems:

In addition to being fully CoW, BTRFS and ZFS (and BCacheFS) are also fully checksumming (everything, including the data, is checksummed; most other filesystems only checksum their metadata).
    (Which also burns cycles and slows down performance, in exchange for more reliability.)

Also F2FS is log-structured, which shares some of the benefits (no in-place overwrite, possibility to always recover by reverting to an older version, friendlier on append-mostly / overwrite-averse media such as Flash, shingled magnetic, etc.) that CoW also provides. It is NOT checksumming its data though.

    So it's surprising that it performs that well compared to EXT4/XFS/etc.

    Oh and the usual warnings:
- RAID5/6 are *still* not considered stable by BTRFS.
    - CoW file systems are bad at multiple random writes inside large files (e.g.: databases, virtual disks, torrents). The current tips are: mount the filesystem with "autodefrag" (= tries to group several writes into one) and mark these specific files as nocow (touch to create an empty file, chattr +C on the empty file, then optionally write any data that you need (e.g.: use cat >> to copy from an older CoW version of the disk image, or truncate to reserve empty space for your torrent), enjoy)

For obvious reasons, nocow files also drop checksums. Which isn't critical, because said applications tend to have their own internal integrity checks (torrents use hashes as part of their design, databases rely on advanced integrity mechanics implemented at the file level, and virtual drives rely on whatever filesystem is inside the image... which could actually be a FAT32 filesystem, in which case you don't get much).

    Leave a comment:


  • CochainComplex
    replied
    Michael what btrfs-progs version have you been using to manage the btrfs configs?

    Leave a comment:


  • CochainComplex
    replied
    Originally posted by GreenReaper View Post
    Nah, it's fair. I lost six hours of my user's new content to the btrfs committed transaction without writeback bug in 5.2. The only reason it wasn't more is the server's memory filled up with new data in that time and it finally froze writes. And the only reason that data wasn't all lost completely is that some of it had been served from memory to our caches - which use ext4 and mdadm - that had stored it safely.

    Sure, other filesystems have bugs. But this was a doozy and it happened just a few kernel revisions ago. Then there was that poor combination of btrfs send and delayed allocation which could lead to it not sending any data for inodes it hadn't written out yet, quietly corrupting snapshots. And neither of those are new features, nor was the bug itself in new code - it existed since btrfs send was merged.

    Btrfs can do a lot. Unfortunately this means it has a lot of bugs, especially when one component reacts unfavourably with another.
Maybe not totally related, but it is recommended to use the btrfs-progs version corresponding to your kernel, to make sure you don't hit strange behaviour when doing operations on the fs side.

    Leave a comment:
