ZFS is known for its de-duplication support, and other file-systems (such as DragonFly's HAMMER, plus work-in-progress support for Btrfs) offer this space-saving feature of eliminating duplicate copies of data. There's also a new project that we have just learned about: SDFS, a file-system that offers inline de-duplication support.
Opendedup SDFS is a file-system that supports both in-line and batch-mode de-duplication on Linux and Windows systems, along with VMware virtualized environments. This file-system claims it can reduce storage utilization by up to 90 to 95%, can de-duplicate more than a petabyte of data, can de-dupe/re-dupe at a speed of more than 1 GB/s, and can perform this de-duplication either locally, over the network, or in the cloud (including Amazon S3). In fact, SDFS is particularly suited for the cloud, with a focus on VMware, Xen, and KVM. SDFS also supports file and folder snapshots. These claims are rather impressive, especially from a largely unheard-of open-source project (they only have 18 Twitter followers).
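To see how block-level de-duplication can achieve those storage savings, here is a toy sketch of the general technique: data is split into fixed-size blocks, each block is identified by a cryptographic hash, and a block already present in the store is never written a second time. This is purely illustrative and assumes a 4 KB block size; it is not SDFS's actual implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; real dedup file-systems choose their own


class DedupStore:
    """Toy content-addressed block store illustrating inline de-duplication."""

    def __init__(self):
        self.blocks = {}  # sha256 digest -> block bytes, each unique block stored once

    def write(self, data: bytes) -> list:
        """Split data into fixed-size blocks and store each unique block once.

        The 'file' that comes back is just a recipe: the ordered list of digests.
        """
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicate blocks are skipped
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the original data from its block recipe."""
        return b"".join(self.blocks[d] for d in recipe)


store = DedupStore()
# 10 logical blocks, but only 2 distinct block contents:
payload = b"A" * BLOCK_SIZE * 8 + b"B" * BLOCK_SIZE * 2
recipe = store.write(payload)
assert store.read(recipe) == payload
print(len(recipe), len(store.blocks))  # prints "10 2": 10 logical blocks, 2 stored
```

Here ten logical blocks shrink to two physical ones, an 80% reduction; the very high ratios quoted above come from workloads like virtual machine images, where many files share nearly all of their blocks.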
Earlier this month they put out the SDFS file-system 1.0.8 release for Windows and Linux. While the file-system is portable to Windows, to the dismay of some, it is built atop FUSE, which Linus Torvalds argues is for toys and misguided people. On Linux, the file-system requires an x86_64 system, FUSE 2.8+, at least 2GB of RAM, and even Java 7.
For those not familiar with data de-duplication, they have a page about it, with more information on opendedup.org, including an SDFS architecture presentation. This user-space file-system is hosted at Google Code and is developed under the GNU GPLv2 license.