Virtual Data Optimizer (VDO)


  • Virtual Data Optimizer (VDO)

    Hi,

    Any chance that there will be some tests using the recently open-sourced vdo kernel module?
    It enables you to use data compression, deduplication and zero-block elimination underneath quite a lot of other filesystems.

    In short, vdo creates a dm device that can be used as a block device for other filesystems (or LVM).
    It can be used on a single disk or an md device (and fibre channel or whatever else you use to store your data on).
    The vdo block device will then do compression, deduplication and zero-block elimination.

    I was wondering how this would impact performance,
    and maybe how it would compare across:
    btrfs
    xfs on vdo
    xfs on lvm on vdo
    ext4 on vdo
    ext4 on lvm on vdo
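
    For reference, reproducing the most layered of those setups could look roughly like the sketch below. The device name /dev/sdb, the volume names and the mount point are placeholders, and the syntax is the vdo manager syntax as documented for RHEL 7.5, so details may differ on other distributions:

    # vdo create --name=vdo_pool --device=/dev/sdb
    # pvcreate /dev/mapper/vdo_pool
    # vgcreate vg_vdo /dev/mapper/vdo_pool
    # lvcreate -n lv_test -l 100%FREE vg_vdo
    # mkfs.xfs -K /dev/vg_vdo/lv_test
    # mount -o discard /dev/vg_vdo/lv_test /mnt/test

    The plain "xfs on vdo" and "ext4 on vdo" cases would skip the pvcreate/vgcreate/lvcreate steps and format /dev/mapper/vdo_pool directly.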

    It used to be closed source from Permabit, but after Red Hat bought it they open-sourced it, and now you can find the git tree here:
    A kernel module which provides a pool of deduplicated and/or compressed block storage. - dm-vdo/kvdo


    Cheers
    Tjako

    For reference, here are some sources:
    # nice explanation of how it works
    In the Red Hat Enterprise Linux 7.5 Beta, we introduced Virtual Data Optimizer (VDO). VDO is a kernel module that can save disk space and reduce replication bandwidth. VDO sits on top of any block storage device and provides zero-block elimination, deduplication of redundant blocks, and data compression. These are the key phases of the data reduction process that allow VDO to reduce the data footprint on storage. VDO applies these phases inline and on-the-fly. Now, let's see what happens in each phase (download the beta yourself and try):

    Zero-Block Elimination: In the initial phase, any blocks that consist entirely of zeros are identified and recorded only in metadata. This can be understood with the example of sand mixed into a mug of water: we use filter paper (zero-block elimination) to collect only the sand particles (non-zero data blocks) out of the water. Similarly, VDO only allows blocks that contain something other than all zeros to filter through to the next phase of processing.

    Deduplication: In this second phase, the incoming data is processed to determine whether it is redundant (data that has been written before) or not. The redundancy of this data is checked against metadata maintained by the UDS (Universal Deduplication Service) kernel module delivered as part of VDO. Any block of data that is found to be redundant will not be written out. Instead, metadata will be updated to point to the original copy of the block already stored on media.

    Compression: Once the initial zero-elimination and deduplication phases are completed, LZ4 compression is applied to the individual data blocks. The compressed data blocks are then packed together into fixed-length (4 KB) blocks and stored on media. Because a single physical block can contain many compressed blocks, this can also speed up the performance of reading data off storage.

    How to create a VDO volume on a storage device

    When VDO creates a volume on a block storage device, it divides the device into two portions internally:

    UDS portion: The size of this portion is fixed unless additional capacity is explicitly specified when the VDO volume is created. It is used to store the unique name and location of each block seen, as deduplication advice is requested of the UDS driver by the VDO device.
    VDO portion: The VDO portion is the space that VDO uses to add, delete and modify user data and its associated metadata.

    Now, let's create a VDO volume and observe how it interacts with other Linux components. For this example, I am using a local VM running on KVM virt-manager on my bare metal machine.

    VM specifications:
    OS: RHEL 7.5 pre-release (since VDO is integrated in RHEL 7.5)
    RAM: 4GB
    OS disk: 20GB
    Additional disk: 15GB (virtio-blk)

    The following are the steps I used to create a VDO volume:

    # vdo create --name vdo_vol --device=/dev/vda
    This creates the VDO volume on top of my virtio-blk device /dev/vda, which can now be accessed as /dev/mapper/vdo_vol.

    # mkfs.xfs -K /dev/mapper/vdo_vol
    This creates an XFS filesystem on top of the VDO volume. Note that the "-K" option speeds up formatting by not sending discard requests at file system creation time. Since the VDO volume has just been created, it is already initialized to zeroes.

    # mkdir /vdo_vol
    Makes the directory /vdo_vol on which to mount the XFS file system.

    # mount -o discard /dev/mapper/vdo_vol /vdo_vol
    Mounts the file system for general use. The "discard" option is used by the filesystem to inform VDO when blocks have been deleted; this can be done either via the discard mount option or by periodically running the fstrim utility. Discard behavior is required to free up previously allocated space on thin-provisioned devices.

    At this point, we have successfully mounted the VDO volume. To confirm this, we observe that /dev/mapper/vdo_vol is mounted on the /vdo_vol directory in the "df -hT" output (screenshot not reproduced here).

    Observations: In this configuration, out of the total 15GB of disk space, only 12GB is available for user data; the remaining 3GB is used for UDS and VDO metadata. The "lsblk" output (screenshot not reproduced here) confirms that vda is actually 15GB in size, while the user-visible vdo_vol provides 12GB of total disk space.

    My next blog will look at how much space savings can be achieved with VDO.

    # docs

    # red hat permabit takeover
    Red Hat, Inc. (NYSE: RHT), the world's leading provider of open source solutions, today announced that it has acquired the assets and technology of Permabit Technology Corporation, a provider of software for data deduplication, compression and thin provisioning. With the addition of Permabit's data deduplication and compression capabilities to the world's leading enterprise Linux platform, Red Hat Enterprise Linux, Red Hat will be able to better enable enterprise digital transformation through more efficient storage options.

    As more enterprises move towards adopting the efficiencies offered by digital technologies like Linux containers and cloud computing, being able to run these services and store the resulting data requires new storage needs outside of what is offered by traditional storage technologies. Storage efficiency is a key piece in addressing these needs, particularly with the emergence of hyperconverged infrastructure (HCI), which blends storage and compute onto a single x86 server. Enterprise-class, open source solutions can help to address the storage challenges posed by these digitally transformative technologies by using software to increase the amount of storage available to applications without increasing the amount of physical storage.

    With Permabit's technology, Red Hat can now bring powerful data deduplication and compression features into Red Hat Enterprise Linux itself, which will also enhance capabilities across Red Hat's hybrid cloud and storage technologies, including Red Hat OpenStack Platform, Red Hat OpenShift Container Platform and Red Hat Storage. Consistent with its commitment to delivering fully open source solutions and upstream-first innovation, Red Hat plans to open source Permabit's technology. This will enable customers to use a single, supported and fully-open platform to drive storage efficiency, without having to rely on heterogeneous tools or customized and poorly-supported operating systems.

    The transaction is expected to have no material impact to Red Hat's guidance for its second fiscal quarter ending Aug. 31, 2017, or fiscal year ending Feb. 28, 2018.

    Supporting Quote
    Jim Totton, vice president and general manager, Red Hat
    "Digitally-transformative technologies, including cloud infrastructure, Linux containers and hyper-converged infrastructure, require enterprises to re-examine overlooked or previously commoditized technology decisions, especially storage, to gain as many efficiencies as possible for business evolution. With the addition of Permabit's data deduplication and compression tools to Red Hat Enterprise Linux, Red Hat will be ready to support these organizations as they seek to derive a more efficient storage footprint to power business innovation."




  • #2
    Isn't this the lovely thing where, if you guess wrong about your data reduction ratio, you start getting block write errors once it can't compress or dedupe enough blocks?

    I think predictability is important. I wouldn't want to run this.



    • #3
      Originally posted by Zan Lynx View Post
      I think predictability is important. I wouldn't want to run this.
      Predictability?



      • #4
        What do you mean by predictability?
        The amount of data storage that will be used?
        If so, that would be a nice thing to test.

        Tjako



        • #5
          I heard from a friend who runs Linux servers about problems with a thing that sounds like this VDO.

          The way he told it, basically, you have to tell it ahead of time what compression and deduplication ratio you expect. If you guess incorrectly, everything works fine until it runs out of physical blocks to write to. Then a write returns a write error as if the disk block had gone bad.

          This is not the same as an out-of-space error; the kernel does not return the error in the same way. In fact, if you have a preallocated data file, as my friend did, and are writing into the middle of it, the failure is unrecoverable: user space has no idea which of its writes caused the problem, because of the kernel's page cache and because the actual physical writes are asynchronous to the program's writes.
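
          A quick way to picture that asynchronous-write problem (purely illustrative; the path is a made-up file on a VDO-backed mount):

          # dd if=/dev/urandom of=/vdo_vol/testfile bs=1M count=1024
          # dd if=/dev/urandom of=/vdo_vol/testfile bs=1M count=1024 conv=fsync

          The first command can exit successfully even if the backing VDO device has run out of physical blocks, because the data only landed in the page cache; the second forces a flush with fsync, so at least dd itself gets to see the I/O error instead of it surfacing later during background writeback.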



          • #6
            I'm not sure that you guys are talking about the same thing.
            When I read the example (first link in the original post), there is no mention of any guesses/estimates regarding compression and/or deduplication when setting up the vdo block device.

            T



            • #7
              I think it's a different technique; in the example in the first post there is no mention of any estimates related to deduplication or compression.
              You only define the vdo block device and then format it to your own liking.

              T



              • #8
                You made me go look at https://access.redhat.com/documentat...ating-a-volume

                It's right there at the top. They recommend provisioning ten times the physical space as logical space for containers and virtual machines, and 3x for other things. Then on one of the following pages they stress the importance of monitoring the VDO volume and not letting it exceed 80% full.

                I note that the documentation there doesn't stress the horrible things that happen when it runs out.
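
                In command terms, those two recommendations boil down to something like this sketch (names and sizes are placeholders; the options are the ones documented for the RHEL 7.5 vdo manager):

                # vdo create --name=vdo_vms --device=/dev/sdb --vdoLogicalSize=10T
                # vdostats --human-readable

                i.e. you over-provision the logical size up front (here 10:1 over a 1T disk for VM/container images) and then you are expected to keep an eye on the Use% column of vdostats yourself and react before it crosses roughly 80%.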



                • #9
                  Good point.

                  Didn't see that one yet; that does make this compression scheme rather dangerous.
                  Even if the system wouldn't be completely lost, it is rather 'irritating', to put it mildly, that your filesystem corrupts as soon as it fills up.
                  Monitoring the remaining free space is rather useless advice, because we all know that shit happens, including the monitor failing and the hard drive filling up when you're not looking.

                  Unless Red Hat (the current owner of the vdo module) guarantees no file corruption on a full vdo block device, I won't use it either.

                  T

