Linux DM-VDO "Virtual Data Optimizer" Preparing To Land In The Upstream Kernel
A dm-vdo target can be backed by up to 256 TB of storage.
Last edited by patrakov; 06 March 2024, 08:04 AM.
Originally posted by Anux:
Of course there is much more work to be done before the data actually reaches your block device. This is only intended for RAIDs with battery backup and for servers/PCs on UPSes.

I'm not sure whether it respects write barriers, or whether they would even help here. For normal home use, a filesystem that includes all of those features might be a more robust solution.
Originally posted by patrakov:
Today this limit can be seen as quite tight. Someone with enough money can easily put 40 SSDs in an enclosure, present it to a server as a JBOD, and build a software RAID5 on top. Say they use 8 TB SSDs (yes, I know bigger SSDs exist, but I have not seen any in real use in datacenters): that is already 320 TB of raw capacity.
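A quick sanity check on that arithmetic (a minimal sketch; the 40-drive and 8 TB figures come from the post above, and RAID5 overhead is taken as one drive's worth of parity):

```python
# Raw and usable capacity for the JBOD example above:
# 40 SSDs of 8 TB each in a single software RAID5.
drives = 40
size_tb = 8

raw_tb = drives * size_tb            # total raw capacity
usable_tb = (drives - 1) * size_tb   # RAID5 loses one drive's worth to parity

print(raw_tb)     # 320
print(usable_tb)  # 312
```

Both numbers exceed dm-vdo's 256 TB backing-storage limit.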
The 256 TB limit on backing storage has a reason. It turns out that live de-duplication carries quite a bit of CPU and memory overhead, and it gets worse as the size increases.

Note that Red Hat's figure of 69 GB of RAM (yes, that is the real number) for 256 TB of physical storage behind VDO is far better than ZFS's 1 to 20 GB of RAM per 1 TB of storage; the 20 GB-per-TB figure is what it takes to keep the ZFS de-duplication tables RAM-backed for high performance. Supporting larger volumes would be nice, but you start seeing the problem when you do the RAM/storage maths for VDO.
69 GB / 256 TB ≈ 0.27 GB per TB
27 GB / 100 TB ≈ 0.27 GB per TB
14 GB / 50 TB ≈ 0.28 GB per TB
3 GB / 10 TB ≈ 0.30 GB per TB
0.472 GB / 1 TB ≈ 0.47 GB per TB
This is way better than ZFS, and you might think it gets progressively better the larger the volume, up to 256 TB. But those were all best-case values.
69 GB / 101 TB ≈ 0.68 GB per TB
27 GB / 51 TB ≈ 0.53 GB per TB
14 GB / 11 TB ≈ 1.27 GB per TB
3 GB / 2 TB ≈ 1.5 GB per TB
0.472 GB / 0.01 TB ≈ 47.2 GB per TB
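The per-TB figures above can be reproduced in a few lines (a sketch; the RAM/size pairs are taken directly from the post, rounded to two decimals):

```python
# Memory cost per TB for the best-case and worst-case volume sizes
# quoted above (RAM in GB, volume size in TB).
best_case = [(69, 256), (27, 100), (14, 50), (3, 10), (0.472, 1)]
worst_case = [(69, 101), (27, 51), (14, 11), (3, 2), (0.472, 0.01)]

for ram_gb, size_tb in best_case + worst_case:
    print(f"{ram_gb} GB / {size_tb} TB = {ram_gb / size_tb:.2f} GB per TB")
```

The worst-case list makes the tier boundaries visible: 101 TB costs more RAM per TB than 51 TB does.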
These are the worst-case values, and this is where you see the problem. Notice that 69 GB / 101 TB is worse than 27 GB / 51 TB. Likewise, the next tier above 256 TB would show reduced efficiency with VDO until you get far enough past the boundary.
Splitting 320 TB of capacity in two will require less RAM and CPU per TB of storage to operate VDO than expanding VDO to support 320 TB directly. The 256 TB point where Red Hat stopped with VDO is where the gains from making a de-duplicating, compressing volume larger come to an end. It is something to take into account: if Red Hat or other developers alter VDO to allow larger volumes, a volume only just into the next tier will have zero advantage over splitting the array in two.
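Using only the 69 GB-for-256 TB figure above, the split-in-two side of that comparison can be checked directly (a sketch; the RAM cost of a hypothetical larger-than-256 TB VDO tier is unknown, so only the split configuration is computed):

```python
# Two VDO volumes of 160 TB each stay under the 256 TB limit,
# so each needs at most the 69 GB index RAM quoted above.
per_volume_ram_gb = 69
volumes = 2
total_tb = 320

total_ram_gb = per_volume_ram_gb * volumes
print(total_ram_gb)                       # 138
print(round(total_ram_gb / total_tb, 2))  # 0.43
```

So the split array costs about 0.43 GB per TB at worst; any hypothetical 320 TB tier would have to beat 138 GB total to be worth having.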
The worst case at the bottom also says that a stack of small VDO partitions is a really bad idea. You want partition sizes close to the maximum of each tier so that VDO runs at its highest RAM and CPU efficiency.
The VDO design has some interesting limitations.
There is something else to consider: BTRFS/XFS out-of-band de-duplication with tools like duperemove. Say you have a spike in load and need the RAM; a tool like duperemove can be killed for the duration of the high load, freeing the memory. With online de-duplication like VDO or ZFS, once it is on, you are stuck with the memory usage come hell or high water.
Really, I would love to see a better middle ground between online de-duplication and out-of-band de-duplication: an online filesystem de-duplication that you can start up and shut down on demand.
Last edited by oiaohm; 06 March 2024, 11:35 AM.
Originally posted by S.Pam:
Inline deduplication isn't as great as you might believe. Contrast it with user-space deduplication tools, where you can focus deduplication on the data that actually benefits from it.