VM deduplication

Question

Hi all,

Maybe a silly question, but if a bunch of files exist on one VM and are copied to another VM, are they deduplicated against each other? Or are they considered unique?

In this example: Hyper-V backups with CBT on 2012 hosts, crash consistent.

VM01\E:\Docs\ has 10,000 Word documents

VM02\F:\Files\ has exactly the same 10,000 Word documents

VM01 is Windows Server 2008R2

VM02 is Windows Server 2012

Thanks all.

Stuart Painter · Accepted Answer

Hi @Shane,

Deduplication isn’t quite as simple as comparing one file against another.

Please take a look at Optimize Storage Space Using Deduplication for a very simplified, high-level overview of how deduplication works.

Basically, files are collected on file systems and compressed, and then a signature is generated for comparison with data already seen in the Deduplication DataBase (DDB) to determine whether it is duplicate or unique data.

The same principles still apply for VM backups, but since we’re not collecting files for protection, VSA operates at a different level. Deduplication still works on chunks of data, compressed first, followed by signature generation against a DDB.

I hope that helps explain the deduplication process a little.

Your second set of 10,000 files should have good dedupe ratios, but since the files laid down on the disks may not be physically identical on the respective virtual disks, it likely won’t be 100% deduped. It should still achieve a pretty high dedupe ratio.

Of course, if some of that data also exists on other clients, then you’ll benefit from dedupe savings potentially many times over.

Thanks,
Stuart
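To illustrate why identical files inside two virtual disks may not dedupe 100%, here is a toy Python sketch of block-level deduplication as described above: split the data into chunks, compress each chunk, then look up a signature in a DDB. This is not the product's actual implementation. The fixed 1 KiB block size, the SHA-256 signature, the in-memory set standing in for the DDB, and all function and variable names are assumptions made for the example.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 1024  # hypothetical block size, kept small for the demo


def signature(block: bytes) -> str:
    """Compress the block first, then hash the compressed bytes."""
    return hashlib.sha256(zlib.compress(block)).hexdigest()


def backup(image: bytes, ddb: set, block_size: int = BLOCK_SIZE):
    """Split a disk image into fixed-size blocks and check each against the DDB.

    Returns (new_blocks, total_blocks) and records new signatures in ddb.
    """
    new_blocks = total_blocks = 0
    for offset in range(0, len(image), block_size):
        sig = signature(image[offset:offset + block_size])
        total_blocks += 1
        if sig not in ddb:
            ddb.add(sig)
            new_blocks += 1
    return new_blocks, total_blocks


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Deterministic pseudo-random filler standing in for file contents."""
    return bytes(rng.getrandbits(8) for _ in range(n))


rng = random.Random(0)
docs = random_bytes(rng, 32 * BLOCK_SIZE)  # stand-in for the 10,000 documents

vm01 = docs                                          # data starts on a block boundary
vm02_aligned = random_bytes(rng, BLOCK_SIZE) + docs  # same data, shifted by one whole block
vm02_shifted = random_bytes(rng, 100) + docs         # same data, shifted by 100 bytes

ddb = set()  # hypothetical in-memory stand-in for the DDB
first = backup(vm01, ddb)
aligned = backup(vm02_aligned, ddb)
shifted = backup(vm02_shifted, ddb)

print("VM01:          new/total blocks =", first)
print("VM02 aligned:  new/total blocks =", aligned)
print("VM02 shifted:  new/total blocks =", shifted)
```

In this toy model, the aligned copy adds only its one padding block to the DDB, while the copy shifted by 100 bytes matches nothing because every block boundary falls in a different place. A real virtual disk sits somewhere between these extremes: many regions line up with block boundaries and some do not, which is consistent with a high but not 100% dedupe ratio.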

Shane · Answer

Thanks, Stuart.

I understand that normal File System backups dedupe well against similar files; my doubt was about files within a VHD.

But I think I have the answer I wanted: it won’t be 100% deduped, but it will dedupe pretty well.

Many thanks.
