
VM deduplication

  • 3 November 2021


Hi all,

 

Maybe a silly question, but if a set of files exists on one VM and is copied to another VM, are the two copies deduplicated against each other, or are they treated as unique data?

 

In this example, the backups are crash-consistent Hyper-V backups with CBT on 2012 hosts.

VM01\E:\Docs\ has 10,000 Word documents

VM02\F:\Files\ has exactly the same 10,000 Word documents

 

VM01 is Windows Server 2008 R2

VM02 is Windows Server 2012

 

Thanks all.


Best answer by Stuart Painter 3 November 2021, 08:02




Hi @Shane 

Deduplication isn’t quite as simple as comparing one file against another.

Please take a look at Optimize Storage Space Using Deduplication for a very simplified, high-level overview of how deduplication works.

Basically, for file system backups, files are collected, compressed, and then a signature is generated and compared with the data already recorded in the Deduplication Database (DDB) to determine whether it is duplicate or unique.
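A minimal Python sketch of the collect, compress, sign, look-up sequence described above. Everything in it is an assumption for illustration: the 4 KiB `CHUNK_SIZE`, zlib for compression, SHA-256 for the signature, and a plain set of signatures standing in for the DDB. Commvault's actual block size, signature algorithm, and DDB structure are not described in this thread.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk size, for illustration only


def back_up(data: bytes, ddb: set[str]) -> tuple[int, int]:
    """Chunk, compress, sign, and look up each chunk; returns (unique, duplicate) counts."""
    unique = duplicate = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        compressed = zlib.compress(data[offset:offset + CHUNK_SIZE])  # compress first
        signature = hashlib.sha256(compressed).hexdigest()           # then generate a signature
        if signature in ddb:
            duplicate += 1      # seen before: the backup only needs to reference it
        else:
            ddb.add(signature)  # new data: record the signature (a real backup also writes the chunk)
            unique += 1
    return unique, duplicate


ddb: set[str] = set()
# 32 KiB of deterministic stand-in "file" data (eight distinct chunks)
files = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1024))
print(back_up(files, ddb))  # (8, 0): first time, every chunk is unique
print(back_up(files, ddb))  # (0, 8): same data again, every chunk is a duplicate
```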

The same principles apply to VM backups. Because we’re not collecting individual files for protection, VSA operates at a different level, but deduplication still works on chunks of data: each chunk is compressed first, then its signature is checked against the DDB.
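To illustrate the VM case, the hypothetical sketch below treats each virtual disk as raw bytes and chunks it at fixed block offsets without any knowledge of the files on it. The toy disk layout, `make_disk`, `block_signatures`, and the 4 KiB `BLOCK_SIZE` are all made up for this example; CBT and how VSA really reads the virtual disks are not modeled.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, for illustration only


def make_disk(size: int, files: dict[int, bytes], filler: bytes) -> bytes:
    """Toy virtual disk: `filler` everywhere, with file contents written at the given byte offsets."""
    disk = bytearray((filler * (size // len(filler) + 1))[:size])
    for offset, content in files.items():
        disk[offset:offset + len(content)] = content
    return bytes(disk)


def block_signatures(disk: bytes) -> list[str]:
    """Read the disk block by block, with no knowledge of files, and sign each compressed block."""
    return [
        hashlib.sha256(zlib.compress(disk[off:off + BLOCK_SIZE])).hexdigest()
        for off in range(0, len(disk), BLOCK_SIZE)
    ]


# The same 16 KiB of "documents" sits at the same block-aligned offset on both disks,
# surrounded by different operating-system data.
docs = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(512))
vm01 = make_disk(64 * 1024, {8192: docs}, filler=b"2008R2 system data ")
vm02 = make_disk(64 * 1024, {8192: docs}, filler=b"2012 system data ")

ddb = set(block_signatures(vm01))  # state of the DDB after backing up VM01
matches = sum(sig in ddb for sig in block_signatures(vm02))
print(f"{matches} of {len(block_signatures(vm02))} VM02 blocks already in the DDB")  # 4 of 16
```

Only the four blocks holding the shared documents match; the blocks holding each VM's own system data are unique.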

I hope that helps explain the deduplication process a little.

 

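Your second set of 10,000 files should dedupe well. However, because the files may not be laid down identically on the two virtual disks, they probably won’t be 100% deduplicated: the chunks are taken from the virtual disk rather than from individual files, so a chunk that also holds part of an adjacent file, or data laid out differently on the other disk, produces a different signature. You should still see a pretty high dedupe ratio.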
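A toy model of why the ratio can fall short of 100% even for identical files. It assumes, purely for illustration, that every file starts on a block boundary and that the unused tail of each file's last block still holds whatever older data was on that particular disk, so only those tail blocks differ between VM01 and VM02. The helper names, the 4 KiB block size, and the 10,000-byte documents are hypothetical; real layouts can also differ in alignment and fragmentation, which this sketch ignores.

```python
import hashlib
import zlib

BLOCK = 4096  # hypothetical dedupe block size; in this toy, files also start on block boundaries


def document(i: int, size: int) -> bytes:
    """Deterministic stand-in for Word document number i, `size` bytes long."""
    stream = b"".join(hashlib.sha256(f"doc{i}:{n}".encode()).digest()
                      for n in range(size // 32 + 1))
    return stream[:size]


def lay_down(files: list[bytes], leftover: bytes) -> bytes:
    """Write each file from a block boundary; the unused tail of its last block keeps
    this disk's old contents (`leftover`), which differ from VM to VM."""
    disk = bytearray()
    for content in files:
        disk += content
        pad = -len(content) % BLOCK  # bytes left over in the file's last block
        disk += (leftover * (pad // len(leftover) + 1))[:pad]
    return bytes(disk)


def signatures(disk: bytes) -> list[str]:
    """Sign each compressed fixed-size block of the disk image."""
    return [hashlib.sha256(zlib.compress(disk[o:o + BLOCK])).hexdigest()
            for o in range(0, len(disk), BLOCK)]


def dedupe_ratio(first: bytes, second: bytes) -> float:
    """Share of the second disk's blocks already in the DDB after backing up the first."""
    ddb = set(signatures(first))
    sigs = signatures(second)
    return sum(s in ddb for s in sigs) / len(sigs)


docs = [document(i, 10_000) for i in range(50)]  # 2 full blocks + 1 partial block each
vm01 = lay_down(docs, leftover=b"old 2008R2 data ")
vm02 = lay_down(docs, leftover=b"old 2012 data ")
print(f"{dedupe_ratio(vm01, vm02):.0%}")  # 67%
```

The 67% follows directly from the chosen sizes (two matching full blocks and one differing partial block per file); it is not a prediction for real Hyper-V backups.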

Of course, if some of that data also exists on other clients, you’ll benefit from dedupe savings potentially many times over.
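The cross-client savings can be sketched with a reference count per signature. In this hypothetical example, five clients share 40 KiB of identical data plus a little unique data each; the counter standing in for the DDB, the client names, and the sizes are invented, and compression is left out for brevity.

```python
import hashlib
from collections import Counter

BLOCK = 4096  # hypothetical block size


def signatures(data: bytes) -> list[str]:
    """Signature of each fixed-size block (compression omitted for brevity)."""
    return [hashlib.sha256(data[o:o + BLOCK]).hexdigest() for o in range(0, len(data), BLOCK)]


# 40 KiB of data common to every client, plus some data unique to each one.
common = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1280))
clients = {f"client{n}": common + f"unique data for client {n}".encode() * 200 for n in range(5)}

ddb = Counter()  # signature -> number of backed-up blocks referencing it
for data in clients.values():
    ddb.update(signatures(data))

referenced = sum(ddb.values())  # blocks the backups refer to
written = len(ddb)              # blocks actually stored, once each
print(f"{referenced} blocks protected, {written} stored")  # 60 blocks protected, 20 stored
```

The ten common blocks are stored once but referenced five times, which is where the repeated savings come from.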

 

Thanks,

Stuart


Thanks, Stuart.

I understand that normal file system backups dedupe well against similar files; my doubt was about files within a VHD.

But I think I have the answer I wanted: it won’t be 100% deduped, but it will dedupe pretty well.

Many thanks.
