
VM deduplication

  • November 3, 2021

  • Commvault Certified Expert

Hi all,

Maybe a silly question, but if a bunch of files exist on one VM and are copied to another VM, are they deduplicated against each other? Or are they considered unique?

In this example: Hyper-V backups with CBT (changed block tracking) on 2012 hosts, crash-consistent.

VM01\E:\Docs\ has 10,000 Word documents

VM02\F:\Files\ has exactly the same 10,000 Word documents

VM01 is Windows Server 2008 R2

VM02 is Windows Server 2012

Thanks all.

Best answer by Stuart Painter

Hi @Shane 

Deduplication isn’t as simple as comparing one file against another.

Please take a look at the Optimize Storage Space Using Deduplication documentation page for a simplified, high-level overview of how deduplication works.

Basically, for file system backups, files are collected and their data is compressed; a signature is then generated for each chunk and compared against the data already recorded in the Deduplication Database (DDB) to determine whether the chunk is duplicate or unique.

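The same principles apply to VM backups. Because we’re not collecting individual files for protection, VSA (the Virtual Server Agent) operates at a different level, the virtual disk rather than the files inside it. Deduplication still works on chunks of data the same way: each chunk is compressed first, then a signature is generated and checked against the DDB.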
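To make the chunk, compress, sign, and look-up sequence concrete, here is a toy sketch in Python. It is not Commvault code: the 128 KiB block size, SHA-256 signatures, zlib compression, and the `ToyDDB` class are all hypothetical stand-ins chosen for illustration. It also shows why the same data arriving from a second client costs almost nothing once a shared DDB has seen it.

```python
import hashlib
import zlib

# Assumptions (hypothetical, for illustration only): fixed 128 KiB blocks,
# zlib compression and SHA-256 signatures. The real product's block size,
# compression and signature algorithm may differ.
BLOCK_SIZE = 128 * 1024


class ToyDDB:
    """Toy stand-in for a Deduplication Database: signature -> stored compressed block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.unique = 0
        self.duplicate = 0

    def backup(self, data: bytes) -> list[str]:
        """Chunk, compress, sign and look up each block; return signatures needed to restore."""
        refs = []
        for off in range(0, len(data), BLOCK_SIZE):
            compressed = zlib.compress(data[off:off + BLOCK_SIZE])  # compress first
            sig = hashlib.sha256(compressed).hexdigest()          # then generate a signature
            if sig in self.blocks:
                self.duplicate += 1            # seen before: keep only a reference
            else:
                self.blocks[sig] = compressed  # new: store the compressed block
                self.unique += 1
            refs.append(sig)
        return refs

    def restore(self, refs: list[str]) -> bytes:
        """Rebuild the original data from the stored blocks."""
        return b"".join(zlib.decompress(self.blocks[s]) for s in refs)


def sample_data(n_blocks: int) -> bytes:
    """Deterministic sample data in which every 128 KiB block is distinct."""
    return b"".join(
        hashlib.sha256(str(i).encode()).digest() * (BLOCK_SIZE // 32)
        for i in range(n_blocks)
    )


ddb = ToyDDB()
docs = sample_data(10)
refs_client1 = ddb.backup(docs)   # first client: every block is new
refs_client2 = ddb.backup(docs)   # second client, same data: every block is already known
print(ddb.unique, ddb.duplicate)  # 10 10
```

The design point is that the lookup key is the signature of a block, not a file name or path, so identical blocks are stored once no matter which client or file they came from.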

I hope that helps explain the deduplication process a little.

Your second set of 10,000 files should achieve a good dedupe ratio. However, because the files may not be laid out identically on the two virtual disks, they are unlikely to be 100% deduplicated, though the ratio should still be high.
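A simplified Python sketch of why on-disk layout matters when deduplicating at the virtual disk level. Everything here is a hypothetical assumption: 4 KiB clusters, a dedupe block of 4 clusters, files packed back to back, and no compression step (omitted for brevity). The same four files are written to two "disks" in a different order; only file A lands on the same block boundaries on both disks, so only its blocks match. The toy deliberately exaggerates the effect and is not a prediction of real-world ratios.

```python
import hashlib

CLUSTER = 4096        # assumption: 4 KiB file system allocation unit
BLOCK = 4 * CLUSTER   # assumption: dedupe block of 4 clusters (real block sizes differ)


def make_file(name: str, clusters: int) -> bytes:
    """Deterministic file content in which every 4 KiB cluster is distinct."""
    return b"".join(
        hashlib.sha256(f"{name}:{i}".encode()).digest() * (CLUSTER // 32)
        for i in range(clusters)
    )


def disk_image(files: list[bytes]) -> bytes:
    """Lay files out back to back; every file is whole clusters, so each starts on a cluster boundary."""
    return b"".join(files)


def block_signatures(image: bytes) -> list[str]:
    """Fixed-size chunking of the raw disk image; file boundaries are ignored."""
    return [
        hashlib.sha256(image[off:off + BLOCK]).hexdigest()
        for off in range(0, len(image), BLOCK)
    ]


a = make_file("A", 16)
b = make_file("B", 16)
c = make_file("C", 2)
d = make_file("D", 2)

vm01 = disk_image([a, b, c, d])  # the same four files...
vm02 = disk_image([a, c, b, d])  # ...laid out in a different order

ddb = set(block_signatures(vm01))  # VM01 is backed up first
vm02_sigs = block_signatures(vm02)
dupes = sum(sig in ddb for sig in vm02_sigs)
print(f"VM02: {dupes} of {len(vm02_sigs)} blocks deduplicated")  # VM02: 4 of 9 blocks deduplicated
```

File A occupies the same four blocks on both disks and deduplicates fully, while file B starts two clusters later on VM02, so none of its block-sized windows match those stored from VM01 even though its contents are identical.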

Of course, if some of that data also exists on other clients, you’ll benefit from dedupe savings potentially many times over.

Thanks,

Stuart


  • Author
  • Commvault Certified Expert
  • November 3, 2021
Thanks, Stuart.

I understand that normal file system backups dedupe well against similar files; my doubt was about files within a VHD.

But I think I have the answer I wanted: it won’t be 100% deduplicated, but it will dedupe pretty well.

Many thanks.

