Windows Data Deduplication in VMware thin provision server

  • May 13, 2025

SteveF
Vaulter

We have a customer running 11.32.89 that is protecting VMware thin-provisioned VMs. We do primary backups with Pure snapshots. The customer informed us that they have Windows Data Deduplication (WDD) running in the VMs.

I am seeing older articles online about combining thin provisioning with WDD that indicate larger disk consumption because of WDD's cleanup behavior. WDD doesn't clean up dynamically; it cleans once a week. With that behavior, the thin-provisioned VM keeps growing until the cleanup happens.

During the backups, we're getting horrible dedupe rates on some servers (10%). Along with that behavior, we are seeing Data Analytics (Indexing) of the VM contents take longer than 24 hours. Finally, the CVLT-provided information seems off: the "Data on Media" view shows Size on Media at 6.3TB, but Data Written says 2.5TB. The customer is asking what's going on with that.

I'm thinking the consumption is due to how VM protection works: we protect the provisioned VM disk size, not what Windows reports as in use. We have to, because we aren't using in-guest backups. This leads to bloat in backend storage consumption.

A couple of questions:

  1. Do we have best practices for VMware VM protection, especially with Windows Data Deduplication turned on?

  2. Do we know what the behavior is with a snapshot-level backup of a VM that has Windows Data Deduplication?

Thanks for any assistance here.

Best answer by sbhatia

2 replies

sbhatia
Vaulter
  • Answer
  • May 14, 2025

Hi Steve, 

I don’t think there’s a straightforward answer to your questions; we’d need a more holistic view of the environment. But let me give it a shot.

  1. Best practice for VM (VMware) protection with Windows Data Deduplication?
    The key is to align backup schedules with the WDD optimization cycle. Since WDD runs on a schedule (usually weekly), running backups after it completes helps capture the already-deduplicated data, improving backend dedupe and avoiding bloat. If WDD is heavily used, consider in-guest backups instead of VM-level backups, since snapshots grab the full disk, including bloated or undeduplicated data.

  2. Behavior of snapshot-level backup with WDD?
    It captures the disk as-is. So if WDD hasn’t run, you’ll back up more unique data than necessary, hurting dedupe rates and increasing storage usage. This can also lead to longer indexing times and backend bloat.

  3. Why is “Size on Media” higher than “Data Written”?
    "Size on Media" includes all data chunks, index data, and metadata, while "Data Written" is only the unique data after dedupe and compression. The gap could be due to inefficient deduplication, retention settings, or pruning not working as expected.
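
To make the layout-shift effect behind points 1 and 2 concrete, here's an illustrative toy in Python. The 8-byte fixed blocks and SHA-256 hashing stand in for a real dedupe engine (which uses far larger blocks): once WDD rewrites data into its chunk store, block boundaries shift, and the backup-side dedupe sees "new" unique blocks even though the bytes barely changed.

```python
import hashlib

def block_hashes(data: bytes, block: int = 8):
    """Fixed-size block hashing, a stand-in for a dedupe engine's
    block comparison (illustrative only)."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

payload = b"ABCDEFGH" * 4        # highly redundant data
shifted = b"X" + payload[:-1]    # same bytes, layout shifted by one

before = block_hashes(payload)
after = block_hashes(shifted)

# every block of `payload` is identical...
print(len(set(before)))                # 1 unique block
# ...yet none of them match once the layout shifts
print(len(set(before) & set(after)))   # 0 blocks in common
```

The same data, shifted by one byte, deduplicates against none of its former self. That's the kind of churn a WDD optimization pass can introduce under a VM-level backup.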

Recommendation:
Given the gap between "Size on Media" and "Data Written" and the indexing delays, it's worth involving Commvault Support. They can review the DDB and pruning logs to make sure expired data is being properly cleared and reported accurately.
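
As back-of-the-envelope arithmetic only (the breakdown below is invented to illustrate the idea, not pulled from this CommCell), a 6.3TB Size on Media against 2.5TB Data Written stops looking strange once older retained jobs, index/metadata, and not-yet-pruned data are counted:

```python
def size_on_media_tb(retained_jobs_tb, index_metadata_tb, unpruned_expired_tb):
    """Hedged sketch: Size on Media ~ every retained job's chunks
    + index/metadata + expired-but-not-yet-pruned data.
    All figures below are made up for illustration."""
    return sum(retained_jobs_tb) + index_metadata_tb + unpruned_expired_tb

# hypothetical decomposition that lands on the numbers from the thread:
# 2.5TB written by the latest job, 3.0TB across older retained jobs,
# 0.3TB of index/metadata, 0.5TB awaiting pruning
total = size_on_media_tb([2.5, 1.8, 1.2], 0.3, 0.5)
print(round(total, 1))   # -> 6.3
```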


SteveF
Vaulter
  • Author
  • May 14, 2025

Thank you for this really solid reply.
I came here because CVLT Support has already looked at this via many tickets and basically came back each time with “That’s how the product works.”

Armed with this information, which validated my thinking on the process, we have something to take back to the customer for further troubleshooting: reschedule both WDD and our backups, or turn off WDD.
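
If rescheduling is the route we take, here's a minimal Python sketch of the alignment logic (the Saturday 04:00 cleanup finish time and two-hour buffer are made-up placeholders, not anyone's actual schedule):

```python
from datetime import datetime, timedelta

def next_backup_start(now, gc_weekday=5, gc_end_hour=4, buffer_hours=2):
    """Next backup start falling after the weekly WDD cleanup window.

    Hypothetical schedule: cleanup finishes by gc_end_hour on
    gc_weekday (Mon=0 .. Sun=6); buffer_hours of slack is added.
    """
    days_ahead = (gc_weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=gc_end_hour, minute=0, second=0, microsecond=0
    ) + timedelta(hours=buffer_hours)
    if candidate <= now:              # already past this week's window
        candidate += timedelta(days=7)
    return candidate

# e.g. from Tuesday noon, the next aligned start is Saturday 06:00
print(next_backup_start(datetime(2025, 5, 13, 12, 0)))
```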

I have some older VMware information on turning things off and vMotioning the VM to clean out ‘inactive’ blocks.
Again, thank you for this reply.