Question

Space reclamation / orphan data cleaning against cloud storage library

  • 4 May 2023
  • 2 replies
  • 335 views

Badge +2

Hi all,

Wondering if anyone has experience running DDB space reclamation with orphan data cleanup against a cloud library. We have a cloud library on the Azure cool tier that we suspect contains data that was not pruned successfully, inflating our storage consumption in Azure. The deduplication database for this data lives on local storage.

We’d love to run space reclamation with orphan data cleanup against this cloud library, but we’re concerned about the cost of the storage transactions it would generate against the Azure cool tier.

Has anyone performed this operation before and observed the related cloud storage costs? For reference, we have just under 100 million blobs and a total of about 400TB of storage utilization in Azure. 
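
In case it helps anyone sanity-check our figures, here’s roughly how we tally blob count and total size with the azure-storage-blob SDK (the connection string and container name below are placeholders; note that enumerating ~100 million blobs is itself billed as list transactions, roughly one per 5,000 blobs returned):

```python
# Minimal sketch: count blobs and total size in the cloud library container.
# Requires: pip install azure-storage-blob
# Placeholders below -- substitute your own storage account details.
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    conn_str="<storage-account-connection-string>",
    container_name="<cloud-library-container>",
)

count = 0
total_bytes = 0
for blob in container.list_blobs():  # paged internally, up to 5,000 blobs per list call
    count += 1
    total_bytes += blob.size

print(f"{count:,} blobs, {total_bytes / 1024**4:.1f} TiB")
```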

Many thanks for any input folks may have!


2 replies

Userlevel 4
Badge +10

Hi @justin000,

I’m not sure what the cloud costs would be here, but for performing a space reclaim on cloud libraries, CV recommends deploying an extra-small VM in the same Azure region as the storage:

https://documentation.commvault.com/2022e/expert/127689_performing_space_reclamation_operation_on_deduplicated_data.html

Once configured, specify this MA as the machine to run the Space Reclaim and it will not incur cloud costs.
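
To put a rough number on the bandwidth side (the egress rate below is an assumption; check current Azure bandwidth pricing for your region):

```python
# Back-of-envelope: egress cost avoided by running the MA in-region.
# Rate is illustrative only -- verify against current Azure pricing.
DATA_TB = 400                # library size from the original post
EGRESS_USD_PER_GB = 0.08     # assumed internet egress rate

print(f"Worst-case egress if read from outside Azure: "
      f"${DATA_TB * 1024 * EGRESS_USD_PER_GB:,.0f}")
# Reads from a VM in the same region as the storage account are not billed
# as bandwidth, which is why the in-region MediaAgent avoids these charges.
# (Per-transaction charges still apply either way.)
```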

Let me know if you have any questions.

 

Badge +2


Hi Matt,

Thanks for the info. As luck would have it, we already have an MA in the cloud. My concern is more around operation costs than bandwidth costs, i.e., with 100 million blobs, should we expect a total of 100 million read operations and no writes (one read call per blob, plus any delete operations needed to remove orphaned data)? Or are there scenarios in which there will be more than one read operation per blob, writes, etc.?
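
For context, here’s the back-of-envelope I’m working from under the one-read-per-blob assumption (the cool-tier rate is a placeholder from memory; the real per-10,000-operation prices vary by region, so check the Azure Blob Storage pricing page):

```python
# Rough transaction cost if the reclaim does exactly one read per blob.
# Rate below is an assumed cool-tier read price -- verify before relying on it.
BLOBS = 100_000_000
READ_USD_PER_10K = 0.01      # assumed cool-tier read rate, USD per 10,000 ops

read_cost = BLOBS / 10_000 * READ_USD_PER_10K
print(f"~{BLOBS:,} reads -> ${read_cost:,.0f} at the assumed rate")
# Deletes are generally free in Azure Blob Storage, but cool-tier blobs
# removed before 30 days can trigger an early-deletion charge, and any
# re-reads (e.g., verification passes) would scale the read count up.
```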

Hoping someone has been down this path before and can share their experience before I hit go and we head off into the great unknown. :-)

Thanks again.
