Solved

AWS S3 Bucket Cleanup


Badge +6

I am looking for information on how Commvault cleans up its target S3 bucket.  Is there any documentation on how this works?

Thanks

Chuck

 

Best answer by Damian Andre 21 October 2022, 03:01

11 replies

Userlevel 5
Badge +11

Hi @ChuckC 

 

Please see documentation here on which cloud vendors are supported for pruning:

https://documentation.commvault.com/2022e/essential/9236_supported_cloud_storage_products.html

 

Let me know if you have further questions.

 

Thanks

Badge +6

Jordan, thanks for the doc.  I have looked up micro pruning and I believe this has to do with expired chunks.  With deduplication we can only delete chunks that are no longer referenced, so data chunks may stay longer than the specific backup policy retention.  A little confusing.  This is a new implementation and the first time I have worked with cloud storage, and we are preparing for management to ask why we may have older data in the bucket.  I also saw something called macro pruning.  I am looking up space reclamation as well.  Thanks, and if you have any more insight into why there could be expired data in the S3 bucket, I would be glad to hear it.

Chuck

Userlevel 5
Badge +11

@ChuckC 

 

This is the same concept as deduplication on on-premises disk: you may have data blocks that were written a long time ago that need to be retained because newer jobs are still referencing them. There is nothing really different or unique to cloud here.
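To illustrate the point, here is a minimal, purely conceptual sketch in Python (hypothetical block and job names, not Commvault’s actual DDB structures): a block only becomes prunable from the bucket once no job still in retention references it.

```python
# Conceptual sketch of deduplication reference tracking (not Commvault's
# actual implementation): a data block stays in the S3 bucket as long as
# at least one job still in retention references it.

blocks = {
    "block_A": {"written": "2020-01-10", "referenced_by": {"job_1", "job_42", "job_97"}},
    "block_B": {"written": "2020-01-10", "referenced_by": {"job_1"}},
}

def age_off(job_id):
    """Simulate a job meeting retention: drop its references and report
    which blocks become prunable (micro-pruning candidates)."""
    prunable = []
    for name, block in blocks.items():
        block["referenced_by"].discard(job_id)
        if not block["referenced_by"]:
            prunable.append(name)
    return prunable

# job_1 ages out: block_B is released, but block_A written on the same day
# survives because job_42 and job_97 still need it.
print(age_off("job_1"))  # ['block_B']
```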

Userlevel 4
Badge +9

Hi @ChuckC,

There are a few considerations when it comes to cloud pruning:
- Does your cloud storage support micro pruning?
-- Combined storage tiers and archive storage tiers do not support micro pruning.
- If your configuration does not support micro pruning, it is crucial to ensure that automatic DDB sealing is enabled and configured within the DDB properties.
https://documentation.commvault.com/2022e/expert/139246_combined_storage_tier.html

“Note: For longer retention periods, this value can be manually setup to be half of the retention days. For example, if the retention is set to 4 years, this value can be set to 2 years or 730 days.”

Additionally, check to ensure that there are no lifecycle policies, versioning, soft deletes, or WORM configurations set up that may lead to holding data longer than expected.
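As a quick way to rule out the bucket-side causes above, something like the following boto3 sketch reports the versioning, lifecycle, and Object Lock (WORM) status of a bucket. It is only a sketch: the bucket name is a placeholder and read permissions on the bucket configuration are assumed.

```python
# Sketch: report S3 bucket settings that can hold data longer than expected.
# "my-commvault-bucket" is a placeholder; requires s3:GetBucket* permissions.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "my-commvault-bucket"

# Versioning: should never be enabled on a Commvault cloud library bucket.
versioning = s3.get_bucket_versioning(Bucket=bucket)
print("Versioning:", versioning.get("Status", "never enabled"))

# Lifecycle rules: transitions/expirations that Commvault is not aware of.
try:
    lifecycle = s3.get_bucket_lifecycle_configuration(Bucket=bucket)
    print("Lifecycle rules:", [r.get("ID", "<unnamed>") for r in lifecycle["Rules"]])
except ClientError as e:
    if e.response["Error"]["Code"] != "NoSuchLifecycleConfiguration":
        raise
    print("Lifecycle rules: none")

# Object Lock (WORM): locked object versions cannot be pruned until their
# retention expires.
try:
    lock = s3.get_object_lock_configuration(Bucket=bucket)
    print("Object Lock:", lock["ObjectLockConfiguration"].get("ObjectLockEnabled", "not enabled"))
except ClientError as e:
    if e.response["Error"]["Code"] != "ObjectLockConfigurationNotFoundError":
        raise
    print("Object Lock: not configured")
```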

 

If we have ruled out some of the above and you are still concerned that you have more data held in the cloud than expected, I recommend having a new incident opened with Commvault Support and we’ll review further with you. If an incident is created, please upload the CommServe logs and database, plus the logs for any associated MediaAgents.

Userlevel 7
Badge +23

If you really want to clear out old data in a bucket, you could set this option on your deduplication database:

Deduplication Database Properties - Settings

Do not Deduplicate against objects older than n day(s)

The number of days after which a unique data block cannot be used for deduplication during new data protection jobs. Setting this value ensures that very old data blocks are not allowed as the 'origin' data for newer data protection jobs that are deduplicated.

Important: If you set a value for less than 30 days, then the window will display the value but internally it will default to 365 days. For example, if you set the value to 29 days, then the window will display 29 days but data blocks that are as old as 365 days will be used for deduplication during new data protection jobs.

 

I don’t recommend this, but what it will do is stop new jobs from referencing blocks older than the number of days you specify (30 is the minimum). Over time those new jobs will re-write those blocks and release the old ones to become prunable. This won’t work if you have long-term retention because we can’t tell the older jobs to reference a newer block - but as the older jobs meet retention, those older blocks can start being released.

But this is just cosmetic - the reason those older blocks hang around, as Jordan stated, is that they are still needed by other jobs in retention. There is no need to create/store them again if we already have them.
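For what it is worth, here is a rough, purely hypothetical illustration of that behaviour (the cutoff value and the function are made up for the example and are not the actual DDB logic): once a block is older than the cutoff, new jobs write a fresh copy instead of adding a reference, so the old block can be released once the older jobs that reference it age off.

```python
# Hypothetical model of "Do not deduplicate against objects older than n
# day(s)" -- illustration only, not the actual DDB implementation.
CUTOFF_DAYS = 30  # the minimum value the setting accepts

def dedup_decision(block_age_days: int) -> str:
    """Decide whether a new backup job references an existing block or
    writes a fresh copy of the same data."""
    if block_age_days > CUTOFF_DAYS:
        # The old block gains no new references; once the older jobs that
        # still reference it meet retention, it can be pruned from the bucket.
        return "write new copy"
    return "reference existing block"

print(dedup_decision(400))  # -> write new copy
print(dedup_decision(5))    # -> reference existing block
```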

Userlevel 4
Badge +11

Hi @ChuckC 

apart from the Commvault pruning functionality, ensure that “S3 Versioning” is disabled on the bucket (see https://documentation.commvault.com/11.26/assets/pdf/public-cloud-architecture-guide-for-amazon-web-services11-25.pdf, page 85):

Ensure S3 versioning is disabled; Commvault does not support the use of bucket/object versioning (which is disabled by default). Enabling versioning will cause storage growth and creation of orphaned objects, which Commvault does not track or manage (see Using versioning in S3 buckets for information on versioning). Commvault does not support bucket versioning. Commvault cannot remove bucket versions created if bucket versioning is accidentally enabled. An auxiliary copy of all data in the cloud library will be required to remediate.

As indicated above, S3 Versioning is an AWS S3 bucket option which does not integrate with Commvault, so Commvault is not aware of it. Therefore, if S3 versioning is enabled on the S3 bucket, this might explain why there is more space used in the bucket than Commvault reports.

Badge +6

Everyone, thanks for all the information.  I will review our AWS S3 bucket setup and let you all know what we decide to do to improve space utilization.

Thanks again

Chuck

 

Badge +6

Team,

The above was very helpful.  It seems we had versioning turned on.  Our admin guys turned off versioning; however, does anyone know if Commvault or AWS can clean up the “old” object versions that were created while versioning was turned on?

 

Thanks

Chuck

Userlevel 4
Badge +11

Hi Chuck,

unfortunately, there is not much we can do from the Commvault side. Turning versioning off stops new versions from being created, but the existing versions are not removed automatically by AWS. You can delete them yourself, presuming that “Object Lock” (the WORM functionality of AWS S3 buckets) was not enabled at the same time. The following link describes how to delete versions in S3 buckets: https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeletingObjectVersions.html. But be careful with that and ensure you are not deleting the wrong ones.
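If the versions do need to be removed in bulk, a sketch along these lines can list the noncurrent versions and delete markers for review before anything is deleted. The bucket name is a placeholder and the delete step is deliberately left commented out; current object versions are the live Commvault data and must not be touched.

```python
# Sketch: list noncurrent object versions and delete markers left behind by
# S3 versioning. "my-commvault-bucket" is a placeholder. Review the output
# carefully before enabling the delete step.
import boto3

s3 = boto3.client("s3")
bucket = "my-commvault-bucket"
to_delete = []

paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket=bucket):
    # Noncurrent versions (IsLatest == False) are the leftover copies.
    for v in page.get("Versions", []):
        if not v["IsLatest"]:
            to_delete.append({"Key": v["Key"], "VersionId": v["VersionId"]})
    # Delete markers are also artifacts of versioning.
    for m in page.get("DeleteMarkers", []):
        to_delete.append({"Key": m["Key"], "VersionId": m["VersionId"]})

print(f"{len(to_delete)} noncurrent versions / delete markers found")

# Uncomment only after reviewing the list above (delete_objects accepts at
# most 1000 keys per call):
# for i in range(0, len(to_delete), 1000):
#     s3.delete_objects(Bucket=bucket, Delete={"Objects": to_delete[i:i + 1000]})
```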

If that doesn’t help you either, please open a Support ticket with AWS to get help with deleting versions.

Other than that, for example in case you had Object Lock enabled on the bucket, I’m afraid you will not get rid of the versions, as they are locked. In that case there is no other way than to create another S3 bucket (without S3 versioning and with Object Lock disabled, of course!) with a fresh cloud library on it, and to do an AuxCopy of all data to that new library. Afterwards, you may delete the old library and the S3 bucket underneath it.

Hope that helps.

 

Regards,

Markus

 

Badge +6

Markus, thanks!!! We are looking at that now.  I really appreciate all the information that has been forwarded to me.

 

Thanks

Chuck

Userlevel 7
Badge +19

@ChuckC another option could be to create a fresh bucket and add it as part of an additional storage policy copy, then start an aux copy to copy all jobs in retention. As soon as the initial aux copy is finished, you just run one last aux copy to sync the remaining jobs that were created in between, and make this copy the new primary copy.

Assuming you used the bucket for one storage policy copy only, you now have the possibility to decommission that copy entirely, including the related S3 bucket.
