Solved

Large S3 Bucket Backup


Mohit Chordia
Byte

Hi Community,

Can we take a backup of an S3 bucket which is 80 TB in size using Commvault?

Consider a 10-15% daily change of data.

How does Commvault take a backup of S3?

Is it a streaming backup, reading objects one by one, which I expect would be very slow, or is some sort of IntelliSnap capability available for S3 backup?

Regards, Mohit

Best answer by Onno van den Berg


7 replies

Onno van den Berg
Commvault Certified Expert
  • 1232 replies
  • Answer
  • July 28, 2022

Hi @Mohit Chordia,

Yes, this is possible. There is currently no snapshot capability in object land, so there is no IntelliSnap here. It all comes down to streams running concurrently. Depending on the data structure and the capabilities of the access node, you should just add more and more streams (Commvault uses the term readers). The first backup will take some time to complete, but after that it will create synthetic fulls. We ourselves have seen the scan process take a very long time to complete. I think there is still much to improve and gain in terms of optimizations going forward.
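To illustrate why that scan phase is slow on a large bucket, here is a minimal boto3 sketch (my own illustration, not Commvault's implementation; the bucket name is a placeholder) of what any stream-based backup has to do before it can read a single object:

```python
# Illustration only: enumerate a large bucket the way a stream-based backup
# must before it can start reading data. "my-80tb-bucket" is a placeholder.
# ListObjectsV2 returns at most 1,000 keys per call, so a bucket with tens of
# millions of objects needs tens of thousands of sequential calls just to scan.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

object_count = 0
total_bytes = 0
for page in paginator.paginate(Bucket="my-80tb-bucket"):
    for obj in page.get("Contents", []):
        object_count += 1
        total_bytes += obj["Size"]

print(f"{object_count} objects, {total_bytes / 1024**4:.1f} TiB to protect")
```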

One thing that could be added to the documentation is a set of sizing guidelines, e.g. if the front-end data (FET) is 100 TB, what needs to be in place (access nodes, number of readers) to make sure you can create at least one recovery point per 24 hours.
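To put rough numbers on the "one recovery point per 24 hours" idea for the 80 TB / 10-15% change case in the question, here is a back-of-the-envelope sketch; the per-reader throughput and reader count are pure assumptions that you would need to replace with measurements from your own access nodes and object storage:

```python
# Back-of-the-envelope backup window estimate. All inputs are assumptions:
# measure per-reader throughput in your own environment before sizing.
TB = 10**12

bucket_size = 80 * TB               # front-end data (FET)
daily_change = 0.15 * bucket_size   # 10-15% churn, worst case
per_reader_bytes_s = 50 * 10**6     # assumed ~50 MB/s sustained per reader
readers = 16                        # assumed concurrent streams on the access node

aggregate = per_reader_bytes_s * readers

full_hours = bucket_size / aggregate / 3600
incr_hours = daily_change / aggregate / 3600

print(f"first full : ~{full_hours:.0f} h")          # ~28 h with these numbers
print(f"daily incr : ~{incr_hours:.1f} h (must fit inside 24 h)")
```

With these assumed numbers the daily change fits comfortably in a 24-hour window, but the first full would not; that is exactly where adding readers or access nodes comes in.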

Onno



 


Mohit Chordia
Byte

@Onno van den Berg 

Thanks for the reply.

If anyone has tested this and can share stats for backing up large buckets (S3, Blob, etc.), that would be great.

 


Onno van den Berg
Commvault Certified Expert

@Mohit Chordia,
We have protected very large buckets (40+ TB) in the past and this worked fine. I unfortunately cannot share the stats, but it also depends on the performance of your access node(s) and the object storage solution in case you run it on-premises. In AWS it is also very important to leverage S3 VPC endpoints to reduce traffic cost.
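For anyone who wants to act on the VPC endpoint remark, here is a minimal boto3 sketch of creating an S3 gateway endpoint (the VPC ID, route table ID and region are placeholders) so access-node traffic to S3 stays on the AWS network instead of going through a NAT gateway:

```python
# Sketch: create an S3 *gateway* VPC endpoint so the access node reaches S3
# over the AWS network rather than a NAT gateway (which bills per GB).
# The region, vpc-... and rtb-... values are placeholders for your own IDs.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0abc1234def567890",
    ServiceName="com.amazonaws.eu-west-1.s3",
    RouteTableIds=["rtb-0abc1234def567890"],
)
```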

 


Mohit Chordia
Byte

What are the other options to take a backup of S3?

Is there any advantage that Commvault provides compared to AWS Backup when backing up large S3 buckets?

Regards, Mohit


Onno van den Berg
Commvault Certified Expert

The true advantages that Commvault brings to the table are that it allows you to create a backup of data stored in Amazon S3 to an external location, for example on-premises storage or any other public cloud provider, and that it offers a single pane of glass for data management. But AWS Backup also offers advantages, because it also backs up ACLs, object metadata and tags.
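If the ACL/metadata/tag difference matters for your buckets, a quick boto3 sketch (bucket and key are placeholders) shows what an object carries besides its bytes, i.e. what a backup tool does or does not need to preserve:

```python
# Sketch: inspect what S3 attaches to an object besides its data.
# Bucket and key are placeholders. Whatever shows up here (user metadata,
# tags, ACL grants) is what your backup tool does or does not carry along.
import boto3

s3 = boto3.client("s3")
bucket, key = "my-80tb-bucket", "some/prefix/object.bin"

head = s3.head_object(Bucket=bucket, Key=key)
tags = s3.get_object_tagging(Bucket=bucket, Key=key)
acl = s3.get_object_acl(Bucket=bucket, Key=key)

print("user metadata:", head.get("Metadata", {}))
print("tags:", tags["TagSet"])
print("ACL grants:", acl["Grants"])
```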


Mohit Chordia
Byte

Does AWS Backup offer deduplication and compression similar to Commvault?

Does CV offer deduplication and compression benefits for S3 backups?

Can I use source-side deduplication while backing up S3 to limit my data transfer and reduce network transfer cost?


Onno van den Berg
Commvault Certified Expert

Ah, that's the other advantage that Commvault brings to the table: compression and deduplication, which is something AWS Backup doesn't offer. This is done on the access node that pulls in the data from S3. From a cost perspective it means you can expect added cost for the access node, but you can combine it with, for example, the VSA access node.

Do note that for AWS Backup you will have to enable versioning at the bucket level, which can incur additional cost.
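For reference, enabling versioning is a single API call, but as noted above it has a cost tail: every overwrite or delete keeps a noncurrent version around until a lifecycle rule expires it. A minimal boto3 sketch follows (the bucket name and the 30-day expiry are placeholder assumptions):

```python
# Sketch: turn on bucket versioning, which AWS Backup requires for S3.
# Once enabled, overwritten/deleted objects persist as noncurrent versions
# (and keep costing storage) unless a lifecycle rule expires them.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="my-80tb-bucket",  # placeholder
    VersioningConfiguration={"Status": "Enabled"},
)

# Optional: expire old versions after 30 days so the versioning cost stays bounded.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-80tb-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-noncurrent-versions",
            "Status": "Enabled",
            "Filter": {},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }]
    },
)
```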



