I have a single S3 bucket that has held all of our Aux Copy backups for many years; it looks like it goes back to 2017. This bucket is gigantic, sitting at about 1.5 PB, and there are hundreds of millions of files in it.
I inherited this setup when I came on board a while back, so I don't know much about its history. I do know that a couple of years ago Commvault was rebuilt/recreated and everything was set up new. Our current storage policies send data to this S3 bucket with a one-year retention, and in Commvault I can see that those jobs do get aged off after a year. This does go through a DDB. I sealed it a few weeks ago and at the same time cleaned up (aging/space reclamation) as much as I could in order to shrink the bucket.
This reduced the bucket size only slightly. I was hoping for a lot more. That started to make me think there is a discrepancy between what the bucket holds and what Commvault believes it has. I think there is a lot more old data that has nothing to do with the current Commvault setup, and I would like to manually delete it, but I don't want to break any backups we currently have by getting rid of data that the DDB relies on.
Here is my question: our oldest backups go back to March 2022, and I see that in Commvault. When I look at the S3 bucket directly, there are a ton of chunk files etc. going back to 2017. I also see folders in the bucket named with what looks like barcode info, such as V_101970 and others in that same range (V_101xxx and so on, from 2017). When I go to the storage policy, right-click the Aux Copy, and select media, it brings up all of the media Commvault knows about in that S3 bucket location. All of the barcodes there are in the range V_799xxx through V_938xxx; nothing even remotely close to V_101xxx. So, based on this information, would it be safe to assume that these old 2017 files in the S3 bucket could be manually deleted? The DDB or anything else shouldn't be holding on to these very old files, should it, especially since the barcodes don't show up in Commvault?
Best answer by Scott Moseman
Do NOT manually delete anything. That will most likely delete data.
Open a case with Support and have them review the environment to determine next steps. DDB verifications, space reclamations, etc. will probably be necessary. You absolutely cannot use the dates on files and directories to know what to delete.
In a deduplication context, data that has been written can potentially exist forever. Files and folders from a long time ago could definitely still be required by current backup jobs.
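The point about deduplication can be illustrated with a deliberately simplified model. This is a hypothetical toy, not how Commvault's DDB is actually implemented; the `DedupStore` class and its methods are invented for illustration. It assumes each backup job is just a list of block signatures, that a block is physically written only the first time it is seen, and that a block can be freed only once no remaining job references it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DedupStore:
    """Hypothetical toy reference-counting dedup store (NOT Commvault's DDB)."""

    # block signature -> date the block was first physically written
    written_on: dict[str, date] = field(default_factory=dict)
    # job id -> set of block signatures that job's backup references
    jobs: dict[str, set[str]] = field(default_factory=dict)

    def backup(self, job_id: str, blocks: list[str], when: date) -> None:
        # Only never-seen blocks are written; duplicates just become references,
        # so an existing block keeps its original (possibly very old) write date.
        for sig in blocks:
            self.written_on.setdefault(sig, when)
        self.jobs[job_id] = set(blocks)

    def age_job(self, job_id: str) -> None:
        # Aging a job removes its references, not necessarily its blocks.
        self.jobs.pop(job_id, None)

    def prune(self) -> list[str]:
        # A block is freed only when no remaining job references it.
        referenced = set().union(*self.jobs.values())
        freed = [sig for sig in self.written_on if sig not in referenced]
        for sig in freed:
            del self.written_on[sig]
        return freed


store = DedupStore()
store.backup("job_2017", ["A", "B", "C"], date(2017, 6, 1))
store.backup("job_2024", ["A", "B", "D"], date(2024, 6, 1))  # A and B are not rewritten
store.age_job("job_2017")
print(store.prune())            # ['C']
print(store.written_on["A"])    # 2017-06-01, yet still needed by job_2024
```

In this model, aging the 2017 job frees only block C, because blocks A and B are still referenced by the 2024 job even though they were written in 2017. That is why the age of an object in the bucket says nothing about whether it is safe to delete, and why aging old jobs can reclaim much less space than expected.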