Hey @steveg,
Great question, and I highly recommend you take a look at a few chapters of the AWS architecture guide located here:
https://documentation.commvault.com/commvault/v11_sp20/others/pdf/public-cloud-architecture-guide-for-amazon-web-services11-20.pdf
Check out page 16, “Storing Data Efficiently”; here is one extract relevant to your question:
A note on S3 Intelligent-Tiering: You will note that the S3 Intelligent-Tiering storage class is not represented; this is because Intelligent-Tiering makes data placement decisions based on access frequency. In Commvault, data is split into warm indexing data, which allows for locating data chunks distributed in large data vaults, and cool/cold stored data. Commvault does not recommend the use of S3 Intelligent-Tiering, but instead advocates the use of Commvault combined storage classes (more below) to ensure you can efficiently locate and surgically recall data in minimal timeframes. The benefit of using Commvault combined storage tiers is that small, warm indexes are kept in low-latency storage classes with millisecond first-byte latency, so a surgical restore of cool/cold data begins with minimal delay while still leveraging the low cost of the cooler storage classes.
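To make the combined-tier idea a bit more tangible, here is a minimal boto3 sketch of the underlying S3 concept: small, warm index objects land in a millisecond-latency class while the bulk chunk data goes to a colder, cheaper class. To be clear, Commvault handles this placement itself through its cloud library and storage policy configuration; the bucket name, keys, and class choices below are just illustrative assumptions.

```python
# Illustrative sketch only: Commvault manages placement via its cloud library
# and storage policy settings. Bucket and key names here are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-backup-bucket"  # hypothetical bucket

# Small, frequently read index metadata -> low-latency class
# (millisecond first-byte latency).
s3.put_object(
    Bucket=BUCKET,
    Key="index/chunk-catalog-0001.idx",
    Body=b"...index metadata...",
    StorageClass="STANDARD",
)

# Large, rarely read backup chunks -> cheaper, colder class.
s3.put_object(
    Bucket=BUCKET,
    Key="data/chunk-0001.bin",
    Body=b"...backup chunk payload...",
    StorageClass="GLACIER",  # or DEEP_ARCHIVE for even colder retention
)
```

The point of the split is that the index object is readable immediately, while the Glacier object needs a restore request and an hours-long wait before its first byte; keeping the index warm is what lets a surgical restore start without that wait.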
There are always penalties with the cheaper storage classes, and the table on page 17 tells part of the story. Additionally, data in Glacier cannot be granularly pruned, since we do not have that level of access to it; generally, you have to scope out the retention and then seal the deduplication database periodically, matched to the retention, so the data can eventually be deleted. Without doing that, new jobs keep referencing blocks from the old ones and the data can never be aged.
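Here is a rough sketch of why the sealing cadence matters, under the simplifying assumption that a sealed deduplication store can only be pruned as a whole once the retention of its newest job has expired (the retention and cadence figures are just examples):

```python
# Why sealing cadence matters for Glacier-backed copies.
# Simplifying assumption: a sealed DDB store is pruned as a whole only after
# the retention of its *newest* job expires. Numbers are illustrative.
RETENTION_DAYS = 90      # example retention
SEAL_EVERY_DAYS = 90     # seal the DDB on the same cadence as the retention

# A store opened on day 0 and sealed on day 90 holds jobs written up to day 90.
# Its newest job's retention expires around day 90 + 90 = 180, so a block
# written on day 0 lives roughly 2x retention before it can be pruned.
oldest_block_lifetime = SEAL_EVERY_DAYS + RETENTION_DAYS
print(f"Worst-case lifetime of the oldest block in a store: {oldest_block_lifetime} days")

# Without sealing, new jobs keep referencing old deduplicated blocks, so that
# expiry never arrives and the data cannot be aged at all.
```

So the cheaper the tier, the more deliberately you have to plan the retention and sealing schedule around it.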
It may seem daunting at first, but if you get the parameters right you can save a significant chunk of change. You just need to consider the costs of recovery, time to recover, and retention of the data. The document I referenced above is the perfect guide to help you plan it out.
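If it helps, the kind of back-of-the-envelope math the guide walks you through looks something like this; the per-GB prices below are placeholder figures, not AWS's current rates:

```python
# Back-of-the-envelope storage vs. recovery cost comparison.
# All prices are placeholder figures for illustration, not AWS's current rates.
def monthly_storage_cost(tb_stored, price_per_gb_month):
    return tb_stored * 1024 * price_per_gb_month

def restore_cost(tb_restored, retrieval_price_per_gb):
    return tb_restored * 1024 * retrieval_price_per_gb

TB_PROTECTED = 100  # hypothetical amount of backup data

warm = monthly_storage_cost(TB_PROTECTED, 0.023)   # a Standard-like class
cold = monthly_storage_cost(TB_PROTECTED, 0.004)   # a Glacier-like class
one_restore = restore_cost(5, 0.03)                # restoring 5 TB from cold

print(f"Warm storage per month:  ${warm:,.0f}")
print(f"Cold storage per month:  ${cold:,.0f}")
print(f"One 5 TB cold restore:   ${one_restore:,.0f}")
# The monthly saving from the colder class usually dwarfs an occasional
# retrieval fee, provided the retention and recovery expectations were
# scoped up front, as above.
```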