Solved

Large deduped Secondary Copy in Cloud - Do we recommend periodic sealing?

  • 27 October 2021


Hi Team,

 

We currently have quite a large Secondary Copy of data writing to object-based storage, and I am just reviewing its health status (note: this is NOT cold storage in any way).

At the moment it looks pretty good, but I seem to remember that a while ago Commvault’s recommendation was to periodically seal these large DDBs.

 

I believe this was in order to maintain healthy DDB performance.

Does anyone know if this still applies? The reason I’m asking is that I haven’t seen that recommendation in the documentation so far, but it is possible it is squirreled away somewhere.

 

To help, here is an approximate summary:-

Partitions                       2
Size of DDB                      950 GB (each)
Unique blocks                    500 million
Q&I time (3 days)                1200
Total application size           15 PB (yes, PB)
Total data size on disk          550 TB
Dedupe savings                   96.5%
Retention                        Infinite (critical data needed for long-term and potential legal needs)
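
For anyone wanting to sanity-check those figures, here is a rough back-of-the-envelope in Python. It is my own sketch using the approximate numbers above (and assuming 1 PB = 1024 TB), not anything pulled from the Commvault reports:

```python
# Back-of-the-envelope check of the dedupe savings figure above.
# Assumes 1 PB = 1024 TB; all the inputs are approximate anyway.

application_size_tb = 15 * 1024   # 15 PB of front-end application data
size_on_disk_tb = 550             # physically written (back-end) data

savings = 1 - size_on_disk_tb / application_size_tb
print(f"Dedupe savings: {savings:.1%}")   # ~96.4%, in line with the reported 96.5%
```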

 

If I have this right, and based on the current docco:-

 

https://documentation.commvault.com/11.23/expert/111985_hardware_specifications_for_deduplication_mode_01.html

 

With a two-partition, one-disk setup (which we have), we can hit 1000 TB of physically written back-end data (BET).

 

 

So, being at 550 TB, we are well within parameters.

 

 

However, the question remains about sealing the store.

I am aware that although a Q&I time of 1220 is OK, it is steadily creeping up.

Additionally, as we are using infinite retention, one day we are obviously going to hit the 1000 TB mark.
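
To get a feel for the timescale, a simple linear projection is probably good enough, since nothing ever ages off. The sketch below is just that; the growth rate in it is a made-up placeholder, not our actual figure:

```python
# Rough linear projection of when the 1000 TB back-end limit would be hit.
# The growth rate is a placeholder -- swap in the real figure from your own
# library/DDB growth reports.

current_backend_tb = 550     # from the summary above
backend_limit_tb = 1000      # two-partition, one-disk guideline from the docs
growth_tb_per_year = 100     # HYPOTHETICAL average net growth per year

headroom_tb = backend_limit_tb - current_backend_tb
years_remaining = headroom_tb / growth_tb_per_year
print(f"Headroom: {headroom_tb} TB "
      f"(~{years_remaining:.1f} years at {growth_tb_per_year} TB/year)")
```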

 

I would summarize my questions as:-

 

1 - Should I seal this store to improve DDB performance?

2 - Am I right to assume that, with the underlying Storage Policies (SPs) being infinite retention, no data blocks will be impacted or age out whatsoever?

3 - How do I mitigate reaching that 1000 TB BET limit? Do I simply add an extra DDB to each of the current Media Agents, or am I likely to need extra MAs? Hopefully I can just add another partition to each MA, as looking at the above docco we have scope to go to “Extra Large 2 DDB Disk” mode.

 

Thanks in advance ….

 

 


Best answer by jgeorges 27 October 2021, 09:58


2 replies


Thanks for the very thorough reply, Jase.

 

I have checked a bit more of our history.

The GDSP was created almost two years ago.

So I think if I seal it, we will have almost another two years before it builds back up to current performance levels, so we have plenty of time to manage this.

 


Hey @MountainGoat 


I was almost ready to go to the publishers with my short story of a reply, but I have culled it down as best I can!
 

1 - 
Sealing the store will start a fresh one, as I’m sure you already understand. This will absolutely result in improved performance; however, how long that lasts is determined by how quickly the new store grows again. If you manage to seal it annually, then it’s a good outcome. If not, then shopping for better disks may be a cheaper solution.

2 - 

If all data being written is on infinite retention, then you’re correct. There will never be an aged block, so no space will be reclaimed. So keeping the store active as long as possible will minimise the footprint to cloud. 

3 - 

In some ways YES, however it’s not quite that simple. Signatures are balanced between partitions using an algorithm. When a new partition is added, many existing signatures are written to the new partition and stop referencing the old partitions. HOWEVER, as your retention is infinite, you’ll never shrink those original partitions, and as new signatures come in, all partitions will continue to grow.
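
If it helps to picture that rebalancing, here is a toy sketch. It uses a generic hash-mod scheme, NOT the actual Commvault balancing algorithm, just to show how many signatures end up owned by a different partition once a third one appears:

```python
# Toy illustration of why adding a partition shifts signatures around.
# This is NOT Commvault's actual balancing algorithm -- just a generic
# hash-mod scheme to show the effect described above.

import hashlib

def owning_partition(signature: bytes, partition_count: int) -> int:
    """Pick a partition for a block signature by hashing it."""
    digest = hashlib.sha256(signature).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

signatures = [f"block-{i}".encode() for i in range(10_000)]

before = [owning_partition(s, 2) for s in signatures]   # 2 partitions today
after = [owning_partition(s, 3) for s in signatures]    # after adding a 3rd

moved = sum(1 for b, a in zip(before, after) if b != a)
print(f"{moved / len(signatures):.0%} of signatures now map to a different partition")
# With a plain modulo, roughly two thirds of signatures land elsewhere, while
# the old partitions keep every block they already own (infinite retention).
```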

When you seal the store, you’ll start a new DDB with new 1 PB limits. You can also expand up to 4 nodes with 2 disks each to enjoy up to 4 PB limits:
https://documentation.commvault.com/11.24/expert/111985_hardware_specifications_for_deduplication_mode.html

Cloud storage actually provides slightly higher tolerances when compared with local disk, as there is less workload on the DDB. This is due to how granular our pruning goes for cloud storage (disk storage that supports sparse files can prune chunks at 128 KB size [drill holes], while cloud prunes at 8 MB chunks, or truncates if able).
Given you’ll have zero pruning workload, you may even exceed these guidelines, with your only workload being during backup operations.
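
As a rough illustration of why the coarser cloud granularity means less pruning work (my own arithmetic, not anything from the docs, and moot in your case since nothing ever ages off):

```python
# Quick arithmetic on the pruning granularity mentioned above: reclaiming the
# same 1 TB of aged data takes far fewer operations at 8 MB than at 128 KB.
# Purely illustrative figures.

KB, MB, TB = 1024, 1024**2, 1024**4

reclaim_bytes = 1 * TB
disk_ops = reclaim_bytes // (128 * KB)   # 128 KB drill holes on sparse-capable disk
cloud_ops = reclaim_bytes // (8 * MB)    # 8 MB chunk deletes on cloud storage

print(f"Disk  (128 KB drill holes): {disk_ops:,} operations per TB reclaimed")
print(f"Cloud (8 MB chunks):        {cloud_ops:,} operations per TB reclaimed")
```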

 

The choice you make will be unique to your environment and experience; however, I’d hazard that if you get a few years out of the existing setup before seeing any performance issues, then sealing the store without adding partitions would be a good balance. 
You’ll also have some peace of mind that every new DDB comes with a ‘fresh’ start and a hard stop to some risk. By sealing, you’re ensuring that if any data is impacted by corruption/malware/ransomware, you won’t have exposed many years of data, just THAT year of data.


Cheers,

Jase
