Commvault Official Deduplication Limits

  • 23 June 2021
  • 9 replies
  • 3227 views

I think there are several deduplication-related subjects that require thorough documentation by Commvault.     

1.) There is a long-standing recommendation to limit deduplication database disks to TWO PER MEDIA AGENT.

 I was able to track this recommendation as far back as Simpana 9.0, but I’m not sure if it pre-dates that.   

This limit feels rather arbitrary, and it doesn’t take into account the performance capabilities of the host platform, or advances in computing kit since the original recommendation was made.   I can’t find any references showing WHY such a limit should exist.

In my case, I have three deduplication disks on my Media Agents (all NVMe SSD), and that runs with zero issues.  The Media Agents are spec’d well above the recommended Extra Large configuration, and they don’t even sweat.  

I would like to call on Commvault to really explain why this limitation is in place. The Deduplication Building Block section of the documentation would be an ideal place for this information.  The cost of standing up additional Media Agents to comply with this seemingly arbitrary limit can be prohibitive.  We’re talking hundreds of thousands of dollars of additional cost.  

2.) There are many references to limiting the number of DDBs to TWO PER MEDIA AGENT.

This recommendation really bugs me, as it doesn’t take into account the size of the hosted deduplication databases.    You can run twenty 50,000,000-block DDBs on a Media Agent with brilliant results.   I would need to stand up a minimum of nine additional Media Agents to be in compliance with the recommendation.  

This limitation requires an explanation, as the consequences of complying with it may price Commvault out of the enterprise.   This should be a key part of the Deduplication Building Block portion of the documentation.  

 

Thanks!

 


9 replies

Userlevel 7
Badge +19

@Prasad Nara  : 

How many unique and secondary blocks can a MediaAgent support with a single-partition DDB store? And what is the best practice when the DDB is on a SAS SSD disk?

 

 

One 2 TB disk may hold up to 2 billion unique records, but DDB partition performance may start degrading after 1 billion records.

 

 

@Prasad Nara It still depends on the type of disk ;-) If it is an NVMe disk then the negative impact will be far less intrusive, but even then you will notice that NVMe disk A will not deliver the same performance as NVMe disk B. 

In general, a lot of people are looking for some form of guidance. A recommendation is still a recommendation and not a hard requirement, and bear in mind that keeping the documentation up to date, both from a Commvault product-evolution point of view and from an environmental point of view (AWS, Azure, GCP, VMware, OpenStack, Windows, Linux, Oracle), is super challenging. 

So maybe they should just word it as a rule of thumb and put more smartness in the solution itself, so that you are proactively informed to:

  • Increase system resources (CPU/RAM) or even change the Azure/AWS…. compute offering. 
  • Improve disk I/O performance monitoring and alerting (this already exists in the product in the form of Q&I time, but it is not really visible in Command Center), or even suggest that the customer increase the amount of IOPS, since Commvault can retrieve the current compute offering from the hypervisor; see the sketch after this list. 
  • Suggest that the user upgrade the DDB to a newer version, e.g. show an exclamation mark in Command Center notifying the user that the DDB should be upgraded. 
  • Suggest that the user scale out the storage pool/DDBs.
  • Introduce a more aggressive and automatic job randomizer when using server plans, to spread the load on the underlying non-public-cloud environments. Of course, if your storage doesn’t feel any pain when all your jobs fire at once, you can ignore this one, but in a lot of environments you will definitely see the backup having a (huge) impact. 
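
To illustrate what I mean by proactive alerting, here is a minimal sketch in Python. The warning/critical thresholds, the 1-billion-record watermark and the hard-coded sample numbers are assumptions for the example only; in practice you would feed this from CommServe reporting or your own monitoring and tune the thresholds to your environment.

```python
# Minimal sketch of a DDB partition health check based on average Q&I time.
# Thresholds and input values are illustrative assumptions, not official limits.

from dataclasses import dataclass

QI_WARN_MS = 2.0    # assumed warning threshold
QI_CRIT_MS = 10.0   # assumed critical threshold

@dataclass
class DdbPartitionStats:
    name: str
    unique_records: int
    avg_qi_time_ms: float

def check_partition(p: DdbPartitionStats) -> str:
    """Return a simple health verdict for one DDB partition."""
    if p.avg_qi_time_ms >= QI_CRIT_MS:
        return f"{p.name}: CRITICAL - avg Q&I {p.avg_qi_time_ms} ms; consider faster disk or scaling out"
    if p.avg_qi_time_ms >= QI_WARN_MS or p.unique_records > 1_000_000_000:
        return f"{p.name}: WARNING - watch Q&I time and unique record count"
    return f"{p.name}: OK"

if __name__ == "__main__":
    # Purely illustrative numbers.
    sample = DdbPartitionStats("DDB-Partition-1", 900_000_000, 3.5)
    print(check_partition(sample))
```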

Most of the time the documentation regarding sizing is read once, at the beginning when the environment is built, but over time everything changes: software improves, protected data grows, requirements change, the environment changes. All of these will have a positive or negative impact on your backup and recovery performance.
 

We ourselves always use partitioned DDBs to introduce redundancy. I would rather spend a little bit more money (rule of thumb / 2) to have this in place. Of course, for backup you will have to enable the ability to continue writing data, and you have to accept the downside of writing and storing duplicate data for a temporary period of time, but it is all worth it. 
 

And the reason to use dedicated DDBs is indeed most of the time related to multi-tenancy and/or to keeping the failure domain from becoming too big, although Commvault has also added a lot of DDB recovery/reconstruction enhancements in the recent past. 

 

 

Userlevel 4
Badge +6

#2. We have already removed the “number of DDBs limited to TWO PER MEDIA AGENT” limit from the documentation. You can configure multiple DDBs on a single MediaAgent if the total size of these DDBs fits within the MediaAgent’s BET (back-end terabyte) limit.  Please refer to the “Scaling and Resiliency” section on the documentation page below. 

https://documentation.commvault.com/11.23/expert/111985_hardware_specifications_for_deduplication_mode_01.html
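
As a rough illustration of that rule, here is a minimal fit-check sketch. The 150 TB BET allowance and the pool sizes below are placeholders, not the documented values; take the real figures for your MediaAgent size from the linked page.

```python
# Quick sanity check: do the DDBs planned for one MediaAgent fit within its
# back-end terabyte (BET) allowance? All numbers below are placeholders.

MA_BET_LIMIT_TB = 150  # placeholder, not the documented value for any MA size

planned_ddbs_backend_tb = {
    "Pool-Disk": 60,
    "Pool-Cloud": 45,
    "Pool-Tenant-A": 30,
}

total_tb = sum(planned_ddbs_backend_tb.values())
print(f"Total back-end size: {total_tb} TB of {MA_BET_LIMIT_TB} TB allowed")
if total_tb <= MA_BET_LIMIT_TB:
    print("Fits: multiple DDBs on this MediaAgent stay within the BET limit.")
else:
    print("Does not fit: scale out to another MediaAgent or trim the pools.")
```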

Userlevel 4
Badge +6

@Prasad Nara  : 

How many unique and secondary blocks can a MediaAgent support with a single-partition DDB store? And what is the best practice when the DDB is on a SAS SSD disk?

 

 

One 2 TB disk may hold up to 2 billion unique records, but DDB partition performance may start degrading after 1 billion records.
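
To put those numbers in perspective, here is a back-of-the-envelope sketch. The ~1 KB of DDB space per unique record and the 128 KB average deduplication block size are illustrative assumptions, not official figures.

```python
# Back-of-the-envelope sizing for a single DDB partition.
# Assumptions (illustrative only): ~1 KB of DDB disk space per unique record,
# and a 128 KB average deduplication block size.

DDB_DISK_TB = 2                        # size of the DDB disk
BYTES_PER_RECORD = 1024                # assumed DDB footprint per unique record
DEDUPE_BLOCK_KB = 128                  # assumed average block size
DEGRADE_AFTER_RECORDS = 1_000_000_000  # point where performance may start to drop

max_records = DDB_DISK_TB * 1024**4 // BYTES_PER_RECORD
unique_backend_tb = DEGRADE_AFTER_RECORDS * DEDUPE_BLOCK_KB / 1024**3

print(f"~{max_records / 1e9:.1f} billion records fit on a {DDB_DISK_TB} TB DDB disk")
print(f"1 billion unique {DEDUPE_BLOCK_KB} KB blocks ~ {unique_backend_tb:.0f} TB of unique back-end data")
```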

 

 

Userlevel 7
Badge +23

That’s a great question and observation.

I’m tagging in @Seema Ghai from our documentation team so we can get that additional context and guidance cleared up on the building guide.

Userlevel 3
Badge +5

I have often wondered when/why someone would set up multiple DDBs on one MA?   We currently run a partitioned Global Dedupe database, with 4 partitions, each on their own MA.   Would it be better to divide them differently? 
  Are the multiple DDBs for different data types/block sizes?

Userlevel 4
Badge +6

I have often wondered when/why someone would set up multiple DDBs on one MA?   We currently run a partitioned Global Dedupe database, with 4 partitions, each on their own MA.   Would it be better to divide them differently? 
  Are the multiple DDBs for different data types/block sizes?

In the 2 cases below you may need to create multiple Global DDB policies (aka Storage Pools) using the same set of MAs. 

  1. Primary copy going to disk storage and secondary copy going to cloud storage. Here you can use the primary-copy MAs to configure the cloud storage pool. 
  2. Multi-tenancy, where you need to segregate the data of each tenant. In that case you may need to create a separate storage pool per tenant on the same set of MAs. 

In all other cases I would recommend creating one single storage pool per site. We have a horizontal scaling feature to scale out DDBs, which creates multiple DDBs within the storage pool for different data types. Don’t treat these horizontal-scaling DDBs as separate DDBs; I would still treat all DDBs of a storage pool as a single DDB.

I have often wondered when/why someone would set up multiple DDBs on one MA?   We currently run a partitioned Global Dedupe database, with 4 partitions, each on their own MA.   Would it be better to divide them differently? 
  Are the multiple DDBs for different data types/block sizes?

That’s a good question, @Farmer92 .   Our setup dates back to V11.8 and is based on the front-end capacity of the data being protected.   Based on the reading I’ve been doing for Commvault v11.24, they’ve addressed many of the concerns I cited in my opening post, so the DDBs are more scalable than they once were.   Also, pegging capacity to the back-end (Commvault data written size) is a huge improvement, and will greatly reduce the number of Media Agents required to comply with Commvault best practice.  

As far as breaking up workloads into different DDBs, you get wildly different deduplication benefits from different databases, and it’s wonderful to be able to measure that granularly.    Also, the capacity limits recommended in earlier version 11 releases were far less generous (due, I believe, to limits of the deduplication database design), and you could only scale up to a recommended limit of 1 billion blocks per DDB.   In truth, you could run with several billion blocks in a DDB, but if you ran into corruption issues, recovery was severely impeded by an oversized DDB (which in turn impacted backup performance).  

I’m going to continue to update this thread as I move along, but I’m already feeling better about the direction Commvault is moving.   My biggest fear was the prospect of needing to stand up tens of media agents to be in compliance with best practice, but it seems Commvault has assuaged that concern with the newer releases.  

 

Thanks!

Userlevel 1
Badge +4

@Prasad Nara  : 

How many unique and secondary blocks can a MediaAgent support with a single-partition DDB store? And what is the best practice when the DDB is on a SAS SSD disk?

 

 

One 2 TB disk may hold up to 2 billion unique records, but DDB partition performance may start degrading after 1 billion records.

 

 

@Prasad Nara  : There is no straightforward answer. When I worked with support earlier (2 years ago) they gave me certain rules, but I don’t have the ticket number with me to validate the information.  

 

As of | No. of Unique Blocks | No. of Secondary Blocks | Pending Records | Data Size to be Freed | No. of Connections | Avg Q&I Time (ms)
08/23/2021 11:22 PM | 147,654,048 | 676,959,040 | 0 | 0 MB | 26 | 299
