Solved

Estimating "Back-end Size"

  • 9 November 2023
  • 5 replies
  • 1026 views

Loved Ghostbusters.  One of the best quips: 

Venkman: “I’m fuzzy on the good/bad thing.  Define ‘bad’”

Egon: “Imagine all life as you know it ending instantly as every molecule in your body explodes at the speed of light.”

 

I’m looking to replace our Media Agents, one of which has a dedupe database with a Q&I time that’s getting up in the 1.5 ms range.
 

Looking at this (properties of our dedupe storage policy):

… and the dedupe drive/path has ~2.2TB in it. 

For the “Back-end size for disk storage” on the “Hardware Specs for Dedupe Mode” (https://documentation.commvault.com/11.24/expert/111985_hardware_specifications_for_deduplication_mode.html), I’m looking for the “Extra-large” with 2 x DDB… (1), as possibly referenced in the “Total Data Size on Disk...” or simply the “Extra small” (2) as shown here:

My thoughts are #1 “up to 1000 TB” is the “total size on disk for all DBs” from the policy.  I’d just like to be clear on that before crossing the streams.

Thank you for your help ~ 

 

Hello @roc_tor 

 

Thanks for raising this question. The footnotes of the documentation give a few examples of how to calculate the back-end size, but in short it is the size on disk after deduplication.

Please review the following and advise if it answers all your questions:
 

Back-end storage size (BET) requirements range from approximately 1.0 to 1.6 times the front-end data size (FET). This factor varies in direct proportion to the amount of retention required and the daily change rate of the front-end data. For example, if the data is retained for a lower number of days, the predicted back-end storage requirement is reduced. If extended retention rules are applied to a larger portion of the managed data, back-end storage consumption can increase. The FET estimate can be used in the storage policy design to help size the appropriate resources to support the use case.

The following are examples of commonly used settings for the backup retention.

Example 1:

  • 80% VM/File data
  • 20% Database data
  • Daily change rate 2%
  • Compression rate 50%
  • Daily backups are retained for 30 days

Factoring in these parameters, the back-end storage size requirements can range from 1.0 - 1.2 times the front-end data size.

Example 2:

  • 80% VM/File data
  • 20% Database data
  • Daily change rate 2%
  • Compression rate 50%
  • Daily backups are retained for 30 days
  • 8 weekly backups are retained for 1 year
  • 9 monthly backups are retained for 1 year
  • 1 yearly backup is retained

Factoring in these parameters, the back-end storage size requirements can range from 1.4 - 1.6 times the front-end data size.
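
As a rough sketch of the arithmetic in the two examples above, the back-end range is simply the front-end size multiplied by the low and high factors. The function name, the profile labels, and the 400 TB front-end figure are assumptions made up for illustration; only the multipliers come from the documentation excerpt.

```python
# Multipliers quoted in the documentation excerpt above (back-end / front-end).
# The dictionary keys are hypothetical labels chosen for this sketch.
BET_FACTORS = {
    "daily_30_days": (1.0, 1.2),        # Example 1: 30 days of daily backups
    "extended_retention": (1.4, 1.6),   # Example 2: plus weekly/monthly/yearly
}


def estimate_back_end_tb(front_end_tb, profile):
    """Return the (low, high) back-end size in TB for a front-end size in TB."""
    low, high = BET_FACTORS[profile]
    return front_end_tb * low, front_end_tb * high


# Hypothetical 400 TB front end, for illustration only.
for profile in BET_FACTORS:
    low, high = estimate_back_end_tb(400, profile)
    print(f"{profile}: {low:.0f}-{high:.0f} TB back end")
```

The resulting back-end figure is what gets compared against the "Back-end size for disk storage" row when picking a column in the hardware specifications table.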

https://documentation.commvault.com/11.24/expert/111985_hardware_specifications_for_deduplication_mode.html


Based on what you shared in your screenshot, your back-end size is 500 TB. To future-proof your environment, I would recommend 2 MAs, each with 2 DDB disks (400 GB SSD each). This will cover your environment up to 1 PB of back-end storage and keep your Q&I time down. It's important to note that Q&I is most commonly impacted by disk performance and disk queue lengths, so having 4 disks will help keep it down and provide high performance.


Thank you very much for not only showing me the destination, but how to get there.  

 

Question tho - you suggest having 4 disks.  What I was actually considering recommending is 8 x 2 TB disks in a RAID 10 on each MA (yes, two MAs).  

 

So two MAs, each one:

CPU:  32 (threaded 64)

Memory:  128 GB (we have 256 now.. it NEVER uses more than half that)

c: - OS (actually suggesting two mirrored disks here)

RAID 10, 8 TB available:

d: index   (2 TB)

e: dedupe partition 1 (3 TB) 

f: dedupe partition 2 (3 TB)

RAID card:  4 GB (2 recommended) 

Would that be overkill? :P  My thoughts are that multiple disks could fail, and we might still be going.
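
A quick sanity check of the layout above, as a minimal sketch: RAID 10 mirrors disks in pairs, so usable space is half the raw total. The function and variable names are hypothetical; the disk count, disk size, and volume sizes are the ones proposed in this post.

```python
def raid10_usable_tb(disk_count, disk_size_tb):
    """RAID 10 mirrors pairs of disks, so usable space is half the raw total."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disk_count * disk_size_tb / 2


# Volume sizes from the proposed layout, in TB.
layout_tb = {
    "d: index": 2,
    "e: dedupe partition 1": 3,
    "f: dedupe partition 2": 3,
}

usable = raid10_usable_tb(8, 2)
allocated = sum(layout_tb.values())
print(f"usable {usable:.0f} TB, allocated {allocated} TB, "
      f"free {usable - allocated:.0f} TB")
```

Note that RAID 10 is only guaranteed to survive a single disk failure; it survives additional failures only when the failed disks sit in different mirror pairs.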

 

Note that we have two sites in a single comcell and … by policy and dedupe, basically treat them as two separate sites (that duplicate their backups to each other).

 

So we’re going to have 4 of these MAs.. if I can keep my boss happy enough 🙂.


Also… does anyone have a good (or bad) opinion of using Linux for media agents?  We use FC shared SAN drives - never done that in Linux, and if we do do that, it has to work for meh.  It looks as though Red Hat is the de facto recommended Linux flavour for CV.  Would that be an accurate statement?  How well would CV support work if we chose to run with Oracle?  Ubuntu?

At this point, I’m planning on Windows.  But we obviously need to back up the DDBs, etc.  Thus, shadow copies.  M$ Shadow copies are a crapshoot at best.  If we run with Linux/LVM - has anyone had issues performing backups on this?  


Almost all our MAs are based on Linux (AlmaLinux, CentOS & RHEL). It works like a charm, but I would recommend sticking to the supported distros as documented here: https://documentation.commvault.com/2023e/expert/2822_mediaagent_system_requirements.html#linux


Hello @roc_tor 

This is a risky setup when you have a very active environment and you are sharing the workload of your index and dedup on the same group of MAs.

In all my experience with dedup, as long as you have a dedicated disk for each partition of the DDB you should be in the clear, and if you are doing 100% client-side deduplication (the default behaviour) you will have a large amount of CPU and memory free most of the time. The reason we have such high resource recommendations is that DDB reconstructions are very CPU- and memory-intensive, as are verifications (scheduled once a week by default).

The average workload of the DDB does not require a huge amount of compute, but you don’t want your maintenance tasks to overrun and in turn impact your backup performance.

I also err on the side of caution, so I would set up 3 servers: 2 MAs with 2 disks each (plus OS disks) and a third MA to act as an index server. That will give you 4 partitions in the DDB to allow you to expand your environment, and your index operations are not at risk of impacting your backups and vice versa.

But as with all recommendations, that is all they are. You can choose to play around and see what works for your environment.

Good luck on the build!

