
Calculating the rate of change

  • 29 June 2021

Hi Everyone,

 

I need to provide figures on the rate of change for our backup data, as we are looking to send data to another location with two weeks' retention.

 

I have a mature, deduplicated environment, so the figures I am seeing on reports and the like are not much use at the moment.

 

I really need to factor in two things:-

1 - Expected size of the baseline (I will be creating a new copy, targeting the new location)

2 - The rate of change of future backups.

 

So I have two main questions:-

 

1 - How do I calculate my expected dedupe and compression savings for my first Auxcopy?

I realize this will effectively be copying over an equivalent full backup, since it will be seeding the new library.

I am thinking along the lines of assuming a 50% saving (compression and some dedupe combined), but I am wondering if there is a better or more accurate way of doing this. My data is largely filesystem, so OS and server, but I may need to look at application data too (SQL/Oracle).

 

2 - How do I calculate my future rate of change?

This should be much easier, but I am reading articles from people saying they take 6 or 7 incrementals and average them out.

However, if I'm keeping two weeks' worth of data, should I not also include the figures from a full backup? At the moment, I am seeing figures like 15TB written, from an Application Size of 200TB on ALL backups over a seven-day period.

So that is a pretty good return on savings, but how do I align this to future needs?

 

If you have experience of this, it would be good to hear from you.

Thanks

 

 

 

MountainGoat,

 

 

1 - How do I calculate my expected dedupe and compression savings for my first Auxcopy?

I realize this will effectively be copying over an equivalent full backup, since it will be seeding the new library.

 

50% is a safe assumption for compression with FileSystem data. It really does depend on the nature of that data, though. A large number of PDF, video, or audio files, for instance, would reduce the compression, as they are already in a compressed format. Conversely, if your clients hold mostly text-based documents, HTML, or program code, you can see far greater than 50%.
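A minimal Python sketch of a blended estimate for a mixed data set. Only the 50% figure comes from this thread; the 70% and 5% ratios and the category names are hypothetical placeholders to replace with ratios observed in your own job history.

```python
# Hypothetical savings ratios per data category. Only the 50% figure comes
# from this thread; the others are illustrative placeholders.
ASSUMED_SAVINGS = {
    "text_like": 0.70,           # text documents, HTML, program code
    "general_filesystem": 0.50,  # the "safe" 50% assumption
    "already_compressed": 0.05,  # PDF, video, audio
}


def estimated_stored_tb(front_end_tb: dict[str, float]) -> float:
    """Estimate on-disk size (TB) after applying an assumed savings ratio
    to each category of front-end (application) data."""
    total = 0.0
    for category, size_tb in front_end_tb.items():
        total += size_tb * (1.0 - ASSUMED_SAVINGS[category])
    return total


if __name__ == "__main__":
    # Hypothetical 100 TB mix of front-end data
    mix = {"text_like": 20.0, "general_filesystem": 60.0, "already_compressed": 20.0}
    print(f"Estimated size after savings: {estimated_stored_tb(mix):.1f} TB")
```

A mix weighted toward already-compressed media pulls the blended saving well below 50%, which is the effect described above.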

 

From your description, it sounds like the new copy or copies will be secondary copies in an already existing policy. If so, you can view the Estimated Baseline Size of the source copy's DDB. (See Example From my Lab).

 

 

For the most part, the data will DASH over to the new copies at roughly the same size.

Database agents are an exception to this. For database jobs to line up similarly on the source and destination copies, follow the steps here: https://kb.commvault.com/article/55258

 

FYI: The baseline is calculated from the compressed size of the most recent Full backup for all associated subclients, plus 20% to estimate incremental data change.

 

 

2 - How do I calculate my future rate of change?

 

Rate of change is going to be unique from organization to organization, and even from client to client.

It heavily depends on the use case of each subclient and the data which is being protected.

 

VMs, for instance, will often have a relatively low rate of change, since the OS and program files see very little change.

 

Database and FileSystem data can have vastly different change rates, depending on how often the data or files are changed, updated, or deleted on the client end.

This is why it is easier to estimate based on existing jobs. Look at previous cycles of your jobs and get an average of how much the Incrementals historically need to write. You might also find that certain days of the week historically see heavier change than others. Again, this depends on the production use of the particular clients.
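A small Python sketch of averaging historical Incrementals, overall and per weekday. The job history is hypothetical; in practice you would take the sizes written from your own job reports over several cycles.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def average_incremental_tb(history: list[tuple[date, float]]) -> float:
    """Mean TB written per Incremental across the sampled cycles."""
    return mean(size_tb for _, size_tb in history)


def average_by_weekday(history: list[tuple[date, float]]) -> dict[str, float]:
    """Mean TB written per Incremental, grouped by day of the week,
    to spot days that historically see heavier change."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for day, size_tb in history:
        buckets[day.strftime("%A")].append(size_tb)
    return {weekday: mean(sizes) for weekday, sizes in buckets.items()}


if __name__ == "__main__":
    # Hypothetical TB written by Incrementals across two cycles
    history = [
        (date(2021, 6, 14), 2.5),  # Monday
        (date(2021, 6, 15), 1.5),  # Tuesday
        (date(2021, 6, 21), 3.5),  # Monday
        (date(2021, 6, 22), 1.5),  # Tuesday
    ]
    print(f"Overall average: {average_incremental_tb(history):.2f} TB")
    for weekday, avg in average_by_weekday(history).items():
        print(f"{weekday}: {avg:.2f} TB")
```

The per-weekday view is only as reliable as the number of cycles sampled, so a longer history gives a steadier picture.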

 

If you are using Traditional Fulls, then you would want to assume the same incremental rate of change for the subsequent Fulls, since you are likely to have the same daily change from the previous day's Incremental to the Full. Note, though, that the Full will be able to take advantage of the dedup data laid down by the previous Full and the subsequent Incrementals.

 

Synthetic Fulls are easier to calculate. These typically show around 99% savings. A Synthetic Full does not back up any new data. Instead, it uses the data already written by the previous Full and Incrementals to create a new Full virtually.
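A rough Python sketch of how these points could feed a two-week retention estimate, assuming that every day (Incremental or Traditional Full) writes about one day's unique change, and that each Synthetic Full adds only about 1% of the Full's application size on top (the 99% figure above). The weekly Full schedule, the 14-day window, and all sizes are hypothetical inputs, and the model ignores data aging details.

```python
SYNTHETIC_FULL_SAVINGS = 0.99  # "typically 99% savings" per this thread


def retention_window_tb(
    baseline_tb: float,
    daily_change_tb: float,
    full_app_size_tb: float,
    kind: str,
    retention_days: int = 14,
    days_between_fulls: int = 7,
) -> float:
    """Baseline plus a rough total written during the retention window."""
    fulls = retention_days // days_between_fulls
    # Each day adds roughly one day of unique change, whether captured by
    # an Incremental or by a Traditional Full.
    total = baseline_tb + retention_days * daily_change_tb
    if kind == "synthetic":
        # Synthetic Fulls add no new data, only reference overhead of about
        # (1 - savings) of the Full's application size.
        total += fulls * full_app_size_tb * (1.0 - SYNTHETIC_FULL_SAVINGS)
    elif kind != "traditional":
        raise ValueError(f"unknown full type: {kind}")
    return total


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB baseline, 2 TB/day change, 200 TB Full
    for kind in ("traditional", "synthetic"):
        print(kind, f"{retention_window_tb(100.0, 2.0, 200.0, kind):.1f} TB")
```

Either way, the Fulls inside the retention window belong in the estimate; the Full type mainly changes how much extra they add beyond the daily change.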

 

 

At the moment, I am seeing figures like 15TB written, from an Application Size of 200TB on ALL backups over a seven day period. So that is a pretty good return on savings, but how do I align this to future needs?

 

The 15 TB written represents the unique data the jobs needed to write, plus the small reference pointers written for data that already existed on the storage. This is where the dedup benefits come into play.

Those jobs saved 185 TB of storage space due to (A) compression and (B) some portion of the data already existing in that DDB's baseline.
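The savings arithmetic, using the thread's own figures (200 TB application size, 15 TB written over seven days), as a short Python sketch. Note that these two numbers alone cannot split the saving between compression and deduplication.

```python
def dedup_savings(application_tb: float, written_tb: float) -> tuple[float, float, float]:
    """Return (TB saved, percent saved, reduction ratio) for a set of jobs."""
    saved_tb = application_tb - written_tb
    percent_saved = saved_tb / application_tb * 100.0
    reduction_ratio = application_tb / written_tb
    return saved_tb, percent_saved, reduction_ratio


if __name__ == "__main__":
    # Figures from this thread: 200 TB application size, 15 TB written
    saved, pct, ratio = dedup_savings(200.0, 15.0)
    print(f"{saved:.0f} TB saved ({pct:.1f}%), about {ratio:.1f}:1 reduction")
```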

 

I hope this answers your questions.

 

 


Thanks for such an excellent reply, @Mike Z-man. It has confirmed the direction I was thinking of heading in.

With regard to the 15 TB of data written over seven days, do you think it's correct to apply a simple formula to reach my daily rate of change?

 

Something like 15TB is 7.5% of 200 TB.

Given that 7.5% occurs over 7 days, I then divide by 7 to get my daily rate of change (ROC).



So, to summarize, my daily ROC is 1.07%, or 2.14 TB per day.
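The same arithmetic as a small Python sketch. Because the 15 TB covers all backups in the seven days, including any Fulls, the daily average already folds them in. Multiplying by a 14-day retention is a rough extension that assumes the sampled rate holds and leaves the baseline out; the function names are illustrative.

```python
def daily_rate_of_change(written_tb: float, application_tb: float, days: int) -> tuple[float, float]:
    """Return (daily percent of application size, daily TB written)."""
    period_percent = written_tb / application_tb * 100.0
    return period_percent / days, written_tb / days


def retained_change_tb(daily_tb: float, retention_days: int = 14) -> float:
    """Approximate data written during the retention window, excluding the baseline."""
    return daily_tb * retention_days


if __name__ == "__main__":
    daily_pct, daily_tb = daily_rate_of_change(15.0, 200.0, 7)
    print(f"Daily rate of change: {daily_pct:.2f}% or {daily_tb:.2f} TB/day")
    print(f"Two weeks of change: {retained_change_tb(daily_tb):.1f} TB")
```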

This is based on Filesystem data, OS and System State. But if this is correct, I can apply to my other data types too.

 

With regard to the baseline you mentioned, I was hoping to trust it, but I have a couple of GDSPs where the baseline is significantly larger than the sum of the data stored in the associated library.

Although I’m not an expert on these “baselines” I’m fairly certain a new baseline should not be 40% larger than the total size of the data currently held. Which is kind of a shame, because as you highlighted

I suspect this is an anomaly, but I could be wrong.

 

Thanks again.


Calculating the rate of change can only be based on the data set you have available.

It will differ from client to client to some degree and really depends on the actual use of that client and the data it holds. However, take a large enough sample and the average should even out for you.

 

As for the Baseline, the important thing to remember here is it is an estimate.

The estimate takes the size of the latest Full for all subclients associated with that DDB, plus 20% for incremental change over the cycle.

-- Your actual change rate might not be 20%.

-- You might also have subclients that are no longer scheduled for backup, or that have their activity disabled. Since they are still associated with the policy, they are included in the baseline calculation.
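A Python sketch of the baseline estimate as described in this thread (not Commvault's actual implementation), with two knobs for the caveats above: a measured change rate in place of the 20% default, and an option to leave out subclients that are associated but inactive. The `Subclient` type, names, and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subclient:
    name: str
    latest_full_compressed_tb: float
    active: bool  # False if no longer scheduled or activity is disabled


def estimated_baseline_tb(
    subclients: list[Subclient], change_rate: float = 0.20, active_only: bool = False
) -> float:
    """Sum each subclient's latest compressed Full and add an allowance
    for incremental change over the cycle (20% by default)."""
    selected = [s for s in subclients if s.active or not active_only]
    return sum(s.latest_full_compressed_tb for s in selected) * (1.0 + change_rate)


if __name__ == "__main__":
    # Hypothetical subclients associated with one DDB
    subclients = [
        Subclient("fs01", 30.0, True),
        Subclient("fs02", 20.0, True),
        Subclient("old_sql", 25.0, False),  # still associated, no longer backed up
    ]
    print(f"All associated: {estimated_baseline_tb(subclients):.1f} TB")
    print(f"Active only: {estimated_baseline_tb(subclients, active_only=True):.1f} TB")
    print(f"Active, 10% change: {estimated_baseline_tb(subclients, 0.10, True):.1f} TB")
```

Stale associations and an overstated change rate both push the estimate above what the library actually holds, which can explain a baseline that looks larger than the stored data.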

 

 

 

