Solved

Creating a Selective Copy Using a Library (Enable Deduplication)


Badge +5

Hi Guys,

 

I finally found the exact article that describes a solution I want to implement, and I am seeking opinions on whether to go ahead with it. Basically, I want to create multiple Selective Copies under a storage policy and associate them with different subclients/computers to meet a client’s tiering model and differing retention requirements. Because I also want to deduplicate between the Primary and each selective copy (Weekly and Monthly), I’m weighing the option of creating the Selective Copies using a Library instead of a Storage Pool, as described in the article below:

Article: https://documentation.commvault.com/commvault/v11/article?p=119730.htm

In Step 17b, can I select a normal Windows folder as the Partition Path, e.g. D:\<randomFolder>? If I create additional selective copies under the same storage policy, can I use the same D:\<randomFolder> to deduplicate data between the Primary Copy and the additional selective copies, or do I have to create D:\<randomFolder1> and so forth for each copy?

 

I ask because I do not have enough dedicated Premium SSDs provisioned to create Deduplication Engines. Does this method work and what are the pros and cons?


Best answer by Sean Crifasi 10 May 2021, 17:09


17 replies

Badge +5

@Sean Crifasi and @Mike Struening, are you able to help?

Userlevel 5
Badge +8

For each copy you create, you can select the dedupe storage pool you want to use. Each copy can use the same storage pool, so it will use the same DDB.

Userlevel 4
Badge +9

Taking your limitations into account, the best way to deploy this would be as follows:

  • Create a new GDDB (global deduplication policy) for each retention period (for example weekly, monthly, yearly)
  • Create a new selective storage policy copy for each of your desired retentions and associations
  • The reason is that adding extended retention to the primary copy will bloat the DDB, leading to excess space being consumed and potentially degraded DDB performance
  • With the selective storage policy copies you can also limit the associated subclients to a subset of those available from the primary storage policy copy.
  • For example, perhaps subclients A, B and C need to be on the weekly selective copy but not yearly. Perhaps subclients D, E and F need to be on weekly, monthly, and yearly selective copies.
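
The association idea in the last two bullets can be pictured as a simple mapping. This is only an illustrative Python model, not Commvault API code: the names `COPY_ASSOCIATIONS` and `copies_for` are hypothetical, and leaving subclients A, B and C off the monthly copy is an assumption, since the example only says they are not on yearly.

```python
from __future__ import annotations

# Illustrative model only: this is not Commvault API code.
# Each selective copy lists the subclients associated with it.
# Assumption: A, B and C are also left off the monthly copy.
COPY_ASSOCIATIONS = {
    "weekly": {"A", "B", "C", "D", "E", "F"},
    "monthly": {"D", "E", "F"},
    "yearly": {"D", "E", "F"},
}


def copies_for(subclient: str) -> list[str]:
    """Return the selective copies a subclient's full backups would go to."""
    return sorted(
        name for name, members in COPY_ASSOCIATIONS.items() if subclient in members
    )


print(copies_for("A"))  # subclient A is associated with the weekly copy only
print(copies_for("D"))  # subclient D is associated with all three copies
```

Keeping the association per copy rather than per policy is what lets one storage policy serve subclients with different long-term retention needs.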
Badge +5

@DMCVault interesting! So I can use the same GDDB path used by the Primary Copy for any additional selective copy? 

Userlevel 5
Badge +8

No, that's not what I meant. You would not use the same DDB as your primary; you would create a separate DDB storage pool (global DDB policy) and use that for your copies. You have to consider the scale and constraints, of course.

Badge +5

@DMCVault ok gotcha. Thanks a lot. I will present both your suggestion and @Sean Crifasi’s and let the client pick. 

Userlevel 5
Badge +8

Sean is basically saying the same thing, but with more specific details. As a general rule of thumb, you should try to avoid mixing very long-term retention copies with shorter-retention copies in the same DDB. For example, you might have a copy you need to keep for 7 years for compliance while the other copies have 90-day retention. Don't overthink this too much: you will be fine with mixed retention; just keep compliance (very long) retention data separate if possible.

If you can share more specific details (what data, how many copies and what retention looks like) we can provide more accurate guidance.

Badge +5

@DMCVault, @Sean Crifasi, this is an Azure environment with 3 subscriptions (Prod, Management and DevTest), each with its own pseudo-client. Data goes to one subclient on each virtualisation pseudo-client, and all of these subclients go to the same storage policy, e.g.

ProdClient > AllProdVMs subclient > VMsStoragePolicy

DevTestClient > AllDevTest subclient > VMsStoragePolicy > deduplicated Primary Copy to Hot Blob storage 

 

The customer however wanted their servers tiered with Prod and Mgmt servers spread across 3 tiers and DevTest in the 4th tier that has different retention to the other 3. 
 

Current retention on the primary copy is set to 90 days with no additional copies configured but required retention for tier 1 to 3 is

Daily > 15 days (GDDB exists)

Weekly > 35 days (DASH Copy)

Monthly > 13 months (DASH copy)

Yearly > 7 years (non-deduplicated)

Tier 3 only requires a synchronous copy to another region.

Current retention for Tier 4 is also 90 days, because data goes to the same SP across all subscriptions. Required retention is Daily 10 days, Weekly 35 days and Monthly 6 months.

As for data we are looking at a footprint of around 60TB with Virtual Servers taking up most of that. There are a few application agents SQL and Oracle on some of the clients.

 

So, to remediate the tiering and retention requirements, this is where I am: the intention is either to add selective copies associated with each subscription’s subclient under the existing storage policies, or to define new subclients with tiered content, create a new storage policy for each tier, and create selective weekly, monthly and yearly copies where required.
 

We have two media agents in the active Azure region and 1 in the DR region. My intention is to place each copy from Primary to Monthly on independent deduplicated storage libraries but the yearly (non-deduped) can share storage with the monthly copy.


If I go with the approach of adding subclient/client-associated selective copies, would I have to use extended retention on the DASH copy, or would I have to create additional associated copies for the Prod, DevTest and Mgmt subclients/computers?

 
Hope I exhausted everything.

Badge +5

@DMCVault , @Sean Crifasi Have shared the details.

Userlevel 4
Badge +9

 Hi @Baba Imani,

Apologies for the delay, 

I’ve re-read your message a few times and discussed this with @DMCVault. I’d like to request clarification on a few points to ensure I fully understand.
 

  • You mentioned that there are 3 subscriptions. These would show as 3 separate Pseudo-Clients in Commvault, which can be associated with the same storage policy, as you stated.
  • As your dev subscription has different retention criteria, it should use a different storage policy.
  • Within the storage policy you will have a primary storage policy copy.
    • The primary copy will back up all subclients to the primary location; whether that is disk or direct to Azure storage is up to you.
    • You can use selective copies to retain only the full/synthetic full backups, or synchronous secondary copies to copy ALL backups.
    • We recommend using a media agent in the same cloud environment to improve performance and potentially reduce egress costs, depending on your service/agreement with your cloud provider.
  • Additional storage policy copies would be created based upon your criteria.
  • I’m unsure what you mean by tier 1 through tier 3 here: are you referring to storage policy copies, or to cloud storage tiers such as hot/cold?

Reviewing your writeup I would anticipate something such as the below to suit your needs
- Storage Policy A - Prod & Management associated

  1. Primary copy - 15 days x cycles basic retention
    1. Where x = Cycle retention configured to your desired needs
  2. Selective copy - Weekly - 7 days x cycles - basic retention
    1. Extended Retention - Weekly full - 35 days
  3. Selective copy - Monthly - 7 days  x cycles - basic retention
    1. Extended Retention - Monthly full - 395 days
  4. Selective copy - Yearly - 7 days x cycles - basic retention
    1. Extended Retention - Yearly full - 2555 days 

For each selective copy configuration and extended retention rule you can select either first full backup or last full backup as the selection criteria to be picked depending upon your needs.

https://documentation.commvault.com/11.22/expert/14088_copy_properties_selective_copy.html

 

- Storage Policy B - Development environment associated

Required retention: Daily 10 days, Weekly 35 days and Monthly 6 months.

  1. Primary copy - 10 days x cycles
  2. Selective copy - Weekly - 7 days x cycles - basic retention
    1. Extended Retention - Weekly full - 35 days
  3. Selective copy - Monthly - 7 days  x cycles - basic retention
    1. Extended Retention - Monthly full - 182 days
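
As a quick check, the day counts used in the two policies line up with the stated requirements (35 days, 13 months, 7 years, 6 months). The arithmetic below is only one plausible derivation, assuming a 365-day year with no leap days and a month of about 30 days; the variable names are illustrative.

```python
# Day counts behind the extended-retention values quoted above.
# Assumption: a year is 365 days (no leap days) and a month is about 30 days.
DAYS_PER_YEAR = 365

weekly_days = 5 * 7                    # 35 days, i.e. five weeks
thirteen_months = DAYS_PER_YEAR + 30   # 395 days: one year plus one 30-day month
seven_years = 7 * DAYS_PER_YEAR        # 2555 days
six_months = DAYS_PER_YEAR // 2        # 182 days: half a year, rounded down

print(weekly_days, thirteen_months, seven_years, six_months)
```

If the compliance requirement must cover leap years exactly, the 7-year value would need a day or two added.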
Badge +5

 Hi @Baba Imani,

Apologies for the delay, 

I’ve re-read your message a few times and discussed this with @DMCVault. I’d like to request clarification on a few points to ensure I fully understand.
 

  • You mentioned that there are 3 subscriptions. These would show as 3 separate Pseudo-Clients in Commvault, which can be associated with the same storage policy, as you stated.
    • [Baba Imani] I agree. For clarity, the Prod client has one subclient whose content includes VMs that were meant to be separated/classified by severity, so they also need to be in separate subclients and new storage policies.
  • As your dev subscription has different retention criteria, it should use a different storage policy.
    • [Baba Imani] The dev subscription is going to the same SP as PROD, but at least all the VMs have the same severity/category and retention, so it would have been easy to associate them with a selective copy. I do, however, want to create a new storage policy for the dev VMs as well.
  • Within the storage policy you will have a primary storage policy copy.
    • The primary copy will back up all subclients to the primary location; whether that is disk or direct to Azure storage is up to you.
    • You can use selective copies to retain only the full/synthetic full backups, or synchronous secondary copies to copy ALL backups.
      • [Baba Imani] We intend to create Weekly deduplicated Fulls, Monthly deduplicated Fulls and a non-deduplicated Yearly Full for each new storage policy. We currently have 1 Global DDB for the primary copies. We want each additional selective copy to go to a different library, so will we require a Global DDB for the Weekly and Monthly copies?
    • We recommend using a media agent in the same cloud environment to improve performance and potentially reduce egress costs, depending on your service/agreement with your cloud provider.
      • [Baba Imani] We have two media agents in the active Azure region and 1 MA in the DR region, so we should be good on this front.
  • Additional storage policy copies would be created based upon your criteria.
    • [Baba Imani] Agreed.
  • I’m unsure what you mean by tier 1 through tier 3 here: are you referring to storage policy copies, or to cloud storage tiers such as hot/cold?
    • [Baba Imani] I used tiers to mean severity or service category, e.g. critical servers are Tier 1, high-severity servers are Tier 2, etc.

Reviewing your writeup I would anticipate something such as the below to suit your needs
- Storage Policy A - Prod & Management associated

  1. Primary copy - 15 days x cycles basic retention
    1. Where x = Cycle retention configured to your desired needs
  2. Selective copy - Weekly - 7 days x cycles - basic retention
    1. Extended Retention - Weekly full - 35 days
      • [Baba Imani] A question regarding the retention configuration method above and below: can’t I just set basic retention to 35 days and cycles, rather than setting 7 days and then retaining the job via extended retention? I want to limit any additional storage of data, as this is a cloud storage setup.
  3. Selective copy - Monthly - 7 days  x cycles - basic retention
    1. Extended Retention - Monthly full - 395 days
  4. Selective copy - Yearly - 7 days x cycles - basic retention
    1. Extended Retention - Yearly full - 2555 days 

For each selective copy configuration and extended retention rule you can select either first full backup or last full backup as the selection criteria to be picked depending upon your needs.

[Baba Imani] I could just select Weekly Full and select first or last full at the Selective Copy tab, right?

https://documentation.commvault.com/11.22/expert/14088_copy_properties_selective_copy.html

 

- Storage Policy B - Development environment associated

Required retention: Daily 10 days, Weekly 35 days and Monthly 6 months.

  1. Primary copy - 10 days x cycles
  2. Selective copy - Weekly - 7 days x cycles - basic retention
    1. Extended Retention - Weekly full - 35 days
  3. Selective copy - Monthly - 7 days  x cycles - basic retention
    1. Extended Retention - Monthly full - 182 days

Hi Guys,

I’m seeking to finalise and present this design as soon as possible, and have answered and expanded on your questions inline above in the hope of bringing clarity. Let me know if anything else is not clear. Are there any considerations or risks in this design that I may have missed?

My next question is about excluding the old data, as I’m creating new subclients and storage policies. I intend to disable subclient activity, run data aging and seal the DDB, so that the new configuration starts with a fresh DDB for the primary data. If I disable subclient activity and seal the DDB, can I still run aux copies from the Primary to one selective copy in the old existing storage policy while subclient activity is disabled? I need this because I want to retain the data that is on the primary copy for longer.

 

@DMCVault , @Sean Crifasi 

Userlevel 4
Badge +9

@Baba Imani 

I’m going to post your replies as bullet points and my response as the sub-bullet point to try to keep this organized.
 

  • The dev subscription is going to the same SP as PROD but at least all the VMs have the same severity/category and retention it would have been easy to associate with a selective copy. I however want to create a new storage policy for the dev VMs as well
    • If you have a mix of VMs with different retention criteria, create separate subclients accordingly and associate them with the corresponding storage policy
  • We intend to create Weekly deduplicated Fulls, Monthly deduplicated Fulls and non deduplicated Yearly Full for each new storage policy. We currently have 1 Global DDB for the primary copies. For each additional selective we want it to go to a different library so we will require a Global DDB for the Weekly and Monthly copies?
    • Each selective copy should use a GDDB dedicated to its retention period
    • For example, if you have a few weekly selective copies across several policies, they can share one GDDB for weekly retention, but they shouldn’t be pooled into a GDDB that is used for monthly retention
  • We have two media agents in the active Azure region and 1 MA in the DR region so we should be good on this front
    • This is good, nothing further of concern here
  •  Question regarding the Retention config method you have above and below. Can’t I just set basic retention 35 days and cycles rather than using the method of 7 days then retain it via extended retention. I want to limit any additional storage of data as this is a cloud storage setup
  • I could just select Weekly Full and select first or last full at the Selective Copy tab right?
    • That is correct, you choose the desired retention and first or last backup of the time period. 
  • if I disable subclient activity and seal the DDB, can I still run aux copies from Primary to 1 selective copy in the old existing storage policy whilst subclient activity is disabled? I need this as I want to retain the data on the primary copy for longer.
    • Yes, you can. Disabling activity only prevents the disabled subclients from running new backups; it does not prevent an aux copy of existing data, so long as you do not prune that data from the copy before you perform the aux copy.
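
The GDDB guidance above (share a GDDB between copies of the same retention class across policies, but never between classes) can be sketched as a grouping step. This is a minimal illustrative model, not Commvault API code; the policy names, the `GDDB-…` naming and the `group_by_gddb` helper are all hypothetical. The yearly copies are left out because they are non-deduplicated in this design.

```python
from collections import defaultdict

# Hypothetical copies: (storage policy, selective copy, retention class).
copies = [
    ("Policy A", "Weekly", "weekly"),
    ("Policy A", "Monthly", "monthly"),
    ("Policy B", "Weekly", "weekly"),
    ("Policy B", "Monthly", "monthly"),
]


def group_by_gddb(copies):
    """Group copies so that each retention class gets its own GDDB."""
    gddbs = defaultdict(list)
    for policy, copy_name, retention_class in copies:
        gddbs[f"GDDB-{retention_class}"].append(f"{policy}/{copy_name}")
    return dict(gddbs)


layout = group_by_gddb(copies)
for gddb, members in layout.items():
    print(gddb, members)
```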

Let me know if that addresses all concerns or if I missed anything!

Badge +5

Thanks for that, @Sean Crifasi. That makes it much clearer now. Are there any other considerations or risks to look out for?

  • I already have SQL logs going to a dedicated storage policy and retained for 15 days only. I will negotiate with the customer to send Oracle logs to the same storage policy. With this configured, it should be OK to use Basic Retention instead of 7 days + Extended Retention for the other copies, as I mentioned earlier, right? In that case only the Weekly Fulls will be copied to the Weekly Copy, and no incrementals or other non-weekly Fulls will be copied, right? I will set manual aux copy schedules for this.
     
  • My other question: I want to avoid mixing the new data (new subclients and new SPs) with the old DDB, but I also want to run aux copies for the old data while avoiding it getting pruned. Primary Copy retention is 150 days at the moment. If I leave it at 150 days, I should be able to seal the DDB and run an aux copy for All Backups to the secondary copy without data getting pruned, right? To copy data to the secondary copy, I will use a separate Global DDB created temporarily for aux copy operations, and seal it as well when done. I will then lower the retention on the old Primary Copy.
     
  • And I’m assuming aux copies are back-end data, so they shouldn’t contribute to the front-end terabytes (FET) for Commvault Complete licensing, right?

 

Userlevel 4
Badge +9

I still advise using the extended retention rule; it would not cause jobs to be held for longer. It also won’t cause other backup types, such as incrementals, to be copied; that is the reason to use a selective copy instead of a synchronous copy in the first place.

Jobs enter extended retention if they qualify for it.

If you configured the weekly selective copy with 7 days basic retention and 90 days extended retention for all fulls, then all full backups copied to the selective copy would be marked for extended retention and retained for 90 days, at which time they would become prunable so long as all basic retention criteria had already been met.
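
The 7-day basic plus 90-day extended example can be read as a simple rule: a held full becomes prunable only once both conditions are satisfied. The function below is a deliberately simplified sketch of that rule, not Commvault's actual data aging logic; `is_prunable` and its parameters are hypothetical, cycles are reduced to a single flag, and treating "met" as "strictly older than" is an assumption.

```python
def is_prunable(age_days: int, basic_days: int, extended_days: int,
                basic_cycles_met: bool = True) -> bool:
    """Simplified model: a full held for extended retention becomes prunable
    only after both the basic and the extended retention periods have passed."""
    basic_met = age_days > basic_days and basic_cycles_met
    extended_met = age_days > extended_days
    return basic_met and extended_met


print(is_prunable(30, basic_days=7, extended_days=90))  # still held by extended retention
print(is_prunable(91, basic_days=7, extended_days=90))  # both periods have passed
```

In this model the short basic retention never shortens the life of a full that qualifies for extended retention, which is why the 7-day value does not put the 90-day requirement at risk.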

The below guide may help with this.
https://documentation.commvault.com/commvault/v11/article?p=43217_1.htm

 

Regarding your pruning question, you are correct: you can disable backups so that new backups don’t run, then seal the DDB and aux copy the jobs to a new storage policy copy. You can lower retention on the original primary once you have confirmed that the necessary jobs were copied.

For licensing concerns please review the following:
https://documentation.commvault.com/commvault/v11/article?p=6911.htm

Badge +5

With regard to the source copy for the Monthly and Yearly copies: with retention for the Primary Copy set at (15 days, 2 cycles), is it logical to make the Weekly copy the source for the Monthly Full, and the Monthly copy the source for the Yearly, or am I overthinking? I have a feeling 15 days retention on the primary doesn’t give a consistent Monthly Full as by the time of the Monthly Full the retention rules for the Primary Copy will have been met and data aged so the last copy of the month won’t have all the month’s data. Am I correct? @DMCVault @Sean Crifasi 

Badge +5

@DMCVault @Sean Crifasi I just asked one last question above.

Userlevel 4
Badge +9

Hi @Baba Imani 

Apologies for the delay, unfortunately several long running critical calls have left me quite backlogged.

You can modify the source copy if you would like; I don’t see the need to do so, but there is no harm in it either.

  • Once a backup job is written to the primary copy, it is automatically marked to be copied to all other copies that use the primary copy as their source copy.
  • If you choose last backup of the time period, taking the monthly copy as an example:
    • At the beginning of the month, full backup job 123 completes on the primary copy.
    • This job is now marked to be copied to the additional copies.
    • Upon successful aux copy, the job is marked as held for extended retention, even though it’s not the very end of the month.
    • This is expected behavior, because we need to ensure we have a backup held for the extended retention criteria at all times.
    • Full job 321 is written the next week and is now marked to be copied to the additional storage policy copies.
    • This job is now marked for extended retention, and the prior job is released from extended retention and becomes prunable if no other criteria require it.
    • Each subsequent job is marked for extended retention in turn, releasing the prior one as the month progresses. This way we always have the latest backup held in extended retention, until the final job of the period is reached at the end of the month.
  • If you choose first backup of the time period, this process is skipped and the first job copied to the storage policy copy for the configured time period is marked for extended retention.
  • Modifying the source copy for the other copies is not necessary, but changing from a primary copy on disk to another source copy that is in the cloud may lower egress costs.
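
The hand-over of the extended-retention mark described above can be replayed in a few lines. This is an illustrative simulation only, not Commvault code: `extended_retention_holder` is a hypothetical helper, job IDs 123 and 321 come from the example, and job 456 and all the dates are made up.

```python
from datetime import date


def extended_retention_holder(full_jobs, use_last=True):
    """Replay fulls in date order and track which job holds the monthly
    extended-retention mark after each one is copied."""
    holder = {}    # (year, month) -> job id currently held for that month
    history = []   # (job just copied, job holding the mark afterwards)
    for job_id, day in sorted(full_jobs, key=lambda j: j[1]):
        period = (day.year, day.month)
        if use_last or period not in holder:
            holder[period] = job_id  # a newer full releases the prior one
        history.append((job_id, holder[period]))
    return holder, history


jobs = [(123, date(2021, 5, 2)), (321, date(2021, 5, 9)), (456, date(2021, 5, 16))]
last, last_history = extended_retention_holder(jobs, use_last=True)
first, first_history = extended_retention_holder(jobs, use_last=False)
print(last[(2021, 5)], first[(2021, 5)])
```

With "last full" the mark moves forward with every weekly full, so some job is always held; with "first full" it stays on job 123 for the whole month.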

To address this specific piece:
“I have a feeling 15 days retention on the primary doesn’t give a consistent Monthly Full as by the time of the Monthly Full the retention rules for the Primary Copy will have been met and data aged so the last copy of the month won’t have all the month’s data.”

This is addressed in my example above: the data would be copied to the other storage policy copies before it ever became prunable. Furthermore, if there is an issue with an aux copy and the data is not copied, the jobs will not prune from their respective source copies even if retention has been met, as we will not prune data without having completed all required copies of it.
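
That safeguard amounts to a second condition on pruning. The check below is a minimal sketch of the idea under that reading, not the product's implementation; `can_prune` and the copy names are hypothetical.

```python
def can_prune(retention_met: bool, required_copies: set, completed_copies: set) -> bool:
    """A job is prunable from its source copy only once retention is met
    and every copy that depends on it has finished copying the job."""
    return retention_met and required_copies <= completed_copies


# Retention is met, but the Monthly aux copy has not completed yet.
print(can_prune(True, {"Weekly", "Monthly"}, {"Weekly"}))
# Retention is met and all dependent copies have the job.
print(can_prune(True, {"Weekly", "Monthly"}, {"Weekly", "Monthly"}))
```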
