Solved

Auxiliary copy slow and architecture review

  • 9 September 2021
  • 17 replies
  • 232 views

Userlevel 2
Badge +6

Hi, everyone.

 

I have a customer with this exact problem.

 

After the Commvault refresh/reconfiguration, which was completed some months ago, we had issues backing up to tape, which we eventually traced to the tape drives we were using at the time.

 

We have resolved the drive issue, but the copy to tape is still running at a very low speed (as low as 13 GB/hr).

 

Kindly assist us with the following:

 

  1. We have sister companies running the same Commvault setup, and we want to know how their setup differs from ours in a way that makes it perform better.
  2. We need to review our architecture to be sure the copy to disk and the copy to tape can run at the same time from the primary source.
  3. We want to compare our storage with that of our sister companies in terms of I/O and disk RPM.

 

Is there any way I can help them?


Best answer by Damian Andre 22 September 2021, 18:11


17 replies

Userlevel 2
Badge +6

Hi, @Damian Andre 

Thank you for the update.

This will definitely go a long way. 

I will propose this to them if the problem persists.

I will definitely keep the community updated.

Userlevel 7
Badge +15

Hello @Mike Struening 

I was onsite today.

Their main question is: is it possible to have the primary copy to disk running while the copy to tape runs at the same time?

Is it possible? And what are the implications?

 

If you mean whether you can copy data from running backups before they complete, the answer is yes. I believe you enable it in the secondary copy properties (Copy Properties / Copy Policy). There are caveats, which the documentation notes below:

https://documentation.commvault.com/commvault/v11/article?p=14085.htm

 

  • Pick data from running backup jobs

    If selected, when an auxiliary copy job with the Use Scalable Resource Allocation option enabled is performed during a backup job, the data available on the primary copy is picked for copy by the running auxiliary copy job. This option causes the auxiliary copy operation to create the secondary copy faster. This option saves time for the auxiliary copy operation, especially when the backups running are huge.

    Notes:

    • For this parameter to work, enable the feature of replication of backup jobs completed during an auxiliary copy operation and specify the time interval to check for completed backup jobs. For instructions, see Enabling Frequent (Timely) Replication of Backup Jobs Completed during an Auxiliary Copy Operation.
    • This option is supported only for synchronous copies. This option is not supported for inline copy, snapshot copy, selective copy and silo copy.
    • This option is not supported for auxiliary copy operations that process Edge backup data.
    • This option is supported only when the source copy is on a disk or a cloud library.
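To illustrate the implication asked about above, here is a toy Python model, not Commvault's actual scheduling logic, comparing a copy that starts only after the backup finishes with one that picks up data already written to the primary copy, re-checking at a fixed interval. The sizes, rates, and polling interval are hypothetical, and `sequential_finish` / `concurrent_finish` are illustrative names, not Commvault functions.

```python
# Toy model only: sizes, rates, and the polling interval are hypothetical,
# and this is not how Commvault schedules auxiliary copies internally.


def sequential_finish(backup_gb: float, backup_rate: float, copy_rate: float) -> float:
    """Hours until the copy ends if it starts only after the backup completes."""
    return backup_gb / backup_rate + backup_gb / copy_rate


def concurrent_finish(backup_gb: float, backup_rate: float, copy_rate: float,
                      poll_hours: float, step: float = 0.01) -> float:
    """Hours until the copy ends if it picks data from the running backup.

    Every `poll_hours` the copy learns how much the backup has written so far
    and may copy up to that amount; it never copies data it has not seen yet.
    """
    if min(backup_gb, backup_rate, copy_rate, poll_hours, step) <= 0:
        raise ValueError("all inputs must be positive")
    t = copied = visible = next_poll = 0.0
    while copied < backup_gb:
        if t >= next_poll:  # periodic check for newly written backup data
            visible = min(backup_gb, backup_rate * t)
            next_poll += poll_hours
        copied = min(visible, copied + copy_rate * step)
        t += step
    return t


if __name__ == "__main__":
    # Hypothetical: 1000 GB backup written at 200 GB/h, copied at 100 GB/h.
    print(f"sequential: {sequential_finish(1000, 200, 100):.1f} h")
    print(f"concurrent: {concurrent_finish(1000, 200, 100, poll_hours=0.5):.1f} h")
```

With these made-up numbers the concurrent copy finishes at about 10.5 h instead of 15 h, roughly when the slower copy stream alone would. The model ignores the extra read load that copying from the primary disk library places on it while backups are still writing, which may matter here given that the log later in this thread flags slow reads from the source.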

 

Userlevel 2
Badge +6

Hello @Mike Struening 

I was onsite today.

Their main question is: is it possible to have the primary copy to disk running while the copy to tape runs at the same time?

Is it possible? And what are the implications?

Userlevel 7
Badge +23

Hey @Mubaraq! Hope all is well!

We can absolutely help.

Not sure if you saw this thread:

 

There’s a tool we have (mentioned in this thread here) that can give you a breakdown of each portion of the job and show you where the bottleneck is.

Userlevel 2
Badge +6

Hello, @Mike Struening 

I will reach out to the customer and get back to you.

Userlevel 2
Badge +6

Hi, @Mike Struening 

 

So sorry I have not responded.

 

They are having an interview review. I will be onsite tomorrow.

 

I will share updates.

 

NB: Kindly unmark your answer as the best answer. I clicked it by mistake.

Userlevel 7
Badge +23

Best answer removed!

Keep me posted, thanks!!

Userlevel 2
Badge +6

Hi, @Damian Andre & @Mike Struening 

I pulled the log from the auxiliary copy job.

I have seen the suggestions, but I would be glad to get your input as well.

NB: I removed the upper part of the log and also the domain name from the MA FQDN.

See below:

Job-ID: 100491
 Job Duration: [23,September,21 03:36:06  ~  23,September,21 10:32:38] 18h:56m:32s (68192 seconds)
 Total Data Read: 5044269408834 [4697.84 GB] [73.93 GBPH]
 Total Data Transfer: 5048917958228 [4702.17 GB] [119.97 GBPH]
 Total Data Write: 5044836057110 [4698.37 GB] [83.35 GBPH]
 Stream Count: 22
 
 
 Remediation(s): 
 --------------
 
 Stream 1:
 IDA: Replicator DashCopy
 Source: cvlt-m-agt07
 Destination: cvlt-m-agt04:CVLT-M-AGT04.
 
 

----------------------------------
| READS FROM THE SOURCE ARE SLOW |
----------------------------------
    - Make sure maximum reader streams are selected for AuxCopy.
    - Run CvDiskPerf on the source path to ensure the disk(s) are optimized for random I/O.
    - Ensure Distribute data evenly for offline reads are selected from the storage policy properties.
    - Review and increase the DataMoverLookAheadLinkReaderSlots to increase the read ahead factor (max value is 128). Suggested values 32, 64, 128.
    - If Auxcopy is copying synthetic fulls, ensure that the agent that ran synthetic full supports multiple streams to maximize AuxCopy read performance.

DOCUMENTATION
-------------
http://documentation.commvault.com/commvault/v11/article?p=8630.htm
http://documentation.commvault.com/commvault/v11/article?p=8855_1.htm

CONSIDERATION(S)
----------------
Increasing look ahead slots will increase the memory consumption on the MA. So increase the value gradually.
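The summary figures in the log can be cross-checked with a little arithmetic. This Python sketch parses the duration and the three "Total Data" lines from the log above (`summarize` is an illustrative helper, not a Commvault tool). It shows that the byte counts match the GB figures when GB is read as 2^30 bytes, that reads have the lowest GBPH of the three phases, and that dividing GB by GBPH gives more hours than the job's stated duration, so the GBPH values are not simply GB over elapsed job time. How the tool derives them is not stated in the log, so treat them as relative indicators.

```python
import re

# Summary lines from the auxiliary copy log for job 100491, as posted above.
LOG = """\
 Job Duration: [23,September,21 03:36:06  ~  23,September,21 10:32:38] 18h:56m:32s (68192 seconds)
 Total Data Read: 5044269408834 [4697.84 GB] [73.93 GBPH]
 Total Data Transfer: 5048917958228 [4702.17 GB] [119.97 GBPH]
 Total Data Write: 5044836057110 [4698.37 GB] [83.35 GBPH]
"""

PHASE_RE = re.compile(r"Total Data (\w+): (\d+) \[([\d.]+) GB\] \[([\d.]+) GBPH\]")
DURATION_RE = re.compile(r"\((\d+) seconds\)")


def summarize(log: str) -> dict:
    """Parse the summary lines and derive a few cross-check figures."""
    wall_hours = int(DURATION_RE.search(log).group(1)) / 3600
    phases = {}
    for name, raw_bytes, gb, gbph in PHASE_RE.findall(log):
        gb, gbph = float(gb), float(gbph)
        phases[name] = {
            "gb": gb,
            "gbph": gbph,
            "gib_from_bytes": int(raw_bytes) / 2**30,  # bytes read as binary GB
            "hours_implied": gb / gbph,  # hours that would produce the GBPH figure
        }
    slowest = min(phases, key=lambda p: phases[p]["gbph"])
    return {
        "wall_hours": wall_hours,
        "wall_clock_gbph": phases["Read"]["gb"] / wall_hours,
        "slowest_phase": slowest,
        "phases": phases,
    }


if __name__ == "__main__":
    s = summarize(LOG)
    print(f"wall clock: {s['wall_hours']:.2f} h, {s['wall_clock_gbph']:.0f} GB/h overall")
    print(f"lowest GBPH phase: {s['slowest_phase']}")
    for name, p in s["phases"].items():
        print(f"{name:8} {p['gbph']:7.2f} GBPH, implies {p['hours_implied']:.1f} h")
```

Taking the stated 68192 seconds at face value, the job moved about 4,698 GB in about 18.9 h of elapsed time, roughly 248 GB/h overall, a figure worth setting against the per-phase GBPH values before deciding which phase to tune.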

Userlevel 7
Badge +23

@Mubaraq, regarding the stream performance, what is the expected throughput? The read speeds are the slowest, as you mentioned, but they aren't all that much slower than the write and network transfer speeds.

Userlevel 2
Badge +6

Hello, @Mike Struening 

 

They expect about 200 GB/hr, but I am going to propose running auxiliary copy jobs every 4 hours instead of having one schedule per day, to break the read operations and network transfers into smaller chunks.
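The arithmetic behind the every-4-hours proposal can be sketched in Python. The daily volume and rates below are hypothetical and `plan_runs` is an illustrative helper. Splitting the day into 6 runs shrinks each run, but the total copy hours per day stay the same, so the shorter schedule only keeps up if each run finishes before the next one is due.

```python
def plan_runs(daily_gb: float, rate_gbph: float, interval_hours: float) -> dict:
    """Split a day's copy volume into evenly spaced auxiliary copy runs."""
    runs_per_day = 24 / interval_hours
    gb_per_run = daily_gb / runs_per_day
    hours_per_run = gb_per_run / rate_gbph
    return {
        "runs_per_day": runs_per_day,
        "gb_per_run": gb_per_run,
        "hours_per_run": hours_per_run,
        # If a run outlasts the interval, the next run starts with a backlog.
        "keeps_up": hours_per_run <= interval_hours,
        # Unchanged by the interval: chunking does not reduce total work.
        "total_copy_hours": daily_gb / rate_gbph,
    }


if __name__ == "__main__":
    # Hypothetical 4800 GB/day at two candidate sustained rates.
    for rate in (200, 100):
        print(rate, plan_runs(4800, rate, interval_hours=4))
```

With these made-up numbers, 200 GB/h gives exactly 4 hours of copying per 4-hour window, leaving no headroom, while 100 GB/h falls behind no matter how the schedule is sliced.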

Userlevel 7
Badge +23

Ok, see how that works.

Is 200 GB/hr reasonable for the hardware (read, network, and write)? In any throughput question, a valid expectation is a must... too often someone expects something that the setup will never deliver. :nerd:
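The question above can be framed this way: in a chain of source read, network transfer, and tape write, the end-to-end rate cannot exceed the slowest link, so the measured maximum of each link gives a ceiling to compare the 200 GB/hr expectation against. A minimal Python sketch, assuming per-link maxima have already been measured separately (for example, source disk reads with CvDiskPerf, as the log's remediation text suggests); the link names and figures are hypothetical.

```python
from typing import Dict, Tuple


def bottleneck(link_max_gbph: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest link and its maximum rate (an upper bound, not a prediction)."""
    name = min(link_max_gbph, key=link_max_gbph.__getitem__)
    return name, link_max_gbph[name]


def target_is_plausible(link_max_gbph: Dict[str, float], target_gbph: float) -> bool:
    """A target above the slowest link's maximum cannot be met end to end."""
    _, ceiling = bottleneck(link_max_gbph)
    return target_gbph <= ceiling


if __name__ == "__main__":
    # Hypothetical per-link maxima, each measured in isolation.
    measured = {"source disk read": 150.0, "network": 900.0, "tape write": 300.0}
    print(bottleneck(measured))
    print(target_is_plausible(measured, 200.0))
```

Passing this check only means the target is not ruled out; a real copy typically runs below the slowest link's standalone maximum because of contention and per-stream overhead.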

Userlevel 2
Badge +6

Hi @Mike Struening 

Sorry, I have been away.

There have been no complaints from the client about throughput since we made the tweaks.

 

Userlevel 7
Badge +23

@Mubaraq, I unmarked the other comment as the best answer.

At this point, I would suggest opening a support case to have a deeper analysis done.

Can you share the case number so I can follow up accordingly?

Thanks!

Userlevel 7
Badge +23

Hi @Mubaraq, following up on your testing. Were you able to measure the maximum throughput of each link in the chain to see whether 200 GB/hr is reasonable?

Userlevel 2
Badge +6

@Mike Struening 

I meant that everything is good for now, as there have not been any complaints about throughput from the client.

We can leave the answer marked as best for now.

Userlevel 7
Badge +23

Ok, great.  If anything changes, just update this thread and I’ll respond!

Userlevel 7
Badge +23

Hey @Mubaraq, following up to see if you were able to determine the issue based on the performance logs.

Thanks!
