Solved

VSA backup via SAN runs very slowly

  • 28 February 2022
  • 8 replies
  • 1281 views


We had to move the VSA backups to the second data center because the backup storage in the first data center filled up and we cannot delete any data.
Since the move, everything has been running very slowly. The transport mode is still SAN, but only one VM is backed up at a time instead of 15.
You can clearly see the drop in throughput after the change from SP_Alsterdorf_Standard to SP_Norderstedt_Standard.


Both media agents are set up so that both can access the backup storage and the production storage. 
Does anyone here have a quick solution to this problem? 

Kind Regards

Thomas


Best answer by Mike Struening RETIRED 18 April 2022, 22:49


8 replies


Hi @thomas.S ,

 

If you check the Job Details, does the VM Status tab definitely show the transport mode as SAN?

Does the new SP / DR-side MA/library have the same stream capacity?

With regard to the streams used, did you observe 1 VM being processed only towards the end of the job, or throughout the job?

If you check the vsbkp.log on the VSA proxy taking the backup, you can look at the "Stat-" indicators in the log to see the speeds obtained (ReadDisk, Datastore Read, WritePLBuffer).
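If it helps, a quick way to pull those counters out of a log copy (a sketch only; the log path and the exact "Stat-" line format vary by version and are assumptions here):

```shell
# Sketch: show the most recent "Stat-" performance counters (ReadDisk,
# Datastore Read, WritePLBuffer) from a copy of vsbkp.log.
show_vsbkp_stats() {
    # case-insensitive match so "Stat-" / "stat-" variants are both caught
    grep -i 'stat-' "$1" | tail -n 20
}

# Usage (path is an assumption -- use your proxy's Log Files directory):
# show_vsbkp_stats "/opt/commvault/Log Files/vsbkp.log"
```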

 

Best Regards,

Michael


Hello @MichaelCapon , 

Since the problem is urgent, we have gone ahead and opened a support ticket directly: 220228-349.
I have checked the configuration as far as I could and could not find any difference between the two sites.


Kind Regards

Thomas


Appreciate the update, @thomas.S !

I see there is already some progress by the engineer:

Review of the logging currently indicates a bottleneck in the transfer from the vsbkp process (read) to the cvd process (write). Since both run on the same physical server, the processes talk to each other internally via loopback addresses. Can you confirm that all AV exclusions are in place for Commvault processes on this particular MediaAgent? Also compare the additional settings on the MediaAgents between the original storage policy and this one to find the discrepancy, because the read performance is around 400+ MB/sec, so that is not the bottleneck for the latest job 3999997 that I reviewed.

 

You can also try the additional setting nNumPipelineBuffers to see if that helps with performance.

 

https://documentation.commvault.com/11.24/expert/8603_increasing_pipeline_buffers.html
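For a rough sense of what raising nNumPipelineBuffers costs in MediaAgent memory, here is a back-of-the-envelope sketch (the 64 KB per-buffer size is an assumption; confirm it against the documentation for your version):

```python
# Estimate the extra MA memory consumed by pipeline buffers.
# ASSUMPTION: each pipeline buffer is 64 KB (verify for your version).
BUFFER_SIZE_KB = 64

def pipeline_buffer_memory_mb(n_buffers: int, active_streams: int) -> float:
    """Approximate memory (MB) used by n_buffers on each of the active streams."""
    return n_buffers * BUFFER_SIZE_KB * active_streams / 1024

# 300 buffers with 15 concurrent streams:
print(pipeline_buffer_memory_mb(300, 15))  # prints 281.25
```

So a value like 300 is cheap on a well-sized MA, but the cost scales with concurrent streams, which is why the setting is usually raised gradually.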

 

Let me know if you have any questions.


Hi Thomas,

 

I also checked your case and can see some progress.

It does seem like the inter-process transfer is slow on this machine. I also noted one high value for signature processing, which could affect, or be affected by, this.

Since you have client-side deduplication enabled and the VSA is the MA here, can you check/compare the results with Deduplication set to "On MediaAgent"?
(For this subclient: Properties > Storage Device > Deduplication.)

 

Best Regards,

Michael 


Hello, 
 

We are currently mounting an unused storage system as a backup target so that we can keep the data backups running for the time being.
Once that is done, I will ask Commvault for a remote session to investigate how we can improve the performance on that one MediaAgent. We had already tried the nNumPipelineBuffers parameter in the past, but without success.
Deduplication is always set to client-side.

Kind Regards

 

Thomas


Hello, 

 

After a remote session with Commvault we were already able to improve the throughput a bit. It is still far from what we expect, but we are getting through the backups.
We are currently scheduling a review of our environment, in which a Commvault senior engineer will take a look and check for misconfiguration.
For this reason, this topic can be considered done.
Once the environment has been reviewed, I will be happy to provide feedback.

Kind Regards

Thomas


@thomas.S , I'll keep an eye on the thread and add the solution once one is discovered. No need to close the thread until then. If you happen to share it before me, that's just fine as well!


Sharing the case resolution details (sans any identifying logs):

For the PostgreSQL backup, it looks like the read and transfer speed between the client and the MA is the bottleneck. You can find the performance analysis log on the MediaAgent under PerfAnalysis_.log. Information below; let me know if you have any questions.

----------------------------------
| READS FROM THE SOURCE ARE SLOW |
----------------------------------
    - Increase the number of data readers on the subclient. Suggested values are 8 or 12.
    - Change the application/read size on the subclient. Suggested values are 512 KB or 1 MB for FS; refer to the documentation for Oracle, SQL, and VSA.
    - Run the CVDiskPerf tool on the source to verify disk performance.
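When CVDiskPerf is not at hand, a crude substitute is to time a sequential read of an existing large file on the source (a rough sketch under stated caveats, not a replacement for the real tool; the file path in the usage line is a placeholder):

```python
import time

def sequential_read_mb_s(path: str, chunk_size: int = 1 << 20) -> float:
    """Time a sequential read of `path` and return throughput in MB/s.
    Caveat: OS page-cache effects can inflate the result on a warm file."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / (1024 * 1024) / elapsed

# Usage: sequential_read_mb_s("/path/to/large_file.bin")
```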

 

DOCUMENTATION
-------------
http://documentation.commvault.com/commvault/v11/article?p=8580.htm

http://documentation.commvault.com/commvault/v11/article?p=8596.htm

http://documentation.commvault.com/commvault/v11/article?p=8855_1.htm

 

CONSIDERATION(S)
----------------
Increasing streams to a high value may cause disk thrashing and use more system resources.
Changing the application/read size will cause a re-baseline, so increase the value gradually.

 

-------------------------------------------------------------------
| DEDUPLICATION PROCESS IS SLOW DUE TO EITHER SLOW NETWORK OR DDB |
-------------------------------------------------------------------
    - Check the corresponding Q and I times on the DDB in the UI to see if they are flagged as high.
    - For slow DDB Q and I times, increase the IDX and DAT file memory within the DDB to 50% of physical memory (total not to exceed 50% of physical system memory).
    - For slowness due to the network, move signature processing to the MA and observe the throughput.
    - Add a pruning operation window on the MediaAgent to disable pruning during the peak backup window and thereby improve Q and I performance on the DDB.

DOCUMENTATION
-------------
http://documentation.commvault.com/commvault/v11/article?p=6614.htm

---------------------------------------------------------------
| NETWORK TRANSFER IS SLOW BETWEEN THE SOURCE AND DESTINATION |
---------------------------------------------------------------
    - Increase nNumPipelineBuffers to a higher value, referring to the documentation. Suggested values are 120 or 300 for a 1 Gb link and 600 for a 10 Gb link.
    - Run CvNetworkTestTool between the source and destination to determine the network throughput.
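To sanity-check a measured throughput (e.g. from CvNetworkTestTool) against the nominal link speed, a quick utilisation calculation (illustrative only; decimal unit conversion assumed):

```python
def link_utilisation(measured_mb_s: float, link_gbit: float) -> float:
    """Fraction of the nominal link speed that the measured throughput uses."""
    link_mb_s = link_gbit * 1000 / 8  # Gbit/s -> MB/s (decimal units)
    return measured_mb_s / link_mb_s

# 400 MB/s measured on a 10 Gb link:
print(round(link_utilisation(400, 10), 2))  # prints 0.32
```

A low fraction on an otherwise idle link suggests the bottleneck is not raw bandwidth but something in the pipeline, which is where the buffer tuning above comes in.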

 

DOCUMENTATION
-------------
http://documentation.commvault.com/commvault/v11/article?p=8600.htm
http://documentation.commvault.com/commvault/v11/article?p=7598.htm

CONSIDERATION(S)
----------------
Increasing pipeline buffers increases memory consumption on the MA per active stream, so increase the value gradually.
