Hello community,
Please help me solve the following problem.
I am using Commvault (11.24.25) to back up SAP HANA databases.
The architecture of the backup system is as follows:
CommServe - VMware virtual machine connected to network 192.168.22.0/24
simpana-ma01 - physical server Windows Storage Server 2016 Standard, connected to network 192.168.22.0/24
simpana-ma02 - physical server Windows Storage Server 2012 R2 Standard, connected to network 192.168.22.0/24
clients - VMware virtual machines running SLES for SAP 15 SP3 with SAP HANA 2.0.054 installed, connected to network 192.168.22.0/24
Notable details:
- a dedicated subnet, 192.168.22.0/24, is used for backup
- the virtual machines are connected to a vSwitch with four physical 10G uplinks; the load balancing mode is Route based on IP hash
- each media agent has two 10G network interfaces combined with NIC Teaming (Teaming mode: LACP, Load balancing mode: Dynamic)
- there are no firewalls between the clients and the media agents
Problem:
Periodically, a backup job for SAP HANA transaction logs fails a little over 2 minutes after it starts, with the following error:
"Unable to communicate with the remote machine [simpana-ma02] to start the Data Pipe. Please check the network connectivity between the local machine and the remote machine and verify this product's Communications Service is running on the remote machine, Error [Connect to 192.168.22.26:8405 failed: Connection timed out]."
This error occurs with all media agents. The client logs show the following:
"9674 25e4 03/09 11:51:09 871631 ERROR: CvFwClient::connect(): Connect to 192.168.22.26:8405 failed: Connection timed out
9674 25e4 03/09 11:51:09 871631 CPipelayer::connectToDest Failed to connect to simpana-ma02(simpana-ma02):8405/8405: Connect to 192.168.22.26:8405 failed: Connection timed out
9674 25e4 03/09 11:51:09 871631 CPipelayer::InitiatePipeline Cannot connect to the CVD port on machine [simpana-ma02]:[8405]
9674 25e4 03/09 11:51:09 871631 CCVAPipelayer::StartPipeline() - Failed to initiate pipeline
9674 25e4 03/09 11:51:09 871631 CVArchive::StartPipeline() - Startup of DataPipe failed
9674 25e4 03/09 11:51:09 871631 ClDBControlAgent::OnMsgInitPipe() - Setup pipeline failed
9674 25e4 03/09 11:51:09 871631 ClDBControlAgent::OnMsgInitPipe() - sending response FAIL to agent process
9674 25e4 03/09 11:51:09 871631 ClDBControlAgent::OnMsgInitPipe() - INITPIPE_RESP sent"
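To see whether the failures cluster by time of day, media agent, or job, the connect errors could be pulled out of the client logs with a small script. The following is only a minimal sketch: the field layout (process ID, thread ID, MM/DD date, time, job ID, message) is my reading of the lines above rather than a documented format, the year is not in the log so it has to be passed in, and the names `connect_failures` and `ConnectFailure` are made up for illustration.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class ConnectFailure(NamedTuple):
    timestamp: datetime
    job_id: str
    address: str
    port: int
    reason: str


# Assumed layout: <pid> <thread id, hex> <MM/DD> <HH:MM:SS> <job id> <message>
LINE_RE = re.compile(
    r'^"?(?P<pid>\d+)\s+(?P<thread>[0-9a-fA-F]+)\s+'
    r'(?P<date>\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'(?P<job>\d+)\s+(?P<msg>.*?)"?\s*$'
)
# Only the CvFwClient line is matched, so each failure is counted once.
FAIL_RE = re.compile(
    r'CvFwClient::connect\(\): Connect to '
    r'(?P<addr>[\d.]+):(?P<port>\d+) failed: (?P<reason>.+)$'
)


def connect_failures(lines: Iterable[str], year: int) -> Iterator[ConnectFailure]:
    """Yield one record per 'Connect to host:port failed' log line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a log line in the assumed layout
        f = FAIL_RE.search(m["msg"])
        if not f:
            continue  # a log line, but not a connect failure
        ts = datetime.strptime(
            f"{year}/{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S"
        )
        yield ConnectFailure(ts, m["job"], f["addr"], int(f["port"]), f["reason"])


# Two lines copied from the excerpt above.
SAMPLE = [
    "9674 25e4 03/09 11:51:09 871631 ERROR: CvFwClient::connect(): "
    "Connect to 192.168.22.26:8405 failed: Connection timed out",
    "9674 25e4 03/09 11:51:09 871631 CCVAPipelayer::StartPipeline() - "
    "Failed to initiate pipeline",
]

for failure in connect_failures(SAMPLE, year=2024):  # the year is a placeholder
    print(failure.timestamp, failure.job_id, failure.address, failure.port)
```

Run over the full log file of each client, the resulting timestamps could then be compared with switch, hypervisor, and media agent events for the same moments.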
The job terminates abnormally, and a new job is started, which completes successfully. The issue occurs randomly on all clients, with both SystemDB and TenantDB, and always just over 2 minutes after the job starts.
Monitoring does not reveal any errors. Please advise what can be checked and how to diagnose this problem.
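One observation and one check I can run in the meantime. On Linux, an outgoing connection whose SYN packets are never answered gives up after the SYN retransmissions are exhausted; with the kernel default `net.ipv4.tcp_syn_retries = 6` and a 1-second initial retransmission timeout that is 1 + 2 + 4 + ... + 64 = 127 seconds, which would match "a little over 2 minutes". I have not confirmed that the clients use the default value, and Commvault may apply its own timeout, so treat this as an assumption. The sketch below computes that figure and probes the media agent port independently of Commvault; the function names are hypothetical, and the address and port are the ones from the error message.

```python
import socket
import time


def syn_connect_timeout(syn_retries: int = 6, initial_rto: float = 1.0) -> float:
    """Seconds until connect() gives up if no SYN is ever answered.

    The wait doubles after the initial SYN and after each retransmission.
    The kernel's upper bound on the retransmission timeout is ignored here;
    it is not reached with the default of 6 retries.
    """
    return sum(initial_rto * 2 ** i for i in range(syn_retries + 1))


def probe(host: str, port: int, timeout: float = 5.0):
    """Try one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # includes timeouts and refused connections
        return False, time.monotonic() - start, str(exc)


def monitor(host: str, port: int, interval: float = 10.0,
            count: int = 6, timeout: float = 5.0):
    """Probe repeatedly, print every result, and return the failures."""
    failures = []
    for _ in range(count):
        ok, elapsed, err = probe(host, port, timeout)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {host}:{port} ok={ok} elapsed={elapsed:.3f}s {err}")
        if not ok:
            failures.append((stamp, err))
        time.sleep(interval)
    return failures


print(syn_connect_timeout())  # 127.0 with the assumed defaults
```

For example, `monitor("192.168.22.26", 8405, interval=10, count=360)` run from a client would cover about an hour. If the probe also times out now and then while the media agent is otherwise reachable, that would point to packet loss on particular flows (for example a mismatch between the IP hash and LACP configuration on one uplink) rather than to Commvault itself.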