SAP HANA log backup error. Unable to communicate with the MediaAgent to start the Data Pipe
Hello community,

Please help me solve the following problem. I am using Commvault (11.24.25) to back up SAP HANA databases.

The architecture of the backup system is as follows:

- CommServe: VMware virtual machine, connected to network 192.168.22.0/24
- simpana-ma01: physical server, Windows Storage Server 2016 Standard, connected to network 192.168.22.0/24
- simpana-ma02: physical server, Windows Storage Server 2012 R2 Standard, connected to network 192.168.22.0/24
- Clients: VMware virtual machines running SLES for SAP 15 SP3 with SAP HANA 2.0.054 installed, connected to network 192.168.22.0/24

Notable features:

- A dedicated subnet, 192.168.22.0/24, is used for backup.
- The virtual machines are connected to a vSwitch that has 4 physical 10G interfaces. The balancing mode is Route based on IP hash.
- The media agents have two 10G network interfaces combined into a NIC team. Teaming mode: LACP. Load balancing mode: Dynamic.
- No firewalls are used between the clients and the media agents.

Problem: Periodically, backup jobs for SAP HANA transaction logs fail a little over 2 minutes after they start, with this error:

"Unable to communicate with the remote machine [simpana-ma02] to start the Data Pipe. Please check the network connectivity between the local machine and the remote machine and verify this product's Communications Service is running on the remote machine, Error [Connect to 192.168.22.26:8405 failed: Connection timed out]."

This error occurs with all media agents.
The following error is in the client logs:

9674 25e4 03/09 11:51:09 871631 ERROR: CvFwClient::connect(): Connect to 192.168.22.26:8405 failed: Connection timed out
9674 25e4 03/09 11:51:09 871631 CPipelayer::connectToDest Failed to connect to simpana-ma02(simpana-ma02):8405/8405: Connect to 192.168.22.26:8405 failed: Connection timed out
9674 25e4 03/09 11:51:09 871631 CPipelayer::InitiatePipeline Cannot connect to the CVD port on machine [simpana-ma02]:[8405]
9674 25e4 03/09 11:51:09 871631 CCVAPipelayer::StartPipeline() - Failed to initiate pipeline
9674 25e4 03/09 11:51:09 871631 CVArchive::StartPipeline() - Startup of DataPipe failed
9674 25e4 03/09 11:51:09 871631 ClDBControlAgent::OnMsgInitPipe() - Setup pipeline failed
9674 25e4 03/09 11:51:09 871631 ClDBControlAgent::OnMsgInitPipe() - sending response FAIL to agent process
9674 25e4 03/09 11:51:09 871631 ClDBControlAgent::OnMsgInitPipe() - INITPIPE_RESP sent
The job is closed abnormally and a new one is started, which completes successfully. The issue occurs on all clients, at random, with both SystemDB and TenantDB, and always exactly 2 minutes after the job starts. Monitoring does not reveal any errors. Please advise what can be checked and how to diagnose this problem.
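A note on the timing: the "little over 2 minutes" delay is consistent with a Linux client sending a TCP SYN that is never answered. This is an assumption, not something confirmed in the thread. With the Linux kernel defaults of a 1-second initial retransmission timeout and net.ipv4.tcp_syn_retries = 6, a blocking connect() gives up after about 127 seconds. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def syn_connect_timeout(initial_rto=1.0, syn_retries=6):
    """Seconds until connect() fails when no SYN-ACK ever arrives.

    Assumes Linux defaults: 1 s initial retransmission timeout and
    net.ipv4.tcp_syn_retries = 6, with the wait doubling after each SYN.
    """
    # One wait after the initial SYN plus one after each retry:
    # 1 + 2 + 4 + 8 + 16 + 32 + 64 seconds with the defaults.
    return sum(initial_rto * 2 ** i for i in range(syn_retries + 1))


print(syn_connect_timeout())
```

If that is what is happening here, either the SYN packets from the client or the SYN-ACK replies from the media agent are being lost on the path, and a packet capture taken on both ends at the time of a failure should show the retransmitted SYNs.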
Hey @Roman Kalyadin,
Thanks for the detailed information!
Can you try the following changes?
#1 - On the network properties of the media agent, exclude 8400 and 8403 from the additional ports. Reduce the range to something like 8404-8424. In fact, with any network topology or network config, additional ports are generally not used, but adding the CVD port 8400 as an additional port allows data transfer traffic to bypass the firewall (and inherently any throttling or encryption), so let's remove that variable.
#2 - On the network properties of the media agent, in the incoming connections tab, set the HANA group to “Blocked”.
This way, the HANA client(s) will always initiate and maintain an active (persistent) network connection towards the Media Agent. If there is a drop, the client will automatically attempt to reconnect, which can help insulate you from strange network conditions that could be causing disconnects or failures to make the initial connection.
Thanks for the answer. I made the suggested changes, but the error still appears.
An excerpt from the log:
25588 63f9 03/10 16:11:34 872316 ERROR: CvFwClient::connect(): Connect to 192.168.22.26:8405 failed: Connection timed out
25588 63f9 03/10 16:11:34 872316 CPipelayer::connectToDest Failed to connect to simpana-ma02(simpana-ma02):8405/8405: Connect to 192.168.22.26:8405 failed: Connection timed out
25588 63f9 03/10 16:11:34 872316 CPipelayer::InitiatePipeline Cannot connect to the CVD port on machine [simpana-ma02]:[8405]
25588 63f9 03/10 16:11:34 872316 CCVAPipelayer::StartPipeline() - Failed to initiate pipeline
25588 63f9 03/10 16:11:34 872316 CVArchive::StartPipeline() - Startup of DataPipe failed
25588 63f9 03/10 16:11:34 872316 ClDBControlAgent::OnMsgInitPipe() - Setup pipeline failed
25588 63f9 03/10 16:11:34 872316 ClDBControlAgent::OnMsgInitPipe() - sending response FAIL to agent process
Any other ideas? I set up a test bench and tried to reproduce the problem, but I could not.
Can a large number of SAP HANA clients affect this? I have over 30 of them, each with at least SystemDB and one or more TenantDB.
@Roman Kalyadin , one thing you could try on your end first is to telnet from the client to the MA on port 8405 (the port where it timed out) and see if you can connect and stay connected for an extended period. It very well could be an issue with the network and this would go a long way towards giving you that proof.
Have you shared this issue with your network team?
If it connects fine and remains there, I would open a support case and share the incident number here so I can track it.
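Because the job fails while opening a new connection, a single long-lived telnet session may not catch the fault. A rough sketch of a complementary check is below: it opens a fresh TCP connection to the media agent at a fixed interval and reports only failures or slow connects. The function names, the 5-second timeout and the 1-second "slow" threshold are illustrative assumptions, not Commvault tooling.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, seconds_taken, error_text)."""
    start = time.monotonic()
    try:
        # The connection is closed again as soon as it is established.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals and DNS errors
        return False, time.monotonic() - start, str(exc)


def watch(host, port, interval=1.0, count=60, slow=1.0):
    """Probe repeatedly; print failures and slow connects, return failure count."""
    failures = 0
    for _ in range(count):
        ok, took, err = probe(host, port)
        if not ok:
            failures += 1
        if not ok or took > slow:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} ok={ok} took={took:.2f}s {err}")
        time.sleep(interval)
    return failures


# Example, run on a HANA client for about an hour:
# watch("simpana-ma02", 8405, interval=1.0, count=3600)
```

Any timestamps printed by this loop can then be matched against the failed job times and handed to the network team.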
Very interesting that it's trying to connect on port 8405. Is the Media Agent multi-instanced? 8405 is used as a CVD port if 8400 is in use on the local machine.
Just in case, you may want to remove all additional ports as a test. Also, can you confirm whether “Optimize for concurrent LAN backups” is enabled or disabled on the ‘control’ tab of the Media Agent properties?
@Roman Kalyadin, can you run the following commands on the media agent (simpana-ma02) and share the output, please?
netstat -aof | findstr :8405
netstat -aof | findstr :8400
I successfully connect to media agents via telnet, the connection is not interrupted.
simpana-ma02 is a media agent with multi-instance enabled. CommServe was previously installed on it and has since been moved to a separate virtual machine. The problem was already occurring before that move. It also occurs with the other media agent, simpana-ma01, which does not use multi-instance.
The "Optimize for concurrent LAN backups" option is enabled on both media agents.
The output of the command "netstat -aof | findstr :8405":

TCP 0.0.0.0:8405 0.0.0.0:0 LISTENING 2172
TCP 192.168.22.26:8405 192.168.22.28:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.28:51522 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.28:51524 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.28:56488 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.28:56492 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.28:58864 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.29:19660 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.29:19662 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.29:24656 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.29:24658 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.36:56012 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.36:56022 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.36:61088 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.36:61090 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.40:56112 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.40:56122 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.40:56142 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.40:56144 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.47:29688 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.47:29690 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.47:64710 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.47:64712 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.48:19440 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.48:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.48:56694 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.48:56696 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.48:56712 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.48:56722 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.69:10986 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.69:10988 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.69:17858 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.69:17860 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.70:40404 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.70:40406 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.70:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.70:51522 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:21688 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:21690 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:26926 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:26928 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:26930 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:26932 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:27740 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.71:27742 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.76:62090 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.76:62092 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.76:62840 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.76:62842 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.87:22896 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.87:22898 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.87:24686 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.87:24688 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.90:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.90:51522 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.90:51524 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.90:51526 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.92:52328 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.92:52332 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.92:53942 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.92:53944 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.99:40404 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.99:40406 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.99:44264 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.99:44266 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:27662 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:27664 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:27666 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:27668 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:42398 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:42404 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:52636 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.100:52638 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.104:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.104:51522 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.104:61642 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.104:61644 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.111:40404 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.111:40412 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.111:61354 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.111:61356 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.113:13624 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.113:13626 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.113:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.113:51522 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.114:53412 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.114:53422 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.114:61432 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.114:61434 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.115:40404 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.115:40406 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.115:41970 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.115:41972 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51522 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51524 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51526 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51528 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51532 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51534 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.135:51538 ESTABLISHED 2172
The output of the "netstat -aof | findstr :8400" command is empty on simpana-ma02
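With this many connections, a per-client summary is easier to read than the raw listing. Below is a small sketch, assuming the five-column netstat layout seen in this thread (protocol, local address, foreign address, state, PID); the function name and the short sample are illustrative only.

```python
from collections import Counter


def count_by_client(netstat_text, local_port):
    """Count ESTABLISHED connections per remote IP for one local port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State, PID
        if len(parts) < 5 or parts[0] != "TCP" or parts[3] != "ESTABLISHED":
            continue
        if parts[1].rsplit(":", 1)[-1] != str(local_port):
            continue
        remote_ip = parts[2].rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts


# A few lines copied from the output posted in this thread.
sample = """\
TCP 0.0.0.0:8405 0.0.0.0:0 LISTENING 2172
TCP 192.168.22.26:8405 192.168.22.28:51512 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.28:51522 ESTABLISHED 2172
TCP 192.168.22.26:8405 192.168.22.29:19660 ESTABLISHED 2172
"""

for ip, n in count_by_client(sample, 8405).most_common():
    print(ip, n)
```

Running this against a listing saved at the moment of a failure would show whether any single client holds an unusually large number of connections.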
@Mahender Reddy I took a network dump with Wireshark during the most recent error. Clearly there is some kind of network problem, but I'm not sure whether it is causing the error.
@Roman Kalyadin , were you able to review the Wireshark output with your network team?