Hi @djustis and thanks for the post!
That’s normally a communications-related error, though we need more logging to see what the actual cause is in your case.
The full error should mention the process that is reporting the issue (clBackup, etc.).
Look for something like: Source: <servername>, Process: clBackupChild
We need to see what is in that process's log file, as well as CVD.log, on the server reporting the error for that time frame, which will give us the full context.
Can you take a look and copy those excerpts here?
Also, what kind of jobs are failing? Windows File System? VSA? An assortment?
Thanks!
Here’s one; it's a MA trying to do a DDB backup. Cut from the clBackup log on the MA:
10400 2d7c  04/06 17:11:06 19468916 CPipelayer::SendPipelineBuffer() - Tail has reported error [94][The SDT data transfer was terminated on a request from the Job Manager.]. Cannot continue.
10400 2d7c  04/06 17:11:06 19468916 [PIPELAYER  ] Error in flushing the current buffer.
10400 2d7c  04/06 17:11:06 19468916 CVArchive::WriteBuffer() - Cannot send the buffer. Ret [268435460]
10400 2d7c  04/06 17:11:06 19468916 CFileBackup::WriteBuffer(1683) - writeBuffer failed
10400 2d7c  04/06 17:11:06 19468916 CFileBackup::HandleReadAndSendFileDataError(1497) - WriteBuffer failed
10400 2d7c  04/06 17:11:06 19468916 CBackupBase::DoBackup(3689) - ReadAndSendFileData indicates FAIL_BACKUP
Error description from the job is:
Error Code: [10:62] Description: Other end XXXX encountered failure in receiving data [The SDT data transfer was terminated on a request from the Job Manager.] Source: onecvt200mbbdsm, Process: clBackup
Most of the time they pick up and run, and often finish; the failure is very intermittent.
CVD from the MA:
68  c50  04/06 15:06:24 ######## [JOBCTRL    ] Successfully registered control process for Job [19468916:7:5:8:51234] of type [1].
3968  13e0  04/06 15:26:56 ######## [CVD     ] Remote Command Request from remotehost = <::1>, RemoteClient = <onecvt200caa>, RemoteIP(Sock) = <::1>. Launched Process: <clBackup.exe -j 19468916 -a 2:2378 -t 1 -i 3 -d onecvt200maadsm*onecvt200maadsm*8400*8402 -io 1  -jt 19468916:7:6:8:51234  -idxma onecvt200maadsm*onecvt200maadsm*8400*8402  -OSInfo  -h  -w  -ot 1  -numstreams 1  -ab 0 -r 1617627612 -c 0 -appType 33 -slt -id f6520a39-cea7-4160-a887-8e9b2ab95d27 -cn onecvt200mbbdsm -vm Instance001>. Pid=396
3968  c50  04/06 15:26:57 ######## [JOBCTRL    ] Successfully registered control process for Job [19468916:7:6:8:51234] of type [1].
3968  2934  04/06 16:26:08 ######## [CVD     ] Remote Command Request from remotehost = <::1>, RemoteClient = <onecvt200caa>, RemoteIP(Sock) = <::1>. Launched Process: <clBackup.exe -j 19468916 -a 2:2378 -t 1 -i 3 -d ONECVT200MABDSM*onecvt200mabdsm*8400*8402 -io 1  -jt 19468916:7:7:10:51234  -idxma ONECVT200MABDSM*onecvt200mabdsm*8400*8402  -OSInfo  -h  -w  -ot 1  -numstreams 1  -ab 0 -r 1617627612 -c 0 -appType 33 -slt -id 680b3caa-e30d-4cbe-9d91-cf4991d2d05d -cn onecvt200mbbdsm -vm Instance001>. Pid=10400
3968  c50  04/06 16:26:13 ######## [JOBCTRL    ] Successfully registered control process for Job [19468916:7:7:10:51234] of type [1].
3968  8a0  04/06 17:11:06 ######## [JOBCTRL    ] Got request to terminate job 19468916
3968  8a0  04/06 17:11:06 ######## [JOBCTRL    ] Stopping All pipelines for Job 19468916
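If it helps with pulling excerpts like these, here is a rough Python sketch that filters log lines by job ID and time window. It assumes the column layout seen in the lines above (process ID, hex thread ID, MM/DD date, time, job ID or ########, message). The function names, the `year` argument (the log lines carry no year) and the five-minute window are illustrative choices of mine, not anything provided by Commvault.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, inferred from the excerpts in this thread:
# <pid> <thread id (hex)> <MM/DD> <HH:MM:SS> <job id or ########> <message>
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<tid>[0-9a-fA-F]+)\s+"
    r"(?P<date>\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<job>\S+)\s+(?P<msg>.*)$"
)


def parse_line(line, year):
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The log has no year column, so the caller has to supply one.
    when = datetime.strptime(
        f"{year}/{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S"
    )
    return {
        "pid": int(m["pid"]),
        "tid": m["tid"],
        "when": when,
        "job": m["job"],
        "msg": m["msg"],
    }


def lines_near(lines, job_id, center, year, window_minutes=5):
    """Keep lines that mention job_id and fall within the window around center."""
    delta = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        rec = parse_line(line, year)
        if rec is None:
            continue
        # CVD.log shows ######## in the job column, so also check the message.
        mentions_job = rec["job"] == job_id or job_id in rec["msg"]
        if mentions_job and abs(rec["when"] - center) <= delta:
            hits.append(rec)
    return hits


if __name__ == "__main__":
    sample = [
        "10400 2d7c  04/06 17:11:06 19468916 CFileBackup::WriteBuffer(1683) - writeBuffer failed",
        "3968  8a0  04/06 17:11:06 ######## [JOBCTRL    ] Got request to terminate job 19468916",
        "3968  c50  04/06 16:26:13 ######## [JOBCTRL    ] Successfully registered control process",
    ]
    # The year here is a placeholder; use whichever year the logs are from.
    failure_time = datetime(2021, 4, 6, 17, 11, 6)
    for rec in lines_near(sample, "19468916", failure_time, year=2021):
        print(rec["when"], rec["pid"], rec["msg"])
```

Running it over both the clBackup log and CVD.log with the same job ID and failure time puts the client-side error and the terminate request side by side.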
Hmm, try increasing the liveliness check interval to the max. The JM may be incorrectly detecting a stalled job and terminating it.
https://documentation.commvault.com/commvault/v11_sp20/article?p=11022.htm
LAN MediaAgent liveliness check interval in Minutes
Definition: Specifies the interval at which the LAN MediaAgent (MediaAgent and Client are not on the same computer) will execute a liveliness check. These intervals tend to be smaller, as frequent liveliness checks are needed for a network environment.
Default Value: 30
Range: 2 to 1440
Usage: Liveliness checks are conducted to ensure necessary services are running and listening. Increasing the interval value may be recommended to minimize network traffic if you have a large number of LAN MediaAgents and you have other mechanisms in place to verify network and services availability.
Yup.
You can always drop it back down right away, but it's rare that this gets triggered.
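For anyone keeping this setting in a script or a change record, here is a small sketch of the documented bounds quoted above (default 30, range 2 to 1440 minutes, so "max" works out to 24 hours). The constant and function names are hypothetical, and this only checks a number; it does not talk to the CommServe or change any setting.

```python
# Bounds taken from the documentation excerpt quoted in this thread.
LIVELINESS_MIN_MINUTES = 2
LIVELINESS_MAX_MINUTES = 1440
LIVELINESS_DEFAULT_MINUTES = 30


def clamp_liveliness_interval(minutes):
    """Return the requested interval, forced into the documented 2..1440 range."""
    return max(LIVELINESS_MIN_MINUTES, min(LIVELINESS_MAX_MINUTES, int(minutes)))


if __name__ == "__main__":
    # Try a value below the range, the default, and a value above the range.
    for requested in (1, LIVELINESS_DEFAULT_MINUTES, 5000):
        print(requested, "->", clamp_liveliness_interval(requested))
    print("max in hours:", LIVELINESS_MAX_MINUTES / 60)
```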