Our weekly secondary AuxCopy has been stuck at 30% since this weekend, which is blocking all the primary disk-to-disk incremental copies. It reports the 2 error messages below.
Thinking it might be a port communication issue between the Media Server (S01190), to which the tape library is attached, and the CommCell Server (S02116), I ran the port checks below between the 2 servers:
Telnet from the Media Server (S01190) to the CommCell Server (S02116)
Port 8400 OK
Port 8401 OK
Port 8403 OK
Telnet from the CommCell Server (S02116) to the Media Server (S01190)
Port 8400 OK
Port 8401 Not OK
Port 8403 Not OK
Now, before I speak to our Network/Security administrator, who recently installed SentinelOne AV on both of the above servers, I’m wondering whether I’m heading in the right direction, and whether I have checked all the necessary ports?
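For anyone who wants to repeat the Telnet checks above without typing them one by one, here is a minimal sketch in Python. The ports (8400, 8401, 8403) come from the post; the function names `check_port` and `check_ports` are made up for illustration. It only tests whether a TCP connection is accepted, so a failure may mean a firewall or AV block, or simply that nothing is listening on that port on the target side.

```python
import socket

# Ports taken from the post above; adjust to match your own configuration.
PORTS = (8400, 8401, 8403)


def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports=PORTS, timeout: float = 3.0) -> dict:
    """Map each port to True (connection accepted) or False (not accepted)."""
    return {port: check_port(host, port, timeout) for port in ports}
```

Direction matters, so the idea would be to run `check_ports("S02116")` on the Media Server and `check_ports("S01190")` on the CommServe, then compare the two results.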
@Kelvin ! Before you even said it, I was going to ask if anything on that server might be blocking ports/services.
See if the Security team can exclude our services and directory (including the mount paths) from their scans. AV is notorious for stopping and/or slowing things down on our side.
You can also try a Check Readiness (right-click the server/client/MA and click Check Readiness):
There’s a bunch of operations and options (and it’s useful for future issues)!
All is good now.
We did 2 things. First, we switched off the AV on both the CommServe and the Media Agent, then rebooted them. The AuxCopy job has been running ever since.
I re-ran the connectivity test too, which also came back OK. That said, I have since realised that I didn’t do the test correctly last time, so communication between the 2 servers may never have been the issue.
However, every now and then we still get the same error message, “Error occurred while processing chunk [xxxxxx] in media [xxxxxx], at the time of error in library [xxxxx]”, but it always clears itself before long, and the job then resumes automatically.
There could be numerous reasons for this error.
Overall, it’s an outdated backup solution that needs to be overhauled in the near future. But until then, we need to keep it going and put up with all the issues.
I did log a case via the support line but the case isn’t visible in my portal, nor have I received any call-back or email since.
Given that the CommCell ID isn’t in my portal yet, I assume the case is still in a queue waiting to be approved by the renewal team.
But I’m not here to complain though.
Because, to be honest, I’m happy enough that I could get most of my support through this forum which, to some extent, could give me quicker and more informative advice in helping me to troubleshoot our issues. I couldn’t ask for more.
The firewall port would be specified in the client’s firewall config (and we could use CVPing to test that); however, looking at the error, the connection is flat out refused.
Dollars to donuts, it’s the AV tool preventing our service from acting.
Ask, and you shall receive!
I’ll do a reboot for both the Media Agent and the tape library tomorrow - if setting Commvault exclusion doesn’t fix the problem.
OK, I’ll speak to our Network Security administrator about that. In the meanwhile, could I ask you to give me a link which shows how to set Commvault exclusion in AV ?
Very much appreciated,
Thanks ! I’ve passed the message to my colleague who will look into this tomorrow.
However, there is one more thing: while the AuxCopy is failing, its Primary Copy is running fine on this same S01190 Media Server (see below). How is this possible?
That definitely puts a potential spin on it….
Let me ask a few questions:
The Aux Copy, where is the data going from and to? Meaning which 2 Media Agents?
The backups are all running to which of the Media Agents?
I see 2 machines referenced and want to be sure I know which is which:
It is possible that different executables are being blocked, but that’s an unlikely cause... it’s all the same directory.
I’m going to loop in a colleague and see if they can see if I’m missing anything. Still confident it is the AV, but want the whole picture to make sense
Spoke to a friend who confirmed that if you’re doing LAN-free backups for those VMs, then the proxy just goes right to the storage. Aux Copies, on the other hand, have to do the whole connection and transfer, and the AV is likely preventing any of that (it’s an EXTREMELY common issue we encounter).
Both the Auxiliary Copy and Primary Copy come from the same Storage Policy “Backup-LeLude”, and it is the weekly AuxCopy (highlighted below) that is failing with the communication error message, whereas its Primary Copy is running OK as we speak.
The above Storage Policy is set to back up all the VMs managed by the vCentre Server “S01145.neopost.grp”, and the backups run on the Media Agent S01190, which is physically located on the same site as that vCentre Server.
S02116 is the CommServe at another physical location which doesn’t have a VSA for vCentre backup, whereas the S01190 is a Media Server with VSA installed.
And you are right: since the vCentre and the Media Agent S01190 are physically located at the same site, it was configured with both the disk library (for Primary Copy) and the tape library (for AuxCopy) attached to this Media Agent S01190, so as to confine the backup data flow within the same site without it needing to travel a long way to the CommServe S02116 on another physical site.
This Media Server is FC connected to both the tape library and a SAN, which is where it’s got all the LUNs that are locally mounted on it to serve as disk libraries for the Primary Copy, in which case, SAN is the default transport mode for both of them.
Therefore, the data from the AuxCopy must be from its disk library E which is locally mounted to the Media Server S01190 (see below)
Same idea should go for the Primary Copy as well, i.e. data flow is contained locally between the Media Server S01190 and the vCentre Server S01145.
In other words, the CommServe S02116 should not be involved in the actual backup data transfer in either the Primary Copy or the AuxCopy, apart from the control data, perhaps…
Are you saying that, even if the AuxCopy is a LAN free backup, it still needs full communication ports open between the Media Agent and the CommServe, whereas a LAN free Primary Copy doesn’t even need any controlling and commanding data from the CommServe ?
AuxCopyMgr is on the Commserve so there is involvement, though it sounds like the Aux Copy is reading and writing to the same Media Agent? This may be something different (though definitely send the AV exclusion guide as that is known to cause all sorts of issues, and I can’t rule this one out just yet).
Can you take a look at AuxCopyMgr.log and CVD.log and share what they show at 17:37:22? Check CVMA.log and any Aux Copy related logs on the Media agent as well.
I’ll get some of my colleagues to chime in.
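When the logs are large, pulling out only the lines for a given timestamp can be scripted. The sketch below assumes the line layout visible in the excerpts in this thread: two leading ID columns (treated here as process and thread IDs, which is an assumption), a `MM/DD` date, an `HH:MM:SS` time, then either a job ID or `###`, then the message. The names `LogLine`, `parse_line` and `lines_at` are hypothetical helpers, not part of any Commvault tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class LogLine(NamedTuple):
    pid: str      # first column (assumed to be the process ID)
    thread: str   # second column (assumed to be the thread ID)
    date: str     # MM/DD as printed in the log
    time: str     # HH:MM:SS
    job: str      # job ID, or "###" when the line is not tied to a job
    message: str


_LINE = re.compile(
    r"^(\S+)\s+(\S+)\s+(\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$"
)


def parse_line(line: str) -> Optional[LogLine]:
    """Parse one log line; return None if it does not match the layout."""
    match = _LINE.match(line.strip())
    return LogLine(*match.groups()) if match else None


def lines_at(lines: Iterable[str], date: str, time: str,
             job: Optional[str] = None) -> List[LogLine]:
    """Keep parsed lines for the given date and time, optionally one job only."""
    hits = []
    for raw in lines:
        entry = parse_line(raw)
        if entry is None or entry.date != date or entry.time != time:
            continue
        if job is not None and entry.job != job:
            continue
        hits.append(entry)
    return hits
```

For example, `lines_at(open("AuxCopyMgr.log", errors="replace"), "07/07", "17:37:22", job="1159504")` would return only the entries for that job at that second.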
The below is the message from AuxCopyMgr.log from the CommServe S02116
7684 2c80 07/07 17:37:22 1159504 processReceivedMessage Received FAIL message from remote AuxCopy binary for readerId . MA [s01190]. Type 
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::updateProgressToJM <Copy/Stream> Source <6/1> Target <11/1>: Application Size, Stream Throughput parameters:  bytes read in  seconds
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::handleFailReport <Copy/Stream> Source <6/1> Target <11/1>: AuxCopy binary on media agent [S01190.neopost.grp] encountered error  MM error  when sending chunk  to media agent [S01190.neopost.grp]: [Failed to write the data to the pipeline. ]
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::handleFailReport <Copy/Stream> Source <6/1> Target <11/1>: Partially Copied Archive File Info:Copy  CommCellId  AFID  Physical Size 
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::handleFailReport <Copy/Stream> Source <6/1> Target <11/1>: Setting jobstatus to FAIL and release resources - got error code CVA_DESTINATION_MA_ERROR and MM error 
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::sendFreeStreamRequest FREE STREAM Request for readerId  has been sent to media agent [s01190]
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::tryToReserveStreams No reservations were tried in this invocation.
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::run_innerLoop tryToSendCopyRequests() returned No-More-Chunk
7684 2c80 07/07 17:37:22 1159504 AuxCopyManager::sendStopRequest Ask remote AuxCopy binary to stop.
The below is from the CVD.log on the CommServe S02116
3856 1244 07/07 17:37:22 ### checkEventSocket() - setupConnection to EvMgrS...
3856 1244 07/07 17:37:22 ### checkEventSocket() - Socket : is eventSocket
The below is from the AuxCopy.log on the Media Agent S01190
10172 2884 07/07 19:36:26 1159504 Reader  <Copy/Stream> Source <6/1> Target <11/1>: Reporting PROGRESS to AuxcpyMgr, Err [0/0]. Chnk , bytes copied 
10172 2610 07/07 19:37:26 1159504 Sent alive request to AuxcpyMgr
10172 2610 07/07 19:37:26 1159504 Received AuxCopy alive confirmation response
Do the above 2 underlined messages indicate the communication between the 2 Servers was OK ?
CVMA.log is 0 sized on the Media Agent S01190
When you do, please share the case number so I can track it accordingly.
ps Yeah, looks like there is a connection made there in your underlined lines….you can see the chunk errors, though these could be from the library as well…..definitely so many possibilities.
Opening a case is exactly part of the problem because our support expired on the 1st of April, and we are still in the process of renewing it LOL
I sent you a pm about that.
In the meantime, see if the tape library itself has any issues. Definitely a potential/likely cause.
Just tried to set up “the upload to cloud” but having the below error
Assuming the username and password are OK, what else could go wrong ?
For example, do I need to wait until the renewal of the CommCell is finished ?
The below is the error message found in the log file CVCloudService
4824 bb4 07/20 15:50:59 ### LIBCURL::CvInternetDirectConnection::setSecured() - CURL certificate bundle path [D:\Program Files\Commvault\ContentStore\Base\curl-ca-bundle.crt]
4824 bb4 07/20 15:51:01 ### CVCloudService::init() - Failed to create DR backup folder. Error Message [U]. CommcellGUID [ ]
4824 47c 07/20 15:51:25 ### LIBCURL::CvInternetDirectConnection::setSecured() - CURL certificate bundle path [D:\Program Files\Commvault\ContentStore\Base\curl-ca-bundle.crt]
4824 47c 07/20 15:51:26 ### CVCloudService::init() - Failed to create DR backup folder. Error Message [U]. CommcellGUID [ðÅ]
4824 434 07/20 15:51:55 ### LIBCURL::CvInternetDirectConnection::setSecured() - CURL certificate bundle path [D:\Program Files\Commvault\ContentStore\Base\curl-ca-bundle.crt]
4824 434 07/20 15:51:56 ### CVCloudService::init() - Failed to create DR backup folder. Error Message [U]. CommcellGUID [ðçÄ]
Can you please confirm that you have registered your Commserve as described in the following documentation?
After some fiddling around, I’ve finally registered the CommCell (FAEE1) in the Cloud Portal, thanks to your links (see below)
But, within the CommServe Console, I still get the same error message while setting up the “Upload backup metadata to Commvault Cloud” which has the same error message in the logfile as below
4824 13d4 07/20 17:49:07 ### LIBCURL::CvInternetDirectConnection::setSecured() - CURL certificate bundle path [D:\Program Files\Commvault\ContentStore\Base\curl-ca-bundle.crt]
4824 13d4 07/20 17:49:08 ### CVCloudService::init() - Failed to create DR backup folder. Error Message [U]. CommcellGUID [0¹Ÿ]
What else could go wrong ?
Was that the same job that was running prior to registering? If not, please kill the job. Before starting a new job, try disabling the setting in the control panel → DR backup, press OK, then go back in and enable it. If it continues to fail, I would suggest you get a ticket opened to review further.
Check if you can access https://cvdrbackup1.blob.core.windows.net from a web browser on the CommServe, and that it’s not being blocked. It will come up with a bogus XML error but that is normal; we just want to see if it is accessible.
If you have a proxy server set it might be trying to use that and may need to be configured in IE proxy settings.
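The same reachability check can be done from a script rather than a browser. This is a minimal sketch using only the Python standard library; `probe_url` is a made-up helper name. It treats any HTTP response, including an error status such as 400, as "reachable", and a connection failure as "not reachable". One caveat: a filtering proxy or firewall can also answer with its own HTTP error page, so a "reachable" result is a hint rather than proof.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe_url(url: str, timeout: float = 10.0) -> Tuple[bool, Optional[int]]:
    """Return (reachable, http_status); status is None when no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status, so the network path is open.
        return True, err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or a dropped connection.
        return False, None
```

On the CommServe, `probe_url("https://cvdrbackup1.blob.core.windows.net")` returning `(False, None)` would be consistent with the URL being blocked. Note that `urllib` picks up proxy settings from the environment, so the result can differ from what the Commvault service itself sees.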
It looks like the URL is being blocked from the CommServe… I’ll check it with our FW administrator tomorrow. Cheers.
OK, see how you go. I think that in IE, error 400 may be fine: it means the request is reaching the server. I checked in Chrome on my side, so it could be a little different.
@Kelvin , how are things looking on this? Curious to see what your Network Administrator said.
Since we disabled SentinelOne on both of the servers, the AuxCopy has been working fine, fingers crossed though...
BTW, we have finally renewed our support for the CommCell faee1, however, there is a problem…
Because the CommCell is registered under the name of our sub-company in France, it doesn’t show up in my usual support portal, which is registered under the name of our US company.
At the moment, the only person who can access this faee1 support portal (by using his own email address) is my colleague in France, who is, by the way, handing over his backup job to me, therefore, I’m sure, before long, he won’t be keen to log any more new cases...
So, what do I need to do to have the access to this faee1 support portal ? Or, is there a way to move this faee1 to my own support portal ?