Has anyone recently upgraded to MR13 (11.26.13) and using Application aware VM backups for SQL encountered problems with the VSAAppAwareBackupWorkflow job getting stuck at the "Waiting for backups" task?
- This appears to be an intermittent problem.
It fails three out of five times.
- There is no connectivity problem.
- There is no mismatch between the client and host names.
- The credentials used to instal application plug-ins and backup virtual machines and application data are the same.
-Repaired the CV updates on the SQL clients (after which there was a successful backup and the one after that is back at hung state).
- VSA jobs for VMDKs appear to be fine, and it appears that only client-based agent backups are affected.
There are no crashes or error messages printed in the logs, so it appears to be hung.
I have raised a Commvault support ticket a few days ago so is actively worked on, I posted on here in case anyone might have had a similar issue and could possibly advise the fix at the time.
SQLiDA.log
15400 3c04 03/12 15:52:09 ###### SdtTailSrvPool::Rel: Resetting SrvPool as ref. count is 0.
15400 3c04 03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CvRepReplicateSources() - Started CvRepReplicateSources.
15400 3c04 03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CreateVolGuidsConfFile() - File [E:\cvVolume.conf] Contents [\\?\Volume{a3d58720-0000-0000-0000-100000000000}\]
15400 3c04 03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CreateVolGuidsConfFile() - File [F:\cvVolume.conf] Contents [\\?\Volume{a3d58721-0000-0000-0000-100000000000}\]
15400 3c04 03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CvRepReplicateSources() - ==Ready for VSA backup of this VM ==
15400 3c04 03/12 15:52:09 284019 CCvNetworkServer::CCvNetworkServer() - CvNetworkServer is using Asynchronous IO model
15400 3c04 03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::StartSessionServer() - Starting listening as service :[VSAAppAwareIdaSrvc_284019]
15400 3c04 03/12 15:52:09 284019 CCvIOCPServer::Start() - Launching 4 IOCP worker threads. To override, use session/SQLiDA_IOCPThreads DWORD registry entry
15400 3c04 03/12 15:52:09 284019 Thread pool monitor interval [300] seconds, Refresh interval [60] seconds
15400 3c04 03/12 15:52:09 284019 CCvNetworkServer::SelectData() - No port is specifed by user. Check if saved port can be reused
15400 3c04 03/12 15:52:09 284019 CCvNetworkServer::SelectData() - Listening on port 55195/0
15400 3358 03/12 15:57:09 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [4], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358 03/12 16:02:09 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358 03/12 16:07:09 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s].
.
.
.
.
15400 3358 03/12 21:32:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358 03/12 21:37:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358 03/12 21:42:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358 03/12 21:47:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358 03/12 21:52:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358 03/12 21:57:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 14 03/12 21:59:37 ### ### Logger:RefreshLogParams - Changing Debug level from 1 to 3
15400 37dc 03/12 22:00:11 284019 CvReplicationEngineVSAAppAware::GetJobStatus() - Job Response is -> <?xml version='1.0' encoding='utf-8' ?><JobManager_JobListResponse totalRecordsWithoutPaging="1"><jobs><jobSummary jobId="284017" commcellId="2" destClientName="***********" jobType="VSAAppAwareBackupWorkFlow" opType="90" status="Running" currentPhaseName="Wait for backups" percentComplete="43" pendingReason="<Output>

<br>Source: ***********, Process: Workflow<br><br>Waiting on jobs [284019].<br>Source: **********, Process: Workflow" jobStartTime="1647060660" jobElapsedTime="22151" appTypeName="" backupSetName="" subclientName="" statusColor="black" lastUpdateTime="1647060879" isVisible="1" isPreemptable="1" jobEndTime="0" sizeOfApplication="0" sizeOfMediaOnDisk="0" percentSavings="0.000000000000000e+000" totalNumOfFiles="0" totalFailedFiles="0" totalFailedFolders="0" backupLevel="0" jobSubmitErrorCode="0" isAged="0" pendingReasonErrorCode="19:857" localizedOperationName="VSAAppAwareBackupWorkFlow" localizedStatus="Running" errorType="0" alertColorLevel="0" vsaParentJobID="0" currentPhase="570203" attemptStartTime="1647060879" distributedAppsClusterType="0" jobAttributesEx="0"><subclient commCellName="***********" clientId="2" clientName="*********" applicationId="0" appName="" instanceId="0" instanceName="" backupsetId="0" backupsetName="" subclientId="0" subclientName=""/><destinationClient clientName="*********" displayName="*******" clientId="2"/><userName userName="admin" userId="1"/><storagePolicy storagePolicyName="" storagePolicyId="1"/><logStoragePolicy storagePolicyId="0"/><dataSource dataSourceId="0"/><clientGroups clientGroupId="1" clientGroupName="Infrastructure"/><clientGroups clientGroupId="3" clientGroupName="Media Agents"/><clientGroups clientGroupId="7" clientGroupName="******"/></jobSummary></jobs></JobManager_JobListResponse>
Workflowcustom.log
1632 711 03/11 13:32:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Executing Query select state from JMJobInfo where jobId =283758 and state in (2,3,5,10,11)
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: ERROR: [Get Status of VSA Archive Index phase (Script_1)] VSA Backup job [283758] completed. Now lets check its status.
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Could not find JMBkpStats row for backup job [283758]
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Could not find JMBkpAttemptInfo row for backup job [283758]
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Executing Query select state from JMJobInfo where jobId =283758 and state in (2,3,5,10,11)
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] VSA Parent Job is in Pending/Suspend/Kill state, Proceed to Executing Query SELECT TOP 1 attrVal from App_VMProp where jobid = 283758 and attrName='vmStatus' and VMclientId=82
1632 711 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: ERROR: [Get Status of VSA Archive Index phase (Script_1)] VSA Backup job [283758] completed. Now lets check its status.
1632 711 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Could not find JMBkpStats row for backup job [283758]
1632 711 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: ERROR: [Get Status of VSA Archive Index phase (Script_1)] Archive index for VSA Backup job [283758] completed.
1632 823 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: [GetComma separated jobids (Script_24)] 283761,144
Any help is appreciated!