Skip to main content
Solved

VSA Application Aware MS SQL Backup stuck at 43%


Forum|alt.badge.img+5

Has anyone recently upgraded to MR13 (11.26.13) and using Application aware VM backups for SQL encountered problems with the VSAAppAwareBackupWorkflow job getting stuck at the "Waiting for backups" task?


- This appears to be an intermittent problem.
It fails three out of five times.

- There is no connectivity problem.

- There is no mismatch between the client and host names.

- The credentials used to instal application plug-ins and backup virtual machines and application data are the same.

-Repaired the CV updates on the SQL clients (after which there was a successful backup and the one after that is back at hung state).

- VSA jobs for VMDKs appear to be fine, and it appears that only client-based agent backups are affected.


There are no crashes or error messages printed in the logs, so it appears to be hung. 

 

I have raised a Commvault support ticket a few days ago so is actively worked on, I posted on here in case anyone might have had a similar issue and could possibly advise the fix at the time.

 

SQLiDA.log
15400 3c04  03/12 15:52:09 ###### SdtTailSrvPool::Rel: Resetting SrvPool as ref. count is 0.
15400 3c04  03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CvRepReplicateSources() - Started CvRepReplicateSources.
15400 3c04  03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CreateVolGuidsConfFile() - File [E:\cvVolume.conf] Contents [\\?\Volume{a3d58720-0000-0000-0000-100000000000}\]
15400 3c04  03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CreateVolGuidsConfFile() - File [F:\cvVolume.conf] Contents [\\?\Volume{a3d58721-0000-0000-0000-100000000000}\]
15400 3c04  03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::CvRepReplicateSources() - ==Ready for VSA backup of this VM ==
15400 3c04  03/12 15:52:09 284019 CCvNetworkServer::CCvNetworkServer() - CvNetworkServer is using Asynchronous IO model
15400 3c04  03/12 15:52:09 284019 CvReplicationEngineVSAAppAware::StartSessionServer() - Starting listening as service :[VSAAppAwareIdaSrvc_284019]
15400 3c04  03/12 15:52:09 284019 CCvIOCPServer::Start() - Launching 4 IOCP worker threads. To override, use session/SQLiDA_IOCPThreads DWORD registry entry
15400 3c04  03/12 15:52:09 284019 Thread pool monitor interval [300] seconds, Refresh interval [60] seconds
15400 3c04  03/12 15:52:09 284019 CCvNetworkServer::SelectData() - No port is specifed by user. Check if saved port can be reused
15400 3c04  03/12 15:52:09 284019 CCvNetworkServer::SelectData() - Listening on port 55195/0
15400 3358  03/12 15:57:09 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [4], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358  03/12 16:02:09 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358  03/12 16:07:09 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s].
.
.
.
.
15400 3358  03/12 21:32:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358  03/12 21:37:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358  03/12 21:42:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358  03/12 21:47:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358  03/12 21:52:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 3358  03/12 21:57:10 284019 TPool [IOCPServerPool_VSAAppAwareIdaSrvc_284019]. Ser# [1] Tot [4], Pend [4], Comp [0], Max Par [0], Avg Work Time [0.00 s], Avg Wait Time [0.00 s]
15400 14    03/12 21:59:37 ### ### Logger:RefreshLogParams - Changing Debug level from 1 to 3
15400 37dc  03/12 22:00:11 284019 CvReplicationEngineVSAAppAware::GetJobStatus() - Job Response is -> <?xml version='1.0' encoding='utf-8' ?><JobManager_JobListResponse totalRecordsWithoutPaging="1"><jobs><jobSummary jobId="284017" commcellId="2" destClientName="***********" jobType="VSAAppAwareBackupWorkFlow" opType="90" status="Running" currentPhaseName="Wait for backups" percentComplete="43" pendingReason="<Output&gt;&#x0D;&#x0D;&lt;br&gt;Source: ***********, Process: Workflow&lt;br&gt;&lt;br&gt;Waiting on jobs [284019].&lt;br&gt;Source: **********, Process: Workflow" jobStartTime="1647060660" jobElapsedTime="22151" appTypeName="" backupSetName="" subclientName="" statusColor="black" lastUpdateTime="1647060879" isVisible="1" isPreemptable="1" jobEndTime="0" sizeOfApplication="0" sizeOfMediaOnDisk="0" percentSavings="0.000000000000000e+000" totalNumOfFiles="0" totalFailedFiles="0" totalFailedFolders="0" backupLevel="0" jobSubmitErrorCode="0" isAged="0" pendingReasonErrorCode="19:857" localizedOperationName="VSAAppAwareBackupWorkFlow" localizedStatus="Running" errorType="0" alertColorLevel="0" vsaParentJobID="0" currentPhase="570203" attemptStartTime="1647060879" distributedAppsClusterType="0" jobAttributesEx="0"><subclient commCellName="***********" clientId="2" clientName="*********" applicationId="0" appName="" instanceId="0" instanceName="" backupsetId="0" backupsetName="" subclientId="0" subclientName=""/><destinationClient clientName="*********" displayName="*******" clientId="2"/><userName userName="admin" userId="1"/><storagePolicy storagePolicyName="" storagePolicyId="1"/><logStoragePolicy storagePolicyId="0"/><dataSource dataSourceId="0"/><clientGroups clientGroupId="1" clientGroupName="Infrastructure"/><clientGroups clientGroupId="3" clientGroupName="Media Agents"/><clientGroups clientGroupId="7" clientGroupName="******"/></jobSummary></jobs></JobManager_JobListResponse>

 

Workflowcustom.log

1632 711 03/11 13:32:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Executing Query select state from JMJobInfo where jobId =283758 and state in (2,3,5,10,11)
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: ERROR: [Get Status of VSA Archive Index phase (Script_1)] VSA Backup job [283758] completed. Now lets check its status.
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Could not find JMBkpStats row for backup job [283758]
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Could not find JMBkpAttemptInfo row for backup job [283758]
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Executing Query select state from JMJobInfo where jobId =283758 and state in (2,3,5,10,11)
1632 711 03/11 13:33:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] VSA Parent Job is in Pending/Suspend/Kill state,  Proceed to Executing Query SELECT TOP 1 attrVal from App_VMProp where jobid = 283758 and attrName='vmStatus' and VMclientId=82
1632 711 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: ERROR: [Get Status of VSA Archive Index phase (Script_1)] VSA Backup job [283758] completed. Now lets check its status.
1632 711 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: [Get Status of VSA Archive Index phase (Script_1)] Could not find JMBkpStats row for backup job [283758]
1632 711 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: ERROR: [Get Status of VSA Archive Index phase (Script_1)] Archive index for VSA Backup job [283758] completed.
1632 823 03/11 13:34:44 283759 VSAAppAwareBackupWorkFlow: [GetComma separated jobids (Script_24)] 283761,144
 

Any help is appreciated!

Best answer by Nikramak

@Orazan and @Damian Andre , thank you for your responses. 

AppAware backups were apparently impacted by a bug in the most recent CommVault maintenance releases SP25 (SP25.23) and SP26 (SP26.13). 
The AppAware installation fails in the Workflow due to a user privilege error, causing the job to stall. 

In order to address this issue, the Commvault development team issued a hotfix patch. 
I installed the fix on the CommServe computer and started new AppAware jobs for SQL servers, which all completed successfully. 

The fix will be available in SP25.25 or SP26.15 releases.

View original
Did this answer your question?
If you have a question or comment, please create a topic

4 replies

Damian Andre
Vaulter
Forum|alt.badge.img+23

Hey @Nikramak,

I had a quick look over support cases received and there does not seem to be a pattern or influx of folks having the same issue. I saw a couple related from a while back that ended up being network connectivity and another related to dual instances of an agent install - both do not seem to match what you are seeing here.

The logs provided don’t seem to easily identify the issue, but do appreciate you posting those as they can usually help immensely. In situations like these to remove the criticality I’d usually recommend to reinstall the client and go back to the previous maintenance release, but its not clear where the break is here (iDA, VSA or CS workflow).

I think in this case we’ll need to monitor your case and see how it goes, and then post the solution up once we figure it out, unless somebody else on here is in a similar predicament! 


Forum|alt.badge.img+15
  • Vaulter
  • 630 replies
  • March 14, 2022

@Nikramak  There is not enough information in the log snippets to determine the cause of the issue.   VSA Application Aware have many “moving parts”.  It is a job that relies heavily on indexing, pseudo snapshots on the client as well as workflows to control it all.  In order to properly diagnose an App Aware job, the full set of logs from all of the clients, proxies, MediaAgents involved would need to be reviewed. 

 

Once you create a ticket with the support team (or they have provided a solution), please update this thread for future reference.


Forum|alt.badge.img+5
  • Author
  • Byte
  • 11 replies
  • Answer
  • March 14, 2022

@Orazan and @Damian Andre , thank you for your responses. 

AppAware backups were apparently impacted by a bug in the most recent CommVault maintenance releases SP25 (SP25.23) and SP26 (SP26.13). 
The AppAware installation fails in the Workflow due to a user privilege error, causing the job to stall. 

In order to address this issue, the Commvault development team issued a hotfix patch. 
I installed the fix on the CommServe computer and started new AppAware jobs for SQL servers, which all completed successfully. 

The fix will be available in SP25.25 or SP26.15 releases.


Mike Struening
Vaulter
Forum|alt.badge.img+23

@Nikramak , thanks for the very thorough update!


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings