Solved

VSA subclient long-running jobs - kill individual VM job

  • 11 August 2021
  • 4 replies

Badge +3

I have two long-running VSA jobs; the hypervisor is VMware ESXi v7u1. Both subclient jobs contain several VMs, and each has one lingering VM that has been running for days without showing any progress. I can kill those specific VM jobs, but if I do, will the subclient job show as completed with errors or as failed? I’m not sure what the expected behavior is, so I thought I’d ask the community whether anyone else has had the same problem before I try it.


Best answer by Damian Andre 11 August 2021, 17:20



Userlevel 1
Badge +2

Hi Bill! Can you go to the VSA proxy machine, open VSBKP.log, and paste the 15 most recent lines?
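If it helps, here is a minimal Python sketch for grabbing the last 15 lines of a log without opening the whole file in an editor. It is not a Commvault utility: the location of VSBKP.log depends on the proxy's log files directory, so the path is passed in as an argument, and last_lines is an illustrative name. On a Linux proxy, tail -n 15 VSBKP.log does the same job.

```python
import sys
from collections import deque


def last_lines(path, n=15):
    """Return the last n lines of a text file, streaming it instead of loading it all."""
    # errors="replace" keeps the read from failing on any odd bytes in the log
    with open(path, encoding="utf-8", errors="replace") as f:
        # A deque with maxlen=n keeps only the n most recent lines seen while iterating
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tail_log.py <path to VSBKP.log>
    for line in last_lines(sys.argv[1]):
        print(line)
```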

Generally speaking, the usual suspects holding up completion of the backup phase are datastore read speed, writes to the MediaAgent/library, and/or reading the “white noise” data of the virtual disk being processed.
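One rough way to see which of those stages is the slow one for a given job is to pull the stat- lines out of VSBKP.log and compare their reported speeds. The sketch below is a minimal Python example, not a Commvault tool. It assumes the line layout seen in the log excerpt posted later in this thread (PID, thread ID, date, time, then what appears to be the job ID, then the stat- message); the regex, the function name slowest_stage_per_job, and the sample list are illustrative. Because reads and writes run in separate threads, the lowest average speed is only a first indicator, not proof of the bottleneck.

```python
import re

# Assumed layout of a VSBKP.log "stat-" line (inferred from the excerpt in this thread):
# <pid> <thread id> <MM/DD> <HH:MM:SS> <job id> stat- ID [<stage>], Bytes [<n>], Time [<s>] Sec(s), Average Speed [<x>] MB/Sec
STAT_RE = re.compile(
    r"^\s*\S+\s+\S+\s+\S+\s+\S+\s+(?P<job>\d+)\s+stat- ID \[(?P<stage>.+?)\], "
    r"Bytes \[(?P<bytes>\d+)\], Time \[(?P<secs>[\d.]+)\] Sec\(s\), "
    r"Average Speed \[(?P<speed>[\d.]+)\] MB/Sec"
)


def slowest_stage_per_job(lines):
    """Return {job_id: (stage, MB/Sec)} for the lowest-throughput stat- line of each job."""
    slowest = {}
    for line in lines:
        m = STAT_RE.match(line)
        if m is None:
            continue  # allocPLBuffer (Samples), TPool and vsJobMgr lines don't match
        job, stage, speed = m["job"], m["stage"], float(m["speed"])
        if job not in slowest or speed < slowest[job][1]:
            slowest[job] = (stage, speed)
    return slowest


# Sample lines copied from the log excerpt in this thread
SAMPLE_LINES = [
    "23895 6170 08/11 10:31:24 3419604 stat- ID [readdisk], Bytes [1257955656121], Time [4762.657301] Sec(s), Average Speed [251.892993] MB/Sec",
    "23895 6170 08/11 10:31:24 3419604 stat- ID [Datastore Read [COL-NMB-VAS-DEIS]], Bytes [1257955656121], Time [4763.874003] Sec(s), Average Speed [251.828659] MB/Sec",
    "23895 6248 08/11 10:31:24 3419604 stat- ID [writePLBuffer], Bytes [1236533624131], Time [6862.870856] Sec(s), Average Speed [171.830475] MB/Sec",
    "23895 6248 08/11 10:31:24 3419604 stat- ID [allocPLBuffer], Samples [3534490], Time [107.024415] Sec(s), Average [0.000030] Sec/Sample",
    "23895 61fe 08/11 10:31:24 3419604 stat- ID [writePLBuffer], Bytes [1253131329427], Time [6751.119637] Sec(s), Average Speed [177.019402] MB/Sec",
    "17846 6641 08/11 10:35:25 3419454 stat- ID [readdisk], Bytes [1910918611296], Time [32822.741176] Sec(s), Average Speed [55.522297] MB/Sec",
    "17846 6998 08/11 10:35:25 3419454 stat- ID [writePLBuffer], Bytes [1911423030619], Time [11226.865947] Sec(s), Average Speed [162.367223] MB/Sec",
]

for job, (stage, speed) in slowest_stage_per_job(SAMPLE_LINES).items():
    print(f"job {job}: slowest stage {stage} at {speed:.1f} MB/Sec")
```

For the lines in this thread it picks readdisk for job 3419454 and writePLBuffer for job 3419604. To run it against a real log, pass the lines read from the file instead of SAMPLE_LINES.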

Badge +3

@japplin 

Here are the 15 most recent lines from VSBKP.log on the proxy machine:

23895 6170 08/11 10:31:24 3419604 stat- ID [readdisk], Bytes [1257955656121], Time [4762.657301] Sec(s), Average Speed [251.892993] MB/Sec

23895 6170 08/11 10:31:24 3419604 stat- ID [Datastore Read [COL-NMB-VAS-DEIS]], Bytes [1257955656121], Time [4763.874003] Sec(s), Average Speed [251.828659] MB/Sec

23895 6248 08/11 10:31:24 3419604 stat- ID [writePLBuffer], Bytes [1236533624131], Time [6862.870856] Sec(s), Average Speed [171.830475] MB/Sec

23895 6248 08/11 10:31:24 3419604 stat- ID [allocPLBuffer], Samples [3534490], Time [107.024415] Sec(s), Average [0.000030] Sec/Sample

23895 61fe 08/11 10:31:24 3419604 stat- ID [writePLBuffer], Bytes [1253131329427], Time [6751.119637] Sec(s), Average Speed [177.019402] MB/Sec

23895 61fe 08/11 10:31:24 3419604 stat- ID [allocPLBuffer], Samples [3598987], Time [104.809606] Sec(s), Average [0.000029] Sec/Sample

17846 4645 08/11 10:31:41 3419454 TPool [SdtHeadThPool]. Ser# [0] Tot [750738], Pend [0], Comp [750738], Max Par [8], Time (Serial) [202.613242]s, Time (Parallel) [172.857647]s, Wait [484.295964]s

23895 5d8b 08/11 10:31:43 3419604 TPool [SdtHeadThPool]. Ser# [0] Tot [2695185], Pend [4], Comp [2695181], Max Par [16], Time (Serial) [703.743625]s, Time (Parallel) [293.547104]s, Wait [3001.687357]s

17846 6641 08/11 10:31:54 3419454 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

23895 6170 08/11 10:33:23 3419604 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

17846 6641 08/11 10:33:54 3419454 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

23895 6170 08/11 10:35:23 3419604 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

17846 6641 08/11 10:35:25 3419454 stat- ID [readdisk], Bytes [1910918611296], Time [32822.741176] Sec(s), Average Speed [55.522297] MB/Sec

17846 6998 08/11 10:35:25 3419454 stat- ID [allocPLBuffer], Samples [5467181], Time [81.149421] Sec(s), Average [0.000015] Sec/Sample

17846 6998 08/11 10:35:25 3419454 stat- ID [writePLBuffer], Bytes [1911423030619], Time [11226.865947] Sec(s), Average Speed [162.367223] MB/Sec

Thank you.

Userlevel 1
Badge +2

Thanks for the output, Bill. The writePLBuffer and readdisk throughput figures look good, but the time (Sec(s)) spent on each of these operations seems a bit high to me.

I’d open a support incident on this so support can dig deeper and confirm where the latency is coming from.
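To put the Sec(s) values in perspective, the Average Speed on each stat- line can be re-derived from its Bytes and Time fields, and Time converted to hours. A minimal Python sketch; it assumes the log's "MB" means 2^20 bytes, which reproduces the reported speeds for the lines posted in this thread. The function names are illustrative.

```python
# Assumption: "MB" in the stat- lines is 2**20 bytes; this matches the reported
# Average Speed values in the excerpt to about six significant digits.
MIB = 2 ** 20


def avg_speed_mb_s(num_bytes: int, seconds: float) -> float:
    """Average throughput in MB/Sec (MiB/s) for a stat- line's Bytes and Time fields."""
    return num_bytes / seconds / MIB


def hours(seconds: float) -> float:
    """Convert the Time field (seconds) to hours."""
    return seconds / 3600


# readdisk lines for the two jobs, copied from the log excerpt in this thread
for job, num_bytes, secs in [
    ("3419604", 1257955656121, 4762.657301),
    ("3419454", 1910918611296, 32822.741176),
]:
    print(f"job {job}: {avg_speed_mb_s(num_bytes, secs):.2f} MB/Sec over {hours(secs):.1f} h")
```

For those two readdisk lines this gives about 251.89 MB/Sec over roughly 1.3 hours for job 3419604, versus about 55.52 MB/Sec over roughly 9.1 hours for job 3419454.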

Userlevel 7
Badge +23

You can commit a Virtual Server job. Committing ends the job like a kill does, but it keeps the backups of all the VMs that already completed (the job completes with errors).

https://documentation.commvault.com/11.24/expert/30848_committing_backup_job_for_virtual_server_agents.html
