Solved

VSA subclient long running jobs - kill individual VM job

August 11, 2021

Bill
Apprentice

I have two long-running VSA jobs; the hypervisor is VMware ESXi v7u1. Each subclient job contains several VMs, and each has one lingering VM that has been running for days without showing any progress. I can kill those specific VM jobs, but if I do, will the subclient job show as completed with errors or as failed? I’m not sure what the expected behavior is, so I thought I’d ask the community whether anyone else has had the same problem and has tried what I’m about to try.

Best answer by Damian Andre

You can commit a virtual server job: when you kill it, the VMs that have already completed are saved, and the job completes with errors.

https://documentation.commvault.com/11.24/expert/30848_committing_backup_job_for_virtual_server_agents.html

japplin
Vaulter
August 11, 2021

Hi Bill! If you navigate to the VSA proxy machine, and open up VSBKP.log, can you paste the most recent 15 lines?

Generally speaking, the usual suspects for holding up completion of the backup phase are datastore read speeds, writes out to the Media Agent/library, and/or reading the “white noise” data of the virtual disk being processed.

Bill
Author
Apprentice
August 11, 2021

@japplin

Here are the 15 most recent lines from the proxy machine:

23895 6170 08/11 10:31:24 3419604 stat- ID [readdisk], Bytes [1257955656121], Time [4762.657301] Sec(s), Average Speed [251.892993] MB/Sec

23895 6170 08/11 10:31:24 3419604 stat- ID [Datastore Read [COL-NMB-VAS-DEIS]], Bytes [1257955656121], Time [4763.874003] Sec(s), Average Speed [251.828659] MB/Sec

23895 6248 08/11 10:31:24 3419604 stat- ID [writePLBuffer], Bytes [1236533624131], Time [6862.870856] Sec(s), Average Speed [171.830475] MB/Sec

23895 6248 08/11 10:31:24 3419604 stat- ID [allocPLBuffer], Samples [3534490], Time [107.024415] Sec(s), Average [0.000030] Sec/Sample

23895 61fe 08/11 10:31:24 3419604 stat- ID [writePLBuffer], Bytes [1253131329427], Time [6751.119637] Sec(s), Average Speed [177.019402] MB/Sec

23895 61fe 08/11 10:31:24 3419604 stat- ID [allocPLBuffer], Samples [3598987], Time [104.809606] Sec(s), Average [0.000029] Sec/Sample

17846 4645 08/11 10:31:41 3419454 TPool [SdtHeadThPool]. Ser# [0] Tot [750738], Pend [0], Comp [750738], Max Par [8], Time (Serial) [202.613242]s, Time (Parallel) [172.857647]s, Wait [484.295964]s

23895 5d8b 08/11 10:31:43 3419604 TPool [SdtHeadThPool]. Ser# [0] Tot [2695185], Pend [4], Comp [2695181], Max Par [16], Time (Serial) [703.743625]s, Time (Parallel) [293.547104]s, Wait [3001.687357]s

17846 6641 08/11 10:31:54 3419454 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

23895 6170 08/11 10:33:23 3419604 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

17846 6641 08/11 10:33:54 3419454 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

23895 6170 08/11 10:35:23 3419604 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines

17846 6641 08/11 10:35:25 3419454 stat- ID [readdisk], Bytes [1910918611296], Time [32822.741176] Sec(s), Average Speed [55.522297] MB/Sec

17846 6998 08/11 10:35:25 3419454 stat- ID [allocPLBuffer], Samples [5467181], Time [81.149421] Sec(s), Average [0.000015] Sec/Sample

17846 6998 08/11 10:35:25 3419454 stat- ID [writePLBuffer], Bytes [1911423030619], Time [11226.865947] Sec(s), Average Speed [162.367223] MB/Sec

Thank you.
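
For anyone who wants to compare these counters side by side, here is a minimal Python sketch that pulls the byte-count `stat-` lines out of a VSBKP.log excerpt and recomputes the average speed from Bytes and Time. The line layout is inferred only from the excerpt above, not from any documented log format, and the names `STAT_RE`, `parse_stat_lines` and `SAMPLE` are made up for this illustration. It also assumes that the logged "MB" means 2^20 bytes, which is what the numbers in the excerpt appear to imply.

```python
import re

# Layout inferred from the excerpt above (an assumption, not a documented format).
# The counter name may itself contain brackets, e.g. "Datastore Read [...]".
STAT_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<tid>[0-9a-fA-F]+)\s+\d\d/\d\d\s+\d\d:\d\d:\d\d\s+"
    r"(?P<job>\d+)\s+stat- ID \[(?P<counter>.+)\], Bytes \[(?P<bytes>\d+)\], "
    r"Time \[(?P<secs>[\d.]+)\] Sec\(s\), Average Speed \[(?P<logged>[\d.]+)\] MB/Sec"
)


def parse_stat_lines(text):
    """Return one dict per byte-count 'stat-' line; all other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if not m:
            continue
        nbytes = int(m["bytes"])
        secs = float(m["secs"])
        rows.append({
            "job": m["job"],
            "counter": m["counter"],
            # Recomputed throughput, assuming 1 MB = 2**20 bytes.
            "mib_per_sec": nbytes / secs / 2**20 if secs > 0 else 0.0,
            "logged_mb_per_sec": float(m["logged"]),
        })
    return rows


# Four lines copied from the excerpt above; the vsJobMgr line is ignored by the parser.
SAMPLE = """\
23895 6170 08/11 10:31:24 3419604 stat- ID [readdisk], Bytes [1257955656121], Time [4762.657301] Sec(s), Average Speed [251.892993] MB/Sec
23895 6170 08/11 10:31:24 3419604 stat- ID [Datastore Read [COL-NMB-VAS-DEIS]], Bytes [1257955656121], Time [4763.874003] Sec(s), Average Speed [251.828659] MB/Sec
17846 6641 08/11 10:31:54 3419454 vsJobMgr::updateVMBkpJobStatus() - Sending VM status for [1] virtual machines
17846 6641 08/11 10:35:25 3419454 stat- ID [readdisk], Bytes [1910918611296], Time [32822.741176] Sec(s), Average Speed [55.522297] MB/Sec
"""

for row in parse_stat_lines(SAMPLE):
    print(f'{row["job"]}  {row["counter"]:<36} {row["mib_per_sec"]:10.2f} MB/s')
```

Printed per job, this makes it easier to see that the `readdisk` rate for job 3419454 is much lower than the one for job 3419604.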

japplin
Vaulter
August 11, 2021

Thanks, Bill, for the requested output. While the writePLBuffer and readdisk speeds for the datastores look good, I don’t like the Sec(s) taken by each of these operations, as they seem a bit high.

I’d open a support incident on this for further feedback and a deeper dive to confirm where the possible latencies are coming from.
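
To put the Sec(s) values in perspective before opening the incident, the rough sketch below converts the `Time [...]` fields from the log excerpt to hours and picks the counter with the largest elapsed time for each job ID. The values are copied by hand from the excerpt. `TIMINGS` and `slowest_per_job` are hypothetical names, and treating each Time value as elapsed time accumulated by that counter is an assumption, not something the log states.

```python
# (job ID, counter, seconds) copied from the "Time [...]" fields in the log excerpt.
TIMINGS = [
    ("3419604", "readdisk", 4762.657301),
    ("3419604", "writePLBuffer", 6862.870856),
    ("3419604", "writePLBuffer", 6751.119637),
    ("3419454", "readdisk", 32822.741176),
    ("3419454", "writePLBuffer", 11226.865947),
]


def slowest_per_job(timings):
    """Return {job: (counter, hours)} for the counter with the largest elapsed time."""
    worst = {}
    for job, counter, secs in timings:
        if job not in worst or secs > worst[job][1]:
            worst[job] = (counter, secs)
    # Convert seconds to hours only once the slowest counter per job is known.
    return {job: (counter, secs / 3600) for job, (counter, secs) in worst.items()}


for job, (counter, hours) in slowest_per_job(TIMINGS).items():
    print(f"job {job}: {counter} has the largest elapsed time, about {hours:.1f} h")
```

On these numbers the largest value belongs to `readdisk` on job 3419454, at a little over nine hours. That may be a useful detail to include in the support incident.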