Question

Job gets killed without explanation

December 10, 2024

CreativeChad
Novice

I have several VMs that are supposed to be backing up. . . but for whatever reason, the job gets killed. No error message or code (it says “not applicable”). . . No idea why they are getting killed. I’m new to BDR and CommVault, so please talk to me like I’m 5.


Wasim
Vaulter
December 11, 2024

Hello,

Check in the CommCell console exactly when the job was killed. Then open the job manager log on the CommServe server for that time, filter by job ID, and share it.

Regards,

Wasim

CreativeChad
Author
Novice
December 11, 2024

Hi there. . . Not exactly sure where to see when exactly the job was killed, although I can see the “end time” (Dec 10, 2024, 4:34:36 AM). . . I was able to export the job log, and the only line I see with that time stamp is “5544 1bac 12/10 04:33:36 51456 ArchiveManagerCS::closeChunk Cmt Chnk[349690] cop[378] vol[9134] CC[2] retCnt [0]”

Attached is the log file that I was able to export. The parent job ID is 51456.

Thanks again for the assistance!

job-51520-logs.txt
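
For anyone following along, here is a minimal sketch of how an exported log could be filtered by job ID and time window. It assumes every line follows the same whitespace-separated layout as the single line quoted above (process ID, thread ID, date, time, job ID, message). That layout is inferred from that one line only, and the names `LogLine`, `parse`, and `filter_job` are hypothetical helpers, not part of any Commvault tool.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional


class LogLine(NamedTuple):
    pid: str
    thread: str
    date: str     # MM/DD, as in the quoted line
    time: str     # HH:MM:SS, 24-hour
    job_id: str
    message: str


# Assumed layout (inferred from one sample line, not from documentation):
# <pid> <thread> <MM/DD> <HH:MM:SS> <job id> <message>
LINE_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+(\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$"
)


def parse(lines: Iterable[str]) -> Iterator[LogLine]:
    """Yield parsed entries, silently skipping lines that do not match."""
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match:
            yield LogLine(*match.groups())


def filter_job(
    lines: Iterable[str],
    job_id: str,
    date: Optional[str] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> List[LogLine]:
    """Keep entries for one job, optionally limited to a date and time window."""
    hits = []
    for entry in parse(lines):
        if entry.job_id != job_id:
            continue
        if date is not None and entry.date != date:
            continue
        # Zero-padded HH:MM:SS strings compare correctly as plain text.
        if start is not None and entry.time < start:
            continue
        if end is not None and entry.time > end:
            continue
        hits.append(entry)
    return hits


sample = [
    # The line quoted in the post above.
    "5544 1bac 12/10 04:33:36 51456 ArchiveManagerCS::closeChunk "
    "Cmt Chnk[349690] cop[378] vol[9134] CC[2] retCnt [0]",
    # Hypothetical line for a different job, only here to show the filter.
    "5544 1bac 12/10 04:10:00 99999 made-up entry for another job",
]

hits = filter_job(sample, "51456", date="12/10", start="04:30:00", end="04:35:00")
for hit in hits:
    print(hit.time, hit.message)
```

To run this against a real export, the `sample` list could be replaced with the lines of the exported file, for example `open("job-51520-logs.txt", encoding="utf-8", errors="replace")`. Widening the window to a few minutes before the end time may surface the retries leading up to the kill.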

Arvind
Vaulter
December 11, 2024

Good Day @CreativeChad,

I took a look at the job referenced, https://m109.metallic.io/commandcenter/#/jobs/51520 - which has the base/parent job: https://m109.metallic.io/commandcenter/#/jobs/51456

The job was auto-killed because it used up all of its retries and reached the maximum limit, which is set per the best practices of the SaaS infrastructure.

Hope this information helps. Let me know if you have any questions or concerns. Thank you!

Cheers,
Arvind

CreativeChad
Author
Novice
December 11, 2024

Hey Arvind,

I appreciate the information and you taking a look. . . I guess I’m wondering where to begin troubleshooting? If it auto-killed because it reached the maximum number of retries, then my question is how to figure out why it retried so many times? How do I remedy the issue so it completes successfully?

Thanks so much again for all the help. This is all new to me, so I’m just trying to gain a better understanding.

Arvind
Vaulter
December 12, 2024

Hey @CreativeChad,

Thanks for the update. I would suggest checking the Error Summary on the Overview tab of the job, which shows the errors that caused the job to fail or error out.

I have blurred some crucial information, as this is a community forum 🙂 https://m109.metallic.io/commandcenter/#/jobs/51456

Here you can see, from the first error, that the number of restarts has reached its limit, so you can navigate to the Attempts tab, which helps us understand which phase is causing the failure:

As you can see, the Discover phase completed. Only the Backup phase failed, across 12 attempts, after which the job was committed/auto-killed because it reached the maximum number of retries.

Now, if you look at the VM list, we see one specific VM that failed, which I suspect is the cause, as the child job for that VM had many failed backup attempts.

So we can check based on the error for the failed VM. Again, though, it could be due to something else, which would need more in-depth log analysis on the infrastructure servers. For that, I think the best way forward is to log a support case, so we can have it reviewed and confirmed.

Hope this gives you a glimpse of how to do a quick run-through of the issue, but an in-depth review requires a support case. Feel free to let me know if you have any additional questions.

- Cheers, Arvind

CreativeChad
Author
Novice
December 12, 2024

Hi Arvind. . . I’m not sure how you are getting that information as I don’t have those tab options in my portal. I’m definitely an admin, but I don’t have an “attempts” tab in my view (see attached). . .

I can see the error summary, and it refers to disk consolidation, but is that to say that it ran into that issue on this one VM, so it causes the rest of them to “kill”? I would assume it would simply skip that one and move on to the next one. . . Or am I just misreading the information?

Thanks again!

Chad

Arvind
Vaulter
December 12, 2024

Hello @CreativeChad,

My bad, it looks like that is the view from Support access, not the Tenant Admin view.

That’s right: it retried the backup phase again and again, for 12 attempts, which caused the job to fail here.

But, as mentioned earlier, I would prefer opening a case here, as it needs an in-depth review of the logs.

- Cheers, Arvind
