Solved

Kubernetes Backup Failing

  • 27 January 2021
  • 20 replies
  • 303 views

Badge +13

Added a Kubernetes cluster. The pods are discovered fine; however, during the backup I see the following in the vsbkp.log.

Job goes pending and after a while it fails.

4868  3940  01/27 13:37:18 JOB ID CKubsInfo::OpenVmdk() - Listing volume [qbc] failed with error [0xFFFFFFFF:{KubsApp::ListDir(1401)} + {KubsVol::ListDir(643)} + {KubsWS::ListDir(193)/ErrNo.-1.(Unknown error)-Exec failed with unhandled exception: set_fail_handler: 20: Invalid HTTP status.}]

4868  3940  01/27 13:37:18 JOBID CKubsFSVolume::Close() - Closing Disk: [qbc] of VM [test-clst`StatefulSet`qbc`b5bf0bb6-a0fc-4123-ac68-9de3c3800807]

Documentation is very shallow and there are not enough KBs around Kubernetes.

Ideas?


Best answer by raj5725 7 December 2021, 08:14


20 replies

Userlevel 7
Badge +15

Curious as to the version you are trying to protect @dude ?

Badge +13

Curious as to the version you are trying to protect @dude ?

CV SP20.32, Kubernetes 1.16.8

Userlevel 1
Badge +1

@dude 

If possible, can you share the vsbkp.log and the YAML of the application you are trying to protect?
It will help us better understand the issue.

 

-Manoranjan

Badge +13

@dude 

If possible, can you share the vsbkp.log and the YAML of the application you are trying to protect?
It will help us better understand the issue.

 

-Manoranjan

I have opened a ticket with commvault to discuss further.

Userlevel 1
Badge +1

@Dude, 

If you found a solution to this problem, can you share it? We are facing the same issue.

Userlevel 7
Badge +23

That’s on me, @raj5725 !  Not sure how this one slipped through as ‘solved’!

Here is the last action for the case:

Here is a summary of the issues seen during troubleshooting on Tuesday.

It is also advised to update to the latest available hotfixes, as there are some fixes for Kubernetes backup.


On the call we added a new Kubernetes instance using the API IP:port number (previously the URL for Rancher was used).

1.    Tested one application
-    This failed due to a failed mount attempt on the volume. The error was seen on the Kubernetes side.
-    Please check with the storage team on this.

2.    The failed backup attempt also left pods stuck in a creating or terminating state.
-    The solution was to force delete the pod: kubectl delete pod <podname> --force --grace-period=0
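The force-delete step above can be sketched as a short shell snippet. The pod name and namespace here are placeholders, not values from this thread:

```shell
# Placeholders - substitute the stuck backup worker pod and its namespace.
POD="stuck-worker-pod"
NAMESPACE="default"

# Inspect the pod state first (look for Terminating or ContainerCreating).
kubectl get pod "$POD" -n "$NAMESPACE"

# Force delete: bypasses the grace period so the stuck pod is removed immediately.
# Note this only removes the API object; verify nothing is still running on the node.
kubectl delete pod "$POD" -n "$NAMESPACE" --force --grace-period=0
```

Force deletion is a last resort for pods wedged in Terminating; a normal `kubectl delete pod` should be tried first.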

I did notice the case was closed; @dude responded afterwards, but since the case was closed there was no further response.

@dude , do you recall what was the main fix for this issue?

P.S. I unmarked this as answered.

Badge +13

I don't think there was a proper fix for it. There were a lot of unfamiliar errors around disk mounts and unmounts, so we decided not to pursue CV with Kubernetes. Documentation had very little information about configuration and troubleshooting at the time, and the support ticket neither helped nor provided the confidence/results we expected.

Sorry I'm not much help in this area.

 

Userlevel 7
Badge +23

No apology needed, @dude!

@raj5725 , can you create a support incident and share the case number with me?

Userlevel 1
Badge +1

@Mike Struening 

We are yet to install CV in production. Before going into production we were testing the CV + K8s integration in a test environment, and we got these errors; we wanted to confirm the behavior before going live. Let me add the CV licenses, if possible, to log a support incident.

Userlevel 7
Badge +23

Sounds like a solid plan.  Keep me posted!

Userlevel 7
Badge +23

Hey @raj5725 , following up to see if you had a chance to open a support case for this?

Let me know the case number!

Userlevel 7
Badge +23

Hi @raj5725 , gentle follow up on this one.  Were you able to get an incident created?  or did you resolve the issue?  Please let me know how this is going.

Thanks!

Userlevel 1
Badge +1

Hi Mike,

Sorry for the delayed response.

Since this environment is a highly sensitive government site, I am not able to add the CV licenses to the test environment, and because of that I am not able to log a support incident. I am trying to find a workaround and will keep you posted.

Regards

 

Userlevel 7
Badge +23

Understand completely.  Please do keep us posted!

Userlevel 5
Badge +11

@raj5725 reach out to me mfasulo@commvault.com

Userlevel 1
Badge +1

Hi Mike and MFasulo, 

I am finally able to create a support incident (211115-320). I will wait for an answer from them.

 

Userlevel 7
Badge +23

Thanks, @raj5725 , I’ll keep an eye on it.

Userlevel 7
Badge +23

Sharing the Solution for the second incident:

 

Finding Details:

The temporary pod was not being created during backups for the pods with a PV.

vsbkp.log
24604 606c 11/03 11:02:30 817 CK8sInfo::MountVM() - Backup failed for app [redis-data-ems-redis-master-0]. Error [0xFFFFFFFF:{CK8sInfo::Backup(282)} + {K8sCluster::CreateAppFromSnapshot(555)} + {K8sApp::CreateWorker(1087)} + {K8sUtils::WaitForReady(1079)/ErrNo.-1.(Unknown error -1)-Wait timedout for [redis-data-ems-redis-master-0-redis-data-ems-redis-master-0-cv-817]. Last update [{"conditions":[{"lastTransitionTime":"2021-11-03T10:59:53Z","status":"True","type":"Initialized"},{
"lastTransitionTime":"2021-11-03T10:59:53Z",
"message":"containers with unready status: [cvcontainer]",
"reason":"ContainersNotReady",
"status":"False","type":"Ready"},
{"lastTransitionTime":"2021-11-03T10:59:53Z",
"message":"containers with unready status: [cvcontainer]",
"reason":"ContainersNotReady",
"status":"False",
"type":"ContainersReady"},
{"lastTransitionTime":"2021-11-03T10:59:53Z",
"status":"True","type":"PodScheduled"}],
"containerStatuses":[{"image":"centos:8","imageID":"","lastState":{},"name":"cvcontainer","ready":false,"restartCount":0,"started":false,"state":{"waiting":{"message":"Back-off pulling image \"centos:8\"","reason":"ImagePullBackOff"}}}],
"hostIP":"10.11.224.24",
"phase":"Pending",
"podIP":"10.11.230.49",
"podIPs":[{"ip":"10.11.230.49"}],"qosClass":"Burstable","startTime":"2021-11-03T10:59:53Z"}]}]
24604 606c 11/03 11:02:30 817 CheckVMInfoError() - VM [redis-data-ems-redis-master-0] Error mounting snap volumes.

Solution:

Assisted in configuring the additional setting sK8sImageRegistryUrl to pull the pod image from the local repository.

Userlevel 1
Badge +1

Hi,

Thanks for updating the solution, and my apologies for the delayed response. I will summarize the problem and the solution that resolved it below.

CommVault environment: running 11.24 when the issue was seen; upgraded to CV 11.25, but the problem remained.

Kubernetes environment: Charmed Distribution of Kubernetes (CDK, Canonical, Ubuntu 18.04)

Problem reported: Our DC is not internet-connected, so we had configured air-gapped clusters as described in https://documentation.commvault.com/11.25/essential/144080_enabling_backups_and_restores_of_air_gapped_clusters_for_kubernetes.html
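For an air-gapped setup like this, the worker image (centos:8, per the logs in this thread) has to be mirrored into a registry the cluster can reach. A rough sketch, assuming a Docker-compatible registry that exposes the v2 API; the registry name is a placeholder:

```shell
# Placeholder - substitute your internal registry.
REGISTRY="registry.local:5000"

# On an internet-connected host: pull, re-tag, and push the worker image.
docker pull centos:8
docker tag centos:8 "$REGISTRY/centos:8"
docker push "$REGISTRY/centos:8"

# From inside the air-gapped network, verify the tag is visible.
curl -s "http://$REGISTRY/v2/centos/tags/list"
```

The exact image and registry requirements are whatever the linked documentation specifies; this only illustrates the mirroring step.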

When we tried to back up a stateful container, the backup failed with the below error.

From the Media Agent vsbkp logs, we could see the following errors:

 

{},"name":"cvcontainer","ready":false,"restartCount":0,"started":false,"state":{"waiting":{"message":"Back-off pulling image \"centos:8\"","reason":"ImagePullBackOff"}}}],
            "hostIP":"xx.xx.xx.xx.",
            "phase":"Pending",
            "podIP":"xx.xx.xx.xx",
            "podIPs":[{"ip":"xx.xx.xx.xx."}],"qosClass":"Burstable","startTime":"2021-11-03T10:59:53Z"}]}]
24604 606c 11/03 11:02:30 817 CheckVMInfoError() - VM [POD name] Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 MonitorAgents() - There are no running agents
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name replicas-0 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-1 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-2 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name t-master-0 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-0 --> PENDING                     Agent Failure
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - ===== JOB SUMMARY REPORT BACKUP PHASE =====
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - Total VMs to backup           [6]
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() -       VMs pending             [6]
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - ===========================================

In spite of setting up the air-gapped environment, CV was still trying to pull the CentOS container from Docker Hub and failing with ImagePullBackOff.

After re-checking the configuration, we found the mistake we had made: additional setting names are case-sensitive, and an "i" in the setting name had been entered in lowercase.

Correcting this string resolved the issue.
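A case-sensitivity slip like this only surfaces as ImagePullBackOff on the pod, so it helps to check which registry the kubelet actually tried. On the Kubernetes side that can be inspected like this; the pod name is a placeholder (the CV worker pods in the logs above are suffixed with -cv-<jobid>):

```shell
# Placeholder - substitute the CV-created worker pod.
POD="app-volume-cv-worker"

# The Events section shows the image pull failure and the registry the kubelet
# tried - if it references docker.io, the local-registry setting was not applied.
kubectl describe pod "$POD" | grep -A 10 "Events:"

# Show the exact image reference requested in the pod spec.
kubectl get pod "$POD" -o jsonpath='{.spec.containers[*].image}'
```

If the image reference still points at the public hub, the additional setting (including its exact casing) is the first thing to re-verify.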

 

Userlevel 7
Badge +23

Appreciate the detailed reply!  I marked your response as the Best Answer as well.
