Solved

Kubernetes Backup Failing

  • 27 January 2021
  • 20 replies
  • 303 views

Badge +13

Added a Kubernetes cluster. The pods are discovered fine; however, during the backup I see the following in the vsbkp.log.

Job goes pending and after a while it fails.

4868  3940  01/27 13:37:18 JOB ID CKubsInfo::OpenVmdk() - Listing volume [qbc] failed with error [0xFFFFFFFF:{KubsApp::ListDir(1401)} + {KubsVol::ListDir(643)} + {KubsWS::ListDir(193)/ErrNo.-1.(Unknown error)-Exec failed with unhandled exception: set_fail_handler: 20: Invalid HTTP status.}]

4868  3940  01/27 13:37:18 JOBID CKubsFSVolume::Close() - Closing Disk: [qbc] of VM [test-clst`StatefulSet`qbc`b5bf0bb6-a0fc-4123-ac68-9de3c3800807]

Documentation is very shallow and there are not enough KBs around Kubernetes.

Ideas?


Best answer by raj5725 7 December 2021, 08:14


20 replies

Userlevel 7
Badge +15

Curious as to the version you are trying to protect @dude ?

Badge +13

Curious as to the version you are trying to protect @dude ?

CV SP20.32, Kubernetes 1.16.8

Userlevel 1
Badge +1

@dude 

If possible, can you share the vsbkp.log and the YAML of the application you are trying to protect?
It will help us better understand the issue.

 

-Manoranjan

Badge +13

@dude 

If possible, can you share the vsbkp.log and the YAML of the application you are trying to protect?
It will help us better understand the issue.

 

-Manoranjan

I have opened a ticket with commvault to discuss further.

Userlevel 1
Badge +1

@Dude, 

If you found a solution to this problem, can you share it? We are facing the same issue.

Userlevel 7
Badge +23

That’s on me, @raj5725 !  Not sure how this one slipped through as ‘solved’!

Here is the last action for the case:

Here is a summary of the issues seen during troubleshooting on Tuesday.

It is also advised to update to the latest available hotfixes, as there are some fixes for Kubernetes backup.


On the call we added a new Kubernetes instance using the API IP:port number (previously the URL for Rancher was used).

1.    Tested one application
-    This failed due to a failed mount attempt on the volume. The error was seen on the Kubernetes side.
-    Please check with the storage team on this.

2.    The failed backup attempt also left pods stuck in a creating or terminating state.
-    The solution was to force delete the pod: kubectl delete pod <podname> --force --grace-period=0
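The force-delete step above can be sketched as a short shell snippet. The pod name and namespace here are placeholders, not values from this thread:

```shell
# Placeholders - substitute the stuck backup worker pod and its namespace.
POD="stuck-worker-pod"
NAMESPACE="default"

# Inspect the pod state first (look for Terminating or ContainerCreating).
kubectl get pod "$POD" -n "$NAMESPACE"

# Force delete: bypasses the grace period so the stuck pod is removed immediately.
# Note this only removes the API object; verify nothing is still running on the node.
kubectl delete pod "$POD" -n "$NAMESPACE" --force --grace-period=0
```

Force deletion is a last resort for pods wedged in Terminating; a normal `kubectl delete pod` should be tried first.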

I did notice the case was closed; @dude responded afterwards, but since the case was closed there was no further response.

@dude , do you recall what was the main fix for this issue?

P.S. I unmarked this as answered.

Badge +13

I don't think there was a proper fix for it. There were a lot of unfamiliar errors around disk mounts and unmounts, so we decided not to pursue CV with Kubernetes. Documentation had very little information about configuration and troubleshooting at the time, and the support ticket neither helped nor provided the confidence/results we expected.

Sorry I'm not much help in this area.

 

Userlevel 7
Badge +23

No apology needed, @dude!

@raj5725 , can you create a support incident and share the case number with me?

Userlevel 1
Badge +1

@Mike Struening 

We are yet to install CV in production. Before going into production we were testing the CV + K8s integration in a test environment, and we got these errors; we wanted to confirm the behavior before going live. Let me add the CV licenses, if possible, to log a support incident.

Userlevel 7
Badge +23

Sounds like a solid plan.  Keep me posted!

Userlevel 7
Badge +23

Hey @raj5725 , following up to see if you had a chance to open a support case for this?

Let me know the case number!

Userlevel 7
Badge +23

Hi @raj5725 , gentle follow up on this one.  Were you able to get an incident created?  or did you resolve the issue?  Please let me know how this is going.

Thanks!

Userlevel 1
Badge +1

Hi Mike,

Sorry for the delayed response.

Since this environment is a highly sensitive government site, I am not able to add the CV licenses to the test environment, and because of that I am not able to log a support incident. I am trying to find a workaround and will keep you posted.

Regards

 

Userlevel 7
Badge +23

Understand completely.  Please do keep us posted!

Userlevel 5
Badge +11

@raj5725 reach out to me mfasulo@commvault.com

Userlevel 1
Badge +1

Hi Mike and MFasulo, 

I am finally able to create a support incident (211115-320). I will wait for an answer from them.

 

Userlevel 7
Badge +23

Thanks, @raj5725 , I’ll keep an eye on it.

Userlevel 7
Badge +23

Sharing the Solution for the second incident:

 

Finding Details:

The temporary pod was not being created during backups for the pods with a PV.

vsbkp.log
24604 606c 11/03 11:02:30 817 CK8sInfo::MountVM() - Backup failed for app [redis-data-ems-redis-master-0]. Error [0xFFFFFFFF:{CK8sInfo::Backup(282)} + {K8sCluster::CreateAppFromSnapshot(555)} + {K8sApp::CreateWorker(1087)} + {K8sUtils::WaitForReady(1079)/ErrNo.-1.(Unknown error -1)-Wait timedout for [redis-data-ems-redis-master-0-redis-data-ems-redis-master-0-cv-817]. Last update [{"conditions":[{"lastTransitionTime":"2021-11-03T10:59:53Z","status":"True","type":"Initialized"},{
"lastTransitionTime":"2021-11-03T10:59:53Z",
"message":"containers with unready status: [cvcontainer]",
"reason":"ContainersNotReady",
"status":"False","type":"Ready"},
{"lastTransitionTime":"2021-11-03T10:59:53Z",
"message":"containers with unready status: [cvcontainer]",
"reason":"ContainersNotReady",
"status":"False",
"type":"ContainersReady"},
{"lastTransitionTime":"2021-11-03T10:59:53Z",
"status":"True","type":"PodScheduled"}],
"containerStatuses":[{"image":"centos:8","imageID":"","lastState":{},"name":"cvcontainer","ready":false,"restartCount":0,"started":false,"state":{"waiting":{"message":"Back-off pulling image \"centos:8\"","reason":"ImagePullBackOff"}}}],
"hostIP":"10.11.224.24",
"phase":"Pending",
"podIP":"10.11.230.49",
"podIPs":[{"ip":"10.11.230.49"}],"qosClass":"Burstable","startTime":"2021-11-03T10:59:53Z"}]}]
24604 606c 11/03 11:02:30 817 CheckVMInfoError() - VM [redis-data-ems-redis-master-0] Error mounting snap volumes.

Solution:

Assisted in configuring the additional setting sK8sImageRegistryUrl to pull the pod image from the local repository.

Userlevel 1
Badge +1

Hi,

Thanks for updating the solution, and my apologies for the delayed response. I will summarize the problem and the solution that resolved it below.

CommVault environment: running 11.24 when the issue was seen; upgraded to CV 11.25, but the problem remained.

Kubernetes environment: Charmed Distribution of Kubernetes (CDK, Canonical, Ubuntu 18.04)

Problem reported: Our DC is not internet-connected, so we had configured air-gapped clusters as described in https://documentation.commvault.com/11.25/essential/144080_enabling_backups_and_restores_of_air_gapped_clusters_for_kubernetes.html
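For an air-gapped setup like this, the worker image (centos:8, per the logs in this thread) has to be mirrored into a registry the cluster can reach. A rough sketch, assuming a Docker-compatible registry that exposes the v2 API; the registry name is a placeholder:

```shell
# Placeholder - substitute your internal registry.
REGISTRY="registry.local:5000"

# On an internet-connected host: pull, re-tag, and push the worker image.
docker pull centos:8
docker tag centos:8 "$REGISTRY/centos:8"
docker push "$REGISTRY/centos:8"

# From inside the air-gapped network, verify the tag is visible.
curl -s "http://$REGISTRY/v2/centos/tags/list"
```

The exact image and registry requirements are whatever the linked documentation specifies; this only illustrates the mirroring step.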

When we tried to back up a stateful container, the backup failed with the below error.

From the Media Agent vsbkp logs, we could see the following errors:

 

{},"name":"cvcontainer","ready":false,"restartCount":0,"started":false,"state":{"waiting":{"message":"Back-off pulling image \"centos:8\"","reason":"ImagePullBackOff"}}}],
            "hostIP":"xx.xx.xx.xx.",
            "phase":"Pending",
            "podIP":"xx.xx.xx.xx",
            "podIPs":[{"ip":"xx.xx.xx.xx."}],"qosClass":"Burstable","startTime":"2021-11-03T10:59:53Z"}]}]
24604 606c 11/03 11:02:30 817 CheckVMInfoError() - VM [POD name] Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 MonitorAgents() - There are no running agents
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name replicas-0 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-1 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-2 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name t-master-0 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-0 --> PENDING                     Agent Failure
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - ===== JOB SUMMARY REPORT BACKUP PHASE =====
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - Total VMs to backup           [6]
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() -       VMs pending             [6]
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - ===========================================

In spite of setting up the air-gapped environment, CV was still trying to pull the CentOS container from Docker Hub and failing with ImagePullBackOff.

After re-checking the configuration, we found the mistake we had made: additional setting names are case-sensitive, and an "i" in the setting name had been entered in lowercase.

Correcting this string resolved the issue.
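A case-sensitivity slip like this only surfaces as ImagePullBackOff on the pod, so it helps to check which registry the kubelet actually tried. On the Kubernetes side that can be inspected like this; the pod name is a placeholder (the CV worker pods in the logs above are suffixed with -cv-<jobid>):

```shell
# Placeholder - substitute the CV-created worker pod.
POD="app-volume-cv-worker"

# The Events section shows the image pull failure and the registry the kubelet
# tried - if it references docker.io, the local-registry setting was not applied.
kubectl describe pod "$POD" | grep -A 10 "Events:"

# Show the exact image reference requested in the pod spec.
kubectl get pod "$POD" -o jsonpath='{.spec.containers[*].image}'
```

If the image reference still points at the public hub, the additional setting (including its exact casing) is the first thing to re-verify.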

 

Userlevel 7
Badge +23

Appreciate the detailed reply!  I marked your response as the Best Answer as well.
