Solved

Kubernetes Backup Failing


dude
Byte

I added a Kubernetes cluster. The pods are discovered fine; however, during the backup I see the following in vsbkp.log.

The job goes pending and fails after a while.

4868  3940  01/27 13:37:18 JOB ID CKubsInfo::OpenVmdk() - Listing volume [qbc] failed with error [0xFFFFFFFF:{KubsApp::ListDir(1401)} + {KubsVol::ListDir(643)} + {KubsWS::ListDir(193)/ErrNo.-1.(Unknown error)-Exec failed with unhandled exception: set_fail_handler: 20: Invalid HTTP status.}]

4868  3940  01/27 13:37:18 JOBID CKubsFSVolume::Close() - Closing Disk: [qbc] of VM [test-clst`StatefulSet`qbc`b5bf0bb6-a0fc-4123-ac68-9de3c3800807]

The documentation is very shallow, and there are not enough KB articles about Kubernetes.

Ideas?

Best answer by raj5725


20 replies

Damian Andre
Vaulter
  • Vaulter
  • 1229 replies
  • January 27, 2021

Curious which version you are trying to protect, @dude?


dude
Byte
  • Author
  • Byte
  • 287 replies
  • January 27, 2021
Damian Andre wrote:

Curious which version you are trying to protect, @dude?

CV SP20.32, Kubernetes 1.16.8


Forum|alt.badge.img+1

@dude 

If possible, can you share the vsbkp.log and the YAML of the application you are trying to protect?
It will help us better understand the issue you are facing.

 

-Manoranjan


dude
Byte
  • Author
  • Byte
  • 287 replies
  • February 1, 2021
Manoranjan Reddy wrote:

@dude 

If possible, can you share the vsbkp.log and the YAML of the application you are trying to protect?
It will help us better understand the issue you are facing.

 

-Manoranjan

I have opened a ticket with Commvault to discuss this further.


  • Byte
  • 11 replies
  • October 8, 2021

@dude,

If you found a solution to this problem, could you share it? We are facing the same issue.


Mike Struening
Vaulter

That’s on me, @raj5725 !  Not sure how this one slipped through as ‘solved’!

Here is the last action for the case:

Here is a summary of the issues seen during troubleshooting on Tuesday.

Also it is advised to update to the latest available hotfixes as there are some fixes for Kubernetes backup.


On the call we added a new Kubernetes instance using the API IP:port number (previously the URL for Rancher was used).

1.    Tested one application
-    This failed due to a failed mount attempt on the volume. The error was seen on the Kubernetes side.
-    Please check with the storage team on this.

2.    The failed backup attempt also left pods stuck in a creating or terminating state
-    The solution for this was to force delete the pod (k delete pod "podname" --force --grace-period=0, with k as shorthand for kubectl)
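For anyone scripting the force delete mentioned in the case notes, here is a minimal sketch in Python that only builds the command line, with the flags written as double hyphens. The function name `force_delete_argv` and the optional namespace argument are assumptions for illustration; nothing here comes from Commvault, and the sketch does not contact a cluster.

```python
import shlex
from typing import List, Optional


def force_delete_argv(pod_name: str, namespace: Optional[str] = None) -> List[str]:
    """Build the kubectl argument list for force deleting a stuck pod.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    argv = ["kubectl", "delete", "pod", pod_name, "--force", "--grace-period=0"]
    if namespace:
        # Assumption: the stuck pod may live outside the default namespace.
        argv += ["--namespace", namespace]
    return argv


# Print the command for review before running it by hand.
print(shlex.join(force_delete_argv("podname")))
```

Passing the list to `subprocess.run` would execute it, but force deletion skips graceful termination, so it is probably best reserved for pods that are confirmed stuck.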

I did notice that @dude responded afterwards, but the case had already been closed, so there was no further response.

@dude, do you recall what the main fix for this issue was?

P.S. I unmarked this as answered.


dude
Byte
  • Author
  • Byte
  • 287 replies
  • October 8, 2021

I don't think there was a proper fix for it. There were a lot of errors around disk mounts and unmounts that were unfamiliar. We decided not to pursue CV with Kubernetes. The documentation had very little information about configuring and troubleshooting at the time. The support ticket neither helped nor provided the confidence or results we expected.

Sorry I can't be of much help in this area.

 


Mike Struening
Vaulter

No apology needed, @dude!

@raj5725 , can you create a support incident and share the case number with me?


  • Byte
  • 11 replies
  • October 12, 2021

@Mike Struening 

We have yet to install CV in production. Before going into production, we were testing the CV and Kubernetes integration in a test environment and got these errors, which we wanted to sort out first. Let me add the CV licenses, if possible, so that I can log a support incident.


Mike Struening
Vaulter

Sounds like a solid plan.  Keep me posted!


Mike Struening
Vaulter

Hey @raj5725 , following up to see if you had a chance to open a support case for this?

Let me know the case number!


Mike Struening
Vaulter

Hi @raj5725, a gentle follow-up on this one. Were you able to get an incident created, or did you resolve the issue? Please let me know how this is going.

Thanks!


  • Byte
  • 11 replies
  • November 2, 2021

Hi Mike, 

Sorry for the delayed response.

Since this environment is a highly sensitive government site, I am not able to add the CV licenses to the test environment, and because of that I cannot log a support incident. I am trying to find a workaround and will keep you posted.

Regards

 


Mike Struening
Vaulter

Understand completely.  Please do keep us posted!


MFasulo
Vaulter
  • Vaulter
  • 175 replies
  • November 2, 2021

@raj5725, reach out to me at mfasulo@commvault.com


  • Byte
  • 11 replies
  • November 15, 2021

Hi Mike and MFasulo, 

I was finally able to create a support incident (211115-320). I will wait for an answer from them.

 


Mike Struening
Vaulter

Thanks, @raj5725 , I’ll keep an eye on it.


Mike Struening
Vaulter

Sharing the Solution for the second incident:

 

Finding Details:

The temporary pod was not being created during backups of pods with a PV.

vsbkp.log
24604 606c 11/03 11:02:30 817 CK8sInfo::MountVM() - Backup failed for app [redis-data-ems-redis-master-0]. Error [0xFFFFFFFF:{CK8sInfo::Backup(282)} + {K8sCluster::CreateAppFromSnapshot(555)} + {K8sApp::CreateWorker(1087)} + {K8sUtils::WaitForReady(1079)/ErrNo.-1.(Unknown error -1)-Wait timedout for [redis-data-ems-redis-master-0-redis-data-ems-redis-master-0-cv-817]. Last update [{"conditions":[{"lastTransitionTime":"2021-11-03T10:59:53Z","status":"True","type":"Initialized"},{
"lastTransitionTime":"2021-11-03T10:59:53Z",
"message":"containers with unready status: [cvcontainer]",
"reason":"ContainersNotReady",
"status":"False","type":"Ready"},
{"lastTransitionTime":"2021-11-03T10:59:53Z",
"message":"containers with unready status: [cvcontainer]",
"reason":"ContainersNotReady",
"status":"False",
"type":"ContainersReady"},
{"lastTransitionTime":"2021-11-03T10:59:53Z",
"status":"True","type":"PodScheduled"}],
"containerStatuses":[{"image":"centos:8","imageID":"","lastState":{},"name":"cvcontainer","ready":false,"restartCount":0,"started":false,"state":{"waiting":{"message":"Back-off pulling image \"centos:8\"","reason":"ImagePullBackOff"}}}],
"hostIP":"10.11.224.24",
"phase":"Pending",
"podIP":"10.11.230.49",
"podIPs":[{"ip":"10.11.230.49"}],"qosClass":"Burstable","startTime":"2021-11-03T10:59:53Z"}]}]
24604 606c 11/03 11:02:30 817 CheckVMInfoError() - VM [redis-data-ems-redis-master-0] Error mounting snap volumes.

Solution:

Assisted in configuring the additional setting sK8sImageRegistryUrl so that the pod image is pulled from the local repository.
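When reading a status dump like the one in the log above, the quickest signal is the `state.waiting.reason` field of each entry in `containerStatuses`. The following is a minimal sketch in Python that pulls out that field. The function name `waiting_containers` and the trimmed sample document are assumptions made for illustration; the field names are the ones visible in the log.

```python
import json
from typing import List, Tuple


def waiting_containers(status: dict) -> List[Tuple[str, str, str]]:
    """Return (container name, image, reason) for each container in a waiting state."""
    stuck = []
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            stuck.append((cs.get("name", ""), cs.get("image", ""), waiting.get("reason", "")))
    return stuck


# Trimmed sample modeled on the containerStatuses entry in the vsbkp.log excerpt.
sample_status = json.loads("""
{
  "phase": "Pending",
  "containerStatuses": [
    {"image": "centos:8", "imageID": "", "lastState": {}, "name": "cvcontainer",
     "ready": false, "restartCount": 0, "started": false,
     "state": {"waiting": {"message": "Back-off pulling image \\"centos:8\\"",
                           "reason": "ImagePullBackOff"}}}
  ]
}
""")

for name, image, reason in waiting_containers(sample_status):
    print(f"{name}: image {image} is waiting ({reason})")
```

A reason of `ImagePullBackOff` on an unqualified image name such as `centos:8` suggests that the node is still trying to reach a public registry, which matches the air-gapped symptom described here.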


  • Byte
  • 11 replies
  • Answer
  • December 7, 2021

Hi,

Thanks for posting the solution, and my apologies for the delayed response. I have summarized the problem and the fix that resolved it below.

Commvault environment: running 11.24 when the issue was first seen, then upgraded to CV 11.25; the problem remained.

Kubernetes environment: Charmed Distribution of Kubernetes (CDK, Canonical Ubuntu 18.04)

Problem reported: Our data center is not connected to the internet, so we had configured air-gapped clusters as described in the documentation: https://documentation.commvault.com/11.25/essential/144080_enabling_backups_and_restores_of_air_gapped_clusters_for_kubernetes.html

When we tried to back up a stateful container, the backup failed with the error below.

In the vsbkp logs on the Media Agent, we could see the following errors:

 

{},"name":"cvcontainer","ready":false,"restartCount":0,"started":false,"state":{"waiting":{"message":"Back-off pulling image \"centos:8\"","reason":"ImagePullBackOff"}}}],
            "hostIP":"xx.xx.xx.xx.",
            "phase":"Pending",
            "podIP":"xx.xx.xx.xx",
            "podIPs":[{"ip":"xx.xx.xx.xx."}],"qosClass":"Burstable","startTime":"2021-11-03T10:59:53Z"}]}]
24604 606c 11/03 11:02:30 817 CheckVMInfoError() - VM [POD name] Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 MonitorAgents() - There are no running agents
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name replicas-0 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-1 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-2 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name t-master-0 --> PENDING                     Error mounting snap volumes.
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - VM POD name -replicas-0 --> PENDING                     Agent Failure
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - ===== JOB SUMMARY REPORT BACKUP PHASE =====
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - Total VMs to backup           [6]
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() -       VMs pending             [6]
24604 601c 11/03 11:03:34 817 VSBkpCoordinator::Run() - ===========================================

In spite of setting up the air-gapped environment, CV was still trying to pull the CentOS image from Docker Hub and failing with ImagePullBackOff.

After re-checking the configuration, we found our mistake: registry entries are case sensitive, and the “i” had been entered in lowercase.

Correcting this string resolved the issue.
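A case slip like this is easy to miss by eye, so a small check can help. This is a minimal sketch in Python, assuming the expected name is `sK8sImageRegistryUrl` as written in the support notes earlier in the thread. The function names and the lowercase variant used as input are hypothetical, chosen only to illustrate the comparison.

```python
from typing import List, Tuple

EXPECTED_SETTING = "sK8sImageRegistryUrl"  # name as written in this thread


def check_setting_name(entered: str, expected: str = EXPECTED_SETTING) -> str:
    """Classify an entered name as 'ok', 'case-mismatch', or 'different'."""
    if entered == expected:
        return "ok"
    if entered.lower() == expected.lower():
        # Same letters, wrong case: the kind of mistake described above.
        return "case-mismatch"
    return "different"


def mismatched_positions(entered: str, expected: str = EXPECTED_SETTING) -> List[Tuple[int, str, str]]:
    """List (index, entered char, expected char) for every differing position."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(entered, expected)) if a != b]


# Hypothetical typo for illustration: a lowercase "i" where a capital is expected.
entered_name = "sK8simageRegistryUrl"
print(check_setting_name(entered_name))
print(mismatched_positions(entered_name))
```

Comparing the lowercased strings separates a pure case error from a genuinely different name, and the position list points at the exact character to fix.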

 


Mike Struening
Vaulter

Appreciate the detailed reply!  I marked your response as the Best Answer as well.

