Commvault services go into disabled mode after coming back from DR failover
Best answer by Mike Struening RETIRED 11 July 2022, 22:50
Good morning. Just to clarify, is this after a CommServe LiveSync failover?
Good morning. Yes, this is after a LiveSync failover.
@SollyK, when you do a CS failover, the services on the node you failed over FROM will be disabled, though the node you failed over TO should have them running.
Are you seeing both of them disabled? Or the wrong one running vs disabled?
Did the failover show as successful?
Once we get some more context we can advise better.
@SollyK , hope all is well.
Can you let me know if you’ve had a chance to review my questions?
Sharing case resolution:
Environment Details
=============
Commvault version 11.25.28
SQL 2014 SP2
Windows Server 2016

Issue:
======
The customer reports that when Instance002 is started on the DR machines, it stops the Instance001 services on the production machines.

Findings:
=======
Found the DR server (Instance002) services in a stopped state.

Steps Taken:
========
-- Disabled Live Sync.
-- Navigated the customer to ContentStore2\iDataAgent\JobResults\CommServeFailover.xml.
-- Renamed it to CommServeFailover.xml.old on both nodes.
-- Started the Instance002 services on both nodes again.
-- Enabled Live Sync.
-- Confirmed that replication jobs started to run and completed successfully.
-- Later, it started to stop the Instance001 services on the production machines again.

Found that, due to the unsuccessful failover and failback attempt last Saturday, we had a split-brain situation in the Live Sync configuration. As soon as we started the Instance002 services on the DR node, it stopped and disabled the services on the production node.

Found that, due to the unsuccessful failover attempt, the failover configuration was inconsistent. We tried renaming the configuration files on both nodes, but the files were not recreated after disabling and enabling Live Sync. Also, the media agents were showing offline due to the whitelist added to the CommServe itself. There were a lot of inconsistencies.

To fix this, we stopped all the services on the production node and performed an unplanned failover to the DR node. After that, we had a communication issue on the failover instances; correcting that communication issue resolved the split brain.

We then stopped all the services on the DR node for both Instance001 and Instance002 and performed an unplanned failover to the production node. After that, we started the Instance002 services on the DR node and performed Cvlogshipping backups and restores successfully.

We no longer see any services being stopped on the production node. Monitored for 24 hours and confirmed that the issue has been resolved.
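For anyone repeating the rename step from the case notes on several nodes, here is a minimal sketch of it in Python. It is only an illustration, not a Commvault tool: the function name `set_aside_failover_config` is hypothetical, and the directory is passed in as a parameter because the notes give only the partial path ContentStore2\iDataAgent\JobResults, so the install root is left for you to supply. Note that in this case the rename alone did not resolve the split brain.

```python
from pathlib import Path
from typing import Optional


def set_aside_failover_config(
    job_results_dir: str,
    filename: str = "CommServeFailover.xml",
    suffix: str = ".old",
) -> Optional[Path]:
    """Rename <job_results_dir>/<filename> to <filename><suffix>.

    Hypothetical helper. Returns the new path, or None if the file
    is not present (for example, if it was already renamed).
    """
    source = Path(job_results_dir) / filename
    if not source.is_file():
        return None

    target = source.with_name(source.name + suffix)
    if target.exists():
        # Do not silently overwrite an earlier backup copy.
        raise FileExistsError(f"{target} already exists; move it aside first")

    source.rename(target)
    return target
```

You would call it once per node with that node's JobResults directory, ideally while Live Sync is disabled as in the steps above, and check the return value to confirm the file was actually found.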
Stopped all the services on the production node and performed an unplanned failover to the DR node. After that, we had a communication issue on the failover instances; correcting that communication issue resolved the split brain. We then started the Instance002 services on the DR node and performed Cvlogshipping backups and restores successfully.