Commvault services go into disabled mode after coming back from DR failover

Best answer by Mike Struening RETIRED
Sharing case resolution:
Finding Details:
Environment Details
=============
Commvault version 11.25.28
SQL Server 2014 SP2
Windows Server 2016
Issue:
======
The customer reports that when the Instance002 services are started on the DR machines, the Instance001 services on the production machines are stopped.
Findings:
=======
The Instance002 services on the DR server were found in a stopped state.
Steps Taken :
========
-- Disabled Live Sync.
-- Navigated with the customer to ContentStore2\iDataAgent\JobResults\CommServeFailover.xml and renamed it to CommServeFailover.xml.old on both nodes.
-- Started the Instance002 services on both nodes again.
-- Enabled Live Sync.
-- Confirmed that replication jobs started to run and completed successfully.
-- Later, the Instance001 services on the production machines started being stopped again.
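The rename step above can be scripted if it has to be repeated on several nodes. This is only a minimal Python sketch: the case notes give just the tail of the path, so the caller has to supply the node's full JobResults folder, and the function name `sideline_failover_config` and its `job_results_dir` argument are hypothetical.

```python
from pathlib import Path

CONFIG_NAME = "CommServeFailover.xml"  # file name taken from the steps above


def sideline_failover_config(job_results_dir):
    """Rename CommServeFailover.xml to CommServeFailover.xml.old.

    job_results_dir is the node's JobResults folder. The caller must supply
    it, because the install prefix is not given in the case notes.
    Returns the new path, or None if there was no config file to rename.
    """
    source = Path(job_results_dir) / CONFIG_NAME
    if not source.is_file():
        return None
    target = source.with_name(CONFIG_NAME + ".old")
    if target.exists():
        # Refuse to overwrite an earlier backup; leave that call to the operator.
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```

Run it on each node after Live Sync has been disabled, in line with the order of the steps above.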
The unsuccessful failover and failback attempt last Saturday left the Live Sync configuration in a split-brain state, with an inconsistent failover configuration.
As soon as the Instance002 services are started on the DR node, the services on the production node are stopped and disabled.
Renaming the configuration files on both nodes did not help: the files are not recreated after disabling and re-enabling Live Sync.
The MediaAgents were also showing offline because of a whitelist added to the CommServe itself, one of many inconsistencies.
To fix this, we stopped all services on the production node and performed an unplanned failover to the DR node.
After that, there was a communication issue on the failover instances; correcting it resolved the split brain.
Next, we stopped all services on the DR node for both Instance001 and Instance002.
Performed an unplanned failover back to the production node.
Then started the Instance002 services on the DR node and performed Cvlogshipping backup and restores successfully.
Services are no longer being stopped on the production node.
Monitored for 24 hours and confirmed that the issue has been resolved.
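The monitoring step can be approximated with a simple poller. This is a sketch under stated assumptions: the caller supplies a `get_state` function that returns a service's state as a string (for example by wrapping `sc query` on Windows). The `watch_services` helper, the `get_state` callback, and any service names passed in are hypothetical and do not come from the case notes.

```python
import time


def watch_services(get_state, service_names, checks, interval_s=60, sleep=time.sleep):
    """Poll each service `checks` times and collect every non-running observation.

    get_state(name) must return a state string such as "RUNNING" or "STOPPED".
    Returns a list of (check_index, service_name, state) tuples.
    """
    problems = []
    for i in range(checks):
        for name in service_names:
            state = get_state(name)
            if state != "RUNNING":
                problems.append((i, name, state))
        if i < checks - 1:
            # Wait between rounds, but not after the final one.
            sleep(interval_s)
    return problems
```

An empty result over the whole monitoring window corresponds to the outcome reported above, where no services were stopped on the production node.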
Solution:
Stopped all services on the production node and performed an unplanned failover to the DR node.
After that, there was a communication issue on the failover instances; correcting it resolved the split brain.
Then started the Instance002 services on the DR node and performed Cvlogshipping backup and restores successfully.