Solved

Commvault services go into disabled mode



Commvault services go into disabled mode after coming back from a DR failover.


Best answer by Mike Struening RETIRED 11 July 2022, 22:50


5 replies


Good morning.  Just to clarify, is this after a CommServe LiveSync failover?


Good morning. Yes, this is after a LiveSync failover.

 


@SollyK, when you do a CS failover, the node you failed over FROM will have its services disabled, while the node you failed over TO should have them running.

Are you seeing both of them disabled?  Or the wrong one running vs disabled?

Did the failover show as successful?

Once we get some more context we can advise better.

Thanks!


@SollyK, hope all is well.

Can you let me know if you’ve had a chance to review my questions?

Thanks!


Sharing case resolution:

Finding Details:

Environment Details
=============

Commvault version 11.25.28
SQL Server 2014 SP2
Windows Server 2016

Issue:
======
The customer reports that when Instance002 is started on the DR machines, it stops the Instance001 services on the production machines.

Findings:
=======
Found that the DR server (Instance002) services were in a stopped state.

Steps Taken :
========
-- Disabled Live Sync.
-- Navigated with the customer to ContentStore2\iDataAgent\JobResults\CommServeFailover.xml.
-- Renamed it to CommServeFailover.xml.old on both nodes.
-- Started the Instance002 services on both nodes again.
-- Enabled Live Sync.
-- Confirmed that replication jobs started to run and completed successfully.
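
For anyone repeating the rename step above on both nodes, it can be sketched in Python as below. This is only an illustration under assumptions, not a Commvault tool: the function name `set_aside_failover_config` and the `install_root` parameter are hypothetical, and only the relative path comes from the case notes. Live Sync was disabled before the rename in this case.

```python
from pathlib import Path

# Relative path as quoted in the case notes. The directory above it
# (install_root) is NOT given in the notes and must be supplied by the caller.
FAILOVER_CONFIG = Path("ContentStore2") / "iDataAgent" / "JobResults" / "CommServeFailover.xml"


def set_aside_failover_config(install_root, suffix=".old"):
    """Rename CommServeFailover.xml to CommServeFailover.xml.old.

    install_root is a hypothetical parameter: the folder that contains
    ContentStore2 on the node where this runs.
    Returns the new path, or None if the file is not present.
    """
    source = Path(install_root) / FAILOVER_CONFIG
    if not source.is_file():
        return None  # nothing to rename on this node

    target = source.with_name(source.name + suffix)
    if target.exists():
        # Refuse to overwrite an earlier copy that was already set aside.
        raise FileExistsError(f"{target} already exists")

    source.rename(target)
    return target
```

Run it once per node, passing that node's own install folder. Note that in this case the rename alone did not fix the problem, because the files were not recreated after Live Sync was disabled and re-enabled.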

-- Later, it started to stop the Instance001 services on the production machines again.

Found that, due to the unsuccessful failover and failback attempt last Saturday, we have a split-brain situation in the Live Sync configuration.
As soon as we start the Instance002 services on the DR node, it stops and disables the services on the production node.
Found that, due to the unsuccessful failover attempt, the failover configuration is inconsistent.
Tried renaming the configuration files on both nodes, but the configuration files are not recreated after disabling and enabling Live Sync.
Also, the MediaAgents are showing offline due to the whitelist added to the CommServe itself. There are a lot of inconsistencies.
To fix this, stopped all the services on the production node and performed an unplanned failover to the DR node.
After that, we had a communication issue on the failover instances. Correcting the communication issue resolved the split brain.
After this, stopped all the services on the DR node for both Instance001 and Instance002.
Performed an unplanned failover to the production node.
After that, started the Instance002 services on the DR node and performed Cvlogshipping backups and restores successfully.
Now we do not see any services being stopped on the production node.
Monitored for 24 hours and confirmed that the issue has been resolved.

Solution:

Stopped all the services on the production node and performed an unplanned failover to the DR node.
After that, we had a communication issue on the failover instances. Correcting the communication issue resolved the split brain.
After that, started the Instance002 services on the DR node and performed Cvlogshipping backups and restores successfully.
