Question

Two-site auto failover when the WAN link fails

  • 15 May 2024


Hi All,

 

The documentation describes auto failover as follows:

  • The monitoring nodes and the failover CommServe host(s) periodically check the active CommServe host, and automatically fails over to the standby CommServe host when the active CommServe host is not reachable. The failover occurs only when both the following conditions are satisfied:

    • The monitoring nodes and the failover CommServe host(s) are not able to communicate with the CommServe (Instance001) in the active CommServe host.

    • The monitoring nodes and the failover CommServe host(s) are able to reach the other clients in the network.
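As a rough illustration, the two conditions quoted above can be sketched in Python. The class, field, and function names are hypothetical and are not part of any Commvault API. The sketch also assumes that every monitoring node and standby host must agree, which the quoted text does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeView:
    """What one monitoring node or standby host currently observes (hypothetical structure)."""
    name: str
    can_reach_active_cs: bool      # condition 1: can it talk to the active CommServe?
    can_reach_other_clients: bool  # condition 2: can it reach other clients in the network?


def should_fail_over(views: List[NodeView]) -> bool:
    """Return True only when both quoted conditions hold.

    Assumption: all observers must agree. The real product may use a
    different rule (for example a quorum).
    """
    if not views:
        return False
    active_unreachable = all(not v.can_reach_active_cs for v in views)
    network_ok = all(v.can_reach_other_clients for v in views)
    return active_unreachable and network_ok
```

Under this all-must-agree assumption, an observer that can still reach the active CommServe is enough to block a failover.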

 

In a typical cluster failover design, a third site (witness) may be needed to decide on and trigger the failover. However, the manual does not mention that monitoring nodes are required at a third site.

 

With the setup below, which nodes or hosts decide on and trigger auto failover, and how, when the WAN link fails but the master CommServe is still alive?

  • siteA is connected to siteB over a WAN link
  • siteA has a physical master CommServe with the MediaAgent role, which backs up siteA VM data and replicates it to siteB
  • siteB has a standby CommServe VM and a physical MediaAgent, which backs up siteB data and replicates it to siteA
  • siteA has VM1 and VM2 as monitoring nodes
  • siteB has VM3 as a monitoring node

 


1 reply


Hello @Chinchilla,

 

Thank you for reaching out to the Commvault Community. For this specific scenario, the primary CommServe (Site A) should remain active. The WAN failure would make the standby CS unreachable, and the failover logic would check which CommServe has the latest database. If all of the databases are in sync, for example if the WAN goes down but comes back up before the next LiveSync replication job completes, the logic would choose the CommServe with the lowest client ID (which would be the first CS configured in your environment, most likely the CS at Site A).
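The selection rule described above (latest database first, then lowest client ID as the tie-breaker) can be sketched as follows. This is only an illustration: `CsCandidate`, `db_version`, and `pick_active` are hypothetical names, and representing "latest database" as an integer version is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CsCandidate:
    client_id: int   # lower means configured earlier
    db_version: int  # hypothetical marker; higher means a more recent database


def pick_active(candidates: List[CsCandidate]) -> CsCandidate:
    """Prefer the most recent database; break ties with the lowest client ID."""
    if not candidates:
        raise ValueError("no CommServe candidates supplied")
    # Negating db_version makes min() favour the newest database first.
    return min(candidates, key=lambda c: (-c.db_version, c.client_id))
```

With two candidates whose databases are in sync, the one with the lower client ID is returned.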

 

If services go down on your primary CS, the failover monitor checks fail 3 times in a row (checks run every 5 minutes by default), and the standby CS is reachable, the failover logic would automatically bring up the standby at Site B as the active CS.
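A minimal sketch of that trigger, assuming a simple consecutive-failure counter. `FailoverMonitor` and `record_check` are hypothetical names, not Commvault's implementation, and the 5-minute interval between checks is left to the caller.

```python
class FailoverMonitor:
    """Counts consecutive failed checks against the primary CS (illustrative only)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold        # 3 failed checks in a row, per the reply above
        self.consecutive_failures = 0

    def record_check(self, primary_ok: bool, standby_reachable: bool) -> bool:
        """Record one check result; return True when a failover should be triggered."""
        if primary_ok:
            # Any successful check resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold and standby_reachable
```

In this sketch, an unreachable standby never triggers a failover, however many checks fail.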

 

Please let me know if this helps!

 

-- Chuck Graves

 

 
