How Does Failover Work?
Did you ever wonder just what the Tungsten Manager is thinking when it does an automatic failover or a manual switch in a cluster?
What factors are taken into account by the Tungsten Manager when it picks a replica to fail over to?
This page will detail the steps the Tungsten Manager takes to perform a switch or failover.
This section covers both the process and some possible reasons why that process might not complete, along with best practices and ways to monitor the cluster for each situation.
Roles for Nodes and Clusters
When we say “role” in the context of a cluster datasource, we are talking about the view of a database node from the Tungsten Manager's perspective.
These roles apply to the node datasource at the local (physical) cluster level, and to the composite datasource at the composite cluster level.
Possible roles are:
- Primary
  - A database node which is writable, or
  - A composite cluster which is active (contains a writable primary)
- Relay
  - A read-only database node which pulls data from a remote cluster and shares it with downstream replicas in the same cluster
- Replica
  - A read-only database node which pulls data from a local-cluster primary node, or from a local-cluster relay node for passive composite clusters, or
  - A composite cluster which is passive (contains a relay but NO writable primary)
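The following minimal Python sketch shows how a composite role follows from the roles of its member nodes. The `Role` enum and the `composite_role` helper are hypothetical names for illustration only and are not part of any Tungsten API. The sketch encodes the rules above: an active composite contains a writable primary, and a passive composite contains a relay but no primary.

```python
from enum import Enum
from typing import Iterable


class Role(Enum):
    """Illustrative model of datasource roles; not a Tungsten API."""
    PRIMARY = "primary"
    RELAY = "relay"
    REPLICA = "replica"


def composite_role(node_roles: Iterable[Role]) -> Role:
    """Derive a composite datasource's role from its member node roles.

    Active composite (contains a writable primary)      -> PRIMARY
    Passive composite (contains a relay, but no primary) -> REPLICA
    """
    roles = set(node_roles)
    if Role.PRIMARY in roles:
        return Role.PRIMARY
    if Role.RELAY in roles:
        return Role.REPLICA
    raise ValueError("composite cluster has neither a primary nor a relay node")
```

For example, a cluster made up of one primary and two replicas is treated as an active (PRIMARY) composite. A cluster made up of one relay and two replicas is a passive (REPLICA) composite.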
Moving the Primary Role to Another Node or Cluster
One of the great strengths of Tungsten Cluster is that the roles for both cluster nodes and composite cluster datasources can be moved to another node or cluster. This can be done at will via the switch command, or through an automatic failover invoked by the Tungsten Manager layer.
Please note that while failovers are normally automatic and triggered by the Tungsten Manager, a failover can also be invoked manually via the cctrl command if needed.
Switch versus Failover
There are key differences between the manual switch and automatic failover operations:
- Switch
  - Switch attempts to perform the operation as gracefully as possible, so there will be a delay while all of the steps are followed to ensure zero data loss
  - When the switch sub-command is invoked within cctrl, the Manager will cleanly close connections and ensure replication is caught up before moving the Primary role to another node
  - Switch recovers the original Primary to be a Replica
  - Please see "Manual Primary Switch".
- Failover
  - Failover is immediate and could result in data loss, even though we do everything we can to get all events moved to the new Primary
  - Failover leaves the original Primary in a SHUNNED state
  - Connections are closed immediately
  - Use the recover command to make the failed Primary into a Replica once it is healthy
  - Please see both "Automatic Primary Failover" and "Tungsten Manager Failover Behavior"
For even more details, please visit: "Switching Primary Hosts"
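The practical difference between the two operations can be summarized as an ordered list of simplified steps. This Python sketch is illustrative only. `operation_steps` is a hypothetical helper, and its step wording paraphrases the bullets above; it does not reproduce the Manager's actual internal sequence.

```python
from typing import List


def operation_steps(operation: str) -> List[str]:
    """Return simplified, ordered steps for a 'switch' or 'failover' (illustrative only)."""
    if operation == "switch":
        # Graceful path: no data loss, but it takes longer.
        return [
            "cleanly close client connections",
            "wait for replication to catch up",
            "move the Primary role to the target replica",
            "recover the original Primary as a Replica",
        ]
    if operation == "failover":
        # Immediate path: some events may not reach the new Primary.
        return [
            "close client connections immediately",
            "move the Primary role to the target replica (possible data loss)",
            "leave the original Primary SHUNNED until it is recovered manually",
        ]
    raise ValueError(f"unknown operation: {operation!r}")
```

The design point this captures: a switch trades time for safety, while a failover trades safety for speed and leaves a manual recover step for later.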
Which Target Replica Node To Use?
Picking a target replica node from a pool of candidate database replicas involves several checks and decisions.
For switch commands for both physical and composite services, the user has the ability to pass in the name of the physical or composite replica that is to be the target of the switch. If no target is passed in, or if the operation is an automatic failover, then the Manager has logic to identify the 'most up to date' replica which then becomes the target of the switch or failover.
The Manager applies the following checks, in order, when picking a new primary database node from the available replicas:
- Skip any replica that is either not online or that is not a standby replica.
- Skip any replica that has its status set to ARCHIVE.
- Skip any replica that does not have an online manager.
- Skip any replica that does not have a replicator in either the online or synchronizing state.
- Now we have a target datasource prospect...
- By comparing the last applied sequence number of the current target datasource prospect to any other previously seen prospect, we should eventually end up with a replica that has the highest applied sequence number. We also save the prospect that has the highest stored sequence number.
- If a prospect ties with another prospect for the highest applied or stored sequence number, we compare the datasource precedence. If the precedence differs, we choose the datasource with the lowest precedence number (i.e., a precedence of 1 is higher than a precedence of 2). If the precedence is also tied, we keep the previously chosen replica and discard the replica currently being evaluated.
- After we have evaluated all of the replicas, we will either have a single winner, or one replica will have the highest applied sequence number while another has the highest stored sequence number (i.e., it received the most THL records from the primary prior to the switch operation). In this case, which is particularly important for failover, we choose the replica with the highest number of stored THL records.
- Skip any replica that has a latency higher than the configured threshold; if it is too far behind, do not use that replica. The tpm option property=policy-relay-from-slave=900 controls the check, with 900 seconds as the default value.
- At this point, return whatever target replica we have chosen to the switch or failover command so that the operation can proceed.
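The candidate filtering and tie-breaking rules above can be sketched in Python as follows. This is a simplified model under stated assumptions. The `Replica` fields, the state strings `ONLINE` and `SYNCHRONIZING`, and the function names are hypothetical and do not correspond to the Manager's real data structures. The latency check is not modeled in this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Replica:
    """Hypothetical model of a candidate replica; field names are illustrative."""
    name: str
    applied_seqno: int = 0           # last sequence number applied to the database
    stored_seqno: int = 0            # highest THL sequence number stored locally
    precedence: int = 1              # lower number means higher precedence
    online: bool = True
    standby: bool = True
    archive: bool = False
    manager_online: bool = True
    replicator_state: str = "ONLINE"


def _is_candidate(r: Replica) -> bool:
    """Skip rules: online standby, not ARCHIVE, manager online, replicator usable."""
    return (
        r.online
        and r.standby
        and not r.archive
        and r.manager_online
        and r.replicator_state in ("ONLINE", "SYNCHRONIZING")
    )


def _better(current: Optional[Replica], candidate: Replica, attr: str) -> Replica:
    """Keep whichever replica has the higher value of attr, breaking ties by precedence."""
    if current is None:
        return candidate
    cur, cand = getattr(current, attr), getattr(candidate, attr)
    if cand != cur:
        return candidate if cand > cur else current
    # Tie on sequence number: the lower precedence number wins.
    if candidate.precedence < current.precedence:
        return candidate
    # Tie on precedence too: keep the earlier choice and discard the new candidate.
    return current


def pick_target(replicas: Iterable[Replica]) -> Optional[Replica]:
    """Return the chosen target replica, or None if no replica qualifies."""
    best_applied: Optional[Replica] = None
    best_stored: Optional[Replica] = None
    for r in replicas:
        if not _is_candidate(r):
            continue
        best_applied = _better(best_applied, r, "applied_seqno")
        best_stored = _better(best_stored, r, "stored_seqno")
    if best_applied is best_stored:
        return best_applied  # single winner (or None when there were no candidates)
    # Split decision: prefer the replica holding the most stored THL records.
    return best_stored
```

For example, given two otherwise identical replicas at the same sequence number, with precedence 2 and precedence 1, `pick_target` returns the one with precedence 1.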
After looping over all available replicas, the Manager checks whether the selected target Replica’s appliedLatency is higher than the configured threshold (default: 900 seconds). If the appliedLatency is too far behind, that Replica is not used. The tpm option property=policy-relay-from-slave=900 controls the check.
If no viable Replica is found (or if there is no available Replica to begin with), there will be no switch or failover at this point.
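The final latency check and the "no viable Replica" outcome can be sketched as a small Python function. The function name and the use of `None` to mean "do not switch or fail over" are assumptions made for illustration. Only the 900-second default and the "higher than the threshold" rule come from the text above.

```python
from typing import Optional

# Mirrors the documented default threshold, in seconds (illustrative constant).
DEFAULT_LATENCY_THRESHOLD_S = 900.0


def final_target(selected: Optional[str], applied_latency_s: float,
                 threshold_s: float = DEFAULT_LATENCY_THRESHOLD_S) -> Optional[str]:
    """Return the replica to promote, or None if no switch or failover should happen."""
    if selected is None:
        return None  # no viable Replica to begin with
    if applied_latency_s > threshold_s:
        return None  # too far behind: do not use this Replica
    return selected
```

Returning `None` rather than raising keeps "nothing to do" as an ordinary outcome. This matches the behavior described above, where the operation simply does not proceed.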
For more details on automatic failover versus manual switch, please visit: "Manual Switch Versus Automatic Failover"
Switch and Failover Steps for Local Clusters
For more details on switch and failover steps for local clusters, please visit:
Switch and Failover Steps for Composite Services
For more details on switch and failover steps for composite services, please visit: