This section describes the main differences between the manual and automatic scenarios.
Manual Switch
The entry point for manual switch/failover is the cctrl command line tool.
For switch commands on both physical and composite services, the user can pass in the name of the physical or composite replica that is to be the target of the switch.
If no target is passed in, the Manager identifies the 'most up to date' replica, which then becomes the target of the switch. The logic takes the following steps:
Skip any replica that is either not online or that is NOT a standby replica.
Skip any replica that has its status set to ARCHIVE.
Skip any replica that does not have an online manager.
Skip any replica that does not have a replicator in either online or synchronizing state.
Now we have a target datasource prospect.
By comparing the last applied sequence number of the current prospect to that of the best prospect seen so far, we eventually end up with the replica that has the highest applied sequence number. We also save the prospect that has the highest stored sequence number.
If two prospects tie on the highest applied or stored sequence number, we compare their datasource precedence. If the precedence differs, we choose the datasource with the lowest precedence number, i.e. a precedence of 1 is higher than a precedence of 2. If the precedence is also tied, we keep the replica chosen previously and discard the replica currently being evaluated.
After we have evaluated all of the replicas, we will either have a single winner, or one replica will have the highest applied sequence number while another has the highest stored sequence number, i.e. it received the most THL records from the primary prior to the switch operation. In this case, which is particularly important for failover, we choose the replica that has stored the most THL records.
At this point we return whatever target replica we have determined to the switch or failover command so that it can continue.
After looping over all available replicas, check the selected target replica's appliedLatency to see if it is higher than the configured threshold (default: 900 seconds). If the appliedLatency exceeds the threshold, do not use that replica.
The tpm option --property=policy.slave.promotion.latency.threshold=900 controls the check.
If no viable replica is found (or if there is no available replica to begin with), there will be no switch or failover at this point.
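The selection steps above can be sketched in Python as follows. This is only an illustration of the described logic, not the Manager's actual code: the Replica class, its field names, and the pick_target and better functions are all hypothetical, and the online/standby check from the first step is reduced to a single state_ok flag.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Replica:
    # Hypothetical model; field names are illustrative only.
    name: str
    state_ok: bool = True            # passes the online/standby check (first step)
    archive: bool = False            # status set to ARCHIVE
    manager_online: bool = True
    replicator_state: str = "ONLINE"
    applied_seqno: int = 0
    stored_seqno: int = 0
    precedence: int = 1              # lower number means higher precedence
    applied_latency: float = 0.0     # seconds


def better(candidate: Replica, current: Optional[Replica], key: str) -> bool:
    """Return True if candidate should replace current for the given seqno field."""
    if current is None:
        return True
    cand_seqno, cur_seqno = getattr(candidate, key), getattr(current, key)
    if cand_seqno != cur_seqno:
        return cand_seqno > cur_seqno
    # Tie on seqno: the lower precedence number wins.
    # A full tie keeps the replica chosen previously.
    return candidate.precedence < current.precedence


def pick_target(replicas: Iterable[Replica],
                latency_threshold: float = 900.0) -> Optional[Replica]:
    best_applied: Optional[Replica] = None
    best_stored: Optional[Replica] = None
    for r in replicas:
        if not r.state_ok or r.archive or not r.manager_online:
            continue
        if r.replicator_state not in ("ONLINE", "SYNCHRONIZING"):
            continue
        # r is now a target prospect.
        if better(r, best_applied, "applied_seqno"):
            best_applied = r
        if better(r, best_stored, "stored_seqno"):
            best_stored = r
    # When the two winners differ, prefer the one with the most stored THL.
    target = best_stored if best_applied is not best_stored else best_applied
    if target is None:
        return None  # no viable replica: no switch or failover
    if target.applied_latency > latency_threshold:
        return None  # selected replica is too far behind
    return target
```

With a replica that has applied the most but stored less, and another that has stored the most, pick_target returns the latter, matching the failover preference described above.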
Once the target replica has been identified and validated, the switch/failover continues and follows the steps outlined in the Switch and Failover Steps section.
Automatic Failover
NOTE: Automatic failover only applies to local (physical) primary or relay datasources.
The entry point for automatic failover (there is no such thing as an automatic switch) can be found in two rules:
0200a: RECOVER PRIMARY DATASOURCE BY FAILING OVER - NON-REPLICATOR FAULT
0200b: RECOVER PRIMARY DATASOURCE BY FAILING OVER - REPLICATOR FAULT
The main difference between these two rules is that 0200b (failover because of a replicator fault) requires an expired ReplicatorFaultAlarm to exist. Such an alarm will only exist if the customer has a non-default policy configuration that says 'I want to fail over my primary if my replicator goes into a fault condition'.
This is achieved by setting the following properties in tungsten-manager/conf/manager.properties:
policy.fence.masterReplicator=true
tells the manager to set the associated primary or relay
datasource to the failed state if the replicator has a fault. This
will cause a failover if the cluster is in automatic mode.
policy.fence.masterReplicator.threshold=6
tells the manager to do six iterations to try to recover the
replicator, one iteration every 10 seconds, after which the
fencing occurs.
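As a rough illustration of the threshold behaviour, the sketch below retries a recovery callback up to six times, ten seconds apart, and reports fencing only if every attempt fails, which puts the fencing decision about 60 seconds after the fault. The function name, the return labels, and the injected sleep parameter are assumptions made for the example, not part of the Manager, and whether the final wait is observed before fencing is also an assumption.

```python
import time
from typing import Callable


def recover_or_fence(try_recover: Callable[[], bool],
                     threshold: int = 6,
                     interval_seconds: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry recovery up to `threshold` times, then report fencing."""
    for _ in range(threshold):
        if try_recover():
            return "RECOVERED"       # replicator came back, no fencing
        sleep(interval_seconds)      # wait before the next iteration
    return "FENCED"                  # datasource would be set to failed
```

Passing a no-op or recording function as sleep lets the loop be exercised without waiting for real.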
The other difference, which applies to both rules, is that the rules ensure there is at least one potential target replica for the failover before they call the method that selects the target. So some of the validation that the manual case performs inside the method is instead done in the body of the rules. The reason is that we would prefer not to initiate a failover that has no chance of succeeding.
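The guard conditions described for the two rules can be summarised in a small sketch. The function and parameter names are hypothetical, and the real rules evaluate considerably more state than this; the sketch only shows that nothing fires outside automatic mode or without a potential target, and that 0200b additionally depends on the expired ReplicatorFaultAlarm.

```python
from typing import Optional, Sequence


def rule_to_fire(automatic_mode: bool,
                 datasource_fault: bool,
                 replicator_alarm_expired: bool,
                 potential_targets: Sequence[str]) -> Optional[str]:
    """Return the rule that would start a failover, or None."""
    # No failover outside automatic mode, or when it cannot succeed.
    if not automatic_mode or not potential_targets:
        return None
    if replicator_alarm_expired:
        return "0200b"   # replicator fault, fencing policy enabled
    if datasource_fault:
        return "0200a"   # non-replicator fault
    return None
```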