5.7. Composite Cluster Switching, Failover and Recovery
Switching of a dataservice is done to transfer the Master role from one
cluster to another, usually in another datacenter site. This also has the
effect of turning the original Master into a Relay. The master dataservice
within a composite cluster can also be forced to fail over to the slave
dataservice if the master dataservice is offline.
Switching the master dataservice performs the following steps:
1. Set the master node to offline state. New connections to the master are
   rejected, and writes to the master are stopped.
2. On the relay in the target cluster, switch the datasource offline. New
   connections are rejected, stopping reads on this relay.
3. Kill any outstanding client connections to the master data source,
   except those belonging to the
4. Send a heartbeat transaction between the old master and the new master,
   and wait until this transaction has been received. Once received, the
   THL on the master and slave is up to date.
5. Perform the switch:
   a. Configure all remaining replicators offline.
   b. Configure the target cluster relay node as the new master.
   c. Set the new master to the online state. New connections to the
      master are permitted.
6. Configure the old master to be a relay datasource.
7. Configure the slaves in the primary site to use the new master.
8. Configure the slaves in the slave site to use the new relay datasource.
9. Update the connector configurations and enable client connections to
   connect to the masters and slaves.
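
The role changes in the sequence above can be pictured on a small in-memory
model. The following Python sketch is illustrative only: the Node class, the
composite_switch function, and the host and cluster names are hypothetical and
are not part of Tungsten Clustering. It assumes a composite dataservice with
exactly two clusters, and it leaves out connection handling and the heartbeat
wait.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a datasource; not a Tungsten API."""
    name: str
    cluster: str
    role: str             # "master", "relay", or "slave"
    state: str = "ONLINE"
    source: str = ""      # name of the node this one replicates from


def composite_switch(nodes, target_cluster):
    """Apply the role changes of a composite switch to the model in place."""
    old_master = next(n for n in nodes if n.role == "master")
    new_master = next(
        n for n in nodes if n.cluster == target_cluster and n.role == "relay"
    )

    # Take the current master and the target relay offline.
    old_master.state = "OFFLINE"
    new_master.state = "OFFLINE"

    # Promote the relay in the target cluster and bring it online.
    new_master.role, new_master.source = "master", ""
    new_master.state = "ONLINE"

    # Demote the old master to a relay fed by the new master.
    # Assumption: it returns to the online state once reconfigured.
    old_master.role, old_master.source = "relay", new_master.name
    old_master.state = "ONLINE"

    # Repoint slaves: those in the new primary site follow the new master,
    # those in the other site follow the new relay (two-cluster assumption).
    for n in nodes:
        if n.role == "slave":
            if n.cluster == target_cluster:
                n.source = new_master.name
            else:
                n.source = old_master.name
    return nodes


nodes = [
    Node("east1", "east", "master"),
    Node("east2", "east", "slave", source="east1"),
    Node("west1", "west", "relay", source="east1"),
    Node("west2", "west", "slave", source="west1"),
]
composite_switch(nodes, "west")
for n in nodes:
    print(n.name, n.role, n.state, n.source or "-")
```

After the call, west1 holds the master role, east1 is a relay replicating from
west1, and each slave replicates from the master or relay in its own cluster.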
The switching process is monitored by Tungsten Clustering, and if the process
fails, either because of a timeout or because a recoverable error occurs, the switch
operation is rolled back, returning the dataservice to the original
configuration. This ensures that the dataservice remains operational. In
some circumstances, when performing a manual switch, the command may need to
be repeated to ensure the requested switch operation completes.
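
The rollback and repeat behaviour described above can be sketched as a
transaction over a copy of the configuration. This is a minimal Python
illustration, not the product's implementation: SwitchFailed,
switch_with_rollback, switch_with_retry, and the dictionary layout are all
hypothetical names chosen for the example.

```python
import copy


class SwitchFailed(Exception):
    """Hypothetical error for a timeout or recoverable failure in a step."""


def switch_with_rollback(config, steps):
    """Run steps on a working copy; keep the original if any step fails."""
    working = copy.deepcopy(config)
    try:
        for step in steps:
            step(working)
    except SwitchFailed:
        return config, False    # rolled back: original configuration kept
    return working, True


def switch_with_retry(config, steps, max_attempts=3):
    """Repeat the switch, as an operator might repeat a manual command."""
    for attempt in range(1, max_attempts + 1):
        config, ok = switch_with_rollback(config, steps)
        if ok:
            return config, attempt
    return config, None         # still the original configuration


def promote_west(cfg):
    """Example step: swap the master and relay clusters."""
    cfg["master"], cfg["relay"] = "west", "east"


def make_flaky_step(failures):
    """Return a step that fails `failures` times and then succeeds."""
    remaining = [failures]

    def step(cfg):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise SwitchFailed("heartbeat wait timed out")

    return step


config = {"master": "east", "relay": "west"}
steps = [promote_west, make_flaky_step(1)]
final, attempts = switch_with_retry(config, steps)
print(final, attempts)
```

Because each attempt works on a copy, a failure part-way through leaves the
original configuration untouched, which mirrors the guarantee that the
dataservice remains operational.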
The process takes a finite amount of time to complete. The exact timing and
duration depend on the state, health, and database activity of the
dataservice, and in particular on how up to date the slave being promoted is
compared to the master. After a delay period, the switch takes place
regardless of the current status.
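
The effect of the delay period can be illustrated with a bounded wait: poll
until the node being promoted has caught up, but stop waiting once the delay
has elapsed. The Python sketch below is an assumption-laden illustration; the
wait_for_catch_up function, its parameters, and the notion of a numeric lag
value are hypothetical and do not correspond to a Tungsten Clustering API or
setting.

```python
import time


def wait_for_catch_up(get_lag, delay_period, poll_interval=0.01):
    """Wait until get_lag() reports zero or delay_period seconds elapse.

    Returns (caught_up, seconds_waited). The caller proceeds with the
    switch in either case, which models "regardless of the current status".
    """
    start = time.monotonic()
    deadline = start + delay_period
    while time.monotonic() < deadline:
        if get_lag() <= 0:
            return True, time.monotonic() - start
        time.sleep(poll_interval)
    return False, time.monotonic() - start


# A node whose lag shrinks to zero is promoted as soon as it catches up.
lags = iter([3, 2, 1, 0])
caught_up, waited = wait_for_catch_up(lambda: next(lags), delay_period=1.0)
print("caught up:", caught_up)

# A node that never catches up is still promoted once the delay expires.
caught_up, waited = wait_for_catch_up(lambda: 5, delay_period=0.05)
print("caught up:", caught_up)
```

In the first call the wait ends early because the lag reaches zero; in the
second the wait ends only when the delay period runs out.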