The Primary host within a dataservice can be switched either automatically or manually. Automatic switching occurs when the dataservice is in the AUTOMATIC policy mode and a failure in the underlying datasource has been identified. The automatic process is designed to keep the dataservice running without requiring manual intervention.
Manual switching of the Primary can be performed during maintenance operations, for example during an upgrade or dataserver modification. In this situation, the Primary must be taken out of service manually, without affecting the rest of the dataservice. By switching the Primary to another datasource in the dataservice, the original Primary can be put offline, or shunned, while maintenance occurs. Once the maintenance has been completed, the datasource can be re-enabled, and can either remain as a Replica or be switched back to become the Primary datasource.
Switching a datasource, whether automatically or manually, occurs while the dataservice is running, and without affecting the operation of the dataservice as a whole. Client application connections through Tungsten Connector are automatically reassigned to the datasources in the dataservice, and application operation will be unaffected by the change. Switching the datasource manually requires a single command that performs all of the required steps, monitoring and managing the switch process.
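As an illustration of the single-command manual switch, the following Python sketch builds the command text and hands it to a runner. It assumes the cctrl commands switch and switch to <host>, and it assumes that cctrl is on the PATH and accepts commands on standard input; verify both against your installation. The function names and the host name host2 are hypothetical and exist only for this example.

```python
import subprocess
from typing import Callable, Optional


def build_switch_command(target: Optional[str] = None) -> str:
    """Return the cctrl command text for a manual switch.

    With no target, the cluster chooses the Replica to promote;
    with a target, that host is requested as the new Primary.
    """
    if target is None:
        return "switch"
    if not target or any(ch.isspace() for ch in target):
        raise ValueError("target must be a single host name")
    return f"switch to {target}"


def _run_cctrl(command: str) -> str:
    # Assumption: cctrl is on PATH and reads commands from standard input.
    result = subprocess.run(
        ["cctrl"],
        input=command + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def manual_switch(target: Optional[str] = None,
                  runner: Callable[[str], str] = _run_cctrl) -> str:
    """Issue the switch through the given runner and return its output."""
    return runner(build_switch_command(target))
```

Passing a different runner makes it possible to try the helper without a cluster, for example manual_switch("host2", runner=print_and_return) with a function of your own that only echoes the command.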
Switching the Primary, manually or automatically, performs the following steps within the dataservice:
Set the Primary node to offline state. New connections to the Primary are rejected, and writes to the Primary are stopped.
On the Replica that will be promoted, switch the datasource offline. New connections are rejected, stopping reads on this Replica.
Kill any outstanding client connections to the Primary datasource, except those belonging to the tungsten account.
Send a heartbeat transaction between the Primary and the Replica, and wait until this transaction has been received. Once it has been received, the THL on the Primary and on the Replica is up to date.
Perform the switch:
Set all remaining replicators offline.
Configure the selected Replica as the new Primary.
Set the new Primary to the online state.
New connections to the Primary are permitted.
Configure the remaining Replicas to use the new Primary as the Primary datasource.
Update the connector configurations and allow clients to connect to the Primaries and Replicas.
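The steps above can be sketched as a small in-memory model. This is a toy illustration of the ordering only, not Tungsten Cluster code: the Datasource class, the switch_primary function, and the host names are hypothetical, the heartbeat is assumed to arrive immediately, and the old Primary is assumed to rejoin as a Replica of the new Primary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Datasource:
    name: str
    role: str                      # "primary" or "replica"
    state: str = "online"          # "online" or "offline"
    source: Optional[str] = None   # Primary that this Replica follows
    connections: List[str] = field(default_factory=list)  # accounts with open connections


def switch_primary(nodes: Dict[str, Datasource], target: str) -> List[str]:
    """Promote nodes[target] to Primary and return a log of the steps."""
    old = next(n for n in nodes.values() if n.role == "primary")
    new = nodes[target]
    if new.role != "replica":
        raise ValueError(f"{target} is not a Replica")
    log: List[str] = []

    # Step 1: take the Primary offline so that writes stop.
    old.state = "offline"
    log.append(f"{old.name}: Primary set offline")

    # Step 2: take the Replica to be promoted offline so that reads stop.
    new.state = "offline"
    log.append(f"{new.name}: Replica set offline")

    # Step 3: drop client connections on the old Primary, keeping the tungsten account.
    old.connections = [c for c in old.connections if c == "tungsten"]
    log.append(f"{old.name}: client connections killed")

    # Step 4: heartbeat. This model has no THL, so arrival is simply assumed.
    log.append("heartbeat received")

    # Step 5: perform the switch.
    for n in nodes.values():
        if n is not new:
            n.state = "offline"
    new.role, new.source, new.state = "primary", None, "online"
    for n in nodes.values():
        if n is not new:
            n.role, n.source = "replica", new.name
    log.append(f"{new.name}: promoted to Primary")

    # Step 6: connectors are updated and clients may connect again.
    for n in nodes.values():
        n.state = "online"
    log.append("connectors updated")
    return log
```

For example, with host1 as Primary and host2 and host3 as Replicas, calling switch_primary(nodes, "host2") leaves host2 as the Primary and both host1 and host3 following it.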
The switching process is monitored by Tungsten Cluster. If the process fails, either because of a timeout or because a recoverable error occurs, the switch operation is rolled back, returning the dataservice to its original configuration. This ensures that the dataservice remains operational. In some circumstances, when performing a manual switch, the command may need to be repeated to ensure that the requested switch operation completes.
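The rollback-and-repeat behaviour can be illustrated with a generic snapshot-and-restore pattern. This is a minimal sketch of the idea, not how Tungsten Cluster implements it: the configuration is modelled as a plain dictionary of host roles, and RecoverableSwitchError, switch_with_rollback, and the attempts parameter are hypothetical names.

```python
import copy
from typing import Callable, Dict


class RecoverableSwitchError(Exception):
    """Hypothetical error type standing in for a recoverable failure."""


def switch_with_rollback(config: Dict[str, str],
                         do_switch: Callable[[Dict[str, str]], None],
                         attempts: int = 2) -> bool:
    """Try the switch up to `attempts` times.

    On a timeout or recoverable error, restore the configuration that was
    in place before the attempt. Returns True if a switch completed.
    """
    for _ in range(attempts):
        snapshot = copy.deepcopy(config)
        try:
            do_switch(config)
            return True
        except (TimeoutError, RecoverableSwitchError):
            # Roll back: discard partial changes and restore the snapshot.
            config.clear()
            config.update(snapshot)
    return False


# Usage: the first attempt times out part-way through, the second succeeds.
roles = {"host1": "primary", "host2": "replica"}
calls = {"count": 0}


def flaky_switch(cfg: Dict[str, str]) -> None:
    calls["count"] += 1
    cfg["host1"] = "replica"          # partial change
    if calls["count"] == 1:
        raise TimeoutError("heartbeat not received in time")
    cfg["host2"] = "primary"


completed = switch_with_rollback(roles, flaky_switch)
```

Because the snapshot is restored before any retry, a failed attempt never leaves the modelled dataservice without a Primary.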
The process takes a finite amount of time to complete. The exact timing and duration depend on the state, health, and database activity of the dataservice, and in particular on how up to date the Replica being promoted is compared to the Primary. After a delay period, the switch takes place regardless of the current status.