When a datasource within the dataservice fails, the exact response of the dataservice depends on the dataservice policy mode. Depending on the policy mode, the failure and recovery process is either handled automatically, or a prescribed sequence of steps must be followed.
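The policy mode is managed from within cctrl. As a reference point for the examples below, the following minimal sketch shows switching the alpha service into MAINTENANCE mode and back to AUTOMATIC; the confirmation messages are illustrative and may differ between versions:

[LOGICAL] /alpha > set policy maintenance
policy mode is now MAINTENANCE

[LOGICAL] /alpha > set policy automatic
policy mode is now AUTOMATIC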
Recovery can normally be achieved by following these basic steps:
Use the recover command
The recover command performs a number of steps to try to return the dataservice to an operational state, but it works only if there is an existing Primary within the current configuration. Operations conducted automatically include Replica recovery and reconfiguring roles. For example:
[LOGICAL] /alpha > recover
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host2@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host2'
DATA SERVER 'host2' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host2@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
DataSource 'host2' is now OFFLINE
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host1@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host1@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
DataSource 'host1' is now OFFLINE
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
RECOVERED 2 DATA SOURCES IN SERVICE 'alpha'
Replica failure, Primary still available
[LOGICAL:EXPERT] /alpha > datasource host1 recover
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host1@alpha' TO A SLAVE USING 'host2@alpha' AS THE MASTER
RECOVERY OF 'host1@alpha' WAS SUCCESSFUL
If recovery of the Replica fails with this method, you can try more advanced solutions for getting your Replica(s) working, including reprovisioning from another Replica. For more information, see Section 5.6.1, “Recover a failed Replica”.
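One such solution is to wipe and reprovision the failed Replica from a healthy node using the tungsten_provision_slave tool. The following sketch assumes host1 is the failed Replica and host2 is a healthy donor; it is run as the OS user that owns the Tungsten installation on host1, and the available options should be confirmed against your release:

shell> tungsten_provision_slave --source=host2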
If the most up-to-date Primary can be identified, use the recover using command to set the new Primary and recover the remaining Replicas. If this does not work, use the set master command, then use the recover command to bring back as many Replicas as possible, and then use a backup/restore operation, or the tungsten_provision_slave command, to bring any remaining Replicas back into operation. For more information, see Section 5.6.2, “Recover a failed Primary”.
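Put together, a Primary recovery attempt might look like the following sketch inside cctrl, assuming host3 has been identified as holding the most up-to-date data. The command names come from the text above, but the exact argument syntax and output differ between releases (for example, whether the datasource is written as host3 or alpha/host3), so confirm with the built-in help before running them:

[LOGICAL] /alpha > recover using host3
[LOGICAL] /alpha > recover

If that fails, force the role in expert mode and then recover the remaining Replicas:

[LOGICAL] /alpha > expert
[LOGICAL:EXPERT] /alpha > set master host3
[LOGICAL:EXPERT] /alpha > recover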
A summary of these different scenarios and steps is provided in the following table:
|Policy Mode|Scenario|Datasource State|Resolution|
|---|---|---|---|
| |Primary Recovery|Master:SHUNNED(FAILED-OVER-TO-host2)|Section 5.6.2, “Recover a failed Primary”|
| |Primary Failure|Master:FAILED(NODE 'host1' IS UNREACHABLE)|See “Failing over a Primary”|
| |Primary Recovery|Master:SHUNNED(FAILED-OVER-TO-host2)|See “Recover a shunned Primary”|
| |Replica Failure|slave:FAILED(NODE 'host1' IS UNREACHABLE)|Automatically removed from service|
| |Replica Recovery|slave:FAILED(NODE 'host1' IS UNREACHABLE)|Section 5.6.1, “Recover a failed Replica”|
| |Primary Failure| |Use “Failing over a Primary” to promote a different Replica; see also “Manually Failing over a Primary in …”|
| |Replica Shunned|slave:SHUNNED(MANUALLY-SHUNNED)|Section 5.6.1, “Recover a failed Replica”|
| |No Primary|slave:SHUNNED(SHUNNED)|See “Recover when there are no Primaries”|
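The datasource states listed above are reported by the ls command within cctrl. The abbreviated extract below is an illustrative sketch showing where the role:STATE pairs appear for a three-node alpha service with one unreachable Replica; real output includes additional status blocks and varies between versions:

[LOGICAL] /alpha > ls
...
|host1(master:ONLINE, progress=15, THL latency=0.105)|
|host2(slave:ONLINE, progress=15, latency=0.120)|
|host3(slave:FAILED(NODE 'host3' IS UNREACHABLE), progress=-1, latency=-1.000)|
...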