When a datasource within the dataservice fails, the response of the dataservice depends on its policy mode. Depending on the mode, failure and recovery are either handled automatically or a prescribed sequence of steps must be followed.
Recovery can normally be achieved by following these basic steps:
Use the recover command
The recover command performs a number of steps to try to return the dataservice to the operational state, but it works only if there is an existing master within the current configuration. Operations conducted automatically include slave recovery and role reconfiguration. For example:
[LOGICAL] /alpha > recover
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host2@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host2'
DATA SERVER 'host2' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host2@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
DataSource 'host2' is now OFFLINE
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host1@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host1@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
DataSource 'host1' is now OFFLINE
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
RECOVERED 2 DATA SOURCES IN SERVICE 'alpha'
Slave failure, Master still available
[LOGICAL:EXPERT] /alpha > datasource host1 recover
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host1@alpha' TO A SLAVE USING 'host2@alpha' AS THE MASTER
RECOVERY OF 'host1@alpha' WAS SUCCESSFUL
If recovery of the slave fails with this method, you can try more advanced solutions for getting the slave working, including reprovisioning it from another slave.
For more information, see Section 5.6.1, “Recover a failed slave”.
If the most up-to-date master can be identified, use the recover using command to set the new master and recover the remaining slaves. If this does not work, use the set master command, then use the recover command to bring back as many slaves as possible. Bring any remaining slaves back into operation with a backup/restore operation or with the tungsten_provision_slave command. For more information, see Section 5.6.2, “Recover a failed master”.
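The recover output shown earlier follows a regular pattern, so a script can check which datasources were recovered and whether the final count agrees. The following is a minimal sketch, not part of the product: it assumes the output has already been captured as text, and the function name `summarize_recover_output` and both regular expressions are hypothetical, modelled only on the sample output in this section.

```python
import re

# Assumption: these patterns mirror the sample output in this section only.
# Other releases may word these lines differently.
RECOVERING = re.compile(
    r"RECOVERING '(?P<host>[^'@]+)@(?P<service>[^']+)' TO A SLAVE "
    r"USING '(?P<master>[^'@]+)@[^']+' AS THE MASTER"
)
# "SOURCES?" also accepts a singular form, which is a guess, not documented.
SUMMARY = re.compile(
    r"RECOVERED (?P<count>\d+) DATA SOURCES? IN SERVICE '(?P<service>[^']+)'"
)


def summarize_recover_output(text):
    """Return (recovered, reported_count) from captured recover output."""
    # One dict per recovered datasource: host, service, and the master used.
    recovered = [m.groupdict() for m in RECOVERING.finditer(text)]
    summary = SUMMARY.search(text)
    reported = int(summary.group("count")) if summary else None
    return recovered, reported


# Key lines copied from the example session in this section.
sample = """\
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host2@alpha'
RECOVERING 'host2@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host1@alpha'
RECOVERING 'host1@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
RECOVERED 2 DATA SOURCES IN SERVICE 'alpha'
"""

recovered, reported = summarize_recover_output(sample)
for entry in recovered:
    print(f"{entry['host']} is now a slave of {entry['master']} in {entry['service']}")
print("count matches summary:", reported == len(recovered))
```

Because the patterns are matched with `finditer` rather than line by line, the sketch also works when the captured output has lost its line breaks.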
A summary of these different scenarios and steps is provided in the following table:
| Policy Mode | Scenario | Datasource State | Resolution |
| | Master Recovery | master:SHUNNED(FAILED-OVER-TO-host2) | Section 5.6.2, “Recover a failed master” |
| | Master Failure | master:FAILED(NODE 'host1' IS UNREACHABLE) | Section 184.108.40.206, “Failing over a master” |
| | Master Recovery | master:SHUNNED(FAILED-OVER-TO-host2) | Section 220.127.116.11, “Recover a shunned master” |
| | Slave Failure | slave:FAILED(NODE 'host1' IS UNREACHABLE) | Automatically removed from service |
| | Slave Recovery | slave:FAILED(NODE 'host1' IS UNREACHABLE) | Section 5.6.1, “Recover a failed slave” |
| | Master Failure | | Use Section 18.104.22.168, “Failing over a master” to promote a different slave |
Section 22.214.171.124, “Manually Failing over a Master in
| | Slave Shunned | slave:SHUNNED(MANUALLY-SHUNNED) | Section 5.6.1, “Recover a failed slave” |
| | No Master | slave:SHUNNED(SHUNNED) | Section 126.96.36.199, “Recover when there are no masters” |
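The datasource states in the table share the shape role:STATE(reason), which makes them easy to split apart in a script. The sketch below is illustrative only: the names `parse_state`, `candidate_sections`, and `CANDIDATES` are hypothetical, and the mapping is copied from the table rather than from any product API. Because the same state can appear in more than one row, the lookup returns every candidate section and leaves the choice to the operator, who knows the policy mode and scenario.

```python
import re

# Assumption: states look like role:STATE(reason), as in the table above.
STATE = re.compile(r"^(?P<role>[a-z]+):(?P<state>[A-Z]+)\((?P<reason>.*)\)$")

# Candidate section titles per (role, state), copied from the table.
# A failed slave may also simply be removed from service automatically.
CANDIDATES = {
    ("master", "FAILED"): ["Failing over a master"],
    ("master", "SHUNNED"): ["Recover a failed master", "Recover a shunned master"],
    ("slave", "FAILED"): ["Recover a failed slave"],
    ("slave", "SHUNNED"): ["Recover a failed slave", "Recover when there are no masters"],
}


def parse_state(text):
    """Split a datasource state string into (role, state, reason)."""
    match = STATE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised datasource state: {text!r}")
    return match.group("role"), match.group("state"), match.group("reason")


def candidate_sections(text):
    """Return the section titles from the table that may apply to this state."""
    role, state, _reason = parse_state(text)
    return CANDIDATES.get((role, state), [])


for example in (
    "master:SHUNNED(FAILED-OVER-TO-host2)",
    "slave:FAILED(NODE 'host1' IS UNREACHABLE)",
    "slave:SHUNNED(MANUALLY-SHUNNED)",
):
    print(example, "->", candidate_sections(example))
```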