When a datasource within the dataservice fails, the dataservice's exact response depends on the configured policy mode. Depending on the mode, the failure and recovery process is handled automatically, or a prescribed sequence of steps must be followed.
Recovery can normally be achieved by following these basic steps:
Use the recover command
The recover command performs a number of steps to try to return the dataservice to an operational state, but it works only if there is an existing Primary within the current configuration. Operations conducted automatically include Replica recovery and role reconfiguration. For example:
[LOGICAL] /alpha > recover
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host2@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host2'
DATA SERVER 'host2' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host2@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
DataSource 'host2' is now OFFLINE
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
FOUND PHYSICAL DATASOURCE TO RECOVER: 'host1@alpha'
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host1@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
DataSource 'host1' is now OFFLINE
RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
RECOVERED 2 DATA SOURCES IN SERVICE 'alpha'
Replica failure, Primary still available
Use the recover command to bring all Replicas back into operation. To recover a single Replica, use the datasource recover command:
[LOGICAL:EXPERT] /alpha > datasource host1 recover
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host1@alpha' TO A SLAVE USING 'host2@alpha' AS THE MASTER
RECOVERY OF 'host1@alpha' WAS SUCCESSFUL
If recovery of the Replica fails with this method, more advanced solutions are available for getting your Replica(s) working, including reprovisioning from another Replica.
For more information, see Section 6.6.1, “Recover a failed Replica”.
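Reprovisioning copies the dataset from a healthy node to the failed one. As a rough sketch, run on the failed Replica — the source hostname host2 is an assumption for illustration, and the available options depend on your installation:

```
shell> tungsten_provision_slave --source=host2
```

Once the provisioning completes, the Replica can be brought back into the dataservice with the datasource recover command shown above.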
Primary failure
If the most up-to-date Primary can be identified, use the recover using command to set the new Primary and recover the remaining Replicas. If this does not work, use the set master command, then use the recover command to bring back as many Replicas as possible, and then use a backup/restore operation, or the tungsten_provision_slave command, to bring any remaining Replicas back into operation. For more information, see Section 6.6.2, “Recover a failed Primary”.
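For example, if host2 is identified as the most up-to-date datasource, the promotion could be sketched from within cctrl as follows — the hostname is illustrative, and the exact argument format accepted by recover using may vary between releases:

```
[LOGICAL] /alpha > recover using 'host2@alpha'
```

If this does not succeed, setting the Primary explicitly with set master host2 and then issuing recover attempts the same promotion and brings back whichever Replicas can still be recovered.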
A summary of these different scenarios and steps is provided in the following table:
Policy Mode | Scenario | Datasource State | Resolution
---|---|---|---
AUTOMATIC | Primary Failure | | Automatic
AUTOMATIC | Primary Recovery | Master:SHUNNED(FAILED-OVER-TO-host2) | Section 6.6.2, “Recover a failed Primary”
AUTOMATIC | Replica Failure | | Automatic
AUTOMATIC | Replica Recovery | | Automatic
MANUAL | Primary Failure | Master:FAILED(NODE 'host1' IS UNREACHABLE) | Section 6.6.2.4, “Failing over a Primary”
MANUAL | Primary Recovery | Master:SHUNNED(FAILED-OVER-TO-host2) | Section 6.6.2.2, “Recover a shunned Primary”
MANUAL | Replica Failure | slave:FAILED(NODE 'host1' IS UNREACHABLE) | Automatically removed from service
MANUAL | Replica Recovery | slave:FAILED(NODE 'host1' IS UNREACHABLE) | Section 6.6.1, “Recover a failed Replica”
MAINTENANCE | Primary Failure | | Use Section 6.6.2.4, “Failing over a Primary” to promote a different Replica
MAINTENANCE | Primary Recovery | | Section 6.6.2.3, “Manually Failing over a Primary in MAINTENANCE policy mode”
MAINTENANCE | Replica Failure | | N/A
MAINTENANCE | Replica Recovery | | N/A
Any | Replica Shunned | slave:SHUNNED(MANUALLY-SHUNNED) | Section 6.6.1, “Recover a failed Replica”
Any | No Primary | slave:SHUNNED(SHUNNED) | Section 6.6.2.1, “Recover when there are no Primaries”