5.6. Datasource Recovery Steps

When a datasource within the dataservice fails, the exact response of the dataservice depends on the dataservice policy mode. Depending on the policy mode, either the failure and recovery process is handled automatically, or a prescribed sequence of steps must be followed.

Recovery can normally be achieved by following these basic steps:

  • Use the recover command

    The recover command performs a number of steps to try to return the dataservice to an operational state, but it works only if there is an existing Primary within the current configuration. Operations conducted automatically include Replica recovery and role reconfiguration. For example:

    [LOGICAL] /alpha > recover
    FOUND PHYSICAL DATASOURCE TO RECOVER: 'host2@alpha'
    VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host2'
    DATA SERVER 'host2' IS NOW AVAILABLE FOR CONNECTIONS
    RECOVERING 'host2@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
    DataSource 'host2' is now OFFLINE
    RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
    FOUND PHYSICAL DATASOURCE TO RECOVER: 'host1@alpha'
    VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
    DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
    RECOVERING 'host1@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
    DataSource 'host1' is now OFFLINE
    RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
    RECOVERED 2 DATA SOURCES IN SERVICE 'alpha'
  • Replica failure, Primary still available

    Use the recover command to bring all Replicas back into operation. To bring back a single Replica, use the datasource recover command:

    [LOGICAL:EXPERT] /alpha > datasource host1 recover
    VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
    DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
    RECOVERING 'host1@alpha' TO A SLAVE USING 'host2@alpha' AS THE MASTER
    RECOVERY OF 'host1@alpha' WAS SUCCESSFUL

    If recovery of the Replica fails with this method, you can try more advanced solutions for getting your Replica(s) working, including reprovisioning from another Replica.

    For more information, see Section 5.6.1, “Recover a failed Replica”.

  • Primary failure

    If the most up-to-date Primary can be identified, use the recover using command to set the new Primary and recover the remaining Replicas. If this does not work, use the set master command, then use the recover command to bring back as many Replicas as possible, and finally use a backup/restore operation, or the tungsten_provision_slave command, to bring any remaining Replicas back into operation. For more information, see Section 5.6.2, “Recover a failed Primary”.
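
    The Primary failure sequence above can be sketched as the following cctrl session. The hostnames are illustrative, and the exact prompts and output depend on your configuration; see Section 5.6.2 for the authoritative procedure.

    # If host2 is known to be the most up-to-date node, promote it and
    # recover the remaining Replicas in one step:
    [LOGICAL] /alpha > recover using 'host2@alpha'

    # If that fails, set the Primary explicitly, then recover the Replicas:
    [LOGICAL] /alpha > set master host2
    [LOGICAL] /alpha > recover

    # Any Replica that still cannot be recovered can be reprovisioned from
    # the operating system shell on the failed node:
    shell> tungsten_provision_slave --source host2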

A summary of these different scenarios and steps is provided in the following table:

Policy Mode  Scenario          Datasource State                             Resolution
AUTOMATIC    Primary Failure   -                                            Automatic
             Primary Recovery  Master:SHUNNED(FAILED-OVER-TO-host2)         Section 5.6.2, “Recover a failed Primary”
             Replica Failure   -                                            Automatic
             Replica Recovery  -                                            Automatic
MANUAL       Primary Failure   Master:FAILED(NODE 'host1' IS UNREACHABLE)   Section 5.6.2.4, “Failing over a Primary”
             Primary Recovery  Master:SHUNNED(FAILED-OVER-TO-host2)         Section 5.6.2.2, “Recover a shunned Primary”
             Replica Failure   slave:FAILED(NODE 'host1' IS UNREACHABLE)    Automatically removed from service
             Replica Recovery  slave:FAILED(NODE 'host1' IS UNREACHABLE)    Section 5.6.1, “Recover a failed Replica”
MAINTENANCE  Primary Failure   -                                            Use Section 5.6.2.4, “Failing over a Primary” to promote a different Replica
             Primary Recovery  -                                            Section 5.6.2.3, “Manually Failing over a Primary in MAINTENANCE policy mode”
             Replica Failure   -                                            N/A
             Replica Recovery  -                                            N/A
Any          Replica Shunned   slave:SHUNNED(MANUALLY-SHUNNED)              Section 5.6.1, “Recover a failed Replica”
             No Primary        slave:SHUNNED(SHUNNED)                       Section 5.6.2.1, “Recover when there are no Primaries”
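
For the Replica Shunned case in the table, recovery is typically a matter of welcoming the datasource back into the dataservice. A minimal sketch, assuming host1 is the manually shunned Replica; consult Section 5.6.1 for the full procedure:

    [LOGICAL] /alpha > datasource host1 welcome
    [LOGICAL] /alpha > datasource host1 online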