5.6. Datasource Recovery Steps

When a datasource within the dataservice fails, the exact response of the dataservice depends on the dataservice policy mode. Depending on the policy mode, the failure or recovery process is either handled automatically, or a prescribed sequence of steps must be followed.

Recovery can normally be achieved by following these basic steps:

  • Use the recover command

    The recover command performs a number of steps to try to return the dataservice to an operational state, but it works only if there is an existing master within the current configuration. Operations performed automatically include recovering slaves and reconfiguring roles. For example:

    [LOGICAL] /alpha > recover
    FOUND PHYSICAL DATASOURCE TO RECOVER: 'host2@alpha'
    VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host2'
    DATA SERVER 'host2' IS NOW AVAILABLE FOR CONNECTIONS
    RECOVERING 'host2@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
    DataSource 'host2' is now OFFLINE
    RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
    FOUND PHYSICAL DATASOURCE TO RECOVER: 'host1@alpha'
    VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
    DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
    RECOVERING 'host1@alpha' TO A SLAVE USING 'host3@alpha' AS THE MASTER
    DataSource 'host1' is now OFFLINE
    RECOVERY OF DATA SERVICE 'alpha' SUCCEEDED
    RECOVERED 2 DATA SOURCES IN SERVICE 'alpha'

  • Slave failure, master still available

    Use the recover command to bring all slaves back into operation. To bring back a single slave, use the datasource recover command:

    [LOGICAL:EXPERT] /alpha > datasource host1 recover
    VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
    DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
    RECOVERING 'host1@alpha' TO A SLAVE USING 'host2@alpha' AS THE MASTER
    RECOVERY OF 'host1@alpha' WAS SUCCESSFUL

    If recovery of the slave fails with this method, you can try more advanced solutions for getting your slave(s) working, including reprovisioning from another slave.

    For more information, see Section 5.6.1, “Recover a failed slave”.

  • Master failure

    If the most up-to-date master can be identified, use the recover using command to set the new master and recover the remaining slaves. If this does not work, use the set master command, then use the recover command to bring back as many slaves as possible, and finally bring any remaining slaves back into operation with a backup/restore operation or the tungsten_provision_slave command. For more information, see Section 5.6.2, “Recover a failed master”.
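The fallback order described in the list above can be summarized as a small decision function. This is a minimal Python sketch, not part of Tungsten: the function name recovery_plan, its parameters, and the step labels it returns are hypothetical. The labels name the cctrl commands mentioned above rather than reproduce their exact syntax.

```python
# Hypothetical sketch of the recovery fallback order described in this section.
# recovery_plan and its step labels are illustrative only; they are not a
# Tungsten API and do not reproduce exact cctrl syntax.
from typing import List, Optional


def recovery_plan(master_available: bool,
                  failed_slaves: List[str],
                  up_to_date_master: Optional[str] = None) -> List[str]:
    """Return the ordered recovery steps to try for the given situation."""
    steps: List[str] = []
    if master_available:
        # Slave failure, master still available.
        if not failed_slaves:
            return steps  # nothing to recover
        if len(failed_slaves) == 1:
            # A single slave: use the datasource recover command.
            steps.append(f"datasource {failed_slaves[0]} recover")
        else:
            # Several slaves: the recover command handles all of them.
            steps.append("recover")
        # Fallback if the simple recovery fails.
        steps.append("reprovision failed slave(s) from another slave")
        return steps

    # Master failure.
    if up_to_date_master is not None:
        # Preferred path when the most up-to-date master is known.
        steps.append(f"recover using {up_to_date_master}")
    # Fallback path if 'recover using' is not possible or does not work.
    steps.append("set master")
    steps.append("recover")
    steps.append("backup/restore or tungsten_provision_slave for remaining slaves")
    return steps


if __name__ == "__main__":
    for step in recovery_plan(False, ["host1", "host2"], up_to_date_master="host3"):
        print(step)
```

Encoding the order as data makes the fallbacks explicit: each later step is only needed if the earlier ones fail to bring the dataservice back.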

A summary of these different scenarios and steps is provided in the following table:

Policy Mode  Scenario         Datasource State                            Resolution
-----------  ---------------  ------------------------------------------  ----------
AUTOMATIC    Master Failure   -                                           Automatic
             Master Recovery  master:SHUNNED(FAILED-OVER-TO-host2)        Section 5.6.2, “Recover a failed master”
             Slave Failure    -                                           Automatic
             Slave Recovery   -                                           Automatic
MANUAL       Master Failure   master:FAILED(NODE 'host1' IS UNREACHABLE)  Section 5.6.2.4, “Failing over a master”
             Master Recovery  master:SHUNNED(FAILED-OVER-TO-host2)        Section 5.6.2.2, “Recover a shunned master”
             Slave Failure    slave:FAILED(NODE 'host1' IS UNREACHABLE)   Automatically removed from service
             Slave Recovery   slave:FAILED(NODE 'host1' IS UNREACHABLE)   Section 5.6.1, “Recover a failed slave”
MAINTENANCE  Master Failure   -                                           Promote a different slave using Section 5.6.2.4, “Failing over a master”
             Master Recovery  -                                           Section 5.6.2.3, “Manually Failing over a Master in MAINTENANCE policy mode”
             Slave Failure    -                                           N/A
             Slave Recovery   -                                           N/A
Any          Slave Shunned    slave:SHUNNED(MANUALLY-SHUNNED)             Section 5.6.1, “Recover a failed slave”
             No Master        slave:SHUNNED(SHUNNED)                      Section 5.6.2.1, “Recover when there are no masters”
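The summary table can also be read as a lookup keyed by policy mode and scenario. The following minimal Python sketch encodes it that way and adds a parser for the role:STATE(reason) strings shown in the Datasource State column. The names DatasourceState, parse_state, RESOLUTIONS, ANY_MODE and resolution_for are hypothetical, and the string format is inferred only from the examples in the table, so actual cctrl output may differ.

```python
# Hypothetical sketch: the recovery summary table as a lookup, plus a parser
# for "role:STATE(reason)" strings. The format is inferred from the table's
# examples only; none of these names are part of Tungsten.
import re
from typing import NamedTuple, Optional


class DatasourceState(NamedTuple):
    role: str    # "master" or "slave"
    state: str   # e.g. "SHUNNED" or "FAILED"
    reason: str  # text inside the parentheses


_STATE_RE = re.compile(r"^(master|slave):([A-Z]+)\((.*)\)$")


def parse_state(text: str) -> Optional[DatasourceState]:
    """Split a state such as 'master:SHUNNED(FAILED-OVER-TO-host2)'."""
    m = _STATE_RE.match(text.strip())
    return DatasourceState(*m.groups()) if m else None


# (policy mode, scenario) -> resolution or section number, from the table.
RESOLUTIONS = {
    ("AUTOMATIC", "Master Failure"): "Automatic",
    ("AUTOMATIC", "Master Recovery"): "5.6.2",
    ("AUTOMATIC", "Slave Failure"): "Automatic",
    ("AUTOMATIC", "Slave Recovery"): "Automatic",
    ("MANUAL", "Master Failure"): "5.6.2.4",
    ("MANUAL", "Master Recovery"): "5.6.2.2",
    ("MANUAL", "Slave Failure"): "Automatically removed from service",
    ("MANUAL", "Slave Recovery"): "5.6.1",
    ("MAINTENANCE", "Master Failure"): "5.6.2.4",
    ("MAINTENANCE", "Master Recovery"): "5.6.2.3",
    ("MAINTENANCE", "Slave Failure"): "N/A",
    ("MAINTENANCE", "Slave Recovery"): "N/A",
}

# Scenarios listed under "Any" apply regardless of policy mode.
ANY_MODE = {
    "Slave Shunned": "5.6.1",
    "No Master": "5.6.2.1",
}


def resolution_for(policy: str, scenario: str) -> Optional[str]:
    """Return the resolution for a policy mode and scenario, or None if unknown."""
    if scenario in ANY_MODE:
        return ANY_MODE[scenario]
    return RESOLUTIONS.get((policy.upper(), scenario))


if __name__ == "__main__":
    print(parse_state("master:SHUNNED(FAILED-OVER-TO-host2)"))
    print(resolution_for("manual", "Master Recovery"))
```

Note that the state string alone does not always identify the scenario: in MANUAL mode, slave:FAILED(NODE 'host1' IS UNREACHABLE) appears for both slave failure and slave recovery. This is why the sketch keys the lookup on the scenario rather than on the parsed state.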