When the dataservice policy mode is AUTOMATIC, the dataservice will automatically fail over to a new Primary host when the existing Primary is identified as having failed or become unavailable.
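The active policy mode is reported in the COORDINATOR line of the ls output, and can be changed from within cctrl. As a minimal sketch (using the same alpha service shown in the examples below), automatic failover can be enabled with the set policy command:
[LOGICAL:EXPERT] /alpha > set policy automatic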
For example, when the Primary host host1 becomes unavailable because of a network problem, the dataservice automatically switches to host2. The dataservice status is updated accordingly, showing the automatically shunned host1:
[LOGICAL:EXPERT] /alpha > ls
COORDINATOR[host3:AUTOMATIC:ONLINE]
ROUTERS:
+----------------------------------------------------------------------------+
|connector@host2[28116](ONLINE, created=0, active=0) |
|connector@host3[1533](ONLINE, created=0, active=0) |
+----------------------------------------------------------------------------+
DATASOURCES:
+----------------------------------------------------------------------------+
|host1(Master:SHUNNED(FAILED-OVER-TO-host2)) |
|STATUS [SHUNNED] [2013/05/14 12:18:54 PM BST] |
+----------------------------------------------------------------------------+
| MANAGER(state=STOPPED) |
| REPLICATOR(state=STATUS NOT AVAILABLE) |
| DATASERVER(state=ONLINE) |
| CONNECTIONS(created=0, active=0) |
+----------------------------------------------------------------------------+
+----------------------------------------------------------------------------+
|host2(Master:ONLINE, progress=156325, THL latency=0.606) |
|STATUS [OK] [2013/05/14 12:46:55 PM BST] |
+----------------------------------------------------------------------------+
| MANAGER(state=ONLINE) |
| REPLICATOR(role=Master, state=ONLINE) |
| DATASERVER(state=ONLINE) |
| CONNECTIONS(created=0, active=0) |
+----------------------------------------------------------------------------+
The status for the original Primary (host1) identifies the datasource as shunned, and the FAILED-OVER-TO-host2 annotation indicates which datasource was promoted to the Primary.
An automatic failover can also be triggered manually by using the datasource fail command:
[LOGICAL:EXPERT] /alpha > datasource host1 fail
This triggers the automatic failover sequence, and simulates what would happen if the specified host failed.
If host1 becomes available again, the datasource is not automatically added back to the dataservice; it must be explicitly re-added. The status of the dataservice once host1 returns is shown below:
[LOGICAL:EXPERT] /alpha > ls
COORDINATOR[host3:AUTOMATIC:ONLINE]
ROUTERS:
+----------------------------------------------------------------------------+
|connector@host1[19869](ONLINE, created=0, active=0) |
|connector@host2[28116](ONLINE, created=0, active=0) |
|connector@host3[1533](ONLINE, created=0, active=0) |
+----------------------------------------------------------------------------+
DATASOURCES:
+----------------------------------------------------------------------------+
|host1(Master:SHUNNED(FAILED-OVER-TO-host2), progress=156323, THL |
|latency=0.317) |
|STATUS [SHUNNED] [2013/05/14 12:30:21 PM BST] |
+----------------------------------------------------------------------------+
| MANAGER(state=ONLINE) |
| REPLICATOR(role=Master, state=ONLINE) |
| DATASERVER(state=ONLINE) |
| CONNECTIONS(created=0, active=0) |
+----------------------------------------------------------------------------+
Because host1 was previously the Primary, the datasource recover command verifies that the server is available, configures the node as a Replica of the newly promoted Primary, and re-enables the services:
[LOGICAL:EXPERT] /alpha > datasource host1 recover
VERIFYING THAT WE CAN CONNECT TO DATA SERVER 'host1'
DATA SERVER 'host1' IS NOW AVAILABLE FOR CONNECTIONS
RECOVERING 'host1@alpha' TO A SLAVE USING 'host2@alpha' AS THE MASTER
SETTING THE ROLE OF DATASOURCE 'host1@alpha' FROM 'Master' TO 'slave'
RECOVERY OF 'host1@alpha' WAS SUCCESSFUL
If the command is successful, then the node should be up and running as a Replica of the new Primary.
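To confirm, the ls output should now show host1 as a Replica of the new Primary; a sketch of the relevant datasource line (actual progress and latency values will vary):
[LOGICAL:EXPERT] /alpha > ls
...
|host1(Slave:ONLINE, progress=..., THL latency=...)                          |
...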
The recovery process can fail if the THL data and dataserver contents do not match, for example when statements have been executed on a Replica. For information on recovering from failures that recover cannot fix, see Section 6.6.1.3, “Replica Datasource Extended Recovery”.
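In that situation, one common approach (a sketch; confirm the exact options for your release) is to re-provision the failed node from a healthy datasource using the tungsten_provision_slave tool, run from a shell on the failed host:
shell> tungsten_provision_slave --source=host2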