Every datasource is in one of several states that indicate its current operational status.
ONLINE State
A datasource in the ONLINE state
is considered to be operating normally, with replication, connector and
other traffic being handled as normal.
SHUNNED State
A SHUNNED datasource implies that
the datasource is OFFLINE. Unlike
the OFFLINE state, a
SHUNNED datasource is not
automatically recovered.
A datasource in a SHUNNED state is
not connected to, or actively part of, the dataservice. Individual services
can be reconfigured and restarted, and operating system or other
maintenance can be carried out while a host is in the
SHUNNED state without affecting
the other members of the dataservice.
Datasources can be manually or automatically shunned. The current reason
for the SHUNNED state is indicated
in the status output. For example, in the sample below, the node
host3 was manually shunned for
maintenance reasons:
...
+----------------------------------------------------------------------------+
|host3(slave:SHUNNED(MANUALLY-SHUNNED), progress=157454, latency=1.000)      |
|STATUS [SHUNNED] [2013/05/14 05:12:52 PM BST]                               |
...
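The state and the shun reason can also be read out of the header line programmatically. The following is a minimal Python sketch, assuming the header line always has the form shown in the sample above. The helper name `parse_datasource_header` and the regular expression are illustrative assumptions, not part of the product.

```python
import re

# Assumed layout, taken from the sample output above:
#   host(role:STATE(REASON), progress=N, latency=N.NNN)
# The "(REASON)" part is treated as optional here; that is an assumption.
HEADER_RE = re.compile(
    r"(?P<host>[\w.-]+)\("
    r"(?P<role>\w+):(?P<state>[A-Z]+)"
    r"(?:\((?P<reason>[^)]*)\))?"
)

def parse_datasource_header(line):
    """Hypothetical helper: extract host, role, state and reason from a header line."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return match.groupdict()

sample = "|host3(slave:SHUNNED(MANUALLY-SHUNNED), progress=157454, latency=1.000) |"
info = parse_datasource_header(sample)
print(info["host"], info["state"], info["reason"])
```

With the sample line, the reason field carries the text inside the inner parentheses, which is how a manual shun can be told apart from other reasons.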
OFFLINE State
A datasource in the OFFLINE state does
not accept connections through the connector for either reads or writes.
When the dataservice is in the
AUTOMATIC policy mode, a
datasource in the OFFLINE state is
automatically recovered and placed into the
ONLINE state. If this operation
fails, the datasource remains in the
OFFLINE state.
When the dataservice is in
MAINTENANCE or
MANUAL policy mode, the
datasource will remain in the
OFFLINE state until the datasource
is explicitly switched to the
ONLINE state.
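The relationship between policy mode and the handling of an OFFLINE datasource can be summarized as a small decision function. This is only a sketch of the rule described above. The function name `offline_recovery_action` and the returned strings are hypothetical and do not correspond to any product API.

```python
def offline_recovery_action(policy_mode):
    """Hypothetical helper: what happens to an OFFLINE datasource in a given policy mode."""
    if policy_mode == "AUTOMATIC":
        # Recovery is attempted automatically; if it fails, the datasource stays OFFLINE.
        return "attempt automatic recovery to ONLINE"
    if policy_mode in ("MAINTENANCE", "MANUAL"):
        # No automatic action; an operator must bring the datasource ONLINE.
        return "stay OFFLINE until explicitly switched to ONLINE"
    raise ValueError(f"unknown policy mode: {policy_mode!r}")

for mode in ("AUTOMATIC", "MAINTENANCE", "MANUAL"):
    print(mode, "->", offline_recovery_action(mode))
```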
FAILED State
When a datasource fails, for example when one of the
services for the datasource stops responding or fails, the datasource
is placed into the FAILED
state. In the example below, the underlying dataserver has failed:
+----------------------------------------------------------------------------+
|host3(slave:FAILED(DATASERVER 'host3@alpha' STOPPED),                       |
|progress=154146, latency=31.419)                                            |
|STATUS [CRITICAL] [2013/05/10 11:51:42 PM BST]                              |
|REASON[DATASERVER 'host3@alpha' STOPPED]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=host1, state=ONLINE)                        |
|  DATASERVER(state=STOPPED)                                                 |
|  CONNECTIONS(created=208, active=0)                                        |
+----------------------------------------------------------------------------+
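To see which component caused the failure, compare the per-component state rows in the output. The sketch below, written in Python, assumes each component row has the form `NAME(key=value, ...)` as in the example above. The helper name `component_states` is an illustrative assumption.

```python
import re

# Assumed row layout from the example above: NAME(key=value, key=value, ...)
COMPONENT_RE = re.compile(r"\b(MANAGER|REPLICATOR|DATASERVER)\(([^)]*)\)")

def component_states(status_text):
    """Hypothetical helper: map each component name to its reported state."""
    states = {}
    for name, body in COMPONENT_RE.findall(status_text):
        # Split "role=slave, master=host1, state=ONLINE" into key/value pairs.
        fields = dict(
            item.strip().split("=", 1) for item in body.split(",") if "=" in item
        )
        if "state" in fields:
            states[name] = fields["state"]
    return states

sample = "\n".join([
    "|host3(slave:FAILED(DATASERVER 'host3@alpha' STOPPED), |",
    "| MANAGER(state=ONLINE) |",
    "| REPLICATOR(role=slave, master=host1, state=ONLINE) |",
    "| DATASERVER(state=STOPPED) |",
])
states = component_states(sample)
not_online = [name for name, state in states.items() if state != "ONLINE"]
print(not_online)
```

For the sample rows, only the dataserver is reported in a state other than ONLINE, which matches the reason given in the header.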
For a FAILED datasource, the
recover command within
cctrl can be used to attempt to return the
datasource to an operational state. If this fails, the underlying fault
must be identified and addressed before the datasource can be recovered.