All datasources will be in one of a number of states that indicate their current operational status.
ONLINE State
A datasource in the ONLINE state is considered to be operating normally, with replication, connector and other traffic being handled as normal.
SHUNNED State
A SHUNNED datasource implies that the datasource is OFFLINE. Unlike the OFFLINE state, a SHUNNED datasource is not automatically recovered.
A datasource in the SHUNNED state is not connected to, or actively part of, the dataservice. Individual services can be reconfigured and restarted. Operating system and other maintenance can be carried out while a host is in the SHUNNED state without affecting the other members of the dataservice.
Datasources can be manually or automatically shunned. The current reason for the SHUNNED state is indicated in the status output. For example, in the sample below, the node host3 was manually shunned for maintenance reasons:
...
+----------------------------------------------------------------------------+
|host3(slave:SHUNNED(MANUALLY-SHUNNED), progress=157454, latency=1.000)      |
|STATUS [SHUNNED] [2013/05/14 05:12:52 PM BST]                               |
...
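When status output like the sample above needs to be checked from a script, the header line can be split into its fields. The following is a minimal sketch rather than an official parser: the function name parse_datasource_header and the regular expression are hypothetical, and the pattern is inferred only from the sample lines shown in this section, so other output variants may need adjustments.

```python
import re

# Hypothetical pattern inferred from the sample output in this section, e.g.
#   |host3(slave:SHUNNED(MANUALLY-SHUNNED), progress=157454, latency=1.000) |
# "[\s|]*" tolerates a header that wraps onto the next row of the box.
HEADER_RE = re.compile(
    r"(?P<host>[\w.-]+)\("
    r"(?P<role>\w+):(?P<state>[A-Z]+)"
    r"(?:\((?P<reason>[^)]*)\))?"
    r",[\s|]*progress=(?P<progress>-?\d+)"
    r",[\s|]*latency=(?P<latency>-?\d+(?:\.\d+)?)\)"
)


def parse_datasource_header(text):
    """Return the header fields as a dict, or None if no header is found."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields; "reason" stays None when no reason is shown.
    info["progress"] = int(info["progress"])
    info["latency"] = float(info["latency"])
    return info


if __name__ == "__main__":
    line = ("|host3(slave:SHUNNED(MANUALLY-SHUNNED), "
            "progress=157454, latency=1.000) |")
    print(parse_datasource_header(line))
```

For the sample line, the state field is SHUNNED and the reason field is MANUALLY-SHUNNED, which makes it straightforward to separate manually shunned hosts from automatically shunned ones.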
OFFLINE State
A datasource in the OFFLINE state does not accept connections through the connector for either reads or writes.
When the dataservice is in the AUTOMATIC policy mode, a datasource in the OFFLINE state is automatically recovered and placed into the ONLINE state. If this operation fails, the datasource remains in the OFFLINE state.
When the dataservice is in MAINTENANCE or MANUAL policy mode, the datasource will remain in the OFFLINE state until the datasource is explicitly switched to the ONLINE state.
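The recovery rules described for the SHUNNED and OFFLINE states can be summarised in a small decision function. This is an illustrative sketch only: the function name recovery_mode and its return values are hypothetical and do not correspond to any product API. The FAILED state is deliberately left out, because this section does not describe how it interacts with the policy mode.

```python
POLICIES = {"AUTOMATIC", "MAINTENANCE", "MANUAL"}


def recovery_mode(state, policy):
    """Return 'none', 'automatic' or 'manual' for a datasource state.

    Encodes only the rules stated in this section:
      * ONLINE needs no recovery.
      * SHUNNED is never recovered automatically.
      * OFFLINE is recovered automatically only under the AUTOMATIC policy.
    """
    state = state.upper()
    policy = policy.upper()
    if policy not in POLICIES:
        raise ValueError(f"unknown policy mode: {policy}")
    if state == "ONLINE":
        return "none"
    if state == "SHUNNED":
        return "manual"
    if state == "OFFLINE":
        return "automatic" if policy == "AUTOMATIC" else "manual"
    # Other states (for example FAILED) are not modelled in this sketch.
    raise ValueError(f"state not covered by this sketch: {state}")


if __name__ == "__main__":
    for policy in sorted(POLICIES):
        print(policy, recovery_mode("OFFLINE", policy))
```

A result of 'manual' means an operator has to bring the datasource back explicitly; 'automatic' means recovery is attempted without intervention, although the attempt can still fail and leave the datasource OFFLINE.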
FAILED State
When a datasource fails, for example when one of the services for the datasource stops responding or fails, the datasource is placed into the FAILED state. In the example below, the underlying dataserver has failed:
+----------------------------------------------------------------------------+
|host3(slave:FAILED(DATASERVER 'host3@alpha' STOPPED),                       |
|progress=154146, latency=31.419)                                            |
|STATUS [CRITICAL] [2013/05/10 11:51:42 PM BST]                              |
|REASON[DATASERVER 'host3@alpha' STOPPED]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=host1, state=ONLINE)                        |
|  DATASERVER(state=STOPPED)                                                 |
|  CONNECTIONS(created=208, active=0)                                        |
+----------------------------------------------------------------------------+
For a FAILED datasource, the recover command within cctrl can be used to attempt to return the datasource to an operational state. If this fails, the underlying fault must be identified and addressed before the datasource can be recovered.
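As a starting point for identifying the underlying fault, the component lines in the status output can be scanned for any service that is not ONLINE. The sketch below is an assumption-laden helper, not part of cctrl: the function name unhealthy_components is hypothetical, the component names are taken from the sample output in this section, and it assumes that ONLINE is the healthy state for every component.

```python
import re

# Component lines as they appear in the sample status output, e.g.
#   REPLICATOR(role=slave, master=host1, state=ONLINE)
COMPONENT_RE = re.compile(r"\b(MANAGER|REPLICATOR|DATASERVER)\(([^)]*)\)")
STATE_RE = re.compile(r"\bstate=(\w+)")


def unhealthy_components(status_text):
    """Return (component, state) pairs for components that are not ONLINE."""
    problems = []
    for name, args in COMPONENT_RE.findall(status_text):
        match = STATE_RE.search(args)
        # Assumption: a component without a state= field is reported as UNKNOWN.
        state = match.group(1) if match else "UNKNOWN"
        if state != "ONLINE":
            problems.append((name, state))
    return problems


if __name__ == "__main__":
    sample = """
    |  MANAGER(state=ONLINE)                              |
    |  REPLICATOR(role=slave, master=host1, state=ONLINE) |
    |  DATASERVER(state=STOPPED)                          |
    """
    print(unhealthy_components(sample))
```

For the FAILED sample shown in this section, only the DATASERVER component is reported, which matches the REASON line in the output.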