Automatic recovery enables the replicator to go back ONLINE after a transient failure that occurs during either the ONLINE or GOING-ONLINE:SYNCHRONIZING state and that would otherwise cause a change of state to OFFLINE. For example, connection failures, or restarts of the MySQL service, trigger the replicator to go OFFLINE. With autorecovery enabled, the replicator will attempt to go ONLINE again to keep the service running. Failures outside of these states will not trigger autorecovery.
Autorecovery operates by scheduling an attempt to go back online after a transient failure. If autorecovery is enabled, the process works as follows:
If a failure is identified, the replicator attempts to go back online after a specified delay. The delay allows the replicator time to decide whether autorecovery should be attempted. For example, if the MySQL server restarts, the delay gives time for the MySQL server to come back online before the replicator goes back online.
Recovery is attempted a configurable number of times. This prevents the replicator from continually attempting to go online within a service that has a more serious failure. If the replicator fails to go ONLINE within the configurable reset interval, then the replicator will go to the OFFLINE state.
If the replicator remains in the
ONLINE state for a configurable
period of time, then the automatic recovery is deemed to have succeeded.
If the autorecovery fails, then the autorecovery attempts counter is
incremented by one.
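The decision described in the steps above can be sketched as a small shell function. This is an illustrative model only, not the replicator's actual implementation: the function name next_action, its argument order, and the "retry N" / "offline" output strings are all hypothetical, and the real replicator also waits for the configured delay before each attempt.

```shell
#!/usr/bin/env bash
# Hypothetical model of the autorecovery decision; not Tungsten's own code.
# Arguments:
#   $1  attempts already made for this failure
#   $2  maximum attempts allowed (0 means autorecovery is disabled)
#   $3  seconds the replicator stayed ONLINE since the last recovery
#   $4  reset interval in seconds
# Prints "retry <attempt number>" or "offline".
next_action() {
  local attempts=$1 max_attempts=$2 online_secs=$3 reset_interval=$4

  # Staying ONLINE for the full reset interval clears the counter.
  if [ "$online_secs" -ge "$reset_interval" ]; then
    attempts=0
  fi

  # Give up when disabled or when the attempts are exhausted.
  if [ "$max_attempts" -eq 0 ] || [ "$attempts" -ge "$max_attempts" ]; then
    echo "offline"
  else
    echo "retry $((attempts + 1))"
  fi
}

next_action 0 5 0 120     # first failure
next_action 5 5 10 120    # attempts exhausted shortly after the last recovery
next_action 5 5 300 120   # counter cleared after a long ONLINE period
```

The three calls print "retry 1", "offline", and "retry 1" respectively, showing how the reset interval allows recovery to start again after a period of stable operation.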
The configurable parameters are set using tpm within the static properties for the replicator:
Sets the maximum number of attempts to automatically recover from any single failure trigger. This prevents the autorecovery mechanism from continually attempting to recover. The current number of attempts is reset if the replicator remains online for the configured reset period.
The delay between entering the
OFFLINE state, and attempting
autorecovery. On servers that are busy, that use some form of network or HA
solution, or that have high MySQL restart/startup times, this value should be
configured accordingly to give the underlying services time to start up
again after a failure.
The length of time that the replicator must remain in the
ONLINE state after an autorecovery attempt for the recovery
to be deemed successful. The number of attempts for
autorecovery is reset to 0 (zero) if the replicator stays up for this
period of time.
Autorecovery is enabled only when the
--auto-recovery-max-attempts parameter is
set to a non-zero value.
tpm update alpha --auto-recovery-max-attempts=5
trepctl status
Processing status command...
NAME                     VALUE
----                     -----
...
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
...
The above output indicates that the autorecovery service is disabled. The
autoRecoveryTotal is a count of the number of times
the autorecovery has been completed since the replicator has started.
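For scripted monitoring, the two fields can be pulled out of the status output with awk. The following is a minimal sketch: the helper name status_field is hypothetical, and it assumes each field appears on its own line in "NAME : VALUE" form. The sample text stands in for real output; in practice the output of trepctl status would be piped in instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one "NAME : VALUE" field read from stdin.
status_field() {
  awk -F' *: *' -v key="$1" '$1 == key { print $2 }'
}

# Sample lines standing in for the relevant part of the status output.
sample='autoRecoveryEnabled    : false
autoRecoveryTotal      : 0'

enabled=$(printf '%s\n' "$sample" | status_field autoRecoveryEnabled)
total=$(printf '%s\n' "$sample" | status_field autoRecoveryTotal)
echo "enabled=$enabled total=$total"
```

Because the separator pattern splits on every colon, this approach suits simple fields such as these two and would truncate values that themselves contain a colon.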