Tungsten Clustering

Performing Maintenance on an Entire Dataservice

To perform maintenance on all of the machines within a dataservice, carry out a rolling sequence of maintenance, handling one machine at a time in a structured order. In brief, the sequence is as follows:

  1. Perform maintenance on each of the current Replicas
  2. Switch the Primary to one of the already maintained Replicas
  3. Perform maintenance on the old Primary (now in Replica state)
  4. Optionally, switch the Primary role back to the old Primary
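As a sketch, the four phases above can be expanded mechanically into an ordered step list for any number of Replicas. The function name `rolling_maintenance_plan` and its parameters are hypothetical; the strings it emits are the cctrl commands used in this procedure, while the manual steps (the maintenance itself and waiting for a Replica to catch up) appear as `#` comments. The script only prints a plan; it does not connect to the cluster or run anything.

```python
from typing import List


def rolling_maintenance_plan(primary: str, replicas: List[str],
                             switch_back: bool = True) -> List[str]:
    """Return the ordered steps for rolling maintenance (hypothetical helper).

    cctrl commands are returned as-is; manual steps are returned as
    comment lines starting with '#'.
    """
    if not replicas:
        raise ValueError("rolling maintenance needs at least one Replica")

    def maintain(host: str) -> List[str]:
        # Take one datasource out of service, service it, and bring it back.
        return [
            f"datasource {host} shun",
            f"# perform maintenance on {host}",
            f"datasource {host} recover",
            f"# wait until {host} has caught up",
        ]

    steps = ["set policy maintenance"]
    for replica in replicas:              # phase 1: every current Replica
        steps += maintain(replica)
    new_primary = replicas[0]             # any already-maintained Replica works
    steps.append(f"switch to {new_primary}")  # phase 2
    steps += maintain(primary)            # phase 3: old Primary, now a Replica
    if switch_back:                       # phase 4 (optional)
        steps.append(f"switch to {primary}")
    steps.append("set policy automatic")
    return steps


if __name__ == "__main__":
    for step in rolling_maintenance_plan("host1", ["host2", "host3"]):
        print(step)
```

For host1 as the Primary and host2 and host3 as Replicas, the non-comment lines correspond to the Command column of the detailed table in this section.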
Warning

The "Rolling Maintenance" procedure outlined here should NOT be used when upgrading Tungsten software.

In most cases the switch will not work because of differences in manager communication between software versions, and this could cause unexpected outages.

See "Upgrading Tungsten Cluster" for more details on upgrading Tungsten software.

A more detailed sequence of steps, including the commands to be performed and the status of each datasource in the dataservice after each step, is shown in the table below. The table assumes a three-node dataservice (one Primary, two Replicas), but the same principles apply to any Primary/Replica dataservice:

| Step | Description | Command | host1 | host2 | host3 |
|---|---|---|---|---|---|
| 1 | Initial state | | Primary | Replica | Replica |
| 2 | Set MAINTENANCE policy | `set policy maintenance` | Primary | Replica | Replica |
| 3 | Shun Replica host2 | `datasource host2 shun` | Primary | Shunned | Replica |
| 4 | Perform maintenance on host2 | | Primary | Shunned | Replica |
| 5 | Recover Replica host2 | `datasource host2 recover` | Primary | Replica | Replica |
| 6 | Ensure the Replica (host2) has caught up | | Primary | Replica | Replica |
| 7 | Shun Replica host3 | `datasource host3 shun` | Primary | Replica | Shunned |
| 8 | Perform maintenance on host3 | | Primary | Replica | Shunned |
| 9 | Recover Replica host3 | `datasource host3 recover` | Primary | Replica | Replica |
| 10 | Ensure the Replica (host3) has caught up | | Primary | Replica | Replica |
| 11 | Switch Primary to host2 | `switch to host2` | Replica | Primary | Replica |
| 12 | Shun host1 (now a Replica) | `datasource host1 shun` | Shunned | Primary | Replica |
| 13 | Perform maintenance on host1 | | Shunned | Primary | Replica |
| 14 | Recover host1 as a Replica | `datasource host1 recover` | Replica | Primary | Replica |
| 15 | Ensure the Replica (host1) has caught up | | Replica | Primary | Replica |
| 16 | Switch Primary back to host1 (optional) | `switch to host1` | Primary | Replica | Replica |
| 17 | Set AUTOMATIC policy | `set policy automatic` | Primary | Replica | Replica |
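The state columns of the table can also be checked as a small state machine. The sketch below is a simplified model, not Tungsten's own logic: it assumes that `shun` marks a datasource Shunned, `recover` returns it to Replica, and `switch to` demotes the current Primary to Replica. The names `apply_command` and `replay` are hypothetical. The sketch replays the table's commands and raises an error if the dataservice ever has anything other than exactly one Primary, or more than one Shunned datasource, which reflects the one-machine-at-a-time structure of the procedure.

```python
from typing import Dict, List

PRIMARY, REPLICA, SHUNNED = "Primary", "Replica", "Shunned"


def apply_command(state: Dict[str, str], command: str) -> Dict[str, str]:
    """Return the role map after one command (simplified, hypothetical model)."""
    new = dict(state)
    words = command.split()
    if len(words) != 3:
        raise ValueError(f"unsupported command: {command!r}")
    if words[:2] == ["set", "policy"]:
        return new  # policy changes do not change datasource roles
    if words[0] == "datasource" and words[2] in ("shun", "recover"):
        host = words[1]
        if host not in new:
            raise ValueError(f"unknown host: {host}")
        new[host] = SHUNNED if words[2] == "shun" else REPLICA
        return new
    if words[:2] == ["switch", "to"]:
        target = words[2]
        if new.get(target) != REPLICA:
            raise ValueError(f"switch target must be a Replica: {target}")
        for host, role in new.items():  # demote the current Primary
            if role == PRIMARY:
                new[host] = REPLICA
        new[target] = PRIMARY
        return new
    raise ValueError(f"unsupported command: {command!r}")


def replay(initial: Dict[str, str], commands: List[str]) -> List[Dict[str, str]]:
    """Apply commands in order, checking the invariants after each one."""
    state = dict(initial)
    history = [state]
    for command in commands:
        state = apply_command(state, command)
        roles = list(state.values())
        if roles.count(PRIMARY) != 1:
            raise RuntimeError(f"{command!r} left {roles.count(PRIMARY)} Primaries")
        if roles.count(SHUNNED) > 1:
            raise RuntimeError(f"{command!r} left more than one datasource shunned")
        history.append(state)
    return history


# Commands from the table; manual steps do not change roles, so they are omitted.
TABLE_COMMANDS = [
    "set policy maintenance",
    "datasource host2 shun", "datasource host2 recover",
    "datasource host3 shun", "datasource host3 recover",
    "switch to host2",
    "datasource host1 shun", "datasource host1 recover",
    "switch to host1",
    "set policy automatic",
]

if __name__ == "__main__":
    start = {"host1": PRIMARY, "host2": REPLICA, "host3": REPLICA}
    labels = ["(initial state)"] + TABLE_COMMANDS
    for label, s in zip(labels, replay(start, TABLE_COMMANDS)):
        print(f"{label:<26} {s['host1']:<8} {s['host2']:<8} {s['host3']}")
```

Trying a sequence that shuns the Primary directly, or shuns both Replicas at once, makes `replay` raise `RuntimeError` under this model.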