Performing Maintenance on an Entire Dataservice
To perform maintenance on all of the machines within a dataservice, maintenance must be performed on each machine in turn, following a structured rolling sequence. In brief, the sequence is as follows:
- Perform maintenance on each of the current Replicas
- Switch the Primary to one of the already maintained Replicas
- Perform maintenance on the old Primary (now in Replica state)
- Switch the old Primary back to be the Primary again
The "Rolling Maintenance" procedure outlined here should NOT be used when upgrading Tungsten software.
During an upgrade, the switch will in most cases fail because of differences within the manager communications, and this could cause unexpected outages.
See "Upgrading Tungsten Cluster" for more details on upgrading Tungsten software.
A more detailed sequence of steps, including the status of each datasource in the dataservice, and the commands to be performed, is shown in the table below. The table assumes a three-node dataservice (one Primary, two Replicas), but the same principles can be applied to any Primary/Replica dataservice:
| Step | Description | Command | host1 | host2 | host3 |
|---|---|---|---|---|---|
| 1 | Initial state | | Primary | Replica | Replica |
| 2 | Set MAINTENANCE policy | set policy maintenance | Primary | Replica | Replica |
| 3 | Shun Replica host2 | datasource host2 shun | Primary | Shunned | Replica |
| 4 | Perform maintenance | | Primary | Shunned | Replica |
| 5 | Recover the Replica host2 back | datasource host2 recover | Primary | Replica | Replica |
| 6 | Ensure the Replica (host2) has caught up | | Primary | Replica | Replica |
| 7 | Shun Replica host3 | datasource host3 shun | Primary | Replica | Shunned |
| 8 | Perform maintenance | | Primary | Replica | Shunned |
| 9 | Recover Replica host3 back | datasource host3 recover | Primary | Replica | Replica |
| 10 | Ensure the Replica (host3) has caught up | | Primary | Replica | Replica |
| 11 | Switch Primary to host2 | switch to host2 | Replica | Primary | Replica |
| 12 | Shun host1 | datasource host1 shun | Shunned | Primary | Replica |
| 13 | Perform maintenance | | Shunned | Primary | Replica |
| 14 | Recover the Replica host1 back | datasource host1 recover | Replica | Primary | Replica |
| 15 | Ensure the Replica (host1) has caught up | | Replica | Primary | Replica |
| 16 | Switch Primary back to host1 (Optional) | switch to host1 | Primary | Replica | Replica |
| 17 | Set AUTOMATIC policy | set policy automatic | Primary | Replica | Replica |
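
Because the same sequence applies to any Primary/Replica dataservice, the ordering of steps can be generated rather than written out by hand. The following is a minimal sketch in Python, not part of the Tungsten tooling: the function name `rolling_maintenance_plan` is hypothetical, it assumes the first Replica in the list becomes the temporary Primary (as host2 does in the table), and it represents the manual steps (performing maintenance, checking that a Replica has caught up) as lines beginning with `#`. It only builds the list of steps; it does not execute anything against a cluster.

```python
from typing import List


def rolling_maintenance_plan(
    primary: str,
    replicas: List[str],
    switch_back: bool = True,
) -> List[str]:
    """Return the ordered commands and manual steps for a rolling maintenance.

    Hypothetical helper for illustration only. Entries starting with '#'
    are manual steps; all other entries are the commands from the table.
    """
    if not replicas:
        raise ValueError("rolling maintenance needs at least one Replica")

    def maintain(host: str) -> List[str]:
        # Shun, maintain, recover, and verify a single datasource.
        return [
            f"datasource {host} shun",
            f"# perform maintenance on {host}",
            f"datasource {host} recover",
            f"# ensure {host} has caught up",
        ]

    plan = ["set policy maintenance"]

    # Maintain every current Replica first.
    for replica in replicas:
        plan += maintain(replica)

    # Assumption: promote the first (already maintained) Replica.
    plan.append(f"switch to {replicas[0]}")

    # The old Primary is now a Replica and can be maintained.
    plan += maintain(primary)

    # Switching back to the original Primary is optional.
    if switch_back:
        plan.append(f"switch to {primary}")

    plan.append("set policy automatic")
    return plan


if __name__ == "__main__":
    # Numbering starts at 2 because step 1 in the table is the initial state.
    steps = rolling_maintenance_plan("host1", ["host2", "host3"])
    for number, action in enumerate(steps, start=2):
        print(number, action)
```

For the three-node example, the generated list lines up with steps 2 to 17 of the table. Passing `switch_back=False` omits the optional step 16.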