To perform maintenance on every machine in a dataservice, apply the maintenance to one machine at a time in a structured, rolling sequence. In brief, the sequence is as follows:
1. Perform maintenance on each of the current Replicas
2. Switch the Primary to one of the already maintained Replicas
3. Perform maintenance on the old Primary (now in Replica state)
4. Switch the old Primary back to be the Primary again
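The ordering rule behind this sequence can be sketched in a few lines of Python. This is only an illustration: the function name `rolling_order` and the role dictionary are hypothetical and are not part of any product tooling.

```python
from typing import Dict, List


def rolling_order(roles: Dict[str, str]) -> List[str]:
    """Return hosts in maintenance order: every Replica first, the Primary last.

    `roles` maps a host name to "Primary" or "Replica" (hypothetical input format).
    """
    primaries = [host for host, role in roles.items() if role == "Primary"]
    replicas = [host for host, role in roles.items() if role == "Replica"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one Primary")
    if not replicas:
        raise ValueError("a rolling sequence needs at least one Replica to switch to")
    # The Primary goes last because it can only be maintained after a switch
    # to a Replica that has already been maintained.
    return replicas + primaries


order = rolling_order({"host1": "Primary", "host2": "Replica", "host3": "Replica"})
print(order)
```

The Primary is placed last so that, by the time it is switched out, at least one maintained Replica is available to take over.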
A more detailed sequence of steps, including the status of each datasource in the dataservice, and the commands to be performed, is shown in the table below. The table assumes a three-node dataservice (one Primary, two Replicas), but the same principles can be applied to any Primary/Replica dataservice:
Step | Description | Command | host1 | host2 | host3 |
---|---|---|---|---|---|
1 | Initial state | | Primary | Replica | Replica |
2 | Set maintenance policy | set policy maintenance | Primary | Replica | Replica |
3 | Shun Replica host2 | datasource host2 shun | Primary | Shunned | Replica |
4 | Perform maintenance | | Primary | Shunned | Replica |
5 | Validate the host2 server configuration | tpm validate | Primary | Shunned | Replica |
6 | Recover the Replica host2 back | datasource host2 recover | Primary | Replica | Replica |
7 | Ensure the Replica (host2) has caught up | | Primary | Replica | Replica |
8 | Shun Replica host3 | datasource host3 shun | Primary | Replica | Shunned |
9 | Perform maintenance | | Primary | Replica | Shunned |
10 | Validate the host3 server configuration | tpm validate | Primary | Replica | Shunned |
11 | Recover Replica host3 back | datasource host3 recover | Primary | Replica | Replica |
12 | Ensure the Replica (host3) has caught up | | Primary | Replica | Replica |
13 | Switch Primary to host2 | switch to host2 | Replica | Primary | Replica |
14 | Shun host1 | datasource host1 shun | Shunned | Primary | Replica |
15 | Perform maintenance | | Shunned | Primary | Replica |
16 | Validate the host1 server configuration | tpm validate | Shunned | Primary | Replica |
17 | Recover the Replica host1 back | datasource host1 recover | Replica | Primary | Replica |
18 | Ensure the Replica (host1) has caught up | | Replica | Primary | Replica |
19 | Switch Primary back to host1 | switch to host1 | Primary | Replica | Replica |
20 | Set automatic policy | set policy automatic | Primary | Replica | Replica |
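Because the same principles apply to any Primary/Replica dataservice, the sequence can be generated for an arbitrary number of Replicas. The following is a minimal sketch, not a supported tool: `Step` and `maintenance_plan` are hypothetical names, the command strings are copied from the table, and the code only builds a checklist. It does not run any command against a cluster.

```python
from typing import Dict, List, NamedTuple, Optional


class Step(NamedTuple):
    number: int
    description: str
    command: Optional[str]   # None where the table lists no command
    state: Dict[str, str]    # role of every host once the step is complete


def maintenance_plan(primary: str, replicas: List[str]) -> List[Step]:
    """Build the rolling maintenance checklist for one Primary and its Replicas."""
    if not replicas:
        raise ValueError("a rolling sequence needs at least one Replica")

    state = {primary: "Primary", **{r: "Replica" for r in replicas}}
    steps: List[Step] = []

    def add(description: str, command: Optional[str] = None) -> None:
        # Record a snapshot of the current roles alongside the step.
        steps.append(Step(len(steps) + 1, description, command, dict(state)))

    def maintain(host: str) -> None:
        # Shun, maintain, validate, recover, then wait for the host to catch up.
        state[host] = "Shunned"
        add(f"Shun {host}", f"datasource {host} shun")
        add(f"Perform maintenance on {host}")
        add(f"Validate the {host} server configuration", "tpm validate")
        state[host] = "Replica"
        add(f"Recover {host}", f"datasource {host} recover")
        add(f"Ensure {host} has caught up")

    add("Initial state")
    add("Set maintenance policy", "set policy maintenance")
    for replica in replicas:
        maintain(replica)

    # Promote an already maintained Replica so the old Primary can be shunned.
    stand_in = replicas[0]
    state[primary], state[stand_in] = "Replica", "Primary"
    add(f"Switch Primary to {stand_in}", f"switch to {stand_in}")
    maintain(primary)

    # Restore the original topology and hand control back to the manager.
    state[primary], state[stand_in] = "Primary", "Replica"
    add(f"Switch Primary back to {primary}", f"switch to {primary}")
    add("Set automatic policy", "set policy automatic")
    return steps


for step in maintenance_plan("host1", ["host2", "host3"]):
    roles = ", ".join(f"{host}={role}" for host, role in step.state.items())
    print(f"{step.number:>2}  {step.description:<45} {step.command or '':<26} {roles}")
```

For the three-node example this produces 20 steps in the same order as the table. The sketch always promotes the first Replica in the list; any Replica that has already been maintained would serve equally well.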