Upgrade/Convert from Multi-Site Active/Active (MSAA) to Composite Active/Active (CAA)
These steps cover the safe upgrade (or conversion) of an existing Multi-Site Active/Active (MSAA) topology to a Composite Active/Active (CAA) topology.
It is very important to follow all of the steps below and to take full backups when instructed. These steps can be destructive; without proper care and attention, data loss, data corruption or a split-brain scenario can occur.
Parallel apply MUST be disabled before starting your upgrade. You may re-enable it once the upgrade has been fully completed. See "How to Disable Parallel Replication Safely" and "Enabling Parallel Apply During Install" for more information.
The examples in this section are based on three clusters named 'nyc', 'london' and 'tokyo'.
If you do not have exactly three clusters, please adjust this procedure to match your environment.
Supported Upgrade Paths
If your current installation is staging-based, you must first convert to an INI-based installation, since INI is the only installation method supported for Composite Active/Active deployments. For notes on how to perform the staging to INI file conversion using the translatetoini.pl script, please visit "Converting from Staging to INI".
| Path | Supported |
|---|---|
| ini, in place | Yes |
| ini, with Primary switch | No |
| Staging | No |
| Staging, with --no-connectors | No |
Upgrade Prerequisites
Obtain the latest Tungsten Cluster software build and place it within /opt/continuent/software
If you are not upgrading, just converting, then this step is not required since you will already have the extracted software bundle available.
Extract the package
The examples below refer to the tungsten_prep_upgrade script, which can be found in the tools directory of the extracted software package.
Step 1: Backups
Take a full backup of one node. This can be a Replica, and the backup should preferably be performed by one of the following methods:
Percona xtrabackup whilst database is open
Manual backup of all datafiles after stopping the database instance
Step 2: Stop the Cross-site Replicators
Typically the cross-site replicators are installed within /opt/replicator. If you have installed them in a different location, pass that location to the script in the examples using the --path option.
The following commands tell the replicators to go offline at a specific point, in this case when they receive an explicit heartbeat. This is to ensure that all the replicators stop at the same sequence number and binary log position. The replicators will NOT be offline until the explicit heartbeat has been issued a bit later in this step.
On every nyc node:
shell> ./tungsten_prep_upgrade -o
~or~
shell> ./tungsten_prep_upgrade --service london --offline
shell> ./tungsten_prep_upgrade --service tokyo --offline
On every london node:
shell> ./tungsten_prep_upgrade -o
~or~
shell> ./tungsten_prep_upgrade --service nyc --offline
shell> ./tungsten_prep_upgrade --service tokyo --offline
On every tokyo node:
shell> ./tungsten_prep_upgrade -o
~or~
shell> ./tungsten_prep_upgrade --service london --offline
shell> ./tungsten_prep_upgrade --service nyc --offline
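The per-cluster pattern above (each cluster takes the services of the other clusters offline) can be sketched as a small loop. This is only an illustrative dry run: the `CLUSTERS` variable and the `remote_services` function are hypothetical names, and the loop prints the commands rather than executing them.

```shell
#!/bin/sh
# Assumption: the three example clusters used throughout this section.
CLUSTERS="nyc london tokyo"

# Hypothetical helper: print every cluster name except the local one.
remote_services() {
    local_cluster="$1"
    for c in $CLUSTERS; do
        if [ "$c" != "$local_cluster" ]; then
            printf '%s\n' "$c"
        fi
    done
}

# Dry run for a london node: print, but do not run, the offline commands.
for svc in $(remote_services london); do
    echo "./tungsten_prep_upgrade --service $svc --offline"
done
```

Printing the commands first makes it easy to confirm that no node is asked to take its own local service offline.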
Next, issue the heartbeat on the Primary host within each cluster. Execute the following using the cluster-specific trepctl, typically in /opt/continuent:
shell> trepctl heartbeat -name offline_for_upg
Ensure that every cross-site replicator on every node is now in the OFFLINE:NORMAL state:
shell> mmtrepctl status
~or~
shell> mmtrepctl --service {servicename} status
Capture the position of the cross-site replicators on all nodes in all clusters.
The service name provided should be the name of the remote service(s) for this cluster, so for example in the london cluster you get the positions for nyc and tokyo, and in nyc you get the position for london and tokyo, etc.
On every london node:
shell> ./tungsten_prep_upgrade -g
~or~
shell> ./tungsten_prep_upgrade --service nyc --get
(NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
shell> ./tungsten_prep_upgrade --service tokyo --get
(NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
On every nyc node:
shell> ./tungsten_prep_upgrade -g
~or~
shell> ./tungsten_prep_upgrade --service london --get
(NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
shell> ./tungsten_prep_upgrade --service tokyo --get
(NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
On every tokyo node:
shell> ./tungsten_prep_upgrade -g
~or~
shell> ./tungsten_prep_upgrade --service london --get
(NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
shell> ./tungsten_prep_upgrade --service nyc --get
(NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
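Before moving on, it may be worth confirming that a position file was actually written for each remote service. The sketch below assumes the file naming shown in the notes above (position-{service}-YYYYMMDDHHMMSS.txt); the `check_positions` function is a hypothetical helper, not part of the Tungsten tooling.

```shell
#!/bin/sh
# Hypothetical helper: check_positions DIR SERVICE...
# Returns non-zero if any service has no position-<service>-*.txt file in DIR.
check_positions() {
    dir="$1"
    shift
    missing=0
    for svc in "$@"; do
        found=no
        # If the glob matches nothing, $f is the literal pattern and -e fails.
        for f in "$dir"/position-"$svc"-*.txt; do
            if [ -e "$f" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "MISSING: no position file for service $svc in $dir"
            missing=1
        fi
    done
    return "$missing"
}

# Example for a london node, whose remote services are nyc and tokyo:
#   check_positions "$HOME" nyc tokyo && echo "all positions captured"
```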
Finally, to complete this step, stop the replicators on all nodes:
shell> ./tungsten_prep_upgrade --stop
Step 3: Export the tungsten_* Databases
On every node in each cluster, export the tracking schema for each cross-site replicator.
As in Step 2, where you captured the cross-site positions, the export is performed for the remote services of each cluster: in london you export nyc and tokyo, in nyc you export london and tokyo, and in tokyo you export nyc and london.
On every london node:
shell> ./tungsten_prep_upgrade -d --alldb
~or~
shell> ./tungsten_prep_upgrade --service nyc --dump
shell> ./tungsten_prep_upgrade --service tokyo --dump
On every nyc node:
shell> ./tungsten_prep_upgrade -d --alldb
~or~
shell> ./tungsten_prep_upgrade --service london --dump
shell> ./tungsten_prep_upgrade --service tokyo --dump
On every tokyo node:
shell> ./tungsten_prep_upgrade -d --alldb
~or~
shell> ./tungsten_prep_upgrade --service london --dump
shell> ./tungsten_prep_upgrade --service nyc --dump
Step 4: Uninstall the Cross-site Replicators
To uninstall the cross-site replicators, execute the following on every node:
shell> cd {replicator software path}
shell> tools/tpm uninstall --i-am-sure
Step 5: Reload the tracking schema
We DO NOT want the reloading of this schema to appear in the binary logs on the Primary, therefore the reload needs to be performed on each node individually:
On every london node:
shell> ./tungsten_prep_upgrade -s nyc -u tungsten -w secret -r
shell> ./tungsten_prep_upgrade -s tokyo -u tungsten -w secret -r
~or~
shell> ./tungsten_prep_upgrade --service nyc --user tungsten --password secret --restore
shell> ./tungsten_prep_upgrade --service tokyo --user tungsten --password secret --restore
On every tokyo node:
shell> ./tungsten_prep_upgrade -s london -u tungsten -w secret -r
shell> ./tungsten_prep_upgrade -s nyc -u tungsten -w secret -r
~or~
shell> ./tungsten_prep_upgrade --service london --user tungsten --password secret --restore
shell> ./tungsten_prep_upgrade --service nyc --user tungsten --password secret --restore
On every nyc node:
shell> ./tungsten_prep_upgrade -s london -u tungsten -w secret -r
shell> ./tungsten_prep_upgrade -s tokyo -u tungsten -w secret -r
~or~
shell> ./tungsten_prep_upgrade --service london --user tungsten --password secret --restore
shell> ./tungsten_prep_upgrade --service tokyo --user tungsten --password secret --restore
Step 6: Update Configuration
Update /etc/tungsten/tungsten.ini to a valid CAA configuration. An example of a valid configuration is as follows:
[defaults]
user=tungsten
install-directory=/opt/continuent
profile-script=~/.bash_profile
replication-user=tungsten
replication-password=secret
replication-port=13306
application-user=app_user
application-password=secret
application-port=3306
rest-api-admin-user=apiuser
rest-api-admin-password=secret
connector-rest-api-address=0.0.0.0
manager-rest-api-address=0.0.0.0
replicator-rest-api-address=0.0.0.0
skip-validation-check=THLSchemaChangeCheck
[nyc]
topology=clustered
master=db1
members=db1,db2,db3
connectors=db1,db2,db3
[london]
topology=clustered
master=db4
members=db4,db5,db6
connectors=db4,db5,db6
[tokyo]
topology=clustered
master=db7
members=db7,db8,db9
connectors=db7,db8,db9
[global]
topology=composite-multi-master
composite-datasources=nyc,london,tokyo
| Option | Description |
|---|---|
| user=tungsten | OS System User, for example tungsten. DO NOT use root. |
| install-directory=/opt/continuent | Installation directory. |
| profile-script=~/.bash_profile | Append commands to include env.sh in this profile script. |
| replication-user=tungsten | User for database connection. |
| replication-password=secret | Database password. |
| replication-port=13306 | Database network port. |
| application-user=app_user | Database username for the connector. |
| application-password=secret | Database password for the connector. |
| application-port=3306 | Port for the connector to listen on. |
| rest-api-admin-user=apiuser | Specify the initial Admin Username for API access. Available from v7.0.0. |
| rest-api-admin-password=secret | Specify the initial Admin User Password for API access. The rest-api-admin-password alias is only available from version 7.1.2 onwards. Available from v7.0.0. |
| connector-rest-api-address=0.0.0.0 | Address for the API to bind to. Available from v7.0.0. |
| manager-rest-api-address=0.0.0.0 | Address for the API to bind to. Available from v7.0.0. |
| replicator-rest-api-address=0.0.0.0 | Address for the API to bind to. Available from v7.0.0. |
| skip-validation-check=THLSchemaChangeCheck | Do not run the specified validation check. |

It is critical that you ensure the master= entry in the configuration matches the current, live Primary host in your cluster for the purpose of this process.
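Because a mistake in the master= or members= entries can be costly here, a quick consistency check of the INI file before running the update may help. The following is a minimal sketch, not a Tungsten tool: the `check_ini` function is hypothetical, it assumes entries are written as key=value with no spaces around the equals sign (as in the example above), and it only checks that each section's master appears in its members list and that no member is listed twice. It cannot tell you which host is the live Primary; confirm that separately.

```shell
#!/bin/sh
# Hypothetical helper: check_ini FILE
# Exits non-zero if any section lists a duplicate member or a master
# that is not among its members.
check_ini() {
    awk -F= '
        /^\[/           { section = $0 }
        $1 == "master"  { master[section] = $2 }
        $1 == "members" { members[section] = $2 }
        END {
            bad = 0
            for (s in members) {
                n = split(members[s], m, ",")
                split("", seen)          # portable way to clear an array
                has_master = 0
                for (i = 1; i <= n; i++) {
                    if (seen[m[i]]++) {
                        print s ": duplicate member " m[i]
                        bad = 1
                    }
                    if (m[i] == master[s]) has_master = 1
                }
                if (!has_master) {
                    print s ": master " master[s] " is not in members"
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}

# Example usage:
#   check_ini /etc/tungsten/tungsten.ini && echo "members/master entries look consistent"
```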
Step 7: Enter MAINTENANCE Mode
Enable MAINTENANCE mode on all clusters using the cctrl command:
shell> cctrl
cctrl> set policy maintenance
Step 8: Stop Managers
Stop the manager process on all nodes:
shell> manager stop
Step 9: Install/Update the Software
Run the update as follows:
shell> tools/tpm update --replace-release
If you had start-and-report=false, you may need to restart the manager services.
Until all nodes have been updated, the output from cctrl may show services in an OFFLINE, STOPPED, or UNKNOWN state. This is to be expected until all the new managers are online.
Step 10: Start Managers
After the installation is complete on all nodes, start the manager services:
shell> manager start
Step 11: Return to AUTOMATIC Mode
Return all clusters to AUTOMATIC mode using the cctrl command:
shell> cctrl
cctrl> set policy automatic
Step 12: Validate
Identify the cross-site service name(s):
shell> trepctl services
In our example, the local cluster service will be one of london, nyc or tokyo depending on the node you are on. The cross-site replication services would be:
(within the london cluster)
london_from_nyc
london_from_tokyo
(within the nyc cluster)
nyc_from_london
nyc_from_tokyo
(within the tokyo cluster)
tokyo_from_london
tokyo_from_nyc
Upon installation, the new cross-site replicators will come online. It is possible that they may be in an OFFLINE:ERROR state due to a change in Epoch numbers. Check this on the Primary in each cluster by looking at the output from the trepctl command.
Check each service as needed based on the status seen above:
shell> trepctl -service london_from_nyc status
shell> trepctl -service london_from_tokyo status
~or~
shell> trepctl -service nyc_from_london status
shell> trepctl -service nyc_from_tokyo status
~or~
shell> trepctl -service tokyo_from_london status
shell> trepctl -service tokyo_from_nyc status
If the replicator is in an error state due to an epoch difference, you will see an error similar to the following:
pendingErrorSeqno : -1
pendingExceptionMessage: Client handshake failure: Client response validation failed: Log epoch numbers do not match: master source ID=db1 client source ID=db4 seqno=4 server epoch number=0 client epoch number=4
pipelineSource : UNKNOWN
The above error is due to the epoch numbers changing as a result of the replicators being restarted, and the new replicators being installed.
To resolve, simply force the replicator online as follows:
shell> trepctl -service london_from_nyc online -force
shell> trepctl -service london_from_tokyo online -force
~or~
shell> trepctl -service nyc_from_london online -force
shell> trepctl -service nyc_from_tokyo online -force
~or~
shell> trepctl -service tokyo_from_london online -force
shell> trepctl -service tokyo_from_nyc online -force
If the replicator shows an error state similar to the following:
pendingErrorSeqno : -1
pendingExceptionMessage: Client handshake failure: Client response validation failed: Master log does not contain requested transaction: master source ID=db1 client source ID=db2 requested seqno=1237 client epoch number=0 master min seqno=5 master max seqno=7
pipelineSource : UNKNOWN
The above error is possible if during install the Replica replicators came online before the Primary.
Providing the steps above have been followed, just bringing the replicator online should be enough to get the replicator to retry and carry on successfully:
shell> trepctl -service london_from_nyc online
shell> trepctl -service london_from_tokyo online
~or~
shell> trepctl -service nyc_from_london online
shell> trepctl -service nyc_from_tokyo online
~or~
shell> trepctl -service tokyo_from_london online
shell> trepctl -service tokyo_from_nyc online
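The two error cases in this step call for different commands, so a small filter that maps the error text to the suggested action may be useful when checking many services. This is a hedged sketch: `suggest_action` is a hypothetical helper, and it assumes the exception message appears unbroken in the status output with the same wording as the examples in this step.

```shell
#!/bin/sh
# Hypothetical helper: read `trepctl -service <name> status` output on stdin
# and print the trepctl action this step suggests for the error found.
suggest_action() {
    status=$(cat)
    case "$status" in
        *"Log epoch numbers do not match"*)
            # Epoch mismatch after the reinstall: force the replicator online.
            echo "online -force" ;;
        *"Master log does not contain requested"*)
            # Replica came online before the Primary: a plain retry.
            echo "online" ;;
        *)
            echo "none" ;;
    esac
}

# Example usage on a london node:
#   trepctl -service london_from_nyc status | suggest_action
```

The helper only prints a suggestion; review the full status output before running any command, particularly before using -force.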
During an upgrade, the tpm process will incorrectly create additional, empty, tracking schemas based on the service names of the auto-generated cross-site services.
For example, if your cluster has service names east and west, you should only have tracking schemas for tungsten_east and tungsten_west.
In some cases, you will also see tungsten_east_from_west and/or tungsten_west_from_east.
These tungsten_x_from_y tracking schemas will be empty and unused. They can be safely removed by issuing DROP DATABASE tungsten_x_from_y on a Primary node, or they can be safely ignored.
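To spot these extra schemas, a simple name filter is usually enough. The sketch below uses a hypothetical `list_from_schemas` function that reads schema names, one per line, on standard input and prints only those matching the tungsten_x_from_y pattern. How the names are obtained is left to you; one option, assuming the mysql client and suitable credentials are available, is `mysql -N -e 'SHOW DATABASES'`.

```shell
#!/bin/sh
# Hypothetical filter: print schema names that follow the
# tungsten_<x>_from_<y> naming pattern. Prints nothing if none match.
list_from_schemas() {
    grep -E '^tungsten_.+_from_.+$' || true
}

# Example usage (assumes mysql client access on the node):
#   mysql -N -e 'SHOW DATABASES' | list_from_schemas
```

The filter only lists candidates. Confirm that each listed schema is empty and does not belong to a genuine service before dropping it.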