3.2.3. Upgrade: From v4 or v5 to v6 Composite Multimaster Clusters

These steps are specifically for the safe and successful upgrade of an existing MSMM topology, running v4.0.7 or later, or v5.0.0, to the new v6 Composite Multimaster topology.

Warning

It is very important to follow all of the steps below and to take full backups when instructed. These steps can be destructive, and without proper care and attention, data loss, data corruption, or a split-brain scenario could occur.

Note

The examples in this section are based on three clusters named 'nyc', 'london' and 'tokyo'.

If you do not have exactly three clusters, please adjust this procedure to match your environment.

A video of the upgrade procedure, showing the full process from start to finish, is available:

Direct link video.

If you are currently running a staging-based installation, you must move to an INI-based installation, since the INI method is the only option supported for Composite Multimaster deployments. The staging-to-INI conversion can be performed using the translatetoini.pl script. A training video is available for that process:

Direct link video.

3.2.3.1. Supported Upgrade Paths

The only supported method for upgrading an existing MSMM installation is the INI method; customers currently deployed via staging will therefore need to convert to an INI-based deployment first.

Path                              Supported
INI, in place                     Yes
INI, with master switch           No
Staging                           No
Staging, with --no-connectors     No

3.2.3.2. Upgrade Prerequisites

  • Obtain the latest v6 Tungsten Clustering software build and place it within /opt/continuent/software

  • Extract the package

  • The examples below refer to the tungsten_prep_upgrade script, which can be found in the tools directory of the extracted software package.
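
As an illustration of the first two prerequisites, a minimal sketch assuming the downloaded package file is named tungsten-clustering-6.0.0-442.tar.gz (substitute the actual build you obtained):

shell> cd /opt/continuent/software
shell> tar zxf tungsten-clustering-6.0.0-442.tar.gz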

3.2.3.3. Upgrade Step 1: Backups

Take a full and complete backup of one node. This can be a slave, and the backup should preferably be performed in one of the following ways:

  • Percona XtraBackup, taken while the database is open

  • A manual copy of all data files, taken after stopping the database instance
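
As an illustration only, a minimal XtraBackup invocation might look like the following; the credentials and target directory are placeholders, and the exact command line varies between XtraBackup versions:

shell> xtrabackup --backup --user=tungsten --password=secret --target-dir=/backups/pre-v6-upgrade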

3.2.3.4. Upgrade Step 2: Stop the Cross-site Replicators

Important

Typically, the cross-site replicators will be installed within /opt/replicator. If you have installed them in a different location, you will need to pass that path to the script in the examples below using the --path option.
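
For example (the alternative installation path shown here is purely illustrative):

shell> ./tungsten_prep_upgrade --path /opt/alt-replicator --offline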

  1. The following commands tell the replicators to go offline at a specific point, in this case when they receive an explicit heartbeat. This ensures that all the replicators stop at the same sequence number and binary log position. The replicators will NOT go offline until the explicit heartbeat is issued later in this step.

    • On every nyc node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service london --offline
      shell> ./tungsten_prep_upgrade --service tokyo --offline
    • On every london node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service nyc --offline
      shell> ./tungsten_prep_upgrade --service tokyo --offline
    • On every tokyo node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service london --offline
      shell> ./tungsten_prep_upgrade --service nyc --offline
  2. Next, on the master host within each cluster, issue the heartbeat by executing the following using the cluster-specific trepctl, typically installed in /opt/continuent:

    shell> trepctl heartbeat -name offline_for_upg

    Ensure that every cross-site replicator on every node is now in the OFFLINE:NORMAL state:

    shell> mmtrepctl status
    ~or~
    shell> mmtrepctl --service {servicename} status
  3. Capture the position of the cross-site replicators on all nodes in all clusters.

    The service name provided should be the name of the remote service(s) for this cluster; for example, in the london cluster you get the positions for nyc and tokyo, in nyc you get the positions for london and tokyo, and so on.

    • On every london node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service nyc --get
      (NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service tokyo --get
      (NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
    • On every nyc node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service london --get
      (NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service tokyo --get
      (NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
    • On every tokyo node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service london --get
      (NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service nyc --get
      (NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
  4. Finally, to complete this step, stop the replicators on all nodes:

    shell> ./tungsten_prep_upgrade --stop 
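
    To confirm that the cross-site replicator processes have actually stopped on a node, a simple process check such as the following can be used (a sketch; adjust the path if your cross-site replicators are not installed in /opt/replicator):

    shell> ps -ef | grep /opt/replicator | grep -v grep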

3.2.3.5. Upgrade Step 3: Export the tungsten_* Databases

On every node in each cluster, export the tracking schemas for the cross-site replicators.

As in Step 2, where you captured the cross-site positions, the same pattern applies here: in london you export/backup nyc and tokyo, in nyc you export/backup london and tokyo, and in tokyo you export/backup nyc and london.

  • On every london node:

    shell> ./tungsten_prep_upgrade -d --alldb 
    ~or~
    shell> ./tungsten_prep_upgrade --service nyc --dump
    shell> ./tungsten_prep_upgrade --service tokyo --dump
  • On every nyc node:

    shell> ./tungsten_prep_upgrade -d --alldb 
    ~or~
    shell> ./tungsten_prep_upgrade --service london --dump
    shell> ./tungsten_prep_upgrade --service tokyo --dump
  • On every tokyo node:

    shell> ./tungsten_prep_upgrade -d --alldb 
    ~or~
    shell> ./tungsten_prep_upgrade --service london --dump
    shell> ./tungsten_prep_upgrade --service nyc --dump
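
If you also want a manual copy of a tracking schema alongside the script output, a plain mysqldump can be used. This is a sketch only; it assumes the cross-site tracking schemas follow the usual tungsten_<service> naming and reuses the tungsten/secret credentials from the examples (shown here for a london node, where the remote services are nyc and tokyo):

shell> mysqldump -utungsten -psecret tungsten_nyc > ~/tungsten_nyc.sql
shell> mysqldump -utungsten -psecret tungsten_tokyo > ~/tungsten_tokyo.sql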

3.2.3.6. Upgrade Step 4: Uninstall the Cross-site Replicators

To uninstall the cross-site replicators, execute the following on every node:

shell> cd {replicator software path}
shell> tools/tpm uninstall --i-am-sure

3.2.3.7. Upgrade Step 5: Reload the tracking schema

We DO NOT want the reloading of these schemas to appear in the binary logs on the master; therefore, the reload needs to be performed on each node individually:

  • On every london node:

    shell> ./tungsten_prep_upgrade -s nyc -u tungsten -w secret -r
    shell> ./tungsten_prep_upgrade -s tokyo -u tungsten -w secret -r
    ~or~
    shell> ./tungsten_prep_upgrade --service nyc --user tungsten --password secret --restore
    shell> ./tungsten_prep_upgrade --service tokyo --user tungsten --password secret --restore
  • On every tokyo node:

    shell> ./tungsten_prep_upgrade -s london -u tungsten -w secret -r
    shell> ./tungsten_prep_upgrade -s nyc -u tungsten -w secret -r
    ~or~
    shell> ./tungsten_prep_upgrade --service london --user tungsten --password secret --restore
    shell> ./tungsten_prep_upgrade --service nyc --user tungsten --password secret --restore
  • On every nyc node:

    shell> ./tungsten_prep_upgrade -s london -u tungsten -w secret -r
    shell> ./tungsten_prep_upgrade -s tokyo -u tungsten -w secret -r
    ~or~
    shell> ./tungsten_prep_upgrade --service london --user tungsten --password secret --restore
    shell> ./tungsten_prep_upgrade --service tokyo --user tungsten --password secret --restore
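
The key point in this step is that the reload must not be written to the binary log. If you ever need to reload one of the dumps by hand, the same effect can be achieved with the standard mysql client by disabling binary logging for the session. A sketch only, reusing the hypothetical tungsten_nyc.sql dump from the previous step:

shell> mysql -utungsten -psecret --init-command='SET SESSION sql_log_bin=0' tungsten_nyc < ~/tungsten_nyc.sql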

3.2.3.8. Upgrade Step 6: Update Configuration

Update /etc/tungsten/tungsten.ini to a valid v6 config. An example of a valid config is as follows:

[defaults]
user=tungsten
home-directory=/opt/continuent
application-user=app_user
application-password=secret
application-port=3306
profile-script=~/.bash_profile
replication-user=tungsten
replication-password=secret
mysql-allow-intensive-checks=true
skip-validation-check=THLSchemaChangeCheck
start-and-report=true
[nyc]
topology=clustered
master=db1
slaves=db2,db3
connectors=db1,db2,db3
[london]
topology=clustered
master=db4
slaves=db5,db6
connectors=db4,db5,db6
[tokyo]
topology=clustered
master=db7
slaves=db8,db9
connectors=db7,db8,db9
[global]
topology=composite-multi-master
composite-datasources=nyc,london,tokyo
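
Before proceeding, it is worth double-checking the INI for copy-and-paste mistakes, in particular that each cluster section appears exactly once and matches the composite-datasources list. A quick, illustrative check of the section headers:

shell> grep '^\[' /etc/tungsten/tungsten.ini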

3.2.3.9. Upgrade Step 7: Enter Maintenance Mode

Enable Maintenance mode on all clusters using the cctrl command:

shell> cctrl
cctrl> set policy maintenance
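
If you prefer to script this step, cctrl can usually be driven non-interactively by piping the command to it; a sketch:

shell> echo 'set policy maintenance' | cctrl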

3.2.3.10. Upgrade Step 8: Stop Managers

Stop the manager process on all nodes:

shell> manager stop
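
Because this must be run on every node, it can be convenient to loop over the hosts from a single machine, assuming passwordless SSH is configured for the tungsten OS user; the hostnames below match the example configuration, and you may need the full path to the manager command if it is not in the PATH of non-interactive shells:

shell> for h in db1 db2 db3 db4 db5 db6 db7 db8 db9; do ssh $h 'manager stop'; done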

3.2.3.11. Upgrade Step 9: Install the Software

Validate and install the new release on all nodes:

shell> cd /opt/continuent/software/tungsten-clustering-6.0.0-442
shell> tools/tpm validate-update

If validation shows no errors, run the install:

shell> tools/tpm update --replace-release

Important

If you had start-and-report=false, you may need to restart the manager services manually.

Warning

Until all nodes have been updated, the output from cctrl may show services in an OFFLINE, STOPPED, or UNKNOWN state. This is to be expected until all of the new v6 managers are online.

3.2.3.12. Upgrade Step 10: Start Managers

After the installation is complete on all nodes, start the manager services:

shell> manager start

3.2.3.13. Upgrade Step 11: Return to Automatic Mode

Return all clusters to Automatic mode using the cctrl command:

shell> cctrl
cctrl> set policy automatic

3.2.3.14. Upgrade Step 12: Validate

  1. Identify the cross-site service name(s):

    shell> trepctl services

    In our example, the local cluster service will be one of london, nyc or tokyo, depending on the node you are on. The cross-site replication services would be:

    (within the london cluster)
    london_from_nyc
    london_from_tokyo
    
    (within the nyc cluster)
    nyc_from_london
    nyc_from_tokyo
    
    (within the tokyo cluster)
    tokyo_from_london
    tokyo_from_nyc
  2. Upon installation, the new cross-site replicators will attempt to come online; however, they may be in an OFFLINE:ERROR state due to a change in epoch numbers. Check this on the master in each cluster by looking at the output from the trepctl command.

    Check each service as needed based on the status seen above:

    shell> trepctl -service london_from_nyc status
    shell> trepctl -service london_from_tokyo status
    ~or~
    shell> trepctl -service nyc_from_london status
    shell> trepctl -service nyc_from_tokyo status
    ~or~
    shell> trepctl -service tokyo_from_london status
    shell> trepctl -service tokyo_from_nyc status
  3. If the replicator is in an error state due to an epoch difference, you will see an error similar to the following:

    pendingErrorSeqno      : -1
    pendingExceptionMessage: Client handshake failure: Client response
    validation failed: Log epoch numbers do not match: master source
    ID=db1 client source ID=db4 seqno=4 server epoch number=0 client
    epoch number=4
    pipelineSource         : UNKNOWN

    The above error is due to the epoch numbers changing as a result of the replicators being restarted, and the new replicators being installed.

    To resolve, simply force the replicator online as follows:

    shell> trepctl -service london_from_nyc online -force
    shell> trepctl -service london_from_tokyo online -force
    ~or~
    shell> trepctl -service nyc_from_london online -force
    shell> trepctl -service nyc_from_tokyo online -force
    ~or~
    shell> trepctl -service tokyo_from_london online -force
    shell> trepctl -service tokyo_from_nyc online -force
  4. If the replicator shows an error state similar to the following:

    pendingErrorSeqno      : -1
    pendingExceptionMessage: Client handshake failure: Client response
    validation failed: Master log does not contain requested
    transaction: master source ID=db1 client source ID=db2 requested
    seqno=1237 client epoch number=0 master min seqno=5 master max
    seqno=7
    pipelineSource         : UNKNOWN

    The above error can occur if, during the installation, the slave replicators came online before the master.

    Provided the steps above have been followed, simply bringing the replicator online should be enough for the replicator to retry and carry on successfully:

    shell> trepctl -service london_from_nyc online 
    shell> trepctl -service london_from_tokyo online 
    ~or~
    shell> trepctl -service nyc_from_london online 
    shell> trepctl -service nyc_from_tokyo online 
    ~or~
    shell> trepctl -service tokyo_from_london online 
    shell> trepctl -service tokyo_from_nyc online
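
Finally, once any errors have been cleared, a quick way to confirm that a node is healthy is to filter the service listing down to its name and state fields (a simple sketch; serviceName and state are fields that appear in the trepctl services output):

shell> trepctl services | grep -E 'serviceName|state'

Every listed service, both local and cross-site, should report a state of ONLINE.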