2.5. Best Practices

A successful deployment depends on attention to detail during the initial deployment, day-to-day operations, and ongoing maintenance.

2.5.1. Best Practices: Deployment

  • Identify the best deployment method for your environment and use that in production and testing. See Section 10.1, “Comparing Staging and INI tpm Methods”.

  • Standardize the OS and database prerequisites. There are Ansible modules available for immediate use within AWS, or as a template for modifications.

    More information on the Ansible method is available in this blog article.

  • Ensure that the output of the `hostname` command and the nodename entries in the Tungsten configuration match exactly prior to installing Tungsten. A quick way to verify this is shown in the first sketch following this list.

    The configuration keys that define nodenames are: --slaves, --dataservice-slaves, --members, --master, --dataservice-master-host, --masters and --relay

  • For security purposes, ensure that each area of your deployment is appropriately secured.

  • Choose your topology from the deployment section and verify that the configuration matches the basic settings. Additional settings may be included for custom features, but the basics are needed to ensure proper operation. If your configuration is not listed or does not match our documented settings, we cannot guarantee correct operation.

  • If there is an even number of database servers in the cluster, configure the cluster with a witness host. An active witness is preferred, but a passive one will still ensure stability. See Section 2.1.4, “Active Witness Hosts” for an explanation of the differences and how to configure them; an illustrative INI snippet also follows this list.

  • If you are using ROW replication, any triggers that run additional INSERT/UPDATE/DELETE operations must be updated so they do not run on the Replica servers.

  • Make sure you know the structure of the Tungsten Cluster home directory and how to initialize your environment for administration. See Section 6.1, “The Home Directory” and Section 6.2, “Establishing the Shell Environment”. An example of sourcing the environment script appears after this list.

  • Prior to migrating applications to Tungsten Cluster, test the failover and recovery procedures from Chapter 6, Operations Guide. Be sure to try recovering a failed Primary and reprovisioning failed Replicas.

  • When deciding on the Service Name for your configurations, keep it simple and short, and use only alphanumeric characters (a-z, A-Z, 0-9) and underscores (_).
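
The following is a minimal sketch of the hostname check mentioned above; it assumes an INI-based install with the configuration in the default /etc/tungsten/tungsten.ini location, and the host name db1 and the keys shown are placeholders for your own values:

  shell> hostname
  db1
  shell> grep -E 'members|master|slaves|relay' /etc/tungsten/tungsten.ini
  master=db1
  members=db1,db2,db3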
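
As a reference for the witness recommendation, an INI-style service definition for a two-node cluster plus an active witness might look roughly like the sketch below. The service name alpha and the host names are placeholders; see Section 2.1.4, “Active Witness Hosts” for the full option reference:

  [alpha]
  topology=clustered
  master=db1
  members=db1,db2,witness1
  witnesses=witness1
  enable-active-witnesses=true
  connectors=db1,db2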
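
To establish the shell environment referenced above, source the env.sh script from the installed share directory; this sketch assumes the default /opt/continuent installation directory used elsewhere in this chapter:

  shell> source /opt/continuent/share/env.sh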

2.5.2. Best Practices: Upgrade

This section identifies the best practices for performing a Tungsten software upgrade.

  • Identify the deployment method chosen for your environment, Staging or INI. See Section 10.1, “Comparing Staging and INI tpm Methods”.

  • The best practice for Tungsten software is to upgrade All-at-Once, performing zero Primary switches.

  • The Staging deployment method automatically performs an All-at-Once upgrade; this is the basic design of the Staging method.

  • For an INI upgrade, there are two possible approaches: One-at-a-Time (with at least one Primary switch) and All-at-Once (no switches at all).

  • See Section 10.4.3, “Upgrades with an INI File” for more information.

  • Here is the sequence of events for a proper Tungsten upgrade on a 3-node cluster with the INI deployment method:

    • Log in to the Customer Downloads Portal and download the latest version of the software.

    • Copy the file (e.g. tungsten-clustering-7.0.2-161.tar.gz) to each host that runs a Tungsten component.

    • Set the cluster to policy MAINTENANCE

    • On every host:

      • Extract the tarball under /opt/continuent/software/ (e.g. creating /opt/continuent/software/tungsten-clustering-7.0.2-161)

      • cd to the newly extracted directory

      • Run the Tungsten Package Manager tool, tools/tpm update --replace-release

    • For example, here are the steps in order:

      On ONE database node:
      shell> cctrl
      cctrl> set policy maintenance
      cctrl> exit
      
      On EVERY Tungsten host at the same time:
      shell> cd /opt/continuent/software
      shell> tar xvzf tungsten-clustering-7.0.2-161.tar.gz
      shell> cd tungsten-clustering-7.0.2-161
      
      To perform the upgrade and restart the Connectors gracefully at the same time:
      shell> tools/tpm update --replace-release
      
      To perform the upgrade and delay the restart of the Connectors to a later time:
      shell> tools/tpm update --replace-release --no-connectors
      When it is time for the Connector to be promoted to the new version, perhaps after taking it out of the load balancer:
      shell> tpm promote-connector
      
      When all nodes are done, on ONE database node:
      shell> cctrl
      cctrl> set policy automatic
      cctrl> exit

WHY is it ok to upgrade and restart everything all at once?

Let’s look at each component to examine what happens during the upgrade, starting with the Manager layer.

Once the cluster is in Maintenance mode, the Managers cease to make changes to the cluster, and therefore Connectors will not reroute traffic either.

Since Manager control of the cluster is passive in Maintenance mode, it is safe to stop and restart all Managers - there will be zero impact to the cluster operations.

The Replicators function independently of client MySQL requests (which come through the Connectors and go to the MySQL database server), so even if the Replicators are stopped and restarted, there should be only a small window of delay while the Replicas catch up with the Primary once the upgrade is complete. If the Connectors are reading from the Replicas, they may briefly serve stale data if SmartScale is not in use.
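
For example, one way to confirm that a Replica has caught up after the restart is to check the replicator state and applied latency with trepctl; the host name and output values below are illustrative only:

  shell> trepctl -host db2 status | grep -E 'appliedLatency|state'
  appliedLatency         : 0.35
  state                  : ONLINE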

Finally, when the Connectors are upgraded, they must be restarted so that the new version can take over. As discussed in the blog post Zero-Downtime Upgrades, the Tungsten Cluster software upgrade process does two key things to help keep traffic flowing during the Connector upgrade promote step:

  • Execute `connector graceful-stop 30` to gracefully drain existing connections and prevent new connections.

  • Using the new software version, initiate the start/retry feature, which launches a new Connector process while another one is still bound to the server socket. The new Connector process waits for the socket to become available by retrying the bind every 200ms by default (the interval is tunable), drastically reducing the window for application connection failures.

2.5.3. Best Practices: Operations

2.5.4. Best Practices: Maintenance

  • Your license allows for a testing cluster. Deploy a cluster that matches your production cluster and test all operational and maintenance procedures there.

  • Schedule regular tests for local and DR failover. This should at least include switching the Primary server to another host in the local cluster; an example switch is sketched after this list. If possible, the DR cluster should be tested once per quarter.

  • Disable any automatic operating system patching processes. Automatic patching will cause issues when all database servers restart without coordination. See Section 6.15.3, “Performing Maintenance on an Entire Dataservice”. An example of disabling automatic patching on one platform appears after this list.

  • Regularly check for maintenance releases and upgrade your environment. Every version includes stability and usability fixes to ease the administrative process.
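
As an example of the failover testing recommended above, a manual switch of the Primary within the local cluster can be exercised from cctrl as sketched below; the target host name db2 is a placeholder:

  shell> cctrl
  cctrl> switch to db2
  cctrl> exit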
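
The exact way to disable automatic patching depends on the operating system; on Ubuntu/Debian-style hosts using the unattended-upgrades package, for example, it might look like the following sketch (answer No when prompted by dpkg-reconfigure):

  shell> sudo systemctl disable --now unattended-upgrades
  shell> sudo dpkg-reconfigure -plow unattended-upgrades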