5.14.4. Making Online Schema Changes

Similar to the maintenance procedure, schema changes to an underlying dataserver may need to be performed on dataservers that are not part of an active dataservice. Although many online schema changes, such as the addition, removal, or modification of an existing table definition, will be correctly replicated to slaves, other operations, such as creating new indexes, adding/removing columns or migrating table data between table definitions, are best performed individually on each dataserver while it has been temporarily taken out of the dataservice.

As with all maintenance operations, it is advisable to have fully tested your DDL in a staging environment. In some cases, the impact of the DDL change is minimal, and it can therefore safely be applied to the Master node, allowing the change to be replicated down to the slaves.

In situations where the overhead of the DDL change would cause an outage to your application through table locking, use the rolling maintenance procedure below, which is specific to DDL changes.

The basic process comprises a number of steps, as follows:

  1. If the DDL adds or removes columns, then enable the colnames and dropcolumn filters

  2. If the DDL adds or removes tables, and you do not want to simply apply to the master and allow replication to handle it, then enable the replicate filter

  3. Perform schema changes following the process summarised in the table below

  4. Optionally, remove the filters enabled in the first step

Enable filters for column changes

The use of the colnames and dropcolumn filters allows you to make changes to the structure of tables without impacting the flow of replication.

Important

During these schema changes, and whilst the filters are in place, applications MUST be forwards and backwards compatible, but MUST NOT alter data within any columns that are filtered out from replication until after the process has been completed on all hosts, and the filters disabled. Data changes to filtered columns will cause data drift and inconsistencies, resulting in potentially unexpected behaviour.

  • To enable the filters, first create a file called schemachange.json in a directory accessible by the OS user that the software is running as.

    Typically, this will be the tungsten user, and the default location for this file will be /opt/continuent/share

    The file needs to contain a JSON block outlining ALL the columns being added and removed from all tables affected by the changes.

    In the example below, we are removing the column, operator_code and adding operator_desc to the system_operators table and adding the column action_date to the system_actions table:

    [
        {
          "schema": "ops",
          "table": "system_operators",
          "columns": ["operator_code","operator_desc"]
        },
        {
          "schema": "ops",
          "table": "system_actions",
          "columns": ["action_date"]
        }
    ]
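A malformed definitions file will prevent the dropcolumn filter from loading, so it is worth sanity-checking the JSON syntax before restarting the replicators. A minimal sketch, using a temporary path for illustration (the check validates syntax only, not whether the listed schemas, tables, or columns actually exist):

```shell
# Write the example definitions to a scratch file (illustrative path only)
cat > /tmp/schemachange.json <<'EOF'
[
  {"schema": "ops", "table": "system_operators",
   "columns": ["operator_code","operator_desc"]},
  {"schema": "ops", "table": "system_actions",
   "columns": ["action_date"]}
]
EOF

# json.tool exits non-zero on any syntax error
python3 -m json.tool /tmp/schemachange.json > /dev/null && echo "JSON OK"
```

Run the same check against /opt/continuent/share/schemachange.json before applying the configuration.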
  • Place your cluster into maintenance mode

    shell> cctrl
    cctrl> set policy maintenance

    Note

    If running a Composite Multimaster or Composite HA/DR topology, issue the command at the top global level to place all clusters in maintenance, or execute it individually within each cluster.
  • Next, enable the filters within your configuration by adding the following two parameters to the [defaults] section of tungsten.ini (if running in INI method) on EVERY cluster node:

    svc-extractor-filters=colnames,dropcolumn
    property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/schemachange.json

    Then run tpm update to apply the changes.

  • Or, if running as a staging install:

    shell> cd staging_dir
    shell> tools/tpm update alpha \
      --svc-extractor-filters=colnames,dropcolumn \
      --property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/schemachange.json
  • Monitor replication to ensure there are no errors before continuing
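One way to monitor is to scan the trepctl status output for the replicator state and any pending error. The sketch below simulates that output with a captured sample so it can be read standalone; in practice, substitute `status=$(trepctl -host host2 status)` (field names are assumptions based on typical trepctl status output):

```shell
# Sample of the two fields of interest; replace with live trepctl output
status='state          : ONLINE
pendingError   : NONE'

# Flag any replicator that is not ONLINE or that reports a pending error
if echo "$status" | grep -q 'state *: ONLINE' && \
   echo "$status" | grep -q 'pendingError *: NONE'; then
  echo "replicator healthy"
else
  echo "replicator needs attention"
fi
```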

Enable filters for adding/removing tables

The use of the replicate filter allows you to add and remove tables without impacting the flow of replication.

Important

During these schema changes, and whilst the filter is in place, applications MUST be forwards and backwards compatible, but MUST NOT modify data in any new tables until after the process has been completed on all hosts, and the filters disabled. Data changes to filtered tables will cause data drift and inconsistencies, resulting in potentially unexpected behaviour.

  • Place your cluster into maintenance mode.

    shell> cctrl
    cctrl> set policy maintenance

    Note

    If running a Composite Multimaster or Composite HA/DR topology, issue the command at the top global level to place all clusters in maintenance, or individually within each cluster.
  • Next, enable the filters within your configuration by adding the following two parameters to the [defaults] section of tungsten.ini (if running in INI method) on EVERY cluster node.

    In this example we plan to ADD the table system_actions and REMOVE the table system_operations, both within the ops schema:

    svc-extractor-filters=replicate
    property=replicator.filter.replicate.ignore=ops.system_actions,ops.system_operations

    Then run tpm update to apply the changes.

  • Or, if running as a staging install:

    shell> cd staging_dir
    shell> tools/tpm update alpha \
      --svc-extractor-filters=replicate \
      --property=replicator.filter.replicate.ignore=ops.system_actions,ops.system_operations
  • Monitor replication to ensure there are no errors before continuing.

Apply DDL Changes

  • Follow the steps outlined in the table below to make the DDL changes to all nodes, in all clusters.

  • If filtering columns, once all the changes have been completed, edit schemachange.json so that it contains an empty document:

    shell> echo "[]" > /opt/continuent/share/schemachange.json

    Then, restart the replicators:

    shell> replicator restart
  • If filtering tables, repeat the process of adding the replicate filter, removing from the ignore parameter any tables that you have ADDED to your database.

    You can optionally remove the filters entirely by removing the entries from the configuration and re-running tpm update; however, it is also perfectly fine to leave them in place. Having the filters in place incurs a potentially small CPU overhead in very busy clusters, but otherwise they should have no impact.

    It is advisable to monitor the system usage and make the decision based on your own business needs.
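Continuing the replicate-filter example above: once the DDL is complete on all hosts, the ADDED table (ops.system_actions) is removed from the ignore list so that it replicates normally, while the dropped table can remain listed, since it no longer exists. A sketch of the updated tungsten.ini fragment:

```ini
svc-extractor-filters=replicate
property=replicator.filter.replicate.ignore=ops.system_operations
```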

The following method assumes a schema update on the entire dataservice, modifying the schema on the slaves first. The table shows three datasources being updated in sequence: slaves first, then the master.

Step  Description                                          Command                                      host1    host2    host3
1     Initial state                                                                                     Master   Slave    Slave
2     Set the slave host2 offline                          trepctl -host host2 offline                  Master   Offline  Slave
3     Connect to dataserver for host2 and update schema    set sql_log_bin=0; run ddl statements        Master   Offline  Slave
4     Set the slave (host2) online                         trepctl -host host2 online                   Master   Slave    Slave
5     Ensure the slave (host2) has caught up               trepctl -host host2 status                   Master   Slave    Slave
6     Set the slave host3 offline                          trepctl -host host3 offline                  Master   Slave    Offline
7     Connect to dataserver for host3 and update schema    set sql_log_bin=0; run ddl statements        Master   Slave    Offline
8     Set the slave (host3) online                         trepctl -host host3 online                   Master   Slave    Slave
9     Ensure the slave (host3) has caught up               trepctl -host host3 status                   Master   Slave    Slave
10    Switch master to host2                               See Section 5.5, “Switching Master Hosts”    Slave    Master   Slave
11    Set the slave host1 offline                          trepctl -host host1 offline                  Offline  Master   Slave
12    Connect to dataserver for host1 and update schema    set sql_log_bin=0; run ddl statements        Offline  Master   Slave
13    Set the slave (host1) online                         trepctl -host host1 online                   Slave    Master   Slave
14    Ensure the slave (host1) has caught up               trepctl -host host1 status                   Slave    Master   Slave
15    Switch master back to host1                          See Section 5.5, “Switching Master Hosts”    Master   Slave    Slave
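The slave portion of the procedure (steps 2-9) can be sketched as a loop. This is a dry-run sketch: trepctl and mysql are stubbed out with shell functions that simply echo, so the sequencing can be followed (and tested) without a live cluster; replace the stubs with the real commands and your own DDL to use it.

```shell
# Stubs standing in for the real cluster tooling (remove for real use)
trepctl() { echo "trepctl $*"; }
mysql()   { cat > /dev/null; echo "mysql $*"; }

# Hostnames are assumptions matching the example table above
for host in host2 host3; do
  trepctl -host "$host" offline
  # Apply the DDL directly on the dataserver, keeping it out of the binary log
  mysql -h "$host" <<'SQL'
set sql_log_bin=0;
-- run ddl statements here
SQL
  trepctl -host "$host" online
  trepctl -host "$host" status    # confirm the slave has caught up
done
```

After the loop completes, switch the master as in steps 10-15 and repeat the same offline/DDL/online sequence on the old master.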

Note

With any schema change to a database, the database performance should be monitored to ensure that the change is not affecting the overall dataservice performance.