7.10.2. Seeding Data for Heterogeneous Replication

Seeding data for heterogeneous targets is a complex process, and the challenges involved depend on the target and on the amount of data to be seeded.

The steps outlined below come with their own challenges and may not be suitable in every environment. Any pre-seeding process therefore needs to be fully understood and evaluated before use.

The following are required for this process:

  • A temporary host meeting all required prerequisites, with an empty MySQL instance matching the version of the original source, and with the default storage engine set to BLACKHOLE

  • Enough disk space on the temporary host to hold binary logs and THL logs equivalent to the amount of data being seeded

  • The ability to extract the data from the source, using mysqldump or by reverse-engineering SQL statements

The process in summary is as follows:

  • Configure the replicator to extract from the temporary instance

  • Extract the data from the source, capturing the binlog position appropriate to the export

  • Import the data into the temporary instance and allow the replicator to load the target

  • Reconfigure the replicator to extract from the original source, positioned to start from the coordinates noted during the export

7.10.2.1. Seeding Data from a Standalone Source

Step 1: Configure the Temporary Instance

  • Build an empty MySQL instance, ensuring that all the prerequisites are in place. These are outlined in Appendix B, Prerequisites

  • To ensure compatibility, the version of MySQL used must match the version of MySQL running on the main source.

  • Once the instance is running, pre-create all of the tables that you will be loading. This must be done manually, as each table must be created with the ENGINE=BLACKHOLE option

  • With the BLACKHOLE engine, data written to the database is not actually stored; however, binary logs are still generated, and it is these that the replicator requires
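
One way to pre-create the tables is to take a DDL-only dump from the source and rewrite the engine clause before loading it into the temporary instance. The following is a sketch only; the connection options, the schema name hr, and the helper name to_blackhole are illustrative:

```shell
# Sketch: rewrite a DDL-only dump so that every table uses the BLACKHOLE
# engine. The helper name and all connection details are illustrative.
to_blackhole() {
  sed -e 's/ENGINE=[A-Za-z]*/ENGINE=BLACKHOLE/'
}

# Intended usage (not run here): dump the table definitions from the source
# and load the rewritten DDL into the temporary instance, e.g.
#   mysqldump -u root -psecret --no-data hr | to_blackhole | mysql -u root -psecret hr
```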

Step 2: Configure the Replicator

  • Follow the steps outlined in Section 3.2, “Deploying a Primary/Replica Topology” to configure an extractor against the Temporary host.

  • Ensure that the configuration includes the following entries to enable heterogeneous replication, and ensure you qualify the objects (schemas and/or tables) that you want to seed

    enable-heterogeneous-master=true
    svc-extractor-filters=replicate
    property=replicator.filter.replicate.do=schema.table

  • Once installed, start the replicator

Step 3: Build the Target Schema

  • Depending on the target, you may need to pre-create the final objects in your target environment. Use ddlscan to do this now if this is required

Step 4: Configure the Applier

  • Ensure the applier host meets all the required prerequisites, and then configure the applier appropriately for the target you are applying to.

  • Follow the appropriate steps in Chapter 4, Deploying Appliers to configure a standalone applier, ensuring that it is configured to connect to the temporary extractor installed in Step 2

  • Once configured, start the applier. At this stage you should not see any replication traffic since the temporary host will have no data written to it

Step 5: Export the data from the source

  • We now need to export the data from the source.

  • Using mysqldump, ensure that you capture the binlog position and that the export does NOT contain database or table DDL

  • The following example can be used as a template. In this example we are exporting an entire schema. Use the appropriate options if you require only specific tables

    mysqldump -u root -psecret -B hr --no-create-db --no-create-info --master-data=2 >dump.sql
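
Before importing, it can be worth sanity-checking that the dump really does record the binlog position and contains no DDL. A minimal sketch, assuming the dump file is named dump.sql as in the example above; the check_dump helper is illustrative:

```shell
# Sketch: verify that a dump taken for seeding meets both conditions above.
check_dump() {
  # --master-data=2 writes the binlog position as a CHANGE MASTER comment;
  # it must be present so the extractor can be positioned later.
  grep -q "CHANGE MASTER" "$1" || { echo "missing binlog position"; return 1; }
  # --no-create-db and --no-create-info should have suppressed all DDL.
  grep -qE "^CREATE (DATABASE|TABLE)" "$1" && { echo "unexpected DDL"; return 1; }
  echo "dump ok"
}

# Usage (not run here):
#   check_dump dump.sql
```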

Step 6: Import the Data

  • Now that we have the data, we can import it into the Temporary Instance

  • As you load the data, you can monitor replication; you should see the data loading into your target environment.

  • Provided you created the tables correctly with the BLACKHOLE engine, a SELECT COUNT(*) on the tables in the temporary instance will return a row count of zero.

  • When the load has finished and the applier has completed replication, stop both replicators using the following command:

    shell> replicator stop

    We have now finished with the temporary MySQL instance

Step 7: Install Extractor from Main Source Host

  • Follow the steps outlined in Section 3.2, “Deploying a Primary/Replica Topology” to configure an extractor against the Source host.

  • Ensure that the configuration includes the following entries to enable heterogeneous replication, and ensure you qualify the objects (schemas and/or tables) as required, for example

    enable-heterogeneous-master=true
    svc-extractor-filters=replicate
    property=replicator.filter.replicate.do=schema.table

  • Once installed, start the replicator in an offline state

    shell> replicator start offline

Step 8: Reconfigure the Applier

  • We now need to reconfigure the applier. First, uninstall the software to ensure a clean build and to ensure that any THL from the pre-load has been cleared

    shell> cd /opt/continuent/software/tungsten-replicator-7.0.3-141
    shell> tools/tpm uninstall --i-am-sure

  • Place the correct configuration into the /etc/tungsten/tungsten.ini file ensuring that start-and-report=false is set and that the applier is now configured to point to the main extractor

  • Install the software

    shell> cd /opt/continuent/software/tungsten-replicator-7.0.3-141
    shell> tools/tpm install

  • Finally, start the replicator in an offline state, and issue a reset to be sure the previous tracking schema is clean:

    shell> replicator start offline
    shell> trepctl -service servicename reset -all -y

Step 9: Position the Replicator

  • The final step is to position the extractor to pick up from the point at which the export in Step 5 was taken.

  • Locate the dump file and issue the following command:

    shell> grep "CHANGE MASTER" dump.sql
    -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=847;

    Taking the sequence number from the MASTER_LOG_FILE filename and the MASTER_LOG_POS value, issue the following statement on the EXTRACTOR host:

    shell> trepctl online -from-event 000003:847

  • Once the command has completed, the extractor will be online. You can now bring the applier online:

    shell> trepctl online
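
The manual extraction of the sequence and position values in this step lends itself to scripting. A minimal sketch, assuming the dump was produced with --master-data=2 as in Step 5; the from_event helper is illustrative:

```shell
# Sketch: derive the "sequence:position" event ID that trepctl expects
# from the CHANGE MASTER comment written by mysqldump --master-data=2.
from_event() {
  local line seq pos
  line=$(grep -m1 "CHANGE MASTER" "$1")
  seq=$(echo "$line" | sed -E "s/.*MASTER_LOG_FILE='[^']*\.([0-9]+)'.*/\1/")
  pos=$(echo "$line" | sed -E "s/.*MASTER_LOG_POS=([0-9]+).*/\1/")
  echo "${seq}:${pos}"
}

# Usage (not run here), on the extractor host:
#   trepctl online -from-event "$(from_event dump.sql)"
```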

7.10.2.2. Seeding Data from a Cluster, for a Cluster-Extractor Target

Step 1: Configure the Temporary Instance

  • Build an empty MySQL instance, ensuring that all the prerequisites are in place. These are outlined in Appendix B, Prerequisites

  • To ensure compatibility, the version of MySQL used must match the version of MySQL running on the main source.

  • Once the instance is running, pre-create all of the tables that you will be loading. This must be done manually, as each table must be created with the ENGINE=BLACKHOLE option

  • With the BLACKHOLE engine, data written to the database is not actually stored; however, binary logs are still generated, and it is these that the replicator requires

Step 2: Configure the Replicator

  • Follow the steps outlined in Section 3.2, “Deploying a Primary/Replica Topology” to configure an extractor against the Temporary host.

  • Ensure that the configuration includes the following entries to enable heterogeneous replication, and ensure you qualify the objects (schemas and/or tables) that you want to seed

    enable-heterogeneous-master=true
    svc-extractor-filters=replicate
    property=replicator.filter.replicate.do=schema.table

  • Once installed, start the replicator

Step 3: Build the Target Schema

  • Depending on the target, you may need to pre-create the final objects in your target environment. Use ddlscan to do this now if this is required

Step 4: Configure the Applier

In a Cluster-Extractor environment, we will use the same applier after we have seeded the target.

  • Ensure the applier host meets all the required prerequisites, and then configure the applier appropriately for the target you are applying to.

  • Follow the appropriate steps in Chapter 4, Deploying Appliers to configure a standalone applier, ensuring that it is configured to connect to the temporary extractor installed in Step 2. At this stage do not follow the Cluster-Extractor setup

    Important

    When setting the service name for this temporary seeding process, ensure the servicename you choose matches the servicename of the main source cluster that we will later connect to for normal operation

  • Once configured, start the applier. At this stage you should not see any replication traffic since the temporary host will have no data written to it

Step 5: Export the data from the source

  • We now need to export the data from the source. There are two ways to do this with a cluster: we can either take an export from the Primary, or we can take an export from a Replica.

  • Export from a Primary:

    • If you are able to export from the Primary, then using mysqldump, ensure that you capture the binlog position and that the export does NOT contain database or table DDL

    • The following example can be used as a template. In this example we are exporting an entire schema. Use the appropriate options if you require only specific tables

      mysqldump -u root -psecret -B hr --no-create-db --no-create-info --master-data=2 >dump.sql

  • Export from a Replica:

    • When exporting from a Replica, we cannot capture the correct binlog position, as the one we need is specific to the Primary; however, exporting from a Replica means that we can utilise the features of Tungsten Cluster to isolate the node

    • First, set the cluster to MAINTENANCE mode, and then SHUN the node that we wish to export from

      shell> cctrl
      cctrl> set policy maintenance
      cctrl> datasource Replicahost shun

    • Next, we can take the export

      mysqldump -u root -psecret -B hr --no-create-db --no-create-info >dump.sql

    • The final step is to capture the current replication position using dsctl

      shell> dsctl get -ascmd
      dsctl set -seqno 9 -epoch 2 -event-id "mysql-bin.000003:0000000000002608;-1" -source-id "db1"

      Make a note of the output from running this command as we will need it later

    • Finally, you can re-introduce the node back into the cluster

      shell> cctrl
      cctrl> datasource Replicahost recover
      cctrl> set policy automatic

Step 6: Import the Data

  • Now that we have the data, we can import it into the Temporary Instance

  • As you load the data, you can monitor replication; you should see the data loading into your target environment.

  • Provided you created the tables correctly with the BLACKHOLE engine, a SELECT COUNT(*) on the tables in the temporary instance will return a row count of zero.

  • When the load has finished and the applier has completed replication, stop both replicators using the following command:

    shell> replicator stop

    We have now finished with the temporary MySQL instance

Step 7: Reconfigure the Applier as a Cluster-Extractor

  • We now need to reconfigure the applier as a cluster Replica. First, uninstall the software to ensure a clean build and to ensure that any THL from the pre-load has been cleared

    shell> cd /opt/continuent/software/tungsten-replicator-7.0.3-141
    shell> tools/tpm uninstall --i-am-sure

  • Place the correct configuration into the /etc/tungsten/tungsten.ini file ensuring that start-and-report=false is set

  • Install the software

    shell> cd /opt/continuent/software/tungsten-replicator-7.0.3-141
    shell> tools/tpm install

  • Finally, start the replicator in an offline state:

    shell> replicator start offline

Step 8: Position the Replicator

  • The final step is to position the replicator to pick up from the point at which the export in Step 5 was taken.

  • If you took the export from the Replica and already have a dsctl set command, move on to the next step. If you took the export from the Primary, we need to retrieve the correct position from the dump file

    Locate the dump file and issue the following command:

    shell> grep "CHANGE MASTER" dump.sql
    -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=847;

    Taking the sequence number from the MASTER_LOG_FILE filename and the MASTER_LOG_POS value, issue the following statement on the Primary host in the cluster, and copy the resulting dsctl command:

    shell> tungsten_find_position mysql-bin.000003:847
    dsctl set -reset -seqno 9 -epoch 2 -event-id "mysql-bin.000003:0000000000000847;-1" -source-id "db1"

  • Taking the dsctl command that you obtained either from the Replica in Step 5 or from the steps just above, run it on the applier host ONLY

  • Once the command has completed, you can now bring the replicator online

    shell> trepctl online