6.1.5. Deployment with Provisioning

Version Support: 2.2.1 and later

You can set up the extractor from Oracle to automatically read and provision the slave database by using the parallel extractor (see Section 8.10, “Using the Parallel Extractor”). The parallel extractor reads information from the source database schema in chunks and then feeds this information into the THL data stream as row-based INSERT operations. When the slave connects, these are applied to the slave database in the same way as normal INSERT operations. The parallel extractor is particularly useful in heterogeneous environments, such as Oracle to MySQL, where the data does not already exist on the slave.

The basic provisioning process operates in two stages:

  1. Provisioning data is extracted and inserted into the THL. One event is used to contain all of the data from a single table. If the table is too large to be contained in a single event, the data will be distributed over multiple events.

  2. Once provisioning has finished, data is extracted from the CDC as normal and added to the THL.

Important

The parallel extractor is not restart safe, and the process should not be interrupted.

This allows existing data to be extracted and processed through the replicator path, including filters within the applier. Once the initial data has been extracted, the change data can be applied.

To use the parallel extractor to provision data into the slave, the configuration must be performed as part of the installation process when configuring the master replicator.

To set up provisioning with the parallel extractor:

  1. Run setupCDC.sh to create the Oracle CDC infrastructure.

  2. Install the master Tungsten Replicator using tpm, but do not enable automatic starting (that is, do not use the --start or --start-and-report options).

  3. On the slave database, create the destination tables for the schemas being replicated. This can be achieved either manually or by using ddlscan to create the required table definitions.

  4. Install the slave replicator as normal; this can be a MySQL or Oracle destination.

  5. On the master:

    1. Start the replicator in OFFLINE mode using replicator start offline:

      shell> replicator start offline

    2. Put the replicator into the ONLINE state, using the -provision option:

      shell> trepctl online -provision

      Alternatively, the system change number (SCN) identified when CDC capture is first enabled through setupCDC.sh can be used to provide point-in-time provisioning. This can be useful if the data was loaded previously and CDC was started afterwards, because provisioning can then begin from that start point. To use this method, identify the start position reported by setupCDC.sh:

      Capture started at position 40748375

      Then supply this to the trepctl online -provision command:

      shell> trepctl online -provision 40748375

      During the provisioning process, the replicator will show the status GOING-ONLINE:PROVISIONING until all of the data has been read from the existing database.

    The master will now start to read the information currently stored and feed this information through a separate pipeline into the THL.

  6. On the slave, start the replicator, or put the replicator online. Statements from the master containing the provisioning information should be replicated into the slave.
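
The master-side part of the procedure above can be sketched as a small shell helper. This is an illustrative sketch only, not part of the product: the function names extract_scn and provision_master are hypothetical, and the sketch assumes that the setupCDC.sh output was saved to a file and contains a line in the "Capture started at position N" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: read saved setupCDC.sh output on stdin and print the
# start position (SCN). Assumes a line of the form:
#   Capture started at position 40748375
# Prints nothing if no such line is found.
extract_scn() {
    sed -n 's/^.*Capture started at position \([0-9][0-9]*\).*$/\1/p' | tail -n 1
}

# Hypothetical wrapper around the two master-side commands from step 5.
# $1 (optional) is the SCN to provision from; without it, a plain
# "trepctl online -provision" is issued.
provision_master() {
    replicator start offline || return 1
    if [ -n "$1" ]; then
        trepctl online -provision "$1"
    else
        trepctl online -provision
    fi
}
```

For example, if the output of setupCDC.sh had been saved to a file named setupcdc.log (an assumed name), running provision_master "$(extract_scn < setupcdc.log)" on the master would start the replicator offline and then bring it online with provisioning from that position.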

Important

If the replicator is placed offline while the parallel extractor is still extracting data, the extraction process will continue to run and insert data until the extraction process has been completed.

Once the provisioned data has been inserted, replication will continue from the position where changes started to occur after the replicator was installed.
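
Because the replicator reports the GOING-ONLINE:PROVISIONING state while existing data is being read, one way to wait for provisioning to finish is to poll trepctl status on the master. The following is a minimal sketch under stated assumptions: the function names parse_state and wait_for_provisioning are hypothetical, and the sketch assumes that trepctl status prints the state as a "state : VALUE" line.

```shell
#!/bin/sh
# Hypothetical helper: read `trepctl status` output on stdin and print the
# value of the "state" field. Assumes a "state : VALUE" line layout.
parse_state() {
    sed -n '/^state[[:space:]]*:/{
        s/^state[[:space:]]*:[[:space:]]*//
        s/[[:space:]]*$//
        p
    }' | head -n 1
}

# Hypothetical helper: poll until the replicator leaves the
# GOING-ONLINE:PROVISIONING state, then print the state it reached.
# $1 is the polling interval in seconds (default 30).
# Returns 0 if the final state is ONLINE, 1 otherwise.
wait_for_provisioning() {
    interval=${1:-30}
    while :; do
        state=$(trepctl status | parse_state)
        case "$state" in
            GOING-ONLINE:PROVISIONING)
                sleep "$interval"
                ;;
            ONLINE)
                echo "$state"
                return 0
                ;;
            *)
                echo "$state"
                return 1
                ;;
        esac
    done
}
```

The loop only observes the state and does not interrupt the extraction, which matters because the parallel extractor is not restart safe.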

For more information on tuning the parallel extractor, see Section 8.10.1, “Advanced Configuration Parameters”.