6.7.3. Installing Amazon Redshift Replication

Replication into Redshift requires two separate replicator installations: one that extracts information from the source database, and a second that generates the CSV files, loads them into S3, and then executes statements on the Redshift database to import the CSV data and apply the transformations that build the final tables. The two replicators can run on separate hosts, or be configured to work within a single host.

Configure Defaults

The defaults configure the type of the services, topology and hosts in the overall service:

shell> ./tools/tpm configure defaults --reset
shell> ./tools/tpm configure alpha \
    --install-directory=/opt/continuent \
    --enable-heterogeneous-service=true \
    --members=host1,host2 \
    --master=host1

Configure Master Replicator Service

To configure the master replicator, which will extract information from MySQL into THL:

shell> ./tools/tpm configure alpha --hosts=host1 \
    --replication-user=tungsten \
    --replication-password=password \
    --property=replicator.filter.pkey.addColumnsToDeletes=true \
    --property=replicator.filter.pkey.addPkeyToInserts=true

Configure the Amazon Redshift Replicator Service

Setting up the Amazon Redshift side of the process requires creating a slave to the master service configured in the previous step, and configuring the correct applier and user/password combination.

  1. Use tpm to configure the applier side of the installation:

    shell> ./tools/tpm configure alpha --hosts=host2 \
        --replication-host=redshift.us-east-1.redshift.amazonaws.com \
        --replication-user=awsRedshiftUser \
        --replication-password=awsRedshiftPass \
        --datasource-type=redshift \
        --batch-enabled=true \
        --batch-load-template=redshift \
        --redshift-dbname=dev \
        --svc-applier-filters=dropstatementdata \
        --svc-applier-block-commit-interval=10s \
        --svc-applier-block-commit-size=5

With the configuration in place, the replicators can be installed by running:

shell> tpm install alpha

If the installation process fails, check the /tmp/tungsten-configure.log file for more information about the root cause.
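
As a quick way to narrow down a failure, the log can be scanned for error lines. The following is a minimal sketch: show_config_errors is a hypothetical helper, not part of tpm, and it assumes that failures are logged with the word "error" in some letter case.

```shell
# Hypothetical helper (not part of tpm): list lines in a tpm log that
# look like errors. Assumption: failures contain the word "error".
show_config_errors() {
    sce_log="${1:-/tmp/tungsten-configure.log}"
    if [ ! -r "$sce_log" ]; then
        echo "cannot read $sce_log" >&2
        return 1
    fi
    # -n prefixes each match with its line number; -i ignores case.
    grep -n -i 'error' "$sce_log"
}
```

Called with no argument, the helper reads /tmp/tungsten-configure.log; a different log path can be passed as the first argument.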

On the host that is loading data into Redshift, create the s3-config-servicename.json file and then copy that file into the share directory within the installation directory on that host. For example:

shell> cp s3-config-servicename.json /opt/continuent/share/
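
The copy step can be wrapped in a few sanity checks so that a missing file or an incomplete installation is caught before the services are started. This is only a sketch: install_s3_config is a hypothetical helper, and the default target directory assumes the --install-directory=/opt/continuent setting used above.

```shell
# Hypothetical helper: copy the S3 configuration file into the share
# directory, refusing to continue if either side looks wrong.
install_s3_config() {
    isc_src="$1"
    isc_share="${2:-/opt/continuent/share}"
    # The source file must exist and be non-empty.
    if [ ! -s "$isc_src" ]; then
        echo "missing or empty file: $isc_src" >&2
        return 1
    fi
    # The share directory is assumed to exist once installation succeeds.
    if [ ! -d "$isc_share" ]; then
        echo "no such directory: $isc_share" >&2
        return 1
    fi
    cp "$isc_src" "$isc_share/"
}
```

For the service in this example, the call might be install_s3_config s3-config-alpha.json, on the assumption that servicename in the file name stands for the service name alpha.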

Now the services can be started:

shell> tpm start alpha

Once the service is configured and running, it can be monitored as normal using the trepctl command. See Section 6.7.6, “Management and Monitoring of Amazon Redshift Deployments” for more information.
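
For scripted checks, the state field can be pulled out of the trepctl status output. The sketch below assumes the usual "name : value" layout of that output; replicator_state is a hypothetical helper that reads the status text on standard input.

```shell
# Hypothetical helper: print the value of the "state" field from
# trepctl status output supplied on standard input.
# Assumption: the field appears as "state   : VALUE" on its own line.
replicator_state() {
    # Strip the field name and separator, keeping the rest of the line,
    # so values that contain a colon (e.g. OFFLINE:ERROR) stay intact.
    sed -n 's/^state[[:space:]]*:[[:space:]]*//p'
}
```

A typical use would be to pipe the status into the helper, for example trepctl -service alpha status | replicator_state, and compare the result against ONLINE.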