3.10.3.1. Replicating Data from a Cluster to a Datawarehouse (Staging Use Case)


The following Staging-method procedure will install the Tungsten Replicator software onto target node host6, extracting from a cluster consisting of three (3) nodes (host1, host2 and host3) and applying into the target datawarehouse via host6.

Important

If you are replicating to a MySQL-specific target, please see Section 3.9, “Replicating Data Out of a Cluster” for more information.

On your staging server, go to the software directory.
```
shell> cd /opt/continuent/software
```
Download the latest Tungsten Replicator version.

Unpack the release package:
```
shell> tar xvzf tungsten-replicator-7.1.4-10.tar.gz
```
Change to the unpacked directory:
```
shell> cd tungsten-replicator-7.1.4-10
```
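The directory created by tar is expected to match the archive name without its .tar.gz suffix. The following is a minimal sketch of that relationship, written as a script rather than as prompt commands. The variable names pkg and dir are illustrative only, and the script assumes the archive was unpacked in the current directory:

```shell
# Archive name as used in the unpack step.
pkg="tungsten-replicator-7.1.4-10.tar.gz"

# Strip the .tar.gz suffix to get the directory name that tar creates.
dir="${pkg%.tar.gz}"

echo "$dir"

# Only change directory if the archive has actually been unpacked here.
if [ -d "$dir" ]; then
    cd "$dir"
fi
```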

Execute the tpm command to configure defaults for the installation.
```
shell> ./tools/tpm configure defaults \
--install-directory=/opt/replicator \
'--profile-script=~/.bashrc' \
--replication-password=secret \
--replication-port=13306 \
--replication-user=tungsten \
--start-and-report=true \
--mysql-allow-intensive-checks=true \
--user=tungsten
```
The description of each of the options is shown below:
- tpm configure defaults
  This runs the tpm command. configure defaults indicates that we are setting options which will apply to all dataservices.
- --install-directory=/opt/replicator
  The installation directory of the Tungsten service. This is where the service will be installed on each server in your dataservice.
- --profile-script="~/.bashrc"
  The profile script used when your shell starts. Using this line modifies your profile script to add a path to the Tungsten tools so that managing Tungsten Cluster™ is easier.
- --user=tungsten
  The operating system user name that you have created for the Tungsten service, tungsten.
- --replication-user=tungsten
  The user name that will be used to apply replication changes to the database on Replicas.
- --replication-password=secret
  The password that will be used to apply replication changes to the database on Replicas.
- --replication-port=13306
  Set the port number to use when connecting to the MySQL server.
- --start-and-report
  Tells tpm to start up the service, and report the current configuration and status.
Configure a cluster alias that points to the Primaries and Replicas within the current Tungsten Cluster service that you are replicating from:
```
shell> ./tools/tpm configure alpha \
    --master=host1 \
    --slaves=host2,host3 \
    --thl-port=2112 \
    --topology=cluster-alias
```
The description of each of the options is shown below:
- tpm configure alpha
  This runs the tpm command. configure indicates that we are creating a new dataservice, and alpha is the name of the dataservice being created.
  This definition is for a dataservice alias, not an actual dataservice, because --topology=cluster-alias has been specified. This alias is used in the cluster-slave section to define the source hosts for replication.
- --master=host1
  Specifies the hostname of the default Primary in the cluster.
- --slaves=host2,host3
  Specifies the names of any other servers in the cluster that may be replicated from.
- --thl-port=2112
  The THL port for the cluster. The default value is 2112; any other value must be specified explicitly.
- --topology=cluster-alias
  Define this as a cluster dataservice alias so tpm does not try to install cluster software to the hosts.
Important
This dataservice cluster-alias name MUST be the same as the cluster dataservice name that you are replicating from.
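One way to confirm the name before continuing is to compare the alias against the service names reported on a cluster node. The sketch below is written as a script and assumes that the output of trepctl services contains a line of the form "serviceName : <name>"; the function names service_names and alias_matches_cluster and the sample text are hypothetical and not part of the Tungsten tools:

```shell
# Print every service name found in `trepctl services` output read from stdin.
# Assumes lines of the form "serviceName     : alpha".
service_names() {
    awk -F':' '/^serviceName/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed only if the alias name ($1) is among the service names on stdin.
alias_matches_cluster() {
    service_names | grep -qxF "$1"
}

# Illustrative input only; on a cluster node, pipe in `trepctl services`.
sample_output='serviceName     : alpha
state           : ONLINE'

if printf '%s\n' "$sample_output" | alias_matches_cluster alpha; then
    echo "alias name matches the cluster dataservice name"
else
    echo "alias name does not match" >&2
fi
```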
On the Cluster-Extractor node, copy the convertstringfrommysql.json filter configuration sample file into the /opt/replicator/share directory then edit it to suit:
```
shell> cp /opt/replicator/tungsten/tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/replicator/share/
shell> vi /opt/replicator/share/convertstringfrommysql.json
```
Once the convertstringfrommysql JSON configuration file has been edited, update the /etc/tungsten/tungsten.ini file to add and configure any additional options needed for the specific datawarehouse you are using.
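A syntax error introduced while editing the filter file may only surface later, so it can help to check that the file still parses. The sketch below is written as a script and rests on two assumptions: that python3 is available on the host, and that the edited file is strict JSON with no comments. The function name json_is_valid is hypothetical:

```shell
# Return success only if the file parses as JSON.
# Uses Python's standard json.tool module (assumes python3 is installed).
json_is_valid() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

conf=/opt/replicator/share/convertstringfrommysql.json

# Only check the file if it exists on this host.
if [ -f "$conf" ]; then
    if json_is_valid "$conf"; then
        echo "$conf parses as JSON"
    else
        echo "$conf has a syntax error" >&2
    fi
fi
```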
Create the configuration that will replicate from cluster dataservice alpha into the database on the host specified by --relay=host6:
```
shell> ./tools/tpm configure omega \
--relay=host6 \
--relay-source=alpha \
--repl-svc-remote-filters=convertstringfrommysql \
--property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/replicator/share/convertstringfrommysql.json \
--topology=cluster-slave
```
The description of each of the options is shown below:
- tpm configure omega
  This runs the tpm command. configure indicates that we are creating a new replication service, and omega is the unique service name for the replication stream from the cluster.
- --relay=host6
  Specifies the hostname of the destination database into which data will be replicated.
- --relay-source=alpha
  Specifies the name of the source cluster dataservice alias (defined above) that will be used to read events to be replicated.
- --topology=cluster-slave
  Read source replication data from any host in the alpha dataservice.
Now finish configuring the omega dataservice with the options specific to the datawarehouse target in use.
- AWS RedShift Target
```
shell> ./tools/tpm configure omega \
--batch-enabled=true \
--batch-load-template=redshift \
--enable-heterogeneous-slave=true \
--datasource-type=redshift \
--replication-host=REDSHIFT_ENDPOINT_FQDN_HERE \
--replication-user=REDSHIFT_USER_HERE \
--replication-password=REDSHIFT_PASSWORD_HERE \
--redshift-dbname=REDSHIFT_DB_NAME_HERE \
--svc-applier-filters=dropstatementdata \
--svc-applier-block-commit-interval=10s \
--svc-applier-block-commit-size=5
```
  The description of each of the options is shown below:
  - tpm configure omega
    Adds further options to the omega service created in the previous step.
  - --topology=cluster-slave
    Configure the topology as a cluster-slave. This will configure the individual replicator as an Extractor of all the nodes in the cluster, as defined in the previous configuration of the cluster topology.
  - --relay
    Configure the node as the relay for the cluster which will replicate data into the datawarehouse.
  - --enable-heterogeneous-slave
    Configures the Extractor to correctly process the incoming data so that it can be written to the datawarehouse. This includes correcting the processing of text data types and configuring the appropriate filters.
  - --replication-host
    The target host for writing data. In the case of Redshift, this is the fully qualified hostname of the Redshift host.
  - --replication-user
    The user within the Redshift service that will be used to write data into the database.
  - --replication-password
    The password for the user within the Redshift service that will be used to write data into the database.
  - --datasource-type=redshift
    Set the datasource type to be used when storing information about the replication state.
  - --batch-enabled=true
    Enable the batch service. This configures the JavaScript batch engine and CSV writing semantics to generate the data to be applied into a datawarehouse.
  - --batch-load-template=redshift
    The batch load template to be used. Since we are replicating into Redshift, the redshift template is used.
  - --redshift-dbname
    The name of the database within the Redshift service where the data will be written.
  Please see Install Amazon Redshift Applier for more information.
- Vertica Target
```
shell> ./tools/tpm configure omega \
--batch-enabled=true \
--batch-load-template=vertica6 \
--batch-load-language=js \
--datasource-type=vertica \
--disable-relay-logs=true \
--enable-heterogeneous-service=true \
--replication-user=dbadmin \
--replication-password=VERTICA_DB_PASSWORD_HERE \
--replication-host=VERTICA_HOST_NAME_HERE \
--replication-port=5433 \
--svc-applier-block-commit-interval=5s \
--svc-applier-block-commit-size=500 \
--vertica-dbname=VERTICA_DB_NAME_HERE
```
  Please see Install Vertica Applier for more information.
- For additional targets, please see the full list at Deploying Appliers.
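The target examples above use placeholder values ending in _HERE, such as REDSHIFT_ENDPOINT_FQDN_HERE and VERTICA_HOST_NAME_HERE, which must be replaced with real values. The sketch below shows one possible pre-flight check, written as a script. The function name has_placeholder and the opts variable are hypothetical and not part of tpm:

```shell
# Succeed if the given text still contains an unreplaced *_HERE placeholder.
has_placeholder() {
    printf '%s\n' "$1" | grep -q '_HERE'
}

# Illustrative option string; substitute the options you intend to pass to tpm.
opts="--replication-host=VERTICA_HOST_NAME_HERE --replication-port=5433"

if has_placeholder "$opts"; then
    echo "replace the _HERE placeholders before running tpm configure" >&2
fi
```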
Once the configuration has been completed, you can perform the installation to set up the Tungsten Replicator services using the tpm command run from the staging directory:
```
shell> ./tools/tpm install
```

If the installation process fails, check the /tmp/tungsten-configure.log file for more information about the root cause.
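As a starting point for reading the log, the sketch below prints the most recent lines that mention an error. It is written as a script and assumes that failures are logged on lines containing the word "error" in some letter case, which may not hold for every message. The function name recent_errors is hypothetical:

```shell
# Print up to the last 20 lines (with line numbers) that mention "error".
recent_errors() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -i -n 'error' "$log" | tail -n 20
}

# Do not abort if the log file is not present on this host.
recent_errors /tmp/tungsten-configure.log || true
```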

The Cluster-Extractor replicator should now be installed and ready to use.
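To confirm that the omega service is running, its state can be read from the trepctl status output. The sketch below is written as a script and assumes the output contains a line of the form "state : ONLINE" and that trepctl is on the PATH after installation. The function name replicator_state is hypothetical:

```shell
# Extract the value of the "state" field from `trepctl status` output on stdin.
# Everything after "state :" is kept, so values containing colons survive.
replicator_state() {
    awk '/^state[ \t]*:/ { sub(/^state[ \t]*:[ \t]*/, ""); print; exit }'
}

# Only query the replicator when trepctl is available on this host.
if command -v trepctl > /dev/null 2>&1; then
    state=$(trepctl -service omega status | replicator_state)
    echo "omega is $state"
fi
```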
