3.10.3.2. Replicating Data from a Cluster to a Datawarehouse (INI Use Case)


The following INI-based procedure will install the Tungsten Replicator software onto target node host6, extracting from a cluster consisting of three (3) nodes (host1, host2 and host3) and applying into the target datawarehouse via host6.

Important

If you are replicating to a MySQL-specific target, please see Deploying the MySQL Applier for more information.

On the Cluster-Extractor node, copy the convertstringfrommysql.json filter configuration sample file into the /opt/replicator/share directory then edit it to suit:
```
cp /opt/replicator/tungsten/tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/replicator/share/
vi /opt/replicator/share/convertstringfrommysql.json
```
Once the convertstringfrommysql JSON configuration file has been edited, update the /etc/tungsten/tungsten.ini file to add and configure any additional options needed for the specific datawarehouse you are using.
Create the configuration file /etc/tungsten/tungsten.ini on the destination DBMS host, i.e. host6:
```
[defaults]
user=tungsten
install-directory=/opt/replicator
replication-user=tungsten
replication-password=secret
replication-port=3306
profile-script=~/.bashrc
mysql-allow-intensive-checks=true
start-and-report=true

[alpha]
topology=cluster-alias
master=host1
members=host1,host2,host3
thl-port=2112

[omega]
topology=cluster-slave
relay=host6
relay-source=alpha
repl-svc-remote-filters=convertstringfrommysql
property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/replicator/share/convertstringfrommysql.json
```
The description of each of the options is shown below:
- [defaults]
  defaults indicates that we are setting options which will apply to all cluster dataservices.
- user=tungsten
  The operating system user name that you have created for the Tungsten service, tungsten.
- install-directory=/opt/replicator
  The installation directory of the Tungsten Replicator service. This is where the replicator software will be installed on the destination DBMS server.
- replication-user=tungsten
  The MySQL user name to use when connecting to the MySQL database.
- replication-password=secret
  The MySQL password for the user that will connect to the MySQL database.
- replication-port=3306
  The TCP/IP port on the destination DBMS server that is listening for connections.
- start-and-report=true
  Tells tpm to start up the service, and report the current configuration and status.
- profile-script=~/.bashrc
  Tells tpm to add PATH information to the specified script to initialize the Tungsten Replicator environment.
- [alpha]
  alpha is the name and identity of the source cluster alias being created.
  This definition is for a dataservice alias, not an actual dataservice because topology=cluster-alias has been specified. This alias is used in the cluster-slave section to define the source hosts for replication.
- topology=cluster-alias
  Define this as a cluster dataservice alias so tpm does not try to install cluster software to the hosts.
- members=host1,host2,host3
  A comma separated list of all the hosts that are part of this cluster dataservice.
- master=host1
  The hostname of the server that is the current cluster Primary MySQL server.
- thl-port=2112
  The THL port for the cluster. The default value is 2112; any other value must be specified explicitly.
- [omega]
  omega is the unique service name for the replication stream from the cluster.
  This replication service will extract data from cluster dataservice alpha and apply into the database on the DBMS server specified by relay=host6.
- topology=cluster-slave
  Tells tpm this is a Cluster-Extractor replication service which will have a list of all source cluster nodes available.
- relay=host6
  The hostname of the destination DBMS server.
- relay-source=alpha
  Specifies the name of the source cluster dataservice alias (defined above) that will be used to read events to be replicated.
Important
The cluster-alias name (i.e. alpha) MUST be the same as the cluster dataservice name that you are replicating from.
Note
Do not include start-and-report=true if you are taking over for MySQL native replication. See Section 6.12.1, “Migrating from MySQL Native Replication 'In-Place'” for next steps after completing installation.
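Because the [omega] stanza refers to the alias by name, a quick consistency check of the INI file can catch a mistyped relay-source before tpm is run. The following is a minimal sketch only: check_relay_source is a hypothetical helper, not part of Tungsten, and it assumes key=value lines with no spaces around the equals sign. It confirms that every relay-source value names a section declared with topology=cluster-alias; it cannot confirm that the alias matches the real name of the source cluster dataservice.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Tungsten): verify that every
# relay-source value names a section declared with topology=cluster-alias.
# Assumes "key=value" lines without spaces around "=".
check_relay_source() {
    awk -F= '
        { gsub(/^[ \t]+|[ \t]+$/, "") }                 # trim each line
        /^\[.*\]$/ { section = substr($0, 2, length($0) - 2); next }
        $1 == "topology" && $2 == "cluster-alias" { aliases[section] = 1 }
        $1 == "relay-source" { wanted[$2] = section }
        END {
            rc = 0
            for (name in wanted) {
                if (!(name in aliases)) {
                    printf "[%s]: relay-source=%s has no cluster-alias section\n", wanted[name], name
                    rc = 1
                }
            }
            exit rc
        }
    ' "$1"
}
```

Calling check_relay_source /etc/tungsten/tungsten.ini returns zero when every reference resolves, and prints the offending stanza otherwise.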
Now finish configuring the omega dataservice with the options specific to the datawarehouse target in use.
Append the appropriate code snippet below to the bottom of the existing [omega] stanza:
- AWS RedShift Target - Offboard Batch Applier
```
batch-enabled=true
batch-load-template=redshift
datasource-type=redshift
enable-heterogeneous-slave=true
replication-host=REDSHIFT_ENDPOINT_FQDN_HERE
replication-user=REDSHIFT_USER_HERE
replication-password=REDSHIFT_PASSWORD_HERE
redshift-dbname=REDSHIFT_DB_NAME_HERE
svc-applier-filters=dropstatementdata
svc-applier-block-commit-interval=1m
svc-applier-block-commit-size=5000
```
  The description of each of the options is shown below:
  - --topology=cluster-slave
    Configure the topology as a Cluster-Extractor. This will configure the individual replicator as an extractor of all the nodes in the cluster, as defined in the previous configuration of the cluster topology.
  - --relay
    Configure the node as the relay for the cluster which will replicate data into the datawarehouse.
  - --enable-heterogeneous-slave=true
    Configures the Extractor to correctly process the incoming data so that it can be written to the datawarehouse. This includes correcting the processing of text data types and configuring the appropriate filters.
  - --replication-host
    The target host for writing data. In the case of Redshift, this is the fully qualified hostname of the Redshift host.
  - --replication-user
    The user within the Redshift service that will be used to write data into the database.
  - --replication-password=password
    The password for the user within the Redshift service that will be used to write data into the database.
  - --datasource-type=redshift
    Set the datasource type to be used when storing information about the replication state.
  - --batch-enabled=true
    Enable the batch service. This configures the JavaScript batch engine and CSV writing semantics to generate the data to be applied into a datawarehouse.
  - --batch-load-template=redshift
    The batch load template to be used. Since we are replicating into Redshift, the redshift template is used.
  - --redshift-dbname=dev
    The name of the database within the Redshift service where the data will be written.
  Please see Install Amazon Redshift Applier for more information.
- Vertica Target - Onboard/Offboard Batch Applier
```
batch-enabled=true
batch-load-template=vertica6
batch-load-language=js
datasource-type=vertica
disable-relay-logs=true
enable-heterogeneous-service=true
replication-user=dbadmin
replication-password=VERTICA_DB_PASSWORD_HERE
replication-host=VERTICA_HOST_NAME_HERE
replication-port=5433
svc-applier-block-commit-interval=5s
svc-applier-block-commit-size=500
vertica-dbname=VERTICA_DB_NAME_HERE
```
  Please see Install Vertica Applier for more information.
- For additional targets, please see the full list at Deploying Appliers.
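The target snippets above use placeholder values ending in _HERE, which must be replaced before installation. As a rough sketch, a small helper can list any that were missed; find_placeholders is a hypothetical name used for illustration and is not a Tungsten tool.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) any INI lines that still
# end in an unedited *_HERE placeholder. Returns non-zero if any are found.
find_placeholders() {
    if grep -n '_HERE[[:space:]]*$' "$1"; then
        return 1
    fi
    return 0
}
```

For example, find_placeholders /etc/tungsten/tungsten.ini prints nothing and returns zero once every placeholder has been replaced.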
Download and install the latest Tungsten Replicator package (.rpm), or download the compressed tarball and unpack it on host6:
```
shell> cd /opt/continuent/software
shell> tar xvzf tungsten-replicator-7.1.4-10.tar.gz
```
Change to the Tungsten Replicator staging directory:
```
shell> cd tungsten-replicator-7.1.4-10
```
Run tpm to install the Tungsten Replicator software with the INI-based configuration:
```
shell> ./tools/tpm install
```
During the installation and startup, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.
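When the log is long, filtering it can make the failing check easier to spot. The sketch below is illustrative only: show_install_errors is a hypothetical helper, and it assumes that failure lines contain the word "error" in some letter case, which may not hold for every message tpm writes.

```shell
#!/bin/sh
# Hypothetical helper: show lines mentioning "error" (any case), with line
# numbers, from the tpm log. Defaults to the log path named above.
# Returns non-zero when no such line exists.
show_install_errors() {
    grep -n -i 'error' "${1:-/tmp/tungsten-configure.log}"
}
```

A non-zero return only means no matching line was found; the full log is still the authoritative record.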

The Cluster-Extractor replicator should now be installed and ready to use.
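To confirm that the omega service is running, its state can be read from the output of trepctl status. The helper below is a minimal sketch: replicator_state is a hypothetical name, and it assumes the status output contains a line of the form "state : ONLINE" with a colon separating the field name from its value.

```shell
#!/bin/sh
# Hypothetical helper: read trepctl status output on stdin and print the
# value of the "state" field. Assumes "name : value" formatted lines.
replicator_state() {
    awk -F' *: *' '$1 == "state" { print $2; exit }'
}
```

It would typically be used as trepctl -service omega status | replicator_state, with any value other than ONLINE prompting a closer look at the full status output.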
