Tungsten Replicator 6.1 Manual

Continuent Ltd

Abstract

This manual documents Tungsten Replicator 6.1. This includes information for:

  • Tungsten Replicator

Build date: 2024-04-24 (0b94a999)

Up-to-date builds of this document: Tungsten Replicator 6.1 Manual (Online), Tungsten Replicator 6.1 Manual (PDF)


Table of Contents

Preface
1. Legal Notice
2. Conventions
3. Quickstart Guide
1. Introduction
1.1. Tungsten Replicator
1.1.1. Extractor
1.1.2. Appliers
1.1.3. Transaction History Log (THL)
1.1.4. Filtering
2. Deployment Overview
2.1. Deployment Sources
2.1.1. Using the TAR/GZipped files
2.1.2. Using the RPM package files
2.2. Best Practices
2.2.1. Best Practices: Deployment
2.2.2. Best Practices: Upgrade
2.2.3. Best Practices: Operations
2.2.4. Best Practices: Maintenance
2.3. Common tpm Options During Deployment
2.4. Starting and Stopping Tungsten Replicator
2.5. Configuring Startup on Boot
2.6. Removing Datasources from a Deployment
2.6.1. Removing a Datasource from an Existing Deployment
2.7. Understanding Deployment Styles and Topologies
2.7.1. Tungsten Replicator Extraction Operation
2.7.2. Understanding Deployment Models
2.7.3. Understanding Deployment Topologies
2.7.3.1. Simple Primary/Replica Topology
2.7.3.2. Active/Active Topology
2.7.3.3. Fan-Out Topology
2.7.3.4. Fan-In Topology
2.7.3.5. Replicating in/out of an existing Tungsten Cluster
2.8. Understanding Heterogeneous Deployments
2.8.1. How Heterogeneous Replication Works
2.8.1.1. JDBC Applier based Replication
2.8.1.2. Native Applier Replication (e.g. MongoDB)
2.8.1.3. Batch Loading
2.8.1.4. Schema Creation and Replication
3. Deploying MySQL Extractors
3.1. MySQL Replication Pre-Requisites
3.2. Deploying a Primary/Replica Topology
3.2.1. Monitoring the MySQL Extractor
3.3. Deploying an Extractor for Amazon Aurora
3.3.1. Changing Amazon RDS/Aurora Instance Configurations
3.3.1.1. Changing Amazon RDS using command line functions
3.3.1.2. Changing Amazon Aurora Parameters using AWS Console
3.4. Replicating Data Out of a Cluster
3.4.1. Prepare: Replicating Data Out of a Cluster
3.4.2. Deploy: Replicating Data Out of a Cluster
4. Deploying Appliers
4.1. Deploying the MySQL Applier
4.1.1. Preparing for MySQL Replication
4.1.2. Prepare Amazon RDS/Amazon Aurora
4.1.3. Install MySQL Applier
4.1.3.1. Local and Remote MySQL Targets
4.1.3.2. Amazon RDS and Amazon Aurora Targets
4.1.4. Management and Monitoring of MySQL Deployments
4.2. Deploying the Amazon Redshift Applier
4.2.1. Redshift Replication Operation
4.2.2. Preparing for Amazon Redshift Replication
4.2.2.1. Redshift Preparation for Amazon Redshift Deployments
4.2.2.2. Configuring Identity Access Management within AWS
4.2.2.3. Amazon Redshift DDL Generation for Amazon Redshift Deployments
4.2.2.4. Handling Concurrent Writes from Multiple Appliers
4.2.3. Install Amazon Redshift Applier
4.2.4. Verifying your Redshift Installation
4.2.5. Keeping CDC Information
4.2.6. Management and Monitoring of Amazon Redshift Deployments
4.3. Deploying the Vertica Applier
4.3.1. Preparing for Vertica Deployments
4.3.2. Install Vertica Applier
4.3.3. Management and Monitoring of Vertica Deployments
4.3.4. Troubleshooting Vertica Installations
4.4. Deploying the Kafka Applier
4.4.1. Preparing for Kafka Replication
4.4.2. Install Kafka Applier
4.4.2.1. Optional Configuration Parameters for Kafka
4.4.3. Management and Monitoring of Kafka Deployments
4.5. Deploying the MongoDB Applier
4.5.1. MongoDB Atlas Replication
4.5.2. Preparing for MongoDB Replication
4.5.3. Install MongoDB Applier
4.5.4. Install MongoDB Atlas Applier
4.5.4.1. Import MongoDB Atlas Certificates
4.5.5. Management and Monitoring of MongoDB Deployments
4.6. Deploying the Hadoop Applier
4.6.1. Hadoop Replication Operation
4.6.2. Preparing for Hadoop Replication
4.6.2.1. Hadoop Host
4.6.2.2. Schema Generation
4.6.3. Replicating into Kerberos Secured HDFS
4.6.4. Install Hadoop Replication
4.6.4.1. Applier Replicator Service
4.6.4.2. Generating Materialized Views
4.6.4.3. Accessing Generated Tables in Hive
4.6.4.4. Management and Monitoring of Hadoop Deployments
4.6.4.5. Troubleshooting Hadoop Replication
4.7. Deploying the Oracle Applier
4.7.1. Preparing for Oracle Replication
4.7.1.1. Additional Prerequisites for Oracle Targets
4.7.1.2. Configure the Oracle database
4.7.1.3. Create the Destination Schema
4.7.2. Install Oracle Applier
4.8. Deploying the PostgreSQL Applier
4.8.1. Preparing for PostgreSQL Replication
4.8.1.1. PostgreSQL Database Setup
4.8.2. Install PostgreSQL Applier
4.8.3. Management and Monitoring of PostgreSQL Deployments
5. Deployment: Advanced
5.1. Deploying the Replicator using the AWS Marketplace AMI
5.1.1. Prepare Source/Target database instances
5.1.2. Launch and Configure AMI
5.2. Deploying a Fan-In Topology
5.2.1. Management and Monitoring Fan-in Deployments
5.3. Deploying Multiple Replicators on a Single Host
5.3.1. Preparing Multiple Replicators
5.3.2. Install Multiple Replicators
5.3.3. Best Practices: Multiple Replicators
5.4. Replicating Data Into an Existing Dataservice
5.5. Deploying Parallel Replication
5.5.1. Application Prerequisites for Parallel Replication
5.5.2. Enabling Parallel Apply During Install
5.5.3. Channels
5.5.4. Parallel Replication and Offline Operation
5.5.4.1. Clean Offline Operation
5.5.4.2. Tuning the Time to Go Offline Cleanly
5.5.4.3. Unclean Offline
5.5.5. Adjusting Parallel Replication After Installation
5.5.5.1. How to Enable Parallel Apply After Installation
5.5.5.2. How to Change Channels Safely
5.5.5.3. How to Disable Parallel Replication Safely
5.5.5.4. How to Switch Parallel Queue Types Safely
5.5.6. Monitoring Parallel Replication
5.5.6.1. Useful Commands for Parallel Monitoring Replication
5.5.6.2. Parallel Replication and Applied Latency On Replicas
5.5.6.3. Relative Latency
5.5.6.4. Serialization Count
5.5.6.5. Maximum Offline Interval
5.5.6.6. Workload Distribution
5.5.7. Controlling Assignment of Shards to Channels
5.5.8. Disk vs. Memory Parallel Queues
5.6. Batch Loading for Data Warehouses
5.6.1. How It Works
5.6.2. Important Limitations
5.6.3. Batch Applier Setup
5.6.4. JavaScript Batchloader Scripts
5.6.4.1. JavaScript Batchloader with Parallel Apply
5.6.5. Staging Tables
5.6.5.1. Staging Table Names
5.6.5.2. Whole Record Staging
5.6.5.3. Delete Key Staging
5.6.5.4. Staging Table Generation
5.6.6. Character Sets
5.6.7. Supported CSV Formats
5.6.8. Columns in Generated CSV Files
5.6.9. Batchloading Opcodes
5.6.10. Time Zones
5.6.11. Batch Loading into MySQL
5.6.11.1. Configuring as an Offboard Batch Applier
5.6.11.2. Drop Delete Statements
5.6.11.3. Configure CHARSET to use on Load
5.6.11.4. Allow DDL Statements to execute
5.6.11.5. Disable Foreign Keys during load
5.6.11.6. Log rows violating Primary/Unique Keys
5.6.12. Data File Partitioning
6. Deployment: Security
6.1. Enabling Security
6.1.1. Enabling Security using the Staging Method
6.1.2. Enabling Security using the INI Method
6.2. Disabling Security
6.3. Creating Suitable Certificates
6.3.1. Creating Tungsten Internal Certificates Manually
6.4. Installing from a Staging Host with Custom Certificates
6.4.1. Installing from a Staging Host with Manually-Generated Certificates
6.5. Installing via INI File with Custom Certificates
6.5.1. Installing via INI File with Manually-Generated Certificates
6.6. Installing via INI File with CA-Signed Certificates
6.7. Replacing the JGroups Certificate from a Staging Directory
6.8. Replacing the TLS Certificate from a Staging Directory
6.9. Removing TLS Encryption from a Staging Directory
6.10. Enabling Tungsten<>Database Security
6.10.1. Enabling Database SSL
6.10.1.1. Using mysql_ssl_rsa_setup utility
6.10.1.2. Manually Creating Certificates
6.10.1.3. Enabling Database Level SSL with Amazon AWS Aurora
6.10.2. Configure Tungsten<>Database Secure Communication
7. Operations Guide
7.1. The Home Directory
7.2. Establishing the Shell Environment
7.3. Understanding Replicator Roles
7.4. Checking Replication Status
7.4.1. Understanding Replicator States
7.4.2. Replicator States During Operations
7.4.3. Changing Replicator States
7.5. Managing Transaction Failures
7.5.1. Identifying a Transaction Mismatch
7.5.2. Skipping Transactions
7.6. Provision or Reprovision a Replica
7.7. Creating a Backup
7.7.1. Using a Different Backup Tool
7.7.2. Using a Different Directory Location
7.7.3. Creating an External Backup
7.8. Restoring a Backup
7.8.1. Restoring a Specific Backup
7.8.2. Restoring an External Backup
7.8.3. Restoring from Another Replica
7.8.4. Manually Recovering from Another Replica
7.8.5. Reprovision a MySQL Replica using rsync
7.9. Deploying Automatic Replicator Recovery
7.10. Migrating and Seeding Data
7.10.1. Migrating from MySQL Native Replication 'In-Place'
7.10.2. Seeding Data for Heterogeneous Replication
7.10.2.1. Seeding Data from a Standalone Source
7.10.2.2. Seeding Data from a Cluster, for a Cluster-Extractor Target
7.11. Switching Primary Hosts
7.12. Configuring Parallel Replication
7.13. Performing Database or OS Maintenance
7.13.1. Performing Maintenance on a Single Replica
7.13.2. Performing Maintenance on a Primary
7.13.3. Performing Maintenance on an Entire Dataservice
7.13.4. Upgrading or Updating your JVM
7.14. Upgrading Tungsten Replicator
7.14.1. Upgrading Tungsten Replicator using tpm
7.14.2. Installing an Upgraded JAR Patch
7.14.3. Installing Patches
7.14.4. Upgrading to v7.0.0+
7.14.4.1. Background
7.14.4.2. Upgrade Decisions
7.14.4.3. Setup internal encryption and authentication
7.14.4.4. Enable Tungsten to Database Encryption
7.14.4.5. Enable MySQL SSL
7.14.4.6. Steps to upgrade using tpm
7.14.4.7. Optional Post-Upgrade steps to configure API
7.15. Monitoring Tungsten Cluster
7.15.1. Managing Log Files with logrotate
7.15.2. Monitoring Status Using cacti
7.15.3. Monitoring Status Using nagios
7.16. Rebuilding THL on the Primary
8. Command-line Tools
8.1. The clean_release_directory Command
8.2. The check_tungsten_latency Command
8.3. The check_tungsten_online Command
8.4. The check_tungsten_services Command
8.5. The deployall Command
8.6. The ddlscan Command
8.6.1. Optional Arguments
8.6.2. Supported Templates and Usage
8.6.2.1. ddl-check-pkeys.vm
8.6.2.2. ddl-mysql-hive-0.10.vm
8.6.2.3. ddl-mysql-hive-0.10-staging.vm
8.6.2.4. ddl-mysql-hive-metadata.vm
8.6.2.5. ddl-mysql-oracle.vm
8.6.2.6. ddl-mysql-oracle-cdc.vm
8.6.2.7. ddl-mysql-redshift.vm
8.6.2.8. ddl-mysql-redshift-staging.vm
8.6.2.9. ddl-mysql-vertica.vm
8.6.2.10. ddl-mysql-vertica-staging.vm
8.6.2.11. ddl-oracle-mysql.vm
8.6.2.12. ddl-oracle-mysql-pk-only.vm
8.7. The dsctl Command
8.7.1. dsctl get Command
8.7.2. dsctl set Command
8.7.3. dsctl reset Command
8.7.4. dsctl help Command
8.8. env.sh Script
8.9. The load-reduce-check Tool
8.9.1. Generating Staging DDL
8.9.2. Generating Live DDL
8.9.3. Materializing a View
8.9.4. Generating Sqoop Load Commands
8.9.5. Generating Metadata
8.9.6. Compare Loaded Data
8.10. The materialize Command
8.11. The tungsten_merge_logs Script
8.12. The multi_trepctl Command
8.12.1. multi_trepctl Options
8.12.2. multi_trepctl Commands
8.12.2.1. multi_trepctl backups Command
8.12.2.2. multi_trepctl heartbeat Command
8.12.2.3. multi_trepctl masterof Command
8.12.2.4. multi_trepctl list Command
8.12.2.5. multi_trepctl run Command
8.13. The tungsten_newrelic_event Command
8.14. The query Command
8.15. The replicator Command
8.16. The startall Command
8.17. The stopall Command
8.18. The thl Command
8.18.1. thl Position Commands
8.18.2. thl list Command
8.18.3. thl index Command
8.18.4. thl purge Command
8.18.5. thl info Command
8.18.6. thl help Command
8.19. The trepctl Command
8.19.1. trepctl Options
8.19.2. trepctl Global Commands
8.19.2.1. trepctl kill Command
8.19.2.2. trepctl services Command
8.19.2.3. trepctl servicetable Command
8.19.2.4. trepctl version Command
8.19.3. trepctl Service Commands
8.19.3.1. trepctl backup Command
8.19.3.2. trepctl capabilities Command
8.19.3.3. trepctl check Command
8.19.3.4. trepctl clear Command
8.19.3.5. trepctl clients Command
8.19.3.6. trepctl flush Command
8.19.3.7. trepctl heartbeat Command
8.19.3.8. trepctl load Command
8.19.3.9. trepctl offline Command
8.19.3.10. trepctl offline-deferred Command
8.19.3.11. trepctl online Command
8.19.3.12. trepctl pause Command
8.19.3.13. trepctl perf Command
8.19.3.14. trepctl properties Command
8.19.3.15. trepctl purge Command
8.19.3.16. trepctl qs Command
8.19.3.17. trepctl reset Command
8.19.3.18. trepctl restore Command
8.19.3.19. trepctl resume Command
8.19.3.20. trepctl setdynamic Command
8.19.3.21. trepctl setrole Command
8.19.3.22. trepctl shard Command
8.19.3.23. trepctl status Command
8.19.3.24. trepctl unload Command
8.19.3.25. trepctl wait Command
8.20. The tmonitor Command
8.21. The tpasswd Command
8.22. The tungsten_get_mysql_datadir Script
8.23. The tungsten_get_ports Script
8.24. The tungsten_health_check Script
8.25. The tungsten_monitor Script
8.26. The tungsten_mysql_ssl_setup Script
8.27. The tungsten_prep_upgrade Script
8.28. The tungsten_provision_thl Command
8.28.1. Provisioning from RDS
8.28.2. tungsten_provision_thl Reference
8.29. The tungsten_provision_slave Script
8.30. The tungsten_read_master_events Script
8.31. The tungsten_send_diag Script
8.32. The tungsten_set_position Script
8.33. The tungsten_skip_seqno Script
8.34. The tungsten_skip_all Command
8.35. The undeployall Command
9. The tpm Deployment Command
9.1. Comparing Staging and INI tpm Methods
9.2. Processing Installs and Upgrades
9.3. tpm Staging Configuration
9.3.1. Configuring default options for all services
9.3.2. Configuring a single service
9.3.3. Configuring a single host
9.3.4. Reviewing the current configuration
9.3.5. Installation
9.3.5.1. Installing a set of specific services
9.3.5.2. Installing a set of specific hosts
9.3.6. Upgrades from a Staging Directory
9.3.7. Configuration Changes from a Staging Directory
9.3.8. Converting from INI to Staging
9.4. tpm INI File Configuration
9.4.1. Creating an INI file
9.4.2. Installation with INI File
9.4.3. Upgrades with an INI File
9.4.4. Configuration Changes with an INI file
9.4.5. Converting from Staging to INI
9.4.6. Using the translatetoini.pl Script
9.5. tpm Commands
9.5.1. tpm ask Command
9.5.2. tpm configure Command
9.5.3. tpm delete-service Command
9.5.4. tpm diag Command
9.5.5. tpm fetch Command
9.5.6. tpm firewall Command
9.5.7. tpm help Command
9.5.8. tpm install Command
9.5.9. tpm mysql Command
9.5.10. tpm query Command
9.5.10.1. tpm query config
9.5.10.2. tpm query dataservices
9.5.10.3. tpm query deployments
9.5.10.4. tpm query manifest
9.5.10.5. tpm query modified-files
9.5.10.6. tpm query staging
9.5.10.7. tpm query version
9.5.11. tpm reset Command
9.5.12. tpm reset-thl Command
9.5.13. tpm reverse Command
9.5.14. tpm uninstall Command
9.5.15. tpm update Command
9.5.16. tpm validate Command
9.5.17. tpm validate-update Command
9.6. tpm Common Options
9.7. tpm Validation Checks
9.8. tpm Configuration Options
9.8.1. A tpm Options
9.8.2. B tpm Options
9.8.3. C tpm Options
9.8.4. D tpm Options
9.8.5. E tpm Options
9.8.6. F tpm Options
9.8.7. H tpm Options
9.8.8. I tpm Options
9.8.9. J tpm Options
9.8.10. L tpm Options
9.8.11. M tpm Options
9.8.12. N tpm Options
9.8.13. O tpm Options
9.8.14. P tpm Options
9.8.15. R tpm Options
9.8.16. S tpm Options
9.8.17. T tpm Options
9.8.18. U tpm Options
9.8.19. V tpm Options
9.8.20. W tpm Options
10. Replication Filters
10.1. Enabling/Disabling Filters
10.2. Enabling Additional Filters
10.3. Filter Status
10.4. Filter Reference
10.4.1. ansiquotes.js Filter
10.4.2. BidiRemoteSlave (BidiSlave) Filter
10.4.3. breadcrumbs.js Filter
10.4.4. CaseTransform Filter
10.4.5. ColumnName Filter
10.4.6. ConvertStringFromMySQL Filter
10.4.7. DatabaseTransform (dbtransform) Filter
10.4.8. dbrename.js Filter
10.4.9. dbselector.js Filter
10.4.10. dbupper.js Filter
10.4.11. dropcolumn.js Filter
10.4.12. dropcomments.js Filter
10.4.13. dropddl.js Filter
10.4.14. dropmetadata.js Filter
10.4.15. droprow.js Filter
10.4.16. dropstatementdata.js Filter
10.4.17. dropsqlmode.js Filter
10.4.18. dropxa.js Filter
10.4.19. Dummy Filter
10.4.20. EnumToString Filter
10.4.21. EventMetadata Filter
10.4.22. foreignkeychecks.js Filter
10.4.23. Heartbeat Filter
10.4.24. insertsonly.js Filter
10.4.25. Logging Filter
10.4.26. MySQLSessionSupport (mysqlsessions) Filter
10.4.27. mapcharset Filter
10.4.28. NetworkClient Filter
10.4.28.1. Network Client Configuration
10.4.28.2. Network Filter Protocol
10.4.28.3. Sample Network Client
10.4.29. nocreatedbifnotexists.js Filter
10.4.30. OptimizeUpdates Filter
10.4.31. PrimaryKey Filter
10.4.31.1. Setting Custom Primary Key Definitions
10.4.32. PrintEvent Filter
10.4.33. Rename Filter
10.4.33.1. Rename Filter Examples
10.4.34. Replicate Filter
10.4.35. ReplicateColumns Filter
10.4.36. Row Add Database Name Filter
10.4.37. Row Add Transaction Info Filter
10.4.38. SetToString Filter
10.4.39. Shard Filter
10.4.40. shardbyrules.js Filter
10.4.41. shardbyseqno.js Filter
10.4.42. shardbytable.js Filter
10.4.43. SkipEventByType Filter
10.4.44. TimeDelay (delay) Filter
10.4.45. TimeDelayMsFilter (delayInMS) Filter
10.4.46. tosingledb.js Filter
10.4.47. truncatetext.js Filter
10.4.48. zerodate2null.js Filter
10.5. Standard JSON Filter Configuration
10.5.1. Rule Handling and Processing
10.5.2. Schema, Table, and Column Selection
10.6. JavaScript Filters
10.6.1. Writing JavaScript Filters
10.6.1.1. Implementable Functions
10.6.1.2. Getting Configuration Parameters
10.6.1.3. Logging Information and Exceptions
10.6.1.4. Exposed Data Structures
10.6.2. Installing Custom JavaScript Filters
10.6.2.1. Step 1: Copy JavaScript files
10.6.2.2. Step 2: Create Template Files
10.6.2.3. Step 3: (Optional) Copy json files
10.6.2.4. Step 4: Update Configuration
11. Performance and Tuning
11.1. Block Commit
11.1.1. Monitoring Block Commit Status
11.2. Improving Network Performance
11.3. Tungsten Replicator Block Commit and Memory Usage
A. Release Notes
A.1. Tungsten Replicator 6.1.24 GA (11 Dec 2023)
A.2. Tungsten Replicator 6.1.23 GA (31 Aug 2023)
A.3. Tungsten Replicator 6.1.22 GA (6 July 2023)
A.4. Tungsten Replicator 6.1.21 GA (18 Apr 2023)
A.5. Tungsten Replicator 6.1.20 GA (19 Dec 2022)
A.6. Tungsten Replicator 6.1.19 GA (17 Oct 2022)
A.7. Tungsten Replicator 6.1.18 GA (7 Sept 2022)
A.8. Tungsten Replicator 6.1.17 GA (16 May 2022)
A.9. Tungsten Replicator 6.1.16 GA (20 Dec 2021)
A.10. Tungsten Replicator 6.1.15 GA (19 Oct 2021)
A.11. Tungsten Replicator 6.1.14 GA (17 Aug 2021)
A.12. Tungsten Replicator 6.1.13 GA (27 May 2021)
A.13. Tungsten Replicator 6.1.12 GA (8 Mar 2021)
A.14. Tungsten Replicator 6.1.11 GA (21 Jan 2021)
A.15. Tungsten Replicator 6.1.10 GA (15 Dec 2020)
A.16. Tungsten Replicator 6.1.9 GA (23 Nov 2020)
A.17. Tungsten Replicator 6.1.8 GA (2 Nov 2020)
A.18. Tungsten Replicator 6.1.7 GA (5 Oct 2020)
A.19. Tungsten Replicator 6.1.6 GA (20 Aug 2020)
A.20. Tungsten Replicator 6.1.5 GA (5 Aug 2020)
A.21. Tungsten Replicator 6.1.4 GA (4 June 2020)
A.22. Tungsten Replicator 6.1.3 GA (17 February 2020)
A.23. Tungsten Replicator 6.1.2 GA (20 January 2020)
A.24. Tungsten Replicator 6.1.1 GA (28 October 2019)
A.25. Tungsten Replicator 6.1.0 GA (31 July 2019)
B. Prerequisites
B.1. Requirements
B.1.1. Operating Systems Support
B.1.2. Database Support
B.1.2.1. Version Support Matrix
B.1.2.2. MySQL "Innovation" Releases
B.1.3. RAM Requirements
B.1.4. Disk Requirements
B.1.5. Java Requirements
B.1.6. Cloud Deployment Requirements
B.1.7. Docker Support Policy
B.1.7.1. Overview
B.1.7.2. Background
B.1.7.3. Current State
B.1.7.4. Summary
B.2. Staging Host Configuration
B.3. Host Configuration
B.3.1. Creating the User Environment
B.3.2. Configuring Network and SSH Environment
B.3.2.1. Network Ports
B.3.2.2. SSH Configuration
B.3.3. Directory Locations and Configuration
B.3.4. Configure Software
B.3.5. sudo Configuration
B.3.6. SELinux Configuration
B.4. MySQL Database Setup
B.4.1. MySQL Version Support
B.4.2. MySQL Configuration
B.4.3. MySQL Configuration for Active/Active Deployments
B.4.4. MySQL Configuration for Heterogeneous Deployments
B.4.5. MySQL User Configuration
B.4.6. MySQL Unprivileged Users
B.5. Prerequisite Checklist
C. Troubleshooting
C.1. Contacting Support
C.1.1. Support Request Procedure
C.1.2. Creating a Support Account
C.1.3. Open a Support Ticket
C.1.4. Open a Support Ticket via Email
C.1.5. Getting Updates for all Company Support Tickets
C.1.6. Support Severity Level Definitions
C.2. Support Tools
C.2.1. Generating Diagnostic Information
C.2.2. Generating Advanced Diagnostic Information
C.2.3. Using tungsten_upgrade_manager
C.3. Error/Cause/Solution
C.3.1. MySQLExtractException: unknown data type 0
C.3.2. Services requires a reset
C.3.3. OptimizeUpdatesFilter cannot filter, because column and key count is different. Make sure that it is defined before filters which remove keys (eg. PrimaryKeyFilter)
C.3.4. Unable to update the configuration of an installed directory
C.3.5. Too many open processes or files
C.3.6. There were issues configuring the sandbox MySQL server
C.3.7. Unexpected failure while extracting event
C.3.8. Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768
C.3.9. The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements correctly on the Replica.
C.3.10. Replicator runs out of memory
C.4. Known Issues
C.4.1. Triggers
C.5. Troubleshooting Timeouts
C.6. Troubleshooting Backups
C.7. Running Out of Diskspace
C.8. Troubleshooting SSH and tpm
C.9. Troubleshooting Data Differences
C.9.1. Identify Structural Differences
C.9.2. Identify Data Differences
C.10. Comparing Table Data
C.11. Troubleshooting Memory Usage
D. Files, Directories, and Environment
D.1. The Tungsten Cluster Install Directory
D.1.1. The backups Directory
D.1.1.1. Automatically Deleting Backup Files
D.1.1.2. Manually Deleting Backup Files
D.1.1.3. Copying Backup Files
D.1.1.4. Relocating Backup Storage
D.1.2. The releases Directory
D.1.3. The service_logs Directory
D.1.4. The share Directory
D.1.5. The thl Directory
D.1.5.1. Purging THL Log Information on a Replica
D.1.5.2. Purging THL Log Information on a Primary
D.1.5.3. Moving the THL File Location
D.1.5.4. Changing the THL Retention Times
D.1.6. The tungsten Directory
D.1.6.1. The tungsten-replicator Directory
D.2. Log Files
D.3. Environment Variables
E. Terminology Reference
E.1. Transaction History Log (THL)
E.1.1. THL Format
E.2. Generated Field Reference
E.2.1. Terminology: Fields masterConnectUri
E.2.2. Terminology: Fields masterListenUri
E.2.3. Terminology: Fields accessFailures
E.2.4. Terminology: Fields active
E.2.5. Terminology: Fields activeSeqno
E.2.6. Terminology: Fields appliedLastEventId
E.2.7. Terminology: Fields appliedLastSeqno
E.2.8. Terminology: Fields appliedLatency
E.2.9. Terminology: Fields applier.class
E.2.10. Terminology: Fields applier.name
E.2.11. Terminology: Fields applyTime
E.2.12. Terminology: Fields autoRecoveryEnabled
E.2.13. Terminology: Fields autoRecoveryTotal
E.2.14. Terminology: Fields averageBlockSize
E.2.15. Terminology: Fields blockCommitRowCount
E.2.16. Terminology: Fields cancelled
E.2.17. Terminology: Fields channel
E.2.18. Terminology: Fields channels
E.2.19. Terminology: Fields clusterName
E.2.20. Terminology: Fields commits
E.2.21. Terminology: Fields committedMinSeqno
E.2.22. Terminology: Fields criticalPartition
E.2.23. Terminology: Fields currentBlockSize
E.2.24. Terminology: Fields currentEventId
E.2.25. Terminology: Fields currentLastEventId
E.2.26. Terminology: Fields currentLastFragno
E.2.27. Terminology: Fields currentLastSeqno
E.2.28. Terminology: Fields currentTimeMillis
E.2.29. Terminology: Fields dataServerHost
E.2.30. Terminology: Fields discardCount
E.2.31. Terminology: Fields doChecksum
E.2.32. Terminology: Fields estimatedOfflineInterval
E.2.33. Terminology: Fields eventCount
E.2.34. Terminology: Fields extensions
E.2.35. Terminology: Fields extractTime
E.2.36. Terminology: Fields extractor.class
E.2.37. Terminology: Fields extractor.name
E.2.38. Terminology: Fields filter.#.class
E.2.39. Terminology: Fields filter.#.name
E.2.40. Terminology: Fields filterTime
E.2.41. Terminology: Fields flushIntervalMillis
E.2.42. Terminology: Fields fsyncOnFlush
E.2.43. Terminology: Fields headSeqno
E.2.44. Terminology: Fields intervalGuard
E.2.45. Terminology: Fields lastCommittedBlockSize
E.2.46. Terminology: Fields lastCommittedBlockTime
E.2.47. Terminology: Fields latestEpochNumber
E.2.48. Terminology: Fields logConnectionTimeout
E.2.49. Terminology: Fields logDir
E.2.50. Terminology: Fields logFileRetainMillis
E.2.51. Terminology: Fields logFileSize
E.2.52. Terminology: Fields maxChannel
E.2.53. Terminology: Fields maxDelayInterval
E.2.54. Terminology: Fields maxOfflineInterval
E.2.55. Terminology: Fields maxSize
E.2.56. Terminology: Fields maximumStoredSeqNo
E.2.57. Terminology: Fields minimumStoredSeqNo
E.2.58. Terminology: Fields name
E.2.59. Terminology: Fields offlineRequests
E.2.60. Terminology: Fields otherTime
E.2.61. Terminology: Fields pendingError
E.2.62. Terminology: Fields pendingErrorCode
E.2.63. Terminology: Fields pendingErrorEventId
E.2.64. Terminology: Fields pendingErrorSeqno
E.2.65. Terminology: Fields pendingExceptionMessage
E.2.66. Terminology: Fields pipelineSource
E.2.67. Terminology: Fields processedMinSeqno
E.2.68. Terminology: Fields queues
E.2.69. Terminology: Fields readOnly
E.2.70. Terminology: Fields relativeLatency
E.2.71. Terminology: Fields resourcePrecedence
E.2.72. Terminology: Fields rmiPort
E.2.73. Terminology: Fields role
E.2.74. Terminology: Fields seqnoType
E.2.75. Terminology: Fields serializationCount
E.2.76. Terminology: Fields serialized
E.2.77. Terminology: Fields serviceName
E.2.78. Terminology: Fields serviceType
E.2.79. Terminology: Fields shard_id
E.2.80. Terminology: Fields simpleServiceName
E.2.81. Terminology: Fields siteName
E.2.82. Terminology: Fields sourceId
E.2.83. Terminology: Fields stage
E.2.84. Terminology: Fields started
E.2.85. Terminology: Fields state
E.2.86. Terminology: Fields stopRequested
E.2.87. Terminology: Fields store.#
E.2.88. Terminology: Fields storeClass
E.2.89. Terminology: Fields syncInterval
E.2.90. Terminology: Fields taskCount
E.2.91. Terminology: Fields taskId
E.2.92. Terminology: Fields timeInCurrentEvent
E.2.93. Terminology: Fields timeInStateSeconds
E.2.94. Terminology: Fields timeoutMillis
E.2.95. Terminology: Fields totalAssignments
E.2.96. Terminology: Fields transitioningTo
E.2.97. Terminology: Fields uptimeSeconds
E.2.98. Terminology: Fields version
F. Internals
F.1. Extending Backup and Restore Behavior
F.1.1. Backup Behavior
F.1.2. Restore Behavior
F.1.3. Writing a Custom Backup/Restore Script
F.1.4. Enabling a Custom Backup Script
F.2. Character Sets in Database and Tungsten Cluster
F.3. Understanding Replication of Date/Time Values
F.3.1. Best Practices
F.4. Memory Tuning and Performance
F.4.1. Understanding Tungsten Replicator Memory Tuning
F.5. Tungsten Replicator Pipelines and Stages
F.6. Tungsten Cluster Schemas
G. Frequently Asked Questions (FAQ)
H. Ecosystem Support
H.1. Continuent Github Repositories
I. Configuration Property Reference

List of Figures

2.1. Internals: MySQL Extraction
2.2. Internals: Amazon Aurora/Remote Database, Offboard Extraction
2.3. Topologies: Primary/Replica
2.4. Topologies: Active/Active
2.5. Topologies: Fan-Out
2.6. Topologies: Fan-In
2.7. Topologies: Cluster-Extractor
3.1. Topologies: Primary/Replica
3.2. Topologies: Aurora Extraction
3.3. Fig 1. AWS Config
3.4. Fig 2. AWS Config
3.5. Fig 3. AWS Config
3.6. Fig 4. AWS Config
3.7. Fig 5. AWS Config
3.8. Fig 6. AWS Config
3.9. Fig 7. AWS Config
3.10. Topologies: Replicating Data Out of a Cluster
4.1. Topologies: Replicating to MySQL
4.2. Topologies: Replicating to Amazon Redshift
4.3. Topologies: Redshift Replication Operation
4.4. Topologies: Replicating to Vertica
4.5. Topologies: Replicating to Kafka
4.6. Topologies: Replicating to MongoDB
4.7. Topologies: Replicating to Hadoop
4.8. Topologies: Hadoop Replication Operation
4.9. Topologies: Replicating to Oracle
4.10. Topologies: Replicating to PostgreSQL
5.1. Topologies: Fan-in
5.2. Topologies: Replicating into a Dataservice
5.3. Batchloading: JavaScript
6.1. Security Internals: Cluster Communication Channels
7.1. Cacti Monitoring: Example Graphs
9.1. tpm Staging Based Deployment
9.2. tpm INI Based Deployment
10.1. Filters: Pipeline Stages on Extractors
10.2. Filters: Pipeline Stages on Appliers
B.1. Tungsten Deployment

List of Tables

1.1. Supported Extractors
1.2. Supported Appliers
2.1. Key Terminology
4.1. Optional Kafka Applier Properties
4.2. Hadoop Replication Directory Locations
4.3. Data Type differences when replicating data from MySQL to Oracle
5.1. Continuent Tungsten Directory Structure
8.1. check_tungsten_latency Options
8.2. check_tungsten_online Options
8.3. check_tungsten_services Options
8.4. ddlscan Command-line Options
8.5. ddlscan Supported Templates
8.6. dsctl Commands
8.7. dsctl Command-line Options
8.8. dsctl Command-line Options
8.9. dsctl Command-line Options
8.10. tungsten_merge_logs Command-line Options
8.11. multi_trepctl Command-line Options
8.12. multi_trepctl --output Option
8.13. multi_trepctl Commands
8.14. tungsten_monitor Command-line Options
8.15. query Common Options
8.16. replicator Commands
8.17. replicator Commands Options for condrestart
8.18. replicator Commands Options for console
8.19. replicator Commands Options for restart
8.20. replicator Commands Options for start
8.21. thl Options
8.22. trepctl Command-line Options
8.23. trepctl Replicator Wide Commands
8.24. trepctl Service Commands
8.25. trepctl backup Command Options
8.26. trepctl clients Command Options
8.27. trepctl offline-deferred Command Options
8.28. trepctl online Command Options
8.29. trepctl pause Command Options
8.30. trepctl purge Command Options
8.31. trepctl reset Command Options
8.32. trepctl resume Command Options
8.33. trepctl setdynamic Command Options
8.34. trepctl setrole Command Options
8.35. trepctl shard Command Options
8.36. trepctl status Command Options
8.37. trepctl wait Command Options
8.38. tmonitor Common Options
8.39. tpasswd Common Options
8.40. tungsten_get_mysql_datadir Command-line Options
8.41. tungsten_get_ports Options
8.42. tungsten_health_check Command-line Options
8.43. tungsten_monitor Command-line Options
8.44. tungsten_prep_upgrade Command-line Options
8.45. tungsten_provision_slave Command-line Options
8.46. tungsten_read_master_events Command-line Options
8.47. tungsten_send_diag Command-line Options
8.48. tungsten_set_position Command-line Options
8.49. tungsten_skip_seqno Command-line Options
9.1. TPM Deployment Methods
9.2. tpm Core Options
9.3. tpm Commands
9.4. tpm delete-service Common Options
9.5. tpm Common Options
9.6. tpm Validation Checks
9.7. tpm Configuration Options
B.1. Tungsten OS Support
B.2. MySQL/Tungsten Version Support
D.1. Continuent Tungsten Directory Structure
D.2. Continuent Tungsten tungsten Sub-Directory Structure
E.1. THL Event Format

Preface

This manual documents Tungsten Replicator 6.1 up to and including 6.1.24 build 6. Differences between minor versions are highlighted by stating the explicit minor release version, such as 6.1.24.x.

For other versions and products, please use the appropriate manual.

1. Legal Notice

The trademarks, logos, and service marks in this Document are the property of Continuent or other third parties. You are not permitted to use these Marks without the prior written consent of Continuent or such appropriate third party. Continuent, Tungsten, uni/cluster, m/cluster, p/cluster, uc/connector, and the Continuent logo are trademarks or registered trademarks of Continuent in the United States, France, Finland and other countries.

All Materials on this Document are (and shall continue to be) owned exclusively by Continuent or other respective third party owners and are protected under applicable copyrights, patents, trademarks, trade dress and/or other proprietary rights. Under no circumstances will you acquire any ownership rights or other interest in any Materials by or through your access or use of the Materials. All right, title and interest not expressly granted is reserved to Continuent.

All rights reserved.

2. Conventions

This documentation uses a number of text and style conventions to indicate and differentiate between different types of information:

  • Text in this style is used to show an important element or piece of information. It may be used and combined with other text styles as appropriate to the context.

  • Text in this style is used to show a section heading, table heading, or particularly important emphasis of some kind.

  • Program or configuration options are formatted using this style. Options are also automatically linked to their respective documentation page when this is known. For example, tpm and --hosts both link automatically to the corresponding reference page.

  • Parameters or information explicitly used to set values to commands or options is formatted using this style.

  • Option values, for example on the command-line are marked up using this format: --help. Where possible, all option values are directly linked to the reference information for that option.

  • Commands, including sub-commands to a command-line tool are formatted using Text in this style. Commands are also automatically linked to their respective documentation page when this is known. For example, tpm links automatically to the corresponding reference page.

  • Text in this style indicates literal or character sequence text used to show a specific value.

  • Filenames, directories or paths are shown like this /etc/passwd. Filenames and paths are automatically linked to the corresponding reference page if available.

Bulleted lists are used to show lists, or detailed information for a list of items. Where this information is optional, a magnifying glass symbol enables you to expand, or collapse, the detailed instructions.

Code listings are used to show sample programs, code, configuration files and other elements. These can include both user input and replaceable values:

shell> cd /opt/continuent/software
shell> tar zxvf tungsten-replicator-6.1.24-6.tar.gz

In the above example, command lines to be entered into a shell are prefixed with shell>. This shell is typically sh, ksh, or bash on Linux and Unix platforms.

If commands are to be executed using administrator privileges, each line will be prefixed with root-shell, for example:

root-shell> vi /etc/passwd

To make the selection of text easier for copy/pasting, ignorable text, such as shell>, is ignored during selection. This allows multi-line instructions to be copied without modification, for example:

mysql> create database test_selection;
mysql> drop database test_selection;

Lines prefixed with mysql> should be entered within the mysql command-line.

If a command-line or program listing entry contains lines that are too wide to be displayed within the documentation, they are marked using the » character:

the first line has been extended by using a »
    continuation line

They should be adjusted to be entered on a single line.

Text marked up with this style is information that is entered by the user (as opposed to generated by the system). Text formatted using this style should be replaced with the appropriate file, version number or other variable information according to the operation being performed.

In the HTML versions of the manual, blocks or examples that contain user input can be easily copied from the program listing. Where there are multiple entries or steps, use the 'Show copy-friendly text' link at the end of each section. This provides a copy of all the user-enterable text.

3. Quickstart Guide

Chapter 1. Introduction

Tungsten Replicator™ is a replication engine supporting a variety of different extractor and applier modules. Data can be extracted from MySQL, Amazon RDS MySQL, Amazon Aurora, Microsoft Azure and Google Cloud SQL, and applied to a variety of transactional stores, NoSQL stores and datawarehouse stores. For a full list of supported sources and targets, see Table 1.1, “Supported Extractors” and Table 1.2, “Supported Appliers” below.

During replication, Tungsten Replicator assigns data a unique global transaction ID, and enables flexible statement and/or row-based replication of data. This enables data to be exchanged between different databases and different database versions. During replication, information can be filtered and modified, and deployment can be between on-premise or cloud-based databases. For performance, Tungsten Replicator™ provides support for parallel replication, and advanced topologies such as fan-in, star and active/active, and can be used efficiently in cross-site deployments.

Tungsten Replicator™ is the core foundation for Tungsten Cluster™ for HA, DR and geographically distributed solutions.

Features in Tungsten Replicator

  • Includes support for replicating into Hadoop (including Apache Hadoop, Cloudera, HortonWorks, MapR, Amazon EMR)

  • Includes support for replicating into Amazon Redshift, including storing change data within Amazon S3

  • Includes support for replicating into PostgreSQL, Apache Kafka, MongoDB

  • Includes support for replicating to and from Amazon Aurora/RDS (MySQL) deployments

  • Available as an AMI via Amazon Marketplace (Without Support)

  • SSL Support for managing MySQL deployments

  • Network Client filter for handling complex data translation/migration needs during replication

The table below shows the version of Tungsten Replicator in which support was added for each specific extractor.

Table 1.1. Supported Extractors

Source                      5.3   5.4   6.0   6.1   7.0
MySQL (5.0 to 5.6)           x     x     x     x     x
MySQL 5.7                    x     x     x     x     x
MySQL 8                            x           x     x
MariaDB (5.5, 10)            x     x     x     x     x
Amazon Aurora/RDS MySQL      x     x     x     x     x
Google Cloud MySQL           x     x     x     x     x
Microsoft Azure              x     x     x     x     x


The table below shows the version of Tungsten Replicator in which support was added for each specific applier.

Table 1.2. Supported Appliers

Target                      5.3   5.4   6.0   6.1   7.0
MySQL (incl MariaDB)         x     x     x     x     x
Amazon Aurora/RDS MySQL      x     x     x     x     x
Microsoft Azure              x     x     x     x     x
Google Cloud MySQL           x     x     x     x     x
Oracle (incl. Cloud)         x     x     x     x     x
PostgreSQL (incl. Cloud)     x     x     x     x     x
Hadoop                       x     x     x     x     x
Vertica                      x     x     x     x     x
Amazon Redshift              x     x     x     x     x
MongoDB                      x     x     x     x     x
MongoDB Atlas                                  x (6.1.3)   x
Apache Kafka                 x     x     x     x     x
Clickhouse                                     x     x


1.1. Tungsten Replicator

Tungsten Replicator is a high performance replication engine that works with a number of different source and target databases to provide high-performance and improved replication functionality over the native solution. With MySQL replication, for example, the enhanced functionality and information provided by Tungsten Replicator allows for global transaction IDs, advanced topology support such as Composite Active/Active, star, and fan-in, and enhanced latency identification.

In addition to providing enhanced functionality, Tungsten Replicator is also capable of heterogeneous replication, enabling the replicated information to be transformed after it has been read from the source data server to match the functionality or structure of the target server. This functionality allows for replication between MySQL and a variety of heterogeneous targets.

Understanding how Tungsten Replicator works requires looking at the overall replicator structure. There are three major components in the system that provide the core of the replication functionality:

  • Extractor

    The extractor component reads data from a MySQL data server and writes that information into the Transaction History Log (THL). The role of the extractor is to read the information from a suitable source of change information and write it into the THL in the native or defined format, either as SQL statements or row-based information.

    Information is always extracted from a source database and recorded within the THL in the form of a complete transaction. The full transaction information is recorded and logged against a single, unique, transaction ID used internally within the replicator to identify the data.

  • Applier

    Appliers within Tungsten Replicator convert the THL information and apply it to a destination data server. The role of the applier is to read the THL information and apply that to the data server.

    The applier works with a number of different target databases, and is responsible for writing the information to the database. Because the transactional data in the THL is stored either as SQL statements or row-based information, the applier has the flexibility to reformat the information to match the target data server. Row-based data can be reconstructed to match different database formats, for example, converting row-based information into an Oracle-specific table row, or a MongoDB document.

  • Transaction History Log (THL)

    The THL contains the information extracted from a data server. Information within the THL is divided up by transactions, either implied or explicit, based on the data extracted from the data server. The THL structure, format, and content provide a significant proportion of the functionality and operational flexibility within Tungsten Replicator.

    As the THL data is stored, additional information, such as the metadata and options in place when the statement or row data was extracted, is recorded. Each transaction is also recorded with an incremental global transaction ID. This ID enables individual transactions within the THL to be identified, for example to retrieve their content, or to determine whether different appliers within a replication topology have written a specific transaction to a data server.

These components will be examined in more detail as different aspects of the system are described with respect to the different systems, features, and functionality that each system provides.

From this basic overview and structure of Tungsten Replicator, the replicator allows for a number of different topologies and solutions that replicate information between different services. Straightforward replication topologies, such as Primary/Replica are easy to understand with the basic concepts described above. More complex topologies use the same core components. For example, Composite Active/Active topologies make use of the global transaction ID to prevent the same statement or row data being applied to a data server multiple times. Fan-in topologies allow the data from multiple data servers to be combined into one data server.

1.1.1. Extractor

Extractors exist for reading information from the following sources:

  • Reading the MySQL binary log (binlog) directly from disk and translating that content and session information into the THL. This method reads the binlog in its different formats, such as statement, row, and mixed-based logging.

  • Reading remotely from a MySQL server over a network, including reading from an Amazon RDS MySQL or Amazon Aurora instance. This enables the replicator to read the information remotely, either on services where direct access to the binlog is not available, or where the replicator cannot be installed on the database host (such as databases hosted on a Windows platform).
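
Direct binlog extraction requires the binary log to be enabled on the source, and heterogeneous targets additionally require row-based logging. The following my.cnf fragment is an illustrative sketch only; see Appendix B, Prerequisites for the authoritative configuration settings:

shell> cat /etc/my.cnf
[mysqld]
# Enable the binary log that the extractor reads
log-bin       = mysql-bin
# Unique server identifier, required when binary logging is enabled
server-id     = 1
# Row-based logging, required for heterogeneous targets
binlog-format = row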

1.1.2. Appliers

Once information has been recorded into the THL, particularly when that information has been recorded in row-based format, it is possible to apply that information to a variety of different targets, both transactional and SQL-based solutions, and also NoSQL and analytical targets.

Available appliers include:

  • MySQL

    • Community Edition

    • Enterprise Edition

    • Percona

    • MariaDB

    • Amazon Aurora/RDS (Including cross region)

    • Google Cloud SQL

    • Microsoft Azure

  • Oracle

  • PostgreSQL

  • Amazon RedShift

  • HPE Vertica

  • Hadoop, compatible with all major distributions

  • MongoDB (Including Atlas from v6.1.3 onwards)

  • Apache Kafka

  • Clickhouse (Experimental)

For more information on how the heterogeneous replicator works, see Section 2.8.1, “How Heterogeneous Replication Works”. For more information on the batch applier, which works with datawarehouse targets, see Section 5.6, “Batch Loading for Data Warehouses”.

1.1.3. Transaction History Log (THL)

Tungsten Replicator operates by reading information from the source database and transferring that information to the Transaction History Log (THL).

Each transaction within the THL includes the SQL statement or the row-based data written to the database. The information also includes, where possible, transaction specific options and metadata, such as character set data, SQL modes and other information that may affect how the information is written when the data is applied. The combination of the metadata and the global transaction ID also enable more complex data replication scenarios to be supported, such as Composite Active/Active, without fear of duplicating statement or row data application because the source and global transaction ID can be compared.

In addition to all this information, the THL also includes a timestamp and a record of when the information was written into the database before the change was extracted. Using a combination of the global transaction ID and this timing information provides information on the latency and how up to date a dataserver is compared to the original datasource.

Depending on the underlying storage of the data, the information can be reformatted and applied to different data servers. When dealing with row-based data, this can be applied to a different type of data server, or completely reformatted and applied to non-table based services such as MongoDB.

THL information is stored for each replicator service, and can also be exchanged over the network between different replicator instances. This enables transaction data to be exchanged between different hosts within the same network or across wide-area networks.
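
The stored THL can be inspected directly using the thl command (see Section 8.18, “The thl Command”). For example, to summarise the available log and then list the contents of a single transaction (the sequence number shown is purely illustrative):

shell> thl info
shell> thl list -seqno 10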

1.1.4. Filtering

Filtering within the replicator enables the information within the THL to be removed, augmented, or modified as the information is transferred within and between the replicators.

During filtering, the information in the THL can be modified in a host of different ways, including but not limited to:

  • Filtering out information based on the schema name, table name or column name. This is useful if you want a subset of the information in your target database, or if you want to apply only certain columns to the target.

  • Filter information based on the content, or value of one or more fields.

  • Filter information based on the operation type, for example, only applying inserts to a target ignoring updates or deletes.

  • Modify or alter the format or structure of the data. This can be used to change the data format to be compatible with a target system, for example due to data type limitations, or sizes.

  • Add information to the data. For example, adding a database name, source name, or additional or compound fields into the target data. Within an analytics system this can be useful when combining data from multiple sources so that the source system or customer can still be identified.

The format, content, and structure of the data and the THL can be modified and new data can even be created through the filters.

For more information on the filters available, and how to use them, see Chapter 10, Replication Filters.
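
As an illustration of how a filter is enabled, the following sketch configures the Replicate filter on the extractor so that only a single schema is replicated. The service name (alpha) and schema (myschema) are placeholders; the full set of filters and their options is described in Chapter 10, Replication Filters:

shell> ./tools/tpm configure alpha \
    --svc-extractor-filters=replicate \
    --property=replicator.filter.replicate.do=myschema.*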

Chapter 2. Deployment Overview

Tungsten Replicator creates a unique replication interface between two databases. Because Tungsten Replicator is independent of the dataserver, it affords a number of different advantages, including more flexible replication strategies, filtering, and easier control to pause, restart, and skip statements between hosts.

Replication is supported from, and to, different dataservers using different technologies through a series of extractor and applier components which independently read data from, and write data to, the dataservers in question.

The replication process is made possible by reading the binary log on each host. The information from the binary log is written into the Tungsten Replicator Transaction History Log (THL), and the THL is then transferred between hosts and then applied to each Target host. More information can be found in Chapter 1, Introduction.

Before covering the basics of creating different dataservices, there are some key terms that will be used throughout the setup and installation process that identify different components of the system. These are summarised in Table 2.1, “Key Terminology”.

Table 2.1. Key Terminology

Tungsten Term Traditional Term Description
dataserver Database The database on a host. Dataservers include MySQL or Oracle.
datasource Host or Node One member of a dataservice and the associated Tungsten components.
staging host - The machine (and directory) from which Tungsten Replicator is installed and configured. The machine does not need to be the same as any of the existing hosts in the cluster.
staging directory - The directory where the installation files are located and the installer is executed. Further configuration and updates must be performed from this directory.

Before attempting installation, there are a number of prerequisite tasks which must be completed to set up your hosts, database, and Tungsten Replicator service:

  1. Set up a staging host from which you will configure and manage your installation.

  2. Configure each host that will be used within your dataservice.

  3. Configure your MySQL installation, so that Tungsten Replicator can work with the database.

  4. Prepare and configure the target environment.

The following sections provide guidance and instructions for creating a number of different deployment scenarios using Tungsten Replicator.

2.1. Deployment Sources

Tungsten Replicator is available in a number of different distribution types, and the methods of configuration available for these different packages differ. See Section 9.1, “Comparing Staging and INI tpm Methods” for more information on the available installation methods.

Deployment Type/Package     TAR/GZip   RPM
Staging Installation        Yes        No
INI File Configuration      Yes        Yes
Deploy Entire Cluster       Yes        No
Deploy Per Machine          Yes        Yes

Two primary deployment sources are available:

  • TAR/GZipped packages, described in Section 2.1.1, “Using the TAR/GZipped files”

  • RPM packages, described in Section 2.1.2, “Using the RPM package files”

All packages are named according to the product, version number, build release and extension. For example:

tungsten-replicator-6.1.24-6.tar.gz

The version number is 6.1.24 and build number 6. Build numbers indicate which build a particular release version is based on, and may be useful when installing patches provided by support.

2.1.1. Using the TAR/GZipped files

To use the TAR/GZipped packages, download the files to your machine and unpack them:

shell> cd /opt/continuent/software
shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz

This will create a directory matching the downloaded package name, version, and build number, from which you can perform an install using either the INI file or command-line configuration. To do so, use the tpm command within the tools directory of the extracted package:

shell> cd tungsten-replicator-6.1.24-6
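
From the extracted directory, a staging-method installation is then performed with tpm. The following invocation is an illustrative sketch only; the service name, hostnames, and options are placeholders, and the full option reference is in Chapter 9, The tpm Deployment Command:

shell> ./tools/tpm configure alpha \
    --topology=master-slave \
    --master=host1 \
    --members=host1,host2 \
    --install-directory=/opt/continuent
shell> ./tools/tpm install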

2.1.2. Using the RPM package files

The RPM packages can be used for installation, but are primarily designed to be used in combination with the INI configuration file.

Installation

Installing the RPM package will do the following:

  1. Create the tungsten system user if it doesn't exist

  2. Make the tungsten system user part of the mysql group if it exists

  3. Create the /opt/continuent/software directory

  4. Unpack the software into /opt/continuent/software

  5. Define the $CONTINUENT_PROFILES and $REPLICATOR_PROFILES environment variables

  6. Update the profile script to include the /opt/continuent/share/env.sh script

  7. Create the /etc/tungsten directory

  8. Run tpm install if the /etc/tungsten.ini or /etc/tungsten/tungsten.ini file exists

Although the RPM packages complete a number of the prerequisite steps required to configure your cluster, there are additional steps, such as configuring ssh, that you still need to complete. For more information, see Appendix B, Prerequisites.

By using the package files you are able to set up a new server by creating the /etc/tungsten.ini file and then installing the package. Any output from the tpm command will go to /opt/continuent/service_logs/rpm.output.
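
A minimal /etc/tungsten/tungsten.ini for a simple source/target pair might look like the following sketch (hostnames and the service name are placeholders; the available keys mirror the tpm options described in Chapter 9, The tpm Deployment Command):

shell> cat /etc/tungsten/tungsten.ini
[defaults]
user=tungsten
install-directory=/opt/continuent

[alpha]
topology=master-slave
master=host1
members=host1,host2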

Note

If you download the package files directly, you may need to add the signing key to your environment before the package will load properly.

For yum platforms (RHEL/CentOS/Amazon Linux), the rpm command is used:

root-shell> rpm --import http://www.continuent.com/RPM-GPG-KEY-continuent

For Ubuntu/Debian platforms, the gpg command is used:

root-shell> gpg --keyserver keyserver.ubuntu.com --recv-key 7206c924

Once an INI file has been created and the packages are available, the installation can be completed using:

  • On RHEL/CentOS/Amazon Linux:

    root-shell> yum install tungsten-replicator
  • On Ubuntu/Debian:

    root-shell> apt-get install tungsten-replicator

Upgrades

If you upgrade to a new version of the RPM package it will do the following:

  1. Unpack the software into /opt/continuent/software

  2. Run tpm update if the /etc/tungsten.ini or /etc/tungsten/tungsten.ini file exists

The tpm update will restart all Continuent Tungsten services, so you do not need to do anything after upgrading the package file.
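
For example, the package upgrade is triggered through the normal package manager; an illustrative sketch:

  • On RHEL/CentOS/Amazon Linux:

    root-shell> yum update tungsten-replicator
  • On Ubuntu/Debian:

    root-shell> apt-get update
    root-shell> apt-get install tungsten-replicator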

2.2. Best Practices

A successful deployment depends on being mindful during deployment, operations and ongoing maintenance.

2.2.1. Best Practices: Deployment

  • Identify the best deployment method for your environment and use that in production and testing. See Section 9.1, “Comparing Staging and INI tpm Methods”.

  • Standardize the OS and database prerequisites. There are Ansible modules available for immediate use within AWS, or as a template for modifications.

    More information on the Ansible method is available in this blog article.

  • Ensure that the output of the `hostname` command and the nodename entries in the Tungsten configuration match exactly prior to installing Tungsten, as illustrated in the sketch after this list.

    The configuration keys that define nodenames are: --slaves, --dataservice-slaves, --members, --master, --dataservice-master-host, --masters and --relay

  • For security purposes, you should ensure that you secure all relevant areas of your deployment; see Chapter 6, Deployment: Security for the available options.

  • Choose your topology from the deployment section and verify the configuration matches the basic settings. Additional settings may be included for custom features, but the basics are needed to ensure proper operation. If your configuration is not listed or does not match our documented settings, we cannot guarantee correct operation.

  • If you are using ROW replication, any triggers that run additional INSERT/UPDATE/DELETE operations must be updated so they do not run on the Replica servers.

  • Make sure you know the structure of the Tungsten Cluster home directory and how to initialize your environment for administration. See Section 7.1, “The Home Directory” and Section 7.2, “Establishing the Shell Environment”.

  • Prior to migrating applications to Tungsten Cluster, test failover and recovery procedures from Chapter 7, Operations Guide. Be sure to try recovering a failed Primary and reprovisioning failed Replicas.

  • When deciding on the Service Name for your configurations, keep them simple and short, and only use alphanumerics (a-z, A-Z, 0-9) and underscores (_).
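
A quick way to verify the hostname/nodename match described above, assuming an INI-based configuration (the values shown are illustrative):

shell> hostname
db1
shell> grep -E '^(master|members)=' /etc/tungsten/tungsten.ini
master=db1
members=db1,db2,db3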

2.2.2. Best Practices: Upgrade

In this section we identify the best practices for performing a Tungsten Software upgrade.

  • Identify the deployment method chosen for your environment, Staging or INI. See Section 9.1, “Comparing Staging and INI tpm Methods”.

  • The best practice for Tungsten software is to upgrade All-at-Once, performing zero Primary switches.

  • The Staging deployment method automatically does an All-at-Once upgrade - this is the basic design of the Staging method.

  • For an INI upgrade, there are two possible ways, One-at-a-Time (with at least one Primary switch), and All-at-Once (no switches at all).

  • See Section 9.4.3, “Upgrades with an INI File” for more information.

  • Here is the sequence of events for a proper Tungsten upgrade on a 3-node cluster with the INI deployment method:

    • Login to the Customer Downloads Portal and get the latest version of the software.

    • Copy the file (e.g. tungsten-clustering-7.0.2-161.tar.gz) to each host that runs a Tungsten component.

    • Set the cluster to policy MAINTENANCE

    • On every host:

      • Extract the tarball under /opt/continuent/software/ (e.g. creating /opt/continuent/software/tungsten-clustering-7.0.2-161)

      • cd to the newly extracted directory

      • Run the Tungsten Package Manager tool, tools/tpm update --replace-release

    • For example, here are the steps in order:

      On ONE database node:
      shell> cctrl
      cctrl> set policy maintenance
      cctrl> exit
      
      On EVERY Tungsten host at the same time:
      shell> cd /opt/continuent/software
      shell> tar xvzf tungsten-clustering-7.0.2-161.tar.gz
      shell> cd tungsten-clustering-7.0.2-161
      
      To perform the upgrade and restart the Connectors gracefully at the same time:
      shell> tools/tpm update --replace-release
      
      To perform the upgrade and delay the restart of the Connectors to a later time:
      shell> tools/tpm update --replace-release --no-connectors
      When it is time for the Connector to be promoted to the new version, perhaps after taking it out of the load balancer:
      shell> tpm promote-connector
      
      When all nodes are done, on ONE database node:
      shell> cctrl
      cctrl> set policy automatic
      cctrl> exit

WHY is it ok to upgrade and restart everything all at once?

Let’s look at each component to examine what happens during the upgrade, starting with the Manager layer.

Once the cluster is in Maintenance mode, the Managers cease to make changes to the cluster, and therefore Connectors will not reroute traffic either.

Since Manager control of the cluster is passive in Maintenance mode, it is safe to stop and restart all Managers - there will be zero impact to the cluster operations.

The Replicators function independently of client MySQL requests (which come through the Connectors and go to the MySQL database server), so even if the Replicators are stopped and restarted, there should be only a small window of delay while the replicas catch up with the Primary once upgraded. If the Connectors are reading from the Replicas, they may briefly get stale data if not using SmartScale.

Finally, when the Connectors are upgraded they must be restarted so the new version can take over. As discussed in this blog post, Zero-Downtime Upgrades, the Tungsten Cluster software upgrade process will do two key things to help keep traffic flowing during the Connector upgrade promote step:

  • Execute `connector graceful-stop 30` to gracefully drain existing connections and prevent new connections.

  • Using the new software version, initiate the start/retry feature which launches a new connector process while another one is still bound to the server socket. The new Connector process will wait for the socket to become available by retrying binding every 200ms by default (which is tunable), drastically reducing the window for application connection failures.

2.2.3. Best Practices: Operations

2.2.4. Best Practices: Maintenance

  • Your license allows for a testing cluster. Deploy a cluster that matches your production cluster and test all operations and maintenance procedures there.

  • Disable any automatic operating system patching processes. The use of automatic patching will cause issues when all database servers automatically restart without coordination. See Section 7.13.3, “Performing Maintenance on an Entire Dataservice”.

  • Regularly check for maintenance releases and upgrade your environment. Every version includes stability and usability fixes to ease the administrative process.

2.3. Common tpm Options During Deployment

There are a variety of tpm options that can be used to alter some aspect of the deployment during configuration. Although they might not be provided within the example deployments, they may be used or required for different installation environments. These include options such as altering the ports used by different components, or the commands and utilities used to monitor or manage the installation once deployment has been completed. Some of the most common options are included within this section.

Changes to the configuration should be made with tpm update. This continues the procedure of using tpm install during installation. See Section 9.5.15, “tpm update Command” for more information on using tpm update.

  • --datasource-systemctl-service

    On some platforms and environments the command used to manage and control the MySQL or MariaDB service is handled by a tool other than the services or /etc/init.d/mysql commands.

    Depending on the system or environment, other commands using the same basic structure may be used. For example, within CentOS 7, the command is systemctl. You can explicitly set the command to be used with the --datasource-systemctl-service option, specifying the name of the tool.

    The format of the corresponding command that will be used is expected to follow the same format as previous commands, for example, to stop the database service:

    shell> systemctl mysql stop

    Different commands must follow the same basic structure: the command configured by --datasource-systemctl-service, the service name, and the operation (e.g. stop).
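
    For example, a minimal INI sketch that sets the management tool explicitly (the value systemctl is an assumption; substitute the tool used on your platform):

    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    datasource-systemctl-service=systemctl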

2.4. Starting and Stopping Tungsten Replicator

To shut down a running Tungsten Replicator you must switch off the replicator:

shell> replicator stop
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.

Note

Stopping the replicator in this way results in an ungraceful shutdown of the replicator. To perform a graceful shutdown, use trepctl offline first, then stop or restart the replicator.

To start the replicator service if it is not already running:

shell> replicator start
Starting Tungsten Replicator Service...

To restart the replicator service (a stop followed by a start):

shell> replicator restart
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.
Starting Tungsten Replicator Service...

For some scenarios, such as initiating a load within a heterogeneous environment, the replicator can be started up in the OFFLINE state:

shell> replicator start offline

In a clustered environment, if the cluster was configured with auto-enable=false then you will need to put each node online individually.
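
For example, to bring the service named alpha online on a node (a sketch assuming the service name alpha used elsewhere in this manual):

shell> trepctl -service alpha online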

2.5. Configuring Startup on Boot

By default, Tungsten Replicator does not start automatically on boot. To enable Tungsten Replicator to start at boot time on a system supporting the Linux Standard Base (LSB), use the deployall script provided in the installation directory to create the necessary boot scripts on your system:

shell> sudo deployall

To disable automatic startup at boot time, use the undeployall command:

shell> sudo undeployall

2.6. Removing Datasources from a Deployment

Removing components from a dataservice is quite straightforward, and usually involves both modifying the running service and changing the configuration. Changing the configuration is necessary to ensure that the host is not re-configured and installed when the installation is next updated.


2.6.1. Removing a Datasource from an Existing Deployment

To remove a datasource from an existing deployment there are two primary stages: removing it from the active service, and then removing it from the active configuration.

For example, to remove host6 from a service:

  1. Login to host6.

  2. Stop the replicator:

    shell> replicator stop

Now that the node has been removed from the active dataservice, the host must be removed from the configuration.

  1. Remove the node from the configuration. The exact method depends on which installation method was used with tpm:

    • If you are using staging directory method with tpm:

      1. Change to the staging directory. The current staging directory can be located using tpm query staging:

        shell> tpm query staging
        tungsten@host1:/home/tungsten/tungsten-replicator-6.1.24-6
        shell> cd /home/tungsten/tungsten-replicator-6.1.24-6
      2. Update the configuration, omitting the host from the list of members of the dataservice:

        shell> tpm update alpha \
            --members=host1,host2,host3
    • If you are using the INI file method with tpm:

      • Remove the INI configuration file:

        shell> rm /etc/tungsten/tungsten.ini
  2. Remove the installed software directory:

    shell> rm -rf /opt/continuent

2.7. Understanding Deployment Styles and Topologies

The following sections provide an understanding of the different styles of deployment available and the different topologies that can be configured using Tungsten Replicator.

2.7.1. Tungsten Replicator Extraction Operation

Replication Operation     Support
---------------------     -------
Statements Replicated     Yes, within MySQL/MySQL topologies only
Rows Replicated           Yes
Schema Replicated         Yes, within MySQL/MySQL topologies only
ddlscan Supported         Yes, supported for mixed MySQL and data warehouse targets

Tungsten Replicator for MySQL operates by:

  • Reading the MySQL binary log (binlog) directly from disk and translating that content and session information into the THL. This method reads the binlog in its different formats, such as statement, row and mixed-based logging.

  • Reading the binlog remotely from the MySQL server over a network, including, for example, reading from an Amazon Aurora MySQL instance. This enables the replicator to read the information remotely, either on services where direct access to the binlog is not available, or where the replicator cannot be installed on the database host. This is also referred to as an Offboard installation.

The following diagrams show these two methods of extraction:

Figure 2.1. Internals: MySQL Extraction

Figure 2.2. Internals: Amazon Aurora/Remote Database, Offboard Extraction

Tungsten Replicator for MySQL is supported within the following environments:

  • MySQL Community Edition

  • MySQL Enterprise Edition from Oracle

  • Percona

  • MariaDB

  • Amazon RDS

  • Amazon Aurora

  • Google Cloud MySQL

In addition, the following requirements and limitations are in effect:

  • Tables must have primary keys (only applicable when the target is not Oracle, MySQL or PostgreSQL)

  • Row-based binary logging must be configured for heterogeneous deployment models (see the sketch below)

  • Datatype support varies depending upon the target. Check the applier documentation appropriate to your deployment target for more detail.

  • Currently, DDL is only replicated in MySQL to MySQL deployments
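
As an illustration of the row-based logging requirement, a minimal my.cnf sketch for the source database might include the following (the values shown are examples only; see Section B.4, “MySQL Database Setup” for the full set of required settings):

[mysqld]
server-id=1
log-bin=mysql-bin
binlog-format=ROW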

2.7.2. Understanding Deployment Models

The flexibility of the replicator allows you to install the software in several ways to fit the limitations or restrictions of your environment, and supports a number of flexible topologies. The deployment models are outlined below.

  • Onboard

    This method involves installing the Tungsten Replicator on the same host as the source MySQL database. This method is suitable for:

    • On-Premise deployments

    • EC2 Hosted Databases in AWS

    • Google Cloud SQL Hosted Instances

  • Offboard

    This method involves installing the Tungsten Replicator on a different host from the source MySQL database. This method is suitable for:

    • On-Premise deployments

    • EC2 Instances in AWS

    • Google Cloud SQL Hosted Instances

    • Amazon RDS MySQL Instances

    • Amazon Aurora Instances

  • Direct

    This method involves installing the Tungsten Replicator on a different host from the source MySQL database; however, the replicator will also act as the applier, writing out to the target. This method is suitable for:

    • Amazon RDS MySQL Instances

    • Amazon Aurora Instances

    • Cluster-Extractor topologies, extracting direct from a Tungsten Cluster

  • AWS Marketplace AMI

    This method is based on a pre-built AMI available for purchase within the Amazon Marketplace. This method is suitable for:

    • Amazon AWS Hosted solutions, including RDS and Aurora

2.7.3. Understanding Deployment Topologies

There are a number of different methods in which Tungsten Replicator can be configured; review Section 2.7.2, “Understanding Deployment Models” for full details of the differences between each deployment style. The following sections explain the different topology styles that can be deployed.

2.7.3.1. Simple Primary/Replica Topology

Primary/Replica is the simplest and most straightforward of all replication scenarios, and also the basis of all other types of topology. The fundamental basis for the Primary/Replica topology is that changes in the Source are distributed and applied to each of the configured Targets.

Figure 2.3. Topologies: Primary/Replica

2.7.3.2. Active/Active Topology

An active/active topology relies on a number of individual services that are used to define a Primary/Replica topology between each group of hosts. In a three-node active/active setup, for example, three different services are created on each host; each service creates a Primary/Replica relationship between a primary host (itself) and the remote Targets. A change on any individual host will be replicated to the other databases in the topology, creating the active/active configuration.

Figure 2.4. Topologies: Active/Active

2.7.3.3. Fan-Out Topology

The fan-out topology allows you to replicate from one single host out to two or more target hosts. Fan-out topologies are often used in situations where you have different reporting requirements; for example, sales figures may need aggregating and reporting within a Redshift environment, but payroll information may need replicating to a MySQL environment for back office processing.

Figure 2.5. Topologies: Fan-Out

2.7.3.4. Fan-In Topology

The fan-in topology is the logical opposite of a Primary/Replica topology. In a fan-in topology, the data from two (or more) Sources is combined together on one Target. Fan-in topologies are often used in situations where you have satellite databases, perhaps for sales or retail operations, and need to combine that information together in a single database for processing.

Figure 2.6. Topologies: Fan-In

2.7.3.5. Replicating in/out of an existing Tungsten Cluster

If you have an existing cluster and you want to replicate the data out to a separate standalone server using Tungsten Replicator then you can create a cluster alias, and use a Primary/Replica topology to replicate from the cluster. This allows for THL events from the cluster to be applied to a separate server for the purposes of backup or separate analysis.

Figure 2.7. Topologies: Cluster-Extractor

2.8. Understanding Heterogeneous Deployments

Heterogeneous deployments cover installations where data is being replicated between two different database solutions. These include, but are not limited to, the targets covered in Chapter 4, Deploying Appliers.

The following sections provide more detail and information on the setup and configuration of these different solutions.

2.8.1. How Heterogeneous Replication Works

Heterogeneous replication works slightly differently compared to native MySQL to MySQL replication. This is because SQL statements, including both Data Manipulation Language (DML) and Data Definition Language (DDL), cannot be executed on a target system as they were extracted from the MySQL database. The SQL dialects are different, so an SQL statement on MySQL is not the same as an SQL statement on Oracle, and these differences mean that the statement would either fail or perform an incorrect operation.

On targets that do not support SQL of any kind, such as MongoDB, replicating SQL statements would achieve nothing since they cannot be executed at all.

All heterogeneous replication deployments therefore use row-based replication. This extracts only the raw row data, not the statement information. Because it is only row-data, it can be easily re-assembled or constructed into another format, including statements in other SQL dialects, native appliers for alternative formats, such as JSON or BSON, or external CSV formats that enable the data to be loaded in bulk batches into a variety of different targets.

2.8.1.1. JDBC Applier based Replication

Replication into targets where the JDBC driver can be used, such as Oracle and PostgreSQL, works as follows:

  1. Data is extracted from the source MySQL database:

    • The MySQL server is configured to write transactions into the MySQL binary log using row-based logging. This generates information in the log in the form of the individual updated rows, rather than the statement that was used to perform the update. For example, instead of recording the statement:

      mysql> INSERT INTO MSG VALUES (1,'Hello World');
    • The information is stored as a row entry against the updated table:

      1 Hello World
    • The information is written into the THL as row-based events, with the event type (insert, update or delete) appended to the metadata of the THL event.

    It is the raw row data that is stored in the THL. Because the row data, not the SQL statement, has been recorded, the differences in SQL dialects between source and target do not need to be taken into account. In fact, Data Definition Language (DDL) and other SQL statements are deliberately ignored so that replication does not break.

  2. The row-based transactions stored in the THL are transferred from the Extractor to the Applier.

  3. On the Applier side, the row-based event data is wrapped into a suitable SQL statement for the target database environment. Because the raw row data is available, it can be constructed into any suitable statement appropriate for the target database.
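
For example, the row data shown above could be re-assembled into an equivalent statement in the target dialect. A hypothetical rendering for a PostgreSQL target (the column names id and msg are assumptions used for illustration only) might be:

INSERT INTO msg (id, msg) VALUES (1, 'Hello World');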

2.8.1.2. Native Applier Replication (e.g. MongoDB)

For heterogeneous replication where data is written into a target database using a native applier, such as MongoDB, the row-based information is written into the database using the native API. With MongoDB, for example, data is reformatted into BSON and then applied into MongoDB using the native insert/update/delete API calls.

2.8.1.3.  Batch Loading

For batch appliers, such as Hadoop, Vertica and Redshift, the row data is converted into CSV files in batches. The format of the CSV file includes both the original row data for all the columns of each table, and metadata on each line that contains the unique SEQNO and the operation type (insert, delete or update). A modified form of the CSV is used in some cases where the operation type is only an insert or delete, with updates being translated into a delete followed by an insert of the updated information.

These temporary CSV files are then loaded into the native environment as part of the replicator using a custom script that employs the specific tools of that database that support CSV imports. The raw CSV data is loaded into a staging table that contains the per-row metadata and the row data itself.

Depending on the batch environment, the loading of the data into the final destination tables is performed either within the same script, or by using a separate script. Both methods work in the same basic fashion: the base table is updated using the data from the staging table, rows marked for deletion are deleted, and the latest row for each primary key (calculated from the highest SEQNO) is then inserted.
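
As an illustrative sketch only (the exact CSV column layout, staging table name and metadata column names vary by applier and release), the staged rows and the merge logic might look like the following, where stage_xxx_msg, tungsten_opcode and tungsten_seqno are hypothetical names:

"I","3593","1","1","Hello World"
"D","3594","2","2","Old Message"

DELETE FROM msg USING stage_xxx_msg s WHERE msg.id = s.id;
INSERT INTO msg (id, msg)
    SELECT id, msg FROM stage_xxx_msg WHERE tungsten_opcode = 'I';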

2.8.1.4. Schema Creation and Replication

Because heterogeneous replication does not replicate SQL statements, including DDL statements that would normally define and generate the table structures, a different method must be used.

Tungsten Replicator includes a tool called ddlscan which can read the schema definition from MySQL and translate that into the schema definition required on the target database. During the process, differences in supported sizes and datatypes are identified and either modified to a suitable value, or highlighted as a definition that must be changed in the generated DDL.

Once this modified form of the DDL has been completed, it can then be executed against the target database to generate the DDL required for Tungsten Replicator to apply data. The same basic method is used in batch loading environments where a staging table is required, with the additional staging columns added to the DDL automatically.

For MongoDB or Kafka, where no explicit DDL needs to be generated, the use of ddlscan is not required.
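
For example, a sketch of generating Redshift-compatible DDL from a MySQL schema using ddlscan (the hostname, credentials, schema name, template name and output file shown are placeholders; check the templates shipped with your release):

shell> ddlscan -user tungsten -pass secret \
    -url jdbc:mysql:thin://host1:3306/sales \
    -template ddl-mysql-redshift.vm -db sales > sales.ddl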

Chapter 3. Deploying MySQL Extractors

The following sections outline the steps to configure the replicator for extraction. Each section covers the basic configuration to deploy an extractor in each of the deployment models (Onboard or Offboard) regardless of target database type.

To complete the deployment, after preparing the basic extractor configuration, follow the steps outlined in Chapter 4, Deploying Appliers appropriate to the target database type for your deployment.

3.1. MySQL Replication Pre-Requisites

Before installing Tungsten Replicator there are a number of steps that need to be completed to prepare the hosts.

First, ensure you have followed the general notes within Section B.3, “Host Configuration”. For supported platforms and environments, see Section B.1, “Requirements”.

If configuring extraction from MySQL instances hosted on your own hardware, or, for example, on EC2 instances, follow the MySQL specific pre-requisites within Section B.4, “MySQL Database Setup”

If configuring extraction from Amazon RDS or Amazon Aurora, also follow the pre-requisites within Section B.4, “MySQL Database Setup”, paying specific attention to Section B.4.6, “MySQL Unprivileged Users”.

For more detail on changing parameters within Amazon AWS, see Section 3.3.1, “Changing Amazon RDS/Aurora Instance Configurations”

A pre-requisite checklist is available to download and can be used to ensure your environment is ready for installation. See Section B.5, “Prerequisite Checklist”

3.2. Deploying a Primary/Replica Topology

Primary/Replica is the simplest and most straightforward of all replication scenarios, and also the basis of all other types of topology. The fundamental basis for the Primary/Replica topology is that changes in the Primary are distributed and applied to each of the configured Replicas.

Figure 3.1. Topologies: Primary/Replica

This deployment style can be used against the following sources:

  • MySQL Community Edition

  • MySQL Enterprise Edition

  • Percona MySQL

  • MariaDB

  • Google Cloud MySQL

This deployment assumes full access to the host, including access to the binary logs; therefore, this deployment style is not suitable for RDS or Aurora extraction. For these sources, see Section 3.3, “Deploying an Extractor for Amazon Aurora”.

tpm includes a specific topology structure for the basic Primary/Replica configuration, using the list of hosts and the Primary host definition to define the Primary/Replica relationship. Before starting the installation, the prerequisites must have been completed (see Appendix B, Prerequisites). To create a Primary/Replica using tpm:

There are two types of installation, either via a Staging Install, or via an ini file install.

To understand the differences between these two installation methods, see Section 9.1, “Comparing Staging and INI tpm Methods”

Regardless of which installation method you choose, the steps are the same, and are outlined below, using the appropriate example configuration based on your deployment style.

  • Install the Tungsten Replicator package (see Section 2.1.2, “Using the RPM package files”), or download the compressed tarball and unpack it, either on the source host, or on the staging host:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  • Change to the Tungsten Replicator staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  • Onboard Installation

    Configure the replicator for extraction from a locally installed and configured MySQL installation (in this example, the service name is alpha).

    Both the Staging and INI methods are shown below.

    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --user=tungsten \
        --profile-script=~/.bash_profile \
        --mysql-allow-intensive-checks=true \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=localhost \
        --members=localhost \
        --enable-heterogeneous-service=true \
        --replication-port=3306 \
        --replication-user=tungsten_alpha \
        --replication-password=secret \
        --datasource-mysql-conf=/etc/my.cnf
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    user=tungsten
    profile-script=~/.bash_profile
    mysql-allow-intensive-checks=true
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=localhost
    members=localhost
    enable-heterogeneous-service=true
    replication-port=3306
    replication-user=tungsten_alpha
    replication-password=secret
    datasource-mysql-conf=/etc/my.cnf
    

    Configuration group defaults

    The description of each of the options is shown below:

    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --install-directory=/opt/continuent

      install-directory=/opt/continuent

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --user=tungsten

      user=tungsten

      System User

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --mysql-allow-intensive-checks=true

      mysql-allow-intensive-checks=true

      For MySQL installation, enables detailed checks on the supported data types within the MySQL database to confirm compatibility. This includes checking each table definition individually for any unsupported data types.

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.
    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.

    Configuration group alpha

    In the above example, --datasource-mysql-conf is optional and can be used if the MySQL configuration file cannot be located by tpm, or is in a non-default location.

  • Offboard Installation

    Configure the replicator for extraction from a remotely installed and configured MySQL installation (in this example, the service name is alpha).

    In the example below, the server offboardhost is the host that the Replicator is installed upon, and the server dbhost is the database host to extract the events from.

    Both the Staging and INI methods are shown below.

    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --user=tungsten \
        --profile-script=~/.bash_profile \
        --mysql-allow-intensive-checks=true \
        --skip-validation-check=MySQLAvailableCheck \
        --skip-validation-check=MySQLConfFile \
        --skip-validation-check=RowBasedBinaryLoggingCheck \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=offboardhost \
        --members=offboardhost \
        --enable-heterogeneous-service=true \
        --privileged-master=true \
        --replication-host=dbhost \
        --replication-port=3306 \
        --replication-user=tungsten_alpha \
        --replication-password=secret
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    user=tungsten
    profile-script=~/.bash_profile
    mysql-allow-intensive-checks=true
    skip-validation-check=MySQLAvailableCheck
    skip-validation-check=MySQLConfFile
    skip-validation-check=RowBasedBinaryLoggingCheck
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=offboardhost
    members=offboardhost
    enable-heterogeneous-service=true
    privileged-master=true
    replication-host=dbhost
    replication-port=3306
    replication-user=tungsten_alpha
    replication-password=secret
    


  • Once the prerequisites and configuring of the installation has been completed, the software can be installed:

    shell> ./tools/tpm install

In both of the above examples, enable-heterogeneous-service is only required if the target applier is NOT a MySQL database.

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the installation has been completed, you can now proceed to configure the Applier service following the relevant step within Chapter 4, Deploying Appliers.

Following installation of the applier, the services can be started. For information on starting and stopping Tungsten Replicator see Section 2.4, “Starting and Stopping Tungsten Replicator”; for information on configuring the services to start and stop when the system boots and shuts down, see Section 2.5, “Configuring Startup on Boot”.

For information on checking the running service, see Section 3.2.1, “Monitoring the MySQL Extractor”.

3.2.1. Monitoring the MySQL Extractor

Once the service has been started, a quick view of the service status can be determined using trepctl:

shell> trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 3593
appliedLatency  : 1.074
role            : master
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

The key fields are:

  • appliedLastSeqno and appliedLatency indicate the global transaction ID and latency of the host. These are important when monitoring the status of the cluster to determine how up to date a host is and whether a specific transaction has been applied.

  • role indicates the current role of the host within the scope of this dataservice.

  • state shows the current status of the host within the scope of this dataservice.
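
For example, to block until a specific transaction has been applied on a host, trepctl provides a wait command (a sketch; the seqno and timeout values shown are placeholders):

shell> trepctl wait -applied 3593 -limit 60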

More detailed status information can also be obtained. On the Extractor:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000009:0000000000001033;0
appliedLastSeqno       : 3593
appliedLatency         : 1.074
channels               : 1
clusterName            : default
currentEventId         : mysql-bin.000009:0000000000001033
currentTimeMillis      : 1373615598598
dataServerHost         : host1
extensions             : 
latestEpochNumber      : 3589
masterConnectUri       : 
masterListenUri        : thl://host1:2112/
maximumStoredSeqNo     : 3593
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host1:3306/
relativeLatency        : 604904.598
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 604903.621
transitioningTo        : 
uptimeSeconds          : 1202137.328
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

For more information on using trepctl, see Section 8.19, “The trepctl Command”.

Definitions of the individual field descriptions in the above example output can be found in Section E.2, “Generated Field Reference”.

For more information on the management and operation of your replicator installation, see Chapter 7, Operations Guide.

3.3. Deploying an Extractor for Amazon Aurora

Replication from Amazon Aurora operates by directly accessing the binary log provided by Aurora, and enables you to take advantage of Amazon Web Services, either replicating from the remote Aurora instance, or to a standard EC2 instance within AWS. The complexity with Aurora is that there is no access to the host that is running the instance, or direct access to the MySQL binary logs.

To use this service, two aspects of the Tungsten Replicator are required: direct mode and unprivileged user support. Direct mode reads the MySQL binary log over the network, rather than accessing the binlog on the filesystem. Unprivileged mode enables the user to access and update information within Aurora without requiring SUPER privileges, which are unavailable within an Aurora instance. For more information, see Section B.4.6, “MySQL Unprivileged Users”.

The deployment requires a host for the extractor installation; this can be an EC2 instance within your AWS environment, or it could be a remote host in your own environment.

This deployment follows a similar model to an Offboard Installation.

Figure 3.2. Topologies: Aurora Extraction

Before starting the installation, the prerequisites must have been completed (see Appendix B, Prerequisites) on both the Host designated for the installation of the extractor, and within the source database instance.

There are two types of installation, either via a Staging Install, or via an ini file install.

To understand the differences between these two installation methods, see Section 9.1, “Comparing Staging and INI tpm Methods”

Regardless of which installation method you choose, the steps are the same, and are outlined below.

In the above examples:

  • enable-heterogeneous-service is only required if the target applier is NOT a MySQL database

  • datasource-mysql-conf needs to be set as shown, as we do not have access to the my.cnf file

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the installation has been completed, you can now proceed to configure the Applier service following the relevant step within Chapter 4, Deploying Appliers.

Following installation of the applier, the services can be started. For information on starting and stopping Tungsten Replicator see Section 2.4, “Starting and Stopping Tungsten Replicator”; for information on configuring the services to start and stop when the system boots and shuts down, see Section 2.5, “Configuring Startup on Boot”.

Monitoring the extractor is the same as an extractor from MySQL, for information, see Section 3.2.1, “Monitoring the MySQL Extractor”.

3.3.1. Changing Amazon RDS/Aurora Instance Configurations

The configuration of RDS and Aurora instances can be modified to change the parameters for MySQL instances, the Amazon equivalent of modifying the my.cnf file.

3.3.1.1. Changing Amazon RDS using command line functions

These steps can be used for changing the configuration for RDS instances only. See Section 3.3.1.2, “Changing Amazon Aurora Parameters using AWS Console” for steps to change Aurora parameters.

The parameters can be set internally by connecting to the instance and using the configuration function within the instance. For example:

mysql> call mysql.rds_set_configuration('binlog retention hours', 48);
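
The current values can then be verified using the companion procedure:

mysql> call mysql.rds_show_configuration;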

An RDS command-line interface is available which enables modifying these parameters. To enable the command-line interface:

shell> wget http://s3.amazonaws.com/rds-downloads/RDSCli.zip
shell> unzip RDSCli.zip
shell> export AWS_RDS_HOME=/home/tungsten/RDSCli-1.13.002
shell> export PATH=$PATH:$AWS_RDS_HOME/bin

The current RDS instances can be listed by using rds-describe-db-instances:

shell> rds-describe-db-instances --region=us-east-1

To change parameters, a new parameter group must be created, and then applied to a running instance or instances before restarting the instance:

  1. Create a new custom parameter group:

    shell> rds-create-db-parameter-group repgroup -d 'Parameter group for DB Replicas' -f mysql5.1

    Where repgroup is the name of the new parameter group.

  2. Set the new parameter value:

    shell> rds-modify-db-parameter-group repgroup --parameters \
    "name=max_allowed_packet,value=67108864,method=immediate"
  3. Apply the parameter group to your instance:

    shell> rds-modify-db-instance instancename --db-parameter-group-name=repgroup

    Where instancename is the name given to your instance.

  4. Restart the instance:

    shell> rds-reboot-db-instance instancename
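
Note that the RDSCli tool shown above is a legacy interface. An equivalent sketch of the same sequence using the current aws CLI (assuming it is installed and configured with appropriate credentials; the parameter group family is a placeholder):

shell> aws rds create-db-parameter-group --db-parameter-group-name repgroup \
    --db-parameter-group-family mysql8.0 --description 'Parameter group for DB Replicas'
shell> aws rds modify-db-parameter-group --db-parameter-group-name repgroup \
    --parameters "ParameterName=max_allowed_packet,ParameterValue=67108864,ApplyMethod=immediate"
shell> aws rds modify-db-instance --db-instance-identifier instancename \
    --db-parameter-group-name repgroup
shell> aws rds reboot-db-instance --db-instance-identifier instancename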

3.3.1.2. Changing Amazon Aurora Parameters using AWS Console

To change the parameters for Aurora instances, you can follow the guidelines below using the AWS Console.

  1. Login to the AWS Console using your account credentials and navigate to the RDS Dashboard. From here, select "Parameter Groups" from the left-hand list.

    Figure 3.3. Fig 1. AWS Config

  2. Select the "Create Parameter Group" button at the top right.

    Figure 3.4. Fig 2. AWS Config

  3. This dialog will now allow you to create a new parameter group using an existing one as a template. Select the appropriate template to use and complete the rest of the details. You need to create both a DB Parameter Group and a DB Cluster Parameter Group.

    Figure 3.5. Fig 3. AWS Config

    Figure 3.6. Fig 4. AWS Config

    Figure 3.7. Fig 5. AWS Config

  4. Now that you have the two groups, you can modify the parameters accordingly, by selecting the group in the list and then selecting the "Edit" option.

    Figure 3.8. Fig 6. AWS Config

    Figure 3.9. Fig 7. AWS Config

  5. Now that the groups are set up, you can assign them to existing Aurora instances, or you can assign them during instance creation. If you are assigning to existing instances, you may need to restart the instance for certain parameters to take effect.

Some parameters, such as enabling binary logging, can only be set via the cluster parameter group; others can only be changed in the DB parameter group.

3.4. Replicating Data Out of a Cluster

If you have an existing cluster and you want to replicate the data out to a separate standalone server using Tungsten Replicator then you can create a cluster alias, and use a Primary/Replica topology to replicate from the cluster. This allows for THL events from the cluster to be applied to a separate server for the purposes of backup or separate analysis.

Figure 3.10. Topologies: Replicating Data Out of a Cluster

During the installation process a cluster-alias and cluster-slave are declared. The cluster-alias describes all of the servers in the cluster and how they may be reached. The cluster-slave defines one or more servers that will replicate from the cluster.

The Tungsten Replicator will be installed on the Cluster-Extractor server. That server will download THL data and apply it to the local server. If the Cluster-Extractor has more than one server, one of them will be declared the relay (or Primary). The other members of the Cluster-Extractor may also download THL data from that server.

If the relay for the Cluster-Extractor fails, the other nodes will automatically start downloading THL data from a server in the cluster. If a non-relay server fails, it will not have any impact on the other members.

3.4.1. Prepare: Replicating Data Out of a Cluster

  1. Identify the cluster to replicate from. You will need the Primary, Replicas and THL port (if specified). Use tpm reverse from a cluster member to find the correct values.

  2. If you are replicating to a non-MySQL server, update the configuration of the cluster to include the following properties prior to beginning:

    svc-extractor-filters=colnames,pkey
    property=replicator.filter.pkey.addColumnsToDeletes=true
    property=replicator.filter.pkey.addPkeyToInserts=true

  3. Identify all servers that will replicate from the cluster. If there is more than one, a relay server should be identified to replicate from the cluster and provide THL data to other servers.

  4. Prepare each server according to the prerequisites for the DBMS platform it is serving. If you are working with multiple DBMS platforms, treat each platform as a different Cluster-Extractor during deployment.

  5. Make sure the THL port for the cluster is open between all servers.
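
Connectivity to the THL port can be quickly verified from each server using a tool such as nc (a sketch assuming the default THL port of 2112):

shell> nc -zv host1 2112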

3.4.2. Deploy: Replicating Data Out of a Cluster

  1. Install the Tungsten Replicator package or download the Tungsten Replicator tarball, and unpack it:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  2. Change to the unpackaged directory:

    shell> cd tungsten-replicator-6.1.24-6
  3. Configure the replicator

    Both the Staging and INI methods are shown below.

    shell> ./tools/tpm configure defaults \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --replication-password=secret \
        --replication-port=13306 \
        --replication-user=tungsten \
        --user=tungsten \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=host1 \
        --slaves=host2,host3 \
        --thl-port=2112 \
        --topology=cluster-alias
    
    shell> ./tools/tpm configure beta \
        --relay=host6 \
        --relay-source=alpha \
        --topology=cluster-slave
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    replication-password=secret
    replication-port=13306
    replication-user=tungsten
    user=tungsten
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=host1
    slaves=host2,host3
    thl-port=2112
    topology=cluster-alias
    
    [beta]
    relay=host6
    relay-source=alpha
    topology=cluster-slave
    


    Important

    If you are replicating to a non-MySQL server, include the following steps in your configuration:

    shell> mkdir -p /opt/continuent/share/
    shell> cp tungsten-replicator/support/filters-config/convertstringfrommysql.json »
       /opt/continuent/share/

    Then, include the following parameters in the configuration:

    property=replicator.stage.remote-to-thl.filters=convertstringfrommysql
    property=replicator.filter.convertstringfrommysql.definitionsFile= »
       /opt/continuent/share/convertstringfrommysql.json
    

    Important

    This dataservice cluster-alias name MUST be the same as the cluster dataservice name that you are replicating from.

    Note

    Do not include start-and-report=true if you are taking over for MySQL native replication. See Section 7.10.1, “Migrating from MySQL Native Replication 'In-Place'” for next steps after completing installation.

  4. Once the configuration has been completed, you can perform the installation to set up the services using this configuration:

    shell> ./tools/tpm install

During the installation and startup, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

The cluster should be installed and ready to use.

Chapter 4. Deploying Appliers

Table of Contents

4.1. Deploying the MySQL Applier
4.1.1. Preparing for MySQL Replication
4.1.2. Prepare Amazon RDS/Amazon Aurora
4.1.3. Install MySQL Applier
4.1.3.1. Local and Remote MySQL Targets
4.1.3.2. Amazon RDS and Amazon Aurora Targets
4.1.4. Management and Monitoring of MySQL Deployments
4.2. Deploying the Amazon Redshift Applier
4.2.1. Redshift Replication Operation
4.2.2. Preparing for Amazon Redshift Replication
4.2.2.1. Redshift Preparation for Amazon Redshift Deployments
4.2.2.2. Configuring Identity Access Management within AWS
4.2.2.3. Amazon Redshift DDL Generation for Amazon Redshift Deployments
4.2.2.4. Handling Concurrent Writes from Multiple Appliers
4.2.3. Install Amazon Redshift Applier
4.2.4. Verifying your Redshift Installation
4.2.5. Keeping CDC Information
4.2.6. Management and Monitoring of Amazon Redshift Deployments
4.3. Deploying the Vertica Applier
4.3.1. Preparing for Vertica Deployments
4.3.2. Install Vertica Applier
4.3.3. Management and Monitoring of Vertica Deployments
4.3.4. Troubleshooting Vertica Installations
4.4. Deploying the Kafka Applier
4.4.1. Preparing for Kafka Replication
4.4.2. Install Kafka Applier
4.4.2.1. Optional Configuration Parameters for Kafka
4.4.3. Management and Monitoring of Kafka Deployments
4.5. Deploying the MongoDB Applier
4.5.1. MongoDB Atlas Replication
4.5.2. Preparing for MongoDB Replication
4.5.3. Install MongoDB Applier
4.5.4. Install MongoDB Atlas Applier
4.5.4.1. Import MongoDB Atlas Certificates
4.5.5. Management and Monitoring of MongoDB Deployments
4.6. Deploying the Hadoop Applier
4.6.1. Hadoop Replication Operation
4.6.2. Preparing for Hadoop Replication
4.6.2.1. Hadoop Host
4.6.2.2. Schema Generation
4.6.3. Replicating into Kerberos Secured HDFS
4.6.4. Install Hadoop Replication
4.6.4.1. Applier Replicator Service
4.6.4.2. Generating Materialized Views
4.6.4.3. Accessing Generated Tables in Hive
4.6.4.4. Management and Monitoring of Hadoop Deployments
4.6.4.5. Troubleshooting Hadoop Replication
4.7. Deploying the Oracle Applier
4.7.1. Preparing for Oracle Replication
4.7.1.1. Additional Prerequisites for Oracle Targets
4.7.1.2. Configure the Oracle database
4.7.1.3. Create the Destination Schema
4.7.2. Install Oracle Applier
4.8. Deploying the PostgreSQL Applier
4.8.1. Preparing for PostgreSQL Replication
4.8.1.1. PostgreSQL Database Setup
4.8.2. Install PostgreSQL Applier
4.8.3. Management and Monitoring of PostgreSQL Deployments

The following sections outline the steps to configure the replicator for applying into your target of choice. Each section covers the basic configuration to deploy an applier in each of the deployment models (Onboard or Offboard).

Before preparing the applier configuration, follow the steps outlined in Chapter 3, Deploying MySQL Extractors to configure the extractor.

4.1. Deploying the MySQL Applier

Deploying the MySQL applier is the most straightforward of deployments. This section covers configuration of the applier into all releases of MySQL, including Amazon RDS, Amazon Aurora, Google Cloud SQL and Microsoft Azure.

  • Service Alpha on host1 extracts the information from the MySQL binary log into THL.

  • Service Alpha reads the information from the remote replicator as THL, and applies that to the target MySQL instance via a JDBC Connector.

Figure 4.1. Topologies: Replicating to MySQL

The Applier replicator can be installed on:

4.1.1. Preparing for MySQL Replication

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

  • MySQL Target

    Applies to:

    • Standalone hosted instances

    • EC2 hosted instances

    • Google Cloud hosted instances

    • Microsoft Azure hosted instances

    To prepare the target MySQL Database, ensure the user accounts are created as per the steps outlined in Section B.4.5, “MySQL User Configuration”

  • Amazon RDS/Amazon Aurora Target

    For Amazon-based targets, as we do not have access to the host, nor can we configure accounts with elevated privileges, follow the steps in Section B.4.6, “MySQL Unprivileged Users” to prepare the target for replication.

The data replicated from MySQL can be any data, although there are some known limitations and assumptions made about the way the information is transferred.

  • Table format should be updated to UTF8 by updating the MySQL configuration (my.cnf):

    character-set-server=utf8
    collation-server=utf8_general_ci
  • To prevent the timezone configuration from storing zone-adjusted values and exporting this information to the binary log and Amazon RDS, fix the timezone configuration to use UTC within the configuration file (my.cnf):

    default-time-zone='+00:00'
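
Both settings can be verified on a running server using the mysql client:

mysql> SHOW VARIABLES LIKE 'character_set_server';
mysql> SHOW VARIABLES LIKE 'time_zone';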

If your target is an Amazon RDS or Aurora instance that has not yet been created, follow the steps in Section 4.1.2, “Prepare Amazon RDS/Amazon Aurora”.

If your target is a hosted MySQL environment, proceed to Section 4.1.3, “Install MySQL Applier”

4.1.2. Prepare Amazon RDS/Amazon Aurora

  • Create the Amazon Instance

    If the instance does not already exist, create the Amazon RDS or Amazon Aurora instance and take a note of the endpoint URL reported. This information will be required when configuring the replicator service.

    Also take a note of the user and password used for connecting to the instance.

  • Check your security group configuration.

    The host used as the Target for applying changes to the Amazon instance must have been added to the security groups. Within Amazon RDS and Aurora, security groups configure the hosts that are allowed to connect to the Amazon instance, and hence update information within the database. The configuration must include the IP address of the Applier replicator, whether that host is within Amazon EC2 or external.

  • Change RDS/Aurora instance properties

    Depending on the configuration and data to be replicated, the parameters of the running instance may need to be modified. For example, the max_allowed_packet parameter may need to be increased.

    For more information on changing parameters, see Section 3.3.1, “Changing Amazon RDS/Aurora Instance Configurations”.

4.1.3. Install MySQL Applier

The applier will read information from the Extractor and write database changes into the target instance.

To configure the Applier replicator for either local or remote MySQL, or for Amazon RDS/Aurora, the process is the same, with only a slightly different configuration; this is outlined below.

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

The replicators can now be started using the replicator command.

The status of the replicator can be checked and monitored by using the trepctl command.

4.1.3.1. Local and Remote MySQL Targets

  • Configure the installation using tpm:

    Both the Staging and INI methods are shown below.

    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --user=tungsten \
        --mysql-allow-intensive-checks=true \
        --profile-script=~/.bash_profile \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=sourcehost \
        --members=localhost,sourcehost \
        --datasource-type=mysql \
        --replication-user=tungsten \
        --replication-password=secret \
        --replication-host=remotedbhost
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    user=tungsten
    mysql-allow-intensive-checks=true
    profile-script=~/.bash_profile
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=sourcehost
    members=localhost,sourcehost
    datasource-type=mysql
    replication-user=tungsten
    replication-password=secret
    replication-host=remotedbhost
    

    Configuration group defaults

    The description of each of the options is shown below:

    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --install-directory=/opt/continuent

      install-directory=/opt/continuent

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --user=tungsten

      user=tungsten

      System User

    • --mysql-allow-intensive-checks=true

      mysql-allow-intensive-checks=true

      For MySQL installation, enables detailed checks on the supported data types within the MySQL database to confirm compatibility. This includes checking each table definition individually for any unsupported data types.

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.
    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.

    Configuration group alpha

    replication-host should only be added to the above configuration if the target MySQL database is on a different host from the applier installation.

4.1.3.2. Amazon RDS and Amazon Aurora Targets

4.1.4. Management and Monitoring of MySQL Deployments

Replication to MySQL and Amazon based instances operates in the same manner as all other replication environments. The current status can be monitored using trepctl. On the Extractor:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000043:0000000000000291;84
appliedLastSeqno       : 2320
appliedLatency         : 0.733
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000043:0000000000000291
currentTimeMillis      : 1387544952494
dataServerHost         : host1
extensions             : 
host                   : host1
latestEpochNumber      : 60
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://host1:2112/
maximumStoredSeqNo     : 2320
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host1:13306/
relativeLatency        : 23.494
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 99525.477
transitioningTo        : 
uptimeSeconds          : 99527.364
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

On the Applier, use trepctl and monitor the appliedLatency and appliedLastSeqno. The output will include the hostname of the Amazon RDS instance:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000043:0000000000000291;84
appliedLastSeqno       : 2320
appliedLatency         : 797.615
channels               : 1
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1387545785268
dataServerHost         : documentationtest.cnlhon44f2wq.eu-west-1.rds.amazonaws.com
extensions             : 
host                   : documentationtest.cnlhon44f2wq.eu-west-1.rds.amazonaws.com
latestEpochNumber      : 60
masterConnectUri       : thl://host1:2112/
masterListenUri        : thl://host2:2112/
maximumStoredSeqNo     : 2320
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://host1:2112/
relativeLatency        : 856.268
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : documentationtest.cnlhon44f2wq.eu-west-1.rds.amazonaws.com
state                  : ONLINE
timeInStateSeconds     : 461.885
transitioningTo        : 
uptimeSeconds          : 668.606
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

4.2. Deploying the Amazon Redshift Applier

Amazon Redshift is a cloud-based data warehouse service that integrates with other Amazon services, such as S3, to provide an SQL-like interface to the loaded data. Replication for Amazon Redshift moves data from MySQL datastores, through S3, and into the Redshift environment in real-time, avoiding the need to manually export and import the data.

Replication to Amazon Redshift operates as follows:

  • Data is extracted from the source database into THL.

  • When extracting the data from the THL, the Amazon Redshift replicator writes the data into CSV files according to the name of the source tables. The files contain all of the row-based data, including the global transaction ID generated by the extractor during replication, and the operation type (insert, delete, etc) as part of the CSV data.

  • The generated CSV files are loaded into Amazon S3 using either the s3cmd command or the aws s3 cli tools. This enables easy access to your Amazon S3 installation and simplifies the loading.

  • The CSV data is loaded from S3 into Redshift staging tables using the Redshift COPY command, which imports raw CSV into Redshift tables.

  • SQL statements are then executed within Redshift to update the live version of the tables using the batch-loaded CSV information, deleting old rows and inserting the new data, so that updates work effectively within the confines of Amazon Redshift operation (a conceptual sketch of this merge step follows this list).
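
As an illustration of the merge step, the applier conceptually executes statements similar to the following for each table (a simplified sketch only; the actual statements are generated internally by the batch applier, and the staging table names follow the examples used later in this section):

redshift> DELETE FROM test.msg
          USING stage_xxx_test.stage_xxx_msg s
          WHERE test.msg.id = s.id;
redshift> INSERT INTO test.msg (id, msg)
          SELECT id, msg FROM stage_xxx_test.stage_xxx_msg
          WHERE tungsten_opcode IN ('I', 'UI');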

Figure 4.2. Topologies: Replicating to Amazon Redshift

Topologies: Replicating to Amazon Redshift

Setting up replication requires setting up both the Extractor and Applier components as two different configurations, one for MySQL and the other for Amazon Redshift. Replication also requires some additional steps to ensure that the Amazon Redshift host is ready to accept the replicated data that has been extracted. Tungsten Replicator provides all the tools required to perform these operations during the installation and setup.

4.2.1. Redshift Replication Operation

The Redshift applier makes use of the JavaScript-based batch loading system (see Section 5.6.4, “JavaScript Batchloader Scripts”). This constructs change data from the source database. The change data is then loaded into staging tables, at which point a process merges the change data up into the base tables. A summary of this basic structure can be seen in Figure 4.3, “Topologies: Redshift Replication Operation”.

Figure 4.3. Topologies: Redshift Replication Operation

Topologies: Redshift Replication Operation

Different object types within the two systems are mapped as follows:

MySQL      Redshift
Instance   Database
Database   Schema
Table      Table

The full replication of information operates as follows:

  1. Data is extracted from the source database using the standard extractor, for example by reading the row change data from the binlog in MySQL.

  2. The Section 10.4.5, “ColumnName Filter” filter is used to extract column name information from the database. This enables the row-change information to be tagged with the corresponding column information. The data changes, and corresponding row names, are stored in the THL.

    The Section 10.4.31, “PrimaryKey Filter” filter is used to extract primary key data from the source tables.

  3. On the Applier replicator, the THL data is read and written into batch-files in the character-separated value format.

    The information in these files is change data, and contains not only the original row values from the source tables, but also metadata about the operation performed (i.e. INSERT, DELETE or UPDATE), and the primary key for each table. All UPDATE statements are recorded as a DELETE of the existing data, and an INSERT of the new data.

    In addition to these core operation types, the batch applier can also be configured to record UPDATE operations that result in INSERT or DELETE rows. This enables Redshift to process the update information more simply than performing the individual DELETE and INSERT operations.

  4. A second process uses the CSV stage data and any existing data to build a materialized view that mirrors the source table data structure.

The staging files created by the replicator are in a specific format that incorporates change and operation information in addition to the original row data.

  • The format of the files is a character separated values file, with each row separated by a newline, and individual fields separated by the character 0x01. This is supported by Hive as a native value separator.

  • The content of each row consists of metadata describing the operation, the sequence number, the table-specific primary key, and the commit timestamp, followed by the full row data extracted from the Source.

Operation   Sequence No                     Table-specific primary key   DateTime                          Table-columns...
OPTYPE      SEQNO that generated this row   PRIMARYKEY                   DATETIME of source table commit

The operation field will match one of the following values:

Operation   Description                                       Notes
I           Row is an INSERT of new data
D           Row is a DELETE of existing data
UI          Row is an UPDATE which caused an INSERT of data
UD          Row is an UPDATE which caused a DELETE of data

For example, the MySQL row from an INSERT of:

|  3 | #1 Single | 2006 | Cats and Dogs (#1.4)         |

Is represented within the CSV staging files generated as:

"I","5","3","2014-07-31 14:29:17.000","3","#1 Single","2006","Cats and Dogs (#1.4)"

The character separator, and whether to use quoting, are configurable within the replicator when it is deployed. For Redshift, the default behavior is to generate quoted and comma-separated fields.

4.2.2. Preparing for Amazon Redshift Replication

Preparing the hosts for the replication process requires setting some key configuration parameters within the MySQL server to ensure that data is stored and written correctly. On the Amazon Redshift side, the database and schema must be created using the existing schema definition so that the databases and tables exist within Amazon Redshift.

Source Host

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites, then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

The following are required for replication to Amazon Redshift:

4.2.2.1. Redshift Preparation for Amazon Redshift Deployments

On the Amazon Redshift host, you need to perform some preparation of the destination database, first creating the database, and then creating the tables that are to be replicated. Setting up this process requires the configuration of a number of components outside of Tungsten Replicator in order to support the loading.

  • An existing Amazon Web Services (AWS) account, and either the AWS Access Key and Secret Key, or configured IAM Roles, required to interact with the account through the API. For information on creating IAM Roles, see Section 4.2.2.2, “Configuring Identity Access Management within AWS”

  • A configured Amazon S3 service. If the S3 service has not already been configured, visit the AWS console and sign up for the Amazon S3 service.

  • The s3cmd or the aws tools installed and configured. The s3cmd can be downloaded from s3cmd on s3tools.org.

    If using s3cmd, you should configure the command to connect automatically to the Amazon S3 service without requiring further authentication. The .s3cfg file in the tungsten user's home directory should be configured as follows:

    • Using Access Keys:

      [default]
      access_key = ACCESS_KEY
      secret_key = SECRET_KEY
    • Using IAM Roles: leave the values blank and copy the example as-is:

      [default]
      access_key = 
      secret_key = 
      security_token =
  • Create an S3 bucket that will be used to hold the CSV files that are generated by the replicator. This can be achieved either through the web interface, or via the command-line, for example:

    shell> s3cmd mb s3://tungsten-csv
  • A running Redshift instance must be available, and the port and IP address of the Tungsten Cluster that will be replicating into Redshift must have been added to the Redshift instance security credentials.

    Make a note of the user and password that has been provided with access to the Redshift instance, as these will be needed when installing the applier. Also make a note of the Redshift instance address, as this will need to be provided to the applier configuration.

  • Create an s3-config-servicename.json file based on the sample provided within cluster-home/samples/conf/s3-config-servicename.json within the Tungsten Replicator staging directory, or using the example below.

    Once created, the file must be copied into the /opt/continuent/share directory to be used by the batch applier script.

    If multiple services are being created, one file must be created for each service; for example, services named alpha and beta would use s3-config-alpha.json and s3-config-beta.json respectively.

    The following example shows the use of Access and Secret Keys:

    {
      "awsS3Path" : "s3://your-bucket-for-redshift/redshift-test",
      "awsAccessKey" : "access-key-id",
      "awsSecretKey" : "secret-access-key",
      "cleanUpS3Files" : "true"
    }

    The following example shows the use of IAM Roles:

    {
      "awsS3Path" : "s3://your-bucket-for-redshift/redshift-test",
      "awsIAMRole" : "arn:iam-role",
      "cleanUpS3Files" : "true"
    }

    The allowed options for this file are as follows (a combined example follows the list):

    • awsS3Path — the location within your S3 storage where files should be loaded.

    • awsAccessKey — the S3 access key to access your S3 storage. Not required if awsIAMRole is used.

    • awsSecretKey — the S3 secret key associated with the Access Key. Not required if awsIAMRole is used.

    • awsIAMRole — the IAM role configured to allow Redshift to interact with S3. Not required if awsAccessKey and awsSecretKey are in use.

    • multiServiceTarget (true/false) — to indicate if there are multiple appliers writing into the single Redshift Target, for example when the source is Tungsten Cluster Composite Active/Active or a Tungsten Replicator Fan-In Topology (Default: false).

    • singleLockTable (true/false) — to indicate the table lock behaviour when multiServiceTarget is true. Ignored if multiServiceTarget is set to false (Default: true)

    • lockTablePrefix — the prefix for the lock tables when singleLockTable is false. (Default: lock_xxx_)

    • s3Binary — the binary to use for loading CSV files up to S3. (Valid Values: s3cmd, s4cmd, aws) (Default: s3cmd)

    • cleanUpS3Files — a boolean value used to identify whether the CSV files loaded into S3 should be deleted after they have been imported and merged. If set to true, the files are automatically deleted once the files have been successfully imported into the Redshift staging tables. If set to false, files are not automatically removed.

    • gzipS3Files — setting to true will result in the CSV files being gzipped prior to loading into S3 (Default: false)

    • storeCDCIn — a definition table that stores the change data from the load, in addition to importing to staging and base tables. The {schema} and {table} variables will be automatically replaced with the corresponding schema and table name. For more information on keeping CDC information, see Section 4.2.5, “Keeping CDC Information”.
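
    For example, a hypothetical configuration for one of several appliers writing into a shared Redshift target, using an IAM Role, compressed uploads, and per-table lock tables, might look like the following (all values are placeholders to be adapted to your environment):

    {
      "awsS3Path" : "s3://your-bucket-for-redshift/redshift-test",
      "awsIAMRole" : "arn:iam-role",
      "cleanUpS3Files" : "true",
      "gzipS3Files" : "true",
      "s3Binary" : "aws",
      "multiServiceTarget" : "true",
      "singleLockTable" : "false",
      "lockTablePrefix" : "lock_nyc_"
    }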

4.2.2.2. Configuring Identity Access Management within AWS

Identity Management with AWS is complex, but a useful and secure way of restricting services interacting with each other, and of restricting user access to the AWS platform.

Tungsten Replicator for Redshift requires a certain level of interaction between the replicator and S3, and between Redshift and S3.

Note

All versions up to and including Tungsten Replicator version 6.0 can utilise IAM Roles for uploading the CSV files to S3; however, for loading the data from S3 into Redshift, the only option is to use Access and Secret Keys.

Tungsten Replicator version 6.1 onwards will also allow for the use of IAM Roles for loading data from S3 into Redshift.

To use IAM Roles with Tungsten Replicator you will need to create two roles, with the following recommended policies:

To allow CSV files to be loaded up to S3:

  • Role should be associated with the AWS Service: EC2

  • AWS Defined Policy Name: AmazonS3FullAccess, or

  • Define and create your own policy, with, at minimum, the ability to write to the bucket you intend to use for the Redshift Applier

  • Associate this role to the EC2 instance running the Tungsten Replicator software

For use by Redshift COPY command to load csv into staging tables:

  • Role should be associated with the AWS Service: Redshift

  • AWS Defined Policy Name: AmazonS3FullAccess, or

  • Define and create your own policy, with, at minimum, the ability to read from the bucket you intend to use for the Redshift Applier (a sample policy sketch is shown after this list)

  • Associate this role to the Redshift Cluster.
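
As an illustration, a minimal custom policy for the upload role might look like the following sketch (the bucket name tungsten-csv is a placeholder; for the Redshift COPY role, read-only actions such as s3:GetObject and s3:ListBucket are typically sufficient):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::tungsten-csv",
        "arn:aws:s3:::tungsten-csv/*"
      ]
    }
  ]
}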

Note

For more details and full instructions on creating and managing IAM roles, review the AWS documentation.

4.2.2.3. Amazon Redshift DDL Generation for Amazon Redshift Deployments

In order for the data to be written into the Redshift tables, the tables must be generated. Tungsten Replicator does not replicate DDL statements between the source and applier in heterogeneous deployments due to differences in the format of the DDL statements. The supplied ddlscan tool can translate the DDL from the source database into suitable DDL for the target database.

For each database being replicated, DDL must be generated twice, once for the staging tables where the change data is loaded, and again for the live tables. To generate the necessary DDL:

  1. To generate the staging table DDL, ddlscan must be executed on the Extractor host. After the replicator has been installed, ddlscan can automatically pick up the configuration to connect to the host, or the connection details can be specified on the command line:

    On the source host, for each database that is being replicated, run ddlscan using the ddl-mysql-redshift-staging.vm template:

    shell> ddlscan -db test -template ddl-mysql-redshift-staging.vm
    DROP TABLE stage_xxx_test.stage_xxx_msg;
    CREATE TABLE stage_xxx_test.stage_xxx_msg
    (
      tungsten_opcode CHAR(2),
      tungsten_seqno INT,
      tungsten_row_id INT,
      tungsten_commit_timestamp TIMESTAMP,
      id INT,
      msg CHAR(80),
      PRIMARY KEY (tungsten_opcode, tungsten_seqno, tungsten_row_id)
    );

    Check the output to ensure that no errors have been generated during the process. These may indicate datatype limitations that should be identified before continuing. The generated output should be captured and then executed on the Redshift host to create the table.

  2. Once the staging tables have been created, execute ddlscan again using the base table template, ddl-mysql-redshift.vm:

    shell> ddlscan -db test -template ddl-mysql-redshift.vm
    DROP TABLE test.msg;
    CREATE TABLE test.msg
    (
      id INT,
      msg CHAR(80),
      PRIMARY KEY (id)
    );

    Once again, check the output for errors, then capture the output and execute the generated DDL against the Redshift instance.

The DDL templates translate datatypes as directly as possible, with the following caveats (an illustrative example follows the list):

  • The length of MySQL VARCHAR columns is quadrupled, because MySQL counts characters, while Redshift counts bytes.

  • There is no TIME datatype in Redshift; instead, TIME columns are converted to VARCHAR(17).

  • Primary keys from MySQL are applied into Redshift where possible.
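
As a hypothetical illustration of these caveats, a MySQL table defined as:

mysql> CREATE TABLE test.sample (id INT PRIMARY KEY, name VARCHAR(10), t TIME);

would be translated into Redshift DDL similar to the following (a sketch based on the rules above; the exact output of ddlscan may differ):

CREATE TABLE test.sample
(
  id INT,
  name VARCHAR(40),
  t VARCHAR(17),
  PRIMARY KEY (id)
);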

Once the generated DDL has been applied within the Redshift instance, the replicator will be ready to be installed.

4.2.2.4. Handling Concurrent Writes from Multiple Appliers

The features outlined in this section were specifically introduced in Tungsten Replicator 6.1.4.

Redshift only supports a SERIALIZABLE transaction isolation level. This differs from relational databases like MySQL, where the default is REPEATABLE READ. Isolation levels determine the behaviour of the database for concurrent access to the tables within transactions.

When loading data into Redshift from multiple appliers, this isolation level can cause locking issues that manifest as errors in the Replicator log similar to the following:

Detail: Serializable isolation violation on table - 150379, transactions forming the cycle are: 2356786, 2356787 
» (pid:17914) (../../tungsten-replicator//appliers/batch/redshift.js#219)

In some cases, the replicator will simply retry and carry on successfully, but on very busy systems this can sometimes cause the replicator to fall back into an OFFLINE:ERROR state, requiring manual intervention.

To overcome this problem, the first step is to ensure that each applier has its own set of staging tables into which the CSV files are loaded. By default, all staging tables are named with the prefix stage_xxx_.

To generate the staging tables, you would typically use a ddlscan command similar to the following:

shell> ddlscan -user tungsten -pass secret -url jdbc:mysql:thin://db01:3306/ 
  » -db hr -template ddl-mysql-redshift-staging.vm > staging.sql

To change the default prefix of the staging table, for example, to stage_nyc_ you can provide the option to the ddlscan command as follows:

shell> ddlscan -user tungsten -pass secret -url jdbc:mysql:thin://db01:3306/ 
  » -db hr -template ddl-mysql-redshift-staging.vm -opt tablePrefix stage_nyc_ > staging.sql

You will need to execute this for each applier, changing the prefix accordingly. Once this has been executed and the tables have been built in Redshift, you will then need to add an additional property to each applier to indicate which staging tables to use. The property should be added to the tungsten.ini file and a tpm update issued:

property=replicator.applier.dbms.stageTablePrefix=stage_nyc_
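
For example, the relevant section of a hypothetical applier's tungsten.ini might then read:

[alpha]
...Existing Replicator Config...
property=replicator.applier.dbms.stageTablePrefix=stage_nyc_

shell> tpm update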

4.2.2.4.1. Increase load rates

The first and easiest step to try to overcome the isolation errors would be to increase the batch commit levels and the batch commit interval. Each system works differently, so there is no simple calculation to find the right level. These values should be adjusted in small increments to find the right balance for your system.

Within your configuration, adjust the following two parameters (an illustrative sketch follows this list):

  • svc-block-commit-size

  • svc-block-commit-interval
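
For example, using the option spelling from the installation examples earlier in this chapter, starting values might be set as follows and then adjusted incrementally (the values shown are illustrative only, not recommendations):

[alpha]
...Existing Replicator Config...
svc-applier-block-commit-size=500000
svc-applier-block-commit-interval=60s

shell> tpm update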

4.2.2.4.2. Enable Transaction Locking

Within the Redshift applier, it is possible to introduce table locking. This enables multiple appliers to process their own THL and load the transactions without impacting, or being impacted by, other appliers.

This configuration should only be used when multiple appliers are in use. However, it must also be recognised that the addition of table locking could introduce latency in applying to Redshift on extremely busy systems, and it could also block client applications from reading the tables due to Redshift's isolation level. To mitigate this, table locking should be combined with an increase in the block commit size and block commit interval properties mentioned above.

There are two table locking approaches; your environment will determine which approach is better for you.

  • Single Lock Table: This approach should be used for appliers in extremely busy systems where a block-commit-size of 500000 or greater does not eliminate isolation errors and where multiple tables are updated within each transaction.

  • One Lock Table per Base Table: This approach should be used for appliers in less busy systems, or where parallel apply has been enabled within the applier, regardless of system activity levels.

To enable the single lock table approach:

  • The following option should be added to the s3-config-servicename.json file:

    "multiServiceTarget": "true"

  • Connect to Redshift with the same account used by the applier, and using the DDL below, create the lock table:

    CREATE TABLE public.tungsten_lock_table
    (
      ID  INT
    );

To enable the lock table per base table approach:

  • The following option should be added to the s3-config-servicename.json file:

    "multiServiceTarget": "true",
    "singleLockTable": "false"

  • Create a lock table for each of the base tables within Redshift. A ddlscan template can be used to generate the ddl. In the following example the ddlscan command is generating lock table ddl for all tables within the hr schema:

    shell> ddlscan -user tungsten -pass secret -url jdbc:mysql:thin://db01:3306/ 
      » -db hr -template ddl-mysql-redshift-lock.vm > outfile.sql

    Execute the output from ddlscan against Redshift.

After enabling either of the above methods, if replication has already been installed, you simply need to restart the replicator by issuing the following:

shell> replicator restart

4.2.3. Install Amazon Redshift Applier

Replication into Redshift requires two separate replicator installations, one that extracts information from the source database, and a second that generates the CSV files, loads those files into S3 and then executes the statements on the Redshift database to import the CSV data and apply the transformations to build the final tables.

The two replication services can operate on the same machine (see Section 5.3, “Deploying Multiple Replicators on a Single Host”), or they can be installed on two different machines.

Once you have completed the configuration of the Amazon Redshift database, you can configure and install the applier using the steps below.

  1. Before installing the applier, the following addition must be made to the extractor configuration.

    Add the following to /etc/tungsten/tungsten.ini and issue a tpm update:

    [alpha]
    ...Existing Replicator Config...
    enable-heterogeneous-service=true
    
    shell> tpm update

    Note

    The above step is only applicable for standalone extractors. If you are configuring replications from an existing Tungsten Cluster (Cluster-Extractor), follow the steps outlined here to ensure the cluster is configured correctly: Section 3.4.1, “Prepare: Replicating Data Out of a Cluster”

  2. The applier can now be configured. Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Configure the installation using tpm:

    Both the staging (command-line) method and the INI file method are shown below; use whichever matches your installation type.

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --topology=master-slave \
        --master=sourcehost \
        --members=localhost \
        --datasource-type=redshift \
        --replication-host=redshift.us-east-1.redshift.amazonaws.com \
        --replication-user=awsRedshiftUser \
        --replication-password=awsRedshiftPass \
        --redshift-dbname=dev \
        --batch-enabled=true \
        --batch-load-template=redshift \
        --svc-applier-filters=dropstatementdata \
        --svc-applier-block-commit-interval=30s \
        --svc-applier-block-commit-size=250000
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    topology=master-slave
    master=sourcehost
    members=localhost
    datasource-type=redshift
    replication-host=redshift.us-east-1.redshift.amazonaws.com
    replication-user=awsRedshiftUser
    replication-password=awsRedshiftPass
    redshift-dbname=dev
    batch-enabled=true
    batch-load-template=redshift
    svc-applier-filters=dropstatementdata
    svc-applier-block-commit-interval=30s
    svc-applier-block-commit-size=250000
    

    Configuration group defaults

    The description of each of the options is shown below:

    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --user=tungsten

      user=tungsten

      System User

    • --install-directory=/opt/continuent

      install-directory=/opt/continuent

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.

    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.


  5. If your MySQL source is a Tungsten Cluster, ensure the additional steps below are also included in your applier configuration.

    First, prepare the required filter configuration file as follows on the Redshift applier host(s) only:

    shell> mkdir -p /opt/continuent/share/
    shell> cp tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/continuent/share/

    Then, include the following parameters in the configuration:

    property=replicator.stage.remote-to-thl.filters=convertstringfrommysql
    property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/continuent/share/convertstringfrommysql.json
    
  6. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

On the host that is loading data into Redshift, create the s3-config-servicename.json file and then copy that file into the share directory within the installed directory on that host. For example:

shell> cp s3-config-servicename.json /opt/continuent/share/

Now the services can be started:

shell> replicator start

Once the service is configured and running, the service can be monitored as normal using the trepctl command. See Section 4.2.6, “Management and Monitoring of Amazon Redshift Deployments” for more information.

4.2.4. Verifying your Redshift Installation

  1. Create a database within your source MySQL instance:

    mysql> CREATE DATABASE redtest;
  2. Create a table within your source MySQL instance:

    mysql> CREATE TABLE redtest.msg (id INT PRIMARY KEY AUTO_INCREMENT,msg CHAR(80));
  3. Create a schema for the tables:

    redshift> CREATE SCHEMA redtest;
  4. Create a staging table within your Redshift instance:

    redshift> CREATE TABLE redtest.stage_xxx_msg (tungsten_opcode CHAR(2), \
        tungsten_seqno INT, tungsten_row_id INT,tungsten_date CHAR(30),id INT,msg CHAR(80));
  5. Create the target table:

    redshift> CREATE TABLE redtest.msg (id INT,msg CHAR(80));
  6. Insert some data within your MySQL source instance:

    mysql> INSERT INTO redtest.msg VALUES (0,'First');
    Query OK, 1 row affected (0.04 sec)
    
    mysql> INSERT INTO redtest.msg VALUES (0,'Second');
    Query OK, 1 row affected (0.04 sec)
    
    mysql> INSERT INTO redtest.msg VALUES (0,'Third');
    Query OK, 1 row affected (0.04 sec)
    
    mysql> UPDATE redtest.msg SET msg = 'This is the first update of the second row' WHERE ID = 2;
  7. Check the replicator status on the applier (host2):

    shell> trepctl status

    There should be 5 transactions replicated.

  8. Check the table within Redshift:

    redshift> SELECT * FROM redtest.msg;
    1	First
    3	Third
    2	This is the first update of the second row

4.2.5. Keeping CDC Information

The Redshift applier can keep the CDC data, that is, the raw CDC CSV data that is recorded and replicated during the loading process, rather than simply cleaning up the CDC files and deleting them. The CDC data can be useful if you want to be able to monitor data changes over time.

The process works as follows:

  1. Batch applier generates CSV files.

  2. Batch applier loads the CSV data into the staging tables.

  3. Batch applier loads the CSV data into the CDC tables.

  4. Staging data is merged with the base table data.

  5. Staging data is deleted.

Unlike the staging and base table information, the data in the CDC tables is kept forever, without removing any of the processed information. Using this data you can report on change information over time for different data sets, or even recreate datasets at a specific time by using the change information.
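
As a hypothetical illustration, once the CDC tables have been populated, a query such as the following could report the change history for a single row (the table name follows the cdc_{schema}.{table}_cdc template used in the steps below, and the columns match the staging table format):

redshift> SELECT tungsten_opcode, tungsten_seqno, tungsten_commit_timestamp, id, msg
          FROM cdc_test.msg_cdc
          WHERE id = 3
          ORDER BY tungsten_seqno;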

To enable this feature:

  1. When creating the DDL for the staging and base tables, also create the table information for the CDC data for each table. The actual format of the information is the same as the staging table data, and can be created using ddlscan:

    shell> ddlscan -service my_red -db test \
        -template ddl-mysql-redshift-staging.vm \
        -opt renameSchema cdc_{schema} -opt renameTable {table}_cdc
  2. In the configuration file, s3-config-svc.json for each service, specify the name of the table to be used when storing the CDC information using the storeCDCIn field. This should specify the table template to be used, with the schema and table name being automatically replaced by the load script. The structure should match the structure used by ddlscan to define the CDC tables:

    {
      "awsS3Path" : "s3://your-bucket-for-redshift/redshift-test",
      "awsAccessKey" : "access-key-id",
      "awsSecretKey" : "secret-access-key",
      "storeCDCIn" : "cdc_{schema}.{table}_cdc"
    }
  3. Restart the replicator using replicator restart to update the configuration.

4.2.6. Management and Monitoring of Amazon Redshift Deployments

Monitoring an Amazon Redshift replication scenario requires checking the status of both the Extractor - extracting data from MySQL - and the Applier, which retrieves the remote THL information and applies it to Amazon Redshift.

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000006:0000000000002857;-1
appliedLastSeqno       : 15
appliedLatency         : 1.918
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000006:0000000000002857
currentTimeMillis      : 1407336195165
dataServerHost         : redshift1
extensions             : 
host                   : redshift1
latestEpochNumber      : 8
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://redshift1:2112/
maximumStoredSeqNo     : 15
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://redshift1:3306/tungsten_alpha
relativeLatency        : 35.164
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : redshift1
state                  : ONLINE
timeInStateSeconds     : 34.807
transitioningTo        : 
uptimeSeconds          : 36.493
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

On the Applier, the output of trepctl shows the current sequence number and applier status:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000006:0000000000002857;-1
appliedLastSeqno       : 15
appliedLatency         : 154.748
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : alpha
currentEventId         : NONE
currentTimeMillis      : 1407336316454
dataServerHost         : redshift.us-east-1.redshift.amazonaws.com
extensions             : 
host                   : redshift.us-east-1.redshift.amazonaws.com
latestEpochNumber      : 8
masterConnectUri       : thl://redshift1:2112/
masterListenUri        : null
maximumStoredSeqNo     : 15
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://redshift1:2112/
relativeLatency        : 156.454
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : redshift.us-east-1.redshift.amazonaws.com
state                  : ONLINE
timeInStateSeconds     : 2.28
transitioningTo        : 
uptimeSeconds          : 524104.751
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

The appliedLastSeqno should match as normal. Because of the batching of transactions, the appliedLatency may be much higher than in normal MySQL to MySQL replication.

The batch loading parameters controlling the batching of data can be tuned and updated by studying the output from the trepsvc.log log file. The log will show a line containing the number of rows updated:

INFO  scripting.JavascriptExecutor COUNT: 4

See Section 11.1, “Block Commit” for more information on these parameters.

4.3. Deploying the Vertica Applier

Hewlett-Packard's Vertica provides support for BigData, SQL-based analysis and processing. Integration with MySQL enables data to be replicated live from the MySQL database directly into Vertica without the need to manually export and import the data.

Replication to Vertica operates as follows:

  • Data is extracted from the source database into THL.

  • When extracting the data from the THL, the Vertica replicator writes the data into CSV files according to the name of the source tables. The files contain all of the row-based data, including the global transaction ID generated by Tungsten Replicator during replication, and the operation type (insert, delete, etc) as part of the CSV data.

  • The CSV data is then loaded into Vertica into staging tables.

  • SQL statements are then executed to update the live version of the tables using the batch-loaded CSV information, deleting old rows and inserting the new data, so that updates work effectively within the confines of Vertica operation.

Figure 4.4. Topologies: Replicating to Vertica

Topologies: Replicating to Vertica

Setting up replication requires setting up both the Extractor and Applier components as two different configurations, one for MySQL and the other for Vertica. Replication also requires some additional steps to ensure that the Vertica host is ready to accept the replicated data that has been extracted. Tungsten Replicator provides all the tools required to perform these operations during the installation and setup.

4.3.1. Preparing for Vertica Deployments

Preparing the hosts for the replication process requires setting some key configuration parameters within the MySQL server to ensure that data is stored and written correctly. On the Vertica side, the database and schema must be created using the existing schema definition so that the databases and tables exist within Vertica.

Source Host

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites, then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

Vertica Host

On the Vertica host, you need to perform some preparation of the destination database, first creating the database, and then creating the tables that are to be replicated.

  • Create a database (if you want to use a different one than those already configured), and a schema that will contain the Tungsten data about the current replication position:

    shell> vsql -Udbadmin -wsecret bigdata
    Welcome to vsql, the Vertica Analytic Database v5.1.1-0 interactive terminal.
    
    Type:  \h for help with SQL commands
           \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    bigdata=> create schema tungsten_alpha;

    The schema will be used only by Tungsten Replicator to store metadata about the replication process.

  • Locate the Vertica JDBC driver. This can be downloaded separately from the Vertica website. The driver will need to be copied into the Tungsten Replicator lib directory.

    shell> cp vertica-jdbc-7.1.2-0.jar tungsten-replicator-6.1.24-6/tungsten-replicator/lib/
  • You need to create tables within Vertica according to the databases and tables that need to be replicated; the tables are not automatically created for you. From a Tungsten Replicator deployment directory, the ddlscan command can be used to identify the existing tables, and create table definitions for use within Vertica.

    To use ddlscan, the template for Vertica must be specified, along with the user/password information to connect to the source database to collect the schema definitions. The tool should be run from the templates directory.

    The tool will need to be executed twice; the first time generates the live table definitions:

    shell> cd tungsten-replicator-6.1.24-6
    shell> cd tungsten-replicator/samples/extensions/velocity/
    shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
        -template ddl-mysql-vertica.vm -db access_log
    /*
    SQL generated on Fri Sep 06 14:37:40 BST 2013 by ./ddlscan utility of Tungsten
    
    url = jdbc:mysql:thin://host1:13306/access_log
    user = tungsten
    dbName = access_log
    */
    CREATE SCHEMA access_log;
    
    DROP TABLE access_log.access_log;
    
    CREATE TABLE access_log.access_log
    (
      id INT ,
      userid INT ,
      datetime INT ,
      session CHAR(30) ,
      operation CHAR(80) ,
      opdata CHAR(80)  ) ORDER BY id;
    ...

    The output should be redirected to a file and then used to create tables within Vertica:

    shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
        -template ddl-mysql-vertica.vm -db access_log >access_log.ddl

    The output of the command should be checked to ensure that the table definitions are correct.

    The file can then be applied to Vertica:

    shell> cat access_log.ddl | vsql -Udbadmin -wsecret bigdata

    This generates the table definitions for live data. The process should be repeated to create the table definitions for the staging data by using the staging template:

    shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
        -template ddl-mysql-vertica-staging.vm -db access_log >access_log.ddl-staging

    Then applied to Vertica:

    shell> cat access_log.ddl-staging | vsql -Udbadmin -wsecret bigdata

    The process should be repeated for each database that will be replicated.

Once the preparation of the MySQL and Vertica databases is complete, you can proceed to installing Tungsten Replicator.

4.3.2. Install Vertica Applier

  1. Before installing the applier, the following addition must be made to the extractor configuration.

    Add the following to /etc/tungsten/tungsten.ini and issue a tpm update:

    [alpha]
    ...Existing Replicator Config...
    enable-heterogeneous-service=true
    
    shell> tpm update

    Note

    The above step is only applicable for standalone extractors. If you are configuring replications from an existing Tungsten Cluster (Cluster-Extractor), follow the steps outlined here to ensure the cluster is configured correctly: Section 3.4.1, “Prepare: Replicating Data Out of a Cluster”

  2. The applier can now be configured.

    Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Locate the Vertica JDBC driver. This can be downloaded separately from the Vertica website. The driver will need to be copied into the Tungsten Replicator lib directory.

    shell> cp vertica-jdbc-7.1.2-0.jar tungsten-replicator-6.1.24-6/tungsten-replicator/lib/
  5. Configure the installation using tpm:

    Both the staging (command-line) method and the INI file method are shown below; use whichever matches your installation type.

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --skip-validation-check=HostsFileCheck \
        --skip-validation-check=InstallerMasterSlaveCheck \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --topology=master-slave \
        --master=sourcehost \
        --members=localhost \
        --datasource-type=vertica \
        --replication-user=dbadmin \
        --replication-password=password \
        --vertica-dbname=dev \
        --batch-enabled=true \
        --batch-load-template=vertica6 \
        --batch-load-language=js \
        --replication-port=5433 \
        --svc-applier-filters=dropstatementdata \
        --svc-applier-block-commit-interval=30s \
        --svc-applier-block-commit-size=25000 \
        --disable-relay-logs=true
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    skip-validation-check=HostsFileCheck
    skip-validation-check=InstallerMasterSlaveCheck
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    topology=master-slave
    master=sourcehost
    members=localhost
    datasource-type=vertica
    replication-user=dbadmin
    replication-password=password
    vertica-dbname=dev
    batch-enabled=true
    batch-load-template=vertica6
    batch-load-language=js
    replication-port=5433
    svc-applier-filters=dropstatementdata
    svc-applier-block-commit-interval=30s
    svc-applier-block-commit-size=25000
    disable-relay-logs=true
    


  6. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If you encounter problems during the installation, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the service is configured and running, the service can be monitored as normal using the trepctl command. See Section 4.3.3, “Management and Monitoring of Vertica Deployments” for more information.

4.3.3. Management and Monitoring of Vertica Deployments

Monitoring a Vertica replication scenario requires checking the status of both the Extractor - extracting data from MySQL - and the Applier, which retrieves the remote THL information and applies it to Vertica.

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000012:0000000128889042;0
appliedLastSeqno       : 1070
appliedLatency         : 22.537
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000012:0000000128889042
currentTimeMillis      : 1378489888477
dataServerHost         : mysqldb01
extensions             :
latestEpochNumber      : 897
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://mysqldb01:2112/
maximumStoredSeqNo     : 1070
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://mysqldb01:13306/
relativeLatency        : 691980.477
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : mysqldb01
state                  : ONLINE
timeInStateSeconds     : 694039.058
transitioningTo        :
uptimeSeconds          : 694041.81
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

On the Applier, the output of trepctl shows the current sequence number and applier status:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000012:0000000128889042;0
appliedLastSeqno       : 1070
appliedLatency         : 78.302
channels               : 1
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1378479271609
dataServerHost         : vertica01
extensions             :
latestEpochNumber      : 897
masterConnectUri       : thl://mysqldb01:2112/
masterListenUri        : null
maximumStoredSeqNo     : 1070
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://mysqldb01:2112/
relativeLatency        : 681363.609
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : vertica01
state                  : ONLINE
timeInStateSeconds     : 681486.806
transitioningTo        :
uptimeSeconds          : 689922.693
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

The appliedLastSeqno should match as normal. Because of the batching of transactions, the appliedLatency may be much higher than in normal MySQL to MySQL replication.

4.3.4. Troubleshooting Vertica Installations

The following items detail some of the more common problems with replication through to Vertica. Often the underlying issue is related to the data types, the data format, or the number of columns.

  • If the following is reported by the replicator:

    pendingError           : Replicator unable to go online due to error »
      Operation failed: Online operation failed (Unable to prepare plugin: class »
      name=com.continuent.tungsten.replicator.datasource.DataSourceService »
      message=[Unable to load driver: com.vertica.jdbc.Driver])
    state                  : OFFLINE:ERROR

    The Vertica JDBC driver is missing from the installation. The Vertica JDBC JAR file must have been placed into the tungsten-replicator/lib directory within the release directory before running tpm update or tpm install.

  • The following error:

    pendingExceptionMessage: Invalid write to CSV file: name=/opt/continuent/tmp/staging/alpha/staging0/test-msg-1.csv »
      table=test.msg table_columns=schemaname,schemahash csv_columns=tungsten_opcode,tungsten_seqno, »
      tungsten_row_id,tungsten_commit_timestamp,nullschemaname,schemahash

    Indicates that the source THL has not been marked up correctly. Either the colnames filter has not been enabled, or the --enable-batch-service option was not configured during installation. This means that the source THL is not being populated with the right information; either the full list of columns, or the column names and primary key information, is incorrect. The configuration should be updated (see the sketch below), and the THL on both the Extractor and Applier should then be recreated by using trepctl reset.
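
    For example, on a standalone extractor, a configuration along the following lines would ensure the THL is marked up correctly (a sketch; enabling enable-heterogeneous-service, as shown in Section 4.3.2, “Install Vertica Applier”, configures the equivalent behaviour implicitly):

    [alpha]
    ...Existing Replicator Config...
    svc-extractor-filters=colnames,pkey

    shell> tpm update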

  • If you get an error similar to the following:

    pendingExceptionMessage: CSV loading failed: schema=test table=msg CSV »
     file=/opt/continuent/tmp/staging/alpha/staging0/test-msg-1.csv »
     message=com.continuent.tungsten.replicator.ReplicatorException: Incoming table data »
     has no primary keys: test.msg »
     (/opt/continuent/tungsten/tungsten-replicator/appliers/batch/vertica6.js#70)

    Either the pkey filter has not been enabled, or the source tables on the source database do not contain primary keys. This means that the source THL is not being populated with the primary key information from the table, which is required in order to load into Vertica through the batch mechanism. The configuration should be updated, and the THL on both the Extractor and Applier should then be recreated by using trepctl reset.

  • The following error indicates that the incoming data could not be loaded into the staging table within Vertica:

    pendingError  : Stage task failed: q-to-dbms
    pendingExceptionMessage: CSV loading failed: schema=blog table=article CSV »
      file=/tmp/staging/alpha/staging0/blog-article-432.csv »
      message=com.continuent.tungsten.replicator.ReplicatorException:
      LOAD DATA ROW count does not match: sql=COPY blog.stage_xxx_article »
      FROM '/tmp/staging/alpha/staging0/blog-article-432.csv' »
      DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"' »
      expected_copy_rows=3614 rows=2233 ; exceptions are in »
      /tmp/tungsten_vertica_blog.article.exceptions »
      (../../tungsten-replicator//samples/scripts/batch/vertica6.js#67)

    There are a number of possible reasons for this. The actual reasons can be found in the exceptions file which is generated; the error message contains the location. In this example, /tmp/tungsten_vertica_blog.article.exceptions. Possible reasons include:

    • Mismatch in the number of columns in the source file and the target table. Check the source and target tables match, including the four special fields used in all staging tables.

    • Mismatch in the data types of one or more of the columns in the target table. Check that the source and target table definitions match, or at least support the corresponding data; for example, that the column size, length, or format is correct. Loading character data into numeric columns, or floating point values into integer columns, for example, is not supported.

    • Badly formatted CSV file. This happens when the incoming data contains newlines, commas, or other data that is incompatible with the CSV format. The CSV file should have been kept; the location is also in the error message. Examine the file and check the format. You may need to enable filters to modify and 'clean' the data so that it is more compatible with the CSV format.

  • Remember that changes to the DDL within the source database are not automatically replicated to Vertica. Changes to the table definitions, additional tables, or additional databases, must all be updated manually within Vertica.

  • If you get errors similar to:

    stage_xxx_access_log does not exist

    When loading into Vertica, it means that the staging tables have not been created correctly. Check the steps for creating the staging tables using ddlscan in Section 4.3.1, “Preparing for Vertica Deployments”.

  • Replication may fail if date types contain zero values, which are legal in MySQL. For example, the timestamp 0000-00-00 00:00:00 is valid in MySQL. An error reporting a mismatch in the values will be reported when applying the data into Vertica, for example:

    ERROR 2631:  Column "time" is of type timestamp but expression is of type int
    HINT:  You will need to rewrite or cast the expression

    Or:

    ERROR 2992:  Date/time field value out of range: "0"
    HINT:  Perhaps you need a different "datestyle" setting

    To address this error, use the zerodate2null filter, which translates zero-value dates into a valid NULL value. This can be enabled by adding the zerodate2null filter to the applier stage when configuring the service using tpm:

    shell> ./tools/tpm update alpha --repl-svc-applier-filters=zerodate2null
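
    For INI-based installations, the equivalent change would be a sketch along these lines: extend the applier filter list in tungsten.ini, retaining any filters already configured for the service, and issue a tpm update:

    [alpha]
    ...Existing Replicator Config...
    svc-applier-filters=dropstatementdata,zerodate2null

    shell> tpm update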

4.4. Deploying the Kafka Applier

Kafka is a highly scalable messaging platform that provides a method for distributing information through a series of messages organised by a specified topic. With Tungsten Replicator the incoming stream of data from the upstream replicator is converted, on a row by row basis, into a JSON document that contains the row information. A new message is created for each row, even from multiple-row transactions.

The deployment of Tungsten Replicator to a Kafka service is slightly different. There are two parts to the process:

  • Service Alpha on the Extractor, extracts the information from the MySQL binary log into THL.

  • Service Alpha on the Applier, reads the information from the remote replicator as THL, and applies that to Kafka.

Figure 4.5. Topologies: Replicating to Kafka

Topologies: Replicating to Kafka

With the Kafka applier, information is extracted from the source database using the row format; column names and primary keys are identified, translated to a JSON format, and then embedded into a larger Kafka message. The topic used is either composed from the schema name or can be configured to use an explicit topic type, and the generated information included in the Kafka message can include the source schema, table, and commit time information.

The transfer operates as follows:

  1. Data is extracted from MySQL using the standard extractor, reading the row change data from the binlog.

  2. The Section 10.4.5, “ColumnName Filter” filter is used to extract column name information from the database. This enables the row-change information to be tagged with the corresponding column information. The data changes, and corresponding row names, are stored in the THL.

    The Section 10.4.31, “PrimaryKey Filter” filter is used to add primary key information to row-based replication data.

  3. The THL information is then applied to Kafka using the Kafka applier.

There are some additional considerations when applying to Kafka that should be taken into account:

  • Because Kafka is a message queue and not a database, traditional transactional semantics are not supported. This means that although the data will be applied to Kafka as a message, there is no guarantee of transactional consistency. By default the applier will ensure that the message has been correctly received by the Kafka service; it is the responsibility of the Kafka environment and configuration to ensure delivery. The replicator.applier.dbms.requireacks property can be used to control how many acknowledgements must be received from the Kafka service.

  • One message is sent for each row of source information in each transaction. For example, if 20 rows have been inserted or updated in a single transaction, then 20 separate Kafka messages will be generated.

  • A separate message is broadcast for each operation, and includes the operation type. A single message will be broadcast for each row for each operation. So if 20 rows are deleted, 20 messages are generated, each with the operation type.

  • If replication fails in the middle of a large transaction, and the replicator goes OFFLINE, then when the replicator goes back ONLINE it may resend rows and messages.

The two replication services can operate on the same machine, (See Section 5.3, “Deploying Multiple Replicators on a Single Host”) or they can be installed on two different machines.

4.4.1. Preparing for Kafka Replication

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

In general, it is easiest to understand that a row within the MySQL table is converted into a single message on the Kafka side; the topic used is made up of the schema name and table name, and the message ID is composed of the primary key information, but can optionally be configured to include the schema name, table name and primary key information.

For example, the following row within MySQL:

mysql> select * from msg where id = 99999 \G
*************************** 1. row ***************************
 id: 99999
 msg: Hello Kafka
 1 row in set (0.00 sec)

Is replicated into Kafka as a Kafka message using the topic test_msg:

{
   "_seqno" : "4865",
   "_source_table" : "msg",
   "_committime" : "2017-07-13 15:30:37.0",
   "_source_schema" : "test",
   "record" : {
      "msg" : "Hello Kafka",
      "id" : "2384726"
   },
   "_optype" : "INSERT"
}

In the output, the record contains the actual record data; the other fields in the message are:

  • _seqno — the THL sequence number of the transaction.

  • _source_table — the source table. Inclusion of this information is optional.

  • _committime — the original transaction commit time. Inclusion of this information is optional.

  • _source_schema — the source schema. Inclusion of this information is optional.

  • _optype — the operation type (INSERT, UPDATE, DELETE).

When preparing the hosts you must be aware of this translation of the different structures, as it will have an effect on the way the information is replicated from MySQL to Kafka.

MySQL Host

The data replicated from MySQL can be any data, although there are some known limitations and assumptions made on the way the information is transferred.

When configuring the extractor database and host, ensure the heterogeneous-specific prerequisites have been included; see Section B.4.4, “MySQL Configuration for Heterogeneous Deployments”.

For the best results when replicating, be aware of the following issues and limitations:

  • Use primary keys on all tables. The use of primary keys will improve the lookup of information within Kafka when rows are updated. Without a primary key on a table a full table scan is performed, which can affect performance.

  • MySQL TEXT columns are correctly replicated, but cannot be used as keys.

  • MySQL BLOB columns are converted to text using the configured character type. Depending on the data that is being stored within the BLOB, the data may need to be custom converted. A filter can be written to convert and reformat the content as required.

Kafka Host

On the Kafka side, status information is stored into the Zookeeper instance used for configuring Kafka, and the Zookeeper and Kafka instances must be up and running before the replicator is first started. There are no specific configuration elements required on the Kafka host.
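
Before starting the replicator for the first time, it can be useful to confirm that both services are reachable. As a minimal sketch, the kafka-topics.sh tool shipped with the Kafka distribution can be used to list the topics registered in Zookeeper (the script path and the Zookeeper host/port are assumptions based on a default installation):

shell> kafka-topics.sh --list --zookeeper localhost:2181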

4.4.2. Install Kafka Applier

Installation of the Kafka replication requires special configuration of the Extractor and Applier hosts so that each is configured for the correct datasource type.

  1. Before installing the applier, the following addition needs to be made to the extractor configuration; apply it to the extractor before installing the applier.

    Add the following to /etc/tungsten/tungsten.ini:

    [alpha]
    ...Existing Replicator Config...
    enable-heterogeneous-service=true
    
    shell> tpm update

    Note

    The above step is only applicable for standalone extractors. If you are configuring replications from an existing Tungsten Cluster (Cluster-Extractor), follow the steps outlined here to ensure the cluster is configured correctly: Section 3.4.1, “Prepare: Replicating Data Out of a Cluster”

  2. Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Configure the installation using tpm:

    For staging installs:

    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=sourcehost \
        --members=localhost \
        --datasource-type=kafka \
        --replication-user=root \
        --replication-password=null \
        --replication-port=9092 \
        --property=replicator.applier.dbms.zookeeperString=localhost:2181 \
        --property=replicator.applier.dbms.requireacks=1
    
    For INI installs:

    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=sourcehost
    members=localhost
    datasource-type=kafka
    replication-user=root
    replication-password=null
    replication-port=9092
    property=replicator.applier.dbms.zookeeperString=localhost:2181
    property=replicator.applier.dbms.requireacks=1
    

    Configuration group defaults

    The description of each of the options is shown below:

    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --install-directory=/opt/continuent

      install-directory=/opt/continuent

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.

    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.


  5. If your MySQL source is a Tungsten Cluster, ensure the additional steps below are also included in your applier configuration.

    First, prepare the required filter configuration file as follows on the Kafka applier host(s) only:

    shell> mkdir -p /opt/continuent/share/
    shell> cp tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/continuent/share/

    Then, include the following parameters in the configuration:

    property=replicator.stage.remote-to-thl.filters=convertstringfrommysql
    property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/continuent/share/convertstringfrommysql.json
    
  6. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If you encounter problems during the installation, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the service is configured and running, the service can be monitored as normal using the trepctl command. See Section 4.4.3, “Management and Monitoring of Kafka Deployments” for more information.

4.4.2.1. Optional Configuration Parameters for Kafka

A number of optional, configurable, properties are available that control how Tungsten Replicator applies and populates information when the data is written into Kafka. The following properties can be set during configuration using --property=PROPERTYNAME=value:

Table 4.1. Optional Kafka Applier Properties

Option                                        Description
replicator.applier.dbms.embedCommitTime       Sets whether the commit time for the source row is embedded into the document
replicator.applier.dbms.embedSchemaTable      Embed the source schema name and table name in the stored document
replicator.applier.dbms.enabletxinfo.kafka    Embeds transaction information (generated by the rowaddtxninfo filter) into each Kafka message
replicator.applier.dbms.enabletxninfoTopic    Embeds transaction information into a separate Kafka message broadcast on an independent channel from the one used by the actual database data. One message is sent per transaction or THL event.
replicator.applier.dbms.keyFormat             Determines the format of the message ID
replicator.applier.dbms.requireacks           Defines how many acknowledgements from Kafka nodes are required when writing messages to the Kafka cluster
replicator.applier.dbms.retrycount            The number of retries for sending each message
replicator.applier.dbms.txninfoTopic          Sets the topic name for transaction messages
replicator.applier.dbms.zookeeperString       Connection string for Zookeeper, including hostname and port

replicator.applier.dbms.embedCommitTime

Option          replicator.applier.dbms.embedCommitTime
Description     Sets whether the commit time for the source row is embedded into the document
Value Type      boolean
Default         true
Valid Values    false   Do not embed the source database commit time
                true    Embed the source database commit time into the stored document

Embeds the commit time of the source database row into the document information:

{
   "_seqno" : "4865",
   "_source_table" : "msg",
   "_committime" : "2017-07-13 15:30:37.0",
   "_source_schema" : "test",
   "record" : {
      "msg" : "Hello Kafka",
      "id" : "99999"
   },
   "_optype" : "INSERT"
}

replicator.applier.dbms.embedSchemaTable

Option          replicator.applier.dbms.embedSchemaTable
Description     Embed the source schema name and table name in the stored document
Value Type      boolean
Default         true
Valid Values    false   Do not embed the schema or table name in the document
                true    Embed the source schema name and table name into the stored document

If enabled, the document stored into Kafka will include the source schema and table name. This can be used to identify the source of the information if the schema and table name is not being used for the topic name.

{
   "_seqno" : "4865",
   "_source_table" : "msg",
   "_committime" : "2017-07-13 15:30:37.0",
   "_source_schema" : "test",
   "record" : {
      "msg" : "Hello Kafka",
      "id" : "99999"
   },
   "_optype" : "INSERT"
}

replicator.applier.dbms.enabletxinfo.kafka

Option          replicator.applier.dbms.enabletxinfo.kafka
Description     Embeds transaction information (generated by the rowaddtxninfo filter) into each Kafka message
Value Type      boolean
Default         false
Valid Values    false   Do not include transaction information in each message
                true    Embed transaction information into each Kafka message

Embeds information about the entire transaction, using the data provided by the rowaddtxninfo filter and other information embedded in each THL event, into each message sent. The transaction information includes the row counts, event ID and tables modified within the entire transaction. Since one message is normally sent for each row of data, adding the information about the full transaction to each message makes it possible to validate and identify which other messages may be part of a single transaction when the messages are being re-assembled by a Kafka client.

For example, when looking at a single message in Kafka, the message includes a txninfo section:

{
   "_source_table" : "msg",
   "_committime" : "2018-03-07 12:53:21.0",
   "record" : {
      "msg2" : "txinfo",
      "id" : "109",
      "msg" : "txinfo"
   },
   "_optype" : "INSERT",
   "_seqno" : "164",
   "txnInfo" : {
      "schema" : [
         {
            "schemaName" : "msg",
            "rowCount" : "1",
            "tableName" : "msg"
         },
         {
            "rowCount" : "2",
            "schemaName" : "msg",
            "tableName" : "msgsub"
         }
      ],
      "serviceName" : "alpha",
      "totalCount" : "3",
      "tungstenTransId" : "164",
      "firstRecordInTransaction" : "true"
   },
   "_source_schema" : "msg"
}

This block of the overall message includes the following objects and information:

  • schema

    An array of the row counts within this transaction, with a row count included for each schema and table.

  • serviceName

    The name of the Tungsten Replicator service that generated the message.

  • totalCount

    The total number of rows modified within the entire transaction.

  • firstRecordInTransaction

    If this field exists, it should always be set to true and indicates that this message was generated by the first row inserted, updated or deleted in the overall transaction. This effectively indicates the start of the overall transaction.

  • lastRecordInTransaction

    If this field exists, it should always be set to true and indicates that this message was generated by the last row inserted, updated or deleted in the overall transaction. This effectively indicates the end of the overall transaction.

Note that this information block is included in every message for each row within an overall transaction. The firstRecordInTransaction and lastRecordInTransaction can be used to identify the start and end of the transaction overall.
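
As a minimal sketch, the option can be enabled at configuration time in the same way as the other applier properties shown in the installation example (INI style):

property=replicator.applier.dbms.enabletxinfo.kafka=true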

replicator.applier.dbms.enabletxninfoTopic

Option          replicator.applier.dbms.enabletxninfoTopic
Description     Embeds transaction information into a separate Kafka message broadcast on an independent channel from the one used by the actual database data. One message is sent per transaction or THL event.
Value Type      boolean
Default         false
Valid Values    false   Do not generate transaction information
                true    Send transaction information on a separate Kafka topic for each transaction

If enabled, a separate message is sent on a Kafka topic containing information about the entire transaction. The topic name can be configured by setting the replicator.applier.dbms.txninfoTopic property.

The default message sent will look like the following example:

{
   "txnInfo" : {
      "tungstenTransId" : "164",
      "schema" : [
         {
            "schemaName" : "msg",
            "rowCount" : "1",
            "tableName" : "msg"
         },
         {
            "schemaName" : "msg",
            "rowCount" : "2",
            "tableName" : "msgsub"
         }
      ],
      "totalCount" : "3",
      "serviceName" : "alpha"
   }
}

This block of the overall message includes the following objects and information:

  • schema

    An array of the row counts within this transaction, with a row count included for each schema and table.

  • serviceName

    The name of the Tungsten Replicator service that generated the message.

  • totalCount

    The total number of rows modified within the entire transaction.
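
As a sketch, the separate transaction topic can be enabled, and optionally named, using the same property mechanism used during installation (INI style; the topic name shown is the documented default):

property=replicator.applier.dbms.enabletxninfoTopic=true
property=replicator.applier.dbms.txninfoTopic=tungsten_transactions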

replicator.applier.dbms.keyFormat

Option          replicator.applier.dbms.keyFormat
Description     Determines the format of the message ID
Value Type      string
Default         pkey
Valid Values    pkey       Combine the primary key column values into a single string
                pkeyus     Combine the primary key column values into a single string joined by an underscore character
                tspkey     Combine the schema name, table name, and primary key column values into a single string
                tspkeyus   Combine the schema name, table name, and primary key column values into a single string joined by an underscore character

Determines the format of the message ID used when sending the message into Kafka. For example, when configured to use tspkeyus, the format of the message ID will consist of the schema name, table name and primary key column information separated by underscores, for example SCHEMANAME_TABLENAME_234.
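
For example, to switch the message ID to the tspkeyus format, the property can be set during configuration (INI style, a minimal sketch):

property=replicator.applier.dbms.keyFormat=tspkeyus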

replicator.applier.dbms.requireacks

Option          replicator.applier.dbms.requireacks
Description     Defines how many acknowledgements from Kafka nodes are required when writing messages to the Kafka cluster
Value Type      string
Default         all
Valid Values    1       Only the lead host should acknowledge receipt of the message
                all     All nodes should acknowledge receipt of the message

Sets the acknowledgement counter for sending messages into the Kafka queue.
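
For example, to require acknowledgement from the lead host only, which trades delivery assurance for throughput (INI style, a minimal sketch):

property=replicator.applier.dbms.requireacks=1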

replicator.applier.dbms.retrycount

Option          replicator.applier.dbms.retrycount
Description     The number of retries for sending each message
Value Type      number
Default         0

Determines the number of times the replicator will attempt to send the message before failing.

replicator.applier.dbms.txninfoTopic

Option          replicator.applier.dbms.txninfoTopic
Description     Sets the topic name for transaction messages
Value Type      string
Default         tungsten_transactions

Sets the topic name to be used when sending independent transaction information messages about each THL event. See replicator.applier.dbms.enabletxninfoTopic.

replicator.applier.dbms.zookeeperString

Option          replicator.applier.dbms.zookeeperString
Description     Connection string for Zookeeper, including hostname and port
Value Type      string
Default         ${replicator.global.db.host}:2181

The string to be used when connecting to Zookeeper. The default is to use port 2181 on the host used by replicator.global.db.host.
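
For example, if Zookeeper runs on a different host from the replicator, the connection string can be overridden at configuration time (INI style; zkhost is a placeholder for your Zookeeper server):

property=replicator.applier.dbms.zookeeperString=zkhost:2181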

4.4.3. Management and Monitoring of Kafka Deployments

Once the extractor and applier have been installed, services can be monitored using the trepctl command.

For example, to monitor the extractor status:

shell> trepctl status
appliedLastEventId     : mysql-bin.000009:0000000000002298;2340
appliedLastSeqno       : 10
appliedLatency         : 0.788
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000009:0000000000002298
currentTimeMillis      : 1498687871560
dataServerHost         : mysqlhost
extensions             :
host                   : mysqlhost
latestEpochNumber      : 0
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://mysqlhost:2112/
maximumStoredSeqNo     : 10
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : /var/lib/mysql
relativeLatency        : 99185.56
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : east
serviceType            : local
simpleServiceName      : east
siteName               : default
sourceId               : mysqlhost
state                  : ONLINE
timeInStateSeconds     : 101347.786
timezone               : GMT
transitioningTo        :
uptimeSeconds          : 101358.88
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

The replicator service operates just the same as a standard extractor service of a typical MySQL replication service.

The Kafka applier service can be accessed either remotely from the extractor:

shell> trepctl -host kafka status
...

Or locally on the Kafka host:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000008:0000000000412301;0
appliedLastSeqno       : 1296
appliedLatency         : 10.253
channels               : 1
clusterName            : alpha
currentEventId         : NONE
currentTimeMillis      : 1377098139212
dataServerHost         : kafka
extensions             :
latestEpochNumber      : 1286
masterConnectUri       : thl://host1:2112/
masterListenUri        : null
maximumStoredSeqNo     : 1296
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://mysqlhost:2112/
relativeLatency        : 771.212
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : kafka
state                  : ONLINE
timeInStateSeconds     : 177783.343
transitioningTo        :
uptimeSeconds          : 180631.276
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

Monitoring the status of replication between the source and target is also the same. The appliedLastSeqno still indicates the sequence number that has been applied to Kafka, and the originating MySQL event ID can still be identified from appliedLastEventId.

Sequence numbers between the two hosts should match, as in a source/target deployment, but due to the method used to replicate, the applied latency may be higher.

To check for information within Kafka, use a suitable tool, such as the kafka-console-consumer.sh command-line client:

shell> kafka-console-consumer.sh --topic test_msg --zookeeper localhost:2181

The output should be checked to ensure that information is being correctly replicated. If strings are shown as a hex value, for example:

"title" : "[B@7084a5c"

It probably indicates that UTF8 and/or --mysql-use-bytes-for-string=false options were not used during installation. If you are reading from a cluster this is expected behavior, and you should enable the convertstringfrommysql filter as shown in the installation examples. In pure replicator scenarios, ensure that the --mysql-use-bytes-for-string=false setting is enabled, or that you are using --enable-heterogeneous-service.
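
For example, for a standalone replicator the setting can be applied to an installed service using tpm update, following the same pattern as the earlier filter example (a sketch; adjust the service name to match your deployment):

shell> ./tools/tpm update alpha --mysql-use-bytes-for-string=false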

4.5. Deploying the MongoDB Applier

Deployment of replication to a MongoDB service is slightly different to other appliers; there are two parts to the process:

  • Service Alpha on the Extractor, extracts the information from the MySQL binary log into THL.

  • Service Alpha on the Applier reads the information from the remote replicator as THL, and applies that to MongoDB.

Figure 4.6. Topologies: Replicating to MongoDB


Basic reformatting and restructuring of the data is performed by translating the structure extracted from one database in row format and restructuring it for application in a different format. A filter, the ColumnNameFilter, is used to add the column names to the extracted row-based information.

With the MongoDB applier, information is extracted from the source database using the row-format, column names and primary keys are identified, and translated to the BSON (Binary JSON) format supported by MongoDB. The fields in the source row are converted to the key/value pairs within the generated BSON.

The transfer operates as follows:

  1. Data is extracted from MySQL using the standard extractor, reading the row change data from the binlog.

  2. The Section 10.4.5, “ColumnName Filter” filter is used to extract column name information from the database. This enables the row-change information to be tagged with the corresponding column information. The data changes, and corresponding row names, are stored in the THL.

  3. The THL information is then applied to MongoDB using the MongoDB applier.

The two replication services can operate on the same machine, (See Section 5.3, “Deploying Multiple Replicators on a Single Host”) or they can be installed on two different machines.

4.5.1. MongoDB Atlas Replication

The MongoDB applier can also be used to apply into a MongoDB Atlas instance.

The configuration for MongoDB Atlas is slightly different and follows a typical offboard applier process, similar in style to applying to Amazon Aurora Instances.

Specific installation steps for MongoDB Atlas are outlined in Section 4.5.4, “Install MongoDB Atlas Applier”.

4.5.2. Preparing for MongoDB Replication

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

During the replication process, data is exchanged from the MySQL database/table/row structure into corresponding MongoDB structures, as follows:

MySQL      MongoDB
Database   Database
Table      Collection
Row        Document

In general, it is easier to understand that a row within the MySQL table is converted into a single document on the MongoDB side, and automatically added to a collection matching the table name.

For example, the following row within MySQL:

mysql> select * from recipe where recipeid = 1085 \G
*************************** 1. row ***************************
  recipeid: 1085
     title: Creamy egg and leek special
  subtitle:
  servings: 4
    active: 1
     parid: 0
    userid: 0
    rating: 0.0
 cumrating: 0.0
createdate: 0
1 row in set (0.00 sec)

Is replicated into the MongoDB document:

{
    "_id" : ObjectId("5212233584ae46ce07e427c3"),
    "recipeid" : "1085",
    "title" : "Creamy egg and leek special",
    "subtitle" : "",
    "servings" : "4",
    "active" : "1",
    "parid" : "0",
    "userid" : "0",
    "rating" : "0.0",
    "cumrating" : "0.0",
    "createdate" : "0"
}

When preparing the hosts you must be aware of this translation of the different structures, as it will have an effect on the way the information is replicated from MySQL to MongoDB.

MySQL Host

The data replicated from MySQL can be any data, although there are some known limitations and assumptions made on the way the information is transferred.

When configuring the extractor database and host, ensure the heterogeneous-specific prerequisites have been included; see Section B.4.4, “MySQL Configuration for Heterogeneous Deployments”.

For the best results when replicating, be aware of the following issues and limitations:

  • Use primary keys on all tables. The use of primary keys will improve the lookup of information within MongoDB when rows are updated. Without a primary key on a table a full table scan is performed, which can affect performance.

  • MySQL TEXT columns are correctly replicated, but cannot be used as keys.

  • MySQL BLOB columns are converted to text using the configured character type. Depending on the data that is being stored within the BLOB, the data may need to be custom converted. A filter can be written to convert and reformat the content as required.

MongoDB Host

  • Enable networking; by default MongoDB is configured to listen only on the localhost (127.0.0.1) IP address. The address should be changed to the IP address of your host, or 0.0.0.0, which indicates all interfaces on the current host.

  • Ensure that network port 27017, or the port you want to use for MongoDB, is configured as the listening port.
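
As a minimal sketch, the corresponding settings in the MongoDB configuration file (typically /etc/mongod.conf; the bind address shown is an example that listens on all interfaces) would be:

net:
  bindIp: 0.0.0.0
  port: 27017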

4.5.3. Install MongoDB Applier

Note

The steps in this section relate specifically to applying to a standard MongoDB Instance. For configuring the applier to work with MongoDB Atlas, please refer to the following section: Section 4.5.4, “Install MongoDB Atlas Applier”

Installation of the MongoDB replication requires special configuration of the Source and Target hosts so that each is configured for the correct datasource type.

To configure the Applier replicators:

  1. Before installing the applier, the following addition needs to be made to the extractor configuration; apply it to the extractor before installing the applier.

    Add the following to /etc/tungsten/tungsten.ini:

    [alpha]
    ...Existing Replicator Config...
    enable-heterogeneous-service=true
    
    shell> tpm update

    Note

    The above step is only applicable for standalone extractors. If you are configuring replications from an existing Tungsten Cluster (Cluster-Extractor), follow the steps outlined here to ensure the cluster is configured correctly: Section 3.4.1, “Prepare: Replicating Data Out of a Cluster”

  2. Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Configure the installation using tpm:

    For staging installs:

    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=sourcehost \
        --members=localhost \
        --datasource-type=mongodb \
        --replication-user=tungsten \
        --replication-password=secret \
        --svc-applier-filters=dropstatementdata \
        --role=slave \
        --replication-port=27017
    
    For INI installs:

    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=sourcehost
    members=localhost
    datasource-type=mongodb
    replication-user=tungsten
    replication-password=secret
    svc-applier-filters=dropstatementdata
    role=slave
    replication-port=27017
    

    Configuration group defaults

    The description of each of the options is shown below:

    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --install-directory=/opt/continuent

      install-directory=/opt/continuent

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.

    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.


  5. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the replicators have started, the status of the service can be checked using trepctl. See Section 4.5.5, “Management and Monitoring of MongoDB Deployments” for more information.

4.5.4. Install MongoDB Atlas Applier

Note

The steps in this section relate specifically to applying to a MongoDB Atlas Instance. For configuring the applier to work with standard MongoDB, please refer to the following section: Section 4.5.3, “Install MongoDB Applier”

Installation of the MongoDB replication requires special configuration of the Source and Target hosts so that each is configured for the correct datasource type.

To configure the Applier replicators:

  1. Before installing the applier, the following addition needs to be made to the extractor configuration. Apply the following parameters on the extractor host, update the extractor using the details below, and then install the applier.

    • For Staging installs:

      shell> cd tungsten-replicator-6.1.24-6
      shell> ./tools/tpm configure alpha \
      --enable-heterogeneous-master=true
      shell> ./tools/tpm update
    • For INI installs: Add the following to /etc/tungsten/tungsten.ini

      [alpha]
      ...Existing Replicator Config...
      enable-heterogeneous-master=true
      
      shell> tpm update
  2. Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Configure the installation using tpm:

    For staging installs:

    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --disable-security-controls=false \
        --rmi-ssl=false \
        --thl-ssl=false \
        --rmi-authentication=false \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=sourcehost \
        --members=localhost \
        --datasource-type=mongodb \
        --replication-user=tungsten \
        --replication-password=secret \
        --svc-applier-filters=dropstatementdata \
        --role=slave \
        --replication-host=atlasendpoint.mongodb.net \
        --replication-port=27017 \
        --property='replicator.applier.dbms.connectString=mongodb+srv://${replicator.global.db.user}:${replicator.global.db.password}@${replicator.global.db.host}/?retryWrites=true&w=majority'
    
    For INI installs:

    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    disable-security-controls=false
    rmi-ssl=false
    thl-ssl=false
    rmi-authentication=false
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=sourcehost
    members=localhost
    datasource-type=mongodb
    replication-user=tungsten
    replication-password=secret
    svc-applier-filters=dropstatementdata
    role=slave
    replication-host=atlasendpoint.mongodb.net
    replication-port=27017
    property=replicator.applier.dbms.connectString=mongodb+srv://${replicator.global.db.user}:${replicator.global.db.password}@${replicator.global.db.host}/?retryWrites=true&w=majority
    


  5. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Important

The above example assumes SSL is not enabled between the extractor and applier replicators.

If SSL is required, then you must omit the following properties from the example configs displayed above, or change the values to true: rmi-ssl=false, thl-ssl=false, rmi-authentication=false

Once you have installed the replicator, there are a few more steps required to allow the replicator to be able to authenticate with MongoDB Atlas.

4.5.4.1. Import MongoDB Atlas Certificates

MongoDB Atlas requires TLS connections for all Atlas Clusters; therefore, we need to configure the replicator to recognise this.

Note

From May 1, 2021, MongoDB Atlas has moved to new TLS Certificates using ISRG instead of IdenTrust for their root Certificate Authority.

All new clusters created after this time, or any existing clusters that have since been migrated to this new root CA, will need to follow the correct procedure to configure the replicator. Both procedures are below; follow the one that relates to your configuration.

For MongoDB Atlas Clusters created PRIOR to May 1, 2021, or that have not yet migrated to the new LetsEncrypt root Certificate:

  1. Using the correct Atlas Endpoint, issue the following command to retrieve the Atlas certificates

    shell> openssl s_client -showcerts -connect atlas-endpoint.mongodb.net:27017
  2. The output may be quite long and will include at least two certificates bound by the header/footer as follows

    -----BEGIN CERTIFICATE-----
    xxxx
    xxxx
    -----END CERTIFICATE-----

    Copy each certificate, including the header/footer, into individual files
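
    If preferred, the certificates can be captured in a single step; a minimal sketch using standard openssl and awk (the endpoint is a placeholder), after which the output file can be split into one file per certificate:

    shell> openssl s_client -showcerts -connect atlas-endpoint.mongodb.net:27017 </dev/null 2>/dev/null | \
        awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/' > atlas-certs.pem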

  3. Using keytool, we now need to load each certificate into the truststore that was created during the replicator installation. Repeat the example below for each certificate, ensuring you use a unique alias name for each certificate.

    shell> keytool -import -alias your-alias1 -file cert1.cer -keystore /opt/continuent/share/tungsten_truststore.ts

    When prompted, the default password for the truststore will be tungsten unless you specified a different password during installation
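
    To verify that the certificates have been imported, the contents of the truststore can be listed with keytool (a sketch; you will be prompted for the truststore password):

    shell> keytool -list -keystore /opt/continuent/share/tungsten_truststore.ts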

  4. Once this is complete, you can now start the replicator

    shell> replicator start

For MongoDB Atlas Clusters created AFTER May 1, 2021, or that have been migrated to the new LetsEncrypt root Certificate:

  1. Obtain the LetsEncrypt root Certificate (ISRG Root X1) from the LetsEncrypt website (https://letsencrypt.org/certificates/)

  2. Copy the certificate into a file called letsencrypt.pem in the home directory of the applier host, including the BEGIN and END header/footer, for example:

    -----BEGIN CERTIFICATE-----
    xxxx
    xxxx
    -----END CERTIFICATE-----
  3. Using keytool, we now need to import this certificate into the truststore that was created during the replicator installation.

    shell> keytool -import -alias letsencrypt -file letsencrypt.pem -keystore /opt/continuent/share/tungsten_truststore.ts

    When prompted, the default password for the truststore will be tungsten unless you specified a different password during installation

  4. Once this is complete, you can now start the replicator

    shell> replicator start

Once the replicators have started, the status of the service can be checked using trepctl. See Section 4.5.5, “Management and Monitoring of MongoDB Deployments” for more information.

4.5.5. Management and Monitoring of MongoDB Deployments

Once the two services — extractor and applier — have been installed, the services can be monitored using trepctl. To monitor the extractor service:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000008:0000000000412301;0
appliedLastSeqno       : 1296
appliedLatency         : 1.889
channels               : 1
clusterName            : epsilon
currentEventId         : mysql-bin.000008:0000000000412301
currentTimeMillis      : 1377097812795
dataServerHost         : host1
extensions             : 
latestEpochNumber      : 1286
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://host2:2112/
maximumStoredSeqNo     : 1296
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host1:13306/
relativeLatency        : 177444.795
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 177443.948
transitioningTo        : 
uptimeSeconds          : 177461.483
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

The replicator service operates just the same as a standard Extractor service of a typical MySQL replication service.

The MongoDB applier service can be accessed either remotely from the Extractor:

shell> trepctl -host host2 status
...

Or locally on the MongoDB host:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000008:0000000000412301;0
appliedLastSeqno       : 1296
appliedLatency         : 10.253
channels               : 1
clusterName            : alpha
currentEventId         : NONE
currentTimeMillis      : 1377098139212
dataServerHost         : host2
extensions             : 
latestEpochNumber      : 1286
masterConnectUri       : thl://host1:2112/
masterListenUri        : null
maximumStoredSeqNo     : 1296
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://host1:2112/
relativeLatency        : 177771.212
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host2
state                  : ONLINE
timeInStateSeconds     : 177783.343
transitioningTo        : 
uptimeSeconds          : 180631.276
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

Monitoring the status of replication between the Source and Target is also the same. The appliedLastSeqno still indicates the sequence number that has been applied to MongoDB, and the originating MySQL event ID can still be identified from appliedLastEventId.

Sequence numbers between the two hosts should match, as in a Primary/Replica deployment, but due to the method used to replicate, the applied latency may be higher. Tables that do not use primary keys, or large individual row updates may cause increased latency differences.

To check for information within MongoDB, use the mongo command-line client:

shell> mongo
MongoDB shell version: 2.2.4
connecting to: test
> use cheffy;
switched to db cheffy

The show collections command will indicate the tables from MySQL that have been replicated to MongoDB:

> show collections
access_log
audit_trail
blog_post_record
helpdb
ingredient_recipes
ingredient_recipes_bytext
ingredients
ingredients_alt
ingredients_keywords
ingredients_matches
ingredients_measures
ingredients_plurals
ingredients_search_class
ingredients_search_class_map
ingredients_shop_class
ingredients_xlate
ingredients_xlate_class
keyword_class
keywords
measure_plurals
measure_trans
metadata
nut_fooddesc
nut_foodgrp
nut_footnote
nut_measure
nut_nutdata
nut_nutrdef
nut_rda
nut_rda_class
nut_source
nut_translate
nut_weight
recipe
recipe_coll_ids
recipe_coll_search
recipe_collections
recipe_comments
recipe_pics
recipebase
recipeingred
recipekeywords
recipemeta
recipemethod
recipenutrition
search_translate
system.indexes
terms

Collection counts should match the row count of the source tables:

> db.recipe.count()
2909

The db.collection.find() command can be used to list the documents within a given collection.

> db.recipe.find()
{ "_id" : ObjectId("5212233584ae46ce07e427c3"), 
"recipeid" : "1085", 
"title" : "Creamy egg and leek special", 
"subtitle" : "", 
"servings" : "4", 
"active" : "1", 
"parid" : "0", 
"userid" : "0", 
"rating" : "0.0", 
"cumrating" : "0.0", 
"createdate" : "0" }
{ "_id" : ObjectId("5212233584ae46ce07e427c4"),
 "recipeid" : "87",
 "title" : "Chakchouka",
 "subtitle" : "A traditional Arabian and North African dish and often accompanied with slices of cooked meat",
 "servings" : "4",
 "active" : "1",
 "parid" : "0",
 "userid" : "0",
 "rating" : "0.0",
 "cumrating" : "0.0",
 "createdate" : "0" }    
 ...

The output should be checked to ensure that information is being correctly replicated. If strings are shown as a hex value, for example:

"title" : "[B@7084a5c"

It probably indicates that UTF8 and/or --mysql-use-bytes-for-string=false options were not used during installation. The configuration can be updated using tpm to address this issue.

4.6. Deploying the Hadoop Applier

Replicating data into Hadoop is achieved by generating character-separated values from ROW-based information that is applied directly to the Hadoop HDFS using a batch loading process. Files are written directly to the HDFS using the Hadoop client libraries. A separate process is then used to merge the existing data with the change information extracted from the Source database.

Deployment of the Hadoop replication is similar to other heterogeneous installations; two separate installations are created:

  • Service Alpha on the extractor, extracts the information from the MySQL binary log into THL.

  • Service Alpha on the applier, reads the information from the remote replicator as THL, applying it to Hadoop. The applier works in two stages: batch loading the change data into staging files within HDFS, and then merging it with any existing data to build materialized views (see Section 4.6.1, “Hadoop Replication Operation”).

Figure 4.7. Topologies: Replicating to Hadoop


Basic requirements for replication into Hadoop:

  • Hadoop Replication is supported on the following Hadoop distributions and releases:

    • Cloudera Enterprise 4.4, Cloudera Enterprise 5.0 (Certified) up to Cloudera Enterprise 5.8

    • HortonWorks DataPlatform 2.0

    • Amazon Elastic MapReduce

    • IBM InfoSphere BigInsights 2.1 and 3.0

    • MapR 3.0, 3.1, and 5.x

    • Pivotal HD 2.0

    • Apache Hadoop 2.1.0, 2.2.0

  • Source tables must have primary keys. Without a primary key, Tungsten Replicator is unable to determine the row to be updated when the data reaches Hadoop.

4.6.1. Hadoop Replication Operation

The Hadoop applier makes use of the JavaScript based batch loading system (see Section 5.6.4, “JavaScript Batchloader Scripts”). This constructs change data from the source database, and uses this information, in combination with any existing data, to build a materialized view using Hive. A summary of this basic structure can be seen in Figure 4.8, “Topologies: Hadoop Replication Operation”.

Figure 4.8. Topologies: Hadoop Replication Operation


The full replication of information operates as follows:

  1. Data is extracted from the source database using the standard extractor, for example by reading the row change data from the binlog in MySQL.

  2. The colnames filter is used to extract column name information from the database. This enables the row-change information to be tagged with the corresponding column information. The data changes, and corresponding row names, are stored in the THL.

    The pkey filter is used to extract primary key data from the source tables.

  3. On the applier replicator, the THL data is read and written into batch-files in the character-separated value format.

    The information in these files is change data, and contains not only the original data, but also metadata about the operation performed (i.e. INSERT, DELETE or UPDATE), and the primary key for each table. All UPDATE statements are recorded as a DELETE of the existing data, and an INSERT of the new data.

  4. A second process uses the CSV stage data and any existing data, to build a materialized view that mirrors the source table data structure.

The staging files created by the replicator are in a specific format that incorporates change and operation information in addition to the original row data.

  • The format of the files is a character separated values file, with each row separated by a newline, and individual fields separated by the character 0x01. This is supported by Hive as a native value separator.

  • The content of the file consists of the full row data extracted from the source, plus metadata describing the operation for each row, the sequence number, and then the full row information.

Field                        Description
Operation                    I (Insert) or D (Delete)
Sequence No                  SEQNO that generated this row
Unique Row                   Unique row ID within the batch
Commit TimeStamp             The commit timestamp of the original transaction, which can be used for partitioning
Table-specific primary key   The primary key value for the row
Table-column                 The full row data

For example, the MySQL row:

|  3 | #1 Single | 2006 | Cats and Dogs (#1.4)         |

Is represented within the staging files generated as:

I^A1318^A1^A2017-06-07 09:22:28.000^A3^A3^A#1 Single^A2006^ACats and Dogs (#1.4)

The character separator, and whether to use quoting, are configurable within the replicator when it is deployed. The default is to use a newline character for records, and the 0x01 character for fields. For more information on these fields and how they can be configured, see Section 5.6.7, “Supported CSV Formats”.

On the Hadoop host, information is stored into a number of locations within the HDFS during the data transfer:

Table 4.2. Hadoop Replication Directory Locations

Directory/File Description
/user/USERNAME Top-level directory for Tungsten Replicator information, using the configured replication user.
/user/tungsten/metadata Location for metadata related to the replication operation
/user/tungsten/metadata/alpha The directory (named after the servicename of the replicator service) that holds service-specific metadata
/user/tungsten/staging Directory of the data transferred
/user/tungsten/staging/servicename Directory of the data transferred from a specific servicename.
/user/tungsten/staging/servicename/databasename Directory of the data transferred specific to a database.
/user/tungsten/staging/servicename/databasename/tablename Directory of the data transferred specific to a table.
/user/tungsten/staging/servicename/databasename/tablename/tablename-###.csv Filename of a single file of the data transferred for a specific table and database.

Files are automatically created, named according to the parent table name, and the starting Tungsten Replicator sequence number for each file that is transferred. The size of the files is determined by the batch and commit parameters. For example, in the truncated list of files below, displayed using the hadoop fs command:

shell> hadoop fs -ls /user/tungsten/staging/alpha/hadoop/chicago
Found 66 items
-rw-r--r-- 3 cloudera cloudera  1270236 2020-01-13 06:58 /user/tungsten/staging/alpha/hadoop/chicago/chicago-10.csv
-rw-r--r-- 3 cloudera cloudera 10274189 2020-01-13 08:33 /user/tungsten/staging/alpha/hadoop/chicago/chicago-103.csv
-rw-r--r-- 3 cloudera cloudera  1275832 2020-01-13 08:33 /user/tungsten/staging/alpha/hadoop/chicago/chicago-104.csv
-rw-r--r-- 3 cloudera cloudera  1275411 2020-01-13 08:33 /user/tungsten/staging/alpha/hadoop/chicago/chicago-105.csv
-rw-r--r-- 3 cloudera cloudera 10370471 2020-01-13 08:33 /user/tungsten/staging/alpha/hadoop/chicago/chicago-113.csv
-rw-r--r-- 3 cloudera cloudera  1279435 2020-01-13 08:33 /user/tungsten/staging/alpha/hadoop/chicago/chicago-114.csv
-rw-r--r-- 3 cloudera cloudera  2544062 2020-01-13 06:58 /user/tungsten/staging/alpha/hadoop/chicago/chicago-12.csv
-rw-r--r-- 3 cloudera cloudera 11694202 2020-01-13 08:33 /user/tungsten/staging/alpha/hadoop/chicago/chicago-123.csv
-rw-r--r-- 3 cloudera cloudera  1279072 2020-01-13 08:34 /user/tungsten/staging/alpha/hadoop/chicago/chicago-124.csv
-rw-r--r-- 3 cloudera cloudera  2570481 2020-01-13 08:34 /user/tungsten/staging/alpha/hadoop/chicago/chicago-126.csv
-rw-r--r-- 3 cloudera cloudera  9073627 2020-01-13 08:34 /user/tungsten/staging/alpha/hadoop/chicago/chicago-133.csv
-rw-r--r-- 3 cloudera cloudera  1279708 2020-01-13 08:34 /user/tungsten/staging/alpha/hadoop/chicago/chicago-134.csv
...

The individual file numbers will not be sequential, as they will depend on the sequence number, batch size and range of tables transferred.

4.6.2. Preparing for Hadoop Replication

During the replication process, data is exchanged from the MySQL database/table/row structure into corresponding Hadoop directory and files, as shown in the table below:

MySQL      Hadoop
Database   Directory
Table      Hive-compatible Character-Separated Text file
Row        Line in the text file, fields terminated by character 0x01

4.6.2.1. Hadoop Host

The Hadoop environment should have the following features and parameters for the most efficient operation:

  • Disk storage

    There must be enough disk storage for the change data, data being actively merged, and the live data for the replicated information. Depending on the configuration and rate of changes in the Source, the required data space will fluctuate.

    For example, replicating a 10GB dataset, and 5GB of change data during replication, will require at least 30GB of storage: 10GB for the original dataset, 5GB of change data, and 10-25GB of merged data. The exact size is dependent on the quantity of inserts/updates/deletes.

  • Pre-requisites

    Currently, deployment of the target to a relay host is not supported. One host within the Hadoop cluster must be chosen to act as the target.

    The prerequisites for a standard Tungsten Replicator should be followed, as outlined in Appendix B, Prerequisites. This will provide the base environment into which Tungsten Replicator can be installed.

  • HDFS Location

    The /user/tungsten directory must be writable by the replicator user within HDFS:

    shell> hadoop fs -mkdir /user/tungsten
    shell> hadoop fs -chmod 700 /user/tungsten
    shell> hadoop fs -chown tungsten /user/tungsten

    These commands should be executed by a user with HDFS administration rights (e.g. the hdfs user).

  • Replicator User Group Membership

    The user that will be executing the replicator (typically tungsten, as recommended in Appendix B, Prerequisites) must be a member of the hive group on the Hadoop host where the replicator will be installed. Without this membership, the user will be unable to execute Hive queries.
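
    For example, on most Linux distributions the group membership can be added with usermod (a sketch; run with root privileges):

    shell> sudo usermod -aG hive tungsten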

4.6.2.2. Schema Generation

In order to access the generated tables, both staging and the final tables, it is necessary to create a schema definition. The ddlscan tool can be used to read the existing definition of the tables from the source server and generate suitable Hive schema definitions to access the table data.

To create the staging table definition, use the ddl-mysql-hive-0.10.vm template; you must specify the JDBC connection string, user, password and database names. For example:

shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/test' -pass password \
   -template ddl-mysql-hive-0.10.vm -db test
--
-- SQL generated on Wed Jan 29 16:17:05 GMT 2020 by Tungsten ddlscan utility
-- 
-- url = jdbc:mysql:thin://host1:13306/test
-- user = tungsten
-- dbName = test
--
CREATE DATABASE test;

DROP TABLE IF EXISTS test.movies_large;

CREATE TABLE test.movies_large
(
  id INT ,
  title STRING ,
  year INT ,
  episodetitle STRING  )
;

The output from this command should be applied to your Hive installation within the Hadoop cluster. For example, by capturing the output, transferring that file and then running:

shell> cat schema.sql | hive

To create Hive tables that read the staging files loaded by the replicator, use the ddl-mysql-hive-0.10-staging.vm template:

shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host:13306/test' -pass password \
    -template ddl-mysql-hive-0.10-staging.vm -db test

The process creates the schema and tables which match the schema and table names on the source database.

Transfer this file to your Hadoop environment and then create the generated schema:

shell> cat schema-staging.sql | hive

The process creates matching schema names, but table names are modified to include the prefix stage_xxx_. For example, for the table movies_large a staging table named stage_xxx_movies_large is created. The Hive table definition is created pointing to the external file-based tables, using the default 0x01 field separator and 0x0A (newline) record separator. If different values were used for these in the configuration, the schema definition in the captured file from ddlscan should be updated by hand.

The tables should now be available within Hive. For more information on accessing and using the tables, see Section 4.6.4.3, “Accessing Generated Tables in Hive”.

4.6.3. Replicating into Kerberos Secured HDFS

For replicating into HDFS where Kerberos support has been enabled, the hadoop_kerberos.js batch script can be used in place of the normal hadoop.js script.

The script will need modification before it can be used, due to the varying implementations of Kerberos, and to ensure the correct authentication parameters are used.

Before installing, edit the hadoop_kerberos.js file, located within tungsten-replicator/appliers/batch/hadoop-kerberos.js within the installation package. Within that file is the line that is executed before the HDFS operations are called:

var kinit_prefix = "kinit USER/LEVEL@REALM -k -t KEYTAB_FILE;"

Edit this line to set the correct command and/or authentication parameters, such as the username and keytab file. The configured command will be executed immediately before all the commands that operate on the Hadoop filesystem, including creating directories and files.

For example, the variable might be updated to:

var kinit_prefix = "kinit mc/admin@CLOUDERA -k -t mcadmin.keytab;"

When installing, use --batch-load-template=hadoop_kerberos.js to enable the new batch load script.
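For example, within an INI-based install, the relevant line in /etc/tungsten/tungsten.ini would be:

[alpha]
...Existing Replicator Config...
batch-load-template=hadoop_kerberos.js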

4.6.4. Install Hadoop Replication

Installation of the Hadoop replication consists of multiple stages:

  1. Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

  2. Install the Applier replicator which will apply information to the target Hadoop environment.

  3. Once the installation of the Extractor and Applier components has been completed, materialization of tables and views can be performed.

4.6.4.1. Applier Replicator Service

The applier replicator service reads information from the THL of the source and applies this to a local instance of Hadoop.

Important

Installation must take place on a node within the Hadoop cluster. Writing to a remote HDFS filesystem is not currently supported.

  1. Before installing the applier, the following additions need to be made to the extractor configuration. Apply the following parameters, update the extractor, and then install the applier:

    • For Staging Install:

      shell> cd tungsten-replicator-6.1.24-6
      shell> ./tools/tpm configure alpha \
        --enable-batch-service=true
      shell> ./tools/tpm update
    • For INI Installs: Add the following to /etc/tungsten/tungsten.ini:

      
      [alpha]
      ...Existing Replicator Config...
      enable-batch-service=true
      
      
      shell> tpm update
  2. The applier can now be configured.

    Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Configure the installation using tpm:


    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --skip-validation-check=HostsFileCheck \
        --skip-validation-check=InstallerMasterSlaveCheck \
        --skip-validation-check=DatasourceDBPort \
        --skip-validation-check=DirectDatasourceDBPort \
        --skip-validation-check=ReplicationServicePipelines \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=host1 \
        --members=host2 \
        --property=replicator.datasource.global.csvType=hive \
        --property=replicator.stage.q-to-dbms.blockCommitInterval=1s \
        --property=replicator.stage.q-to-dbms.blockCommitRowCount=1000 \
        --replication-password=secret \
        --replication-user=tungsten \
        --batch-enabled=true \
        --batch-load-language=js  \
        --batch-load-template=hadoop \
        --datasource-type=file
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    skip-validation-check=HostsFileCheck
    skip-validation-check=InstallerMasterSlaveCheck
    skip-validation-check=DatasourceDBPort
    skip-validation-check=DirectDatasourceDBPort
    skip-validation-check=ReplicationServicePipelines
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=host1
    members=host2
    property=replicator.datasource.global.csvType=hive
    property=replicator.stage.q-to-dbms.blockCommitInterval=1s
    property=replicator.stage.q-to-dbms.blockCommitRowCount=1000
    replication-password=secret
    replication-user=tungsten
    batch-enabled=true
    batch-load-language=js 
    batch-load-template=hadoop
    datasource-type=file
    


  5. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the service has been installed it can be monitored using the trepctl command. See Section 4.6.4.4, “Management and Monitoring of Hadoop Deployments” for more information. If there are problems during installation, see Section 4.6.4.5, “Troubleshooting Hadoop Replication”.

4.6.4.2. Generating Materialized Views

Added in 6.0.4.  From Tungsten Replicator 6.0.4, the continuent-tools-hadoop scripts are packaged within the main Tungsten Replicator software bundle and can be found within ./tungsten-replicator/support/hadoop-tools.

The continuent-tools-hadoop repository contains a set of tools that allow for the convenient creation of DDL, materialized views, and data comparison on the tables that have been replicated from MySQL.

To use the tools, run the load-reduce-check command from within the hadoop-tools directory. For example:

shell> ./bin/load-reduce-check -s test -Ujdbc:mysql:thin://tr-hadoop2:13306 -udbload -ppassword

The load-reduce-check command performs four distinct steps:

  1. Reads the schema from the MySQL server and creates the staging table DDL within Hive

  2. Reads the schema from the MySQL server and creates the base table DDL within Hive

  3. Executes the materialized view process on the data in each selected staging table to build the base table content.

  4. Performs a data comparison

4.6.4.3. Accessing Generated Tables in Hive

If not already completed, the schema generation process described in Section 4.6.2.2, “Schema Generation” should have been followed. This creates the necessary Hive schema and staging schema definitions.

Once the tables have been created through ddlscan you can query the stage tables:

hive> select * from stage_xxx_movies_large limit 10;
OK
I	10	1	57475	All in the Family	1971	Archie Feels Left Out (#4.17)
I	10	2	57476	All in the Family	1971	Archie Finds a Friend (#6.18)
I	10	3	57477	All in the Family	1971	Archie Gets the Business: Part 1 (#8.1)
I	10	4	57478	All in the Family	1971	Archie Gets the Business: Part 2 (#8.2)
I	10	5	57479	All in the Family	1971	Archie Gives Blood (#1.4)
I	10	6	57480	All in the Family	1971	Archie Goes Too Far (#3.17)
I	10	7	57481	All in the Family	1971	Archie in the Cellar (#4.10)
I	10	8	57482	All in the Family	1971	Archie in the Hospital (#3.15)
I	10	9	57483	All in the Family	1971	Archie in the Lock-Up (#2.3)
I	10	10	57484	All in the Family	1971	Archie Is Branded (#3.20)

4.6.4.4. Management and Monitoring of Hadoop Deployments

Once the two services — extractor and applier — have been installed, the services can be monitored using trepctl. To monitor the Extractor service:

shell>  trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000023:0000000505545003;0
appliedLastSeqno       : 10992
appliedLatency         : 42.764
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000023:0000000505545003
currentTimeMillis      : 1389871897922
dataServerHost         : host1
extensions             : 
host                   : host1
latestEpochNumber      : 0
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://host1:2112/
maximumStoredSeqNo     : 10992
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host1:13306/
relativeLatency        : 158296.922
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 165845.474
transitioningTo        : 
uptimeSeconds          : 165850.047
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

When monitoring, the primary concern beyond identifying and coping with any errors is the applied latency. Larger numbers for applied latency generally indicate that information is not being written out to disk effectively. There are a number of strategies that should be checked:

  • Confirm that the Hadoop environment is running effectively. Any delays to writing to HDFS will impact the replicator.

  • Adjust the block commit parameters. Tuning the block commit levels should find the balance between frequent updates, to achieve the required latency, and generating files of a suitable size so that Hadoop can process them effectively through map/reduce. You should try both increasing and reducing the sizes to determine the correct settings according to your source data.
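As a sketch, the block commit settings shown in the installation example could be adjusted through tpm; the values here are illustrative only and should be tuned against your own workload:

shell> ./tools/tpm configure alpha \
    --property=replicator.stage.q-to-dbms.blockCommitInterval=30s \
    --property=replicator.stage.q-to-dbms.blockCommitRowCount=100000
shell> ./tools/tpm update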

4.6.4.5. Troubleshooting Hadoop Replication

Replicating to Hadoop involves a number of discrete, specific steps. Due to the batch and multi-stage nature of the extract and apply process, replication can stall or stop due to a variety of issues.

4.6.4.5.1. Errors Reading/Writing commitseqno.0 File

During initial installation, or when starting up replication, the replicator may report that the commitseqno.0 file cannot be created or written properly or, during startup, that the file cannot be read.

The following checks and recovery procedures can be tried:

  • Check the permissions of the directory to the commitseqno.0 file, the file itself, and the ownership:

    shell> hadoop fs -ls -R /user/tungsten/metadata
    drwxr-xr-x   - cloudera cloudera          0 2020-01-14 10:40 /user/tungsten/metadata/alpha
    -rw-r--r--   3 cloudera cloudera        251 2020-01-14 10:40 /user/tungsten/metadata/alpha/commitseqno.0
  • Check that the file is writable and is not empty. An empty file may indicate a problem updating the content with the new sequence number.

  • Check the content of the file is correct. The content should be a JSON structure containing the replicator state and position information. For example:

    shell> hadoop fs -cat /user/tungsten/metadata/alpha/commitseqno.0
    {
      "appliedLatency" : "0",
      "epochNumber" : "0",
      "fragno" : "0",
      "shardId" : "dna",
      "seqno" : "8",
      "eventId" : "mysql-bin.000015:0000000000103156;0",
      "extractedTstamp" : "1578998421000"
      "lastFrag" : "true",
      "sourceId" : "host1"
    }
  • Try deleting the commitseqno.0 file and placing the replicator online:

    shell> hadoop fs -rm /user/tungsten/metadata/alpha/commitseqno.0
    shell> trepctl online
4.6.4.5.2. Recovering from Replication Failure

If replication fails, is manually stopped, or the host needs to be restarted, replication should continue from the point where it was stopped. Files that were being written when replication was last running will be overwritten and the information recreated.

Unlike other heterogeneous replication implementations, the Hadoop applier stores the current replication state and restart position in a file within the HDFS of the target Hadoop environment. To recover from failed replication, this file must be deleted so that the THL can be re-read from the Source and the CSV files recreated and applied into HDFS.

  1. On the Applier, put the replicator offline:

    shell> trepctl offline
  2. Remove the THL files from the Applier:

    shell> trepctl reset -thl
  3. Remove the staging CSV files replicated into Hadoop:

    shell> hadoop fs -rm -r /user/tungsten/staging
  4. Reset the restart position:

    shell> rm /opt/continuent/tungsten/tungsten-replicator/data/alpha/commitseqno.0

    Replace alpha and /opt/continuent with the corresponding service name and installation location.

  5. Restart replication on the Applier; this will start to recreate the THL files from the MySQL binary log:

    shell> trepctl online
4.6.4.5.3. Missing Primary Key

Replication may fail at the applier stage if the source data does not contain the correct ROW format and information, including the primary key data. trepctl may report the following error:

...
pendingErrorEventId    : mysql-bin.000015:0000000000143981;0
pendingErrorSeqno      : 10
pendingExceptionMessage: Wrapped com.continuent.tungsten.replicator.ReplicatorException: »
    Unable to find a primary key for dna.alt_allele_attrib and there is no default » 
    from property stagePkeyColumn (../../tungsten-replicator//samples/scripts/batch/hdfs-merge.js#18)
pipelineSource         : UNKNOWN
relativeLatency        : -1.0
...

If the primary key was missing in the source data, the table structure on the source must be updated, and the THL information recreated.

4.7. Deploying the Oracle Applier

Replication Operation Support
Statements Replicated No
Rows Replicated Yes
Schema Replicated No
ddlscan Supported Yes

Tungsten Replicator supports replication to Oracle as a target datasource. This allows replication of data from MySQL to Oracle. See Section B.1.2, “Database Support” for more details.

Figure 4.9. Topologies: Replicating to Oracle


Replication in these configurations operates using two separate replicators:

  • The Replicator on the Extractor extracts the information from the source database into THL.

  • The Replicator on the Applier reads the information from the remote replicator as THL, and applies it to the target database.

4.7.1. Preparing for Oracle Replication

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites followed by the additional prerequisites specific to Oracle Targets outlined in Section 4.7.1.1, “Additional Prerequisites for Oracle Targets” then finally follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

When replicating from MySQL to Oracle there are a number of datatype differences that should be accommodated to ensure reliable replication of the information. The core differences are described in Table 4.3, “Data Type differences when replicating data from MySQL to Oracle”.

Table 4.3. Data Type differences when replicating data from MySQL to Oracle

MySQL Datatype Oracle Datatype Notes
INT NUMBER(10, 0)  
BIGINT NUMBER(19, 0)  
TINYINT NUMBER(3, 0)  
SMALLINT NUMBER(5, 0)  
MEDIUMINT NUMBER(7, 0)  
DECIMAL(x,y) NUMBER(x, y)  
FLOAT FLOAT  
CHAR(n) CHAR(n)  
VARCHAR(n) VARCHAR2(n) For sizes less than 2000 bytes data can be replicated. For lengths larger than 2000 bytes, the data will be truncated when written into Oracle
DATE DATE  
DATETIME DATE  
TIMESTAMP DATE  
TEXT CLOB Replicator can transform TEXT into CLOB or VARCHAR(N). If you choose VARCHAR(N) on Oracle, the length of the data accepted by Oracle will be limited to 4000. This is a limitation of Oracle. The size of CLOB columns within Oracle is calculated in terabytes. If TEXT fields on MySQL are known to be less than 4000 bytes (not characters) long, then VARCHAR(4000) can be used on Oracle. This may be faster than using CLOB.
BLOB BLOB  
ENUM(...) VARCHAR(255) Use the EnumToString filter
SET(...) VARCHAR(255) Use the SetToString filter
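The EnumToString and SetToString filters noted in the table are enabled on the extractor. For example, within an INI-based install, the corresponding lines would be:

[alpha]
...Existing Replicator Config...
mysql-enable-enumtostring=true
mysql-enable-settostring=true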

When replicating to Oracle, the ddlscan command can be used to generate DDL appropriate for the supported data types in the target database. In MySQL to Oracle deployments the DDL can be read from the MySQL server and generated for the Oracle server so that replication can begin without manually creating the Oracle specific DDL.

In addition, the following DDL differences and requirements exist:

  • Column orders on MySQL and Oracle must match, but column names do not have to match.

    Using the dropcolumn filter, columns can be dropped and ignored if required.

  • Each table within MySQL should have a Primary Key. Without a primary key, full-row based lookups are performed on the data when performing UPDATE or DELETE operations. With a primary key, the pkey filter can add metadata to the UPDATE/DELETE event, enabling faster application of events within Oracle (see the sketch after this list).

  • Indexes on MySQL and Oracle do not have to match. This allows for different index types and tuning between the two systems according to application and dataserver performance requirements.

  • Keywords that are restricted on Oracle should not be used within MySQL as table, column or database names. For example, the keyword SESSION is not allowed within Oracle. Tungsten Replicator determines the column name from the target database metadata by position (column reference), not name, so replication will not fail, but applications may need to be adapted. For compatibility, try to avoid Oracle keywords.
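As referenced in the list above, a minimal sketch of enabling the colnames and pkey filters on the extractor, together with the pkey filter properties; the exact filter set depends on your deployment:

shell> ./tools/tpm configure alpha \
    --svc-extractor-filters=colnames,pkey \
    --property=replicator.filter.pkey.addColumnsToDeletes=true \
    --property=replicator.filter.pkey.addPkeyToInserts=true
shell> ./tools/tpm update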

For more information on differences between MySQL and Oracle, see Oracle and MySQL Compared.

To make the process of migration from MySQL to Oracle easier, Tungsten Replicator includes a tool called ddlscan which will read table definitions from MySQL and create appropriate Oracle table definitions to use during replication.

For reference information on the ddlscan tool, see Section 8.6, “The ddlscan Command”.

When replicating to Oracle there are a number of key steps that must be performed. The primary process is the preparation of the Oracle database and the DDL for the database schemas that are being replicated. Although DDL statements will be replicated to Oracle, they will often fail because of SQL language differences. Because of this, tables within Oracle must be created before replication starts.

4.7.1.1. Additional Prerequisites for Oracle Targets

When applying to Oracle there are additional prerequisites required to ensure the replicator can connect to, and apply to, the target database.

For remote Oracle targets (Offboard Applier)

To enable the replicator to apply to a remote Oracle instance, the Replicator host will require an Oracle Client installation, with an appropriate TNS entry configured in the tnsnames.ora file.

In addition, the environment for the tungsten OS user will need to be configured with the ORACLE_HOME and LD_LIBRARY_PATH variables.
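For example, the following lines could be added to the tungsten user's ~/.bash_profile; the Oracle client location shown is a placeholder and should be replaced with your actual installation path:

export ORACLE_HOME=/usr/lib/oracle/12.1/client64
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH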

For remote and local Oracle targets

Before installing you need to ensure that you have the ojdbc7.jar file in the correct location.

This can be copied to either:

  • $ORACLE_HOME/jdbc/lib, or

  • /opt/continuent/software/tungsten-replicator-6.1.24-6/tungsten-replicator/lib

4.7.1.2. Configure the Oracle database

Before installing replication, the Oracle target database must be configured:

  • A user and schema must exist for each database from MySQL that you want to replicate. In addition, the schema used by the services within Tungsten Replicator must have an associated schema and user name.

    For example, if you are replicating the database sales to Oracle, the following statements must be executed to create a suitable schema. This can be performed through any connection, including sqlplus:

    shell> sqlplus sys/oracle as sysdba
    SQL> CREATE USER sales IDENTIFIED BY password DEFAULT TABLESPACE DEMO QUOTA UNLIMITED ON DEMO;

    The above assumes a suitable tablespace has been created (DEMO in this case).

  • A schema must also be created for each service replicating into Oracle. For example, if the service is called alpha, then the tungsten_alpha schema/user must be created. The same command can be used:

    SQL> CREATE USER tungsten_alpha IDENTIFIED BY password DEFAULT TABLESPACE DEMO QUOTA UNLIMITED ON DEMO;
  • One of the users used above must be configured so that it has the rights to connect to Oracle, and has sufficient rights to execute statements on any schema:

    SQL> GRANT CONNECT TO tungsten_alpha;
    SQL> GRANT DBA TO tungsten_alpha;

    The user/password combination selected will be required when configuring the Applier replication service.

4.7.1.3. Create the Destination Schema

On the host which has been already configured as the Extractor, use ddlscan to extract the DDL for Oracle:

shell> cd tungsten-replicator-6.1.24-6
shell> ./bin/ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:3306/access_log' \
    -pass password -template ddl-mysql-oracle.vm -db access_log

The output should be captured and checked before applying it to your Oracle instance:

shell> ./bin/ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:3306/access_log' \
    -pass password -template ddl-mysql-oracle.vm -db access_log > access_log.ddl

If you are happy with the output, it can be executed against your target Oracle database:

shell> cat access_log.ddl | sqlplus sys/oracle as sysdba

The generated DDL includes statements to drop existing tables if they exist. These statements will fail in a new installation, but the resulting errors can be ignored.

Once the process has been completed for this database, it must be repeated for each database that you plan on replicating from MySQL to Oracle.

4.7.2. Install Oracle Applier

The Applier replicator will read the THL from the remote Extractor and apply it into Oracle using a standard JDBC connection. The Applier replicator needs to know the Extractor hostname, and the datasource type.

  1. Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  2. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  3. Obtain a copy of the Oracle JDBC driver and copy it into the tungsten-replicator/lib directory:

    shell> cp ojdbc7.jar ./tungsten-replicator/lib/
  4. Configure the installation using tpm:


    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --skip-validation-check=InstallerMasterSlaveCheck \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=sourcehost \
        --members=localhost \
        --datasource-type=oracle \
        --datasource-oracle-service=ORCL \
        --datasource-user=tungsten_alpha \
        --datasource-password=secret \
        --svc-applier-filters=dropstatementdata
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    skip-validation-check=InstallerMasterSlaveCheck
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=sourcehost
    members=localhost
    datasource-type=oracle
    datasource-oracle-service=ORCL
    datasource-user=tungsten_alpha
    datasource-password=secret
    svc-applier-filters=dropstatementdata
    


    replication-host should be added to the above configuration if the target Oracle Database is on a different host to the applier installation
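    For example, within an INI install, the additional line would be as follows, where oracledbhost is a placeholder for the hostname of the Oracle server:

    [alpha]
    ...Existing Replicator Config...
    replication-host=oracledbhost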

  5. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the installation has completed, the status of the service should be reported. The service should be online and reading events from the Extractor replicator.

The status of the replicator can be checked and monitored by using the trepctl command.

4.8. Deploying the PostgreSQL Applier

Deployment of replication to PostgreSQL operates as follows:

  • Service Alpha on the Extractor, extracts the information from the MySQL binary log into THL.

  • Service Alpha on the Applier reads the information from the remote replicator as THL, and applies that to PostgreSQL using a standard JDBC driver by constructing PostgreSQL compatible SQL to insert, update and delete the target data.

Figure 4.10. Topologies: Replicating to PostgreSQL


The two replication services can operate on the same machine, (See Section 5.3, “Deploying Multiple Replicators on a Single Host”) or they can be installed on two different machines.

4.8.1. Preparing for PostgreSQL Replication

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.

4.8.1.1. PostgreSQL Database Setup

For replication to PostgreSQL hosts, you must ensure that the networking and user configuration has been configured correctly.

4.8.1.1.1. PostgreSQL Version Support
Database      Version    Support Status                     Notes
PostgreSQL    9.5, 9.6   Primary platform (applier only)
4.8.1.1.2. Enable PostgreSQL Networking

Within the PostgreSQL configuration, two changes need to be made:

  • Configure the networking so that the PostgreSQL server listens on an address that is reachable by the replicator. Edit the /etc/postgresql/main/postgresql.conf file and set the listen_addresses line either to * or to an explicit IP address. For example:

    listen_addresses = '192.168.3.73'
  • Edit the /etc/postgresql/main/pg_hba.conf file and ensure that the password properties match the password settings and hostname limitations. In particular, the replicator will communicate over the public IP address, not localhost, so you must ensure that network-based connections using a user/password combination are allowed. For example, you may want to add a line to the file that provides network-wide access, or at least access for the local network range:

    host    all             all             192.168.0.0/24          md5
4.8.1.1.3. User Configuration

A suitable user must be created with rights and permissions to create databases, as this is required by the replicator to create databases, tables, and other objects. The createuser command can be used for this purpose; the --createdb option adds the CREATEDB permission, and --pwprompt prompts for a password for the new user:

shell> createuser tungsten --createdb --pwprompt

You will be prompted to provide a password for the user.

Alternatively, you can create the user and permissions through the psql interface:

shell> sudo -u postgres psql --port=5433 --user=postgres postgres
    Type "help" for help.

    postgres=# CREATE ROLE tungsten WITH LOGIN PASSWORD 'password';
    postgres=# ALTER ROLE tungsten CREATEDB;

You may also want to grant specific privileges to existing databases which must be done within the psql interface:

shell> sudo -u postgres psql --port=5433 --user=postgres postgres
    Type "help" for help.

    postgres=# GRANT ALL ON DATABASE postgres TO tungsten;

4.8.2. Install PostgreSQL Applier

Once you have completed the configuration of the PostgreSQL database, you can configure and install the PostgreSQL applier using the steps below.

  1. Unpack the Tungsten Replicator distribution in the staging directory:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  2. Change into the staging directory:

    shell> cd tungsten-replicator-6.1.24-6
  3. Configure the installation using tpm:


    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/continuent \
        --user=tungsten \
        --profile-script=~/.bash_profile \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=sourcehost \
        --members=localhost,sourcehost \
        --datasource-type=postgresql \
        --postgresql-dbname=dbname \
        --replication-user=tungsten \
        --replication-password=secret \
        --replication-host=remotedbhost \
        --replication-port=5432
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    user=tungsten
    profile-script=~/.bash_profile
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=sourcehost
    members=localhost,sourcehost
    datasource-type=postgresql
    postgresql-dbname=dbname
    replication-user=tungsten
    replication-password=secret
    replication-host=remotedbhost
    replication-port=5432
    

    Configuration group defaults


    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --install-directory=/opt/continuent

      install-directory=/opt/continuent

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --user=tungsten

      user=tungsten

      System User

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.

    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.


  4. Once the prerequisites and configuration of the installation have been completed, the software can be installed:

    shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the replicators have started, the status of the service can be checked using trepctl. See Section 4.8.3, “Management and Monitoring of PostgreSQL Deployments” for more information.

4.8.3. Management and Monitoring of PostgreSQL Deployments

Once the two services — extractor and applier — have been installed, the services can be monitored using trepctl. To monitor the extractor service:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000008:0000000000412301;0
appliedLastSeqno       : 1296
appliedLatency         : 1.889
channels               : 1
clusterName            : epsilon
currentEventId         : mysql-bin.000008:0000000000412301
currentTimeMillis      : 1377097812795
dataServerHost         : host1
extensions             :
latestEpochNumber      : 1286
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://host2:2112/
maximumStoredSeqNo     : 1296
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host1:13306/
relativeLatency        : 177444.795
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 177443.948
transitioningTo        :
uptimeSeconds          : 177461.483
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

The replicator service operates in the same way as a standard Extractor service in a typical MySQL replication deployment.

The PostgreSQL applier service can be accessed either remotely from the Extractor:

shell> trepctl -host host2 status
...

Or locally on the Applier host:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000008:0000000000412301;0
appliedLastSeqno       : 1296
appliedLatency         : 10.253
channels               : 1
clusterName            : alpha
currentEventId         : NONE
currentTimeMillis      : 1377098139212
dataServerHost         : host2
extensions             :
latestEpochNumber      : 1286
masterConnectUri       : thl://host1:2112/
masterListenUri        : null
maximumStoredSeqNo     : 1296
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://host1:2112/
relativeLatency        : 177771.212
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host2
state                  : ONLINE
timeInStateSeconds     : 177783.343
transitioningTo        :
uptimeSeconds          : 180631.276
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

Monitoring the status of replication between the Source and Target is also the same. The appliedLastSeqno still indicates the sequence number that has been applied to PostgreSQL, and the originating MySQL event ID can still be identified from appliedLastEventId.

Sequence numbers between the two hosts should match, as in a Primary/Replica deployment, but due to the method used to replicate, the applied latency may be higher. Tables that do not use primary keys, or large individual row updates, may cause increased latency differences.

Chapter 5. Deployment: Advanced

Table of Contents

5.1. Deploying the Replicator using the AWS Marketplace AMI
5.1.1. Prepare Source/Target database instances
5.1.2. Launch and Configure AMI
5.2. Deploying a Fan-In Topology
5.2.1. Management and Monitoring Fan-in Deployments
5.3. Deploying Multiple Replicators on a Single Host
5.3.1. Preparing Multiple Replicators
5.3.2. Install Multiple Replicators
5.3.3. Best Practices: Multiple Replicators
5.4. Replicating Data Into an Existing Dataservice
5.5. Deploying Parallel Replication
5.5.1. Application Prerequisites for Parallel Replication
5.5.2. Enabling Parallel Apply During Install
5.5.3. Channels
5.5.4. Parallel Replication and Offline Operation
5.5.4.1. Clean Offline Operation
5.5.4.2. Tuning the Time to Go Offline Cleanly
5.5.4.3. Unclean Offline
5.5.5. Adjusting Parallel Replication After Installation
5.5.5.1. How to Enable Parallel Apply After Installation
5.5.5.2. How to Change Channels Safely
5.5.5.3. How to Disable Parallel Replication Safely
5.5.5.4. How to Switch Parallel Queue Types Safely
5.5.6. Monitoring Parallel Replication
5.5.6.1. Useful Commands for Parallel Monitoring Replication
5.5.6.2. Parallel Replication and Applied Latency On Replicas
5.5.6.3. Relative Latency
5.5.6.4. Serialization Count
5.5.6.5. Maximum Offline Interval
5.5.6.6. Workload Distribution
5.5.7. Controlling Assignment of Shards to Channels
5.5.8. Disk vs. Memory Parallel Queues
5.6. Batch Loading for Data Warehouses
5.6.1. How It Works
5.6.2. Important Limitations
5.6.3. Batch Applier Setup
5.6.4. JavaScript Batchloader Scripts
5.6.4.1. JavaScript Batchloader with Parallel Apply
5.6.5. Staging Tables
5.6.5.1. Staging Table Names
5.6.5.2. Whole Record Staging
5.6.5.3. Delete Key Staging
5.6.5.4. Staging Table Generation
5.6.6. Character Sets
5.6.7. Supported CSV Formats
5.6.8. Columns in Generated CSV Files
5.6.9. Batchloading Opcodes
5.6.10. Time Zones
5.6.11. Batch Loading into MySQL
5.6.11.1. Configuring as an Offboard Batch Applier
5.6.11.2. Drop Delete Statements
5.6.11.3. Configure CHARSET to use on Load
5.6.11.4. Allow DDL Statements to execute
5.6.11.5. Disable Foreign Keys during load
5.6.11.6. Log rows violating Primary/Unique Keys
5.6.12. Data File Partitioning

5.1. Deploying the Replicator using the AWS Marketplace AMI

If you have an AWS account, you can take advantage of pre-built EC2 hosts, complete with all necessary pre-requisites in place, launched from an AWS Marketplace AMI.

Upon launch, a wizard will start and prompt you for a number of credentials to build a default configuration for Tungsten Replicator.

For a complete end-to-end Replication Pipeline you will need:

Source Databases

The Tungsten Replicator for MySQL Source Extraction is required for extraction from any of the following:

  • MySQL hosted on another EC2 instance

  • MySQL hosted on the same EC2 host launched from the AMI

  • An existing Tungsten Clustering Installation

  • Amazon RDS

  • Amazon Aurora

  • MySQL hosted on a remote non-AWS host

  • Google Cloud SQL

  • Microsoft Azure

Target Databases

Note

Upon launch, the AMI does NOT include the required binaries for a locally hosted database instance. For a local install for either the extractor or the applier, this will need to be configured manually beforehand.

Note

If you plan to extract from an existing Tungsten Cluster (Cluster-Extractor), a number of changes may need to be applied to your cluster configuration; in addition, your cluster must be running the same release as Tungsten Replicator. For more details on Cluster requirements, consult the appropriate Applier-specific pages here: Chapter 4, Deploying Appliers

Note

For any non-AWS hosted instances, ensure the appropriate inbound and outbound security rules are in place to allow WAN Communication.

5.1.1. Prepare Source/Target database instances

When using the AMI to configure an Extractor or Applier, it is important to ensure all the necessary target/source database pre-requisites are in place.

  • For extraction, ensure your source MySQL Instance is configured as per the Database specific notes in Section B.4, “MySQL Database Setup”

  • In addition, for Amazon based extraction, pay particular attention to Section B.4.6, “MySQL Unprivileged Users”

  • For preparing the target database, specific notes for target pre-requisites, where appropriate, are detailed within each applier deployment section found at Chapter 4, Deploying Appliers

  • Once you have prepared your sources and targets, you can now launch the relevant AMIs from the Marketplace

  • Within your AWS Dashboard, you can find the AMI by searching within the Marketplace for "Continuent"

  • Select the Extractor AMI and the Target AMI based on your choice of target database. Each AMI is restricted to only configure an applier based on the choice of target. There are no restrictions on extraction, providing the necessary pre-requisites are in place.

  • Ensure you select a Security group that allows communication to the source and target databases; the required network ports are detailed in Section B.3.2.1, “Network Ports”

5.1.2. Launch and Configure AMI

After launching the AMI, obtain the public IP address and connect to the shell using your preferred Terminal application, e.g.:

shell> ssh -i your-key.pem ec2-user@publicIP

Upon connecting, you will see a welcome message. From here, you can connect as the tungsten user:

shell> sudo su - tungsten

The launch wizard will start automatically and prompt you for details regarding your source or target database.

It is advisable to configure the Extractor AMI first as you will need to provide details of the extractor when you configure the applier.

Once you have provided all the information to the wizard, you will be prompted on screen for the next steps.

  • In summary, the wizard will have completed the following:

    • Created tungsten.ini within /etc/tungsten

    • Created additional directories for software installation

    • Created additional configuration files depending upon target requirements

    • Created a log file of the Wizard execution within /home/tungsten/ami-launch/log

  • The latest version of Tungsten Replicator will be unpacked within /opt/continuent/software

  • The wizard does not install the software; this allows you to fine-tune the configuration to suit your needs, such as adding additional filters, or adjusting memory and buffer allocations.

  • For more information on all the possible configuration parameters, see Section 9.8, “tpm Configuration Options”

  • You can now install the software. Follow the on-screen instructions displayed after Wizard completion to install using tpm (see the sketch after this list), or review Section 9.4.2, “Installation with INI File”

  • For further reading and understanding of how to manage the replicator, review Chapter 7, Operations Guide

  • For steps on starting and stopping the replicator, review Section 2.4, “Starting and Stopping Tungsten Replicator”

  • For details on how to monitor and interact with the running replicator using the trepctl tool, review Section 8.19, “The trepctl Command”
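As an illustrative sketch of this final installation step, assuming the release unpacked by the wizard matches the version shown elsewhere in this manual:

shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
shell> ./tools/tpm install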

5.2. Deploying a Fan-In Topology

The fan-in topology is the logical opposite of a Primary/Replica topology. In a fan-in topology, the data from two Sources is combined together on one Target. Fan-in topologies are often used in situations where you have satellite databases, perhaps for sales or retail operations, and need to combine that information together in a single database for processing.

Figure 5.1. Topologies: Fan-in


Some additional considerations need to be made when using fan-in topologies:

  • If the same tables from each machine are being merged together, it is possible to get collisions in the data where auto increment is used. The effects can be minimized by using increment offsets within the MySQL configuration:

    auto-increment-offset = 1
    auto-increment-increment = 4
  • Fan-in can work more effectively, and be less prone to problems with the corresponding data, by configuring specific tables at different sites. For example, with two sites, in New York and San Jose, databases and tables can be prefixed with the site name, i.e. sjc_sales and nyc_sales.

    Alternatively, a filter can be configured to rename the database sales dynamically to the corresponding location-based tables; see Section 10.4.33, “Rename Filter” for more information, and the sketch after this list.

  • Statement-based replication will work for most instances, but statements that update data dynamically within the statement may produce different results on each fan-in Source, leading to inconsistent data on the Target. Update your configuration file to explicitly use row-based replication by adding the following to your my.cnf file:

    binlog-format = row
  • Triggers can cause problems during fan-in replication if statements from each Source are replicated to the Target and cause the same operations to be triggered multiple times. Tungsten Replicator cannot prevent triggers from executing on the concentrator host and there is no way to selectively disable triggers. Check at the trigger level whether you are executing on a Source or Target. For more information, see Section C.4.1, “Triggers”.
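For the rename filter approach mentioned in the list above, a minimal sketch of enabling the filter on the fan-in Target; the definitions file path is illustrative, and the file format is described in Section 10.4.33, “Rename Filter”:

shell> ./tools/tpm configure epsilon \
    --svc-applier-filters=rename \
    --property=replicator.filter.rename.definitionsFile=/opt/continuent/share/rename.csv
shell> ./tools/tpm update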

To create the configuration, the Extractors and services must be specified; the topology specification takes care of the actual configuration:


shell> ./tools/tpm configure epsilon \
    --topology=fan-in \
    --install-directory=/opt/continuent \
    --replication-user=tungsten \
    --replication-password=password \
    --master=host1,host2 \
    --members=host1,host2,host3 \
    --master-services=alpha,beta \
    --rest-api-admin-user=apiuser \
    --rest-api-admin-pass=secret
shell> vi /etc/tungsten/tungsten.ini
[epsilon]
topology=fan-in
install-directory=/opt/continuent
replication-user=tungsten
replication-password=password
master=host1,host2
members=host1,host2,host3
master-services=alpha,beta
rest-api-admin-user=apiuser
rest-api-admin-pass=secret


For additional options supported for configuration with tpm, see Chapter 9, The tpm Deployment Command.

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the installation has been completed, the service will be started and ready to use.

5.2.1. Management and Monitoring Fan-in Deployments

Once the service has been started, a quick view of the service status can be determined using trepctl. Because there are multiple services, the service name and host name must be specified explicitly. To check the Extractor service on one of the fan-in hosts:

shell> trepctl -service alpha -host host1 status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000012:0000000000000418;0
appliedLastSeqno       : 0
appliedLatency         : 1.194
channels               : 1
clusterName            : alpha
currentEventId         : mysql-bin.000012:0000000000000418
currentTimeMillis      : 1375451438898
dataServerHost         : host1
extensions             : 
latestEpochNumber      : 0
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://host1:2112/
maximumStoredSeqNo     : 0
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host1:13306/
relativeLatency        : 6232.897
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 6231.881
transitioningTo        : 
uptimeSeconds          : 6238.061
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

The corresponding Extractor service from the other host is beta on host2:

shell> trepctl -service beta -host host2 status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000012:0000000000000415;0
appliedLastSeqno       : 0
appliedLatency         : 0.941
channels               : 1
clusterName            : beta
currentEventId         : mysql-bin.000012:0000000000000415
currentTimeMillis      : 1375451493579
dataServerHost         : host2
extensions             : 
latestEpochNumber      : 0
masterConnectUri       : thl://localhost:/
masterListenUri        : thl://host2:2112/
maximumStoredSeqNo     : 0
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host2:13306/
relativeLatency        : 6286.579
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : beta
serviceType            : local
simpleServiceName      : beta
siteName               : default
sourceId               : host2
state                  : ONLINE
timeInStateSeconds     : 6285.823
transitioningTo        : 
uptimeSeconds          : 6291.053
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

Note that because this is a fan-in topology, the sequence numbers and applied sequence numbers will be different for each service, as each service is independently storing data within the fan-in hub database.

The following sequence number combinations should match between the different hosts on each service:

Extractor Service    Source Host    Target Host
alpha                host1          host3
beta                 host2          host3

The sequence numbers between host1 and host2 will not match, as they are two independent services.

For more information on using trepctl, see Section 8.19, “The trepctl Command”.

Definitions of the individual field descriptions in the above example output can be found in Section E.2, “Generated Field Reference”.

For more information on management and operational detailed for managing your cluster installation, see Chapter 7, Operations Guide.

5.3. Deploying Multiple Replicators on a Single Host

It is possible to install multiple replicators on the same host. This can be useful either when building complex topologies with multiple services, or in heterogeneous environments where you are reading from one database and writing to another that may be installed on the same server.

When installing multiple replicator services on the same host, different values must be set for a number of configuration parameters; these are detailed in the following section.

5.3.1. Preparing Multiple Replicators

Before continuing with deployment you will need the following:

  1. The name to use for the service.

  2. The list of datasources in the service. These are the servers which will be running MySQL.

  3. The username and password of the MySQL replication user.

All servers must be prepared with the proper prerequisites. See Appendix B, Prerequisites for additional details.

When installing multiple replicator services on the same host, different values must be set for the following configuration parameters:

  • RMI network port used for communicating with the replicator service.

    Set through the --rmi-port parameter to tpm. Note that RMI ports are configured in pairs; the default port is 10000, port 10001 is used automatically. When specifying an alternative port, the subsequent port must also be available. For example, specifying port 10002 also requires 10003.

  • THL network port used for exchanging THL data.

    Set through the --thl-port parameter to tpm. The default THL port is 2112. This option is required for services operating as Extractors.

  • Extractor THL port, i.e. the port from which an Applier will read THL events from the Extractor

    Set through the --master-thl-port parameter to tpm. When operating as an Applier, the explicit THL port should be specified to ensure that you are connecting to the THL port correctly.

  • Extractor hostname

    Set through the --master-thl-host parameter to tpm. This is optional if the Extractor hostname has been configured correctly through the --master parameter.

  • Installation directory used when the replicator is installed.

    Set through the --install-directory parameter to tpm. This directory must have been created, and be configured with suitable permissions, before installation starts. For more information, see Section B.3.3, “Directory Locations and Configuration”.

5.3.2. Install Multiple Replicators

For example, to create two services, one that reads from MySQL and another that writes to MongoDB on the same host:

  1. Install the Tungsten Replicator package or download the Tungsten Replicator tarball, and unpack it:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  2. Create the proper directories with appropriate ownership and permissions:

    shell> sudo mkdir /opt/applier /opt/extractor
    shell> sudo chown tungsten: /opt/applier/ /opt/extractor/
    shell> sudo chmod 700 /opt/applier/ /opt/extractor/
  3. Change to the Tungsten Replicator directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Extractor reading from MySQL:


    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/extractor \
        --user=tungsten \
        --profile-script=~/.bash_profile \
        --mysql-allow-intensive-checks=true \
        --disable-security-controls=true \
        --executable-prefix=ext \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=offboardhost \
        --members=offboardhost \
        --enable-heterogeneous-service=true \
        --replication-port=3306 \
        --replication-user=tungsten_alpha \
        --replication-password=secret \
        --datasource-mysql-conf=/etc/my.cnf \
        --svc-extractor-filters=colnames,pkey \
        --property=replicator.filter.pkey.addColumnsToDeletes=true \
        --property=replicator.filter.pkey.addPkeyToInserts=true \
        --mysql-enable-enumtostring=true \
        --mysql-enable-settostring=true \
        --mysql-use-bytes-for-string=false
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/extractor
    user=tungsten
    profile-script=~/.bash_profile
    mysql-allow-intensive-checks=true
    disable-security-controls=true
    executable-prefix=ext
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=offboardhost
    members=offboardhost
    enable-heterogeneous-service=true
    replication-port=3306
    replication-user=tungsten_alpha
    replication-password=secret
    datasource-mysql-conf=/etc/my.cnf
    svc-extractor-filters=colnames,pkey
    property=replicator.filter.pkey.addColumnsToDeletes=true
    property=replicator.filter.pkey.addPkeyToInserts=true
    mysql-enable-enumtostring=true
    mysql-enable-settostring=true
    mysql-use-bytes-for-string=false
    

    Configuration group defaults


    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --install-directory=/opt/extractor

      install-directory=/opt/extractor

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --user=tungsten

      user=tungsten

      System User

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --mysql-allow-intensive-checks=true

      mysql-allow-intensive-checks=true

      For MySQL installation, enables detailed checks on the supported data types within the MySQL database to confirm compatibility. This includes checking each table definition individually for any unsupported data types.

    • --disable-security-controls=true

      disable-security-controls=true

      Disables all forms of security, including SSL, TLS and authentication

    • --executable-prefix=ext

      executable-prefix=ext

      When enabled, the supplied prefix is added to each command alias that is generated for a given installation. This enables multiple installations to co-exist and be accessible through a unique alias. For example, if the executable prefix is configured as east, then an alias for the installation to trepctl will be created as east_trepctl.

      Alias information for executable prefix data is stored within the $CONTINUENT_ROOT/share/aliases.sh file for each installation.

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.

    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.

    Configuration group alpha


    This is a standard configuration using the default ports, with the directory /opt/extractor.

  5. Applier for writing to MongoDB (both the Staging and INI methods are shown below):


    shell> ./tools/tpm configure defaults \
        --reset \
        --install-directory=/opt/applier \
        --profile-script=~/.bash_profile \
        --skip-validation-check=InstallerMasterSlaveCheck \
        --executable-prefix=app \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=localhost \
        --members=localhost \
        --role=slave \
        --datasource-type=mongodb \
        --replication-user=tungsten \
        --replication-password=secret \
        --rmi-port=10002 \
        --master-thl-port=2112 \
        --master-thl-host=localhost \
        --thl-port=2113
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/applier
    profile-script=~/.bash_profile
    skip-validation-check=InstallerMasterSlaveCheck
    executable-prefix=app
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=localhost
    members=localhost
    role=slave
    datasource-type=mongodb
    replication-user=tungsten
    replication-password=secret
    rmi-port=10002
    master-thl-port=2112
    master-thl-host=localhost
    thl-port=2113
    

    Configuration group defaults

    The description of each of the options is shown below:

    • --reset

      reset

      For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.

    • --install-directory=/opt/applier

      install-directory=/opt/applier

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --profile-script=~/.bash_profile

      profile-script=~/.bash_profile

      Append commands to include env.sh in this profile script

    • --skip-validation-check=InstallerMasterSlaveCheck

      skip-validation-check=InstallerMasterSlaveCheck

      The --skip-validation-check disables a given validation check. If any validation check fails, the installation, validation or configuration will automatically stop.

      Warning

      Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.

      You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:

      ...
      ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » 
       uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck)
      ...

      The check in this case is MySQLDefaultTableTypeCheck, and could be ignored using --skip-validation-check=MySQLDefaultTableTypeCheck.

      Setting both --skip-validation-check and --enable-validation-check is equivalent to explicitly disabling the specified check.

    • --executable-prefix=app

      executable-prefix=app

      When enabled, the supplied prefix is added to each command alias that is generated for a given installation. This enables multiple installations to co-exist and be accessible through a unique alias. For example, if the executable prefix is configured as east, then an alias for the installation to trepctl will be created as east_trepctl.

      Alias information for executable prefix data is stored within the $CONTINUENT_ROOT/share/aliases.sh file for each installation.

    • --rest-api-admin-user=apiuser

      rest-api-admin-user=apiuser

      Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.

    • --rest-api-admin-pass=secret

      rest-api-admin-pass=secret

      Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.

    Configuration group alpha


    In this configuration, the Extractor THL port is specified explicitly, along with the THL port used by this replicator, the RMI port used for administration, and the installation directory /opt/applier.

  6. Run tpm to install the software

    shell> ./tools/tpm install

    During the startup and installation, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If start-and-report is set and the service starts correctly, you should see the configuration and current status of the service.

  7. Initialize your PATH and environment.

    shell> source /opt/extractor/share/env.sh
    shell> source /opt/applier/share/env.sh

  8. Check the replication status.

When multiple replicators have been installed, checking the replicator status through trepctl depends on which replicator executable is used. If /opt/extractor/tungsten/tungsten-replicator/bin/trepctl is used, the extractor service status will be reported. If /opt/applier/tungsten/tungsten-replicator/bin/trepctl is used, then the applier service status will be reported.

To make things easier, the configuration examples above use executable-prefix, which sets up OS aliases. These aliases are set up when you source the relevant env.sh file; this also happens by default when you log in to the host, provided profile-script has been specified.

The prefix and aliases simplify the use of all executables. For example, based on the executable-prefix settings in the configurations above, you can report the status of the extractor with:

shell> ext_trepctl status

Or to check the applier service:

shell> app_trepctl status

Alternatively, a specific replicator can be checked by explicitly specifying the RMI port of the service. For example, to check the extractor service:

shell> trepctl -port 10000 status

Or to check the applier service:

shell> trepctl -port 10002 status

When an explicit port has been specified in this way, the executable used is irrelevant. Any valid trepctl instance will work.
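
For example, the applier's copy of trepctl can query the extractor by specifying the extractor's RMI port (10000 in the examples above):

shell> /opt/applier/tungsten/tungsten-replicator/bin/trepctl -port 10000 status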

Further, either path may be used to get a summary view using multi_trepctl:

shell> /opt/extractor/tungsten/tungsten-replicator/scripts/multi_trepctl
| host   | servicename | role   | state  | appliedlastseqno | appliedlatency |
| host1  | extractor   | master | ONLINE |                0 |          1.724 |
| host1  | applier     | slave  | ONLINE |                0 |          0.000 |

5.3.3. Best Practices: Multiple Replicators

Follow the guidelines in Section 2.2, “Best Practices”.

5.4. Replicating Data Into an Existing Dataservice

If you have an existing dataservice, data can be replicated from a standalone MySQL server into the service. The replication is configured by creating a service that reads from the standalone MySQL server and writes into the Primary of the target dataservice. By writing this way, changes are replicated to the Primary and Replica in the new deployment.

Additionally, a replicator that writes data into an existing dataservice can be used when migrating from an existing service into a new Tungsten Cluster service.

Figure 5.2. Topologies: Replicating into a Dataservice

Topologies: Replicating into a Dataservice

In order to configure this deployment, there are two steps:

  1. Create a new replicator that reads the binary logs directly from the external MySQL service.

  2. Configure that replicator to write the replicated data into the Primary of the destination dataservice; the cluster then distributes the changes to the Replicas.

There are also the following requirements:

  • The host that you are replicating to must be running Tungsten Replicator 5.3.0 or later.

  • The hosts running the replicator and the cluster must be able to communicate with each other.

  • The replication user on the source host must have the RELOAD, REPLICATION SLAVE, and REPLICATION CLIENT privileges (a sample GRANT statement is shown after this list).

  • The replicator must be able to connect as the tungsten user to the databases within the cluster.
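
For example, a suitable source-side replication user could be created as follows. This is a sketch only: the user name, host pattern, and password are illustrative and should match your replication-user and replication-password settings.

mysql> CREATE USER 'tungsten'@'%' IDENTIFIED BY 'secret';
mysql> GRANT RELOAD, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'tungsten'@'%';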

Install the Tungsten Replicator package (see Section 2.1.2, “Using the RPM package files”), or download the compressed tarball and unpack it on host1:

shell> cd /opt/replicator/software
shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz

Change to the Tungsten Replicator staging directory:

shell> cd tungsten-replicator-6.1.24-6

Configure the replicator on host1.

First, configure the defaults and the beta service, which reads directly from the standalone MySQL server and writes into the Primary of the destination dataservice:

The following examples show both the Staging and INI methods:

shell> ./tools/tpm configure defaults \
    --install-directory=/opt/replicator \
    --rmi-port=10002 \
    --user=tungsten \
    --replication-user=tungsten \
    --replication-password=secret \
    --skip-validation-check=MySQLNoMySQLReplicationCheck \
    --rest-api-admin-user=apiuser \
    --rest-api-admin-pass=secret

shell> ./tools/tpm configure beta \
    --topology=direct \
    --master=host1 \
    --direct-datasource-host=host3 \
    --thl-port=2113

shell> vi /etc/tungsten/tungsten.ini
[defaults]
install-directory=/opt/replicator
rmi-port=10002
user=tungsten
replication-user=tungsten
replication-password=secret
skip-validation-check=MySQLNoMySQLReplicationCheck
rest-api-admin-user=apiuser
rest-api-admin-pass=secret

[beta]
topology=direct
master=host1
direct-datasource-host=host3
thl-port=2113


This creates a configuration that specifies that the topology should read directly from the source host, host3, writing directly to host1. An alternative THL port is provided to ensure that the THL listener is not operating on the same network port as the original.

Now install the service, which will create the replicator reading direct from host3 into host1:

shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the installation has been completed, you must update the position of the replicator so that it points to the correct position within the source database to prevent errors during replication. If the replication is being created as part of a migration process, determine the position of the binary log from the external replicator service used when the backup was taken. For example:

mysql> show master status;
*************************** 1. row ***************************
            File: mysql-bin.000026
        Position: 1311
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

Use dsctl set to update the replicator position to point to the Primary log position:

shell> /opt/replicator/tungsten/tungsten-replicator/bin/dsctl -service beta set \
    -reset -seqno 0 -epoch 0 \
    -source-id host3 -event-id mysql-bin.000026:1311
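
To verify that the position was stored as expected, dsctl also supports a get operation; for example (the output format may vary by version):

shell> /opt/replicator/tungsten/tungsten-replicator/bin/dsctl -service beta get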

Now start the replicator:

shell> /opt/replicator/tungsten/tungsten-replicator/bin/replicator start

Replication status should be checked by explicitly using the servicename and/or RMI port:

shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service beta status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000026:0000000000001311;1252
appliedLastSeqno       : 5
appliedLatency         : 0.748
channels               : 1
clusterName            : beta
currentEventId         : mysql-bin.000026:0000000000001311
currentTimeMillis      : 1390410611881
dataServerHost         : host1
extensions             : 
host                   : host3
latestEpochNumber      : 1
masterConnectUri       : thl://host3:2112/
masterListenUri        : thl://host1:2113/
maximumStoredSeqNo     : 5
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host3:13306/
relativeLatency        : 8408.881
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : beta
serviceType            : local
simpleServiceName      : beta
siteName               : default
sourceId               : host3
state                  : ONLINE
timeInStateSeconds     : 8408.21
transitioningTo        : 
uptimeSeconds          : 8409.88
useSSLConnection       : false
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

5.5. Deploying Parallel Replication

Parallel apply is an important technique for achieving high speed replication and curing Replica lag. It works by spreading updates to Replicas over multiple threads that split transactions on each schema into separate processing streams. This in turn spreads I/O activity across many threads, which results in faster overall updates on the Replica. In ideal cases throughput on Replicas may improve by up to 5 times over single-threaded MySQL native replication.

Note

It is worth noting that the only thing Tungsten parallelizes is applying transactions to Replicas. All other operations in each replication service are single-threaded.

5.5.1. Application Prerequisites for Parallel Replication

Parallel replication works best on workloads that meet the following criteria:

  • ROW based binary logging must be enabled in the MySQL database.

  • Data are stored in independent schemas. If you have 100 customers per server with a separate schema for each customer, your application is a good candidate.

  • Transactions do not span schemas. Tungsten serializes such transactions, which is to say it stops parallel apply and runs them by themselves. If more than 2-3% of transactions are serialized in this way, most of the benefits of parallelization are lost.

  • Workload is well-balanced across schemas.

  • The Replica host(s) are capable and have free memory in the OS page cache.

  • The host on which the Replica runs has a sufficient number of cores to operate a large number of Java threads.

Not all workloads meet these requirements. If your transactions are within a single schema only, you may need to consider different approaches, such as Replica prefetch. Contact Continuent for other suggestions.

Parallel replication does not work well on underpowered hosts, such as Amazon m1.small instances. In fact, any host that is already I/O bound under single-threaded replication will typically not show much improvement with parallel apply.

5.5.2. Enabling Parallel Apply During Install

Parallel apply is enabled using the svc-parallelization-type and channels options of tpm. The parallelization type defaults to none, which is to say that parallel apply is disabled. You should set it to disk. The channels option sets the number of channels (i.e., threads) you propose to use for applying data. Here is an example of a MySQL applier installation with parallel apply enabled; the Replica will apply transactions using 10 channels.


shell> ./tools/tpm configure defaults \
    --reset \
    --install-directory=/opt/continuent \
    --user=tungsten \
    --mysql-allow-intensive-checks=true \
    --profile-script=~/.bash_profile \
    --start-and-report=true

shell> ./tools/tpm configure alpha \
    --master=sourcehost \
    --members=localhost,sourcehost \
    --datasource-type=mysql \
    --replication-user=tungsten \
    --replication-password=secret \
    --svc-parallelization-type=disk \
    --channels=10

shell> vi /etc/tungsten/tungsten.ini
[defaults]
install-directory=/opt/continuent
user=tungsten
mysql-allow-intensive-checks=true
profile-script=~/.bash_profile
start-and-report=true

[alpha]
master=sourcehost
members=localhost,sourcehost
datasource-type=mysql
replication-user=tungsten
replication-password=secret
svc-parallelization-type=disk
channels=10


If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

There are several additional options that default to reasonable values. You may wish to change them in special cases.

  • buffer-size — Sets the replicator block commit size, which is the number of transactions to commit at once on Replicas. Values up to 100 are normally fine.

  • native-slave-takeover — Used to allow Tungsten to take over from native MySQL replication and parallelize it. See the documentation for the native-slave-takeover option for more information.

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

Replica shell> trepctl -service alpha status| grep channels
channels               : 10

Important

The channel count for a Primary will ALWAYS be 1 because extraction is single-threaded:

Primary shell> trepctl -service alpha status| grep channels
channels               : 1

Warning

Enabling parallel apply will dramatically increase the number of connections to the database server.

Typically the calculation on a Replica would be: Connections = Channel_Count x Service_Count x 2, so for a 4-way Composite Active/Active topology with 30 channels there would be 30 x 4 x 2 = 240 connections required for the replicator alone, not counting application traffic. (For the single-service, 10-channel example above, that would be 10 x 1 x 2 = 20.)

You may display the currently used number of connections in MySQL:

mysql> SHOW STATUS LIKE 'max_used_connections';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Max_used_connections | 190   |
+----------------------+-------+
1 row in set (0.00 sec)

Below are suggestions for how to change the maximum connections setting in MySQL, both for the running instance and at startup:

mysql> SET GLOBAL max_connections = 512;

mysql> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 512   |
+-----------------+-------+
1 row in set (0.00 sec)

shell> vi /etc/my.cnf
#max_connections = 151
max_connections = 512

5.5.3. Channels

Channels and Parallel Apply

Parallel apply works by using multiple threads for the final stage of the replication pipeline. These threads are known as channels. Restart points for each channel are stored as individual rows in table trep_commit_seqno if you are applying to a relational DBMS server, including MySQL, Oracle, and data warehouse products like Vertica.

When you set the channels option, the tpm program configures the replication service to enable the requested number of channels. A value of 1 results in single-threaded operation.
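
For example, on a MySQL Replica with 10 channels you can confirm that each channel maintains its own restart point. This sketch assumes a service named alpha, so the catalog schema is tungsten_alpha:

mysql> SELECT task_id, seqno, eventid FROM tungsten_alpha.trep_commit_seqno;

With parallel apply enabled you should see one row per channel (task_id 0 through 9 in this case).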

Do not change the number of channels without taking the replicator offline cleanly. See Section 5.5.5.2, “How to Change Channels Safely” for the procedure.

How Many Channels Are Enough?

Pick the smallest number of channels that loads the Replica fully. For evenly distributed workloads this means that you should increase channels so that more threads are simultaneously applying updates and soaking up I/O capacity. As long as each shard receives roughly the same number of updates, this is a good approach.

For unevenly distributed workloads, you may want to decrease channels to spread the workload more evenly across them. This ensures that each channel has productive work and minimizes the overhead of updating the channel position in the DBMS.

Once you have maximized I/O on the DBMS server, leave the number of channels alone. Note that adding more channels than you have shards does not help performance, as it will lead to idle channels that must update their positions in the DBMS even though they are not doing useful work. This actually slows down performance a little bit.

Effect of Channels on Backups

If you back up a Replica that operates with more than one channel, say 30, you can only restore that backup on another Replica that operates with the same number of channels. Otherwise, reloading the backup is the same as changing the number of channels without a clean offline.

When operating Tungsten Replicator in a Tungsten cluster, you should always set the number of channels to be the same for all replicators. Otherwise you may run into problems if you try to restore backups across MySQL instances whose replicators use different numbers of channels.

If the replicator has only a single channel enabled, you can restore the backup anywhere. The same applies if you run the backup after the replicator has been taken offline cleanly.

5.5.4. Parallel Replication and Offline Operation

5.5.4.1. Clean Offline Operation

When you issue a trepctl offline command, Tungsten Replicator will bring all channels to the same point in the log and then go offline. This is known as going offline cleanly. When a Replica has been taken offline cleanly the following are true:

  • The trep_commit_seqno table contains a single row.

  • The trep_shard_channel table is empty.

When parallel replication is not enabled, you can take the replicator offline by stopping the replicator process. There is no need to issue a trepctl offline command first.

5.5.4.2. Tuning the Time to Go Offline Cleanly

Putting a replicator offline may take a while if the slowest and fastest channels are far apart, i.e., if one channel gets far ahead of another. The separation between channels is controlled by the maxOfflineInterval parameter, which defaults to 5 seconds. This sets the allowable distance between commit timestamps processed on different channels. You can adjust this value at installation or later. The following example shows how to change it after installation. This can be done at any time and does not require the replicator to go offline cleanly.

The following examples show both the Staging and INI methods:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

shell> vi /etc/tungsten/tungsten.ini
[alpha]
...
property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

The offline interval is only the approximate time that Tungsten Replicator will take to go offline. Up to a point, larger values (say 60 or 120 seconds) allow the replicator to parallelize in spite of a few operations that are relatively slow. However, the downside is that going offline cleanly can become quite slow.

5.5.4.3. Unclean Offline

If you need to take a replicator offline quickly, you can either stop the replicator process or issue the following command:

shell> trepctl offline -immediate

Both of these result in an unclean shutdown. However, parallel replication is completely crash-safe provided you use transactional table types like InnoDB, so you will be able to restart without causing Replica consistency problems.

Warning

You must take the replicator offline cleanly to change the number of channels or when reverting to MySQL native replication. Failing to do so can result in errors when you restart replication.

5.5.5. Adjusting Parallel Replication After Installation

5.5.5.1. How to Enable Parallel Apply After Installation

To enable parallel replication after installation, take the replicator offline cleanly using the following command:

shell> trepctl offline

Modify the configuration to add two parameters:


shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure defaults \
    --svc-parallelization-type=disk \
    --channels=10

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

[defaults]
...
svc-parallelization-type=disk
channels=10

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

Note

You may use an actual data service name in place of the keyword defaults.

Signal the changes by a complete restart of the Replicator process:

shell> replicator restart

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

Replica shell> trepctl -service alpha status| grep channels
channels               : 10

Important

The channel count for a Primary will ALWAYS be 1 because extraction is single-threaded:

Primary shell> trepctl -service alpha status| grep channels
channels               : 1

Warning

Enabling parallel apply will dramatically increase the number of connections to the database server.

Typically the calculation on a Replica would be: Connections = Channel_Count x Service_Count x 2, so for a 4-way Composite Active/Active topology with 30 channels there would be 30 x 4 x 2 = 240 connections required for the replicator alone, not counting application traffic. (For the single-service, 10-channel example above, that would be 10 x 1 x 2 = 20.)

You may display the currently used number of connections in MySQL:

mysql> SHOW STATUS LIKE 'max_used_connections';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Max_used_connections | 190   |
+----------------------+-------+
1 row in set (0.00 sec)

Below are suggestions for how to change the maximum connections setting in MySQL, both for the running instance and at startup:

mysql> SET GLOBAL max_connections = 512;

mysql> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 512   |
+-----------------+-------+
1 row in set (0.00 sec)

shell> vi /etc/my.cnf
#max_connections = 151
max_connections = 512

5.5.5.2. How to Change Channels Safely

To change the number of channels you must take the replicator offline cleanly using the following command:

shell> trepctl offline

This command brings all channels up to the same transaction in the log, then goes offline. If you look in the trep_commit_seqno table, you will notice only a single row, which shows that updates to the Replica have been completely serialized to a single point. At this point you may safely reconfigure the number of channels on the replicator, for example using the following command:

The following examples show both the Staging and INI methods:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --channels=5

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
channels=5

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

If you attempt to reconfigure channels without going offline cleanly, Tungsten Replicator will signal an error when you attempt to go online with the new channel configuration. The cure is to revert to the previous number of channels, go online, and then go offline cleanly. Note that attempting to clean up the trep_commit_seqno and trep_shard_channel tables manually can result in your Replicas becoming inconsistent and requiring full resynchronization. You should only do such cleanup under direction from Continuent support.

Warning

Failing to follow the channel reconfiguration procedure carefully may result in your Replicas becoming inconsistent or failing. The cure is usually full resynchronization, so it is best to avoid this if possible.

5.5.5.3. How to Disable Parallel Replication Safely

The following steps describe how to gracefully disable parallel apply replication.

Replication Graceful Offline (critical first step)

To disable parallel apply, you must first take the replicator offline cleanly using the following command:

shell> trepctl offline

This command brings all channels up to the same transaction in the log, then goes offline. If you look in the trep_commit_seqno table, you will notice only a single row, which shows that updates to the Replica have been completely serialized to a single point. At this point you may safely disable parallel apply on the replicator, for example using the following command:

The following examples show both the Staging and INI methods:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --svc-parallelization-type=none \
    --channels=1

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
svc-parallelization-type=none
channels=1

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

Verification

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

shell> trepctl -service alpha status| grep channels
channels               : 1

Notes and Warnings

If you attempt to reconfigure channels without going offline cleanly, Tungsten Replicator will signal an error when you attempt to go online with the new channel configuration. The cure is to revert to the previous number of channels, go online, and then go offline cleanly. Note that attempting to clean up the trep_commit_seqno and trep_shard_channel tables manually can result in your Replicas becoming inconsistent and requiring full resynchronization. You should only do such cleanup under direction from Continuent support.

Warning

Failing to follow the channel reconfiguration procedure carefully may result in your Replicas becoming inconsistent or failing. The cure is usually full resynchronization, so it is best to avoid this if possible.

5.5.5.4. How to Switch Parallel Queue Types Safely

As with channels you should only change the parallel queue type after the replicator has gone offline cleanly. The following example shows how to update the parallel queue type after installation:

The following examples show both the Staging and INI methods:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --svc-parallelization-type=disk \
    --channels=5

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
svc-parallelization-type=disk
channels=5

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

5.5.6. Monitoring Parallel Replication

Basic monitoring of a parallel deployment can be performed using the techniques in Chapter 7, Operations Guide. Specific operations for parallel replication are provided in the following sections.

5.5.6.1. Useful Commands for Monitoring Parallel Replication

The replicator has several helpful commands for tracking replication performance:

  • trepctl status: Shows basic variables including overall latency of the Replica and the number of apply channels

  • trepctl status -name shards: Shows the number of transactions for each shard

  • trepctl status -name stores: Shows the configuration and internal counters for stores between tasks

  • trepctl status -name tasks: Shows the number of transactions (events) and latency for each independent task in the replicator pipeline

5.5.6.2. Parallel Replication and Applied Latency On Replicas

The trepctl status appliedLastSeqno parameter shows the sequence number of the last transaction committed. Here is an example from a Replica with 5 channels enabled.

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000211:0000000020094456;0
appliedLastSeqno       : 78021
appliedLatency         : 0.216
channels               : 5
...
Finished status command...

When parallel apply is enabled, the meaning of appliedLastSeqno changes. It is the minimum recovery position across apply channels, which means it is the position where channels restart in the event of a failure. This number is quite conservative and may make replication appear to be further behind than it actually is.

  • Busy channels mark their position in table trep_commit_seqno as they commit. These positions are up to date with the traffic on that channel, but latency can differ between channels that carry many large transactions and channels that are more lightly loaded.

  • Inactive channels do not receive any transactions, and hence do not mark their position. Tungsten sends a control event across all channels so that they mark their commit position in trep_commit_seqno. On lightly loaded systems, idle channels that have not yet marked their position can make the reported position lag the true state of the Replica by many seconds or even minutes.

For systems with few transactions it is useful to lower the synchronization interval to a smaller number of transactions, for example 500. The following command shows how to adjust the synchronization interval after installation:

The following examples show both the Staging and INI methods:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.syncInterval=500

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
property=replicator.store.parallel-queue.syncInterval=500

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

Note that there is a trade-off between the synchronization interval value and writes on the DBMS server. With the foregoing setting, all channels will write to the trep_commit_seqno table every 500 transactions. If there were 50 channels configured, this could lead to an increase in writes of up to 10%: each channel could end up adding an extra write to mark its position every 10 transactions. In busy systems it is therefore better to use a higher synchronization interval.

You can check the current synchronization interval by running the trepctl status -name stores command, as shown in the following example:

shell> trepctl status -name stores
Processing status command (stores)...
...
NAME                      VALUE
----                      -----
...
name                    : parallel-queue
...
storeClass              : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval            : 10000
Finished status command (stores)...

You can also force all channels to mark their current position by sending a heartbeat using the trepctl heartbeat command.
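
For example, the following is run against the Primary for the service (the service name alpha is taken from the earlier examples):

shell> trepctl -service alpha heartbeat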

5.5.6.3. Relative Latency

Relative latency is a trepctl status parameter. It indicates the latency since the last time the appliedSeqno advanced; for example:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000211:0000000020094766;0
appliedLastSeqno       : 78022
appliedLatency         : 0.571
...
relativeLatency        : 8.944
Finished status command...

In this example the last transaction had a latency of 0.571 seconds from the time it committed on the Primary, and it committed 8.944 seconds ago. If relative latency increases significantly in a busy system, it may be a sign that replication is stalled. This is a good parameter to check in monitoring scripts.
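
As a starting point, the following minimal monitoring sketch extracts relativeLatency from trepctl status and warns above a threshold. The service name alpha and the 300-second threshold are illustrative and should be adapted to your environment:

#!/bin/bash
# Warn if relativeLatency exceeds THRESHOLD seconds (illustrative values).
THRESHOLD=300
LATENCY=$(trepctl -service alpha status | awk -F: '/relativeLatency/ {gsub(/ /,"",$2); print $2}')
# Compare the floating-point value numerically via awk.
if awk -v l="$LATENCY" -v t="$THRESHOLD" 'BEGIN {exit !(l > t)}'; then
    echo "WARNING: replication may be stalled; relativeLatency=${LATENCY}s"
fi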

5.5.6.4. Serialization Count

Serialization count refers to the number of transactions that the replicator has handled that cannot be applied in parallel because they involve dependencies across shards. For example, a transaction that spans multiple shards must serialize because it might cause an out-of-order update with respect to transactions that update a single shard only.

You can detect the number of transactions that have been serialized by looking at the serializationCount parameter using the trepctl status -name stores command. The following example shows a replicator that has processed 1512 transactions with 26 serialized.

shell> trepctl status -name stores
Processing status command (stores)...
...
NAME                      VALUE
----                      -----
criticalPartition       : -1
discardCount            : 0
estimatedOfflineInterval: 0.0
eventCount              : 1512
headSeqno               : 78022
maxOfflineInterval      : 5
maxSize                 : 10
name                    : parallel-queue
queues                  : 5
serializationCount      : 26
serialized              : false
...
Finished status command (stores)...

In this case 1.7% of transactions are serialized. Generally speaking you will lose benefits of parallel apply if more than 1-2% of transactions are serialized.

5.5.6.5. Maximum Offline Interval

The maximum offline interval (maxOfflineInterval) parameter controls the "distance" between the fastest and slowest channels when parallel apply is enabled. The replicator measures distance using the seconds between commit times of the last transaction processed on each channel. This time is roughly equivalent to the amount of time a replicator will require to go offline cleanly.

You can change maxOfflineInterval as shown in the following example; the value is defined in seconds.

The following examples show both the Staging and INI methods:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

You can view the configured value as well as the estimated current value using the trepctl status -name stores command, as shown in the following example:

shell> trepctl status -name stores
Processing status command (stores)...
NAME                      VALUE
----                      -----
...
estimatedOfflineInterval: 1.3
...
maxOfflineInterval      : 30
...
Finished status command (stores)...

5.5.6.6. Workload Distribution

Parallel apply works best when transactions are distributed evenly across shards and those shards are distributed evenly across available channels. You can monitor the distribution of transactions over shards using the trepctl status -name shards command. This command lists transaction counts for all shards, as shown in the following example.

shell> trepctl status -name shards
Processing status command (shards)...
...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000211:0000000020095076;0
appliedLastSeqno  : 78023
appliedLatency    : 0.255
eventCount        : 3523
shardId           : cust1
stage             : q-to-dbms
...
Finished status command (shards)...

If one or more shards have a very large eventCount value compared to the others, this is a sign that your transaction workload is poorly distributed across shards.

The listing of shards also offers a useful trick for finding serialized transactions. Shards that Tungsten Replicator cannot safely parallelize are assigned the dummy shard ID #UNKNOWN. Look for this shard to find the count of serialized transactions. The appliedLastSeqno for this shard gives the sequence number of the most recent serialized transaction. As the following example shows, you can then list the contents of the transaction to see why it serialized. In this case, the transaction affected tables in different schemas.

shell> trepctl status -name shards
Processing status command (shards)...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000211:0000000020095529;0
appliedLastSeqno  : 78026
appliedLatency    : 0.558
eventCount        : 26
shardId           : #UNKNOWN
stage             : q-to-dbms
...
Finished status command (shards)...
shell> thl list -seqno 78026
SEQ# = 78026 / FRAG# = 0 (last frag)
- TIME = 2013-01-17 22:29:42.0
- EPOCH# = 1
- EVENTID = mysql-bin.000211:0000000020095529;0
- SOURCEID = logos1
- METADATA = [mysql_server_id=1;service=percona;shard=#UNKNOWN]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0, »
    foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, »
    collation_connection = 8, collation_server = 33]
- SCHEMA =
- SQL(0) = insert into mats_0.foo values(1) /* ___SERVICE___ = [percona] */
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0, »
    foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, »
    collation_connection = 8, collation_server = 33]
- SQL(1) = insert into mats_1.foo values(1)

The replicator normally distributes shards evenly across channels. As each new shard appears, it is assigned to the next channel number, which then rotates back to 0 once the maximum number has been assigned. If the shards have uneven transaction distributions, this may lead to an uneven number of transactions on the channels. To check, use the trepctl status -name tasks command and look for tasks belonging to the q-to-dbms stage.

shell> trepctl status -name tasks
Processing status command (tasks)...
...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000211:0000000020095076;0
appliedLastSeqno  : 78023
appliedLatency    : 0.248
applyTime         : 0.003
averageBlockSize  : 2.520
cancelled         : false
currentLastEventId: mysql-bin.000211:0000000020095076;0
currentLastFragno : 0
currentLastSeqno  : 78023
eventCount        : 5302
extractTime       : 274.907
filterTime        : 0.0
otherTime         : 0.0
stage             : q-to-dbms
state             : extract
taskId            : 0
...
Finished status command (tasks)...

If you see one or more channels that have a very high eventCount, consider either assigning shards explicitly to channels or redistributing the workload in your application to get better performance.

5.5.7. Controlling Assignment of Shards to Channels

Tungsten Replicator by default assigns channels using a round robin algorithm that assigns each new shard to the next available channel. The current shard assignments are tracked in table trep_shard_channel in the Tungsten catalog schema for the replication service.

For example, if you have 2 channels enabled and Tungsten processes three different shards, you might end up with a shard assignment like the following:

foo => channel 0
bar => channel 1
foobar => channel 0

This algorithm generally gives the best results for most installations and is crash-safe, since the contents of the trep_shard_channel table persist if either the DBMS or the replicator fails.
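
You can inspect the current assignments directly in the DBMS; this sketch assumes a service named alpha, so the catalog schema is tungsten_alpha:

mysql> SELECT shard_id, channel FROM tungsten_alpha.trep_shard_channel;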

It is possible to override the default assignment by updating the shard.list file found in the tungsten-replicator/conf directory. This file normally looks like the following:

# SHARD MAP FILE.
# This file contains shard handling rules used in the ShardListPartitioner
# class for parallel replication.  If unchanged shards will be hashed across
# available partitions.

# You can assign shards explicitly using a shard name match, where the form
# is <db>=<partition>.
#common1=0
#common2=0
#db1=1
#db2=2
#db3=3

# Default partition for shards that do not match explicit name.
# Permissible values are either a partition number or -1, in which
# case values are hashed across available partitions.  (-1 is the
# default.)
#(*)=-1

# Comma-separated list of shards that require critical section to run.
# A "critical section" means that these events are single-threaded to
# ensure that all dependencies are met.
#(critical)=common1,common2

# Method for channel hash assignments.  Allowed values are round-robin and
# string-hash.
(hash-method)=round-robin

You can update the shard.list file to make three types of custom overrides; an example combining all three is shown after this list.

  1. Change the hashing method for channel assignments. Round-robin uses the trep_shard_channel table. The string-hash method just hashes the shard name.

  2. Assign shards to explicit channels. Add lines of the form shard=channel to the file as shown by the commented-out entries.

  3. Define critical shards. These are shards that must be processed in serial fashion. For example if you have a sharded application that has a single global shard with reference information, you can declare the global shard to be critical. This helps avoid applications seeing out of order information.
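
For example, a hypothetical shard.list combining all three override types might look like the following; the schema names are illustrative:

# Hash new shards by name instead of using the trep_shard_channel table.
(hash-method)=string-hash

# Pin two busy schemas to explicit channels.
db1=1
db2=2

# Serialize the global reference shard.
(critical)=common1

# Hash all other shards across available partitions.
(*)=-1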

Changes to shard.list must be made with care. The same cautions apply here as for changing the number of channels or the parallelization type. For subscription customers we strongly recommend conferring with Continuent Support before making changes.

5.5.8. Disk vs. Memory Parallel Queues

Channels receive transactions through a special type of queue, known as a parallel queue. Tungsten offers two implementations of parallel queues, which vary in their performance as well as the requirements they may place on hosts that operate parallel apply. You choose the type of queue to enable using the --svc-parallelization-type option.

Warning

Do not change the parallel queue type without taking the replicator offline cleanly. See Section 5.5.5.4, “How to Switch Parallel Queue Types Safely” for the procedure.

Disk Parallel Queue (disk option)

A disk parallel queue uses a set of independent threads to read from the Transaction History Log and feed short in-memory queues used by channels. Disk queues have the advantage that they minimize memory required by Java. They also allow channels to operate some distance apart, which improves throughput. For instance, one channel may apply a transaction that committed 2 minutes before the transaction another channel is applying. This separation keeps a single slow transaction from blocking all channels.

Disk queues minimize memory consumption of the Java VM but to function efficiently they do require pages from the Operating System page cache. This is because the channels each independently read from the Transaction History Log. As long as the channels are close together the storage pages tend to be present in the Operating System page cache for all threads but the first, resulting in very fast reads. If channels become widely separated, for example due to a high maxOfflineInterval value, or the host has insufficient free memory, disk queues may operate slowly or impact other processes that require memory.

Memory Parallel Queue (memory option)

A memory parallel queue uses a set of in-memory queues to hold transactions. One stage reads from the Transaction History Log and distributes transactions across the queues. The channels each read from one of the queues. In-memory queues have the advantage that they do not need extra threads to operate, hence reduce the amount of CPU processing required by the replicator.

When you use in-memory queues you must set the maxSize property on the queue to a relatively large value. This value sets the total number of transaction fragments that may be in the parallel queue at any given time. If the queue hits this value, it does not accept further transaction fragments until existing fragments are processed. For best performance it is often necessary to use a relatively large number, for example 10,000 or greater.

The following example shows how to set the maxSize property after installation. This value can be changed at any time and does not require the replicator to go offline cleanly:

The following examples show both the Staging and INI methods:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.maxSize=10000

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 9.3.7, “Configuration Changes from a Staging Directory”.

For INI-method deployments, add the property to the /etc/tungsten/tungsten.ini file:

[alpha]
...
property=replicator.store.parallel-queue.maxSize=10000

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-replicator-6.1.24-6

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-replicator-6.1.24-6

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 9.4.4, “Configuration Changes with an INI file”.

You may need to increase the Java VM heap size when you increase the parallel queue maximum size. Use the --java-mem-size option on the tpm command for this purpose or edit the Replicator wrapper.conf file directly.
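For example, assuming an INI-based installation, the heap size (in MB) could be raised alongside the larger queue setting; the values shown here are illustrative only:

[alpha]
...
java-mem-size=2048
property=replicator.store.parallel-queue.maxSize=10000

Run tpm update afterwards to apply the change.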

Warning

Memory queues are not recommended for production use at this time. Use disk queues.

5.6. Batch Loading for Data Warehouses

Tungsten Replicator normally applies SQL changes to Targets by constructing SQL statements and executing them in the exact order that transactions appear in the Tungsten History Log (THL). This works well for OLTP databases like MySQL, Oracle, and MongoDB. However, it is a poor approach for data warehouses.

Data warehouse products like Vertica or Redshift load very slowly through JDBC interfaces (50 times slower or even more compared to MySQL). Instead, such databases supply batch loading commands that upload data in parallel. For instance Vertica uses the COPY command.

Tungsten Replicator has a batch applier named SimpleBatchApplier that groups transactions and then loads data. This is known as "batch apply." You can configure Tungsten to load tens of thousands of transactions at once using templates that apply the correct commands for your chosen data warehouse.

While we use the term batch apply, Tungsten is not batch-oriented in the sense of traditional Extract/Transfer/Load tools, which may run only a small number of batches a day. Tungsten builds batches automatically as transactions arrive in the log. The mechanism is designed to be self-adjusting: if small transaction batches cause loading to lag, Tungsten will automatically adjust the batch size upwards until it no longer lags during loading.

5.6.1. How It Works

The batch applier loads data into the Target DBMS using CSV files and appropriate load commands like LOAD DATA INFILE or COPY. Here is the basic algorithm.

While executing within a commit block, we write incoming transactions into open CSV files written by the class CsvWriter. There is one CSV file per database table. The following sample shows typical contents.

"I","84900","1","2016-03-11 20:51:10.000","986","http://www.continent.com/software"
"D","84901","2","2016-03-11 20:51:10.000","143",null
"I","84901","3","2016-03-11 20:51:10.000","143","http://www.microsoft.com"

Tungsten adds four extra column values to each line of CSV output.

  • opcode — A transaction code that has the value "I" for insert and "D" for delete. Other types are available.

  • seqno — The Tungsten transaction sequence number.

  • row_id — A line number that starts with 1 and increments by 1 for each new row.

  • timestamp — The commit timestamp, i.e. the origin timestamp of the committed statement that generated the row information.

Different update types are handled as follows:

  • Each insert generates a single row containing all values in the row with an "I" opcode.

  • Each delete generates a single row with the key and a "D" opcode. Non-key fields are null.

  • Each update results in a delete with the row key followed by an insert.

  • Statements are ignored. If you want DDL applied on the target, you must apply it yourself.

Tungsten writes each row update into the corresponding CSV file for the affected table. At commit time the following steps occur:

  1. Flush and close each CSV file. This ensures that if there is a failure the files are fully visible in storage.

  2. For each table, execute a merge script to move the data from the CSV file into the data warehouse. This script varies depending on the data warehouse type, or even for a specific application. It generally consists of a sequence of operating system commands, load commands like COPY or LOAD DATA INFILE to load in the CSV data, and ordinary SQL commands to move and massage data.

  3. When all tables are loaded, issue a single commit on the SQL connection.

The main requirement of merge scripts is that they must ensure rows load and that delete and insert operations apply in the correct order. Tungsten includes load scripts for MySQL and Vertica that do this automatically.

It is common to use staging tables to help load data. These are described in more detail in Section 5.6.5, “Staging Tables”.

5.6.2. Important Limitations

Tungsten currently has some important limitations for batch loading, namely:

  1. Primary keys must be a single column only. Tungsten does not handle multi-column keys.

  2. Binary data is not certified and may cause problems when converted to CSV as it will be converted to Unicode.

These limitations will be relaxed in future releases.

5.6.3. Batch Applier Setup

Here is how to set up batch loading when the source is MySQL. For more information on specific data warehouse types, refer to Chapter 2, Deployment Overview.

  1. Enable row replication on the MySQL Source using set global binlog_format=row or by updating my.cnf.

  2. Ensure that you are operating using GMT throughout your source and target databases.

  3. Install using the --batch-enabled=true option. Here's a typical Vertica applier configuration, taken from Section 4.3, “Deploying the Vertica Applier”:

    For Staging-method deployments:

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --skip-validation-check=HostsFileCheck \
        --skip-validation-check=InstallerMasterSlaveCheck \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --topology=master-slave \
        --master=sourcehost \
        --members=localhost \
        --datasource-type=vertica \
        --replication-user=dbadmin \
        --replication-password=password \
        --vertica-dbname=dev \
        --batch-enabled=true \
        --batch-load-template=vertica6 \
        --batch-load-language=js \
        --replication-port=5433 \
        --svc-applier-filters=dropstatementdata \
        --svc-applier-block-commit-interval=30s \
        --svc-applier-block-commit-size=25000 \
        --disable-relay-logs=true
    
    For INI-method deployments:

    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    skip-validation-check=HostsFileCheck
    skip-validation-check=InstallerMasterSlaveCheck
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    topology=master-slave
    master=sourcehost
    members=localhost
    datasource-type=vertica
    replication-user=dbadmin
    replication-password=password
    vertica-dbname=dev
    batch-enabled=true
    batch-load-template=vertica6
    batch-load-language=js
    replication-port=5433
    svc-applier-filters=dropstatementdata
    svc-applier-block-commit-interval=30s
    svc-applier-block-commit-size=25000
    disable-relay-logs=true
    

5.6.4. JavaScript Batchloader Scripts

The JavaScript batchloader enables data to be loaded into data warehouses and other targets through a simplified JavaScript command script. The script implements specific functions for each stage of the apply process, from preparation to commit, allowing internal data, external commands, and other operations to be executed in sequence.

The actual loading process works through the specification of a JavaScript batchload script that defines what operations to perform during each stage of the batchloading process. These mirror the basic steps in the operation of applying the data that is being batchloaded, as shown in Figure 5.3, “Batchloading: JavaScript”.

Figure 5.3. Batchloading: JavaScript


To summarize:

  • prepare() is called when the replicator goes online

  • begin() is called before a single transaction starts

  • apply() is called to copy and load the raw CSV data

  • commit() is called after the raw data has been loaded

  • release() is called when the replicator goes offline
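The following minimal skeleton illustrates the shape of such a script. It is a sketch only: the logger object and the csvinfo argument shown here are assumptions based on typical batchload scripts, and the script templates shipped with Tungsten Replicator should be treated as the authoritative reference.

// Called once when the replicator goes online.
function prepare()
{
    logger.info("batchload: prepare");
}

// Called before each transaction (batch) starts.
function begin()
{
}

// Called once per CSV file to copy and load the raw data.
// csvinfo is assumed to describe the CSV file and its target table.
function apply(csvinfo)
{
    logger.info("batchload: loading " + csvinfo.file.getAbsolutePath());
}

// Called after the raw data has been loaded, before the batch commits.
function commit()
{
}

// Called when the replicator goes offline.
function release()
{
}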

5.6.4.1. JavaScript Batchloader with Parallel Apply

The JavaScript batchloader can be used with parallel apply to enable multiple threads to be generated and apply data to the target database. This can be useful in datawarehouse environments where simultaneous loading (and commit) enables effective application of multiple table data into the datawarehouse.

  • The defined JavaScript methods like prepare, begin, commit, and release are called independently within each parallel environment. This means that you should ensure actions in these methods do not conflict with each other.

  • CSV files are divided across the scripts. If there are a large number of files that all take about the same time to load and there are three threads (parallelization=3), each individual load script will see about a third of the files. You should therefore not code assumptions that you have seen all tables or CSV files in a single script.

  • Parallel load scripts are only recommended for data sources, like Hadoop, that are idempotent. When applying to a data source that is non-idempotent (for example, MySQL or potentially Vertica), you should use a single thread.

5.6.5. Staging Tables

Staging tables are intermediate tables that help with data loading. There are different usage patterns for staging tables.

5.6.5.1. Staging Table Names

Tungsten assumes that staging tables, if present, follow certain conventions for naming and provides a number of configuration properties for generating staging table names that match the base tables in the data warehouse without colliding with them.

  • stageColumnPrefix — Prefix for the seqno, row_id, and opcode columns generated by Tungsten.

  • stageTablePrefix — Prefix for the stage table name.

  • stageSchemaPrefix — Prefix for the schema in which the stage tables reside.

These values are set in the static properties file that defines the replication service. They can be set at install time using --property options. The following example shows typical values from a service properties file.

replicator.applier.dbms.stageColumnPrefix=tungsten_
replicator.applier.dbms.stageTablePrefix=stage_xxx_
replicator.applier.dbms.stageSchemaPrefix=load_

If your data warehouse contains a table named foo in schema bar, these properties would result in a staging table named load_bar.stage_xxx_foo. The Tungsten-generated column containing the seqno, if present, would be named tungsten_seqno.

Note

Staging tables are by default in the same schema as the table they update. You can put them in a different schema using the stageSchemaPrefix property as shown in the example.

5.6.5.2. Whole Record Staging

Whole record staging loads the entire CSV file into an identical table, then runs queries to apply rows to the base table or tables in the data warehouse. One of the strengths of whole record staging is that it allows you to construct a merge script that can handle any combination of INSERT, UPDATE, or DELETE operations. A weakness is that whole record staging can result in sub-optimal I/O for workloads that consist mostly of INSERT operations.

For example, suppose we have a base table created by the following CREATE TABLE command:

CREATE TABLE `mydata` (
`id` int(11) NOT NULL,
`f_data` float DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

A whole record staging table would look as follows.

CREATE TABLE `stage_xxx_croc_mydata` (
`tungsten_opcode` char(1) DEFAULT NULL,
`tungsten_seqno` int(11) DEFAULT NULL,
`tungsten_row_id` int(11) DEFAULT NULL,
`id` int(11) NOT NULL,
`f_data` float DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Note that this table does not have a primary key defined. Most data warehouses do not use primary keys, and many of them do not even permit them in the CREATE TABLE syntax.

Note also that the non-primary columns must permit nulls. This is required for deletes, which contain only the Tungsten generated columns plus the primary key.
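As an illustration only, a simplified MySQL-style merge for this staging table could apply deletes first and then inserts. This sketch assumes the single-column key described earlier and ignores edge cases (for example, a row inserted and then deleted within the same batch) that the load scripts supplied with Tungsten handle correctly:

-- Remove any base rows whose keys appear as deletes in the batch;
-- this covers plain deletes and the delete half of each update.
DELETE FROM mydata
WHERE id IN (SELECT id FROM stage_xxx_croc_mydata WHERE tungsten_opcode = 'D');

-- Insert the new image of each row, in arrival order.
INSERT INTO mydata (id, f_data)
SELECT id, f_data
FROM stage_xxx_croc_mydata
WHERE tungsten_opcode = 'I'
ORDER BY tungsten_row_id;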

5.6.5.3. Delete Key Staging

Another approach is to load INSERT rows directly into the base data warehouse tables without staging. All you need to stage is the keys for deleted records. This reduces I/O considerably for workloads that have mostly inserts. The downside is that it may introduce ordering dependencies between DELETE and INSERT operations that require special handling by upstream applications to generate transactions that will load without conflicts.

Delete key staging tables can be as simple as the following example:

CREATE TABLE `stage_xxx_croc_mydata` (
`id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

5.6.5.4. Staging Table Generation

Tungsten does not generate staging tables automatically. Creation of staging tables is the responsibility of users, but the task can be simplified by using the ddlscan tool with the appropriate template.

5.6.6. Character Sets

Character sets are a headache in batch loading, because all updates are written to and read from CSV files, which can result in invalid transactions along the replication path. Such problems are very difficult to debug. Here are some tips to improve the chances of successful replication.

  • Use UTF8 character sets consistently for all string and text data.

  • Force Tungsten to convert data to Unicode rather than transferring strings, by adding the following option to the configuration:

    mysql-use-bytes-for-string=false
  • When starting the replicator for MySQL replication, include the following option in the configuration:

    java-file-encoding=UTF8
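Assuming an INI-based installation, both options could be added to the service stanza together and applied with tpm update; a minimal sketch:

[alpha]
...
mysql-use-bytes-for-string=false
java-file-encoding=UTF8

shell> tpm update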

5.6.7. Supported CSV Formats

Tungsten Replicator supports a number of CSV formats that can and should be used with specific heterogeneous environments when using the batch loading process, or generating CSV files in general for testing or loading.

A number of standard types are included, and the use of these standard types when generating CSV is controlled by the replicator.datasource.global.csvType property. Depending on the configured target, the corresponding type will be configured automatically. For example, if you configure a Vertica deployment, the replicator will be configured to default to the Vertica style CSV format.

Warning

Using the wrong CSV format with a given target may break replication. You should always use the appropriate CSV format for the defined target.

Table 5.1. Supported CSV Formats

Format   | Field Separator | Record Separator | Escape Sequence | Escaped Characters | Null Policy    | Null Value | Show Headers | Use Quotes | Quote String | Suppressed Characters
hive     | \u0001          | \n               | \\              | \u0001\\           | Use Null Value | \\N        | false        | false      |              | \n\r
mysql    | ,               | \n               | \\              | \\                 | Use Null Value | \\N        | false        | true       | \"           |
oracle   | ,               | \n               | \\              | \\                 | Use Null Value | \\N        | false        | true       | \"           |
vertica  | ,               | \n               | \\              |                    | Skip Value     |            | false        | true       | \"           | \n
redshift | ,               | \n               | \"              |                    | Skip Value     |            | false        | true       | \"           | \n

In addition to the standardized types, the replicator.datasource.global.csvType property can be set to custom, in which case the following configurable values are used instead:

  • replicator.datasource.global.csv.fieldSeparator — the character used to separate fields, such as , (comma).

  • replicator.datasource.global.csv.recordSeparator — the character used to separate records, such as the newline character.

  • replicator.datasource.global.csv.nullValue — the value to use for NULL (empty) values.

  • replicator.datasource.global.csv.useQuotes — whether to use quotes to encapsulate field values (specified using true or false).

  • replicator.datasource.global.csv.useHeaders — whether to include the column headers in the generated CSV (specified using true or false).
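For example, assuming an INI-based installation, a custom format resembling simple comma-separated output could be declared with property overrides such as the following (the values shown are illustrative):

property=replicator.datasource.global.csvType=custom
property=replicator.datasource.global.csv.fieldSeparator=,
property=replicator.datasource.global.csv.recordSeparator=\n
property=replicator.datasource.global.csv.nullValue=\\N
property=replicator.datasource.global.csv.useQuotes=true
property=replicator.datasource.global.csv.useHeaders=false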

5.6.8. Columns in Generated CSV Files

The CSV generated when using the batch loading process creates a number of special columns that are designed to hold the appropriate information for loading the staging data into the target system.

There are four fields supported:

  • opcode — The operation code, a one- or two-letter code indicating the operation type. For more information on the supported codes, see Section 5.6.9, “Batchloading Opcodes”.

  • seqno — Contains the current THL event (sequence) number for the row data being loaded; all rows generated from the same THL event share the same sequence number.

  • row_id — Contains a unique row ID (a monotonically incrementing number) which is unique to this CSV file for the table data being loaded. This can be useful for systems where the sequence number alone is not enough to identify an incoming row, even with the incoming primary key information.

  • commit_timestamp — the timestamp of when the data was originally committed by the source database, taken from the TIME within the THL event.

  • service — the service name of the replicator service that performed the loading and generated the CSV. This field is not enabled by default, but is provided to allow data concentration into a Big Data target, while enabling identification of the source service and/or database that generated the data.

These fields are placed before the actual data for the corresponding table. For example, with the default settings, the following CSV is generated; the last three columns are specific to the table data:

"I","74","1","2017-05-26 13:00:11.000","655337","Dr No","kat"

The configuration of the list of fields, and the order in which they appear, is controlled by the replicator.applier.dbms.stageColumnNames property. By default, the first four fields are used, in the order shown above:

replicator.applier.dbms.stageColumnNames=opcode,seqno,row_id,commit_timestamp

The actual names used (and passed to the JavaScript environment) are also controlled by another property, replicator.applier.dbms.stageColumnPrefix. This value is prepended to each column name within the JS environment, and is expected by the various tools. For example, with the default tungsten_, the true name of the opcode column is tungsten_opcode.
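Bearing in mind the warning below, the optional service field described earlier could, for example, be added by overriding the property:

property=replicator.applier.dbms.stageColumnNames=opcode,seqno,row_id,commit_timestamp,service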

Warning

Modifying the list of fields generated by the CSV writer may stop batchloading from working. Unless otherwise noted, the default batchloading scripts all expect to see the default four columns (opcode, seqno, row_id and commit_timestamp).

5.6.9. Batchloading Opcodes

The batchloading and CSV generation processes use the opcode value to specify the operation type for each row. The default mode is to use only the I and D codes for inserts and deletes respectively, with an update being represented as two rows: a delete followed by an insert of the new information.

This behavior can be altered to denote updates with a U character, with the row containing the updated information. To enable this mode, set the replicator.applier.dbms.useUpdateOpcode property to true.

It is also possible to identify situations where the incoming row data indicates a delete operation that resulted from an update (for example, in a cascade or related column), and an insert from an update. When this mode is enabled, the opcode becomes a two-character value, UD or UI respectively. To enable this option, set the replicator.applier.dbms.distinguishUpdates property to true.
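For example, subject to the warning below, both modes could be enabled together with the following configuration overrides:

property=replicator.applier.dbms.useUpdateOpcode=true
property=replicator.applier.dbms.distinguishUpdates=true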

Warning

Changing the default opcode modes may cause replication to fail. The default JavaScript batchloading scripts expect the default I and D notation, with updates implied through a delete and insert operation.

5.6.10. Time Zones

Time zones are another headache when using batch loading. For best results applications should standardize on a single time zone, preferably UTC, and use this consistently for all data. To ensure the Java VM outputs time data correctly to CSV files, you must set the JVM time zone to be the same as the standard time zone for your data. Here is the JVM setting in wrapper.conf:

# To ensure consistent handling of dates in heterogeneous and batch replication
# you should set the JVM timezone explicitly.  Otherwise the JVM will default
# to the platform time, which can result in unpredictable behavior when
# applying date values to Targets.  GMT is recommended to avoid inconsistencies.
wrapper.java.additional.5=-Duser.timezone=GMT

Note

Beware that MySQL has two very similar data types: TIMESTAMP and DATETIME. Timestamps are stored in UTC and convert back to local time on display. Datetimes by contrast do not convert back to local time. If you mix timezones and use both data types your time values will be inconsistent on loading.

5.6.11. Batch Loading into MySQL

Note

All the features discussed in this section are only available from version 6.1.15 of Tungsten Replicator onwards.

There are occasions where batch loading into MySQL may benefit your use case, such as loading large data warehouse environments, or where real-time replication isn't as critical.

A number of specific properties are available for MySQL targets; these are discussed below.

5.6.11.1. Configuring as an Offboard Batch Applier

By default, when loading into MySQL using the batch applier, the process executes LOAD DATA INFILE statements to load the CSV files into the database.

If you install the applier on a remote host, this action would typically fail; therefore, you need to enable the following property in the configuration:

property=replicator.applier.dbms.useLoadDataLocalInfile=true
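Note that loading with LOAD DATA LOCAL INFILE must also be permitted by the MySQL server itself; depending on your MySQL version this typically means enabling the local_infile setting, for example in my.cnf (verify against the documentation for your MySQL release):

[mysqld]
local_infile=1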

5.6.11.2. Drop Delete Statements

Tungsten Replicator includes a number of useful filters, such as the ability to drop certain DML statements on a schema or table level.

If you wish to drop such statements on a per-object basis, then you should continue to use the skipbyevent filter. However, if you want to drop ALL DELETE DML, then you can enable the following property:

property=replicator.applier.dbms.skipDeletes=true

Warning

By dropping deletes, you subsequently expose yourself to errors should rows be reinserted later with the same Primary or Unique Key values. Typically, this feature would only be enabled when you plan to capture and log key violations. See Section 5.6.11.6, “Log rows violating Primary/Unique Keys” for more information.

5.6.11.3. Configure CHARSET to use on Load

If you wish to specify a different CHARSET to be used when the data is being loaded into the target database, this can be set using the following property, for example:

property=replicator.applier.dbms.loadCharset=utf8mb4

5.6.11.4. Allow DDL Statements to execute

Typically, the batch loader is used for heterogeneous targets, and therefore by default DDL statements will be dropped. However, when applying into MySQL the DDL statements would be valid and can therefore be executed.

To enable this, you should set the following property:

property=replicator.applier.dbms.applyStatements=true

Warning

Any changes to existing tables, or creation of new tables, will only apply to the main base table. You will still need to manually make changes to the relevant staging and error tables (if used).

5.6.11.5. Disable Foreign Keys during load

If you use a lot of foreign keys in your target database then, due to the nature of batch loading, errors could occur when tables are not loaded in sequence, meaning child/parent keys may only be validated after a complete transaction load.

To prevent this from happening, you can enable the property below, which will force the batch loader to temporarily disable foreign key checks until after the full transaction has been loaded.

property=replicator.applier.dbms.disableForeignKeys=true

5.6.11.6. Log rows violating Primary/Unique Keys

To prevent the replicator from erroring on primary or unique key violations, you can instruct it to log the offending rows in an error table, which will allow you to manually process the rows afterwards.

This is especially useful when you are dropping DELETE statements from the apply process.

The following properties can be set to enable this:

property=replicator.applier.dbms.useUpdateOpcode=true
property=replicator.applier.dbms.batchLogDuplicateRows=true

By default, this feature will only check against PRIMARY keys. If you wish to also check against UNIQUE keys, you will need the additional property:

property=replicator.applier.dbms.fetchKeysFromDatabase=true

By default, the error rows will be logged into tables called error_xxx_origTableName.

These tables will need pre-creating in the same way that you create the staging tables using ddlscan, but supplying the table prefix, for example:

shell> ddlscan -db hr -template ddl-mysql-staging.vm -opt tablePrefix error_xxx_

You can choose a different prefix if you wish, by replacing the error_xxx with your choice in the above ddlscan statement. If you choose to do this, you will also need to supply the new prefix in your configuration using the following property:

property=replicator.applier.dbms.errorTablePrefix=your-prefix-here_

Warning

If you are loading tens of thousands of rows per transaction, and your target tables are very large, this could slow down the apply process, as the applier will first need to ensure that each row being inserted does not violate any keys. The use of this feature should be fully tested in a load test environment, and the risks fully understood, before using it in production.

5.6.12. Data File Partitioning

By default, the CSV files generated as part of the batchloading process are named according to the schema name, table name, and the starting transaction sequence number that generated the data in the file. For example, the table orders within the schema sales generating the transaction information from sequence numbers 110 through 145 would have the name sales-orders-110.csv.

Because the size of the files can be quite large, and because within different target environments (particularly Hadoop or when uploading to S3) the speed with which the data can be uploaded or organised within the target can be critical, the files can also be partitioned. This splits up the files generated by a chosen value such as the commit time or data value.

The primary solution for partitioning is the DateTime partitioner, which uses a configurable date/time value from the internal data structure as the basis for the partitioning.

To enable date-based partitioning, you must specify the properties during your configuration:

replicator.applier.dbms.partitionBy=tungsten_commit_timestamp
replicator.applier.dbms.partitionByClass=com.continuent.tungsten.replicator.applier.batch.DateTimeValuePartitioner
replicator.applier.dbms.partitionByFormat=yyyy-MM-dd-HH

The above sets the use of the tungsten_commit_timestamp field generated by the batchload CSV system as the basis of the value. The format specification is then used to specify the format of the date value that will be embedded into the file name. The date formatter uses the Java date format strings, and you can use one or more of the following values:

  • yy

    Year as two digit number

  • yyyy

    Year as four digit number

  • MM

    Month with leading zero

  • dd

    Day with leading zero

  • HH

    Hour in 24 hour format with leading zero

  • mm

    Minute with leading zero

  • ss

    Seconds with leading zero

For example, with the format set to yyyy-MM-dd-HH (the default), the name of the CSV file will be sales-orders-2018-04-03-12-199.csv. Note that the THL sequence number is still embedded in the filename (as the last item), as are the schema and table names.

Files generated will automatically be split by the configured value. Remember, however, that the commit timestamp is consistent for an individual transaction, so the data for a single transaction will never be split across multiple files, even if it takes time for the CSV file to be written; the key is the commit timestamp from the source database for the entire transaction that corresponds to the sequence number.

Chapter 6. Deployment: Security

Tungsten Replicator supports SSL, TLS and certificates for both communication and authentication. This security is enabled by default and includes:

  • Authentication between command-line tools (trepctl), and between background services.

  • SSL/TLS between command-line tools and background services.

  • SSL/TLS between Tungsten Replicator and datasources.

  • File permissions and access by all components.

The following graphic provides a visual representation of the various communication channels which may be encrypted.

Figure 6.1. Security Internals: Cluster Communication Channels


For the key to the above diagram, please see ???.

If you are using a single staging directory to handle your complete installation, tpm will automatically create the necessary certificates for you. If you are using an INI based installation, then the installation process will create the certificates for you; however, you will need to manually sync them between hosts prior to starting the various components.

Additionally, if you are configuring heterogeneous replication, there will be additional manual steps required to ensure SSL communication to your chosen target database.

Important

Due to a known issue in earlier Java revisions that may cause performance degradation with client connections, it is strongly advised that you ensure your Java version is one of the following MINIMUM releases before enabling SSL:

  • Oracle JRE 8 Build 261
  • OpenJDK 8 Build 222

6.1. Enabling Security

By default, security is enabled for the entire installation.

Security can be enabled/disabled by adding the disable-security-controls option to the configuration.

If this option is not supplied, or set to false, then security will be enabled. If set to true, then security will be disabled.

Enabling security through this single option has the same effect as adding:

  • --file-protection-level=0027

    Enables file level protection, including ownership and file mode settings.

  • --rmi-ssl=true

    Enables the use of SSL/TLS for communicating with services; this includes starting, stopping, or controlling individual services and operations, such as putting Tungsten Replicator online or offline.

  • --thl-ssl=true

    Enables the use of SSL/TLS for THL transmission between replicators.

  • --rmi-authentication=true

    Enables the use of authentication when accessing and controlling services.

Important

Installing from a staging host will automatically generate certificates and configuration for a secured installation. No further changes or actions are required.

For INI-based installations, there are additional steps required to copy the needed certificate files to all of the nodes. Please see Section 6.1.2, “Enabling Security using the INI Method” for details.

6.1.1. Enabling Security using the Staging Method

Security can be enabled either during initial installation or via an update.

For many reasons, it is much easier to enable SSL at install time. Both procedures follow below.

Enabling During Install

Security can be enabled at install time by using the --disable-security-controls=false option to the tpm configure command.

shell> tools/tpm configure defaults --disable-security-controls=false \
[...the rest of the configuration options...]
shell> tools/tpm install

Important

Installing from a staging host will automatically generate certificates and configuration for a secured installation. No further changes or actions are required.

Enabling Post-Installation

Security can be enabled after install time by using the --disable-security-controls=false option to the tpm configure command, followed by a special invocation of the tpm update command.

shell> tools/tpm configure defaults --disable-security-controls=false
shell> tools/tpm update --replace-jgroups-certificate --replace-tls-certificate --replace-release

Warning

This update will force replicator processes to be restarted.

6.1.2. Enabling Security using the INI Method

Security can be enabled either during initial installation or via an update.

For many reasons, it is much easier to enable SSL at install time. Both procedures follow below.

Enabling During Install

  • First, configure the tungsten.ini file as follows:

    
    disable-security-controls=false
    start-and-report=false
      
  • Next, do the fresh install on each node, which will generate new, different certificates on every node.

    shell> tools/tpm install
  • You must then select one of the nodes and copy that node's certificate files to all other nodes.

    For example, to seed a 6-node composite cluster, login to db1 and copy both the main and backup files to the other five nodes:

    shell> for i in `seq 2 6`; do scp /opt/continuent/share/[jpt]* db$i:/opt/continuent/share/; done
    shell> for i in `seq 2 6`; do scp /opt/continuent/share/.[jpt]* db$i:/opt/continuent/share/; done
  • On all nodes:

    shell> startall

Enabling Post-Installation

Security can be enabled after install time by updating the tungsten.ini file, followed by a special invocation of the tpm update command on all nodes.

  • First, configure the tungsten.ini file as follows:

    
    disable-security-controls=false
    start-and-report=false
      
  • Do the update on each node, which will generate new, different certificates on every node.

    Warning

    This update procedure will force replicators to be restarted.

    shell> stopall
    shell> tpm query staging
    shell> cd {staging_directory}
    shell> tools/tpm update --replace-jgroups-certificate --replace-tls-certificate --replace-release
  • As with a fresh install, you must then select one of the nodes and copy that node's certificate files to all other nodes:

    For example, to seed a 6-node composite cluster, login to db1 and copy both the main and backup files to the other five nodes:

    shell> for i in `seq 2 6`; do scp /opt/continuent/share/[jpt]* db$i:/opt/continuent/share/; done
    shell> for i in `seq 2 6`; do scp /opt/continuent/share/.[jpt]* db$i:/opt/continuent/share/; done
  • On all nodes:

    shell> startall

6.2. Disabling Security

There may be situations where you wish to disable security for the entire installation.

Security can be disabled in the following ways during configuration with tpm:

--disable-security-controls=true

Disabling security through this single option has the same effect as adding:

  • --file-protection-level=none

    Disables file level protection, including ownership and file mode settings.

  • --rmi-ssl=false

    Disables the use of SSL/TLS for communicating with services; this includes starting, stopping, or controlling individual services and operations, such as putting Tungsten Replicator online or offline.

  • --thl-ssl=false

    Disables the use of SSL/TLS for THL transmission between replicators.

  • --rmi-authentication=false

    Disables the use of authentication when accessing and controlling services.

6.3. Creating Suitable Certificates

By default, tpm can automatically create suitable certificates and configuration for use in your deployment. To create the required certificates by hand, use one of the following procedures.

6.3.1. Creating Tungsten Internal Certificates Manually

To manually generate the security files, use the steps below:

  • Generating a TLS Certificate

    Run this command to create the keystore in /etc/tungsten/secure. You may use your own location, but the values for -storepass and -keypass must match.

    shell> keytool -genkey -alias tls \
    -validity 3650 \
    -keyalg RSA -keystore /etc/tungsten/secure/tungsten_tls_keystore.jks \
    -dname "cn=Continuent, ou=IT, o=Continuent, c=US" \
    -storepass mykeystorepass -keypass mykeystorepass
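    To confirm that the keystore was created correctly, its contents can be listed; this verification step is optional and uses standard keytool behavior:

    shell> keytool -list -v \
    -keystore /etc/tungsten/secure/tungsten_tls_keystore.jks \
    -storepass mykeystorepass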

6.4. Installing from a Staging Host with Custom Certificates

Follow the steps in Section 6.3, “Creating Suitable Certificates” to create the TLS certificate.

Update your configuration to specify the certificate and the keystore password:

6.4.1. Installing from a Staging Host with Manually-Generated Certificates

shell> tools/tpm configure SERVICE \
    --java-tls-keystore-path=/etc/tungsten/secure/tungsten_tls_keystore.jks \
    --java-keystore-password=mykeystorepass

6.5. Installing via INI File with Custom Certificates

Follow the steps in Section 6.3, “Creating Suitable Certificates” to create the TLS certificate.

6.5.1. Installing via INI File with Manually-Generated Certificates

  1. Transfer the generated certificates to the same path on all hosts.

  2. Update your configuration to specify the certificate and the keystore password:

    java-tls-keystore-path=/etc/tungsten/secure/tungsten_tls_keystore.jks
    java-keystore-password=mykeystorepass

6.6. Installing via INI File with CA-Signed Certificates

  • This procedure will take a signed certificate from a known Certificate Authority and use it as the basis for all SSL operations within the replicator.

  • The below example procedure assumes that you have an existing, installed and running Primary/Replica topology with security enabled by setting disable-security-controls=false

    Assume a simple topology with member hosts db1 and db2.

    Warning

    In all examples below, because you are updating an existing secure installation, the password tungsten is required; do not change it.

  • Select one node to create the proper set of certs, i.e. db1:

    shell> su - tungsten
    shell> mkdir /etc/tungsten/secure
    shell> mkdir ~/certs
    shell> cd ~/certs
  • Copy the available files (CA cert, Intermediate cert (if needed), signed cert and signing key) into ~/certs/, i.e.:

    ca.crt.pem
    int.crt.pem
    signed.crt.pem
    signing.key.pem
  • Create a pkcs12 (.p12) version of the signed certificate:

    shell> openssl pkcs12 -export -in ~/certs/signed.crt.pem -inkey ~/certs/signing.key.pem \
      -out ~/certs/tungsten_sec.crt.p12 -name replserver
    Enter Export Password: tungsten
    Verifying - Enter Export Password: tungsten

    Important

    When using OpenSSL 3.0 with Java 1.8, you MUST add the -legacy option to the openssl command.

  • Create a pkcs12-based keystore (.jks) version of the signed certificate:

    shell> keytool -importkeystore -deststorepass tungsten -destkeystore /etc/tungsten/secure/tungsten_keystore.jks \
      -srckeystore ~/certs/tungsten_sec.crt.p12 -srcstoretype pkcs12 -deststoretype pkcs12
    Importing keystore /home/tungsten/certs/tungsten_sec.crt.p12 to /etc/tungsten/secure/tungsten_keystore.jks...
    Enter source keystore password:  tungsten
    Entry for alias replserver successfully imported.
    Import command completed:  1 entries successfully imported, 0 entries failed or cancelled
  • Import the Certificate Authority's certificate into the keystore:

    shell> keytool -import -alias careplserver -file ~/certs/ca.crt.pem -keypass tungsten \
      -keystore /etc/tungsten/secure/tungsten_keystore.jks -storepass tungsten
    ...
    Trust this certificate? [no]:  yes
    Certificate was added to keystore
  • Import the Certificate Authority's intermediate certificate (if supplied) into the keystore:

    shell> keytool -import -alias interreplserver -file ~/certs/int.crt.pem -keypass tungsten \
      -keystore /etc/tungsten/secure/tungsten_keystore.jks -storepass tungsten
    Certificate was added to keystore
  • Export the cert from the keystore into file client.cer for use in the next step to create the truststore:

    shell> keytool -export -alias replserver -file ~/certs/client.cer \
      -keystore /etc/tungsten/secure/tungsten_keystore.jks
    Enter keystore password:  tungsten
    Certificate stored in file </home/tungsten/certs/client.cer>
  • Create the truststore:

    shell> keytool -import -trustcacerts -alias replserver -file ~/certs/client.cer \
      -keystore /etc/tungsten/secure/tungsten_truststore.ts -storepass tungsten -noprompt
    Certificate was added to keystore
  • Create the rmi_jmx password store entry:

    shell> tpasswd -c tungsten tungsten -t rmi_jmx -p /etc/tungsten/secure/passwords.store -e \
      -ts /etc/tungsten/secure/tungsten_truststore.ts -tsp tungsten
    Using parameters: 
    -----------------
    security.properties 		 = /opt/continuent/tungsten/cluster-home/../cluster-home/conf/security.properties
    password.file.location 		 = /etc/tungsten/secure/passwords.store
    encrypted.password 		 = true
    truststore.location 		 = /etc/tungsten/secure/tungsten_truststore.ts
    truststore.password 		 = *********
    -----------------
    Creating non existing file: /etc/tungsten/secure/passwords.store
    User created successfuly: tungsten
  • Create the tls password store entry:

    shell> tpasswd -c tungsten tungsten -t unknown -p /etc/tungsten/secure/passwords.store -e \
      -ts /etc/tungsten/secure/tungsten_truststore.ts -tsp tungsten
    Using parameters: 
    -----------------
    security.properties 		 = /opt/continuent/tungsten/cluster-home/../cluster-home/conf/security.properties
    password.file.location 		 = /etc/tungsten/secure/passwords.store
    encrypted.password 		 = true
    truststore.location 		 = /etc/tungsten/secure/tungsten_truststore.ts
    truststore.password 		 = ********
    -----------------
    User created successfuly: tungsten
  • List and verify the user for each security service password store entry, rmi_jmx and tls (which has a display tag of unknown):

    shell> tpasswd -l -p /etc/tungsten/secure/passwords.store -ts /etc/tungsten/secure/tungsten_truststore.ts
    Using parameters: 
    -----------------
    security.properties 		 = /opt/continuent/tungsten/cluster-home/../cluster-home/conf/security.properties
    password.file.location 		 = ./passwords.store
    encrypted.password 		 = true
    truststore.location 		 = ./tungsten_truststore.ts
    truststore.password 		 = ********
    -----------------
    Listing users by application type:
    
    [unknown]
    -----------
    tungsten
    
    [rmi_jmx]
    -----------
    tungsten
  • On host db1, transfer the generated certificates to the same path on all remaining hosts:

    shell> for host in `seq 2 3`; do rsync -av /etc/tungsten/secure/ db$host:/etc/tungsten/secure/; done
  • Edit the /etc/tungsten/tungsten.ini configuration file on all nodes and add:

    [defaults]
    ...
    disable-security-controls=false
    java-keystore-path=/etc/tungsten/secure/tungsten_keystore.jks
    java-keystore-password=tungsten
    java-truststore-path=/etc/tungsten/secure/tungsten_truststore.ts
    java-truststore-password=tungsten
    rmi-ssl=true
    rmi-authentication=true
    rmi-user=tungsten
    java-passwordstore-path=/etc/tungsten/secure/passwords.store
    

    Important

    When java-keystore-path is passed to tpm, the keystore must contain both tls and mysql certs when appropriate. tpm will NOT add the mysql cert nor generate the tls cert when this flag is found, so both certs must already have been manually imported.

  • On ALL nodes, stop the replicator software, execute the update, then start the replicators:

    Warning

    This procedure requires the complete restart of all layers of the Cluster, and will cause a brief downtime.

    shell> tpm query staging
    shell> cd {staging_dir}
    shell> stopall
    shell> tools/tpm update --replace-release
    shell> startall

6.7. Replacing the JGroups Certificate from a Staging Directory

If you meet the requirements to use an automatically generated certificate from the staging directory, the tpm update command can handle the certificate replacement. Simply add the --replace-jgroups-certificate option to your command. This will create errors if your staging configuration does not reflect the full list of hosts or if you limit the command to a specific host.

shell> tools/tpm update --replace-jgroups-certificate --replace-release

If you do not meet these requirements, generate a new certificate and update it through the tpm command.

shell> tools/tpm configure SERVICE \
      --java-jgroups-keystore-path=/etc/tungsten/jgroups.jceks \
      --java-keystore-password=mykeystorepass

Then perform an update and replace the entire release directory:

shell> tools/tpm update --replace-release

6.8. Replacing the TLS Certificate from a Staging Directory

If you meet the requirements to use an automatically generated certificate from the staging directory, the tpm update command can handle the certificate replacement. Simply add the --replace-tls-certificate option to your command. This will create errors if your staging configuration does not reflect the full list of hosts or if you limit the command to a specific host.

shell> tools/tpm update --replace-tls-certificate --replace-release

If you do not meet these requirements, generate a new certificate and update it through the tpm command.

shell> tools/tpm configure SERVICE \
        --java-tls-keystore-path=/etc/tungsten/tls.jks \
        --java-keystore-password=mykeystorepass

Then perform an update and replace the entire release directory:

shell> tools/tpm update --replace-release

6.9. Removing TLS Encryption from a Staging Directory

Using the tpm update command, the general Continuent service encryption can be easily removed.

shell> tpm configure SERVICE \
   --thl-ssl=false \
   --rmi-ssl=false \
   --rmi-authentication=false

Then perform an update and replace the entire release directory:

shell> tpm update --replace-release

6.10. Enabling Tungsten<>Database Security

This section explains how to enable security between the database and various other parts of the topology.

6.10.1. Enabling Database SSL

The steps outlined below explain how to enable security within MySQL (if it is not already enabled by default in the release you are using). There are different approaches depending on the version/distribution of MySQL you are using. If in any doubt, you should consult the appropriate documentation pages for the MySQL release you are using.

6.10.1.1. Using mysql_ssl_rsa_setup utility

This tool is shipped with MySQL 5.7 onwards and makes the creation of all of the certificates much easier. If you have this tool available, then you can follow the steps below:

  • Invoke mysql_ssl_rsa_setup on one of the hosts. This will generate the SSL certificates and RSA keys by default in /var/lib/mysql. These files should be copied to the other hosts.

    The mysql_ssl_rsa_setup tool supports the --datadir=/my/custom/path/ option if you want to use a different location. Continuent recommends using /etc/mysql/certs as the location.

    Note

    The generated pem files should be readable by the tungsten and mysql OS users.

  • Add the following to the [mysqld] stanza in your my.cnf

    [mysqld]
    ssl_ca=/etc/mysql/certs/ca.pem
    ssl_cert=/etc/mysql/certs/server-cert.pem
    ssl_key=/etc/mysql/certs/server-key.pem
    require_secure_transport=ON

  • Add the following to the [client] stanza in your my.cnf

    [client]
    ssl_ca=/etc/mysql/certs/ca.pem
    ssl_cert=/etc/mysql/certs/client-cert.pem
    ssl_key=/etc/mysql/certs/client-key.pem
    ssl_mode=REQUIRED

    This will allow the mysql client to connect to the server through SSL.

  • tpm will parse the my.cnf file and retrieve the certificate paths. It is still possible to specify different paths via the following tungsten.ini settings:

    repl-datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem
    repl-datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem
    repl-datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem

    tpm install will add these client certificates to the Tungsten truststore and keystore.

6.10.1.2. Manually Creating Certificates

Important

The "Common Name" field for the Server and Client certificates MUST be different than the "Common Name" specified for the CA Cert.

  1. Generate CA Cert

    shell> openssl genrsa 2048 > $MYSQL_CERTS_PATH/ca-key.pem
    
    shell> openssl req -sha256 -new -x509 -nodes -days 3650 \
    -key $MYSQL_CERTS_PATH/ca-key.pem \
    -out $MYSQL_CERTS_PATH/ca.pem
  2. Generate Server Cert

    shell> openssl req -sha256 -newkey rsa:2048 -nodes -days 3650 \
    -keyout $MYSQL_CERTS_PATH/server-key.pem \
    -out $MYSQL_CERTS_PATH/server-req.pem
    
    shell> openssl rsa -in $MYSQL_CERTS_PATH/server-key.pem -out $MYSQL_CERTS_PATH/server-key.pem
    
    shell> openssl x509 -sha256 -req -in $MYSQL_CERTS_PATH/server-req.pem -days 3650 \
    -CA $MYSQL_CERTS_PATH/ca.pem \
    -CAkey $MYSQL_CERTS_PATH/ca-key.pem \
    -set_serial 01 \
    -out $MYSQL_CERTS_PATH/server-cert.pem
  3. Generate Client Cert

    shell> openssl req -sha256 -newkey rsa:2048 -days 3600 -nodes \
    -keyout $MYSQL_CERTS_PATH/client-key.pem \
    -out $MYSQL_CERTS_PATH/client-req.pem
    
    shell> openssl rsa -in $MYSQL_CERTS_PATH/client-key.pem -out $MYSQL_CERTS_PATH/client-key.pem
    
    shell> openssl x509 -sha256 -req -in $MYSQL_CERTS_PATH/client-req.pem -days 3650 \
    -CA $MYSQL_CERTS_PATH/ca.pem \
    -CAkey $MYSQL_CERTS_PATH/ca-key.pem \
    -set_serial 01 \
    -out $MYSQL_CERTS_PATH/client-cert.pem
  4. Verify All Certificates

    shell> openssl verify -CAfile $MYSQL_CERTS_PATH/ca.pem \
    $MYSQL_CERTS_PATH/server-cert.pem $MYSQL_CERTS_PATH/client-cert.pem
  5. Copy certs to all Database nodes (repeat as needed so that every Database node has the same certificates)

    shell> rsync -av $MYSQL_CERTS_PATH/ yourDBhost:$MYSQL_CERTS_PATH/
  6. Set proper ownership and permissions on ALL DB nodes

    shell> sudo chown -R mysql: $MYSQL_CERTS_PATH/
    shell> sudo chmod -R g+w $MYSQL_CERTS_PATH/
  7. Update the my.cnf file to include the SSL certificates you just created (add three lines to the [mysqld] stanza)

    shell> vi /etc/my.cnf
    [mysqld]
    ...
    port=13306
    # add three lines for SSL support
    ssl-ca=/etc/mysql/certs/ca.pem
    ssl-cert=/etc/mysql/certs/server-cert.pem
    ssl-key=/etc/mysql/certs/server-key.pem
    ...
  8. Restart MySQL on all nodes using the standard rolling maintenance procedure - see Section 7.13.3, “Performing Maintenance on an Entire Dataservice” for more information.

    cctrl> ls
    cctrl> datasource db3 shun
    db3# service mysql restart
    cctrl> recover
    
    cctrl> datasource db2 shun
    db2# service mysql restart
    cctrl> recover
    
    cctrl> switch to db2
    
    cctrl> datasource db1 shun
    db1# service mysql restart
    cctrl> recover
    
    cctrl> switch to db1
    cctrl> ls
  9. Add a new user to MySQL that requires SSL to connect. Do this just once on the current Primary and let it propagate to the Replicas.

    shell> tpm mysql
    mysql> DROP USER ssl_user;
    mysql> CREATE USER ssl_user@'%' IDENTIFIED BY 'secret';
    mysql> GRANT ALL ON *.* TO ssl_user@'%' REQUIRE SSL WITH GRANT OPTION;
    mysql> flush privileges;
  10. Verify that MySQL is working with SSL

    1. Expect this to fail, because the ssl_user is only allowed to connect to the database using SSL:

      shell> mysql -u ssl_user -psecret -h 127.0.0.1 -P 13306
    2. Expect this to pass, because we have supplied the proper SSL credentials:

      shell> mysql -u ssl_user -psecret -h 127.0.0.1 -P 13306 --ssl-ca=/etc/mysql/certs/ca.pem
    3. Verify SSL:

      mysql> status
      ...
      SSL:  Cipher in use is DHE-RSA-AES256-SHA
      ...

Important

If you are able to login to MySQL and see that the status is SSL: Cipher in use, then you have successfully configured MySQL to use SSL.

6.10.1.3. Enabling Database Level SSL with Amazon AWS Aurora

To enable Tungsten Replicator to communicate with Amazon Aurora, via SSL, the following simple steps can be followed.

  • Obtain the certificate from Amazon appropriate for the region in which your Aurora instance is hosted. More information can be found in the appropriate Amazon RDS documentation.

  • Copy the file to the Tungsten Replicator host into a directory of your choice.

  • Add the following properties to your configuration. (In this example our certificate is within /opt/continuent/share. Adjust to suit your environment)

    property=replicator.datasource.global.connectionSpec.urlOptions=noPrepStmtCache=true&
    » serverCertificate=/opt/continuent/share/rds-ca-2019-eu-west-1.pem
    datasource-enable-ssl=true

  • You can now install or, if the replicator was already installed, issue an update.

6.10.2. Configure Tungsten<>Database Secure Communication

If you choose to enable database level SSL within your MySQL installation, there are a number of additional steps required to allow the Replicators to be able to communicate to the database layer.

The steps below make the following assumptions:

  • You have enabled SSL using the correct procedures for your distribution of MySQL. If not, refer to Section 6.10.1, “Enabling Database SSL”.

  • You have generated, and have access to, the client level certificates and keys

  • If you are installing an Offboard extractor/applier, the client certificates and keys have been copied to the extractor/applier hosts

  1. If SSL has been enabled within the Tungsten installation, then you should have the following parameter enabled within your configuration:

    disable-security-controls=false

    As a result, you should have a number of files within /opt/continuent/share

    shell> ls -l
    total 20
    -rw-rw-r-- 1 tungsten tungsten  104 Jul 18 10:15 jmxremote.access
    -rw-rw-r-- 1 tungsten tungsten  729 Jul 18 10:15 passwords.store
    -rw-rw-r-- 1 tungsten tungsten 2268 Jul 18 10:15 tungsten_keystore.jks
    -rw-rw-r-- 1 tungsten tungsten 1079 Jul 18 10:15 tungsten_truststore.ts

    Note

    It's important to understand that the parameter above ONLY enables SSL between the various Tungsten components.

    If this is the case, skip the next step and move on to step 3.

  2. If you do not have SSL enabled within the installation and you require this, then follow the steps in Section 6.1, “Enabling Security” first

  3. If you do not require SSL between the Replicators, and only require SSL between the replicator and the database, then add the following parameters to your configuration, but do not run tpm update yet.

    java-truststore-path=/home/tungsten/tungsten_truststore.ts
    java-truststore-password=tungsten
    java-keystore-path=/home/tungsten/tungsten_keystore.jks
  4. Next, add the following parameters to your installation, but do not run tpm update yet:

    datasource-enable-ssl=true
  5. You now need to convert the mysql client key to PKCS12 format. Adjust the path and filename in the example to suit your environment

    shell> openssl pkcs12 -export -in /home/tungsten/client-cert.pem \
    -inkey /home/tungsten/client-key.pem \
    -name mysql -out /home/tungsten/client-key.p12

    Important

    When prompted for a password, you MUST enter tungsten

    Important

    When using OpenSSL 3.0 with Java 1.8, you MUST add the -legacy option to the openssl command.

  6. You now need to import the key, either into the existing keystore if it exists, or into a new one if SSL is not being enabled at the replicator level.

    If Tungsten level SSL has been enabled

    shell> keytool -importkeystore -deststorepass tungsten \
    -destkeystore /opt/continuent/share/tungsten_keystore.jks \
    -srckeystore /home/tungsten/client-key.p12 -srcstoretype PKCS12

    If ONLY Database SSL is required

    shell> keytool -importkeystore -deststorepass tungsten \
    -destkeystore /home/tungsten/tungsten_keystore.jks \
    -srckeystore /home/tungsten/client-key.p12 -srcstoretype PKCS12

    When prompted for a password, enter tungsten

  7. Next, import the client certificate into the truststore

    If Tungsten level SSL has been enabled

    shell> keytool -import -alias mysql -trustcacerts -file /home/tungsten/ca.pem \
    -keystore /opt/continuent/share/tungsten_truststore.ts

    If ONLY Database SSL is required

    shell> keytool -import -alias mysql -trustcacerts -file /home/tungsten/ca.pem \
    -keystore /home/tungsten/tungsten_truststore.ts

    When prompted for a password, enter tungsten

  8. Finally, and only if Tungsten level SSL has been enabled, we need to create backup copies of the keystore and truststore as follows:

    shell> cp /opt/continuent/share/tungsten_truststore.ts /opt/continuent/share/.tungsten_truststore.ts.orig
    shell> cp /opt/continuent/share/tungsten_keystore.jks /opt/continuent/share/.tungsten_keystore.jks.orig
  9. Issue tpm update to apply the configuration.

The replicators will be restarted as part of the update process, and should now be using SSL to connect successfully to MySQL.

Chapter 7. Operations Guide

Table of Contents

7.1. The Home Directory
7.2. Establishing the Shell Environment
7.3. Understanding Replicator Roles
7.4. Checking Replication Status
7.4.1. Understanding Replicator States
7.4.2. Replicator States During Operations
7.4.3. Changing Replicator States
7.5. Managing Transaction Failures
7.5.1. Identifying a Transaction Mismatch
7.5.2. Skipping Transactions
7.6. Provision or Reprovision a Replica
7.7. Creating a Backup
7.7.1. Using a Different Backup Tool
7.7.2. Using a Different Directory Location
7.7.3. Creating an External Backup
7.8. Restoring a Backup
7.8.1. Restoring a Specific Backup
7.8.2. Restoring an External Backup
7.8.3. Restoring from Another Replica
7.8.4. Manually Recovering from Another Replica
7.8.5. Reprovision a MySQL Replica using rsync
7.9. Deploying Automatic Replicator Recovery
7.10. Migrating and Seeding Data
7.10.1. Migrating from MySQL Native Replication 'In-Place'
7.10.2. Seeding Data for Heterogeneous Replication
7.10.2.1. Seeding Data from a Standalone Source
7.10.2.2. Seeding Data from a Cluster, for a Cluster-Extractor Target
7.11. Switching Primary Hosts
7.12. Configuring Parallel Replication
7.13. Performing Database or OS Maintenance
7.13.1. Performing Maintenance on a Single Replica
7.13.2. Performing Maintenance on a Primary
7.13.3. Performing Maintenance on an Entire Dataservice
7.13.4. Upgrading or Updating your JVM
7.14. Upgrading Tungsten Replicator
7.14.1. Upgrading Tungsten Replicator using tpm
7.14.2. Installing an Upgraded JAR Patch
7.14.3. Installing Patches
7.14.4. Upgrading to v7.0.0+
7.14.4.1. Background
7.14.4.2. Upgrade Decisions
7.14.4.3. Setup internal encryption and authentication
7.14.4.4. Enable Tungsten to Database Encryption
7.14.4.5. Enable MySQL SSL
7.14.4.6. Steps to upgrade using tpm
7.14.4.7. Optional Post-Upgrade steps to configure API
7.15. Monitoring Tungsten Cluster
7.15.1. Managing Log Files with logrotate
7.15.2. Monitoring Status Using cacti
7.15.3. Monitoring Status Using nagios
7.16. Rebuilding THL on the Primary

There are a number of key operations that enable you to monitor and manage your replication cluster. Tungsten Replicator includes a small number of tools that can help with this process, including the core trepctl command, for controlling the replication system, and thl, which provides an interface to the Tungsten History Log and information about the changes that have been recorded to the log and distributed to the Targets.

During the installation process the file /opt/continuent/share/env.sh will have been created which will seed the shell with the necessary $PATH and other details to more easily manage your cluster. You can load this script manually using:

shell> source /opt/continuent/share/env.sh

Once loaded, all of the tools for controlling and monitoring your replicator installation should be part of your standard PATH.
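
To load the environment automatically in future login sessions, you can append the source command to your shell profile. A minimal example, assuming a bash login shell and the default installation directory:

shell> echo "source /opt/continuent/share/env.sh" >> ~/.bashrc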

7.1. The Home Directory

After installing Tungsten Replicator the home directory will contain a set of new directories. The home directory is specified by --home-directory or --install-directory. If you have multiple installations on a single server, each directory will include the same entries.

  • tungsten - A symlink to the most recent version of the software. The symlink points into the releases directory. You should always use the symlink to ensure the most recent configuration and software is used.

  • releases - Storage for the current and previous versions of the software. During an upgrade the new software will be copied into this directory and the tungsten symlink will be updated. See Section D.1.2, “The releases Directory” for more information.

  • service_logs - Includes symlinks to the primary log for the replicator, manager and connector. This directory also includes logs for other tools distributed for Tungsten Cluster.

  • backups - Storage for backup files created through trepctl. See Section D.1.1, “The backups Directory” for more information.

  • thl - Storage for THL files created by the replicator. Each replication service gets a dedicated sub-directory for storing THL files. See Section D.1.5, “The thl Directory” for more information.

  • relay - Temporary storage for downloaded MySQL binary logs before they are converted into THL files.

  • share - Storage for files that must persist between different software versions. The env.sh script will setup your shell environment to allow easy access to Tungsten Cluster tools.

7.2. Establishing the Shell Environment

The tools required to operate Tungsten Cluster are located in many directories around the home directory. The best way to access them is by setting up your shell environment.

The env.sh file will automatically be included if you specify the --profile-script during installation. This option may be included during a configuration change with tpm update.

If the env.sh file hasn't been included you may do so by hand with source.

shell> source /opt/continuent/share/env.sh

Important

Special consideration must be taken if you have multiple installations on a single server. This applies when running clustering and replication together, or when running multiple replicators.

Include the --executable-prefix and --profile-script options in your configuration. Instead of extending the $PATH variable, the env.sh script will define aliases for each command. If you specified --executable-prefix=mm, the trepctl command would be accessed as mm_trepctl.
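
For example, with --executable-prefix=mm configured, you would check the replicator services through the alias:

shell> mm_trepctl services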

7.3. Understanding Replicator Roles

Replicators can have one of four roles: Extractor (master), Applier (slave), thl-server, or thl-client.

  • master

    A replicator in a master role extracts data from a source database (for example, by reading the binary log from a MySQL server), and generates THL. As a master the replicator also provides the THL to other replicators over the network connection.

  • slave

    A slave replicator pulls THL data from a master and then applies that data to a target database.

  • thl-server

    A thl-server replicator is a special role that Extractor replicators can be changed to temporarily when a Primary is taken offline. This will allow downstream Applier replicators to download and apply any THL that hasn't yet been processed by the Applier.

    To enable this role, issue the following commands:

    shell> trepctl offline
    shell> trepctl setrole -role thl-server
    shell> trepctl online

    To revert back to the original Extractor role, issue the following

    shell> trepctl offline
    shell> trepctl setrole -role master
    shell> trepctl online

  • thl-client

    A thl-client replicator is a special role that Applier replicators can be changed to. This will allow the Applier replicator to download any THL available from the upstream Extractor, but does NOT apply the THL to the target database.

    To enable this role, issue the following commands:

    shell> trepctl offline
    shell> trepctl setrole -role thl-client
    shell> trepctl online

    To revert back to the original Applier role, issue the following

    shell> trepctl offline
    shell> trepctl setrole -role slave
    shell> trepctl online
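
Whichever role is in use, the active role can be confirmed from the status output. For example, on an Extractor that has been temporarily switched to serving THL only:

shell> trepctl status | grep role
role                   : thl-server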

7.4. Checking Replication Status

To check the replication status you can use the trepctl command. This accepts a number of command-specific verbs that provide status and control information for your configured cluster. The basic format of the command is:

shell> trepctl [-host hostname] command

The -host option is not required, and enables you to check the status of a different host than the current node.

To get basic information about the currently configured services on a node and their current status, use the services command:

shell> trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 211
appliedLatency  : 17.66
role            : slave
serviceName     : firstrep
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

In the above example, the output shows the status of the host, in this case an Applier, relative to the Extractor from which it is processing information. The appliedLastSeqno shows the last sequence number applied, and the appliedLatency shows that 17.66 seconds elapsed between that transaction committing on the Extractor and being applied to the Target. You can compare this information to that provided by the Extractor, either by logging into the Extractor and running the same command, or by using the host command-line option:

shell> trepctl -host host1 services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 365
appliedLatency  : 0.614
role            : master
serviceName     : firstrep
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

By comparing the appliedLastSeqno for the Extractor against the value on the Applier, it is possible to determine that the Applier and the Extractor are not yet synchronized.

For a more detailed output of the current status, use the status command, which provides much more detailed output of the current replication status:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000064:0000000002757461;0
appliedLastSeqno       : 212
appliedLatency         : 263.43
channels               : 1
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1365082088916
dataServerHost         : host2
extensions             : 
latestEpochNumber      : 0
masterConnectUri       : thl://host1:2112/
masterListenUri        : thl://host2:2112/
maximumStoredSeqNo     : 724
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : thl://host1:2112/
relativeLatency        : 655.915
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : firstrep
serviceType            : local
simpleServiceName      : firstrep
siteName               : default
sourceId               : host2
state                  : ONLINE
timeInStateSeconds     : 893.32
uptimeSeconds          : 9370.031
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

Similar to the host specification, trepctl provides information for the default service. If you have installed multiple services, you must specify the service explicitly:

shell> trepctl -service servicename status

If the service has been configured to operate on an alternative management port, this can be specified using the -port option. The default is to use port 10000.

The above command was executed on the Target host, host2. Some key parameter values from the generated output:

  • appliedLastEventId

    This shows the last event from the source event stream that was applied to the database. In this case, the output shows that the source of the data was a MySQL binary log. The portion before the colon, mysql-bin.000064, is the filename of the binary log on the Source. The portion after the colon is the physical location, in bytes, within the binary log file.

  • appliedLastSeqno

    The last sequence number for the transaction from the Tungsten stage that has been applied to the database. This indicates the last actual transaction information written into the Target database.

    When using parallel replication, this parameter returns the minimum applied sequence number among all the channels applying data.

  • appliedLatency

    The appliedLatency is the latency between the commit time and the time the last committed transaction reached the end of the corresponding pipeline within the replicator.

    In replicators that are operating with parallel apply, appliedLatency indicates the latency of the trailing channel. Because the parallel apply mechanism does not update all channels simultaneously, the figure shown may trail significantly from the actual latency.

  • masterConnectUri

    On an Extractor, the value will be empty.

    On an Applier, the URI of the Extractor Tungsten Replicator from which the transaction data is being read. The value supports multiple comma-separated URIs for topologies with multiple Sources.

  • maximumStoredSeqNo

    The maximum transaction ID that has been stored locally on the machine in the THL. Because Tungsten Replicator operates in stages, it is sometimes important to compare the sequence and latency between information being read from the source into the THL, and then from the THL into the database. You can compare this value to the appliedLastSeqno, which indicates the last sequence committed to the database.

  • pipelineSource

    Indicates the source of the information that is written into the THL. For an Extractor, pipelineSource is the MySQL binary log. For an Applier, pipelineSource is the THL of the Extractor.

  • relativeLatency

    The relativeLatency is the latency between now and the timestamp of the last event written into the local THL. An increasing relativeLatency indicates that the replicator may have stalled and stopped applying changes to the dataserver.

  • state

    Shows the current status for this node. In the event of a failure, the status will indicate that the node is in a state other than ONLINE. The timeInStateSeconds will indicate how long the node has been in that state, and therefore how long the node may have been down or unavailable.

The easiest method to check the health of your replication is to compare the current sequence numbers and latencies for each Applier compared to the Extractor. For example:

shell> trepctl -host host2 status|grep applied
appliedLastEventId     : mysql-bin.000076:0000000087725114;0
appliedLastSeqno       : 2445
appliedLatency         : 252.0
...
shell> trepctl -host host1 status|grep applied
appliedLastEventId     : mysql-bin.000076:0000000087725114;0
appliedLastSeqno       : 2445
appliedLatency         : 2.515

Note

For parallel replication and complex multi-service replication structures, there are additional parameters and information to consider when checking and confirming the health of the cluster.

The above indicates that the two hosts are up to date, but that there is a significant latency on the Applier for performing updates.
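
To repeat this health check across several hosts in a single pass, the -host option can be combined with a simple shell loop. A sketch, with illustrative host names:

shell> for h in host1 host2 host3; do \
    echo "### $h"; \
    trepctl -host $h status | egrep 'appliedLastSeqno|appliedLatency|^state'; \
    done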

7.4.1. Understanding Replicator States

Each node within the cluster will have a specific state that indicates whether the node is up and running and servicing requests, or whether there is a fault or problem. Understanding these states will enable you to clearly identify the current operational status of your nodes and cluster as a whole.

A list of the possible states for the replicator includes:

  • START

    The replicator service is starting up and reading the replicator properties configuration file.

  • OFFLINE:NORMAL

    The node has been deliberately placed into the offline mode by an administrator. No replication events are processed, and reading or writing to the underlying database does not take place.

  • OFFLINE:ERROR

    The node has entered the offline state because of an error. No replication events are processed, and reading or writing to the underlying database does not take place.

  • GOING-ONLINE:PROVISIONING

    The replicator is currently reading provisioning information from the Source database before entering the ONLINE state.

  • GOING-ONLINE:RESTORING

    The replicator is preparing to go online and is currently restoring data from a backup.

  • GOING-ONLINE:SYNCHRONIZING

    The replicator is preparing to go online and is currently preparing to process any outstanding events from the incoming event stream. This mode occurs when an Applier has been switched online after maintenance, or in the event of a temporary network error where the Applier has reconnected to the Extractor.

  • ONLINE

    The node is currently online and processing events, reading incoming data and applying those changes to the database as required. In this mode the current status and position within the replication stream is recorded and can be monitored. Replication will continue until an error or administrative condition switches the node into the OFFLINE state.

  • GOING-OFFLINE

    The replicator is processing any outstanding events or transactions that were in progress when the node was switched offline. When these transactions are complete, and the resources in use (memory, network connections) have been closed down, the replicator will switch to the OFFLINE:NORMAL state. This state may also be seen in a node where auto-enable is disabled after a start or restart operation.

  • ONLINE:DEGRADED

    This status will be seen on an Extractor replicator and indicates that the replicator has lost connectivity to the Source database that it is extracting from. The replicator will continue to extract entries from the binary log that have not yet been processed. After extracting all log entries, the replicator will proceed to the ONLINE:DEGRADED-BINLOG-FULLY-READ state.

  • ONLINE:DEGRADED-BINLOG-FULLY-READ

    This status will be seen on an Extractor replicator following the ONLINE:DEGRADED state and indicates that the replicator has completed reading all binlog entries. In a clustering environment, it indicates to the cluster that failover can now proceed.

In general, the state of a node during operation will go through a natural progression within certain situations. In normal operation, assuming no failures or problems, and that the node has not been deliberately placed offline, a node will remain in the ONLINE state indefinitely.

Maintenance on Tungsten Replicator or the dataserver must be performed while in the OFFLINE state. In the OFFLINE state, write locks on the THL and other files are released, and reads or writes from the dataserver are stopped until the replicator is ONLINE again.

7.4.2. Replicator States During Operations

During a maintenance operation, a node will typically go through the following states at different points of the operation:

Operation                                                       State
Node operating normally                                         ONLINE
Administrator puts node into offline state                      GOING-OFFLINE
Node is offline                                                 OFFLINE:NORMAL
Administrator puts node into online state                       GOING-ONLINE:SYNCHRONIZING
Node catches up with Extractor                                  ONLINE

In the event of a failure, the sequence will trigger the node into the error state and then recovery into the online state:

Operation                                                       State
Node operating normally                                         ONLINE
Failure causes the node to go offline                           OFFLINE:ERROR
Administrator fixes error and puts node into online state       GOING-ONLINE:SYNCHRONIZING
Node catches up with Extractor                                  ONLINE

During an error state where a backup of the data is restored to a node in preparation of bringing the node back into operation:

Operation                                                       State
Node operating normally                                         ONLINE
Failure causes the node to go offline                           OFFLINE:ERROR
Administrator restores node from backup data                    GOING-ONLINE:RESTORING
Once restore is complete, node synchronizes with the Extractor  GOING-ONLINE:SYNCHRONIZING
Node catches up with Extractor                                  ONLINE

7.4.3. Changing Replicator States

You can manually change the replicator states on any node by using the trepctl command.

To switch to the OFFLINE state if you are currently ONLINE:

shell> trepctl offline

Unless there is an error, no information is reported. The current state can be verified using trepctl status:

shell> trepctl status
Processing status command...
...
state                  : OFFLINE:NORMAL
timeInStateSeconds     : 21.409
uptimeSeconds          : 935.072

To switch back to the ONLINE state:

shell> trepctl online

When using replicator states in this manner, the replication between hosts is effectively paused. Any outstanding events from the Extractor will be replicated to the Applier with the replication continuing from the point where the node was switched to the OFFLINE state. The sequence number and latency will be reported accordingly, as seen in the example below where the node is significantly behind the Primary:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000004:0000000005162941;0
appliedLastSeqno       : 21
appliedLatency         : 179.366
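
In addition to an immediate trepctl offline, the replicator can defer the state change until a specific point in the replication stream using trepctl offline-deferred. A sketch, with an illustrative sequence number; check trepctl help for the options supported by your release:

shell> trepctl offline-deferred -at-seqno 1200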

7.5. Managing Transaction Failures

Inconsistencies between a Primary and Replica dataserver can occur for a number of reasons, including:

  • An update or insertion has occurred on the Replica independently of the Primary. This situation can occur if updates are allowed on a Replica that is acting as a read-only Replica for scale out, or in the event of running management or administration scripts on the Replica

  • A switch or failover operation has led to inconsistencies. This can happen if client applications are still writing to the Replica or Primary at the point of the switch.

  • A database failure causes a database or table to become corrupted.

When a failure to apply transactions occurs, the problem must be resolved, either by skipping or ignoring the transaction, or fixing and updating the underlying database so that the transaction can be applied.

When a failure occurs, replication stops immediately at the first transaction that caused the problem. That transaction may not be the only one affected, however, and extensive examination of the pending transactions may be required to determine what caused the original database failure, fix and address the error, and restart replication.

7.5.1. Identifying a Transaction Mismatch

When a mismatch occurs, the replicator service will indicate that there was a problem applying a transaction on the Replica. The replication process stops applying changes to the Replica when the first transaction fails to be applied. This prevents multiple statements from failing.

When checking the replication status with trepctl, the pendingError and pendingExceptionMessage will show the error indicating the failure to insert the statement. For example:

shell> trepctl status
...
pendingError           : Event application failed: seqno=120 fragno=0 message=java.sql.SQLException: »
    Statement failed on slave but succeeded on master
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000012:0000000000012967;0
pendingErrorSeqno      : 120
pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master
                         insert into messages values (0,'Trial message','Jack','Jill',now())
...

The trepsvc.log log file will also contain the error information about the failed statement. For example:

...
INFO   | jvm 1    | 2013/06/26 10:14:12 | 2013-06-26 10:14:12,423 [firstcluster - 
    q-to-dbms-0] INFO  pipeline.SingleThreadStageTask Performing emergency 
    rollback of applied changes
INFO   | jvm 1    | 2013/06/26 10:14:12 | 2013-06-26 10:14:12,424 [firstcluster - 
    q-to-dbms-0] INFO  pipeline.SingleThreadStageTask Dispatching error event: 
    Event application failed: seqno=120 fragno=0 message=java.sql.SQLException: 
    Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/06/26 10:14:12 | 2013-06-26 10:14:12,424 [firstcluster - 
    pool-2-thread-1] ERROR management.OpenReplicatorManager Received error notification, 
    shutting down services :
INFO   | jvm 1    | 2013/06/26 10:14:12 | Event application failed: seqno=120 fragno=0 
    message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/06/26 10:14:12 | insert into messages values (0,'Trial message',
    'Jack','Jill',now())
INFO   | jvm 1    | 2013/06/26 10:14:12 | com.continuent.tungsten.replicator.applier.ApplierException:
    java.sql.SQLException: Statement failed on slave but succeeded on master
...

Once the error or problem has been found, the exact nature of the error should be determined so that a resolution can be identified:

  1. Identify the reason for the failure by examining the full error message. Common causes are:

    • Duplicate primary key

      A row or statement is being inserted or updated that already has the same insert ID or would generate the same insert ID for tables that have auto increment enabled. The insert ID can be identified from the output of the transaction using thl. Check the Replica to identify the faulty row. To correct this problem you will either need to skip the transaction or delete the offending row from the Replica dataserver.

      The error will normally be identified by the following error message when viewing the current replicator status, for example:

      shell> trepctl status
      ...
      pendingError           : Event application failed: seqno=10 fragno=0 »
          message=java.sql.SQLException: Statement failed on slave but succeeded on master
      pendingErrorCode       : NONE
      pendingErrorEventId    : mysql-bin.000032:0000000000001872;0
      pendingErrorSeqno      : 10
      pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master
                               insert into myent values (0,'Test Message')
      ...

      The error can be generated when an insert or update has taken place on the Replica rather than on the Primary.

      To resolve this issue, check the full THL for the statement that failed. The information is provided in the error message, but full examination of the THL can help with identification of the full issue. For example, to view the THL for the sequence number:

      shell> thl list -seqno 10
      SEQ# = 10 / FRAG# = 0 (last frag)
      - TIME = 2014-01-09 16:47:40.0
      - EPOCH# = 1
      - EVENTID = mysql-bin.000032:0000000000001872;0
      - SOURCEID = host1
      - METADATA = [mysql_server_id=1;dbms_type=mysql;service=firstcluster;shard=test]
      - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
      - SQL(0) = SET INSERT_ID = 2
      - OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0, foreign_key_checks = 1, » 
           unique_checks = 1, sql_mode = '', character_set_client = 33, collation_connection = 33, »
           collation_server = 8]
      - SCHEMA = test
      - SQL(1) = insert into myent values (0,'Test Message')

      In this example, an INSERT operation is inserting a new row. The generated insert ID is also shown (in the SQL(0) statement, SET INSERT_ID = 2). Check the destination database to determine the current value of the corresponding row:

      mysql> select * from myent where id = 2;
      +----+---------------+
      | id | msg           |
      +----+---------------+
      |  2 | Other Message |
      +----+---------------+
      1 row in set (0.00 sec)

      The actual row values are different, which means that either value may be correct. In complex data structures, there may be multiple statements or rows that trigger this error if subsequent data also relies on this value.

      For example, if multiple rows have been inserted on the Replica, multiple transactions may be affected. In this scenario, checking multiple sequence numbers from the THL will highlight this information (see the example following this list).

    • Missing table or schema

      If a table or database is missing, this should be reported in the detailed error message. For example:

      Caused by: java.sql.SQLSyntaxErrorException: Unable to switch to database »
          'contacts'Error was: Unknown database 'contacts'

      This error can be caused when maintenance has occurred, a table has failed to be initialized properly, or the table or schema was never created on the Replica.

    • Incompatible table or schema

      A modified table structure on the Replica can cause application of the transaction to fail if there are missing or different column specifications for the table data.

      This particular error can be generated when changes to the table definition have been made, perhaps during a maintenance window.

      Check the table definition on the Primary and Replica and ensure they match.

  2. Choose a resolution method:

    Depending on the data structure and environment, resolution can take one of the following forms:

    • Skip the transaction on the Replica

      If the data on the Replica is considered correct, or the data in both tables is the same or similar, the transaction from the Primary to the Replica can be skipped. This process involves placing the replicator online and specifying one or more transactions to be skipped or ignored. At the end of this process, the replicator should be in the ONLINE state.

      For more information on skipping single or multiple transactions, see Section 7.5.2, “Skipping Transactions”.

    • Delete the offending row or rows on the Replica

      If the data on the Primary is considered canonical, then the data on the Replica can be removed, and the replicator placed online.

      Warning

      Deleting data on the Replica may cause additional problems if the data is used by other areas of your application, or if it is related to data in foreign tables.

      For example:

      mysql> delete from myent where id = 2;
      Query OK, 1 row affected (0.01 sec)

      Now place the replicator online and check the status:

      shell> trepctl online
    • Restore or reprovision the Replica

      If the transaction cannot be skipped, or the data safely deleted or modified, and only a single Replica is affected, a backup of an existing, working, Replica can be taken and restored to the broken Replica.

      The tungsten_provision_slave command automates this process. See Section 7.6, “Provision or Reprovision a Replica” for more information on reprovisioning.

      To perform a backup and restore, see Section 7.7, “Creating a Backup”, or Section 7.8, “Restoring a Backup”. To reprovision a Replica from the Primary or another Replica, see tungsten_provision_slave.
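
As noted earlier, when several adjacent transactions may be involved in a mismatch, a range of THL events can be examined in a single pass using the -low and -high options to thl list. The sequence numbers here are illustrative:

shell> thl list -low 10 -high 12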

7.5.2. Skipping Transactions

When a failure is caused by a mismatch or by a failure to apply one or more transactions, the offending transaction(s) can be skipped. Transactions can be skipped one at a time, through a specific range, or using a list of single and range specifications.

Warning

Skipping over events can easily lead to Replica inconsistencies and later replication errors. Care should be taken to ensure that the transaction(s) can be safely skipped without causing problems. See Section 7.5.1, “Identifying a Transaction Mismatch”.

  • Skipping a Single Transaction

    If the error was caused by only a single statement or transaction, the transaction can be skipped using trepctl online:

    shell> trepctl online -skip-seqno 10

    The individual transaction will be skipped, and the next transaction (11) will be applied to the destination database.

  • Skipping a Transaction Range

    If there is a range of statements that need to be skipped, specify a range by defining the lower and upper limits:

    shell> trepctl online -skip-seqno 10-20

    This skips all of the transactions within the specified range, and then applies the next transaction (21) to the destination database.

  • Skipping Multiple Transactions

    If there are transactions mixed in with others that need to be skipped, the specification can include single transactions and ranges by separating each element with a comma:

    shell> trepctl online -skip-seqno 10,12-14,16,19-20

    In this example, only the transactions 11, 15, 17 and 18 would be applied to the target database. Replication would then continue from transaction 21.

Regardless of the method used to skip single or multiple transactions, the status of the replicator should be checked to ensure that replication is online.
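
For example, a quick check that the replicator is back online and the pending error has cleared:

shell> trepctl status | egrep 'pendingError |^state'
pendingError           : NONE
state                  : ONLINE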

7.6. Provision or Reprovision a Replica

The tungsten_provision_slave command performs three operations automatically:

  1. Performs a backup of a remote Replica

  2. Copies the backup to the current host

  3. Restores the backup

Warning

When using tungsten_provision_slave you must be logged in to the Replica that has failed or that you want to reprovision. You cannot reprovision a Replica remotely.

To use tungsten_provision_slave:

  1. Log in to the failed Replica.

  2. Select the active Replica within the dataservice that you want to use to reprovision the failed Replica. You may use the Primary but this will impact performance on that host. If you use MyISAM tables the operation will create some locking in order to get a consistent snapshot.

  3. Run tungsten_provision_slave specifying the source you have selected:

    shell> tungsten_provision_slave --source=host2
      NOTE  >> Put alpha replication service offline
      NOTE  >> Create a mysqldump backup of host2 »
      in /opt/continuent/backups/provision_mysqldump_2013-11-21_09-31_52
      NOTE  >> host2 >> Create mysqldump in »
      /opt/continuent/backups/provision_mysqldump_2013-11-21_09-31_52/provision.sql.gz
      NOTE  >> Load the mysqldump file
      NOTE  >> Put the alpha replication service online
      NOTE  >> Clear THL and relay logs for the alpha replication service

    The default backup service for the host will be used; mysqldump can be used by specifying the --mysqldump option.

    tungsten_provision_slave handles the cluster status, backup, restore, and repositioning of the replication stream so that the restored Replica is ready to start operating again.

Important

When using a Composite Active/Active topology with Tungsten Cluster v5 or earlier, the additional cross-site replicator must also be put offline before restoring data, and put online after completion.

shell> mm_trepctl offline
shell> tungsten_provision_slave --source=host2
shell> mm_trepctl online
shell> mm_trepctl status

For more information on using tungsten_provision_slave see Section 8.29, “The tungsten_provision_slave Script”.

7.7. Creating a Backup

The trepctl backup command backs up a datasource using the default backup tool. During installation, xtrabackup-full will be used as the default if xtrabackup has been installed. Otherwise, the default backup tool is mysqldump.

Important

For consistency, all backups should include a copy of all tungsten_SERVICE schemas. This ensures that when the Tungsten Replicator service is restarted, the correct start points for restarting replication are recorded with the corresponding backup data. Failure to include the tungsten_SERVICE schemas may prevent replication from being restarted effectively.

Backing up a datasource can occur while the replicator is online:

shell> trepctl backup
Backup of dataSource 'host3' succeeded; uri=storage://file-system/store-0000000001.properties

By default the backup is created on the local filesystem of the host that is backed up, in the backups directory of the installation directory. For example, using the standard installation, the directory would be /opt/continuent/backups. An example of the directory content is shown below:

total 130788
drwxrwxr-x 2 tungsten tungsten      4096 Apr  4 16:09 .
drwxrwxr-x 3 tungsten tungsten      4096 Apr  4 11:51 ..
-rw-r--r-- 1 tungsten tungsten        71 Apr  4 16:09 storage.index
-rw-r--r-- 1 tungsten tungsten 133907646 Apr  4 16:09 store-0000000001-mysqldump_2013-04-04_16-08_42.sql.gz
-rw-r--r-- 1 tungsten tungsten       317 Apr  4 16:09 store-0000000001.properties

For information on managing backup files within your environment, see Section D.1.1, “The backups Directory”.

The storage.index file contains the backup file index information. The actual backup data is stored in the GZipped file. The properties of the backup file, including the tool used to create the backup and the checksum information, are located in the corresponding .properties file. Note that each backup and property file is uniquely numbered so that it can be identified when restoring a specific backup.

A backup can also be initiated and run in the background by adding the & (ampersand) to the command:

shell> trepctl backup &
Backup of dataSource 'host3' succeeded; uri=storage://file-system/store-0000000001.properties

7.7.1. Using a Different Backup Tool

If xtrabackup is installed when the dataservice is first created, xtrabackup will be used as the default backup method. The following built-in backup methods are provided:

  • mysqldump — SQL dump to a single file. This is the easiest backup method but it is not appropriate for large data sets.

  • xtrabackup — Full backup to a single tar file. This will take longer to take the backup and to restore.

  • xtrabackup-full — Full backup to a directory (this is the default if xtrabackup is available and the backup method is not explicitly stated).

  • xtrabackup-incremental — Incremental backup from the last xtrabackup-full or xtrabackup-incremental backup.

  • mariabackup — Full backup to a single tar file. This will take longer to take the backup and to restore. Available as of version 6.1.18.

  • mariabackup-full — Full backup to a directory. Available as of version 6.1.18.

  • mariabackup-incremental — Incremental backup from the last mariabackup-full or mariabackup-incremental backup. Available as of version 6.1.18.

The default backup tool can be changed, and different tools can be used explicitly when the backup command is executed. The Percona xtrabackup tool can be used to perform both full and incremental backups. Use of this tool is optional and can be configured during installation, or afterwards by updating the configuration using tpm.

To update the configuration to use xtrabackup, install the tool and then follow the directions for tpm update to apply the --repl-backup-method=xtrabackup-full setting.
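
For example, to make xtrabackup-full the default backup method for a service named alpha (the service name here is illustrative):

shell> tpm update alpha --repl-backup-method=xtrabackup-full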

To use xtrabackup-full without changing the configuration, specify the backup agent to trepctl backup:

shell> trepctl backup -backup xtrabackup-full
Backup completed successfully; URI=storage://file-system/store-0000000006.properties

7.7.2. Using a Different Directory Location

The default backup location is the backups directory of the Tungsten Cluster installation directory. For example, using the recommended installation location, backups are stored in /opt/continuent/backups.

See Section D.1.1.4, “Relocating Backup Storage” for details on changing the location where backups are stored.

7.7.3. Creating an External Backup

There are several considerations to take into account when you are using a tool other than Tungsten Cluster to take a backup. We have taken great care to build all of these into our tools. If the options provided do not meet your needs, take these factors into account when taking your own backup.

  • How big is your data set?

    The mysqldump tool is easy to use but will be very slow once your data gets too large. We find this happens around 1GB. The xtrabackup tool works on large data sets but requires more expertise. Choose a backup mechanism that is right for your data set.

  • Is all of your data in transaction-safe tables?

    If all of your data is transaction-safe then you will not need to do anything special. If not, you need to take care to lock tables as part of the backup. Both mysqldump and xtrabackup take care of this. If you are using other mechanisms, you will need to consider stopping the replicator and stopping the database. If you are taking a backup of the Primary then you may need to stop all access to the database.

  • Are you taking a backup of the Primary?

    The Tungsten Replicator stores information in a schema to indicate the restart position for replication. On the Primary there can be a slight lag between this position and the actual position of the Primary. This is because the database must write the logs to disk before Tungsten Replicator can read them and update the current position in the schema.

    When taking a backup from the Primary, you must track the actual binary log position of the Primary and start replication from that point after restoring it. See Section 7.8.2, “Restoring an External Backup” for more details on how to do that. When using mysqldump use the --master-data=2 option. The xtrabackup tool will print the binary log position in the command output.

Using mysqldump can be a very simple way to take a consistent backup. Be aware that it can cause locking on MyISAM tables so running it against your Primary will cause application delays. The example below shows the bare minimum arguments you should provide:

shell> mysqldump --opt --single-transaction --all-databases --add-drop-database --master-data=2
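
If the dump is redirected to a file, the binary log position recorded by --master-data=2 appears near the top of the dump as a commented-out CHANGE MASTER statement. For example (the file name and positions shown are illustrative):

shell> mysqldump --opt --single-transaction --all-databases \
    --add-drop-database --master-data=2 > backup.sql
shell> grep -m 1 'CHANGE MASTER' backup.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000032', MASTER_LOG_POS=473863524;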

7.8. Restoring a Backup

If a restore is being performed as part of the recovery procedure, consider using the tungsten_provision_slave tool. This will work for restoring from the Primary or a Replica and is faster when you do not already have a backup ready to be restored. For more information, see Section 7.6, “Provision or Reprovision a Replica”.

To restore a backup, use the trepctl restore command:

  1. Put the replication service offline using trepctl:

    shell> trepctl offline
  2. Restore the backup using trepctl restore:

    shell> trepctl restore
  3. Put the replication service online using trepctl:

    shell> trepctl online

By default, the restore process takes the latest backup available for the host being restored. Tungsten Cluster does not automatically locate the latest backup within the dataservice across all datasources.
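
To see which backups are available on the local host before restoring a specific one, list the backups directory, as described in Section 7.7, “Creating a Backup”:

shell> ls -l /opt/continuent/backups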

7.8.1. Restoring a Specific Backup

To restore a specific backup, specify the location of the corresponding properties file using the format:

storage://storage-type/location

For example, to restore the backup from the filesystem using the information in the properties file store-0000000004.properties, log in to the failed host:

  1. Put the replication service offline using trepctl:

    shell> trepctl offline
  2. Restore the backup using trepctl restore:

    shell> trepctl restore \
    -uri storage://file-system/store-0000000004.properties
  3. Put the replication service online using trepctl:

    shell> trepctl online

7.8.2. Restoring an External Backup

If a backup has been performed outside of Tungsten Cluster, for example from a filesystem snapshot or a backup performed outside of the dataservice, follow these steps:

  1. Put the replication service offline using trepctl:

    shell> trepctl offline
  2. Reset the THL, either using thl or by deleting the files directly:

    shell> thl -service alpha purge
  3. Restore the data or files using the external tool. This may require the database server to be stopped. If so, you should restart the database server before moving to the next step.

    Note

    The backup must be complete and the tungsten specific schemas must be part of the recovered data, as they are required to restart replication at the correct point. See Section 7.7.3, “Creating an External Backup” for more information on creating backups.

  4. There is some additional work if the backup was taken of the Primary server. There may be a difference between the binary log position of the Primary and what is represented in the trep_commit_seqno. If these values are the same, you may proceed without further work. If not, the content of trep_commit_seqno must be updated.

    • Retrieve the contents of trep_commit_seqno:

      shell> echo "select seqno,source_id, eventid from tungsten_alpha.trep_commit_seqno" | tpm mysql
      seqno	source_id	eventid
      32033674	host1	mysql-bin.000032:0000000473860407;-1

    • Compare the results to the binary log position of the restored backup. For this example we will assume the backup was taken at mysql-bin.000032:473863524. Return to the Primary and find the correct sequence number for that position:

      shell> ssh host1
      
      shell> thl dsctl -event mysql-bin.000032:0000000473863524
      dsctl -service alpha set -reset -seqno 7748 -epoch 0 -event-id "mysql-bin.000032:0000000473863524" -source-id "db1-east.continuent.com"
      
      ~OR~
      
      shell> thl list -event mysql-bin.000032:0000000473863524 -headers
      SEQ# = 7748 / FRAG# = 0 (last frag)
      - FILE = thl.data.0000000010
      - TIME = 2014-10-17 16:58:11.0
      - EPOCH# = 0
      - EVENTID = mysql-bin.000032:0000000473863524;-1
      - SOURCEID = db1-east.continuent.com
      
      shell> exit

    • Return to the Replica node and run dsctl set to update the trep_commit_seqno table:

      shell> dsctl -service alpha set -reset \
           -seqno 7748 \
           -epoch 0 \
           -source-id db1-east.continuent.com \
           -event-id mysql-bin.000032:0000000473863524
      

  5. Put the replication service online using trepctl:

    shell> trepctl online

7.8.3. Restoring from Another Replica

If a restore is being performed as part of the recovery procedure, consider using the tungsten_provision_slave tool. This will work for restoring from the Primary or a Replica and is faster if you do not already have a backup ready to be restored. For more information, see Section 7.6, “Provision or Reprovision a Replica”.

Data can be restored to a Replica by performing a backup on a different Replica, transferring the backup information to the Replica you want to restore, and then running the restore process.

For example, to restore host3 from a backup performed on host2:

  1. Run the backup operation on host2:

    shell> trepctl backup
    Backup of dataSource 'host2' succeeded; uri=storage://file-system/store-0000000006.properties
  2. Copy the backup information from host2 to host3. See Section D.1.1.3, “Copying Backup Files” for more information on copying backup information between hosts. If you are using xtrabackup there will be additional files needed before the next step. The example below uses scp to copy a mysqldump backup:

    shell> cd /opt/continuent/backups
    shell> scp store-[0]*6[\.-]* host3:$PWD/
    store-0000000006-mysqldump-812096863445699665.sql                      100%  234MB  18.0MB/s   00:13
    store-0000000006.properties                                            100%  314     0.3KB/s   00:00

    If you are using xtrabackup:

    shell> cd /opt/continuent/backups/xtrabackup
    shell> rsync -aze ssh full_xtrabackup_2014-08-16_15-44_86 host3:$PWD/
  3. Put the replication service offline using trepctl:

    shell> trepctl offline
  4. Restore the backup using trepctl restore:

    shell> trepctl restore

    Note

    Check the ownership of files if you have trouble transferring files or restoring the backup. They should be owned by the Tungsten system user to ensure proper operation.

  5. Put the replication service online using trepctl:

    shell> trepctl online

7.8.4. Manually Recovering from Another Replica

In the event that a restore operation fails, or due to a significant failure in the dataserver, an alternative option is to seed the failed dataserver directly from an existing running Replica.

For example, on the host host2 , the data directory for MySQL has been corrupted, and mysqld will no longer start. This status can be seen from examining the MySQL error log in /var/log/mysql/error.log:

130520 14:37:08 [Note] Recovering after a crash using /var/log/mysql/mysql-bin
130520 14:37:08 [Note] Starting crash recovery...
130520 14:37:08 [Note] Crash recovery finished.
130520 14:37:08 [Note] Server hostname (bind-address): '0.0.0.0'; port: 13306
130520 14:37:08 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
130520 14:37:08 [Note] Server socket created on IP: '0.0.0.0'.
130520 14:37:08 [ERROR] Fatal error: Can't open and lock privilege tables: Table 'mysql.host' doesn't exist
130520 14:37:08 [ERROR] /usr/sbin/mysqld: File '/var/run/mysqld/mysqld.pid' not found (Errcode: 13)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error reading file 'UNKNOWN' (Errcode: 9)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error on close of 'UNKNOWN' (Errcode: 9)

Performing a restore operation on this Replica may not work. To recover from another running Replica, host3, the MySQL data files can be copied over to host2 directly using the following steps:

  1. Put the host2 replication service offline using trepctl:

    shell> trepctl offline
  2. Put the host3 replication service offline using trepctl:

    shell> trepctl offline
  3. Stop the mysqld service on host2:

    shell> sudo /etc/init.d/mysql stop
  4. Stop the mysqld service on host3:

    shell> sudo /etc/init.d/mysql stop
  5. Delete the mysqld data directory on host2:

    shell> sudo rm -rf /var/lib/mysql/*
  6. If necessary, ensure the tungsten user can write to the MySQL directory:

    shell> sudo chmod 777 /var/lib/mysql
  7. Use rsync on host3 to send the data files for MySQL to host2:

    shell> rsync -aze ssh /var/lib/mysql/* host2:/var/lib/mysql/

    You should synchronize all locations that contain data. This includes additional folders such as innodb_data_home_dir or innodb_log_group_home_dir. Check the my.cnf file to ensure you have the correct paths.

    Once the files have been copied, the files should be updated to have the correct ownership and permissions so that the Tungsten service can read them.

  8. Start the mysqld service on host3:

    shell> sudo /etc/init.d/mysql start
  9. Put the host3 replication service online using trepctl:

    shell> trepctl online
  10. Update the ownership and permissions on the data files on host2:

    host2 shell> sudo chown -R mysql:mysql /var/lib/mysql
    host2 shell> sudo chmod 770 /var/lib/mysql
  11. Clear out the THL files on the target node host2 so the Replica replicator service may start cleanly:

    host2 shell> thl purge
  12. Start the mysqld service on host2:

    shell> sudo /etc/init.d/mysql start
  13. Put the host2 replication service online using trepctl:

    shell> trepctl online

7.8.5. Reprovision a MySQL Replica using rsync

The steps below will guide you through the process of restoring a MySQL Replica node by using rsync.

This process has the following requirements:

  • You can sustain downtime on the node used as a source.

  • You can either ssh between hosts as root, or have root level access to temporarily change file ownership.

Steps

  1. Establish ssh as root between hosts. If you cannot set up ssh as root, then on the target host change the ownership of the mysql target directories to tungsten, and run the rsync command in Step 3 as the tungsten user.

  2. On failed host:

    • Shut down mysql if it is still running.

    • Determine all mysql data directories (datadir, binary log directory, etc.), e.g. /var/lib/mysql.

    • Clear out database files from the failed host.

      shell> rm -rf /var/lib/mysql/*

  3. On Source host:

    • Shut down mysql.

    • Use rsync to copy the data files to the target (specify 'z' only if CPU is available for compression):

      sourcehost> rsync -avz --progress /source/dir/ targetHost:/target/dir/

  4. Wait for the rsync to complete.

  5. On Source host:

    • Restart mysql.

    • Bring the replicator online.

      shell> trepctl online
    • Wait for replication to catch up.

  6. On restored, target, host:

    • Fix the ownership on ALL data directories, e.g.

      shell> chown -R mysql: /var/lib/mysql

    • Start mysql.

    • Bring the replicator online.

      shell> trepctl online
    • Wait for replication to catch up.

7.9. Deploying Automatic Replicator Recovery

Automatic recovery enables the replicator to go back ONLINE in the event of a transient failure occurring during either the ONLINE or GOING-ONLINE:SYNCHRONIZING state that would otherwise cause a change of state to OFFLINE. For example, connection failures, or restarts in the MySQL service, trigger the replicator to go OFFLINE. With autorecovery enabled, the replicator will attempt to go ONLINE again to keep the service running. Failures outside of these states will not trigger autorecovery.

Autorecovery operates by scheduling an attempt to go back online after a transient failure. If autorecovery is enabled, the process works as follows:

  1. If a failure is identified, the replicator attempts to go back online after a specified delay. The delay allows the replicator time to decide whether autorecovery should be attempted. For example, if the MySQL server restarts, the delay gives time for the MySQL server to come back online before the replicator goes back online.

  2. Recovery is attempted a configurable number of times. This prevents the replicator from continually attempting to go online within a service that has a more serious failure. If the replicator fails to go ONLINE within the configurable reset interval, then the replicator will go to the OFFLINE state.

  3. If the replicator remains in the ONLINE state for a configurable period of time, then the automatic recovery is deemed to have succeeded. If the autorecovery fails, then the autorecovery attempts counter is incremented by one.

The configurable parameters are set using tpm within the static properties for the replicator:

  • --auto-recovery-max-attempts

    Sets the maximum number of attempts to automatically recover from any single failure trigger. This prevents the autorecovery mechanism from continually attempting to autorecover. The current number of attempts is reset if the replicator remains online for the configured reset period.

  • --auto-recovery-delay-interval

    The delay between entering the OFFLINE state and attempting autorecovery. On servers that are busy, that use some form of network or HA solution, or that have long MySQL restart/startup times, this value should be configured accordingly to give the underlying services time to start up again after failure.

  • --auto-recovery-reset-interval

    The duration after a successful autorecovery has been completed that the replicator must remain in the ONLINE state for the recovery process to be deemed to have succeeded. The number of attempts for autorecovery is reset to 0 (zero) if the replicator stays up for this period of time.

Auto recovery is enabled only when the --auto-recovery-max-attempts parameter is set to a non-zero value.

To enable:

shell> tpm update alpha --auto-recovery-max-attempts=5
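
All three parameters can be set in the same tpm update call. The interval values below are illustrative; consult the tpm reference for your release for the exact accepted formats:

shell> tpm update alpha \
    --auto-recovery-max-attempts=5 \
    --auto-recovery-delay-interval=30 \
    --auto-recovery-reset-interval=300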

The autorecovery status can be monitored within trepsvc.log and through the autoRecoveryEnabled and autoRecoveryTotal parameters output by trepctl. For example:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
...
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
...

The above output indicates that the autorecovery service is disabled. The autoRecoveryTotal is a count of the number of times the autorecovery has been completed since the replicator has started.

7.10. Migrating and Seeding Data

7.10.1. Migrating from MySQL Native Replication 'In-Place'

If you are migrating an existing MySQL native replication deployment to use Tungsten Cluster or the standalone Tungsten Replicator, the configuration of the replicator must be updated to match the status of the Replica.

  1. Deploy Tungsten Replicator using the model or system appropriate according to Chapter 2, Deployment Overview. Ensure that the Tungsten Cluster is not started automatically by excluding the --start or --start-and-report options from the tpm commands.

  2. On each Replica

    Confirm that native replication is working on all Replica nodes:

    shell> echo 'SHOW SLAVE STATUS\G' | tpm mysql | \
    egrep ' Master_Host| Last_Error| Slave_SQL_Running' 
                      Master_Host: tr-ssl1
                Slave_SQL_Running: Yes
                       Last_Error:

  3. On the Primary and each Replica

    Reset the Tungsten Replicator position on all servers:

    shell> replicator start offline
    shell> trepctl -service alpha reset -all -y

  4. On the Primary

    Start Tungsten Replicator:

    shell> replicator start
  5. On each Replica

    Record the current Replica log position (as reported by the Master_Log_File and Exec_Master_Log_Pos output from SHOW SLAVE STATUS). Ideally, each Replica should be stopped at the same position:

    shell> echo 'SHOW SLAVE STATUS\G' | tpm mysql | \
    egrep ' Master_Host| Last_Error| Master_Log_File| Exec_Master_Log_Pos' 
                      Master_Host: tr-ssl1
                  Master_Log_File: mysql-bin.000025
                       Last_Error: Error executing row event: 'Table 'tungsten_alpha.heartbeat' doesn't exist'
              Exec_Master_Log_Pos: 181268

    If you have multiple Replicas configured to read from this Primary, record the Replica position individually for each host. Once you have the information for all the hosts, determine the earliest log file and log position across all the Replicas, as this information will be needed when starting replication. If one of the servers does not show an error, it may be replicating from an intermediate server. If so, you can proceed normally and assume this server stopped at the same position as the host is replicating from.

  6. On the Primary

    Take the replicator offline and clear the THL:

    shell> trepctl offline
    shell> trepctl -service alpha reset -all -y
  7. On the Primary

    Start replication, using the lowest binary log file and log position from the Replica information determined previously.

    shell> trepctl online -from-event 000025:181268

    Tungsten Replicator will start reading the MySQL binary log from this position, creating the corresponding THL event data.

  8. On each Replica

    1. Disable native replication to prevent it from being accidentally restarted on the Replica.

      On MySQL 5.0 or MySQL 5.1:

      shell> echo "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='';" | tpm mysql

      On MySQL 5.5 or later:

      shell> echo "STOP SLAVE; RESET SLAVE ALL;" | tpm mysql
    2. If the final position of MySQL replication matches the lowest across all Replicas, start Tungsten Replicator services:

      shell> trepctl online

      The Replica will start reading from the binary log position configured on the Primary.

  9. Check that replication is operating correctly by using trepctl status on the Primary and each Replica to confirm the correct position.

  10. Remove the master.info file on each Replica to ensure that when a Replica restarts, it does not attempt to reconnect to the Primary MySQL server.

Once these steps have been completed, Tungsten Replicator should be operating as the replication service for your MySQL servers. Use the information in Chapter 7, Operations Guide to monitor and administer the service.

7.10.2. Seeding Data for Heterogeneous Replication

Seeding data for heterogeneous targets is a complex process, and the challenges depend on the target and the amount of data requiring seeding.

The steps outlined below come with their own challenges and may prove unsuitable in your own environment. Therefore, any pre-seeding process needs to be fully understood and evaluated before use.

The following requirements are needed for this process:

  • A temporary host meeting all required pre-requisites, with an empty MySQL instance matching the same version as the original source, and with the default storage engine set to BLACKHOLE

  • Enough disk space on the temporary host to hold binary logs and THL logs equivalent to the amount of data being seeded

  • The ability to extract the data from the source, using mysqldump, or via reverse engineering SQL Statements

The process in summary is as follows:

  • Configure replicator to extract from the temporary instance

  • Extract data from the source capturing the binlog position appropriate to the export

  • Import the data into the temporary instance and allow the replicator to load the target

  • Re-configure the replicator to extract from the original source, positioned to start from the co-ordinates noted during the export

7.10.2.1. Seeding Data from a Standalone Source

Step 1: Configure the Temporary Instance

  • Build an empty MySQL Instance, ensuring that all the pre-requisites are in place. These are outlined in Appendix B, Prerequisites

  • To ensure compatibility, you need to make sure that the version of MySQL used matches the version of MySQL running on the main source.

  • Once the instance is running, you need to pre-create all of the tables that you will be loading. This must be done manually as you need to ensure that each table is created with the ENGINE=BLACKHOLE option

  • The use of the BLACKHOLE engine means that the data doesn't actually get stored when written to the database; however, binary logs are still generated, and it is these that the replicator requires
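For example, a table might be pre-created as follows (the hr.employees table and its columns are purely illustrative):

mysql> CREATE TABLE hr.employees (
         emp_no INT NOT NULL PRIMARY KEY,
         name   VARCHAR(64)
       ) ENGINE=BLACKHOLE;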

Step 2: Configure the Replicator

  • Follow the steps outlined in Section 3.2, “Deploying a Primary/Replica Topology” to configure an extractor against the Temporary host.

  • Ensure that the configuration includes the following entries to enable heterogeneous replication, and ensure you qualify the objects (schemas and/or tables) that you want to seed (see the filter note following this list)

    enable-heterogeneous-master=true
    svc-extractor-filters=replicate
    property=replicator.filter.replicate.do=schema.table
  • Once installed, start the replicator
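The do property of the replicate filter accepts a comma-separated list of schemas and/or tables, so multiple objects can be seeded at once. For example (the schema and table names are illustrative):

enable-heterogeneous-master=true
svc-extractor-filters=replicate
property=replicator.filter.replicate.do=hr,sales.orders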

Step 3: Build the Target Schema

  • Depending on the target, you may need to pre-create the final objects in your target environment. Use ddlscan to do this now if required; a sketch follows.
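A minimal ddlscan sketch for a Redshift target, assuming an illustrative hr schema (the connection details and template will differ for your environment and target):

shell> ddlscan -user tungsten -pass secret \
    -url jdbc:mysql:thin://db1:3306/hr \
    -template ddl-mysql-redshift.vm -db hr > hr_target.sql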

Step 4: Configure the Applier

  • Ensure the applier host meets all the required pre-requisites and then configure the applier appropriate to the target you are applying to.

  • Follow the appropriate steps in Chapter 4, Deploying Appliers to configure a standalone applier, ensuring that it is configured to connect to the temporary extractor installed in Step 2

  • Once configured, start the applier. At this stage you should not see any replication traffic since the temporary host will have no data written to it

Step 5: Export the data from the source

  • We now need to export the data from the source.

  • Using mysqldump, we need to capture the binlog position and also ensure that the export does NOT contain database or table DDL

  • The following example can be used as a template. In this example we are exporting an entire schema. Use the appropriate options if you require only specific tables:

    shell> mysqldump -u root -psecret -B hr --no-create-db --no-create-info --master-data=2 >dump.sql
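You can confirm that the binlog position was captured by checking the dump file header (the coordinates shown are illustrative):

shell> grep "CHANGE MASTER" dump.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=847;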

Step 6: Import the Data

  • Now that we have the data we can import this into the Temporary Instance

  • As you load the data, you can monitor replication and you should see the data loading into your target environment.

  • Providing you created the tables correctly with the BLACKHOLE engine, a SELECT COUNT(*) on the tables in the temporary instance will return a row count of zero.

  • When the load has finished and the applier has completed replication, stop both replicators using the following command:

    shell> replicator stop

    We have now finished with the temporary MySQL instance

Step 7: Install Extractor from Main Source Host

  • Follow the steps outlined in Section 3.2, “Deploying a Primary/Replica Topology” to configure an extractor against the Source host.

  • Ensure that the configuration includes the following entries to enable heterogeneous replication, and ensure you qualify the objects (schemas and/or tables) as required, for example

    enable-heterogeneous-master=true
    svc-extractor-filters=replicate
    property=replicator.filter.replicate.do=schema.table
  • Once installed, start the replicator in an offline state

    shell> replicator start offline

Step 8: Reconfigure the Applier

  • We now need to reconfigure the applier, but first we need to uninstall the software to ensure we have a clean build and that any THL from the pre-load has been cleared:

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
    shell> tools/tpm uninstall --i-am-sure

  • Place the correct configuration into the /etc/tungsten/tungsten.ini file, ensuring that start-and-report=false is set and that the applier is now configured to point to the main extractor

  • Install the software

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
    shell> tools/tpm install

  • Finally, start the replicator in an offline state, and issue a reset to be sure the previous tracking schema is clean:

    shell> replicator start offline
    shell> trepctl -service servicename reset -all -y

Step 9: Position the Replicator

  • The final step is to position the extractor to pick up from the position at which the export in Step 5 was taken.

  • Locate the dump file and issue the following command:

    shell> grep "CHANGE MASTER" dump.sql
    -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=847;

    Taking the sequence value from the MASTER_LOG_FILE filename and the MASTER_LOG_POS, issue the following statement on the EXTRACTOR host:

    shell> trepctl online -from-event 000003:847

  • Once the command has completed, the extractor will be online; you can now bring the applier online

    shell> trepctl online

7.10.2.2. Seeding Data from a Cluster, for a Cluster-Extractor Target

Step 1: Configure the Temporary Instance

  • Build an empty MySQL Instance, ensuring that all the pre-requisites are in place. These are outlined in Appendix B, Prerequisites

  • To ensure compatibility, you need to make sure that the version of MySQL used matches the version of MySQL running on the main source.

  • Once the instance is running, you need to pre-create all of the tables that you will be loading. This must be done manually as you need to ensure that each table is created with the ENGINE=BLACKHOLE option

  • The use of the BLACKHOLE engine means that the data doesn't actually get stored when written to the database; however, binary logs are still generated, and it is these that the replicator requires

Step 2: Configure the Replicator

  • Follow the steps outlined in Section 3.2, “Deploying a Primary/Replica Topology” to configure an extractor against the Temporary host.

  • Ensure that the configuration includes the following entries to enable heterogeneous replication, and ensure you qualify the objects (schemas and/or tables) that you want to seed

    enable-heterogeneous-master=true
    svc-extractor-filters=replicate
    property=replicator.filter.replicate.do=schema.table
  • Once installed, start the replicator

Step 3: Build the Target Schema

  • Depending on the target, you may need to pre-create the final objects in your target environment. Use ddlscan to do this now if required.

Step 4: Configure the Applier

In a Cluster-Extractor environment, we will use the same applier after we have seeded the target.

  • Ensure the applier host meets all the required pre-requisites and then configure the applier appropriate to the target you are applying to.

  • Follow the appropriate steps in Chapter 4, Deploying Appliers to configure a standalone applier, ensuring that it is configured to connect to the temporary extractor installed in Step 2. At this stage do not follow the Cluster-Extractor setup

    Important

    When setting the service name for this temporary seeding process, ensure the servicename you choose matches the servicename of the main source cluster that we will later connect to for normal operation

  • Once configured, start the applier. At this stage you should not see any replication traffic since the temporary host will have no data written to it

Step 5: Export the data from the source

  • We now need to export the data from the source. There are two ways to do this with a cluster: we can either take an export from the Primary, or we can take an export from a Replica.

  • Export from a Primary:

    • If you are able to export from the Primary, then using mysqldump we need to capture the binlog position and also ensure that the export does NOT contain database or table DDL

    • The following example can be used as a template. In this example we are exporting an entire schema. Use the appropriate options if you require only specific tables:

      shell> mysqldump -u root -psecret -B hr --no-create-db --no-create-info --master-data=2 >dump.sql

  • Export from a Replica:

    • To export from a Replica, we cannot obtain the correct binlog position, as the one we need is specific to the Primary; however, exporting from a Replica means that we can utilise the features of Tungsten Cluster to isolate the node

    • First, set the cluster to MAINTENANCE mode, and then SHUN the node that we wish to export from:

      shell> cctrl
      cctrl> set policy maintenance
      cctrl> datasource Replicahost shun

    • Next, we can take the export:

      shell> mysqldump -u root -psecret -B hr --no-create-db --no-create-info >dump.sql

    • The final step is to capture the current replication position using dsctl

      shell> dsctl get -ascmd
      dsctl set -seqno 9 -epoch 2 -event-id "mysql-bin.000003:0000000000002608;-1" -source-id "db1"

      Make a note of the output from running this command, as we will need it later.

    • Finally, you can re-introduce the node back into the cluster

      shell> cctrl
      cctrl> datasource Replicahost recover
      cctrl> set policy automatic

Step 6: Import the Data

  • Now that we have the data we can import this into the Temporary Instance

  • As you load the data, you can monitor replication and you should see the data loading into your target environment.

  • Providing you created the tables correctly with the BLACKHOLE engine, a SELECT COUNT(*) on the tables in the temporary instance will return a row count of zero.

  • When the load has finished and the applier has completed replication, stop both replicators using the following command:

    shell> replicator stop

    We have now finished with the temporary MySQL instance

Step 7: Reconfigure the Applier as a Cluster-Extractor

  • We now need to reconfigure the applier as a cluster Replica, but first we need to uninstall the software to ensure we have a clean build and that any THL from the pre-load has been cleared:

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
    shell> tools/tpm uninstall --i-am-sure

  • Place the correct configuration into the /etc/tungsten/tungsten.ini file, ensuring that start-and-report=false is set

  • Install the software

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
    shell> tools/tpm install

  • Finally, start the replicator in an offline state:

    shell> replicator start offline

Step 8: Position the Replicator

  • The final step is to position the replicator to pick up from the position at which the export in Step 5 was taken.

  • If you took the export from the Replica and already have a dsctl set command, then move on to the next step. If you took an export from the Primary, then we need to retrieve the correct positions from the dump file

    Locate the dump file and issue the following command:

    shell> grep "CHANGE MASTER" dump.sql
    -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=847;

    Taking the sequence value from the MASTER_LOG_FILE filename and the MASTER_LOG_POS, issue the following statement on the Primary host in the cluster, and copy the resulting dsctl command:

    shell> tungsten_find_position mysql-bin.000003:847
    dsctl set -reset -seqno 9 -epoch 2 -event-id "mysql-bin.000003:0000000000000847;-1" -source-id "db1"

  • Taking the dsctl command that you obtained either from the Replica in Step 5 or from the steps just above, run it on the applier host ONLY

  • Once the command has completed, you can now bring the replicator online:

    shell> trepctl online

7.11. Switching Primary Hosts

In the event of a failure, or during the process of performing maintenance on a running cluster, the roles of the Extractor and Appliers within the cluster may need to be swapped.

The basic sequence of operation for switching Primary and Replicas is:

  1. Switch Replicas to offline state

  2. Switch Primary to offline status

  3. Set an existing Replica to have the master role

  4. Set each Replica with the slave role, updating the Extractor URI (where the THL logs will be loaded) to the new Extractor host

  5. Switch the new Primary to online state

  6. Switch the new Replicas to online state

Depending on the situation when the switch is performed, the switch can be performed either without waiting for the hosts to be synchronized (i.e. in a failure situation), or by explicitly waiting for the Replica that will be promoted to the Primary role to be fully up to date.

To perform an ordered switch of the Primary, use the following steps. In the example below, Primary host host1 will be switched to host3, and the remaining hosts (host1 and host2) will be configured as Replicas to the new Primary:

  1. If you are performing the switch as part of maintenance or other procedures, you should perform a safe switch, ensuring the Replicas are up to date with the Primary:

    1. Synchronize the database and the transaction history log. This will ensure that the two are synchronized, and provide you with a sequence number to ensure the Replicas are up to date:

      shell> trepctl -host host1 flush
      Master log is synchronized with database at log sequence number: 1405

      Keep a note of the sequence number.

    2. For each current Replica within the cluster, wait until the Primary sequence number has been reached, and then put the Replica into the offline state:

      shell> trepctl -host host2 wait -applied 1405
      shell> trepctl -host host2 offline
      shell> trepctl -host host3 wait -applied 1405
      shell> trepctl -host host3 offline

    If the Primary has failed, or once the Replicas and Primary are in sync, you can perform the remainder of the steps to execute the physical switch.

  2. Switch the Primary to the offline state:

    shell> trepctl -host host1 offline
  3. Configure the new designated Primary to the Primary role:

    shell> trepctl -host host3 setrole -role master

    Switch the Primary to the online state:

    shell> trepctl -host host3 online
  4. For each Replica, set the role to Replica, supplying the URI of the THL service on the Primary:

    shell> trepctl -host host1 setrole -role slave -uri thl://host3:2112

    In the above example we are using the default THL port (2112).

    Put the new Replica into the online state:

    shell> trepctl -host host1 online

    Repeat for the remaining Replicas:

    shell> trepctl -host host2 setrole -role slave -uri thl://host3:2112
    shell> trepctl -host host2 online

Once completed, the state of each host can be checked to confirm that the switchover has completed successfully:

appliedLastEventId     : mysql-bin.000005:0000000000002100;0
appliedLastSeqno       : 1405
appliedLatency         : 0.094
dataServerHost         : host1
masterConnectUri       : thl://host3:2112
role                   : slave
state                  : ONLINE
-----
appliedLastEventId     : mysql-bin.000005:0000000000002100;0
appliedLastSeqno       : 1405
appliedLatency         : 0.149
dataServerHost         : host2
masterConnectUri       : thl://host3:2112
role                   : slave
state                  : ONLINE
-----
appliedLastEventId     : mysql-bin.000005:0000000000002100;0
appliedLastSeqno       : 1405
appliedLatency         : 0.061
dataServerHost         : host3
masterConnectUri       : thl://host1:2112/
role                   : master
state                  : ONLINE

In the above, host1 and host2 are now getting the THL information from host3, with each acting as a Replica to host3 as the Primary.
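To script the final verification, trepctl can also wait for a host to reach a required state (the timeout, in seconds, is illustrative):

shell> trepctl -host host2 wait -state ONLINE -limit 60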

7.12. Configuring Parallel Replication

The replication stream within MySQL is by default executed in a single-threaded execution model. Using Tungsten Replicator, the application of the replication stream can be performed in parallel. This improves the speed at which the database is updated and helps to reduce the effect of Replicas lagging behind the Primary, which can affect application performance. Parallel replication operates by distributing the events for different database schemas from the replication stream in parallel on the Replica. All the events in one schema are applied in sequence, but events in multiple schemas can be applied in parallel. Parallel replication will not help in those situations where transactions operate across schema boundaries.

Parallel replication supports two primary options:

  • Number of parallel channels — this configures the maximum number of parallel operations that will be performed at any one time. The number of parallel replication streams should match the number of different schemas in the source database, although it is possible to exhaust system resources by configuring too many. If the number of parallel threads is less than the number of schemas, events are applied in a round-robin fashion using the next available parallel stream.

  • Parallelization type — the type of parallelization to be employed. The disk method is the recommended solution.

Parallel replication can be enabled during installation by setting the appropriate options during the initial configuration and installation. To enable parallel replication after installation, you must configure each host as follows:

  1. Put the replicator offline:

    shell> trepctl offline
  2. Reconfigure the replication service to configure the parallelization:

    shell> tpm update firstrep --host=host2 \
        --channels=5 --svc-parallelization-type=disk
  3. Then restart the replicator to enable the configuration:

    shell> replicator restart
    Stopping Tungsten Replicator Service...
    Stopped Tungsten Replicator Service.
    Starting Tungsten Replicator Service...
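For an INI-based installation, the equivalent settings can instead be placed in the service stanza of /etc/tungsten/tungsten.ini before running tpm update (a minimal sketch; the service name and channel count are illustrative):

[firstrep]
channels=5
svc-parallelization-type=disk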

The current configuration can be confirmed by checking the channels configured in the status information:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000005:0000000000004263;0
appliedLastSeqno       : 1416
appliedLatency         : 1.0
channels               : 5
...

More detailed information can be obtained by using the trepctl status -name stores command, which provides information for each of the parallel replication queues:

shell> trepctl status -name stores 
Processing status command (stores)...
NAME                      VALUE
----                      -----
activeSeqno             : 0
doChecksum              : false
flushIntervalMillis     : 0
fsyncOnFlush            : false
logConnectionTimeout    : 28800
logDir                  : /opt/continuent/thl/firstrep
logFileRetainMillis     : 604800000
logFileSize             : 100000000
maximumStoredSeqNo      : 1416
minimumStoredSeqNo      : 0
name                    : thl
readOnly                : false
storeClass              : com.continuent.tungsten.replicator.thl.THL
timeoutMillis           : 2147483647
NAME                      VALUE
----                      -----
criticalPartition       : -1
discardCount            : 0
estimatedOfflineInterval: 0.0
eventCount              : 0
headSeqno               : -1
intervalGuard           : AtomicIntervalGuard (array is empty)
maxDelayInterval        : 60
maxOfflineInterval      : 5
maxSize                 : 10
name                    : parallel-queue
queues                  : 5
serializationCount      : 0
serialized              : false
stopRequested           : false
store.0                 : THLParallelReadTask task_id=0 thread_name=store-thl-0 »
    hi_seqno=0 lo_seqno=0 read=0 accepted=0 discarded=0 events=0
store.1                 : THLParallelReadTask task_id=1 thread_name=store-thl-1 »
    hi_seqno=0 lo_seqno=0 read=0 accepted=0 discarded=0 events=0
store.2                 : THLParallelReadTask task_id=2 thread_name=store-thl-2 »
    hi_seqno=0 lo_seqno=0 read=0 accepted=0 discarded=0 events=0
store.3                 : THLParallelReadTask task_id=3 thread_name=store-thl-3 »
    hi_seqno=0 lo_seqno=0 read=0 accepted=0 discarded=0 events=0
store.4                 : THLParallelReadTask task_id=4 thread_name=store-thl-4 »
    hi_seqno=0 lo_seqno=0 read=0 accepted=0 discarded=0 events=0
storeClass              : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval            : 10000
Finished status command (stores)...

To examine the individual threads in parallel replication, you can use the trepctl status -name shards command, which provides information for each individual shard thread:

shell> trepctl status -name shards
Processing status command (shards)...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000005:0000000013416909;0
appliedLastSeqno  : 1432
appliedLatency    : 0.0
eventCount        : 28
shardId           : cheffy
stage             : q-to-dbms
...
Finished status command (shards)...

7.13. Performing Database or OS Maintenance

When performing database or operating system maintenance, datasources should be temporarily disabled by placing them into the OFFLINE state. For maintenance operations on a Primary, the current Primary should be switched, the required maintenance steps performed, and then the Primary switched back. Detailed steps are provided below for different scenarios.

7.13.1. Performing Maintenance on a Single Replica

To perform maintenance on a single Replica, you should ensure that your application is not using the Replica, perform the necessary maintenance, and then re-enable the Replica within your application.

The steps are:

  1. Put the replicator into the offline state to prevent replication and changes being applied to the database:

    shell> trepctl -host host1 offline

    To perform operating system maintenance, including rebooting the system, the replicator can be stopped completely:

    shell> replicator stop
  2. Perform the required maintenance, including updating the operating system, software or hardware changes.

  3. Validate the server configuration:

    shell> tpm validate
  4. Put the replicator back online:

    shell> trepctl -host host1 online

    Or if you have stopped the replicator, restart the service again:

    shell> replicator start

Once the datasource is back online, monitor the status of the service and ensure that the replicator has started up and that transactions are being extracted or applied.

7.13.2. Performing Maintenance on a Primary

Maintenance, including MySQL admin or schema updates, should not be performed directly on a Primary, as this may upset replication, and therefore the availability and functionality of the Replicas that are reading from the Primary.

To effectively make the modifications, you should switch the Primary host, then operate on the Primary as if it were a Replica, removing it from the replicator service configuration. This helps to minimize any problems or loss of availability that might be caused by performing operations directly on the Primary.

The complete sequence and commands required to perform maintenance on an active Primary are shown in the table below. The table assumes a dataservice with three datasources:

Step Description Command host1 host2 host3
1 Initial state   Primary Replica Replica
2 Switch Primary to host2 See Section 7.11, “Switching Primary Hosts” Replica Primary Replica
3 Put Replica into OFFLINE state trepctl -host host1 offline Offline Primary Replica
4 Perform maintenance   Offline Primary Replica
5 Validate the host1 server configuration tpm validate Offline Primary Replica
6 Put the Replica online trepctl -host host1 online Replica Primary Replica
7 Ensure the Replica has caught up trepctl -host host1 status Replica Primary Replica
8 Switch Primary back to host1 See Section 7.11, “Switching Primary Hosts” Primary Replica Replica

7.13.3. Performing Maintenance on an Entire Dataservice

To perform maintenance on all of the machines within a replicator service, a rolling sequence of maintenance must be performed carefully on each machine in a structured way. In brief, the sequence is as follows:

  1. Perform maintenance on each of the current Replicas

  2. Switch the Primary to one of the already maintained Replicas

  3. Perform maintenance on the old Primary (now in Replica state)

  4. Switch the old Primary back to be the Primary again

A more detailed sequence of steps, including the status of each datasource in the dataservice, and the commands to be performed, is shown in the table below. The table assumes a three-node dataservice (one Primary, two Replicas), but the same principles can be applied to any Primary/Replica dataservice:

Step Description Command host1 host2 host3
1 Initial state   Primary Replica Replica
2 Set the Replica host2 offline trepctl -host host2 offline Primary Offline Replica
3 Perform maintenance   Primary Offline Replica
4 Validate the host2 server configuration tpm validate Primary Offline Replica
5 Set Replica host2 online trepctl -host host2 online Primary Replica Replica
6 Ensure the Replica (host2) has caught up trepctl -host host2 status Primary Replica Replica
7 Set the Replica host3 offline trepctl -host host3 offline Primary Replica Offline
8 Perform maintenance   Primary Replica Offline
9 Validate the host3 server configuration tpm validate Primary Replica Offline
10 Set the Replica host3 online trepctl -host host3 online Primary Replica Replica
11 Ensure the Replica (host3) has caught up trepctl -host host3 status Primary Replica Replica
12 Switch Primary to host2 See Section 7.11, “Switching Primary Hosts” Replica Primary Replica
13 Set the Replica host1 offline trepctl -host host1 offline Offline Primary Replica
14 Perform maintenance   Offline Primary Replica
15 Validate the host1 server configuration tpm validate Offline Primary Replica
16 Set the Replica host1 online trepctl -host host1 online Replica Primary Replica
17 Ensure the Replica (host1) has caught up trepctl -host host1 status Replica Primary Replica
18 Switch Primary back to host1 See Section 7.11, “Switching Primary Hosts” Primary Replica Replica

7.13.4. Upgrading or Updating your JVM

When upgrading your JVM version or installation, care should be taken, as changing the JVM will momentarily remove and replace required libraries and components, which may upset the operation of Tungsten Cluster while the upgrade or update takes place.

For this reason, JVM updates or changes must be treated as an OS upgrade, requiring a Primary switch and controlled stopping of services during the update process.

A sample sequence for this in a 3-node cluster is described below:

Step Description Command host1 host2 host3
1 Initial state   Primary Replica Replica
2 Stop all services on host2. stopall Primary Stopped Replica
3 Update the JVM   Primary Stopped Replica
4 Start all services on host2 Replica. startall Primary Replica Replica
5 Stop all services on host3. stopall Primary Replica Stopped
6 Update the JVM   Primary Replica Stopped
7 Start all services on host3 Replica. startall Primary Replica Replica
8 Stop all services on host1. stopall Stopped Replica Replica
9 Update the JVM   Stopped Replica Replica
10 Start all services on host1 Primary. startall Primary Replica Replica

The status of all services on all hosts should be checked to ensure they are running and operating as normal once the update has been completed.

7.14. Upgrading Tungsten Replicator

To upgrade an existing installation of Tungsten Replicator, the upgrade must be performed from a staging directory containing the new release. The process updates the Tungsten Replicator software and restarts the replicator service using the current configuration.

7.14.1. Upgrading Tungsten Replicator using tpm

To upgrade an existing installation, the new distribution must be downloaded and unpacked, and the included tpm command used to update the installation. The upgrade process implies a small period of downtime for the replicator as the updated versions of the tools are restarted, but downtime is deliberately kept to a minimum, and the replicator should be in the same operational state once the upgrade has finished as it was when the upgrade was started.

Warning

Before performing an upgrade, please ensure that you have checked Appendix B, Prerequisites and the appropriate Release Notes for the version you are upgrading to, as software and system requirements may have changed between versions and releases.

The method for the upgrade process depends on whether you installed via the INI method or via the Staging method.

Upgrading an INI-based Installation

  1. On each host in your deployment, download the release package and place this in your staging directory, typically /opt/continuent/software.

  2. Unpack the release package:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change to the unpackaged directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Run the validation process:

    shell> ./tools/tpm validate-update

    Note

    The validate process will check that pre-requisites are in place and that tpm can safely upgrade the software. Any errors that are reported should be handled before proceeding.

  5. Run the upgrade process:

    shell> ./tools/tpm update --replace-release

The update process should now be complete. The current version can be confirmed by using trepctl status.

Upgrading a Staging-based Installation

  1. Download the release package and place this in the staging directory on your staging host, typically /opt/continuent/software.

    If you are unsure which host and directory this should be, execute the following on any host:

    shell> tpm query staging

    The output of this command will display the host and directory

  2. Unpack the release package:

    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. Change to the unpackaged directory:

    shell> cd tungsten-replicator-6.1.24-6
  4. Fetch a copy of the existing configuration information:

    shell> ./tools/tpm fetch --hosts=host1,host2,autodetect --user=tungsten --directory=/opt/continuent

    Important

    You must use the version of tpm from within the staging directory (./tools/tpm) of the new release, not the tpm installed with the current release.

    The fetch command to tpm supports the following arguments:

    • --hosts

      A comma-separated list of the known hosts in the deployment. If autodetect is included, then tpm will attempt to determine other hosts in the deployment by checking the configuration files for host values.

    • --user

      The username to be used when logging in to other hosts.

    • --directory

      The installation directory of the current Tungsten Replicator installation. If autodetect is specified, then tpm will look for the installation directory by checking any running Tungsten Replicator processes.

    The current configuration information will be retrieved to be used for the upgrade:

    shell> ./tools/tpm fetch --hosts=host1,host2 --directory=/opt/continuent --user=tungsten
    ..
    NOTE  >> Configuration loaded from host1,host2
  5. Run the validation process:

    shell> ./tools/tpm validate-update

    Note

    The validate process will check that pre-requisites are in place and that tpm can safely upgrade the software on all hosts. Any errors that are reported should be handled before proceeding.

  6. Run the upgrade process:

    shell> ./tools/tpm update --replace-release

The update process should now be complete. The current version can be confirmed by using trepctl status.

7.14.2. Installing an Upgraded JAR Patch

Warning

The following instructions should only be used if Continuent Support have explicitly provided you with a custom JAR file designed to address a problem with your deployment.

If a custom JAR has been provided by Continuent Support, the following instructions can be used to install the JAR into your installation.

  1. Determine your staging directory or untarred installation directory:

    shell> tpm query staging

    Go to the appropriate host (if necessary) and change to the staging directory.

    shell> cd tungsten-replicator-6.1.24-6
  2. Change to the correct directory:

    shell> cd tungsten-replicator/lib
  3. Copy the existing JAR to a backup file:

    shell> cp tungsten-replicator.jar tungsten-replicator.jar.orig
  4. Copy the replacement JAR into the directory:

    shell> cp /tmp/tungsten-replicator.jar . 
  5. Change back to the root directory of the staging directory:

    shell> cd ../..
  6. Update the release:

    shell> ./tools/tpm update --replace-release

7.14.3. Installing Patches

Warning

This procedure should only be followed with the advice and guidance of a Continuent Support Engineer.

There are two ways we can patch the running environment, and the method chosen will depend on the severity of the patch and whether or not your use case allows for a maintenance window.

  • Upgrade using a full software update following the standard upgrade procedures

  • Use the patch command to patch just the files necessary

From time to time, Continuent may provide you with a patch to apply as a quicker way to fix small issues. Patched software will always be provided in a subsequent release, so the manual patch method described here should only be used as a temporary measure to patch a live installation when a full software update is not immediately possible.

You will have been supplied with a file containing the patch. For the purpose of this example, we will assume the file you have been given is called undeployallnostop.patch.

  1. On each node of your installation:

    1. Copy the supplied patch file to the host

    2. From the installed directory (typically /opt/continuent), issue the following:

      shell> cd /opt/continuent/tungsten
      shell> patch -p1 -i undeployallnostop.patch

Warning

If a tpm update --replace-release is issued from the original software staging directory, the manual patch applied above will be over-written and removed.

The manual patch method is a temporary approach to patching a running environment, but is not a total replacement for a proper upgrade.

Following a manual patch, you MUST plan to upgrade the staged software to avoid reverting to an unpatched system.

If in doubt, always check with a Continuent Support Engineer.

7.14.4. Upgrading to v7.0.0+

Warning

v7 is a major release with many changes, specifically to security. At this time, upgrading directly to v7 is only supported from v5 onwards. If security is NOT enabled in your installation, then upgrading from an older release may work; however, any issues encountered will not be addressed, and upgrading to v6 first will be the advised route.

Warning

Whilst every care has been taken to ensure upgrades are as smooth and easy as possible, ALWAYS ensure full backups are taken before proceeding, and if possible, test the upgrade on a non-Production environment first.

7.14.4.1. Background

7.14.4.1.1. v6 (and earlier) behavior

Prior to v7, Tungsten came with security turned OFF, through the tpm flag disable-security-controls being set to true by default. This flag, when set to false, would translate to the following settings being applied:

file-protection-level=0027
rmi-ssl=true
thl-ssl=true
rmi-authentication=true
jgroups-ssl=true

This would enable SSL communication between Tungsten components. However, connection to the database remained unencrypted, which would translate to the following settings being applied:

datasource-enable-ssl=false
connector-ssl=false

Setting these to true was possible; however, many more manual steps were required.

7.14.4.1.2. New behavior in v7

v7 enables full security by default, so the disable-security-controls flag will default to false when not specified.

In addition to the default value changing, disable-security-controls now also enables encrypted communication to the database. Setting this value to false now translates to the following settings being applied:

file-protection-level=0027
rmi-ssl=true
thl-ssl=true
rmi-authentication=true
jgroups-ssl=true
datasource-enable-ssl=true 
connector-ssl=true
7.14.4.1.3. Summary

In summary, this change in behavior means that upgrades need to be handled with care, with appropriate decisions made both by the tpm process and by the user as to the desired end result. The various options and examples are outlined in the following sections of this document.

7.14.4.2. Upgrade Decisions

7.14.4.2.1. Keep existing level of security

This is the easiest and smoothest approach. tpm will process your configuration and do its best to maintain the same level of security. In order to achieve that, tpm will dynamically update your configuration (either the tungsten.ini file for INI installs, or the deploy.cfg for staging installs) with additional properties to adjust the level of security to match.

The properties that tpm will add to your configuration will be some or all of the following depending on the initial starting point of your configuration:

disable-security-controls
connector-rest-api-ssl
manager-rest-api-ssl
replicator-rest-api-ssl
datasource-enable-ssl
enable-connector-ssl

You can now proceed with the upgrade, refer to Section 7.14.4.6, “Steps to upgrade using tpm” for the required steps

7.14.4.2.2. Apply new recommendations and setup security

The following security setting levels can be enabled, and will require user action prior to upgrading. These are:

  1. Internal Encryption and Authentication

  2. Tungsten to Database Encryption

  3. API SSL

Applying all of the above steps will bring full security, equivalent to the default v7 configuration.

The steps to enable will depend on what (if any) security is enabled in your existing installation. The following sections outline the steps required to be performed to enable security for each of the various layers. To understand whether you have configured any of the various layers of security, the following summary will help to understand your configuration:

No Security

If no security has been configured, the installation that you are starting from will have disable-security-controls=true (or the option will not be supplied at all) and no additional security properties will be supplied.

Partial Security

The installation that you are starting from will have partial security in place. This could be a combination of any of the security layers described above.

To upgrade and enable security, you should follow one or more of the following steps based on your requirements. At a minimum, the first step should always be included, the remaining steps are optional.

7.14.4.3. Setup internal encryption and authentication

Prior to running the upgrade, you need to manually create the keystore. To do this, follow these steps on one host, and then copy the files to all other hosts in your topology:

db1> mkdir /etc/tungsten/secure
db1> keytool -genseckey -alias jgroups -validity 3650 -keyalg Blowfish -keysize 56 \
-keystore /etc/tungsten/secure/jgroups.jceks -storepass tungsten -keypass tungsten -storetype JCEKS

If you have an INI based install, and this is the only level of security you plan on configuring, you should now copy these new keystores to all other hosts in your topology. If you plan to enable SSL at the other remaining layers, or you use a Staging based install, then skip this copy step.

db1> for host in db2 db3 db4 db5 db6; do 
ssh ${host} mkdir /etc/tungsten/secure
scp /etc/tungsten/secure/*.jceks ${host}:/etc/tungsten/secure
done

Enabling internal encryption and authentication will also enable API SSL by default.

If you need to enable encryption to the underlying database, now proceed to the next step Section 7.14.4.4, “Enable Tungsten to Database Encryption” before running the upgrade, otherwise you can then start the upgrade by following the steps in Section 7.14.4.6, “Steps to upgrade using tpm”.

The following additional configuration properties will need to be added to your existing configuration. The suggested process, based on an INI or Staging based install, is outlined in the final upgrade steps referenced above.

disable-security-controls=false
replicator-rest-api-ssl=true
java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks

7.14.4.4. Enable Tungsten to Database Encryption

The prerequisite steps covered in Section 7.14.4.5, “Enable MySQL SSL” must be performed before continuing with this step.

In this step, you pre-create the various keystores required and register the MySQL certificates for Tungsten. Execute all of the following steps on a single host, for example, db1. In the example below it is assumed that the mysql certificates reside in /etc/mysql/certs. If you use the example syntax below, you will also need to ensure the following directory exists: /etc/tungsten/secure

These commands will import the MySQL certificates into the required Tungsten truststores.

db1> keytool -importkeystore -srckeystore /etc/mysql/certs/client-cert.p12 -srcstoretype PKCS12 \
-destkeystore /etc/tungsten/secure/keystore.jks -deststorepass tungsten -srcstorepass tungsten

db1> keytool -import -alias mysql -file /etc/mysql/certs/ca.pem -keystore /etc/tungsten/secure/truststore.ts \
-storepass tungsten -noprompt

If you have an INI based install, you should now copy all of the generated files to all other hosts in your topology. If you use a Staging based install, then skip this copy step.

db1> for host in db2 db3 db4 db5 db6; do 
ssh ${host} mkdir /etc/tungsten/secure
scp /etc/tungsten/secure/*.jceks ${host}:/etc/tungsten/secure 
scp /etc/tungsten/secure/*.jks ${host}:/etc/tungsten/secure
scp /etc/tungsten/secure/*.ts ${host}:/etc/tungsten/secure
done

Once the steps above have been performed, you can then continue with the upgrade, following the steps outlined in Section 7.14.4.6, “Steps to upgrade using tpm”

The following additional configuration properties will need to be added to your existing configuration. The suggested process, based on an INI or Staging based install, is outlined in the final upgrade steps referenced above.

datasource-enable-ssl=true
java-truststore-path=/etc/tungsten/secure/truststore.ts
java-truststore-password=tungsten
java-keystore-path=/etc/tungsten/secure/keystore.jks
java-keystore-password=tungsten
datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem
datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem
datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem

7.14.4.5. Enable MySQL SSL

A prerequisite to enabling full security is to enable SSL within your database, if this isn't already configured. To do this, we can use the mysql_ssl_rsa_setup tool supplied with most distributions of MySQL. If you do not have this tool, or require more detail, you can refer to Section 6.10.1, “Enabling Database SSL”. The steps below summarise the process using mysql_ssl_rsa_setup.

  1. The first step is to setup the directories for the certs, perform this on ALL hosts in your topology:

    shell> sudo mkdir -p /etc/mysql/certs
    shell> sudo chown -R tungsten: /etc/mysql/certs/

    NB: The ownership is temporarily set to tungsten so that the subsequent scp will work between hosts.

  2. This next step should be performed on just one single host. For the purpose of this example, we will use db1 as the host:

    db1> mysql_ssl_rsa_setup -d /etc/mysql/certs/
    db1> openssl pkcs12 -export -inkey /etc/mysql/certs/client-key.pem \
    -name mysql -in /etc/mysql/certs/client-cert.pem -out /etc/mysql/certs/client-cert.p12 \
    -passout pass:tungsten

    Important

    When using OpenSSL 3.0 with Java 1.8, you MUST add the -legacy option to the openssl command.

    Then copy the generated certificates to all of the other hosts in your topology:

    db1> for host in db2 db3 db4 db5 db6; do 
    scp /etc/mysql/certs/* ${host}:/etc/mysql/certs 
    done
  3. Next, on every host, we need to reset the directory ownership:

    shell> sudo chown -R mysql: /etc/mysql/certs/
    shell> sudo chmod g+r /etc/mysql/certs/client-*
  4. Now, on every host, we need to reconfigure MySQL. Add the following properties into your my.cnf:

    [mysqld]
    ssl-ca=/etc/mysql/certs/ca.pem
    ssl-cert=/etc/mysql/certs/server-cert.pem
    ssl-key=/etc/mysql/certs/server-key.pem
    
    [client]
    ssl-cert=/etc/mysql/certs/client-cert.pem
    ssl-key=/etc/mysql/certs/client-key.pem
    ssl-ca=/etc/mysql/certs/ca.pem
  5. Restart MySQL for the new settings to take effect:

    shell> sudo service mysqld restart

7.14.4.6. Steps to upgrade using tpm

When you are ready to perform the upgrade, the following steps should be followed:

7.14.4.6.1. Steps for INI Based Installations
  1. If no additional steps were taken, and you wish to maintain the same level of security, skip Step 2, and proceed directly to Step 3.

  2. Update your tungsten.ini and include some, or all, of the options below depending on which steps you took earlier. All entries should be placed within the [defaults] stanza.

    disable-security-controls=false
    replicator-rest-api-ssl=true
    java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks

    If "Tungsten to Database Encryption" IS configured, also add:

    datasource-enable-ssl=true
    java-truststore-path=/etc/tungsten/secure/truststore.ts
    java-truststore-password=tungsten
    java-keystore-path=/etc/tungsten/secure/keystore.jks
    java-keystore-password=tungsten
    datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem
    datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem
    datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem

    If "Tungsten to Database Encryption" IS NOT configured, also add:

    datasource-enable-ssl=false

    Important

    If start-and-report=true, remove this value or set it to false.

  3. Obtain the TAR or RPM package for your installation. If using a TAR file, unpack this into your software staging tree, typically /opt/continuent/software. If you use the INI install method, this needs to be performed on every host. For a staging install, this applies to the staging host only.

  4. Change into the directory for the software

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
  5. Issue the following command on all hosts.

    shell> tools/tpm update --replace-release
  6. Finally, you will need to sync the new certificates, created by the upgrade, to all hosts. This step will be required even if you have disabled security, as these files will be used by the API and also, if you choose to enable it, THL Encryption.

    From one host, copy the certificate and keystore files to ALL other hosts in your topology. The following scp command is an example assuming you are issuing from db1, and the install directory is /opt/continuent:

    db1> for host in db2 db3 db4 db5 db6; do
    scp /opt/continuent/share/[jpt]* ${host}:/opt/continuent/share
    scp /opt/continuent/share/.[jpt]* ${host}:/opt/continuent/share
    done

    Note

    The examples assume you have the ability to scp between hosts as the tungsten OS user. If your security restrictions do not permit this, you will need to use alternative procedures appropriate to your environment to ensure these files are in sync across all hosts before continuing.

    If the files are not in sync between hosts, the software will fail to start!

  7. Restart all tungsten components, one host at a time

    shell> replicator restart
7.14.4.6.2. Steps for Staging Based Installations
  1. Obtain the TAR or RPM package for your installation. If using a TAR file, unpack this into your software staging tree, typically /opt/continuent/software. If you use the INI install method, this needs to be performed on every host. For a staging install, this applies to the staging host only.

  2. Change into the directory for the software and fetch the configuration, e.g

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
    shell> tpm reverse > deploy.sh
  3. If no additional steps were taken, and you wish to maintain the same level of security, skip Step 4, and proceed directly to Step 5.

  4. Edit the deploy.sh file just created, and include some, or all, of the options below depending on which steps you took earlier (they should be placed within the defaults section).

    --disable-security-controls=false
    --replicator-rest-api-ssl=true
    --java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks

    If "Tungsten to Database Encryption" IS configured, also add:

    --datasource-enable-ssl=true
    --java-truststore-path=/etc/tungsten/secure/truststore.ts
    --java-truststore-password=tungsten
    --java-keystore-path=/etc/tungsten/secure/keystore.jks
    --java-keystore-password=tungsten
    --datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem
    --datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem
    --datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem

    If "Tungsten to Database Encryption" IS NOT configured, also add:

    --datasource-enable-ssl=false

    Important

    If start-and-report=true, remove this value or set it to false.

    An example of a BEFORE and AFTER edit including all options:

    shell> cat deploy.sh
    # BEFORE
    tools/tpm configure defaults \
    --reset \
    --application-port=3306 \
    --disable-security-controls=true \
    --install-directory=/opt/continuent \
    --mysql-allow-intensive-checks=true \
    --profile-script=/home/tungsten/.bash_profile \
    --replication-password=secret \
    --replication-user=tungsten \
    --start-and-report=true \
    --user=tungsten
    # Options for the nyc data service
    tools/tpm configure nyc \
    --master=db1 \
    --slaves=db2 \
    --topology=master-slave
    shell> cat deploy.sh
    # AFTER
    tools/tpm configure defaults \
    --reset \
    --application-password=secret \
    --application-port=3306 \
    --application-user=app_user \
    --install-directory=/opt/continuent \
    --mysql-allow-intensive-checks=true \
    --profile-script=/home/tungsten/.bash_profile \
    --replication-password=secret \
    --replication-user=tungsten \
    --user=tungsten \
    --disable-security-controls=false \
    --replicator-rest-api-ssl=true \
    --datasource-enable-ssl=true \
    --java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks \
    --java-truststore-path=/etc/tungsten/secure/truststore.ts \
    --java-truststore-password=tungsten \
    --java-keystore-path=/etc/tungsten/secure/keystore.jks \
    --java-keystore-password=tungsten

    # Options for the nyc data service
    tools/tpm configure nyc \
    --master=db1 \
    --slaves=db2 \
    --topology=master-slave
  5. Next, source the file to load the configuration and then execute the update:

    shell> source deploy.sh
    shell> tools/tpm update --replace-release
  6. Finally, you will need to sync the new certificates, created by the upgrade, to all hosts. This step will be required even if you have disabled security, as these files will be used by the API and also, if you choose to enable it, THL Encryption.

    From one host, copy the certificate and keystore files to ALL other hosts in your topology. The following scp command is an example assuming you are issuing from db1, and the install directory is /opt/continuent:

    db1> for host in db2 db3 db4 db5 db6; do
    scp /opt/continuent/share/[jpt]* ${host}:/opt/continuent/share
    scp /opt/continuent/share/.[jpt]* ${host}:/opt/continuent/share
    done

    Note

    The examples assume you have the ability to scp between hosts as the tungsten OS user. If your security restrictions do not permit this, you will need to use alternative procedures appropriate to your environment to ensure these files are in sync across all hosts before continuing.

    If the files are not in sync between hosts, the software will fail to start!

  7. Restart all tungsten components, one host at a time

    shell> replicator restart

7.14.4.7. Optional Post-Upgrade steps to configure API

Once the upgrade has been completed, if you plan on using the API you will need to complete a few extra steps before you can use it. By default, after installation the API will only allow the ping method and the createAdminUser method.

To open up the API and access all of its features, you will need to configure the API User. To do this, execute the following on all hosts (Setting the value of pass to your preferred password):

shell> curl -k -H 'Content-type: application/json' --request POST 'https://127.0.0.1:8096/api/v2/createAdminUser?i-am-sure=true' \
> --data-raw '{
>   "payloadType": "credentials",
>   "user":"tungsten",
>   "pass":"security"
> }'

For more information on using the new API, please refer to ???

7.15. Monitoring Tungsten Cluster

It is your responsibility to properly monitor your deployments of Tungsten Cluster and Tungsten Replicator. At a minimum, monitoring must be performed at three levels. Additional monitors may be run depending on your environment, but these three are required in order to ensure availability and uptime.

  1. Make sure the appropriate Tungsten Cluster and Tungsten Replicator services are running.

  2. Make sure all datasources and replication services are ONLINE.

  3. Make sure replication latency is within an acceptable range.

Important

Special consideration must be taken if you have multiple installations on a single server. This applies to clustering and replication, or to multiple replicators.

These three points must be checked for all directories where Tungsten Cluster or Tungsten Replicator are installed. In addition, all servers should be monitored for basic health of the processors, disk and network. Proper alerting and graphing will prevent many issues that will cause system failures.
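As a minimal sketch, the three checks can be approximated from the standard command set (the patterns and thresholds should be adjusted for your environment):

shell> ps -ef | grep '[r]eplicator'          # 1. the replicator process is running
shell> trepctl status | grep 'state '        # 2. the service reports ONLINE
shell> trepctl status | grep appliedLatency  # 3. compare against your acceptable range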

7.15.1. Managing Log Files with logrotate

You can manage the logs generated by Tungsten Cluster using logrotate.

  • trepsvc.log

    /opt/continuent/tungsten/tungsten-replicator/log/trepsvc.log {
            notifempty
            daily
            rotate 3
            missingok
            compress
            copytruncate
    }
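
Once the configuration file is in place, logrotate can be run in debug mode to verify the rotation rules without actually rotating anything. The file name below assumes the snippet above was saved as /etc/logrotate.d/treplicator:

shell> logrotate -d /etc/logrotate.d/treplicator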

7.15.2. Monitoring Status Using cacti

Graphing Tungsten Replicator data is supported through Cacti extensions. These provide information gathering for the following data points:

  • Applied Latency

  • Sequence Number (Events applied)

  • Status (Online, Offline, Error, or Other)

To configure the Cacti services:

  1. Download both files from https://github.com/continuent/monitoring/tree/master/cacti

  2. Place the PHP script into /usr/share/cacti/scripts.

  3. Modify the installed PHP file with the appropriate $ssh_user and $tungsten_home location from your installation:

    Add SSH arguments to specify the correct id_rsa file if needed.

  4. Ensure that the configured $ssh_user has the correct SSH authorized keys to login to the server or servers being monitored. The user must also have the correct permissions and rights to write to the cache directory.

  5. Test the script by running it by hand:

    shell> php -q /usr/share/cacti/scripts/get_replicator_stats.php --hostname replserver

    If you are using multiple replication services, add --service servicename to the command.

  6. Import the XML file as a Cacti template.

  7. Add the desired graphs to your servers running Tungsten Replicator. If you are using multiple replication services, you'll need to specify the desired service to graph. A graph must be added for each individual replication service.

Once configured, graphs can be used to display the activity and availability.

Figure 7.1. Cacti Monitoring: Example Graphs


7.15.3. Monitoring Status Using nagios

Integration with Nagios is supported through a number of scripts that output information in a format compatible with the Nagios NRPE plugin. Using the plugin, check commands such as check_tungsten_latency can be executed and the output parsed for status information.

The available commands are check_tungsten_latency, check_tungsten_online, and check_tungsten_services; each is described in detail in Chapter 8, Command-line Tools.

To configure the scripts to be executed through NRPE:

  1. Install the Nagios NRPE server.

  2. Start the NRPE daemon:

    shell> sudo /etc/init.d/nagios-nrpe-server start
  3. Add the IP of your Nagios server to the /etc/nagios/nrpe.cfg configuration file. For example:

    allowed_hosts=127.0.0.1,192.168.2.20
  4. Add the Tungsten check commands that you want to execute to the /etc/nagios/nrpe.cfg configuration file. For example:

    command[check_tungsten_online]=/opt/continuent/tungsten/cluster-home/bin/check_tungsten_online
  5. Restart the NRPE service:

    shell> sudo /etc/init.d/nagios-nrpe-server restart
  6. If the commands need to be executed with superuser privileges, the /etc/sudoers file must be updated to enable the commands to be executed as root through sudo as the nagios user. This can be achieved by updating the configuration file, usually performed by using the visudo command:

    nagios          ALL=(tungsten)  NOPASSWD: /opt/continuent/tungsten/cluster-home/bin/check*

    In addition, the sudo command should be added to the Tungsten check commands within the Nagios nrpe.cfg, for example:

    command[check_tungsten_online]=/usr/bin/sudo -u tungsten /opt/continuent/tungsten/cluster-home/bin/check_tungsten_online

    Restart the NRPE service for these changes to take effect.

  7. Add an entry to your Nagios services.cfg file for each service you want to monitor:

    define service {
            host_name database
            service_description     check_tungsten_online
            check_command           check_nrpe! -H $HOSTADDRESS$  -t 30 -c check_tungsten_online
            retry_check_interval    1
            check_period            24x7
            max_check_attempts      3
            flap_detection_enabled  1
            notifications_enabled   1
            notification_period     24x7
            notification_interval   60
            notification_options    c,f,r,u,w
            normal_check_interval   5
    }

The same process can be repeated for all the hosts within your environment where there is a Tungsten service installed.
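
To confirm the NRPE wiring end to end, the configured check can also be invoked by hand from the Nagios server. A minimal sketch, assuming the check_nrpe plugin lives in the default Nagios plugin directory:

shell> /usr/lib/nagios/plugins/check_nrpe -H database -t 30 -c check_tungsten_online
OK: All services are online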

7.16. Rebuilding THL on the Primary

If THL is lost on a Primary before the events contained within it have been applied to the Replica(s), the THL will need to be rebuilt from the existing MySQL binary logs.

Important

If the MySQL binary logs no longer exist, then recovery of the lost transactions in THL will NOT be possible.

The basic sequence of operation for recovering the THL on both Primary and Replicas is:

  1. Gather the failing requested sequence numbers from all Replicas:

    shell> trepctl status
    
    pendingError           : Event extraction failed
    pendingErrorCode       : NONE
    pendingErrorEventId    : NONE
    pendingErrorSeqno      : -1
    pendingExceptionMessage: Client handshake failure: Client response validation failed:
    Master log does not contain requested transaction:
    master source ID=db1 client source ID=db2 requested seqno=4 client epoch number=0 master min seqno=8 master max seqno=8

    In the above example, when Replica db2 comes back online, it requests from the Primary db1 a copy of the last seqno in its local THL (4) so that it can be compared for data-integrity purposes, but the Primary no longer has that transaction.

    Keep a note of the lowest sequence number and the host that it is on across all Replicas for use in the next step.

  2. On the Replica with the lowest failing requested seqno, get the epoch, source-id and event-id (binlog position) from the THL using the command thl list -seqno specifying the sequence number above. This information will be needed on the extractor (Primary) in a later step. For example:

    tungsten@db2:/opt/replicator> thl list -seqno 4
    
    SEQ# = 4 / FRAG# = 0 (last frag)
    - TIME = 2017-07-14 14:49:00.0
    - EPOCH# = 0
    - EVENTID = mysql-bin.000009:0000000000001844;56
    - SOURCEID = db1
    - METADATA = [mysql_server_id=33155307;dbms_type=mysql;tz_aware=true;is_metadata=true; »
                  service=east;shard=#UNKNOWN;heartbeat=NONE]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0,
                 foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00',
                 sql_mode = 'NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES,IGNORE_SPACE',
                 character_set_client = 33, collation_connection = 33, collation_server = 8]
    - SCHEMA = tungsten_east
    - SQL(0) = UPDATE tungsten_east.heartbeat SET source_tstamp= '2017-07-14 14:49:00', »
               salt= 5, name= 'NONE'  WHERE id= 1

    There are two more ways of getting the same information using the dsctl command, so use the one you are most comfortable with:

    tungsten@db2:/opt/replicator> dsctl get
    [{"extract_timestamp":"2017-07-14 14:49:00.0","eventid":"mysql-bin.000009:0000000000001844;56",»
      "fragno":0,"last_frag":true,"seqno":4,"update_timestamp":"2017-07-14 14:49:00.0",»
      "shard_id":"#UNKNOWN","applied_latency":0,"epoch_number":0,"task_id":0,"source_id":"db1"}]
    tungsten@db2:/opt/replicator> dsctl get -ascmd
    dsctl set -seqno 4 -epoch 0 -event-id "mysql-bin.000009:0000000000001844;56;566" -source-id "db1"
  3. Clear all THL on the Primary since it is no longer needed by any Replicas:

    shell> thl purge
  4. Use the dsctl command on the Primary with the values we got from the Replica with the lowest seqno to tell the Primary replicator to begin generating THL starting from that event in the MySQL binary logs:

    Note: If you used the dsctl get -ascmd earlier, you may use that provided command now, just add the -reset argument at the end.

    shell> dsctl set -seqno 4 -epoch 0 -event-id "mysql-bin.000009:0000000000001844;56;566" -source-id "db1" -reset
  5. Switch the Primary to online state:

    shell> trepctl online
  6. Switch the Replicas to online state once the Primary is fully online:

    shell> trepctl online
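
    Once the Primary and Replicas are back online, you can optionally confirm that each node reaches and holds the ONLINE state. The trepctl wait command blocks until the requested state is reached or the timeout (in seconds; the value below is illustrative) expires:

    shell> trepctl wait -state ONLINE -limit 120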

Chapter 8. Command-line Tools

Table of Contents

8.1. The clean_release_directory Command
8.2. The check_tungsten_latency Command
8.3. The check_tungsten_online Command
8.4. The check_tungsten_services Command
8.5. The deployall Command
8.6. The ddlscan Command
8.6.1. Optional Arguments
8.6.2. Supported Templates and Usage
8.6.2.1. ddl-check-pkeys.vm
8.6.2.2. ddl-mysql-hive-0.10.vm
8.6.2.3. ddl-mysql-hive-0.10-staging.vm
8.6.2.4. ddl-mysql-hive-metadata.vm
8.6.2.5. ddl-mysql-oracle.vm
8.6.2.6. ddl-mysql-oracle-cdc.vm
8.6.2.7. ddl-mysql-redshift.vm
8.6.2.8. ddl-mysql-redshift-staging.vm
8.6.2.9. ddl-mysql-vertica.vm
8.6.2.10. ddl-mysql-vertica-staging.vm
8.6.2.11. ddl-oracle-mysql.vm
8.6.2.12. ddl-oracle-mysql-pk-only.vm
8.7. The dsctl Command
8.7.1. dsctl get Command
8.7.2. dsctl set Command
8.7.3. dsctl reset Command
8.7.4. dsctl help Command
8.8. env.sh Script
8.9. The load-reduce-check Tool
8.9.1. Generating Staging DDL
8.9.2. Generating Live DDL
8.9.3. Materializing a View
8.9.4. Generating Sqoop Load Commands
8.9.5. Generating Metadata
8.9.6. Compare Loaded Data
8.10. The materialize Command
8.11. The tungsten_merge_logs Script
8.12. The multi_trepctl Command
8.12.1. multi_trepctl Options
8.12.2. multi_trepctl Commands
8.12.2.1. multi_trepctl backups Command
8.12.2.2. multi_trepctl heartbeat Command
8.12.2.3. multi_trepctl masterof Command
8.12.2.4. multi_trepctl list Command
8.12.2.5. multi_trepctl run Command
8.13. The tungsten_newrelic_event Command
8.14. The query Command
8.15. The replicator Command
8.16. The startall Command
8.17. The stopall Command
8.18. The thl Command
8.18.1. thl Position Commands
8.18.2. thl list Command
8.18.3. thl index Command
8.18.4. thl purge Command
8.18.5. thl info Command
8.18.6. thl help Command
8.19. The trepctl Command
8.19.1. trepctl Options
8.19.2. trepctl Global Commands
8.19.2.1. trepctl kill Command
8.19.2.2. trepctl services Command
8.19.2.3. trepctl servicetable Command
8.19.2.4. trepctl version Command
8.19.3. trepctl Service Commands
8.19.3.1. trepctl backup Command
8.19.3.2. trepctl capabilities Command
8.19.3.3. trepctl check Command
8.19.3.4. trepctl clear Command
8.19.3.5. trepctl clients Command
8.19.3.6. trepctl flush Command
8.19.3.7. trepctl heartbeat Command
8.19.3.8. trepctl load Command
8.19.3.9. trepctl offline Command
8.19.3.10. trepctl offline-deferred Command
8.19.3.11. trepctl online Command
8.19.3.12. trepctl pause Command
8.19.3.13. trepctl perf Command
8.19.3.14. trepctl properties Command
8.19.3.15. trepctl purge Command
8.19.3.16. trepctl qs Command
8.19.3.17. trepctl reset Command
8.19.3.18. trepctl restore Command
8.19.3.19. trepctl resume Command
8.19.3.20. trepctl setdynamic Command
8.19.3.21. trepctl setrole Command
8.19.3.22. trepctl shard Command
8.19.3.23. trepctl status Command
8.19.3.24. trepctl unload Command
8.19.3.25. trepctl wait Command
8.20. The tmonitor Command
8.21. The tpasswd Command
8.22. The tungsten_get_mysql_datadir Script
8.23. The tungsten_get_ports Script
8.24. The tungsten_health_check Script
8.25. The tungsten_monitor Script
8.26. The tungsten_mysql_ssl_setup Script
8.27. The tungsten_prep_upgrade Script
8.28. The tungsten_provision_thl Command
8.28.1. Provisioning from RDS
8.28.2. tungsten_provision_thl Reference
8.29. The tungsten_provision_slave Script
8.30. The tungsten_read_master_events Script
8.31. The tungsten_send_diag Script
8.32. The tungsten_set_position Script
8.33. The tungsten_skip_seqno Script
8.34. The tungsten_skip_all Command
8.35. The undeployall Command

Tungsten Replicator is supplied with a number of different command-line tools and utilities that help to install, manage, control and provide additional functionality on top of the core Tungsten Replicator product.

The content in this chapter provides reference information for using and working with all of these tools. Usage and operation with these tools in particular circumstances and scenarios are provided in other chapters. For example, deployments are handled in Chapter 2, Deployment Overview, although all deployments rely on the tpm command.

Commands related to the deployment

  • tpm — Tungsten package manager

  • ddlscan — Data definition layer scanner and translator

Commands related to the core Tungsten Replicator

  • trepctl — replicator control

  • multi_trepctl — multi-replicator control

  • thl — examine Tungsten History Log contents

Commands related to managing Tungsten Replicator deployments

Commands related to the Hadoop Deployments

  • load-reduce-check — build DDL, materialize and compare replicated data

  • materialize — materializer of views of replicated data into tables

Commands related to monitoring

  • tmonitor — management and testing of external Prometheus exporters

  • tungsten_monitor — provides a mechanism for monitoring the replicator state

  • tungsten_send_diag — assists with diag and file uploads to Continuent support

8.1. The clean_release_directory Command

The clean_release_directory command, located in the tools directory, removes older releases of the installed product from the installation directory. Over time, as tpm update changes the configuration or installs new releases of the product, new directories with the full release information are created, but old ones are not removed in case you need to revert to a previous release.

The clean_release_directory command removes all but the five most recent installs and the current release. For example, with the following directory:

shell> ls -l /opt/continuent/releases
drwxrwxr-x 17 mc mc 4096 Jul  7 15:36 ./
drwxr-xr-x  9 mc mc 4096 Jul  7 15:36 ../
drwxrwxr-x  2 mc mc 4096 Jul  7 15:36 install/
drwxr-xr-x  5 mc mc 4096 Jul  7 14:35 tungsten-replicator-5.2.0-218_pid16197/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:36 tungsten-replicator-5.2.0-219_pid10303/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid1393/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:34 tungsten-replicator-5.2.0-219_pid23112/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid24935/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid26720/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid28491/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid30270/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid32041/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid3212/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid4983/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid6754/
drwxr-xr-x  5 mc mc 4096 Jul  7 15:36 tungsten-replicator-5.2.0-219_pid8530/
drwxrwxr-x  5 mc mc 4096 Jul  6 11:09 tungsten-replicator-5.2.0_pid2869/

Warning

The clean_release_directory command removes old releases. Although this does not affect THL, stored data, or your configuration, it may remove working, but old, configurations, releases and versions of Tungsten Cluster.

Running clean_release_directory:

shell> ./tools/clean_release_directory
Deleting release directories in /opt/continuent; keeping the last five and current installation
Cleaning old releases from /opt/continuent
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0-219_pid32041
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0-219_pid30270
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0-219_pid28491
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0-219_pid26720
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0-219_pid24935
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0-219_pid23112
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0-218_pid16197
Deleting /opt/continuent/releases/tungsten-replicator-5.2.0_pid2869

The resulting releases directory now contains a simpler list:

shell> ls -l /opt/continuent/releases/
total 36
drwxrwxr-x 9 mc mc 4096 Jul  7 15:52 ./
drwxr-xr-x 9 mc mc 4096 Jul  7 15:36 ../
drwxrwxr-x 2 mc mc 4096 Jul  7 15:36 install/
drwxr-xr-x 5 mc mc 4096 Jul  7 15:36 tungsten-replicator-5.2.0-219_pid10303/
drwxr-xr-x 5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid1393/
drwxr-xr-x 5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid3212/
drwxr-xr-x 5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid4983/
drwxr-xr-x 5 mc mc 4096 Jul  7 15:35 tungsten-replicator-5.2.0-219_pid6754/
drwxr-xr-x 5 mc mc 4096 Jul  7 15:36 tungsten-replicator-5.2.0-219_pid8530/
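
To run the cleanup on a schedule rather than by hand, the command can be invoked from cron. The entry below is a sketch; the path to the extracted software package (and its tools directory) is illustrative and should be adjusted to your environment:

shell> crontab -l
# Illustrative entry: prune old releases every Sunday at 03:00
0 3 * * 0 /opt/continuent/software/tungsten-replicator-6.1.0/tools/clean_release_directory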

8.2. The check_tungsten_latency Command

The check_tungsten_latency command reports warning or critical status information depending on whether the latency across the nodes in the cluster is above a specific level.

Table 8.1. check_tungsten_latency Options

Option                 Description
-c                     Report a critical status if the latency is above this level
--perslave-perfdata    Show the latency performance information on a per-Replica basis
--perfdata             Show the latency performance information
-w                     Report a warning status if the latency is above this level

The command outputs information in the following format:

LEVEL: DETAIL

Where DETAIL includes detailed information about the status report, and LEVEL is:

  • CRITICAL — latency on at least one node is above the specified threshold level for a critical report. The host reporting the high latency will be included in the DETAIL portion:

    For example:

    CRITICAL: host2=0.506s
  • WARNING — latency on at least one node is above the specified threshold level for a warning report. The host reporting the high latency will be included in the DETAIL portion:

    For example:

    WARNING: host2=0.506s
  • OK — status is OK; the highest reported latency will be included in the output.

    For example:

    OK: All slaves are running normally (max_latency=0.506)

The -w and -c options must be specified on the command line, and the critical figure must be higher than the warning figure. For example:

shell> check_tungsten_latency -w 0.1 -c 0.5
CRITICAL: host2=0.506s

Performance information can be included in the output to monitor the status. The format for the output is included in the DETAIL block and separates the maximum latency information for each node with a semicolon, and the detail block with a pipe symbol. For example:

shell> check_tungsten_latency -w 1 -c 1 --perfdata
OK: All slaves are running normally (max_latency=0.506) |  max_latency=0.506;1;1;;

Performance information for all the Replicas in the cluster can be output by using the --perslave-perfdata option which must be used in conjunction with the --perfdata option:

shell> check_tungsten_latency -w 0.2 -c 0.5 --perfdata --perslave-perfdata
CRITICAL: host2=0.506s | host1=0.0;0.2;0.5;; host2=0.506;0.2;0.5;;
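
The same thresholds can be baked into the NRPE configuration described in Section 7.15.3, so that Nagios applies them on every poll. For example, in /etc/nagios/nrpe.cfg:

command[check_tungsten_latency]=/opt/continuent/tungsten/cluster-home/bin/check_tungsten_latency -w 0.1 -c 0.5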

8.3. The check_tungsten_online Command

The check_tungsten_online command checks whether all the services for a given service and host are online and running.

Table 8.2. check_tungsten_online Options

Option    Description
-h        Display the help text
-port     RMI port for the replicator being checked

By default, the script will check all manager and replication services for the localhost.

The command outputs information in the following format:

LEVEL: DETAIL

Where DETAIL includes detailed information about the status report, and LEVEL is:

  • CRITICAL — status is critical and requires immediate attention. This indicates that more than one service is not running.

    For example:

    CRITICAL: Replicator is not running
  • WARNING — status requires attention. This indicates that one service within the system is not online.

  • OK — status is OK.

    For example:

    OK: All services are online

This output is easily parseable by various monitoring tools, including Nagios NRPE, and can be used to monitor the status of your services quickly without resorting to using the full trepctl output.

For example:

shell> check_tungsten_online
OK: All services are online

8.4. The check_tungsten_services Command

The check_tungsten_services command provides a simple check to confirm whether configured services are currently running. The command must be executed with a command-line option specifying which services should be checked and confirmed.

Table 8.3. check_tungsten_services Options

Option    Description
-h        Display the help text.
-r        Check the replication services status.

The command outputs information in the following format:

LEVEL: DETAIL

Where DETAIL includes detailed information about the status report, and LEVEL is:

  • CRITICAL — status is critical and requires immediate attention.

    For example:

    CRITICAL: Replicator is not running
  • OK — status is OK.

    For example:

    OK: All services (Replicator) are online

This output is easily parseable by various monitoring tools, including Nagios NRPE, and can be used to monitor the status of your services quickly without resorting to using the full trepctl output.

Note

The check_tungsten_services command only confirms that the services and processes are running; their state is not confirmed. To check state with a similar interface, use the check_tungsten_online command.

To check the services:

  • To check the replicator services:

    shell> check_tungsten_services -r
    OK: All services (Replicator) are online

8.5. The deployall Command

The deployall tool installs the required startup scripts into the correct location so that all required services can be automatically started and stopped during the startup and shutdown of your server.

The tool should be executed with superuser privileges, either directly using sudo, or by logging in as the superuser and running the command directly.

The script will automatically detect the initialization system in use (systemd or initd) and prefer systemd when both are available.

shell> sudo deployall

Important

In order for the configuration to persist during future updates, and/or to execute the deployall script at install time, you should add the install=true tpm option to your configuration.

The startup scripts are added to the correct run levels to enable operation during standard startup and shutdown levels.

Note

For systemd configurations only:

For continuity of service reasons, the deployall script will NOT restart individual components if they had already been previously started by other methods. It will only install systemd scripts. This implies that, right after a call to deployall and before host/component restart, the system will stay in a mixed mode where systemd scripts are in place, but components that were started without systemd won't be controllable by it.

In order to align the configuration, you will need to run:

shell> component stop sysd
shell> sudo systemctl start tcomponent

For example:

shell> connector stop sysd
shell> sudo systemctl start tconnector

Important

This note affects all versions up to and including v7.0.2. The workaround mentioned below will be included as a fix in the next patch release.

When a service is controlled by systemd, the relevant OS limits (such as open file limits) are not controlled in the normal way (via settings in the limits.conf file). Therefore, for systems with heavy workloads, there is a risk that open file limits will be exceeded, which will affect the replicator's stability.

To resolve this, you must ensure you increase the limits for each service. Follow the steps below to do this:

  • Edit the service files in /etc/systemd/system. There will be one file for each service, for example treplicator.service.

  • Under the [Service] stanza, add the following:

    LimitNOFILE=65535

  • When you have changed all the files, reload systemd by issuing the following:

    systemctl daemon-reload

  • Restart each of the services, e.g.:

    systemctl restart treplicator

Note that the restart will cause a momentary outage to each component, so only perform this step when you are sure it is safe to do so.
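
To confirm that the new limit has taken effect after the restart, the service definition can be queried directly. A quick check, assuming the replicator service is named treplicator as above:

shell> systemctl show treplicator --property=LimitNOFILE
LimitNOFILE=65535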

See Section 2.5, “Configuring Startup on Boot”.

To remove the scripts from the system, use undeployall.

8.6. The ddlscan Command

The ddlscan command scans the existing schema for a database or table and then generates a schema or file in a target database environment. For example, ddlscan is used in MySQL to Oracle heterogeneous deployments to translate the schema definitions within MySQL to the Oracle format. For more information on heterogeneous deployments, see Section 2.8, “Understanding Heterogeneous Deployments”.

For example, to generate Oracle DDL from an existing MySQL database:

shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://tr-source:13306/test' -pass password \
    -template ddl-mysql-oracle.vm -db test

/*
SQL generated on Thu Sep 11 15:39:06 BST 2019 by ./ddlscan utility of Tungsten

url = jdbc:mysql:thin://tr-source:13306/test
user = tungsten
dbName = test
*/

DROP TABLE test.sales;
CREATE TABLE test.sales
(
  id NUMBER(10, 0) NOT NULL,
  salesman CHAR,
  planet CHAR,
  value FLOAT,
  PRIMARY KEY (id)
);

The format of the command is:

ddlscan [ -conf path ] [ -db db ] [ -opt opt val ] [ -out file ] [ -pass secret ] [ -path path ] [ -rename file ] [ -service name ] [ -tableFile file ] [ -tables regex ] [ -template file ] [ -url jdbcUrl ] [ -user user ]

The available options are as follows:

Table 8.4. ddlscan Command-line Options

Option            Description
-conf path        Path to a static-{svc}.properties file to read JDBC connection address and credentials
-db db            Database to use (will substitute ${DBNAME} in the URL, if needed)
-opt opt val      Option(s) to pass to the template; try: -opt help me
-out file         Render to file (print to stdout if not specified)
-pass secret      JDBC password
-path path        Add additional search path for loading Velocity templates
-rename file      Definitions file for renaming schemas, tables and columns
-service name     Name of a replication service instead of path to config
-tableFile file   New-line separated definitions file of tables to find
-tables regex     Comma-separated list of tables to find
-template file    Specify template file to render
-url jdbcUrl      JDBC connection string (use single quotes to escape)
-user user        JDBC username

ddlscan supports three different methods for execution:

  • Using an explicit JDBC URL, username and password:

    shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://tr-hadoop1:13306/test' \
        -pass password ...

    This is useful when a deployment has not already been installed.

  • By specifying an explicit configuration file:

    shell> ddlscan -conf /opt/continuent/tungsten/tungsten-replicator/conf/static-alpha.properties ...
  • When an existing deployment has been installed, by specifying one of the active services:

    shell> ddlscan -service alpha ...

In addition, the following two options must be specified on the command-line:

  • The template to be used (using the -template option) for the DDL translation must be specified on the command-line. A list of the supported templates and their operation is available in Table 8.5, “ddlscan Supported Templates”.

  • The -db parameter, which defines the database or schema that should be scanned. All tables are translated unless an explicit list, regex, or table file has been specified.

For example, to translate MySQL DDL to Oracle for all tables within the schema test using the connection to MySQL defined in the service alpha:

shell> ddlscan -service alpha -template ddl-mysql-oracle.vm -db test

The following sections describe ddlscan's additional command-line options and the full list of the available templates.

8.6.1. Optional Arguments

The following arguments are optional:

  • -tables

    A comma-separated list of the tables to be extracted.

    shell> ddlscan -service alpha -template ddl-mysql-oracle.vm -db test -tables typetwo,typethree
  • -tableFile

    A file containing a list of the tables to be extracted. The file should be formatted as Comma Separated Values (CSV); only the first column is used. For example, the file:

    typetwo,Table of type two customer forms
    typethree,Table of type three customer forms

    Could be used with ddlscan:

    shell> ddlscan -service alpha -template ddl-mysql-oracle.vm -db test -tableFile tablelist.txt
  • -rename

    A file listing table renames which will be taken into account when generating target DDL. The format of the file matches the format of the rename filter.

  • -path

    The path to additional Velocity templates to be searched when specifying the template name.

  • -opt

    An additional option (and value) supplied for use within the template file. Different template files may support additional options for specifying alternative information, such as schema names, file locations and other values.

    shell> ddlscan -service alpha -template ddl-mysql-oracle.vm -db test -opt schemaPrefix mysql_
  • -out

    Sends the generated DDL output to a file, in place of sending it to standard output.

  • -help

    Generates the help text of arguments.

8.6.2. Supported Templates and Usage

Table 8.5. ddlscan Supported Templates

File                              Description
ddl-check-pkeys.vm                Reports which tables are without primary key definitions
ddl-mysql-hive-0.10.vm            Generates DDL from a MySQL host suitable for the base tables in a Hadoop/Hive Environment
ddl-mysql-hive-0.10-staging.vm    Generates DDL from a MySQL host suitable for the staging tables in a Hadoop/Hive Environment
ddl-mysql-hive-metadata.vm        Generates metadata as JSON to be used within a Hadoop/Hive Environment
ddl-mysql-oracle.vm               Generates Oracle schema from a MySQL schema
ddl-mysql-oracle-cdc.vm           Generates Oracle tables with CDC capture information from a MySQL schema
ddl-mysql-redshift.vm             Generates DDL from a MySQL host suitable for the base tables in Amazon Redshift
ddl-mysql-redshift-staging.vm     Generates DDL from a MySQL host suitable for the staging tables in Amazon Redshift
ddl-mysql-vertica.vm              Generates DDL suitable for the base tables in HP Vertica
ddl-mysql-vertica-staging.vm      Generates DDL suitable for the staging tables in HP Vertica
ddl-oracle-mysql.vm               Generates DDL for MySQL tables from an Oracle schema
ddl-oracle-mysql-pk-only.vm       Generates Primary Key DDL statements from an Oracle database for MySQL

8.6.2.1. ddl-check-pkeys.vm

The ddl-check-pkeys.vm template can be used to check whether specific tables within a schema do not have a primary key:

shell> ddlscan -template ddl-check-pkeys.vm \
 -user tungsten -pass password -db sales \
 -url jdbc:mysql://localhost:13306/sales
/*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/

/* ERROR: sales.dummy1 has no primary key! *//*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/

/* ERROR: sales.dummy2 has no primary key! *//*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/

For certain environments, particularly heterogeneous replication, the lack of primary keys can lead to inefficient replication, or even prevent data from being replicated at all.

8.6.2.2. ddl-mysql-hive-0.10.vm

Generates DDL suitable for a carbon-copy form of the table from the MySQL host:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-hive-0.10.vm -db test
--
-- SQL generated on Thu Sep 11 12:57:11 BST 2014 by Tungsten ddlscan utility
--
-- url = jdbc:mysql://tr-hadoop1:13306/test
-- user = tungsten
-- dbName = test
--


DROP TABLE IF EXISTS test.sales;

CREATE TABLE test.sales
(
 id INT,
 salesman STRING,
 planet STRING,
 value DOUBLE )
;

Wherever possible, the closest Hive equivalent datatype is used for each source datatype; in the example above, the CHAR columns become STRING and the FLOAT column becomes DOUBLE.

The template supports the following optional parameters to change behavior:

  • -opt schemaPrefix

    A prefix to be placed in front of all schemas. For example, if called with schemaPrefix set to mysql_:

    shell> ddlscan ... -opt schemaPrefix mysql_

    The schema name will be prefixed, translating the schema name from sales into mysql_sales.

  • -opt tablePrefix

    A prefix to be placed in front of all tables. For example, if called with tablePrefix set to mysql_:

    shell> ddlscan ... -opt tablePrefix mysql_

    The table name will be prefixed, translating the table name from sales into mysql_sales.

8.6.2.3. ddl-mysql-hive-0.10-staging.vm

Staging tables within Hive define the original table columns with additional columns to track the operation type, sequence number, timestamp and unique key for each row. For example, the table sales in MySQL:

mysql> describe sales;
+----------+----------+------+-----+---------+----------------+
| Field    | Type     | Null | Key | Default | Extra          |
+----------+----------+------+-----+---------+----------------+
| id       | int(11)  | NO   | PRI | NULL    | auto_increment |
| salesman | char(20) | YES  |     | NULL    |                |
| planet   | char(20) | YES  |     | NULL    |                |
| value    | float    | YES  |     | NULL    |                |
+----------+----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

Generates the following Hive-compatible DDL when using this template:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-hive-0.10-staging.vm -db test
--
-- SQL generated on Thu Sep 11 12:31:45 BST 2014 by Tungsten ddlscan utility
--
-- url = jdbc:mysql://tr-hadoop1:13306/test
-- user = tungsten
-- dbName = test
--


DROP TABLE IF EXISTS test.stage_xxx_sales;

CREATE EXTERNAL TABLE test.stage_xxx_sales
(
 tungsten_opcode STRING ,
 tungsten_seqno INT ,
 tungsten_row_id INT ,
 tungsten_commit_timestamp TIMESTAMP ,
 id INT,
 salesman STRING,
 planet STRING,
 value DOUBLE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/tungsten/staging/test/sales' ;

Wherever possible, the closest Hive equivalent datatype is used for each source datatype, see ddl-mysql-hive-0.10.vm for more information.

8.6.2.4. ddl-mysql-hive-metadata.vm

The Hadoop tools require information about the schema in JSON format so that the table names and primary key information can be used when materializing data from the staging tables into the base tables. This template generates that information in JSON format:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-hive-metadata.vm -db test


{
 "tables": [
 {
 "schema": "test",
 "name": "sales",
 "keys": ["id"],
 "columns": [
 {"name": "id", "type": "INT"},
 {"name": "salesman", "type": "STRING"},
 {"name": "planet", "type": "STRING"},
 {"name": "value", "type": "DOUBLE"}
 ]
 }
 ]
}

8.6.2.5. ddl-mysql-oracle.vm

When translating MySQL tables to an Oracle-compatible schema, each datatype is migrated to its closest Oracle equivalent; for example, INT becomes NUMBER(10, 0), as shown in the sample output at the start of Section 8.6.

The following additional transformations happen automatically:

  • Table names are translated to uppercase.

  • Column names are translated to uppercase.

  • If a column name is a reserved word in Oracle, then the column name has an underscore character appended (for example, TABLE becomes TABLE_).

In addition to the above translations, errors will be raised for the following conditions:

  • If the table name starts with a number.

  • If the table name exceeds 30 characters in length.

  • If the table name is a reserved word in Oracle.

Warnings will be raised for the following conditions:

  • If the column name starts with a number.

  • If the column name exceeds 30 characters in length, the column name will be truncated.

  • If the column name is a reserved word in Oracle.

8.6.2.6. ddl-mysql-oracle-cdc.vm

The ddl-mysql-oracle-cdc.vm template generates identical tables in Oracle, from their MySQL equivalent, but with additional columns for CDC capture. For example:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-oracle-cdc.vm -db test
/*
SQL generated on Thu Sep 11 13:17:05 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/

DROP TABLE test.sales;
CREATE TABLE test.sales
(
 id NUMBER(10, 0) NOT NULL,
 salesman CHAR,
 planet CHAR,
 value FLOAT,
 CDC_OP_TYPE VARCHAR(1), /* CDC column */
 CDC_TIMESTAMP TIMESTAMP, /* CDC column */
 CDC_SEQUENCE_NUMBER NUMBER PRIMARY KEY /* CDC column */
);

For information on the datatypes translated, see ddl-mysql-oracle.vm.

8.6.2.7. ddl-mysql-redshift.vm

The ddl-mysql-redshift.vm template generates DDL for Amazon Redshift tables from MySQL schemas. For example:

CREATE TABLE test.all_mysql_types
(
 my_id INT,
 my_bit BOOLEAN /* BIT(1) */,
 my_tinyint SMALLINT /* TINYINT(4) */,
 my_boolean SMALLINT /* TINYINT(1) */,
 my_smallint SMALLINT,
 my_mediumint INT /* MEDIUMINT(9) */,
 my_int INT,
 my_bigint BIGINT,
 my_decimal_10_5 DECIMAL(10,5),
 my_float FLOAT,
 my_double DOUBLE PRECISION /* DOUBLE */,
 my_date DATE,
 my_datetime DATETIME,
 my_timestamp TIMESTAMP,
 my_time VARCHAR(17) /* WARN: no pure TIME type in Redshift */,
 my_year YEAR(4) /* ERROR: unrecognized (type=0, length=0) */,
 my_char_10 CHAR(10),
 my_varchar_10 VARCHAR(40) /* VARCHAR(10) */,
 my_tinytext VARCHAR(65535) /* WARN: MySQL TINYTEXT translated to max VARCHAR */,
 my_text VARCHAR(65535) /* WARN: MySQL TEXT translated to max VARCHAR */,
 my_mediumtext VARCHAR(65535) /* WARN: MySQL MEDIUMTEXT translated to max VARCHAR */,
 my_longtext VARCHAR(65535) /* WARN: MySQL LONGTEXT translated to max VARCHAR */,
 my_enum_abc VARCHAR(1) /* ENUM('A','B','C') */,
 my_set_def VARCHAR(65535) /* SET('D','E','F') */,
 PRIMARY KEY (my_id)
);

Columns are translated as follows:

MySQL Datatype    Redshift Datatype
BIGINT            BIGINT
BINARY            BINARY, CHAR in 5.2.1 and later
BIT(1)            BOOLEAN
BIT               CHAR
BLOB              VARBINARY, VARCHAR in 5.2.1 and later
CHAR              CHAR
DATE              DATE
DATETIME          DATETIME
DECIMAL           DECIMAL
DOUBLE            DOUBLE PRECISION
ENUM              VARCHAR
FLOAT             FLOAT
INT               INT
LONGBLOB          VARBINARY, CHAR in 5.2.1 and later
LONGTEXT          VARCHAR
MEDIUMBLOB        VARBINARY, CHAR in 5.2.1 and later
MEDIUMINT         INT
MEDIUMTEXT        VARCHAR
SET               VARCHAR
SMALLINT          SMALLINT
TEXT              VARCHAR
TIME              VARCHAR
TIMESTAMP         TIMESTAMP
TINYBLOB          VARBINARY, CHAR in 5.2.1 and later
TINYINT           SMALLINT
TINYTEXT          VARCHAR
VARBINARY         VARBINARY, CHAR in 5.2.1 and later
VARCHAR           VARCHAR

In addition to these explicit changes, the following other considerations are taken into account:

  • When translating the DDL for CHAR and VARCHAR columns, the actual column size is increased by a factor of four. This is because Redshift tables always store data using 32-bit UTF characters and column sizes are in bytes, not characters. Therefore a CHAR(20) column is created as CHAR(80) within Redshift.

  • TEXT columns are converted to a Redshift VARCHAR of 65535 in length (the maximum allowed).

  • BLOB columns are converted to a Redshift VARBINARY of 65000 in length (the maximum allowed).

  • BIT columns with a size of 1 are converted to Redshift BOOLEAN columns, larger sizes are converted to CHAR columns of 64 bytes in length.

  • TIME columns are converted to a Redshift VARCHAR of 17 bytes in length since no explicit TIME type exists.
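
For example, to render the Redshift base-table DDL for a schema and save it to a file for later loading, this template can be combined with the -out option (the service name alpha here is illustrative):

shell> ddlscan -service alpha -template ddl-mysql-redshift.vm -db test -out test-redshift.sql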

8.6.2.8. ddl-mysql-redshift-staging.vm

The ddl-mysql-redshift-staging.vm template generates DDL for Amazon Redshift tables from MySQL schemas. For example:

CREATE TABLE test.stage_xxx_all_mysql_types
(
 tungsten_opcode CHAR(2),
 tungsten_seqno INT,
 tungsten_row_id INT,
 tungsten_commit_timestamp TIMESTAMP,
 my_id INT,
 my_bit BOOLEAN /* BIT(1) */,
 my_tinyint SMALLINT /* TINYINT(4) */,
 my_boolean SMALLINT /* TINYINT(1) */,
 my_smallint SMALLINT,
 my_mediumint INT /* MEDIUMINT(9) */,
 my_int INT,
 my_bigint BIGINT,
 my_decimal_10_5 DECIMAL(10,5),
 my_float FLOAT,
 my_double DOUBLE PRECISION /* DOUBLE */,
 my_date DATE,
 my_datetime DATETIME,
 my_timestamp TIMESTAMP,
 my_time VARCHAR(17) /* WARN: no pure TIME type in Redshift */,
 my_year YEAR(4) /* ERROR: unrecognized (type=0, length=0) */,
 my_char_10 CHAR(10),
 my_varchar_10 VARCHAR(40) /* VARCHAR(10) */,
 my_tinytext VARCHAR(65535) /* WARN: MySQL TINYTEXT translated to max VARCHAR */,
 my_text VARCHAR(65535) /* WARN: MySQL TEXT translated to max VARCHAR */,
 my_mediumtext VARCHAR(65535) /* WARN: MySQL MEDIUMTEXT translated to max VARCHAR */,
 my_longtext VARCHAR(65535) /* WARN: MySQL LONGTEXT translated to max VARCHAR */,
 my_enum_abc VARCHAR(1) /* ENUM('A','B','C') */,
 my_set_def VARCHAR(65535) /* SET('D','E','F') */,
 PRIMARY KEY (tungsten_opcode, tungsten_seqno, tungsten_row_id)
);

The actual translation of datatypes is identical to that found in ddl-mysql-redshift.vm.

8.6.2.9. ddl-mysql-vertica.vm

The ddl-mysql-vertica.vm template generates DDL for generating tables within an HP Vertica database from an existing MySQL database schema. For example:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-vertica.vm -db test
/*
SQL generated on Thu Sep 11 14:20:14 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/
CREATE SCHEMA test;

DROP TABLE test.sales;

CREATE TABLE test.sales
(
 id INT ,
 salesman CHAR(20) ,
 planet CHAR(20) ,
 value FLOAT ) ORDER BY id;

Because Vertica does not explicitly support primary keys, a default projection for the key order is created based on the primary key of the source table.

The template translates the source datatypes to their closest Vertica equivalents; in the example above, the INT, CHAR(20) and FLOAT columns carry across directly.

In addition, the following considerations should be taken into account:

  • The MySQL DECIMAL type is not supported.

  • TEXT types in MySQL are converted to a VARCHAR in Vertica of the maximum supported size.

  • BLOB types in MySQL are converted to a VARBINARY in Vertica of the maximum supported size.

  • SET types in MySQL are converted to a VARCHAR in Vertica of 4000 characters, designed to work in tandem with the settostring filter.

  • ENUM types in MySQL are converted to a VARCHAR in Vertica of the size of the longest ENUM value, designed to work in tandem with the enumtostring filter.

8.6.2.10. ddl-mysql-vertica-staging.vm

The ddl-mysql-vertica-staging.vm template generates DDL for HP Vertica staging tables. These include the full table definition, in addition to three columns used to define the staging data, including the operation code, sequence number and unique row ID. For example:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-vertica-staging.vm -db test
/*
SQL generated on Thu Sep 11 14:22:06 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/
CREATE SCHEMA test;

DROP TABLE test.stage_xxx_sales;

CREATE TABLE test.stage_xxx_sales
(
 tungsten_opcode CHAR(1) ,
 tungsten_seqno INT ,
 tungsten_row_id INT ,
 id INT ,
 salesman CHAR(20) ,
 planet CHAR(20) ,
 value FLOAT ) ORDER BY tungsten_seqno, tungsten_row_id;

8.6.2.11. ddl-oracle-mysql.vm

The ddl-oracle-mysql.vm template generates the DDL required to create a schema within MySQL based on the existing Oracle schema. Note in the example below that the lowercase schema name finds no tables; the scan succeeds when the schema name is given in uppercase, as stored by Oracle:

shell> ddlscan -service sales -template ddl-oracle-mysql.vm -db sales
/*
SQL generated on Thu Sep 11 04:29:08 PDT 2014 by ./ddlscan utility of Tungsten

url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL
user = SALES_PUB
dbName = sales
*/
/* ERROR: no tables found! Is database and tables option specified correctly? */

[tungsten@tr-fromoracle1 ~]$ ddlscan -service sales -template ddl-oracle-mysql.vm -db SALES
/*
SQL generated on Thu Sep 11 04:29:16 PDT 2014 by ./ddlscan utility of Tungsten

url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL
user = SALES_PUB
dbName = SALES
*/

DROP TABLE IF EXISTS sales.sample;
CREATE TABLE sales.sample
(
 id DECIMAL(38) /* NUMBER(38, ?) */ NOT NULL,
 msg CHAR(80),
 PRIMARY KEY (id)
) ENG

Columns are translated to their closest MySQL equivalents; in the example above, the Oracle NUMBER(38) column becomes DECIMAL(38).

The following additional transformations happen automatically:

  • If a column name is a reserved word in MySQL, then the column name has an underscore character appended (for example, TABLE becomes TABLE_).

An error is raised in the following conditions:

  • If the size of a FLOAT is larger than 53 points of precision.

8.6.2.12. ddl-oracle-mysql-pk-only.vm

The ddl-oracle-mysql-pk-only.vm template generates alter table statements to add the primary key, as determined from the Oracle primary key or index information. For example:

shell> ddlscan -service hadoop -template ddl-oracle-mysql-pk-only.vm -db HADOOP
/*
SQL generated on Thu Sep 11 06:17:28 PDT 2014 by ./ddlscan utility of Tungsten

url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL
user = HADOOP_PUB
dbName = HADOOP
*/

ALTER TABLE hadoop.sample ADD PRIMARY KEY (ID);

Note that it does not generate table DDL, only statements to alter existing tables with primary key information.

8.7. The dsctl Command

The dsctl command provides a simplified interface into controlling the datasource within a replication scenario to set the current replication position. Because dsctl uses the built-in datasource connectivity of the replicator, differences in the storage and configuration of the current replicator metadata and position can be controlled without resorting to updating the corresponding database directly.

The command is driven by a number of command-specific instructions to get or set the datasource position.

Table 8.6. dsctl Commands

Option    Description
get       Return the available position information
help      Print the help display
reset     Clear the datasource position information
set       Set the position

These must be used in conjunction with one of the following options to select the required datasources or service:

Table 8.7. dsctl Command-line Options

Option      Description
-conf       Path to the static services properties file
-ds         Name of the datasource
-service    Name of the replication service to get information from

If more than one service or datasource has been configured, one of these options must be used to select the service. Otherwise, by default, dsctl will use the corresponding configured service.

8.7.1. dsctl get Command

Table 8.8. dsctl Command-line Options

Option    Description
-ascmd    Generates the command required to set the datasource to the current position

Returns the current datasource status and position as a JSON string. The example below has been formatted for clarity:

shell> dsctl get
[
   {
      "last_frag" : true,
      "applied_latency" : 1,
      "extract_timestamp" : "2015-01-21 21:46:57.0",
      "eventid" : "mysql-bin.000014:0000000000005645;-1",
      "source_id" : "tr-11",
      "epoch_number" : 22,
      "update_timestamp" : "2015-01-21 21:46:58.0",
      "task_id" : 0,
      "shard_id" : "tungsten_alpha",
      "seqno" : 22,
      "fragno" : 0
   }
]

When the -ascmd option is used, the information is output in form of a command:

shell> dsctl get -ascmd
dsctl set -seqno 17 -epoch 11 -event-id "mysql-bin.000082:0000000014031577;-1" -source-id "ubuntu"

If the -reset option is used, then the generated command also includes that option. For example:

shell> dsctl get -ascmd -reset
dsctl set -seqno 17 -epoch 11 -event-id "mysql-bin.000082:0000000014031577;-1" -source-id "ubuntu" -reset
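
One practical use of -ascmd is to checkpoint the replicator position before maintenance and replay it afterwards. A minimal sketch; the file name is illustrative:

shell> dsctl get -ascmd -reset > /tmp/dsctl-restore.cmd
shell> # ... perform maintenance or a reset ...
shell> sh /tmp/dsctl-restore.cmd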

8.7.2. dsctl set Command

Table 8.9. dsctl Command-line Options

Option        Description
-epoch        Epoch number
-event-id     Source event ID
-reset        Resets the datasource before performing the set operation
-seqno        Sequence number
-source-id    Source ID

Sets the current replicator position. When using this option, the -seqno, -epoch, -event-id, and -source-id options must be specified to set the corresponding values in the replicator.

For example:

shell> dsctl set -seqno 22 -epoch 22 -event-id "mysql-bin.000014:0000000000005645;-1" -source-id tr-11
Service "alpha" datasource "global" position was set to: seqno=22 epoch_number=22 »
    eventid=mysql-bin.000014:0000000000005645;-1 source_id=tr-11

When used with the -reset, the datasource is reset before the set operation:

shell> dsctl set -seqno 17 -epoch 11 -event-id "mysql-bin.000082:0000000014031577;-1" -source-id "ubuntu" -reset
Service "alpha" datasource "global" catalog information cleared
Service "alpha" datasource "global" position was set to: seqno=17 epoch_number=11 » 
    eventid=mysql-bin.000082:0000000014031577;-1 source_id=ubuntu

Adding the -reset option to the dsctl get -ascmd command also adds the option to the generated command:

shell> dsctl get -ascmd -reset
dsctl set -seqno 17 -epoch 11 -event-id "mysql-bin.000082:0000000014031577;-1" -source-id "ubuntu" -reset

8.7.3. dsctl reset Command

Clears the current replicator status and position information:

shell> dsctl reset
Service "alpha" datasource "global" catalog information cleared

8.7.4. dsctl help Command

Displays the current help text:

shell> dsctl help
Datasource Utility
Syntax: dsctl [conf|service] [-ds name] [operation]
Configuration (required if there's more than one service):
  -conf path       - Path to a static-<svc>.properties file
     OR
  -service name    - Name of a replication service to get datasource configuration from
Options:
 [-ds name]        - Name of the datasource (default: global)
Operations:
  get              - Return all available position information
     [-ascmd]      - Return all available position information as command
  set -seqno ###   - Set position (all four parameters required)
      -epoch ###
      -event-id AAAAAAAAA.######:#######
      -source-id AAA.AAA.AAA
     [-reset]      - Optionally reset the information first
  reset            - Clear datasource position information
  help             - Print this help display

8.8. env.sh Script

After installation, the env.sh script can be used to set up the local environment, such as appending to the local $PATH.

If --profile-script is set during installation, then the local profile script will also be updated to ensure the env.sh file is loaded at login of the OS user.

shell> cat .bash_profile
...
# Begin Tungsten Environment for /opt/continuent
# Include the Tungsten variables
# Anything in this section may be changed during the next operation
if [ -f "/opt/continuent/share/env.sh" ]; then
    . "/opt/continuent/share/env.sh"
fi
# End Tungsten Environment for /opt/continuent

If not set, then the script can be sourced manually:

shell> source /opt/continuent/share/env.sh

If --executable-prefix is set, then the env.sh script will also configure aliases for all of the common executable binaries.

For example, if --executable-prefix has been set to "mm", then aliases for executable binaries will be prefixed with this value, as shown in the small example below:

shell> alias
...
alias mm_thl='/opt/continuent/tungsten/tungsten-replicator/bin/thl'
alias mm_tpm='/opt/continuent/tungsten/tools/tpm'
alias mm_trepctl='/opt/continuent/tungsten/tungsten-replicator/bin/trepctl'
...

8.9. The load-reduce-check Tool

The load-reduce-check tool provides a single command to perform the final steps to convert data loaded through the Hadoop applier into a final, Hive-compatible table providing a carbon copy of the data within Hive as extracted from the source database.

See Section 4.6, “Deploying the Hadoop Applier” for more details on configuring the Hadoop Applier.

The four steps, each of which can be enabled or disabled individually, are:

  1. Section 8.9.1, “Generating Staging DDL”

    Accesses the source database, reads the schema definition, and generates the necessary DDL for the staging tables within Hive. Tables are by default prefixed with stage_xxx_, and created in a Hive schema matching the source schema.

  2. Section 8.9.2, “Generating Live DDL”

    Accesses the source database, reads the schema definition, and generates the necessary DDL for the tables within Hive. Tables are created with an identical table and schema name to the source schema.

  3. Section 8.9.3, “Materializing a View”

    Executes a view materialization, where the data in any existing table and the staging table are merged into the final table data. This step is identical to the process executed when running the materialize tool.

  4. Section 8.9.6, “Compare Loaded Data”

    Compares the data within the source and materialized tables and reports any differences.


8.9.1. Generating Staging DDL

8.9.2. Generating Live DDL

8.9.3. Materializing a View

8.9.4. Generating Sqoop Load Commands

8.9.5. Generating Metadata

8.9.6. Compare Loaded Data

8.10. The materialize Command

8.11. The tungsten_merge_logs Script

The tungsten_merge_logs command is designed to aid troubleshooting by consolidating the various log files into one place ordered by time.

tungsten_merge_logs [ --after {TIMESTAMP}  ] [ --before {TIMESTAMP}  ] [ --debug, --d  ] [ --extension, --X  ] [ --help, -h  ] [ --log-limit, --L  ] [ --quiet, --q  ] [ --replicator, --R  ] [ --stdout, --O  ] [ --verbose, -v  ]

Where:

Table 8.10. tungsten_merge_logs Command-line Options

Option                 Description
--after {TIMESTAMP}    Discard all lines prior to {TIMESTAMP}
--before {TIMESTAMP}   Discard all lines after {TIMESTAMP}
--debug, --d
--extension, --X       Specify the file extension (default: log). Do NOT include the period
--help, -h             Show help text
--log-limit, --L       Specify the quantity of log files to gather (default: unlimited).
                       When the wrapper rotates log files, it appends a period and an
                       integer to the log file name, where .1 is the newest, .2 is older,
                       .3 older than that, and so on. This parameter gathers the base log
                       file plus limit minus one rotated files.
--quiet, --q
--replicator, --R      Include the replicator log files
--stdout, --O          Send merged logs to STDOUT instead of a file
--verbose, -v          Show verbose output

With no options specified, the tungsten_merge_logs script will gather all log files in the current directory and below.

For example:

shell> cd
shell> tpm diag --all
shell> tar xvzf tungsten-diag-2021-11-15-16-37-33.tgz
shell> cd tungsten-diag-2021-11-15-16-37-33
shell> tungsten_merge_logs

Would result in something like the following:

New merged log file ./merged.log created!

All log files are gathered by default.

Using multiple options will aggregate the logs from the specified components.

Use of the --log-limit option works as follows:

  • a loglimit of 1 means gather the base file only, i.e. trepsvc.log

  • a loglimit of 2 means gather the base file and the first backup file, i.e. trepsvc.log and trepsvc.log.1

  • a loglimit of 3 means gather the base file and the first two backup files, i.e. trepsvc.log, trepsvc.log.1 and trepsvc.log.2

The {TIMESTAMP} must be specified as a single argument wrapped in quotes, in the format of 'yyyy/mm/dd hh:mm:ss', including a single space between the date and time. Hours are in 24-hour time, and all values should be left-padded with zeros. For example:

shell> tungsten_merge_logs --before '2021/09/27 21:58:02'
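
Options can be combined. For example, to merge only the replicator log files that fall within a specific window:

shell> tungsten_merge_logs --replicator --after '2021/09/27 21:00:00' --before '2021/09/27 21:58:02'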

8.12. The multi_trepctl Command

The multi_trepctl command provides unified status and operation support across all hosts in your Tungsten Cluster installation, without the need to run the trepctl command on each host and/or service individually.

multi_trepctl
masterof
backups [ --by-service ] [ --fields appliedLastSeqNo | appliedLatency | host | role | serviceName | state ]
heartbeat [ --host, --hosts self ]
list [ --output json | list | name | tab | yaml ] [ --path, --paths ] [ --role, --roles ]
run [ --service, --services self ] [ --skip-headers ] [ --sort-by ]

The default operation, with no further command-line commands or arguments, displays the status of all the hosts and services identified as related to the current host. In a typical single-service deployment, the command outputs the status of all services by determining the relationship between hosts connected to the default service:

shell> multi_trepctl
| host   | serviceName | role   | state  | appliedLastSeqno | appliedLatency |
| tr-ms1 | alpha       | master | ONLINE |               54 |          0.867 |
| tr-ms2 | alpha       | slave  | ONLINE |               54 |          1.945 |
| tr-ms3 | alpha       | slave  | ONLINE |               54 |         42.051 |

On a server with multiple services, information is output for each service and host:

shell> multi_trepctl 
| host  | servicename | role   | state         | appliedlastseqno | appliedLatency |
| east1 | east        | master | ONLINE        |               53 |          0.000 |
| east1 | west        | slave  | OFFLINE:ERROR |               -1 |         -1.000 |
| west1 | west        | master | ONLINE        |           294328 |          0.319 |
| west1 | east        | slave  | ONLINE        |               53 |        119.834 |
| west2 | west        | master | ONLINE        |           231595 |          0.316 |
| west2 | east        | slave  | ONLINE        |               53 |        181.128 |
| west3 | east        | slave  | ONLINE        |               53 |        204.790 |
| west3 | west        | slave  | ONLINE        |           231595 |         22.895 |

8.12.1. multi_trepctl Options

The multi_trepctl tool provides a number of options that control the information and detail output when the command is executed.

Table 8.11. multi_trepctl Command-line Options

Option                 Description
--by-service           Sort the output by the service name
--fields               Fields to be output during the summary
--host, --hosts        Host or hosts on which to limit output
--output               Specify the output format
--paths, --path        Directory or directories to check when looking for tools
--role, --roles        Role or roles on which to limit output
--service, --services  Service or services on which to limit output
--skip-headers         Skip the headers
--sort-by              Sort by a specified field

Where:

  • --by-service

    Order the output according to the service name and role within the service:

    shell> multi_trepctl --by-service
    | host  | servicename | role   | state         | appliedlastseqno | appliedLatency |
    | east1 | east        | master | ONLINE        |               64 |         59.380 |
    | west1 | east        | slave  | ONLINE        |               64 |         60.889 |
    | west2 | east        | slave  | ONLINE        |               64 |         60.970 |
    | west3 | east        | slave  | ONLINE        |               64 |         61.097 |
    | west1 | west        | master | ONLINE        |           294328 |          0.319 |
    | west2 | west        | master | ONLINE        |           231595 |          0.316 |
    | east1 | west        | slave  | OFFLINE:ERROR |               -1 |         -1.000 |
    | west3 | west        | slave  | ONLINE        |           231595 |         22.895 |
  • --fields

    Limits the output to the specified list of fields from those output by trepctl. For example, to limit the output to the host, role, and appliedLatency:

    shell> multi_trepctl --fields=host,role,appliedlatency
    | host   | role   | appliedlatency |
    | tr-ms1 | master |          0.524 |
    | tr-ms2 | slave  |          0.000 |
    | tr-ms3 | slave  |         -1.000 |
  • --host, --hosts

    Limit the output to the host, or a comma-separated list of hosts specified. For example:

    shell> multi_trepctl --hosts=tr-ms1,tr-ms3
    | host   | servicename | role   | state         | appliedlastseqno | appliedlatency |
    | tr-ms1 | alpha       | master | ONLINE        |             2322 |          0.524 |
    | tr-ms3 | alpha       | slave  | OFFLINE:ERROR |               -1 |         -1.000 |
  • --output

    Specify the output format.

    Table 8.12. multi_trepctl --output Option

    Option          --output
    Description     Specify the output format
    Value Type      string
    Default         info
    Valid Values    json   JSON format
                    list   List format
                    name   Name (simplified text) format
                    tab    Tab-delimited format
                    yaml   YAML format

    For example, to output the current status in JSON format:

    shell> multi_trepctl --output json
    [
      {
        "appliedlastseqno": 2322,
        "appliedlatency": 0.524,
        "host": "tr-ms1",
        "role": "master",
        "servicename": "alpha",
        "state": "ONLINE"
      },
      {
        "appliedlastseqno": 2322,
        "appliedlatency": 0.0,
        "host": "tr-ms2",
        "role": "slave",
        "servicename": "alpha",
        "state": "ONLINE"
      },
      {
        "appliedlastseqno": -1,
        "appliedlatency": -1.0,
        "host": "tr-ms3",
        "role": "slave",
        "servicename": "alpha",
        "state": "OFFLINE:ERROR"
      }
    ]
  • --path, --paths

    Limit the search for trepctl to the specified path or comma-separated list of paths. On a deployment with multiple services, the output will be limited by the services installed within the specified directories:

    shell> multi_trepctl --path /opt/replicator
    | host | servicename | role  | state  | appliedlastseqno | appliedlatency |
    | db1  | west        | slave | ONLINE |                3 |          0.450 |
    | db2  | west        | slave | ONLINE |                3 |          0.481 |
    | db3  | west        | slave | ONLINE |                3 |          0.484 |
    | db4  | east        | slave | ONLINE |                4 |          0.460 |
    | db5  | east        | slave | ONLINE |                4 |          0.451 |
    | db6  | east        | slave | ONLINE |                4 |          0.496 |

    This is also useful when control of cross-site replicators is desired in MSMM topologies prior to v6.0.0.

    For example, take all cross-site replicators offline:

    shell> multi_trepctl --path /opt/replicator offline

    To bring all cross-site replicators online:

    shell> multi_trepctl --path /opt/replicator online
  • --role, --roles

    Limit the output to show only the specified role or comma-separated list of roles:

    shell> multi_trepctl --roles=slave
    | host   | servicename | role  | state         | appliedlastseqno | appliedlatency |
    | tr-ms2 | alpha       | slave | ONLINE        |             2322 |          0.000 |
    | tr-ms3 | alpha       | slave | OFFLINE:ERROR |               -1 |         -1.000 |
  • --service, --services

    Limit the output to the specified service or comma-separated list of services:

    shell>  multi_trepctl --service=east 
    | host  | servicename | role   | state         | appliedlastseqno | appliedlatency |
    | east1 | east        | master | ONLINE        |               53 |          0.000 |
    | west1 | east        | slave  | ONLINE        |               53 |        119.834 |
    | west2 | east        | slave  | ONLINE        |               53 |        181.128 |
    | west3 | east        | slave  | ONLINE        |               53 |        204.790 |
  • --skip-headers

    Suppresses the headers when generating the list output format:

    shell> multi_trepctl --skip-headers
    | tr-ms1 | alpha | master | ONLINE        | 2322 | 0.524 |
    | tr-ms2 | alpha | slave  | ONLINE        | 2322 | 0.000 |
    | tr-ms3 | alpha | slave  | OFFLINE:ERROR |   -1 | -1.000 |
  • --sort-by

    Sort by the specified fieldname. For example, to sort the output by the latency:

    shell> multi_trepctl --sort-by appliedlatency
    | host   | servicename | role   | state         | appliedlastseqno | appliedlatency |
    | tr-ms3 | alpha       | slave  | OFFLINE:ERROR |               -1 |         -1.000 |
    | tr-ms2 | alpha       | slave  | ONLINE        |             2322 |          0.000 |
    | tr-ms1 | alpha       | master | ONLINE        |             2322 |          0.524 |

8.12.2. multi_trepctl Commands

The default operational mode is for multi_trepctl list to output the status. A specific mode can also be specified on the command-line.

Table 8.13. multi_trepctl Commands

Option     Description
masterof   List all the Primaries of configured hosts and services
backups    List all the backups available across all configured hosts and services
heartbeat  Inserts a heartbeat on all Primaries within the service
list       List the information about each service
run        Run the specified trepctl command on all hosts/services

In addition to the two primary commands, list and run, multi_trepctl can execute commands that would normally be applied to trepctl, running them on each selected host, service, or directory according to the options. The output format and expectations are controlled through the list and run commands.

For example:

shell> multi_trepctl status

Outputs the long form of the status information (as per trepctl status) for each identified host.

8.12.2.1. multi_trepctl backups Command

Lists the available backups across all replicators.

shell> multi_trepctl backups
| host  | servicename | backup_date         | prefix           | agent     |
| host1 | alpha       | 2014-08-15 09:40:37 | store-0000000002 | mysqldump |
| host1 | alpha       | 2014-08-15 09:36:57 | store-0000000001 | mysqldump |
| host2 | alpha       | 2014-08-12 07:02:29 | store-0000000001 | mysqldump |

8.12.2.2. multi_trepctl heartbeat Command

Runs the trepctl heartbeat command on all hosts that are identified as masters.

shell> multi_trepctl heartbeat
host: host1
servicename: alpha
role: master
state: ONLINE
appliedlastseqno: 8
appliedlatency: 2.619
output:

8.12.2.3. multi_trepctl masterof Command

Lists which hosts are Primaries of others within the configured services.

shell> multi_trepctl masterof
| servicename | host  | uri               |
| alpha       | host1 | thl://host1:2112/ |

8.12.2.4. multi_trepctl list Command

The multi_trepctl list mode is the default mode for multi_trepctl and outputs the current status across all hosts and services as a table:

shell> multi_trepctl     
| host  | servicename | role   | state                      | appliedlastseqno | appliedlatency |
| host1 | firstrep    | master | OFFLINE:ERROR              |               -1 |         -1.000 |
| host2 | firstrep    | slave  | GOING-ONLINE:SYNCHRONIZING |             5271 |       4656.264 |
| host3 | firstrep    | slave  | OFFLINE:ERROR              |               -1 |         -1.000 |
| host4 | firstrep    | slave  | OFFLINE:ERROR              |               -1 |         -1.000 |

Or selected hosts and services if options are specified. For example, to get the status only for host1 and host2:

shell> multi_trepctl --hosts=host1,host2
| host  | servicename | role   | state  | appliedlastseqno | appliedlatency |
| host1 | firstrep    | master | ONLINE |             5277 |          0.476 |
| host2 | firstrep    | slave  | ONLINE |             5277 |          0.000 |

In each case, multi_trepctl collates the status or information output by the corresponding command executed on each of the remote hosts and services.

8.12.2.5. multi_trepctl run Command

The multi_trepctl run command can be used where the output of the corresponding trepctl command cannot be formatted into a convenient list. For example, to execute a backup on every host within a deployment:

shell> multi_trepctl run backup

The same filters and host or service selection can also be made:

shell> multi_trepctl run backup --hosts=host1,host2,host3
host: host1
servicename: firstrep
output: |
  Backup completed successfully; URI=storage://file-system/store-0000000005.properties
---
host: host2
servicename: firstrep
output: |
  Backup completed successfully; URI=storage://file-system/store-0000000001.properties
---
host: host3
servicename: firstrep
output: |
  Backup completed successfully; URI=storage://file-system/store-0000000001.properties
...

The command returns only once the remote commands on each host have completed and returned.

8.13. The tungsten_newrelic_event Command

The tungsten_newrelic_event script utilises the existing Tungsten monitor scripts and inserts the results into New Relic.

By default all of the following Nagios check scripts under $CONTINUENT_ROOT/tungsten/cluster-home/bin are executed and the results of each are inserted into New Relic as the associated EventType.

Executable                     EventType
check_tungsten_latency         CheckTungstenLatency
check_tungsten_online          CheckTungstenOnline
check_tungsten_policy          CheckTungstenPolicy
check_tungsten_progress        CheckTungstenProgress
check_tungsten_services -r -c  CheckTungstenServices
check_tungsten_services -r     CheckTungstenNode
check_tungsten_services -c     CheckTungstenConnector

If you specify one or more checks to execute using the command-line arguments, then only those checks will be run.

curl must be installed and available on the PATH.

You must provide your New Relic Account ID and your New Relic Insights API Insert Key for the script to function.

You may obtain the API Insert Key at https://insights.newrelic.com/accounts/{New Relic Account ID}/manage/api_keys

tungsten_newrelic_event [ --account ] [ --critical {seconds}, -C {seconds} ] [ --curl ] [ --debug, -d ] [ --help, -h ] [ --hostname ] [ --key ] [ --latency, -l ] [ --noexec ] [ --online, -o ] [ --progress, -r ] [ --service {SERVICE} ] [ --test ] [ --timeout {seconds}, -t {seconds} ] [ --tmpfile ] [ --node, -n ] [ --services, -s ] [ --verbose, --v ] [ --warn {seconds}, -W {seconds} ]

Where:

Table 8.14. tungsten_newrelic_event Command-line Options

Option                              Description
--account                           Use to specify your New Relic Account ID
-C {seconds}, --critical {seconds}  (default: 15) Specify the Critical alert latency level in seconds; optionally used for check_tungsten_latency
--curl                              Use to specify the full path to the curl binary
--debug, -d                         Enabling debug also implies enabling of verbose. Debug is VERY chatty, avoid unless essential
--help, -h                          Display the help text
--hostname                          Use to specify the full path to the hostname binary
--key                               Use to specify your New Relic Insights API Insert key
--latency, -l                       Run check_tungsten_latency, optionally specify --warn and --critical
--noexec                            Do not execute the Tungsten Nagios Check script
--online, -o                        Run check_tungsten_online, optionally specify a service with --service
--progress, -r                      Run check_tungsten_progress, optionally specify a timeout with --timeout, optionally specify a service with --service
--service {SERVICE}                 Specify a service name; optionally used for check_tungsten_online and check_tungsten_progress
--test                              Do not execute the New Relic API Insert call
-t {seconds}, --timeout {seconds}   (default: 1) Specify the time to wait for progress to occur; optionally used for check_tungsten_progress
--tmpfile                           Use to specify the full path to the temp JSON file. Default: ./new_relic_event.json
-n, --node                          Run check_tungsten_services -r to check for the Replicator
-s, --services                      Run check_tungsten_services -r -c to check for the Replicator and Connector
--verbose, --v                      Show verbose output
-W {seconds}, --warn {seconds}      (default: 5) Specify the Warning alert latency level in seconds; optionally used for check_tungsten_latency

Usage

shell> tungsten_newrelic_event --account {New Relic Account ID} --key {New Relic Insights API Insert Key} [args]
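
For example, to run only the latency check with custom thresholds (a sketch; substitute your own values for the placeholders):

shell> tungsten_newrelic_event --account {New Relic Account ID} --key {New Relic Insights API Insert Key} --latency --warn 3 --critical 10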

8.14. The query Command

Table 8.15. query Common Options

Option        Description
-conf PATH    Configuration file that contains values for connection properties (url, user and password)
-file PATH    File containing the SQL commands to run. If missing, read SQL commands from STDIN
-password     Prompt for password
-url JDBCURL  JDBC url of the database to connect to
-user USER    User used to connect to the database

The query command line tool can be used to issue SQL statements against a database.

The queries can either be entered via STDIN, or read in from a text file.

The following example shows a SELECT statement issued via STDIN:

shell> query -url "jdbc:mysql:thin://db2:13306/" -user tungsten -password
Enter password: ********
select * from tungsten_nyc.trep_commit_seqno;

[
	{
		"statement":"select * from tungsten_nyc.trep_commit_seqno;","rc":0,"results":
	[
		[
			{
			 "task_id":0,
			 "seqno":1,
			 "fragno":0,
			 "last_frag":"1",
			 "source_id":"db1",
 			 "epoch_number":1,
			 "eventid":"mysql-bin.000002:0000000000000879;-1",
			 "applied_latency":0,
			 "update_timestamp":"2019-06-28 10:44:20.0",
			 "shard_id":"tungsten_nyc",
			 "extract_timestamp":"2019-06-28 10:44:19.0"
			}
		]
	],
		"error":null
	}
]
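
SQL can also be read from a file with the -file option. A sketch, assuming a file /tmp/check.sql containing the statements to run (the path is illustrative):

shell> query -url "jdbc:mysql:thin://db2:13306/" -user tungsten -password -file /tmp/check.sql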

8.15. The replicator Command

The replicator is the wrapper script that handles the execution of the replicator service.

Table 8.16. replicator Commands

Option       Description
condrestart  Restart only if already running
console      Launch in the current console (instead of a daemon)
dump         Request a Java thread dump (if replicator is running)
install      Install the service to automatically start when the system boots
remove       Remove the service from starting during boot
restart      Stop replicator if already running and then start
start        Start in the background as a daemon process
status       Query the current status
stop         Stop if running (whether as a daemon or in another console)

These commands and options are described below:

condrestart

Table 8.17. replicator Commands Options for condrestart

Option   Description
offline  Start in OFFLINE state

Restart the replicator, only if it is already running. This can be useful when changing configuration or performing database management within automated scripts, as the replicator will only be restarted if it was previously running.

For example, if the replicator is running, replicator condrestart operates as replicator restart:

shell> replicator condrestart
Stopping Tungsten Replicator Service...
Waiting for Tungsten Replicator Service to exit...
Stopped Tungsten Replicator Service.
Starting Tungsten Replicator Service...
Waiting for Tungsten Replicator Service......
running: PID:26646

However, if not already running, the operation does nothing:

shell> replicator condrestart
Stopping Tungsten Replicator Service...
Tungsten Replicator Service was not running.

console

Table 8.18. replicator Commands Options for console

Option   Description
offline  Start in OFFLINE state

Launch in the current console (instead of a daemon)
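
For example, to run the replicator in the foreground and have it start in the OFFLINE state (the process stays attached to the terminal until interrupted):

shell> replicator console offline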

dump

Request a Java thread dump (if replicator is running)
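
For example (a minimal sketch; the thread dump output is written to the replicator log rather than to the terminal):

shell> replicator dump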

install

Installs the startup scripts for running the replicator at boot. For an alternative method of deploying these start-up scripts, see deployall.

remove

Removes the startup scripts for running the replicator at boot. For an alternative method of removing these start-up scripts, see undeployall.

restart

Table 8.19. replicator Commands Options for restart

Option   Description
offline  Stop and restart in OFFLINE state

Warning

Restarting a running replicator temporarily stops and restarts replication.

Stops the replicator, if it is already running, and then restarts it:

shell> replicator restart
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.
Starting Tungsten Replicator Service...
Waiting for Tungsten Replicator Service......
running: PID:26248

start

Table 8.20. replicator Commands Options for start

Option   Description
offline  Start in OFFLINE state

To start the replicator service if it is not already running:

shell> replicator start
 Starting Tungsten Replicator Service...

status

Checks the execution status of the replicator:

shell> replicator status
Tungsten Replicator Service is running: PID:27015, Wrapper:STARTED, Java:STARTED

If the replicator is not running:

shell> replicator status
Tungsten Replicator Service is not running.

This only provides the execution state of the replicator, not the actual state of replication. To get detailed information on the status of replication use trepctl status.

stop

Stops the replicator if it is already running:

shell> replicator stop
Stopping Tungsten Replicator Service...
Waiting for Tungsten Replicator Service to exit...
Stopped Tungsten Replicator Service.

8.16. The startall Command

The startall command starts all configured services within the configured directory:

shell> startall
Starting Tungsten Replicator Service...
Waiting for Tungsten Replicator Service......
running: PID:29842

If a service is already running, then a notification of the current state will be provided:

Starting Tungsten Replicator Service...
Tungsten Replicator Service is already running.

Note that if a service is not running but a stale PID file is found, the file will be deleted and the service started, for example:

Removed stale pid file: 
 /opt/continuent/releases/tungsten-replicator-6.1.24-6_pid25898/tungsten-connector/bin/../var/tconnector.pid

8.17. The stopall Command

The stopall command stops all configured services that are currently running:

shell> stopall
Stopping Tungsten Replicator Service...
Waiting for Tungsten Replicator Service to exit...
Stopped Tungsten Replicator Service.

8.18. The thl Command

The thl command provides an interface to the THL data, including the ability to view the list of available files, details of the enclosed event information, and the ability to purge THL files to reclaim space on disk beyond the configured log retention policy.

The command supports two command-line options that are applicable to all operations, as shown in Table 8.21, “thl Options”.

Table 8.21. thl Options

Option                Description
-conf path            Path to the configuration file containing the required replicator service configuration
-service servicename  Name of the service to be used when looking for THL information

For example, to execute a command on a specific service:

shell> thl index -service firstrep

Individual operations are selected by use of a specific command to the thl command. Supported commands are:

  • index — obtain a list of available THL files.

  • info — obtain summary information about the available THL data.

  • list — list one or more THL events.

  • purge — purge THL data.

  • help — get the command help text.

Further information on each of these operations is provided in the following sections.

8.18.1. thl Position Commands

The thl command supports a number of position and selection command-line options that can be used to select an individual THL event, or a range of events, to be displayed.

  • -seqno #

    Valid for: thl list

    Output the THL sequence for the specific sequence number. When reviewing or searching for a specific sequence number, for example when the application of a sequence on a Replica has failed, the replication data for that sequence number can be individually viewed. From version 5.3.3, the output also includes the filename of the THL file on disk where the THL event is located. For example:

    shell> thl list -seqno 15
    SEQ# = 15 / FRAG# = 0 (last frag)
    - FILE = thl.data.0000000001 
    - TIME = 2013-05-02 11:37:00.0
    - EPOCH# = 7
    - EVENTID = mysql-bin.000004:0000000000003345;0
    - SOURCEID = host1
    - METADATA = [mysql_server_id=1687011;unsafe_for_block_commit;dbms_type=mysql;»
        service=firstrep;shard=cheffy]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0, foreign_key_checks = 0, »
        unique_checks = 0, sql_mode = 'NO_AUTO_VALUE_ON_ZERO', character_set_client = 33, »
        collation_connection = 33, collation_server = 8]
    - SCHEMA = cheffy
    - SQL(0) = CREATE TABLE `access_log` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `userid` int(10) unsigned DEFAULT NULL,
      `datetime` int(10) unsigned NOT NULL DEFAULT '0',
    ...

    If the sequence number selected contains multiple fragments, each fragment will be output. Depending on the content of the sequence number information, the information can be output containing only the header/metadata information or only the table data (row or SQL) that was contained within the fragment. See -headers and -sql for more information.

    Note

    Unsigned integers are displayed and stored in the THL as their negative equivalents, and translated to the correct unsigned type when the data is applied to the target database.

  • -low # and/or -high #

    -from # and/or -to #

    Valid for: thl list, thl purge

    Specify the start (-from) or end (-to) of the range of sequence numbers to be output. If only -from is specified, then all sequence numbers from that number to the end of the THL are output. If -to is specified, all sequence numbers from the start of the available log file to the specified sequence number are output. If both numbers are specified, output all the sequence numbers within the specified range.

    For example:

    shell> thl list -from 320

    Or:

    shell> thl list -low 320

    Will output all the sequence number fragments from number 320.

    shell> thl list -to 540

    Or:

    shell> thl list -high 540

    Will output all the sequence number fragments up to and including 540.

    shell> thl list -from 320 -to 540

    Or:

    shell> thl list -low 320 -high 540

    Will output all the sequence number fragments from number 320 up to, and including, sequence number 540.

  • -first

    Valid for: thl list, thl purge

    The -first option selects only the first stored THL event. For example:

    shell> thl list -first
    SEQ# = 0 / FRAG# = 0 (last frag)
    - TIME = 2017-06-28 13:12:38.0
    - EPOCH# = 0
    ...
  • -first #

    Valid for: thl list, thl purge

    The -first # option selects the specified number of events, starting from the first event. For example:

    shell> thl list -first 5

    Would display the first five events from the stored THL.

  • -last

    Valid for: thl list, thl purge

    The -last option selects only the last stored THL event. For example:

    shell> thl list -last
    SEQ# = 1601 / FRAG# = 0 (last frag)
    - TIME = 2017-06-29 06:02:23.0
    - EPOCH# = 1601
    ...

    The use of this option can be particularly useful in the event of synchronisation or THL corruption due to a lack of disk space. Using the thl purge command, the last THL event can be easily removed without having to work out the ranges and index information:

    shell> thl purge -last
  • -last #

    Valid for: thl list, thl purge

    The -last # option selects the specified number of events, counting back from the last stored event. For example:

    shell> thl list -last 5

    When the THL index contains events 1558-1601, this would display events 1597 through 1601.

8.18.2. thl list Command

The list command to the thl command outputs a list of the sequence number information from the THL. By default, the entire THL as stored on disk is output. Command-line options enable you to select individual sequence numbers, sequence number ranges, or all the sequence information from a single file.

thl list
[-seqno # ]
[-low # ] | [-from # ] | [-high # ] | [-to # ]
[-last] [-last #] [-first] [-first #]
[-event eventid ] [-file filename ] [-no-checksum ] [-sql] [-sizes] [-sizesdetail] [-sizessummary] [-charset charset ] [-hex] [-headers] [-json] [-specs] [-timezone]

  • -event eventid

    Output the THL event that matches the provided eventid. If no exact match is found, a message displays details of an approximate match, if one is available. See the examples below.

    An exact match is found:

    shell> thl list -event mysql-bin.000017:0000000074628349
    - METADATA = [mysql_server_id=1000;mysql_thread_id=62;unsafe_for_block_commit;dbms_type=mysql;tz_aware=true;service=alpha;shard=employees] 
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent 
    - OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0, foreign_key_checks = 1, unique_checks = 1, sql_mode = 'NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES', character_set_client = 8, collation_connection = 8, collation_server = 8] 
    - SCHEMA = employees 
    - SQL(0) = DROP TABLE `salaries` /* generated by server */

    No match found:

    Event not found : Approximative match found between seqno 915 (mysql-bin.000017:0000000074628153;62) and seqno 916 (mysql-bin.000017:0000000074628349;62)

  • -file filename

    Outputs all of the sequence number fragment information from the specified THL file. If the filename has been determined from the thl index command, or by examining the output of other fragments, the file-based output can be used to identify statements or row data within the THL.

  • -charset charset

    Specify the character set to be used to decode the character-based row data embedded within the THL event. Without this option, data is output as a hex value.

  • -hex

    For SQL that may be in different character sets, the information can be optionally output in hex format to determine the contents and context of the statement, even though the statement itself may be unreadable on the command-line.

  • -no-checksum

    Ignores checksums within the THL. In the event of a checksum failure, use of this option will enable checksums to be ignored when the THL is being read.

  • -sql

    Prints only the SQL for the selected sequence range. Use of this option can be useful if you want to extract the SQL and execute it directly by storing or piping the output.
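
    For example, to extract the SQL for a single transaction into a file for later review or execution (the sequence number and output path are illustrative):

    shell> thl list -seqno 3760 -sql > /tmp/seqno-3760.sql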

  • -headers

    Generates only the header information for the selected sequence numbers from the THL. For THL that contains a lot of SQL, obtaining the headers can be used to get basic content and context information without having to manually filter out the SQL in each fragment.

    The information is output as a tab-delimited list:

    2047	1412	0	false	2020-05-03 20:58:14.0	mysql-bin.000005:0000000579721045;0	host3		
    2047	1412	1	true	2020-05-03 20:58:14.0	mysql-bin.000005:0000000579721116;0	host3		
    2048	1412	0	false	2020-05-03 20:58:14.0	mysql-bin.000005:0000000580759206;0	host3		
    2048	1412	1	true	2020-05-03 20:58:14.0	mysql-bin.000005:0000000580759277;0	host3		
    2049	1412	0	false	2020-05-03 20:58:16.0	mysql-bin.000005:0000000581791468;0	host3		
    2049	1412	1	true	2020-05-03 20:58:16.0	mysql-bin.000005:0000000581791539;0	host3		
    2050	1412	0	false	2020-05-03 20:58:18.0	mysql-bin.000005:0000000582812644;0	host3

    The format of the fields output is:

    Sequence No | Epoch | Fragment | Last Fragment | Date/Time | EventID | SourceID | Comments

    For more information on the fields displayed, see Section E.1.1, “THL Format”.

  • -json

    Only valid with the -headers option, the header information is output for the selected sequence numbers from the THL in JSON format. The field contents are identical, with each fragment of each THL sequence being contained in a JSON object, and the output consisting of an array of these sequence objects. For example:

    [
       {
          "lastFrag" : false,
          "epoch" : 7,
          "seqno" : 320,
          "time" : "2020-05-02 11:41:19.0",
          "frag" : 0,
          "comments" : "",
          "sourceId" : "host1",
          "eventId" : "mysql-bin.000004:0000000244490614;0"
       },
       {
          "lastFrag" : true,
          "epoch" : 7,
          "seqno" : 320,
          "time" : "2020-05-02 11:41:19.0",
          "frag" : 1,
          "comments" : "",
          "sourceId" : "host1",
          "eventId" : "mysql-bin.000004:0000000244490685;0"
       }
    ]

    For more information on the fields displayed, see THL SEQNO.

  • -sizes

    Shows the size information for a given THL event, describing either the size of the SQL, or the number of rows within the given event. For example:

    shell> thl list -sizes
    SEQ#	Frag#	Tstamp
    ...
    12	0	2020-06-28 13:21:11.0		Event total: 1 chunks	73 bytes in SQL statements	0 rows
    13	0	2020-06-28 13:21:10.0		Event total: 1645 chunks	0 bytes in SQL statements	1645 rows
    14	0	2020-06-28 13:21:11.0		Event total: 1 chunks	36 bytes in SQL statements	0 rows
    15	0	2020-06-28 13:21:11.0		Event total: 1 chunks	61 bytes in SQL statements	0 rows
    16	0	2020-06-28 13:21:11.0		Event total: 1 chunks	73 bytes in SQL statements	0 rows
    17	0	2020-06-28 13:21:12.0		Event total: 1 chunks	36 bytes in SQL statements	0 rows
    18	0	2020-06-28 13:21:12.0		Event total: 1 chunks	61 bytes in SQL statements	0 rows
    19	0	2020-06-28 13:21:10.0		Event total: 1784 chunks	0 bytes in SQL statements	1784 rows
    20	0	2020-06-28 13:21:12.0		Event total: 1 chunks	73 bytes in SQL statements	0 rows
    21	0	2020-06-28 13:21:11.0		Event total: 1576 chunks	0 bytes in SQL statements	1576 rows
    22	0	2020-06-28 13:21:12.0		Event total: 1 chunks	36 bytes in SQL statements	0 rows
    23	0	2020-06-28 13:21:12.0		Event total: 1 chunks	61 bytes in SQL statements	0 rows
    ...

    Summary information is also output indicating an overall count of the changes. For example:

    Total ROW chunks: 69487 with 18257671 updated rows (100%)
    Total STATEMENT chunks: 0 with 0 bytes (0%)
    628 events processed

    This information can be useful when viewing or monitoring the replication progress, as it can help to indicate and identify the size of a specific transaction, particularly if the transaction is large. This can be particularly useful in combination with the -first and/or -last options.

    For more detailed information on individual fragments within a sequence (and for large transactions there will be multiple fragments), use the thl list -sizesdetail command.

  • -sizesdetail

    Shows detailed size information for a given THL event, describing either the size of the SQL or the number of rows within each fragment of each event, with a summary total for each event. For very large THL events this provides more detailed information about the size and makeup of the event. For example:

    shell> thl list -sizesdetail -last
    SEQ#	Frag#	Tstamp			Chunks		SQL Data				Row Data
    1604	0	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	1	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	2	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	3	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	4	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	5	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	6	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	7	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	8	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	9	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	10	2020-06-29 11:04:53.0	123 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 45633 (371 avg rows per chunk)
    1604	11	2020-06-29 11:04:53.0	7 chunks	SQL 0 bytes (0 avg bytes per chunk)	Rows 2535 (362 avg rows per chunk)
    			Event total: 1360 chunks	0 bytes in SQL statements	504498 rows

    Summary information is also output indicating an overall count of the changes. For example:

    Total ROW chunks: 69487 with 18257671 updated rows (100%)
    Total STATEMENT chunks: 0 with 0 bytes (0%)
    628 events processed

    This information can be useful when viewing or monitoring the replication progress, as it can help to indicate and identify the size of a specific transaction, particularly if the transaction is large. This can be particularly useful in combination with the -first and/or -last options.

  • -sizessummary

    Outputs only the size summary information for the requested THL:

    shell> thl list -sizessummary
    Total ROW chunks: 69487 with 18257671 updated rows (100%)
    Total STATEMENT chunks: 0 with 0 bytes (0%)
    628 events processed
  • -specs

    Shows the column specifications, such as identified type, length, and additional settings, when viewing events within row-based replication. This can be helpful when examining THL data in heterogeneous replication deployments.

    For example:

    shell> thl list -low 5282 -specs
    SEQ# = 5282 / FRAG# = 0 (last frag)
    - TIME = 2020-01-30 05:46:26.0
    - EPOCH# = 5278
    - EVENTID = mysql-bin.000017:0000000000001117;0
    - SOURCEID = host1
    - METADATA = [mysql_server_id=1687011;dbms_type=mysql;is_metadata=true;»
       service=firstrep;shard=tungsten_firstrep;heartbeat=MASTER_ONLINE]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - SQL(0) =
     - ACTION = UPDATE
     - SCHEMA = tungsten_firstrep
     - TABLE = heartbeat
     - ROW# = 0
      - COL(index=1 name= type=4 [INTEGER] length=8 unsigned=false blob=false desc=null) = 1
      - COL(index=2 name= type=4 [INTEGER] length=8 unsigned=false blob=false desc=null) = 1416
      - COL(index=3 name= type=12 [VARCHAR] length=0 unsigned=false blob=false desc=null) = [B@65b60280
      - COL(index=4 name= type=93 [TIMESTAMP] length=0 unsigned=false blob=false desc=null) = 2020-01-30 05:46:26.0
      - COL(index=5 name= type=93 [TIMESTAMP] length=0 unsigned=false blob=false desc=null) = 2020-05-03 12:05:47.0
      - COL(index=6 name= type=4 [INTEGER] length=8 unsigned=false blob=false desc=null) = 1015
      - COL(index=7 name= type=4 [INTEGER] length=8 unsigned=false blob=false desc=null) = 0
      - COL(index=8 name= type=12 [VARCHAR] length=0 unsigned=false blob=false desc=null) = [B@105e55ab
      - KEY(index=1 name= type=4 [INTEGER] length=8 unsigned=false blob=false desc=null) = 1

    When identifying the different data types, note that the displayed value depends on the underlying type; for example, character data such as the VARCHAR columns above is shown as a raw byte array (for example, [B@65b60280) unless a character set is specified with -charset.

  • -timezone

    Specify the timezone to use when displaying date or time values. When not specified, times are displayed using UTC.
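
    For example, assuming Java-style timezone identifiers are accepted (an assumption; the zone shown is illustrative):

    shell> thl list -seqno 15 -timezone America/New_York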

8.18.3. thl index Command

The index command to thl provides a list of all the available THL files and the sequence number range stored within each file:

shell> thl index
LogIndexEntry thl.data.0000000001(0:113)
LogIndexEntry thl.data.0000000002(114:278)
LogIndexEntry thl.data.0000000003(279:375)
LogIndexEntry thl.data.0000000004(376:472)
LogIndexEntry thl.data.0000000005(473:569)
LogIndexEntry thl.data.0000000006(570:941)
LogIndexEntry thl.data.0000000007(942:1494)
LogIndexEntry thl.data.0000000008(1495:1658)
LogIndexEntry thl.data.0000000009(1659:1755)
LogIndexEntry thl.data.0000000010(1756:1852)
LogIndexEntry thl.data.0000000011(1853:1949)
LogIndexEntry thl.data.0000000012(1950:2046)
LogIndexEntry thl.data.0000000013(2047:2563)

The optional argument -no-checksum ignores the checksum information on events in the event that the checksum is corrupt.

8.18.4. thl purge Command

The purge command to the thl command deletes sequence number information from the THL files.

thl purge
[-low # ] | [-high # ]
[-y ] [-no-checksum ]

The purge command deletes the THL data according to the following rules:

  • Warning

    Purging all data requires that the THL information either be recreated from the source table, or reloaded from the Primary replicator.

    Without any specification, a purge command will delete all of the stored THL information.

  • When only -high is specified, delete all the THL data up to and including the specified sequence number.

  • When only -low is specified, delete all the THL data from and including the specified sequence number.

  • With a range specification, using one or both of the -low and -high options, the range of sequences will be purged. The rules are the same as for the list command, enabling purge from the start to a sequence, from a sequence to the end, or all the sequences within a given range. The ranges must be on the boundary of one or more log files. It is not possible to delete THL data from the middle of a given file.

For example, consider the following list of THL files provided by thl index:

shell> thl index
LogIndexEntry thl.data.0000000377(5802:5821)
LogIndexEntry thl.data.0000000378(5822:5841)
LogIndexEntry thl.data.0000000379(5842:5861)
LogIndexEntry thl.data.0000000380(5862:5881)
LogIndexEntry thl.data.0000000381(5882:5901)
LogIndexEntry thl.data.0000000382(5902:5921)
LogIndexEntry thl.data.0000000383(5922:5941)
LogIndexEntry thl.data.0000000384(5942:5961)
LogIndexEntry thl.data.0000000385(5962:5981)
LogIndexEntry thl.data.0000000386(5982:6001)
LogIndexEntry thl.data.0000000387(6002:6021)
LogIndexEntry thl.data.0000000388(6022:6041)
LogIndexEntry thl.data.0000000389(6042:6061)
LogIndexEntry thl.data.0000000390(6062:6081)
LogIndexEntry thl.data.0000000391(6082:6101)
LogIndexEntry thl.data.0000000392(6102:6121)
LogIndexEntry thl.data.0000000393(6122:6141)
LogIndexEntry thl.data.0000000394(6142:6161)
LogIndexEntry thl.data.0000000395(6162:6181)
LogIndexEntry thl.data.0000000396(6182:6201)
LogIndexEntry thl.data.0000000397(6202:6221)
LogIndexEntry thl.data.0000000398(6222:6241)
LogIndexEntry thl.data.0000000399(6242:6261)
LogIndexEntry thl.data.0000000400(6262:6266)

The above shows a range of THL sequences from 5802 to 6266.

To delete all of the THL from the start of the list, sequence no 5802, to 6021 (inclusive), use the -high option to specify the highest number to be removed (6021):

shell> thl purge -high 6021
WARNING: The purge command will break replication if you delete all
events or delete events that have not reached all slaves.
Are you sure you wish to delete these events [y/N]?
y
Deleting events where SEQ# <=6021
2020-02-10 16:31:36,235 [ - main] INFO  thl.THLManagerCtrl Transactions deleted

Running thl index again shows that sequence numbers from 6022 to 6266 are still available:

shell> thl index
LogIndexEntry thl.data.0000000388(6022:6041)
LogIndexEntry thl.data.0000000389(6042:6061)
LogIndexEntry thl.data.0000000390(6062:6081)
LogIndexEntry thl.data.0000000391(6082:6101)
LogIndexEntry thl.data.0000000392(6102:6121)
LogIndexEntry thl.data.0000000393(6122:6141)
LogIndexEntry thl.data.0000000394(6142:6161)
LogIndexEntry thl.data.0000000395(6162:6181)
LogIndexEntry thl.data.0000000396(6182:6201)
LogIndexEntry thl.data.0000000397(6202:6221)
LogIndexEntry thl.data.0000000398(6222:6241)
LogIndexEntry thl.data.0000000399(6242:6261)
LogIndexEntry thl.data.0000000400(6262:6266)

To delete the last two THL files, pass the sequence number at the start of the first file to be removed, 6242, to the -low option:

shell> thl purge -low 6242 -y
WARNING: The purge command will break replication if you delete all
events or delete events that have not reached all slaves.
Deleting events where SEQ# >= 6242
2020-02-10 16:40:42,463 [ - main] INFO  thl.THLManagerCtrl Transactions deleted

Running thl index again shows that the sequences have been removed:

shell> thl index
LogIndexEntry thl.data.0000000388(6022:6041)
LogIndexEntry thl.data.0000000389(6042:6061)
LogIndexEntry thl.data.0000000390(6062:6081)
LogIndexEntry thl.data.0000000391(6082:6101)
LogIndexEntry thl.data.0000000392(6102:6121)
LogIndexEntry thl.data.0000000393(6122:6141)
LogIndexEntry thl.data.0000000394(6142:6161)
LogIndexEntry thl.data.0000000395(6162:6181)
LogIndexEntry thl.data.0000000396(6182:6201)
LogIndexEntry thl.data.0000000397(6202:6221)
LogIndexEntry thl.data.0000000398(6222:6241)

The confirmation message can be bypassed by using the -y option, which implies that the operation should proceed without further confirmation.

The optional argument -no-checksum ignores the checksum information on events in the event that the checksum is corrupt.

When purging, the THL files must be writeable; the replicator must either be offline or stopped while the purge operation is performed.
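
A typical sequence is therefore to take the replicator offline, purge, and bring it back online again, for example (a sketch, reusing the range shown above):

shell> trepctl offline
shell> thl purge -high 6021 -y
shell> trepctl online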

A purge operation may fail for the following reasons:

  • Fatal error: The disk log is not writable and cannot be purged.

    The replicator is currently running and not in the OFFLINE state. Use trepctl offline to release the write lock on the THL files.

  • Fatal error: Deletion range invalid; must include one or both log end points: low seqno=0 high seqno=1000

    An invalid sequence number or range was provided. The purge operation will refuse to purge events that do not exist in the THL files and do not match a valid file boundary, i.e. the low figure must match the start of one file and the high the end of a file. Use thl index to determine the valid ranges.

8.18.5. thl info Command

The info command to the thl command provides the current information about the THL, including the identified log directory, sequence number range, and the number of individual events within the available span. The lowest and highest THL files and their sizes are also given. For example:

shell> thl info
log directory = /opt/continuent/thl/alpha/
log files = 41
logs size = 193.53 MB
min seq# = 0
max seq# = 228
events = 228
oldest file = thl.data.0000000001 (95.48 MB, 2019-12-18 11:53:00)
newest file = thl.data.0000000041 (0.98 MB, 2019-12-18 12:34:32)

The optional argument -no-checksum ignores the checksum information on events in the event that the checksum is corrupt.

8.18.6. thl help Command

The help command to the thl command outputs the current help message text.

8.19. The trepctl Command

The trepctl command provides the main status and management interface to Tungsten Replicator. The trepctl command is responsible for:

  • Putting the replicator online or offline

  • Pausing a specific stage within the replicator

  • Performing backup and restore operations

  • Skipping events in the THL in the event of an issue

  • Getting status and active configuration information

The operation and control of the command is defined through a series of command-line options which specify general options, replicator wide commands, and service specific commands that provide status and control over specific services.

The trepctl command by default operates on the current host and configured service. For installations where there are multiple services and hosts in the deployment, explicit selection of services and hosts is handled through the use of command-line options; for more information see Section 8.19.1, “trepctl Options”.

trepctl
backup [ -backup agent  ] [ -limit s  ] [ -storage agent  ]
capabilities
check
clear
clients [ -json ]
flush [ -limit s  ]
heartbeat [ -name ] [ -tz s ] [ -host name  ]
kill [ -y ]
load
offline [ -all-services ]
offline-deferred [ -at-event event ] [ -at-heartbeat [heartbeat] ] [ -at-seqno seqno ] [ -at-time YYYY-MM-DD_hh:mm:ss ] [ -immediate ]
online [ -all-services ] [ -base-seqno x ] [ -force ] [ -from-event event ] [ -no-checksum ] [ -provision [SCN] ] [ -skip-seqno seqdef ] [ -until-event event ] [ -until-heartbeat [name] ] [ -until-seqno seqno ] [ -until-time YYYY-MM-DD_hh:mm:ss ]
pause [ -stage stage-to-pause ] [ -time value-in-seconds ]
perf [ -r ] [ -port number  ]
properties [ -filter name ] [ -values ]
purge [ -limit s ] [ -y ]
qs [ -r ]
reset [ -all ] [ -db ] [ -relay ] [ -thl ] [ -y ]
restore
resume [ -stage stage-to-resume ] [ -retry N  ] [ -service name ]
services [ -full ] [ -json ] [ -r ]
servicetable [ -r ]
setdynamic [ -property ] [ -value ]
setrole [ -role master | slave | relay | thl-applier | thl-client | thl-server ] [ -uri ]
shard [ -delete shard ] [ -insert shard ] [ -list ] [ -update shard ]
status [ -json ] [ -name channel-assignments | services | shards | stages | stores | tasks | watches ] [ -r ]
unload [ -y ] [ -verbose ]
version
wait [ -applied seqno ] [ -limit s ] [ -state st ]

For individual operations, trepctl uses a sub-command structure on the command-line that specifies which operation is to be performed. There are two classifications of commands, global commands, which operate across all replicator services, and service-specific commands that perform operations on a specific service and/or host. For information on the global commands available, see Section 8.19.2, “trepctl Global Commands”. Information on individual commands can be found in Section 8.19.3, “trepctl Service Commands”.

8.19.1. trepctl Options

Table 8.22. trepctl Command-line Options

Option         Description
-host name     Host name of the replicator
-port number   Port number of the replicator
-retry N       Number of times to retry the connection
-service name  Name of the replicator service
-verbose       Enable verbose messages for operations

Global command-line options enable you to select specific hosts and services. If available, trepctl will read the active configuration to determine the host, service, and port information. If this is unavailable or inaccessible, the following rules are used to determine which host or service to operate upon:

  • If no host is specified, then trepctl defaults to the host on which the command is being executed.

  • If no service is specified:

    • If only one service has been configured, then trepctl defaults to showing information for the configured service.

    • If multiple services are configured, then trepctl returns an error, and requests a specific service be selected.

To use the global options:

  • -host

    Specify the host for the operation. The replicator service must be running on the remote host for this operation to work.
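
    For example, to check the status of the replicator running on host2 (a minimal sketch):

    shell> trepctl -host host2 status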

  • -port

    Specify the base TCP/IP port used for administration. The default is port 10000; port 10001 is also used. When using different ports, port and port+1 are used, i.e. if port 4996 is specified, then port 4997 will be used as well. When multiple replicators are installed on the same host, different numbers may be used.

  • -service

    The servicename to be used for the requested status or control operation. When multiple services have been configured, the servicename must be specified.

    shell> trepctl status
    Processing status command...
    Operation failed: You must specify a service name with the -service flag

    Starting in 6.0.4, if multiple services are configured but not specified, then a list of available services is provided:

    shell> trepctl status
    Processing status command...
    Operation failed: You must specify a service name with the -service flag because there is more than one 
    service available. The currently available status commands are:
    trepctl -service north status
    trepctl -service north_from_east status
    trepctl -service north_from_west status
  • -verbose

    Turns on verbose reporting of the individual operations. This includes connectivity to the replicator service and individual operation steps. This can be useful when diagnosing an issue and identifying the location of a particular problem, such as timeouts when accessing a remote replicator.

  • -retry

    Retry the request operation the specified number of times. The default is 10.
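
    For example, to retry the connection three times when querying a remote host (a sketch combining the global options above):

    shell> trepctl -host host2 -retry 3 status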

8.19.2. trepctl Global Commands

The trepctl command supports a number of commands that are global, or which work across the replicator regardless of the configuration or selection of individual services.

Table 8.23. trepctl Replicator Wide Commands

Option        Description
kill          Shutdown the replication services immediately
services      List the configured replicator services
servicetable  List all the currently configured services in a tabular format
version       Show the replicator version number and build

These commands can be executed on the current or a specified host. Because these commands operate for replicators irrespective of the service configuration, selecting or specifying a service is not required.

8.19.2.1. trepctl kill Command

The trepctl kill command terminates the replicator without performing any cleanup of the replicator service, THL or sequence number information stored in the database. Using this option may cause problems when the replicator service is restarted.

trepctl kill [ -y ]

When executed, trepctl will ask for confirmation:

shell> trepctl kill
Do you really want to kill the replicator process? [yes/NO]

The default is no. To kill the service, ignoring the interactive check, use the -y option:

shell> trepctl kill -y
Sending kill command to replicator
Replicator appears to be stopped

8.19.2.2. trepctl services Command

The trepctl services command outputs a list of the current replicator services configured in the system and their key parameters such as latest sequence numbers, latency, and state.

trepctl services [ -full ] [ -json ]

For example:

shell> trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 2541
appliedLatency  : 0.48
role            : master
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

For more information on the fields displayed, see Section E.2, “Generated Field Reference”.

For a replicator with multiple services, the information is output for each configured service:

shell> trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 44
appliedLatency  : 0.692
role            : master
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 40
appliedLatency  : 0.57
role            : slave
serviceName     : beta
serviceType     : remote
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 41
appliedLatency  : 0.06
role            : slave
serviceName     : gamma
serviceType     : remote
started         : true
state           : ONLINE
Finished services command...

The information can be reported in JSON format by using the -json option to the command:

shell> trepctl services -json
[
   {
      "serviceType" : "local",
      "appliedLatency" : "0.48",
      "serviceName" : "alpha",
      "appliedLastSeqno" : "2541",
      "started" : "true",
      "role" : "master",
      "state" : "ONLINE"
   }
]

The information is output as an array of objects, one object for each service identified.

If the -full option is added, the JSON output includes full details of the service, similar to that output by the trepctl status command, but for each configured service:

shell> trepctl services -json -full
[
   {
      "masterConnectUri" : "",
      "rmiPort" : "10000",
      "clusterName" : "default",
      "currentTimeMillis" : "1370256230198",
      "state" : "ONLINE",
      "maximumStoredSeqNo" : "2541",
      "minimumStoredSeqNo" : "0",
      "pendingErrorCode" : "NONE",
      "masterListenUri" : "thl://host1:2112/",
      "pendingErrorSeqno" : "-1",
      "pipelineSource" : "jdbc:mysql:thin://host1:3306/",
      "serviceName" : "alpha",
      "pendingErrorEventId" : "NONE",
      "appliedLatency" : "0.48",
      "transitioningTo" : "",
      "relativeLatency" : "245804.198",
      "role" : "master",
      "siteName" : "default",
      "pendingError" : "NONE",
      "uptimeSeconds" : "246023.627",
      "latestEpochNumber" : "2537",
      "extensions" : "",
      "dataServerHost" : "host1",
      "resourcePrecedence" : "99",
      "pendingExceptionMessage" : "NONE",
      "simpleServiceName" : "alpha",
      "sourceId" : "host1",
      "offlineRequests" : "NONE",
      "channels" : "1",
      "version" : "Tungsten Replicator 6.1.24 build 6",
      "seqnoType" : "java.lang.Long",
      "serviceType" : "local",
      "currentEventId" : "mysql-bin.000007:0000000000001033",
      "appliedLastEventId" : "mysql-bin.000007:0000000000001033;0",
      "timeInStateSeconds" : "245803.753",
      "appliedLastSeqno" : "2541",
      "started" : "true"
   }
]

Starting with Tungsten Cluster 6.0.1, trepctl services supports the -r option to enable auto-refresh.

For more information on the fields displayed, see Section E.2, “Generated Field Reference”.

8.19.2.3. trepctl servicetable Command

The trepctl servicetable command outputs a list of all the current services and their current status information in a tabular format, making it easy to review multi-service installations.

trepctl servicetable

For example:

shell> trepctl servicetable
Processing servicetable command...
Service              | Status                         | Role       | MasterConnectUri               |      SeqNo |    Latency
-------------------- | ------------------------------ | ---------- | ------------------------------ | ---------- | ----------
alpha                | ONLINE                         | slave      | thl://trfiltera:2112/          |        322 |       0.00
beta                 | ONLINE                         | slave      | thl://ubuntuheterosrc:2112/    |         12 |    4658.59
Finished servicetable command...

The command also supports the auto-refresh option, -r:

shell> trepctl servicetable -r 5

For more information on the fields displayed, see Section E.2, “Generated Field Reference”.

8.19.2.4. trepctl version Command

The trepctl version command outputs the version number of the specified replicator service.

trepctl version

shell> trepctl version
Tungsten Replicator 6.1.24 build 6

The command can also be used to obtain the version of a remote replicator:

shell> trepctl -host host2 version
Tungsten Replicator 6.1.24 build 6

Version numbers consist of two parts, the main version number which denotes the product release, and the build number. Updates and fixes to a version may use updated build numbers as part of the same product release.

8.19.3. trepctl Service Commands

The trepctl service commands operate per-service, that is, when there are multiple services in a configuration, the service name on which the command operates must be explicitly stated. For example, when a backup is executed, the backup executes on an explicit, specified service.

The individuality of different services is critical when dealing with the replicator commands. Services can be placed into online or offline states independently of each other, since each service will be replicating information between different hosts and environments.

Table 8.24. trepctl Service Commands

Option            Description
backup            Backup database
capabilities      List the configured replicator capabilities
check             Generate consistency check
clear             Clear one or all dynamic variables
clients           List clients connected to this replicator
flush             Synchronize transaction history log to database
heartbeat         Insert a heartbeat event with optional name
load              Load the replication service
offline           Set replicator to OFFLINE state
offline-deferred  Set replicator OFFLINE at a future point in the replication stream
online            Set Replicator to ONLINE with start and stop points
pause             Pause the replicator. Specify the stage using the -stage option and optional time using -time
perf              Print detailed performance information
properties        Display a list of all internal properties
purge             Purge non-Tungsten logins on database
qs                Print a simplified quick replicator status
reset             Deletes the replicator service
restore           Restore database on specified host
resume            Resume a paused replicator. Specify the stage using the -stage option.
setdynamic        Set dynamic properties
setrole           Set replicator role
shard             List, add, update, and delete shards
status            Print replicator status information
unload            Unload the replication service
wait              Wait for the replicator to reach a specific state, time or applied sequence number

The following sections detail each command individually, with specific options, operations and information.

8.19.3.1. trepctl backup Command

The trepctl backup command performs a backup of the corresponding database for the selected service.

trepctl backup [ -backup agent  ] [ -limit s  ] [ -storage agent  ]

Where:

Table 8.25. trepctl backup Command Options

Option           Description
------           -----------
-backup agent    Select the backup agent
-limit s         The period to wait before returning after the backup request
-storage agent   Select the storage agent

Without specifying any options, the backup uses the default configured backup and storage system, and will wait indefinitely until the backup process has been completed:

shell> trepctl backup
Backup completed successfully; URI=storage://file-system/store-0000000002.properties

The return information gives the URI of the backup properties file. This information can be used when performing a restore operation as the source of the backup. See Section 8.19.3.18, “trepctl restore Command”. Different backup solutions may require that the replicator be placed into the OFFLINE state before the backup is performed.

A log of the backup operation will be stored in the replicator log directory, in a file corresponding to the backup tool used (e.g. mysqldump.log).

If multiple backup agents have been configured, the backup agent can be selected on the command-line:

shell> trepctl backup -backup mysqldump

If multiple storage agents have been configured, the storage agent can be selected using the -storage option:

shell> trepctl backup -storage file

A backup will always be attempted, but the timeout to wait for the backup to be started during the command-line session can be specified using the -limit option. The default is to wait indefinitely. However, in a scripted environment you may want to request the backup and continue performing other operations. The -limit option specifies how long trepctl should wait before returning.

For example, to wait five seconds before returning:

shell> trepctl -service alpha backup -limit 5
Backup is pending; check log for status

The backup request has been received, but not completed within the allocated time limit. The command will return. Checking the logs shows the timeout:

...  management.OpenReplicatorManager Backup request timed out: seconds=5

This is followed by the successful completion of the backup, indicated by the URI provided in the log showing where the backup file has been stored:

... backup.BackupTask Storing backup result...
... backup.FileSystemStorageAgent Allocated backup location: »
    uri=storage://file-system/store-0000000003.properties
... backup.FileSystemStorageAgent Stored backup storage file: »
    file=/opt/continuent/backups/store-0000000003-mysqldump_2013-07-15_18-14_11.sql.gz length=0
... backup.FileSystemStorageAgent Stored backup storage properties: »
    file=/opt/continuent/backups/store-0000000003.properties length=314
... backup.BackupTask Backup completed normally: »
    uri=storage://file-system/store-0000000003.properties

The URI can be used during a restore.
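
For scripted use, the backup URI can be captured directly from the command output and kept for a later restore. The following is a minimal sketch, assuming the default output format shown above:

# Run the backup and capture the storage URI for a later restore
BACKUP_URI=$(trepctl -service alpha backup | sed -n 's/.*URI=//p')
echo "Backup stored at: ${BACKUP_URI}"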

8.19.3.2. trepctl capabilities Command

The capabilities command outputs a list of the supported capabilities for this replicator instance.

trepctl capabilities

The information output will depend on the configuration and current role of the replicator service. Different services on the same host may have different capabilities. For example:

shell> trepctl capabilities
Replicator Capabilities
  Roles:             [master, slave]
  Replication Model: push
  Consistency Check: true
  Heartbeat:         true
  Flush:             true

The fields output are as follows:

  • Roles

    Indicates whether the replicator can be a master or slave, or both.

  • Replication Model

    The model used by the replication system. For example, the default model for MySQL is push, where information is extracted from the binary log and pushed to Replicas that apply the transactions. The pull model is used for heterogeneous deployments.

  • Consistency Check

    Indicates whether the internal consistency check is supported. For more information see Section 8.19.3.3, “trepctl check Command”.

  • Heartbeat

    Indicates whether the heartbeat service is supported. For more information see Section 8.19.3.7, “trepctl heartbeat Command”.

  • Flush

    Indicates whether the trepctl flush operation is supported.
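
Because the capabilities vary by role and configuration, a script can test for a capability before relying on it. A minimal sketch, based on the output format shown above:

# Only issue a flush if this replicator instance reports Flush support
if trepctl capabilities | grep -q 'Flush:.*true'; then
    trepctl flush
else
    echo "Flush not supported on this replicator" >&2
fi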

8.19.3.3. trepctl check Command

The check command operates by running a CRC check on the schema or table specified, creating a temporary table containing the check data and values during the process. The data collected during this process is then written to a consistency table within the replication configuration schema and is used to verify the table data consistency on the Primary and the Replica.

Warning

Because the check operation is creating a temporary table containing a CRC of each row within the specified schema or specific table, the size of the temporary table created can be quite large as it consists of CRC and row count information for each row of each table (within the specified row limits). The configured directory used by MySQL for temporary table creation will need a suitable amount of space to hold the temporary data.

8.19.3.4. trepctl clear Command

The trepctl clear command deletes any dynamic properties configured within the replicator service.

trepctl clear

Dynamic properties include the current active role for the service. The dynamic information is stored internally within the replicator, and also within a properties file on disk so that it survives a replicator restart.

For example, the replicator role may be temporarily changed to receive information from a different host or to act as a Primary in place of a Replica. The replicator can be returned to the initial configuration for the service by clearing this dynamic property:

shell> trepctl clear

8.19.3.5. trepctl clients Command

Outputs a list of the clients that have been connected to the Primary service since it went online. If a Replica service goes offline or is stopped, it will still be reported by this command.

trepctl clients [ -json ]

Where:

Table 8.26. trepctl clients Command Options

Option   Description
------   -----------
-json    Output the information as JSON

The command outputs the list of clients and the management port on which they can be reached:

shell> trepctl clients 
Processing clients command...
host4:10000
host2:10000
host3:10000
Finished clients command...

A JSON version of the output is available when using the -json option:

shell> trepctl clients -json
[
{
"rmiPort": "10000",
"rmiHost": "host4"
},
{
"rmiPort": "10000",
"rmiHost": "host2"
},
{
"rmiPort": "10000",
"rmiHost": "host3"
}
]

The information is divided first by host, and then by the RMI management port.
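
The JSON output is convenient for scripting. For example, the list of client hosts can be extracted with a JSON processor such as jq (a sketch, assuming jq is installed on the host):

# List the hosts of all connected clients, one per line
trepctl clients -json | jq -r '.[].rmiHost'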

8.19.3.6. trepctl flush Command

On a Primary, the trepctl flush command synchronizes the database with the transaction history log, flushing the in-memory queue to the THL file on disk. The operation is not supported on a Replica.

trepctl flush [ -limit s  ]

Internally, the operation works by inserting a heartbeat event into the queue, and then confirming when the heartbeat event has been committed to disk.

To flush the replicator:

shell> trepctl flush 
Master log is synchronized with database at log sequence number: 3622

The flush operation is always initiated, and by default trepctl will wait until the operation completes. Using the -limit option, the amount of time the command-line waits before returning can be specified:

shell> trepctl flush -limit 1
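
The sequence number reported by the flush can be captured for scripting, for example to confirm that a Replica has caught up to the flushed position. A minimal sketch, assuming host2 is a Replica and that trepctl wait accepts -applied and -limit options:

# On the Primary: flush and capture the synchronized sequence number
SEQNO=$(trepctl flush | sed -n 's/.*log sequence number: //p')

# On the Replica: wait until that sequence number has been applied
# (the -applied and -limit options to trepctl wait are assumed here;
# check the trepctl wait documentation for the exact syntax)
trepctl -host host2 wait -applied $SEQNO -limit 60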

8.19.3.7. trepctl heartbeat Command

Inserts a heartbeat into the replication stream, which can be used to identify replication points.

trepctl heartbeat [ -name ] [ -tz s ]

The heartbeat system is a way of inserting an identifiable event into the THL that is independent of the data being replicated. This can be useful when performing different operations on the data where specific checkpoints must be identified.

To insert a standard heartbeat:

shell> trepctl heartbeat

When performing specific operations, the heartbeat can be given a name:

shell> trepctl heartbeat -name dataload

Heartbeats insert a transaction into the THL using the transaction metadata and can be used to identify whether replication is operating between replicator hosts by checking that the sequence number has been replicated to the Replica. Because a new transaction is inserted, the sequence number is increased, and this can be used to identify whether transactions are being replicated to the Replica without requiring changes to the database. To check replication using the heartbeat (a scripted version of these steps follows the list):

  1. Check the current transaction sequence number on the Primary:

    shell> trepctl status
    Processing status command...
    NAME                     VALUE
    ----                     -----
    appliedLastEventId     : mysql-bin.000009:0000000000008998;0
    appliedLastSeqno       : 3630
    ...
  2. Insert a heartbeat event:

    shell> trepctl heartbeat
  3. Check the sequence number again:

    shell> trepctl status
    Processing status command...
    NAME                     VALUE
    ----                     -----
    appliedLastEventId     : mysql-bin.000009:0000000000009310;0
    appliedLastSeqno       : 3631
  4. Check that the sequence number on the Replica matches:

    shell> trepctl status
    Processing status command...
    NAME                     VALUE
    ----                     -----
    appliedLastEventId     : mysql-bin.000009:0000000000009310;0
    appliedLastSeqno       : 3631
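
The four steps above can be combined into a simple replication check. The following sketch compares the Primary sequence number after a heartbeat with the value applied on the Replica; the host name host2 and the five second delay are illustrative assumptions:

# Insert a heartbeat on the Primary and note the resulting sequence number
trepctl heartbeat
PRIMARY_SEQNO=$(trepctl status | awk '/appliedLastSeqno/ {print $3}')

# Give the heartbeat a moment to replicate, then compare with the Replica
sleep 5
REPLICA_SEQNO=$(trepctl -host host2 status | awk '/appliedLastSeqno/ {print $3}')

if [ "$PRIMARY_SEQNO" = "$REPLICA_SEQNO" ]; then
    echo "Replication OK at seqno $PRIMARY_SEQNO"
else
    echo "Replica behind: primary=$PRIMARY_SEQNO replica=$REPLICA_SEQNO" >&2
fi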

Heartbeats are given implied names, but can be created with explicit names that can be tracked during specific events and operations.

For example, when loading a specific set of data, the information may be loaded and then a backup executed on the Replica before enabling standard replication. This can be achieved by configuring the Replica to go offline when a specific heartbeat event is seen, loading the data on the Primary, inserting the heartbeat when the load has finished, and then performing the Replica backup:

  1. On the Replica:

    Replica shell> trepctl offline-deferred -at-heartbeat dataload

    The trepctl offline-deferred command configures the Replica to continue in the online state until the specified event, in this case the heartbeat, is received. The deferred state can be checked by looking at the status output, and the offlineRequests field:

    Processing status command...
    NAME                     VALUE
    ----                     -----
    appliedLastEventId     : mysql-bin.000009:0000000000008271;0
    appliedLastSeqno       : 3627
    appliedLatency         : 0.704
    ...
    offlineRequests        : Offline at heartbeat event: dataload
  2. On the Primary:

    Primary shell> mysql newdb < newdb.load
  3. Once the data load has completed, insert the heartbeat on the Primary:

    Primary shell> trepctl heartbeat -name dataload

    The heartbeat will appear in the transaction history log after the data has been loaded and will identify the end of the load.

  4. When the heartbeat is received, the Replica will go into the offline state. Now a backup can be created with all of the loaded data replicated from the Primary. Because the Replica is in the offline state, no further data or changes will be recorded on the Replica.

This method of identifying specific events and points within the transaction history log can be used for a variety of purposes where a specific point within the replication stream must be identified without relying on an arbitrary event ID or sequence number.
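
The same procedure can be expressed as a sketch; the commands must be run on the hosts indicated in the comments, and the wait for the Replica to reach the OFFLINE state is left implicit:

# On the Replica: defer going offline until the dataload heartbeat is seen
trepctl offline-deferred -at-heartbeat dataload

# On the Primary: load the data, then insert the named heartbeat to mark
# the end of the load
mysql newdb < newdb.load
trepctl heartbeat -name dataload

# Back on the Replica: once the replicator has gone OFFLINE, back up
# the fully loaded data
trepctl backup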

8.19.3.7.1. trepctl heartbeat Time Zone Handling

When the Replicator inserts a heartbeat there is an associated timezone. By default, the Replicator host's timezone is used.

The -tz option to the trepctl heartbeat command may be used to force the use of a specific timezone.

For example, use GMT as the timezone when inserting a heartbeat:

shell> trepctl heartbeat -tz NONE

Use the Replicator host's timezone to insert the heartbeat:

shell> trepctl heartbeat -tz HOST

Use the given timezone to insert the heartbeat:

shell> trepctl heartbeat -tz {valid timezone id}

If the MySQL server timezone is different from the host timezone (which is strongly discouraged), then -tz {valid timezone id} should be used instead, where {valid timezone id} matches the MySQL server timezone.

8.19.3.7.2. trepctl heartbeat Internal Implementation

Internally, the heartbeat system operates through a tag added to the metadata of the THL entry and through a dedicated heartbeat table within the schema created for the replicator service. The table contains the sequence number, event ID, timestamp and heartbeat name. The heartbeat information is written into a special record within the transaction history log. A sample THL entry can be seen in the output below:

SEQ# = 3629 / FRAG# = 0 (last frag)
- TIME = 2013-07-19 12:14:57.0
- EPOCH# = 3614
- EVENTID = mysql-bin.000009:0000000000008681;0
- SOURCEID = host1
- METADATA = [mysql_server_id=1687011;dbms_type=mysql;is_metadata=true;service=alpha;
     shard=tungsten_alpha;heartbeat=dataload]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0,
    foreign_key_checks = 1, unique_checks = 1, sql_mode = 'IGNORE_SPACE', 
    character_set_client = 33, collation_connection = 33, collation_server = 8]
- SCHEMA = tungsten_alpha
- SQL(0) = UPDATE tungsten_alpha.heartbeat SET source_tstamp= '2013-07-19 12:14:57', 
  salt= 9, name= 'dataload'  WHERE id= 1

During replication, Replicas identify the heartbeat and record this information into their own heartbeat table. Because the heartbeat is recorded into the transaction history log, the specific sequence number of the transaction, and the event itself can be easily identified.

8.19.3.8. trepctl load Command

Load the replicator service.

trepctl load

The service name must be specified on the command-line, even when only one service is configured:

shell> trepctl load
Operation failed: You must specify a service name using -service

The service name can be specified using the -service option:

shell> trepctl -service alpha load
Service loaded successfully: name=alpha

8.19.3.9. trepctl offline Command

The trepctl offline command puts the replicator into the offline state, stopping replication.

trepctl offline [ -all-services ] [ -immediate ]

To put the replicator offline:

shell> trepctl offline

While offline:

  • Transactions are not extracted from the source dataserver.

  • Transactions are not applied to the destination dataserver.

Certain operations on the replicator, including updates to the operating system and dataserver should be performed while in the offline state.

By default, the replicator goes offline in deferred mode: transactions currently being read from the binary log or applied to the dataserver are allowed to complete, the sequence number table in the database is updated, and the replicator is then placed offline, stopping replication.

To stop replication immediately, even in the middle of an executing transaction, use the -immediate option:

shell> trepctl offline -immediate

8.19.3.10. trepctl offline-deferred Command

The trepctl offline-deferred command sets a future sequence number, event, heartbeat, or time as the trigger to put the replicator into the offline state.

trepctl offline-deferred [ -at-event event ] [ -at-heartbeat [heartbeat] ] [ -at-seqno seqno ] [ -at-time YYYY-MM-DD_hh:mm:ss ]

Where:

Table 8.27. trepctl offline-deferred Command Options

Option                         Description
------                         -----------
-at-event event                Go offline at the specified event
-at-heartbeat [heartbeat]      Go offline when the specified heartbeat is identified
-at-seqno seqno                Go offline at the specified sequence number
-at-time YYYY-MM-DD_hh:mm:ss   Go offline at the specified time

The trepctl offline-deferred command can be used to put the replicator into an offline state at some future point in the replication stream by identifying a specific trigger. The replicator must be online when the trepctl offline-deferred command is given; if the replicator is not online, the command is ignored.

The offline process performs a clean offline event, equivalent to executing trepctl offline. See Section 8.19.3.9, “trepctl offline Command”.

The supported triggers are:

  • -at-seqno

    Specifies a transaction sequence number (GTID) where replication will be stopped. For example:

    shell> trepctl offline-deferred -at-seqno 3800

    The replicator goes offline at the end of the matching transaction. In the above example, sequence 3800 would be applied to the dataserver, then the replicator goes offline.

  • -at-event

    Specifies the event where replication should stop:

    shell> trepctl offline-deferred -at-event 'mysql-bin.000009:0000000000088140;0'

    Because there is not a one-to-one relationship between global transaction IDs and events, the replicator will go offline at a transaction that has an event ID higher than the deferred event ID. If the event specification is located within the middle of a THL transaction, the entire transaction is applied.

  • -at-heartbeat

    Specifies the name of a specific heartbeat to look for when replication should be stopped.

  • -at-time

    Specifies a time (using the format YYYY-MM-DD_hh:mm:ss) at which replication should be stopped. The time must be specified in full (date and time to the second).

    shell> trepctl offline-deferred -at-time 2013-09-01_00:00:00

    The transaction being executed at the time specified completes, then the replicator goes offline.

If any specified deferred point has already been reached, then the replicator will go offline anyway. For example, if the current sequence number is 3800 and the deferred sequence number specified is 3700, then the replicator will go offline immediately, just as if the trepctl offline command had been used.

When a trigger is reached, the triggering transaction is completed first. For example, if a sequence number is given, that sequence will be applied and then the replicator will go offline.

The status of the pending trepctl offline-deferred setting can be identified within the status output within the offlineRequests field:

shell> trepctl status
...
offlineRequests        : Offline at sequence number: 3810

Multiple trepctl offline-deferred commands can be given for each corresponding trigger type. For example, below three different triggers have been specified, sequence number, time and heartbeat event, with the status showing each deferred event separated by a semicolon:

shell> trepctl status
...
offlineRequests        : Offline at heartbeat event: dataloaded;Offline at »
    sequence number: 3640;Offline at time: 2013-09-01 00:00:00 EDT

Offline deferred settings are cleared when the replicator is put into the offline state, either manually or automatically.
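
A deferred request can also be verified from a script by checking the offlineRequests field. For example, to schedule an offline at a specific time and confirm the request was registered (a minimal sketch):

# Schedule the replicator to go offline at midnight
trepctl offline-deferred -at-time 2013-09-01_00:00:00

# Confirm the deferred request was registered
trepctl status | grep offlineRequests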

8.19.3.11. trepctl online Command

The trepctl online command puts the replicator into the online state. During the state change from offline to online, various options can be used to control how the replicator goes back online. For example, the replicator can be placed online, skipping one or more faulty transactions or disabling specific configurations.

trepctl online [ -all-services ] [ -base-seqno x ] [ -force ] [ -from-event event ] [ -no-checksum ] [ -provision [SCN] ] [ -skip-seqno seqdef ] [ -until-event event ] [ -until-heartbeat [name] ] [ -until-seqno seqno ] [ -until-time YYYY-MM-DD_hh:mm:ss ]

Where:

Table 8.28. trepctl online Command Options

Option                            Description
------                            -----------
-all-services                     Place online all available services
-base-seqno x                     On a Primary, restart replication using the specified sequence number
-force                            Force the online state
-from-event event                 Start replication from the specified event
-no-checksum                      Disable checksums for all events when going online
-provision [SCN]                  Start provisioning using the parallel extractor
-skip-seqno seqdef                Skip one, multiple, or ranges of sequence numbers before going online
-until-event event                Define an event at which replication will stop
-until-heartbeat [name]           Define a heartbeat at which replication will stop
-until-seqno seqno                Define a sequence number at which replication will stop
-until-time YYYY-MM-DD_hh:mm:ss   Define a time at which replication will stop

The trepctl online command attempts to switch the replicator into the online state. The replicator may need to be put online because it has been placed offline for maintenance, or due to a failure.

To put the replicator online use the standard form of the command:

shell> trepctl online

Going online may fail if the reason for going offline was due to a fault in processing the THL, or in applying changes to the dataserver. The replicator will refuse to go online if there is a fault, but certain failures can be explicitly bypassed.

8.19.3.11.1. Going Online from Specific Transaction Points

If there are one or more events in the THL that could not be applied to the Replica because of a mismatch in the data (for example, a duplicate key), the event or events can be skipped using the -skip-seqno option. For example, the status shows that a statement failed:

shell> trepctl status
...
pendingError           : Event application failed: seqno=5250 fragno=0 »
    message=java.sql.SQLException: Statement failed on slave but succeeded on master
...

To skip the single sequence number, 5250, shown:

shell> trepctl online -skip-seqno 5250

The sequence number specification follows these rules (a scripted example is shown after the list):

  • A single sequence number:

    shell> trepctl online -skip-seqno 5250
  • A sequence range:

    shell> trepctl online -skip-seqno 5250-5260
  • A comma-separated list of individual sequence numbers and/or ranges:

    shell> trepctl online -skip-seqno 5250,5251,5253-5260
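
When recovering from a known-safe failure, the failing sequence number can be read from the status output and skipped in one step. The sketch below is illustrative only; sequence numbers should normally be reviewed by hand before being skipped:

# Read the failing sequence number from the replicator status
BAD_SEQNO=$(trepctl status | awk '/pendingErrorSeqno/ {print $3}')

# Skip it only if there really is a pending error (-1 means none)
if [ "$BAD_SEQNO" != "-1" ]; then
    trepctl online -skip-seqno $BAD_SEQNO
fi
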
8.19.3.11.2. Going Online from a Base Sequence Number

Note

To set the position of the replicator, the dsctl command can also be used.

Alternatively, the base sequence number, the transaction ID where replication should start, can be specified explicitly:

shell> trepctl online -base-seqno 5260

Warning

Use of -base-seqno should be restricted to replicators in the master role only. Use on Replicas may lead to duplication or corruption of data.

Note

If issuing -base-seqno and -from-event together, you must also issue the -force option; otherwise a warning will be displayed.

8.19.3.11.3. Going Online from a Specific Event

Note

To set the position of the replicator, the dsctl command can also be used.

If the source event (for example, the MySQL binlog position) is known, this can be used as the reference point when going online and restarting replication:

shell> trepctl online -from-event 'mysql-bin.000011:0000000000002552;0'

When used, replication will start from the next event within the THL. The event ID provided must be valid. If the event cannot be found in the THL, the operation will fail.

Note

If issuing -base-seqno and -from-event together, you must also issue the -force option; otherwise a warning will be displayed.

8.19.3.11.4. Going Online Until Specific Transaction Points

There are times when it is useful to bring the replicator online only until a specific point in time or in the replication stream. For example, when performing a bulk load, parallel replication may be enabled, but only a single applier stream is required once the load has finished. The replicator can be configured to go online for a limited period, defined by transaction IDs, events, heartbeats, or a specific time.

The replicator must be in the offline state before the deferred online specifications are made. Multiple deferred online states can be specified in the same command when going online.

The setting of a future offline state can be seen by looking at the offlineRequests field when checking the status:

shell> trepctl status
...
minimumStoredSeqNo     : 0
offlineRequests        : Offline at sequence number: 5262;Offline at time: 2014-01-01 00:00:00 EST
pendingError           : NONE
...

If the replicator goes offline for any reason before the deferred offline state is reached, the deferred settings are lost.

8.19.3.11.4.1. Going Online Until Specified Sequence Number

To go online until a specific transaction ID, use -until-seqno:

shell> trepctl online -until-seqno 5260

This will process all transactions up to, and including, sequence 5260, at which point the replicator will go offline.

8.19.3.11.4.2. Going Online Until Specified Event

To go online until a specific event ID:

shell> trepctl online -until-event  'mysql-bin.000011:0000000000003057;0'

Replication will go offline once events up to, and including, the specified event ID have been processed.

8.19.3.11.4.3. Going Online Until Heartbeat

To go online until a heartbeat event:

shell> trepctl online -until-heartbeat

Heartbeats are inserted into the replication stream periodically; replication will stop once a heartbeat has been seen, before the next transaction. A specific heartbeat can also be specified:

shell> trepctl online -until-heartbeat load-finished
8.19.3.11.4.4. Going Online Until Specified Time

To go online until a specific date and time:

shell> trepctl online -until-time 2014-01-01_00:00:00

Replication will go offline once the transaction being processed at the time specified has completed.

8.19.3.11.5. Going Online by Force

In situations where the replicator needs to go online despite a fault, the online state can be forced. This changes the replicator state to online, but provides no guarantee that the online state will remain in place if another, different, error stops replication.

shell> trepctl online -force
8.19.3.11.6. Going Online without Validating Checksum

In the event of a checksum problem in the THL, checksums can be disabled using the -no-checksum option:

shell> trepctl online -no-checksum

This will bring the replicator online without reading or writing checksum information.

Important

Use of the -no-checksum option disables both the reading and writing of checksums on log records. If starting the replicator without checksums to get past a checksum failure, the replicator should be taken offline again once the offending event has been replicated. This will avoid generating too many local records in the THL without checksums.

8.19.3.12. trepctl pause Command

The trepctl pause command allows you to pause a specific stage of the replicator, either indefinitely or for a specific time.

trepctl pause [ -stage stage-to-pause ] [ -time value-in-seconds ]

Where:

Table 8.29. trepctl pause Command Options

Option                   Description
------                   -----------
-stage stage-to-pause    Used to specify the stage to pause, for example thl-to-q
-time value-in-seconds   Used to specify the time, in seconds, for a stage to remain paused. If not supplied, the stage will pause indefinitely, until a resume command is issued, or until the replicator is restarted.

To pause the thl-to-q stage of the replicator use the standard form of the command:

shell> trepctl pause -stage thl-to-q

To pause the thl-to-q stage of the replicator for 2 minutes (120 seconds) use the command:

shell> trepctl pause -stage thl-to-q -time 120

If no time is specified, the stage will remain paused until a resume command is issued, or until the replicator is restarted.

8.19.3.13. trepctl perf Command

Displays detailed performance information for the replicator on a stage-by-stage basis.

trepctl perf [ -r ]

The perf command outputs performance information on a stage-by-stage basis for the current replicator. The information is extracted from the existing replicator status, task and stage information available through other commands, reformatted and with values calculated to make identifying specific performance metrics quicker.

For example, on a typical extraction replicator:

Statistics since last put online 9265.385s ago
Stage       | Seqno | Latency |      Events |  Extraction |   Filtering |    Applying |       Other |       Total 
binlog-to-q |  1604 |  8.779s |          14 |     60.173s |      0.109s |      0.015s |      0.004s |     60.301s 
                         Avg time per Event |      4.298s |      0.008s |      0.000s |      0.001s |      4.307s
                           Filters in stage | colnames -> pkey
q-to-thl    |  1604 | 10.613s |          14 |     56.858s |      0.020s |      5.247s |      0.028s |     62.153s 
                         Avg time per Event |      4.061s |      0.001s |      0.002s |      0.375s |      4.440s
                           Filters in stage | enumtostring -> settostring

On an applier:

Statistics since last put online 38.418s ago
Stage         | Seqno | Latency |      Events |  Extraction |   Filtering |    Applying |       Other |       Total 
remote-to-thl |  3246 |  1.143s |          42 |     37.831s |      0.001s |      0.403s |      0.011s |     38.246s 
                           Avg time per Event |      0.901s |      0.000s |      0.000s |      0.010s |      0.911s
thl-to-q      |  3246 |  1.209s |        1654 |     37.113s |      0.005s |      1.090s |      0.098s |     38.306s 
                           Avg time per Event |      0.022s |      0.000s |      0.000s |      0.001s |      0.023s
q-to-dbms     |  3235 |  3.746s |        1644 |     22.226s |      0.019s |     15.242s |      0.338s |     37.825s 
                           Avg time per Event |      0.014s |      0.000s |      0.000s |      0.009s |      0.023s
                             Filters in stage | mysqlsessions -> pkey

The individual statistics shown are as follows:

  • All statistics within the replicator are reset when the replicator goes ONLINE. The statistics shown are therefore displayed relative to the current uptime for the replicator.

  • For each stage, the following information is shown:

    • Stage name

    • Seqno — this is the current SEQNO number for the specified stage. A difference in sequence numbers is possible (as seen in the applier example above) during startup or synchronisation.

    • Latency — the latency of this stage compared to the commit time of the original transaction.

    • Events — the number of THL events processed by this stage.

    Statistics are then shown for each stage in two rows: the first shows the time to process all of the specified events, the second the average processing time per event within that stage. The individual statistics shown are as follows:

    • Extraction — the time taken to extract the event from the current source. On an extractor, this is the source database (for example, the binary log in MySQL). On other stages, this is the time taken to read the THL event from disk or from the remote replicator.

    • Filtering — the time taken to process the events through the filters configured in the specified stage.

    • Applying — the time taken to apply the event to the end of the stage, whether that is to THL on disk, the next queue in preparation for the next stage, or the target database.

    • Other — the time taken for other parts of the stage process, this includes waiting for thread management, updating internal structures, and recording information in the target datasource system, such as trep_commit_seqno.

  • Filters in stage — The list of filters configured for this stage in the order in which they are applied to the event.

For convenience, the performance display can be set to refresh with a configured interval using the trepctl perf -r 5 command.

In the event that the replicator is currently offline, no statistics are displayed:

shell> trepctl perf
Currently not online; performance stats not available
State: Safely Offline for 6.491s

8.19.3.14. trepctl properties Command

Display a list of all the internal properties. The list can be filtered.

trepctl properties [ -filter name ] [ -values ]

The list of properties can be used to determine the current configuration:

shell> trepctl properties 
{
"replicator.store.thl.log_file_retention": "7d",
"replicator.filter.bidiSlave.allowBidiUnsafe": "false",
"replicator.extractor.dbms.binlog_file_pattern": "mysql-bin",
"replicator.filter.pkey.url": »
    "jdbc:mysql:thin://host2:3306/tungsten_alpha?createDB=true",
...
}

Note

Passwords are not displayed in the output.

The information is output as a JSON object with key/value pairs for each property and corresponding value.

The list can be filtered using the -filter option:

shell> trepctl properties -filter shard
{
"replicator.filter.shardfilter": »
    "com.continuent.tungsten.replicator.shard.ShardFilter",
"replicator.filter.shardbyseqno": »
    "com.continuent.tungsten.replicator.filter.JavaScriptFilter",
"replicator.filter.shardbyseqno.shards": "1000",
"replicator.filter.shardfilter.enforceHome": "false",
"replicator.filter.shardfilter.unknownShardPolicy": "error",
"replicator.filter.shardbyseqno.script": »
    "../../tungsten-replicator//samples/extensions/javascript/shardbyseqno.js",
"replicator.filter.shardbytable.script": »
    "../../tungsten-replicator//samples/extensions/javascript/shardbytable.js",
"replicator.filter.shardfilter.enabled": "true",
"replicator.filter.shardfilter.allowWhitelisted": "false",
"replicator.shard.default.db": "stringent",
"replicator.filter.shardbytable": »
    "com.continuent.tungsten.replicator.filter.JavaScriptFilter",
"replicator.filter.shardfilter.autoCreate": "false",
"replicator.filter.shardfilter.unwantedShardPolicy": "error"
}

The value or values from filtered properties can be retrieved by using the -values option:

shell>  trepctl properties -filter site.name -values
default

If a filter that would select multiple values is specified, all the values are listed without field names:

shell> trepctl properties -filter shard -values
com.continuent.tungsten.replicator.shard.ShardFilter
com.continuent.tungsten.replicator.filter.JavaScriptFilter
1000
false
../../tungsten-replicator//samples/extensions/javascript/shardbyseqno.js
error
../../tungsten-replicator//samples/extensions/javascript/shardbytable.js
true
false
stringent
com.continuent.tungsten.replicator.filter.JavaScriptFilter
false
error

8.19.3.15. trepctl purge Command

Forces all logins on the attached database, other than those directly related to Tungsten Cluster, to be disconnected. The command is only supported on a Primary, and can be used to disconnect users before a switchover or taking a Primary offline to prevent further use of the system.

trepctl purge [ -limit s ] [ -y ]

Where:

Table 8.30. trepctl purge Command Options

Option     Description
------     -----------
-limit s   Specify the waiting time for the operation
-y         Indicates that the command should continue without interactive confirmation

Warning

Use of the command will disconnect running users and queries and may leave the database in an unknown state. It should be used with care, and only when the dangers and potential results are understood.

To close the connections:

shell> trepctl purge
Do you really want to purge non-Tungsten DBMS sessions? [yes/NO]

You will be prompted to confirm the operation. To skip this confirmation and purge connections, use the -y option:

shell> trepctl purge -y
Directing replicator to purge non-Tungsten sessions
Number of sessions purged: 0

An optional parameter, -limit, defines the period of time that the operation will wait before returning to the command-line.

8.19.3.16. trepctl qs Command

The trepctl qs (quickstatus) command provides a quicker, simpler, status display for the replicator showing only the critical information in a human-readable form. For example:

shell> trepctl qs
State: alpha Online for 4.21s, running for 1781.766s
Latency: 18.0s from source DB commit time on thl://ubuntuheterosrc.mcb:2112/ into target database
         1216.315s since last source commit
Sequence: 4804 last applied, 0 transactions behind (0-4804 stored) estimate 0.00s before synchronization

The information presented is as follows:

  • State: alpha Online for 4.21s, running for 1781.766s

    The top line shows the basic status information about the replicator:

    • The name of the service (alpha).

    • The replicator's current state (Online) and the time in that state.

    • The amount of time the replicator has been running.

  • Latency: 18.0s from source DB commit time on thl://ubuntuheterosrc.mcb:2112/ into target database

    The second line shows the latency information; the information shown is based on the role of the replicator. The above line is shown on an applier, where the latency information shows the write delay into the target database, where the information is coming from, and applying to the target database. For a Primary (extractor), the information shown describes the latency from extraction into the THL files:

    State: alpha Online for 1699091.442s, running for 1699093.138s
    Latency: 0.113s from DB commit time on ubuntuheterosrc into THL
             1679.354s since last database commit
    Sequence: 4859 last applied, 0 transactions behind (0-4859 stored) estimate 0.00s before synchronization
  • 1216.315s since last source commit

    The next line shows the interval since the last database commit. On a Primary (extractor), this is the time between the last database commit to the binary log and the information being written to THL. On a Replica, it is the time between the last database commit on the source database and when the transaction was written to the target.

  • Sequence: 4804 last applied, 0 transactions behind (0-4804 stored) estimate 0.00s before synchronization

    The last line shows the sequence information:

    • The last applied sequence number (to THL on a Primary, or to the target database on a Replica).

    • The number of transactions behind the current stored transaction list. This is an indication on a Replica of how far behind in transactions (not latency) the Replica is from the Primary.

    • The range of transactions currently stored (from minimum to maximum stored sequence number).

    • An estimate of how long it will take to apply the outstanding transactions. The calculation is made by determining the average rate at which transactions are being applied (either extraction or applying) against the number of outstanding transactions. It assumes all outstanding transactions are of equal size; the actual THL transaction size is not taken into account. For information on THL sizes, try the thl list -sizes command.

If the replicator has been deliberately placed offline using trepctl offline, then the basic information and status is shown:

shell> trepctl qs
State: Safely Offline for 352.775s

In the event of a replicator failure, the fault will be reported in the output:

State: alpha Faulty (Offline) for 2.613s
Error Reason: SEQNO 4859 did not apply
    Error: CSV loading failed: schema=test table=msg CSV file=/opt/continuent/tmp/staging/alpha/staging0/test-msg-4859.csv
» message=Wrapped com.continuent.tungsten.replicator.ReplicatorException: OS command failed: command=cqlsh --keyspace=test
» --execute="copy stage_xxx_msg (tungsten_opcode,tungsten_seqno,tungsten_row_id,tungsten_commit_timestamp,id,msg) from
» '/opt/continuent/tmp/staging/alpha/staging0/test-msg-4859.csv' with NULL='NULL';" rc=1 stdout= stderr=Connection 
» error: ('Unable to complete the operation against any hosts', {})

8.19.3.17. trepctl reset Command

The trepctl reset command resets an existing replicator service, performing the following operations:

  • Deletes the local THL and relay directories

  • Removes the Tungsten schema from the dataserver

  • Removes any dynamic properties that have previously been set

The service name must be specified, using -service.

trepctl reset [ -all ] [ -db ] [ -relay ] [ -thl ] [ -y ]

Where:

Table 8.31. trepctl reset Command Options

Option   Description
------   -----------
-all     Deletes the thl directory, relay logs directory and tungsten database for the service
-db      Deletes the tungsten_{service_name} database for the service
-relay   Deletes the relay directory for the service
-thl     Deletes the thl directory for the service
-y       Indicates that the command should continue without interactive confirmation

To reset a replication service, the replication service must be offline and the service name must be specified:

shell> trepctl offline

Execute the trepctl reset command:

shell> trepctl -service alpha reset
Do you really want to delete replication service alpha completely? [yes/NO]

You will be prompted to confirm the deletion. To ignore the interactive prompt, use the -y option:

shell> trepctl -service alpha reset -y

Then put the replicator back online again:

shell> trepctl online

You can also reset only part of the overall service by including one of the following options:

  • -all : Reset all components of the service.

  • -thl : Reset the THL. This is equivalent to running thl purge.

  • -relay : Reset the relay log contents.

  • -db : Reset the database, including emptying the trep_commit_seqno and other control tables.

8.19.3.18. trepctl restore Command

Restores the database on a host from a previous backup.

trepctl restore

Once the restore has been completed, the node will remain in the OFFLINE state. The datasource should be switched ONLINE using trepctl:

shell> trepctl online

Any outstanding events from the Primary will be processed and applied to the Replica, which will catch up to the current Primary status over time.
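
A restore therefore typically follows an offline/restore/online pattern. A minimal sketch for a service named alpha, using only the commands shown in this chapter:

# Take the replicator offline before restoring the database
trepctl -service alpha offline

# Restore from the most recent backup; the node remains OFFLINE afterwards
trepctl -service alpha restore

# Bring the replicator back online so it catches up with the Primary
trepctl -service alpha online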

8.19.3.19. trepctl resume Command

The trepctl resume command allows you to resume a specific stage of the replicator that has been paused using the pause option.

trepctl resume [ -stage stage-to-resume ]

Where:

Table 8.32. trepctl resume Command Options

Option                   Description
------                   -----------
-stage stage-to-resume   Used to specify the stage to resume, for example thl-to-q

To resume the thl-to-q stage of the replicator use the standard form of the command:

shell> trepctl resume -stage thl-to-q

8.19.3.20. trepctl setdynamic Command

The trepctl setdynamic command allows you to change certain dynamic properties without the need to execute tpm update.

trepctl setdynamic [ -property ] [ -value ]

Where:

Table 8.33. trepctl setdynamic Command Options

Option      Description
------      -----------
-property   Specify the property to change
-value      Specify the value of the specified -property

To change a property, specify the property using the -property parameter.

Important

To change a property dynamically, the service must first be OFFLINE

shell> trepctl setdynamic -property <property>

The list of properties that can be dynamically changed is as follows:

  • replicator.autoRecoveryMaxAttempts : This allows you to dynamically alter the behavior of the autoRecovery options. A value greater than 0 enables autoRecovery; a value of 0 disables it.

    The following example enables autoRecovery with a MaxAttempts value of 10:

    shell> trepctl -service beta setdynamic -property replicator.autoRecoveryMaxAttempts -value 10

    The following example disables autoRecovery:

    shell> trepctl -service beta setdynamic -property replicator.autoRecoveryMaxAttempts -value 0

8.19.3.21. trepctl setrole Command

The trepctl setrole command changes the role of the replicator service. This command can be used to change a configured host between Replica and Primary roles, for example during switchover.

trepctl setrole [ -role master | slave | relay | thl-applier | thl-client | thl-server ] [ -uri ]

Where:

Table 8.34. trepctl setrole Command Options

Option   Description
------   -----------
-role    Replicator role
-uri     URI of the Primary

To change the role of a replicator, specify the role using the -role parameter. The replicator must be offline when the role change is issued:

shell> trepctl setrole -role master

When setting a Replica, the URI of the Primary can be optionally supplied:

shell> trepctl setrole -role slave -uri thl://host1:2112/

See Section 7.3, “Understanding Replicator Roles” for more details on each role.

8.19.3.22. trepctl shard Command

The trepctl shard command provides an interface to the replicator shard definition system.

trepctl shard [ -delete shard ] [ -insert shard ] [ -list ] [ -update shard ]

Where:

Table 8.35. trepctl shard Command Options

Option          Description
------          -----------
-delete shard   Delete a shard definition
-insert shard   Add a new shard definition
-list           List configured shards
-update shard   Update a shard definition

The replicator shard system is used during multi-site replication configurations to control where information is replicated.

8.19.3.22.1. Listing Current Shards

To obtain a list of the currently configured shards:

shell> trepctl shard -list
shard_id	master	critical
alpha     sales   true

The shard map information can also be captured and then edited to update existing configurations:

shell> trepctl shard -list > shard.map
8.19.3.22.2. Inserting a New Shard Configuration

To add a new shard map definition, either enter the information interactively:

shell> trepctl shard -insert 
Reading from standard input
...
1 new shard inserted

Or import from a file:

shell> trepctl shard -insert  < shard.map
Reading from standard input
1 new shard inserted
8.19.3.22.3. Updating an Existing Shard Configuration

To update a definition:

shell> trepctl shard -update  < shard.map
Reading from standard input
1 shard updated
8.19.3.22.4. Deleting a Shard Configuration

To delete a single shard definition, specify the shard name:

shell> trepctl shard -delete alpha

8.19.3.23. trepctl status Command

The trepctl status command provides status information about the selected data service. The status information by default is a generic status report containing the key fields of status information. More detailed service information can be obtained by specifying the status name with the -name parameter.

The format of the command is:

trepctl status [ -json ] [ -name channel-assignments | services | shards | stages | stores | tasks | watches ] [ -r ]

Where:

Table 8.36. trepctl status Command Options

Option   Description
------   -----------
-json    Output the information in JSON format
-name    Select a specific group of status information
-r       Refresh the status display

For example, to get the basic status information:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000007:0000000000001353;0
appliedLastSeqno       : 2504
appliedLatency         : 0.53
channels               : 1
clusterName            : default
currentEventId         : mysql-bin.000007:0000000000001353
currentTimeMillis      : 1369233160014
dataServerHost         : host1
extensions             : 
latestEpochNumber      : 2500
masterConnectUri       : 
masterListenUri        : thl://host1:2112/
maximumStoredSeqNo     : 2504
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host1:3306/
relativeLatency        : 1875.013
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : local
simpleServiceName      : alpha
siteName               : default
sourceId               : host1
state                  : ONLINE
timeInStateSeconds     : 1874.512
transitioningTo        : 
uptimeSeconds          : 1877.823
version                : Tungsten Replicator 6.1.24 build 6
Finished status command...

For more information on the field information output, see Section E.2, “Generated Field Reference”.

The -r option can be used to automatically refresh the output at the specified interval. For example, trepctl status -r 5 will refresh the output every 5 seconds.
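
Individual status fields can also be extracted for scripting directly from the standard output. For example, to pull out the last applied sequence number and the current state (a sketch based on the output format shown above):

# Extract the last applied sequence number and the current state
trepctl status | awk '/^appliedLastSeqno/ {print $3}'
trepctl status | awk '/^state / {print $3}'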

8.19.3.23.1. Getting Detailed Status

More detailed information about selected areas of the replicator status can be obtained by using the -name option.

8.19.3.23.1.1. Detailed Status: Channel Assignments

When using a single-threaded replicator service, trepctl status -name channel-assignments will output an empty status. In parallel replication deployments, the trepctl status -name channel-assignments listing will output the list of schemas and their assigned channels within the configured number of channels. For example, in the output below, only two channels are shown, although five channels were configured for parallel apply:

shell> trepctl status -name channel-assignments
Processing status command (channel-assignments)...
NAME      VALUE
----      -----
channel : 0
shard_id: test
NAME      VALUE
----      -----
channel : 0
shard_id: tungsten_alpha
Finished status command (channel-assignments)...
8.19.3.23.1.2. Detailed Status: Services

The trepctl status -name services status output shows a list of the currently configured internal services that are defined within the replicator.

shell> trepctl status -name services
Processing status command (services)...
NAME              VALUE
----              -----
accessFailures  : 0
active          : true
maxChannel      : -1
name            : channel-assignment
storeClass      : com.continuent.tungsten.replicator.channel.ChannelAssignmentService
totalAssignments: 0
Finished status command (services)...
8.19.3.23.1.3. Detailed Status: Shards

The trepctl status -name shards status output lists the individual shards in operation, most useful when parallel apply has been configured within the replicator, showing a summary of last applied sequences and the corresponding binlog references.

In an environment not configured with parallel apply, the shards output will just show a single entry.

8.19.3.23.1.4. Detailed Status: Stages

The trepctl status -name stages status output lists the individual stages configured within the replicator, showing each stage, configuration, filters and other parameters applied at each replicator stage:

shell> trepctl status -name stages
Processing status command (stages)...
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.thl.THLStoreApplier
applier.name       : thl-applier
blockCommitRowCount: 1
committedMinSeqno  : 15
extractor.class    : com.continuent.tungsten.replicator.thl.RemoteTHLExtractor
extractor.name     : thl-remote
name               : remote-to-thl
processedMinSeqno  : -1
taskCount          : 1
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.thl.THLParallelQueueApplier
applier.name       : parallel-q-applier
blockCommitRowCount: 10
committedMinSeqno  : 15
extractor.class    : com.continuent.tungsten.replicator.thl.THLStoreExtractor
extractor.name     : thl-extractor
name               : thl-to-q
processedMinSeqno  : -1
taskCount          : 1
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier
applier.name       : dbms
blockCommitRowCount: 10
committedMinSeqno  : 15
extractor.class    : com.continuent.tungsten.replicator.thl.THLParallelQueueExtractor
extractor.name     : parallel-q-extractor
filter.0.class     : com.continuent.tungsten.replicator.filter.TimeDelayFilter
filter.0.name      : delay
filter.1.class     : com.continuent.tungsten.replicator.filter.MySQLSessionSupportFilter
filter.1.name      : mysqlsessions
filter.2.class     : com.continuent.tungsten.replicator.filter.PrimaryKeyFilter
filter.2.name      : pkey
name               : q-to-dbms
processedMinSeqno  : -1
taskCount          : 5
Finished status command (stages)...
8.19.3.23.1.5. Detailed Status: Stores

The trepctl status -name stores status output lists the individual internal stores used for replicating THL data, covering both physical (on disk) THL storage and in-memory storage, along with the sequence number, file size and retention information.

For example, the information shown below is taken from a Primary service, showing the stages binlog-to-q, which reads the information from the binary log, and the in-memory q-to-thl, which writes the information to THL.

shell> trepctl status -name stages
Processing status command (stages)...
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.storage.InMemoryQueueAdapter
applier.name       : queue
blockCommitRowCount: 1
committedMinSeqno  : 224
extractor.class    : com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor
extractor.name     : dbms
name               : binlog-to-q
processedMinSeqno  : 224
taskCount          : 1
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.thl.THLStoreApplier
applier.name       : autoflush-thl-applier
blockCommitRowCount: 10
committedMinSeqno  : 224
extractor.class    : com.continuent.tungsten.replicator.storage.InMemoryQueueAdapter
extractor.name     : queue
name               : q-to-thl
processedMinSeqno  : 224
taskCount          : 1
Finished status command (stages)...

When running parallel replication, the output shows the store name, sequence number and status information for each parallel replication channel:

shell> trepctl status -name stores
Processing status command (stores)...
NAME                      VALUE
----                      -----
activeSeqno             : 15
doChecksum              : false
flushIntervalMillis     : 0
fsyncOnFlush            : false
logConnectionTimeout    : 28800
logDir                  : /opt/continuent/thl/alpha
logFileRetainMillis     : 604800000
logFileSize             : 100000000
maximumStoredSeqNo      : 16
minimumStoredSeqNo      : 0
name                    : thl
readOnly                : false
storeClass              : com.continuent.tungsten.replicator.thl.THL
timeoutMillis           : 2147483647
NAME                      VALUE
----                      -----
criticalPartition       : -1
discardCount            : 0
estimatedOfflineInterval: 0.0
eventCount              : 1
headSeqno               : 16
intervalGuard           : AtomicIntervalGuard (array is empty)
maxDelayInterval        : 60
maxOfflineInterval      : 5
maxSize                 : 10
name                    : parallel-queue
queues                  : 5
serializationCount      : 0
serialized              : false
stopRequested           : false
store.0                 : THLParallelReadTask task_id=0 thread_name=store-thl-0 »
                          hi_seqno=16 lo_seqno=16 read=1 accepted=1 discarded=0 events=0
store.1                 : THLParallelReadTask task_id=1 thread_name=store-thl-1 »
                          hi_seqno=16 lo_seqno=16 read=1 accepted=0 discarded=1 events=0
store.2                 : THLParallelReadTask task_id=2 thread_name=store-thl-2 »
                          hi_seqno=16 lo_seqno=16 read=1 accepted=0 discarded=1 events=0
store.3                 : THLParallelReadTask task_id=3 thread_name=store-thl-3 »
                          hi_seqno=16 lo_seqno=16 read=1 accepted=0 discarded=1 events=0
store.4                 : THLParallelReadTask task_id=4 thread_name=store-thl-4 »
                          hi_seqno=16 lo_seqno=16 read=1 accepted=0 discarded=1 events=0
storeClass              : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval            : 10000
Finished status command (stores)...
8.19.3.23.1.6. Detailed Status: Tasks

The trepctl status -name tasks command outputs the current list of active tasks within a given service, with one block for each stage within the replicator service.

shell> trepctl status -name tasks
Processing status command (tasks)...
NAME                    VALUE
----                    -----
appliedLastEventId    : mysql-bin.000038:0000000011253929;-1
appliedLastSeqno      : 1604
appliedLatency        : 8.779
applyTime             : 0.015
averageBlockSize      : 3.500     
cancelled             : false
commits               : 4
currentBlockSize      : 0
currentLastEventId    : mysql-bin.000038:0000000011253929;-1
currentLastFragno     : 11
currentLastSeqno      : 1604
eventCount            : 14
extractTime           : 60.173
filterTime            : 0.109
lastCommittedBlockSize: 12
lastCommittedBlockTime: 59.145
otherTime             : 0.004
stage                 : binlog-to-q
state                 : extract
taskId                : 0
timeInCurrentEvent    : 8804.187
NAME                    VALUE
----                    -----
appliedLastEventId    : mysql-bin.000038:0000000011253929;-1
appliedLastSeqno      : 1604
appliedLatency        : 10.613
applyTime             : 5.247
averageBlockSize      : 2.800     
cancelled             : false
commits               : 5
currentBlockSize      : 0
currentLastEventId    : mysql-bin.000038:0000000011253929;-1
currentLastFragno     : 11
currentLastSeqno      : 1604
eventCount            : 14
extractTime           : 56.858
filterTime            : 0.02
lastCommittedBlockSize: 12
lastCommittedBlockTime: 5.092
otherTime             : 0.028
stage                 : q-to-thl
state                 : extract
taskId                : 0
timeInCurrentEvent    : 8802.323
Finished status command (tasks)...

The list of tasks and information provided depends on the role of the host, the number of stages, and whether parallel apply is enabled.

8.19.3.23.1.7. Detailed Status: Watches

The trepctl status -name watches command outputs the current list of tasks the replicator is waiting on before a specific action.

For example, if you issue trepctl offline-deferred -at-seqno x, then the output of watches will show the stages waiting on the specified seqno.

The following example shows the use of offline-deferred and the resulting output from watches:

shell> trepctl offline-deferred -at-seqno 234
shell> trepctl status -name watches
Processing status command (watches)...
NAME       VALUE
----       -----
action   : cancel tasks
cancelled: false
committed: false
done     : false
matched  : [[0:false]]
predicate: SeqnoWatchPredicate seqno=234
stage    : remote-to-thl
NAME       VALUE
----       -----
action   : cancel tasks
cancelled: false
committed: false
done     : false
matched  : [[0:false]]
predicate: SeqnoWatchPredicate seqno=234
stage    : thl-to-q
NAME       VALUE
----       -----
action   : cancel tasks
cancelled: false
committed: false
done     : false
matched  : [[0:false]]
predicate: SeqnoWatchPredicate seqno=234
stage    : q-to-dbms
Finished status command (watches)...
8.19.3.23.2. Getting JSON Formatted Status

Status information can also be requested in JSON format. The content of the information is identical; only the representation differs, formatted as a JSON wrapper object with one key/value pair for each field in the standard status output.

Examples of the JSON output for each status output are provided below. For more information on the fields displayed, see Section E.2, “Generated Field Reference”.

trepctl status JSON Output

{
"uptimeSeconds": "2128.682",
"masterListenUri": "thl://host1:2112/",
"clusterName": "default",
"pendingExceptionMessage": "NONE",
"appliedLastEventId": "mysql-bin.000007:0000000000001353;0",
"pendingError": "NONE",
"resourcePrecedence": "99",
"transitioningTo": "",
"offlineRequests": "NONE",
"state": "ONLINE",
"simpleServiceName": "alpha",
"extensions": "",
"pendingErrorEventId": "NONE",
"sourceId": "host1",
"serviceName": "alpha",
"version": "Tungsten Replicator 6.1.24 build 6",
"role": "master",
"currentTimeMillis": "1369233410874",
"masterConnectUri": "",
"rmiPort": "10000",
"siteName": "default",
"pendingErrorSeqno": "-1",
"appliedLatency": "0.53",
"pipelineSource": "jdbc:mysql:thin://host1:3306/",
"pendingErrorCode": "NONE",
"maximumStoredSeqNo": "2504",
"latestEpochNumber": "2500",
"channels": "1",
"appliedLastSeqno": "2504",
"serviceType": "local",
"seqnoType": "java.lang.Long",
"currentEventId": "mysql-bin.000007:0000000000001353",
"relativeLatency": "2125.873",
"minimumStoredSeqNo": "0",
"timeInStateSeconds": "2125.372",
"dataServerHost": "host1"
}
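
Because the status is plain JSON, it is easy to consume from scripts. The following is an illustrative sketch only, assuming the jq utility is available in the PATH; it extracts the state and the applied latency from output such as the example above:

shell> trepctl status -json | jq -r '.state'
ONLINE
shell> trepctl status -json | jq -r '.appliedLatency'
0.53
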
8.19.3.23.2.1. Detailed Status: Channel Assignments JSON Output
shell> trepctl status -name channel-assignments -json
[
   {
      "channel" : "0",
      "shard_id" : "cheffy"
   },
   {
      "channel" : "0",
      "shard_id" : "tungsten_alpha"
   }
]
8.19.3.23.2.2. Detailed Status: Services JSON Output
shell> trepctl status -name services -json
[
   {
      "totalAssignments" : "2",
      "accessFailures" : "0",
      "storeClass" : "com.continuent.tungsten.replicator.channel.ChannelAssignmentService",
      "name" : "channel-assignment",
      "maxChannel" : "0"
   }
]
8.19.3.23.2.3. Detailed Status: Shards JSON Output
shell> trepctl status -name shards -json
[
   {
      "stage" : "q-to-dbms",
      "appliedLastEventId" : "mysql-bin.000007:0000000007224342;0",
      "appliedLatency" : "63.099",
      "appliedLastSeqno" : "2514",
      "eventCount" : "16",
      "shardId" : "cheffy"
   }
]
8.19.3.23.2.4. Detailed Status: Stages JSON Output
shell> trepctl status -name stages -json
[
   {
      "applier.name" : "thl-applier",
      "applier.class" : "com.continuent.tungsten.replicator.thl.THLStoreApplier",
      "name" : "remote-to-thl",
      "extractor.name" : "thl-remote",
      "taskCount" : "1",
      "committedMinSeqno" : "2504",
      "blockCommitRowCount" : "1",
      "processedMinSeqno" : "-1",
      "extractor.class" : "com.continuent.tungsten.replicator.thl.RemoteTHLExtractor"
   },
   {
      "applier.name" : "parallel-q-applier",
      "applier.class" : "com.continuent.tungsten.replicator.storage.InMemoryQueueAdapter",
      "name" : "thl-to-q",
      "extractor.name" : "thl-extractor",
      "taskCount" : "1",
      "committedMinSeqno" : "2504",
      "blockCommitRowCount" : "10",
      "processedMinSeqno" : "-1",
      "extractor.class" : "com.continuent.tungsten.replicator.thl.THLStoreExtractor"
   },
   {
      "applier.name" : "dbms",
      "applier.class" : "com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier",
      "filter.2.name" : "bidiSlave",
      "name" : "q-to-dbms",
      "extractor.name" : "parallel-q-extractor",
      "filter.1.name" : "pkey",
      "taskCount" : "1",
      "committedMinSeqno" : "2504",
      "filter.2.class" : "com.continuent.tungsten.replicator.filter.BidiRemoteSlaveFilter",
      "filter.1.class" : "com.continuent.tungsten.replicator.filter.PrimaryKeyFilter",
      "filter.0.class" : "com.continuent.tungsten.replicator.filter.MySQLSessionSupportFilter",
      "blockCommitRowCount" : "10",
      "filter.0.name" : "mysqlsessions",
      "processedMinSeqno" : "-1",
      "extractor.class" : "com.continuent.tungsten.replicator.storage.InMemoryQueueAdapter"
   }
]
8.19.3.23.2.5. Detailed Status: Stores JSON Output
shell> trepctl status -name stores -json
[
   {
      "logConnectionTimeout" : "28800",
      "doChecksum" : "false",
      "name" : "thl",
      "flushIntervalMillis" : "0",
      "logFileSize" : "100000000",
      "logDir" : "/opt/continuent/thl/alpha",
      "activeSeqno" : "2561",
      "readOnly" : "false",
      "timeoutMillis" : "2147483647",
      "storeClass" : "com.continuent.tungsten.replicator.thl.THL",
      "logFileRetainMillis" : "604800000",
      "maximumStoredSeqNo" : "2565",
      "minimumStoredSeqNo" : "2047",
      "fsyncOnFlush" : "false"
   },
   {
      "storeClass" : "com.continuent.tungsten.replicator.storage.InMemoryQueueStore",
      "maxSize" : "10",
      "storeSize" : "7",
      "name" : "parallel-queue",
      "eventCount" : "119"
   }
]
8.19.3.23.2.6. Detailed Status: Tasks JSON Output
shell> trepctl status -name tasks -json
[
   {
      "filterTime" : "0.0",
      "stage" : "remote-to-thl",
      "currentLastFragno" : "1",
      "taskId" : "0",
      "currentLastSeqno" : "2615",
      "state" : "extract",
      "extractTime" : "604.297",
      "applyTime" : "16.708",
      "averageBlockSize" : "0.982     ",
      "otherTime" : "0.017",
      "appliedLastEventId" : "mysql-bin.000007:0000000111424440;0",
      "appliedLatency" : "63.787",
      "currentLastEventId" : "mysql-bin.000007:0000000111424440;0",
      "eventCount" : "219",
      "appliedLastSeqno" : "2615",
      "cancelled" : "false"
   },
   {
      "filterTime" : "0.0",
      "stage" : "thl-to-q",
      "currentLastFragno" : "1",
      "taskId" : "0",
      "currentLastSeqno" : "2615",
      "state" : "extract",
      "extractTime" : "620.715",
      "applyTime" : "0.344",
      "averageBlockSize" : "1.904     ",
      "otherTime" : "0.006",
      "appliedLastEventId" : "mysql-bin.000007:0000000111424369;0",
      "appliedLatency" : "63.834",
      "currentLastEventId" : "mysql-bin.000007:0000000111424440;0",
      "eventCount" : "219",
      "appliedLastSeqno" : "2615",
      "cancelled" : "false"
   },
   {
      "filterTime" : "0.263",
      "stage" : "q-to-dbms",
      "currentLastFragno" : "1",
      "taskId" : "0",
      "currentLastSeqno" : "2614",
      "state" : "apply",
      "extractTime" : "533.471",
      "applyTime" : "61.618",
      "averageBlockSize" : "1.160     ",
      "otherTime" : "24.052",
      "appliedLastEventId" : "mysql-bin.000007:0000000110392640;0",
      "appliedLatency" : "63.178",
      "currentLastEventId" : "mysql-bin.000007:0000000110392711;0",
      "eventCount" : "217",
      "appliedLastSeqno" : "2614",
      "cancelled" : "false"
   }
]
8.19.3.23.2.7. Detailed Status: Watches JSON Output
shell> trepctl status -name watches -json

8.19.3.24. trepctl unload Command

Unload the replicator service.

trepctl unload [ -y ]

Unload the replicator service entirely. An interactive prompt is provided to confirm the shutdown:

shell> trepctl unload
Do you really want to unload replication service alpha? [yes/NO]

To disable the prompt, use the -y option:

shell> trepctl unload -y
Service unloaded successfully: name=alpha

The name of the service unloaded is provided for confirmation.

8.19.3.25. trepctl wait Command

The trepctl wait command waits for the replicator to enter a specific state, or for a specific sequence number to be applied to the dataserver.

trepctl wait [ -applied seqno ] [ -limit s ] [ -state st ]

Where:

Table 8.37. trepctl wait Command Options

Option            Description
------            -----------
-applied seqno    Specify the sequence number to be waited for
-limit s          Specify the number of seconds to wait for the operation to complete
-state st         Specify a state to be waited for

The command waits for the specified occurrence: either a change in the replicator state (for example, ONLINE), or a specific sequence number to be applied. For example, to wait for the replicator to go into the ONLINE state:

shell> trepctl wait -state ONLINE

This can be useful in scripts when the state may change (for example during a backup or restore operation), allowing an operation to take place once the requested state has been reached. Once it has, trepctl returns with exit status 0.

To wait for a specific sequence number to be applied:

shell> trepctl wait -applied 2000

This can be useful when performing bulk loads where the sequence number at which the bulk load completed is known, or when waiting for a specific sequence number from the Primary to be applied on the Replica. Unlike the offline-deferred operation, no change in the replicator is made. Instead, trepctl simply returns with exit status 0 when the sequence number has been successfully applied.

If the optional -limit option is used, then trepctl waits for the specified number of seconds for the requested event to occur. For example, to wait for 10 seconds for the replicator to go online:

shell> trepctl wait -state ONLINE -limit 10
Wait timed out!

If the requested event does not take place before the specified time limit expires, then trepctl returns with the message 'Wait timed out!', and an exit status of 1.
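
Because of these exit codes, trepctl wait is straightforward to embed in shell scripts. The following is an illustrative sketch only (the 60-second limit and seqno 2000 are example values):

#!/bin/bash
# Wait up to 60 seconds for the replicator to come online; abort on timeout.
if ! trepctl wait -state ONLINE -limit 60; then
    echo "Replicator did not reach ONLINE within 60 seconds" >&2
    exit 1
fi
# Block until seqno 2000 has been applied; no change is made to the replicator.
trepctl wait -applied 2000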

8.20. The tmonitor Command

The tmonitor command was added in version 6.1.13.

tmonitor is a tool for the management and testing of external Prometheus exporters (node and mysqld), and for the testing of internal exporters (Replicator).

The tmonitor command is located in the $CONTINUENT_ROOT/tungsten/cluster-home/bin directory.

Note

The tmonitor command will only be available in the PATH if the Tungsten software has been installed with the configuration option profile-script included.

See the table below for a list of valid arguments:

Table 8.38. tmonitor Common Options

Option                   Description
------                   -----------
--connector, -C          Specify the Tungsten Connector exporter (test and Tungsten Clustering only)
--connectorPort          Set the port to access the Connector exporter (default: 8093)
--external, -e, -x       Operate only upon External-Scope exporters
--help, -h               Display help text
--internal, -i           Operate only upon Internal-Scope exporters
--manager, -M            Specify the Tungsten Manager exporter (test and Tungsten Clustering only)
--managerPort            Set the port to access the Manager exporter (default: 8092)
--mysql, --mysqld, -m    Specify the MySQL exporter
--node, -n               Specify the node exporter
--replicator, -R         Specify the Tungsten Replicator exporter (test only)
--replicatorPort         Set the port to access the Replicator exporter (default: 8091)
-t, -T                   Issue '-t test' to show Tungsten-specific metric lines. Issue '-T test' to show metric help and headers
--verbose, -v            Show verbose output

Exporters that require an external binary to function (i.e. node_exporter and mysqld_exporter) are considered to have an External Scope.

Exporters that do not require an external binary to function (i.e. the Replicator) are considered to have an Internal Scope.

By default, tmonitor {action} will act upon all available exporters.

If any exporter is specified on the CLI, then only those listed on the CLI will be acted upon.

The curl command must be available in the PATH for the status and test actions to function.

tmonitor status will use curl to test the exporter on localhost.

tmonitor test will use curl to fetch and print the metrics from one or more exporters on localhost.

tmonitor install will configure the specified exporter (or all external exporters if none is specified) to start at boot.

tmonitor remove will stop the specified exporter (or all external exporters if none is specified) from starting at boot.

Both the install and remove actions will attempt to auto-detect the boot sub-system. Currently, init.d and systemd are supported.
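
Because the status and test actions use curl against localhost, the same check can be performed manually. A minimal sketch, assuming the conventional /metrics endpoint and the default ports (8091 for the Replicator exporter, as listed above; 9100 for the node exporter):

shell> curl -s http://localhost:9100/metrics | head -3
shell> curl -s http://localhost:8091/metrics | head -3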

shell> tmonitor help
...
>>> Usage:

   tmonitor [args] {action}

   = Actions available for all exporter scopes:

      status - validate the service via curl (short output)
      test - validate the service via curl (full output)

   = Actions available for External-scope exporters only:

      start - launch the exporter process
      stop - kill the exporter process

      install - configure the exporter to start at boot
      remove - stop the exporter from starting at boot


>>> Arguments:

   [-h|--help]
   [-v|--verbose]

   [-i|--internal] Only act upon exporters with an internal scope 
   [-e|-x|--external] Only act upon exporters with an external scope 

   --internal and --external may not be specified together.

   = Internal Scope Exporters:

     [-C|--connector] Specify the Tungsten Connector exporter
     [-M|--manager] Specify the Tungsten Manager exporter
     [-R|--replicator] Specify the Tungsten Replicator exporter

   = External Scope Exporters:

     [-m|--mysql|--mysqld] Specify the MySQL exporter
     [-n|--node] Specify the Node exporter

Example: View the status of all exporters:

shell> tmonitor status
MySQL exporter running ok
Node exporter running ok
Tungsten Replicator exporter running ok
All exporters running ok (Up: MySQL, Node, Tungsten Replicator)

Example: Start all exporters:

shell> tmonitor start
Node exporter started successfully on port 9100.
MySQL exporter started successfully on port 9104.

shell> tmonitor start
The Node exporter is already running
The MySQL exporter is already running

Example: Test all exporters:

shell> tmonitor test | wc -l
3097

shell> tmonitor test | grep '== '
== Metrics for the mysql exporter:
== Metrics for the node exporter:
== Metrics for the replicator exporter:

shell> tmonitor test | less

Example: Stop all exporters:

shell> tmonitor stop
All exporters stopped

Example: View the status of all exporters filtered by scope:

shell> tmonitor status -i
Tungsten Replicator exporter running ok
All internal exporters running ok (Up: Tungsten Replicator)

shell> tmonitor status -x
MySQL exporter running ok
Node exporter running ok
All external exporters running ok (Up: MySQL, Node)

Example: Install init.d boot scripts for all external exporters:

shell> tmonitor install
ERROR: You must be root to install or remove boot services.

Please be sure to run tmonitor as root via sudo, for example:

sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install


shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install
node_exporter init.d boot script installed and activated

Use either `sudo service node_exporter start` or `tmonitor --node start` now to start the Node exporter.

mysqld_exporter init.d boot script installed and activated

Use either `sudo service mysqld_exporter start` or `tmonitor --mysql start` now to start the MySQL exporter.

Example: Install systemd boot scripts for all external exporters:

shell> tmonitor install
ERROR: You must be root to install or remove boot services.

Please be sure to run tmonitor as root via sudo, for example:

sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install


shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install
Created symlink from /etc/systemd/system/multi-user.target.wants/node_exporter.service to /etc/systemd/system/node_exporter.service.
node_exporter systemd boot script installed and enabled

Use either `sudo systemctl start node_exporter` or `tmonitor --node start` now to start the Node exporter.

Created symlink from /etc/systemd/system/multi-user.target.wants/mysqld_exporter.service to /etc/systemd/system/mysqld_exporter.service.
mysqld_exporter systemd boot script installed and enabled

Use either `sudo systemctl start mysqld_exporter` or `tmonitor --mysql start` now to start the MySQL exporter.

Example: Remove init.d boot scripts for all external exporters:

shell> tmonitor remove
ERROR: You must be root to install or remove boot services.

Please be sure to run tmonitor as root via sudo, for example:

sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove

shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
node_exporter is still running, unable to remove. Please run either `tmonitor --node stop` or `sudo service node_exporter stop`, then retry this operation...

shell> tmonitor stop
Node exporter stopped
MySQL exporter stopped

shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
node_exporter init.d boot script de-activated and removed
mysqld_exporter init.d boot script de-activated and removed

Example: Remove systemd boot scripts for all external exporters:

shell> tmonitor stop
Node exporter stopped
MySQL exporter stopped

shell> tmonitor remove
ERROR: You must be root to install or remove boot services.

Please be sure to run tmonitor as root via sudo, for example:

sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove

shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
node_exporter systemd unit boot script disabled and removed
mysqld_exporter systemd unit boot script disabled and removed

8.21. The tpasswd Command

Table 8.39. tpasswd Common Options

Option                          Description
------                          -----------
--create, -c                    Creates a new user/password
--delete, -d                    Delete a user/password combination
-e, --encrypted.password        Encrypt the password
--file, -f                      Specify the location of the security.properties file
--help, -h                      Display help text
-p, --password.file.location    Specify the password file location
--target, -t                    Specify the target application
-ts, --truststore.location      Specify the truststore location

8.22. The tungsten_get_mysql_datadir Script

The tungsten_get_mysql_datadir command gathers and displays the actual running value for the data directory (datadir), checked against multiple sources (the running MySQL value, the mysqld default value and the Tungsten configuration), and then resolves any symlinks to the physical destination.

tungsten_get_mysql_datadir [ --debug, -d  ] [ --flavor, -f  ] [ --flavorful, -F  ] [ --help, -h  ] [ --verbose, -v  ]

Where:

Table 8.40. tungsten_get_mysql_datadir Command-line Options

Option            Description
------            -----------
--debug, -d       Show debug output
--flavor, -f      Also print out the MySQL server type as a tag in lower case (i.e. mysql, percona or mariadb)
--flavorful, -F   Also print out the MySQL server type (i.e. MySQL Community, Percona Server or MariaDB Server)
--help, -h        Show help text
--verbose, -v     Show verbose output

The sudo command is used along with the which command to locate the mysqld executable in order to gather the defaults.

8.23. The tungsten_get_ports Script

Table 8.41. tungsten_get_ports Options

Option           Description
------           -----------
--extra, -x      Display the listening IP and source pair too
--help, -h       Show help text
--internal, -i   Display only ports used for INTERNAL purposes
--list, -l       Display the static database
--showhost, -s   Display the short hostname as the first field

The tungsten_get_ports command will display the running Tungsten processes and the associated TCP ports that those processes are listening on.

Example:

shell> tungsten_get_ports
REPLICATOR[3678] 2112 THL
REPLICATOR[3678] 8091 Prometheus
REPLICATOR[3678] 8097 REST API
REPLICATOR[3678] 10000 RMI/JMX
REPLICATOR[3678] 10001 RMI/JMX
REPLICATOR[3678] 32000 Wrapper Liveness Checks
REPLICATOR[3678] 42679 INTERNAL RMI
MANAGER[4117] 7800 Group Communications
MANAGER[4117] 8090 REST API
MANAGER[4117] 8092 Prometheus
MANAGER[4117] 9997 RMI/JMX
MANAGER[4117] 11999 Router Gateway Port for Connector
MANAGER[4117] 12000 RMI/JMX
MANAGER[4117] 32001 Wrapper Liveness Checks
MANAGER[4117] 42957 INTERNAL RMI
MANAGER[4117] 46131 INTERNAL RMI
CONNECTOR[4551] 3100 RMI/JMX
CONNECTOR[4551] 3101 RMI/JMX
CONNECTOR[4551] 3306 MySQL R/W
CONNECTOR[4551] 3307 MySQL R/O
CONNECTOR[4551] 8093 Prometheus
CONNECTOR[4551] 8096 REST API
CONNECTOR[4551] 32002 Wrapper Liveness Checks

8.24. The tungsten_health_check Script

The tungsten_health_check script may be run less frequently than Section 8.25, “The tungsten_monitor Script”, to check the cluster against known best practices. It implements the Tungsten Script Interface as well as these additional options.

tungsten_health_check [ --dataservices ] [ --diagnostic-package  ] [ --directory ] [ --email ] [ --force  ] [ --from ] [ --help, -h ] [ --ignore ] [ --info, -i ] [ --json ] [ --lock-dir ] [ --lock-timeout  ] [ --mail  ] [ --net-ssh-option=key=value ] [ --notice, -n ] [ --show-differences  ] [ --subject ] [ --test-failover  ] [ --test-recover  ] [ --test-switch  ] [ --validate  ] [ --verbose, -v ]

Where:

Table 8.42. tungsten_health_check Command-line Options

Option                        Description
------                        -----------
--dataservices                The list of dataservices to monitor
--diagnostic-package          Create a diagnostic package if any issues are found
--directory                   The $CONTINUENT_ROOT directory to use for running this command. It will default to the directory you use to run the script.
--email                       Email address to send to when mailing any notifications
--force                       Continue operation even if script validation fails
--from                        The from address for sending messages
--help, -h                    Show help text
--ignore                      Ignore notices that use this key
--info, -i                    Display info, notice, warning, and error messages
--json                        Output all messages and the return code as a JSON object
--lock-dir                    Directory to store log and lock files in
--lock-timeout                The number of minutes to sleep a notice after sending it
--mail                        Path to the mail program to use for sending messages
--net-ssh-option=key=value    Provide custom SSH options to use for SSH communication to other hosts.
--notice, -n                  Display notice, warning, and error messages
--show-differences            Show any differences in Tungsten configuration
--subject                     Email subject line
--test-failover               Test failover for each managed dataservice
--test-recover                Test recover for each managed dataservice
--test-switch                 Test the switch command for each managed dataservice
--validate                    Only run script validation
--verbose, -v                 Show verbose output

Each time the tungsten_health_check runs, it will run a standard set of checks. Additional checks may be turned on using command line options.

  • Check for errors using tpm validate

  • Check that all servers in the dataservice are running the same version of Continuent Tungsten

The script can be run manually:

shell> tungsten_health_check

All messages will be sent to /opt/continuent/share/tungsten_health_check/lastrun.log.

Sending results via email

The tungsten_health_check is able to send you an email when problems are found. It is suggested that you run the script as root so it is able to use the mail program without warnings.

Alerts are cached to prevent them from being sent multiple times and flooding your inbox. You may pass --reset to clear out the cache or --lock-timeout to adjust the amount of time this cache is kept. The default is 3 hours.

shell> tungsten_health_check --from=you@yourcompany.com --to=group@yourcompany.com

Showing manual configuration file changes

The tpm validate command will fail if you have manually changed a configuration file. The file differences may be added if you include the --show-differences argument.

Testing Continuent Tungsten functionality

Continuent Tungsten includes a testing infrastructure that you can use at any time. By adding the --test-switch, --test-failover or --test-recover arguments to the command, we will test these operations on each database server.

Caution

This will have an impact on dataservice availability. Limit this operation to maintenance windows or times when you can experience managed outages.

Compatibility

The script only works with MySQL at this time.

8.25. The tungsten_monitor Script

The tungsten_monitor script provides a mechanism for monitoring the cluster state when monitoring tools like Nagios aren't available. It implements the Tungsten Script Interface as well as these additional options.

tungsten_monitor [ --check-log ] [ --connector-timeout  ] [ --dataservices ] [ --diagnostic-package  ] [ --directory ] [ --disk  ] [ --elb-script ] [ --email ] [ --force  ] [ --help, -h ] [ --ignore ] [ --info, -i ] [ --json ] [ --latency  ] [ --lock-dir ] [ --lock-timeout  ] [ --mail  ] [ --max-backup-age ] [ --net-ssh-option ] [ --notice, -n ] [ --reset  ] [ --subject ] [ --validate  ] [ --verbose, -v ]

Where:

Table 8.43. tungsten_monitor Command-line Options

Option                 Description
------                 -----------
--check-log            Email any lines in the log file that match the egrep expression. --check-log=tungsten-manager/log/tmsvc.log:OFFLINE
--connector-timeout    Number of seconds to wait for a connector response
--dataservices         The list of dataservices to monitor
--diagnostic-package   Create a diagnostic package if any issues are found
--directory            The $CONTINUENT_ROOT directory to use for running this command. It will default to the directory you use to run the script.
--disk                 Display a warning if any disk usage is above this percentage
--elb-script           The xinetd script name that is responding to ELB liveness checks
--email                Email address to send to when mailing any notifications
--force                Continue operation even if script validation fails
--help, -h             Show help text
--ignore               Ignore notices that use this key
--info, -i             Display info, notice, warning, and error messages
--json                 Output all messages and the return code as a JSON object
--latency              The maximum allowed latency for replicators
--lock-dir             Directory to store log and lock files in
--lock-timeout         The number of minutes to sleep a notice after sending it
--mail                 Path to the mail program to use for sending messages
--max-backup-age       Maximum age in seconds of valid backups
--net-ssh-option       Provide custom SSH options to use for communication to other hosts.
--notice, -n           Display notice, warning, and error messages
--reset                Remove all entries from the lock directory
--subject              Email subject line
--validate             Only run script validation
--verbose, -v          Show verbose output

General Operation

Each time the tungsten_monitor runs, it will run a standard set of checks. The set of checks will be determined automatically based on the current node configuration (for example, connector-timeout check will only run if the node has a connector installed). Additional checks may be turned on using command line options.

  • Check that all Tungsten services for this host are running

  • Check that all replication services and datasources are ONLINE

  • Check that replication latency does not exceed a specified amount

  • Check that the local connector is responsive

  • Check disk usage

An example of adding it to crontab:

shell> crontab -l
10 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor >/dev/null 2>/dev/null

All messages will be sent to /opt/continuent/share/tungsten_monitor/lastrun.log.

Note that when all tungsten_monitor checks pass, the script will not print anything to the standard output.

Sending results via email

The tungsten_monitor is able to send you an email when problems are found. It is suggested that you run the script as root so it is able to use the mail program without warnings.

Alerts are cached to prevent them from being sent multiple times and flooding your inbox. You may pass --reset to clear out the cache or --lock-timeout to adjust the amount of time this cache is kept. The default is 3 hours.

shell> crontab -l
10 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor --from=you@yourcompany.com \
    --to=group@yourcompany.com >/dev/null 2>/dev/null

Monitoring log files

The tungsten_monitor can optionally monitor log files for certain keywords. This example will alert you to any lines in trepsvc.log that include OFFLINE.

shell> tungsten_monitor --check-log=tungsten-replicator/log/trepsvc.log:OFFLINE

Monitoring backup status

Knowing you have a recent backup is an important part of any Tungsten deployment. The tungsten_monitor will look for the latest backup across all datasources and compare it to the value --max-backup-age. This example will let you know if a valid backup has not been taken in 3 days.

shell> tungsten_monitor --max-backup-age=259200

Compatibility

The script only works with MySQL at this time.

8.26. The tungsten_mysql_ssl_setup Script

Note

This script was introduced in version 7.1.1.

The tungsten_mysql_ssl_setup command is a utility script that acts as a direct replacement for the mysql_ssl_rsa_setup command which is not included with either Percona Server or MariaDB. This command will be called by tpm cert gen mysqlcerts instead of mysql_ssl_rsa_setup.

8.27. The tungsten_prep_upgrade Script

The script was added in version 6.0.0.

The tungsten_prep_upgrade command is a utility script which assists in the upgrade process from earlier v5 Composite Active/Active topologies to the newer Composite Active/Active topology available from v6 onwards.

tungsten_prep_upgrade [--all|--service] {service} [--path {fullpath_to_replicator_dir}] [arguments]

Where:

Table 8.44. tungsten_prep_upgrade Command-line Options

Option                  Description
------                  -----------
--alldb                 Loop through all services on this node (based on existing tungsten_* schemas).
--debug                 Debug mode is VERY chatty, avoid it unless you really need it.
--dump, -d              Backup tracking databases using mysqldump (default: tungsten_{service})
--dumpdb, -D            Specify database to dump
--force, -f             Force the operation.
--help, -h              Show help text
--host, -H              Specify the database hostname or IP address
--keep, --get, -g, -k   Get the current tracking position schema for the specified service and save it as JSON to a text file
--offline, -o           Take the specific service to the offline-deferred state at-heartbeat (default: offline_for_upg)
--offlinehb, -O         Specify the name of the heartbeat for the offline-deferred operation
--password, -w          Specify the database password
--path, -p              Full path to the cross-site replicator directory (default: /opt/replicator)
--port, -P              Specify the database listener port (default: 13306)
--restore, -r           Load the tracking database backup for the specified service.
--service, -s           Specify the service name to act upon.
--start                 Bring up the cross-site replicator process
--stop                  Gracefully shut down the cross-site replicator process
--targetdir, -T         Specify directory target for dump
--user, -u              Specify the database user
--verbose, -v           Show verbose output

If --{service|all} is not specified, tungsten_prep_upgrade will attempt to derive a list of one or more service names from trepctl services.

For operations that require MySQL access, tungsten_prep_upgrade will attempt to auto-locate the database user and password from tpm reverse.

Below are various examples:

  • Take all replicator services offline automatically, or take a specific service offline:

    shell> tungsten_prep_upgrade -o 
    ~or~
    shell> tungsten_prep_upgrade --service london --offline
    shell> tungsten_prep_upgrade --service tokyo --offline

    Note

    To trigger the deferred offline state configured with --offline, use the below CLUSTER-specific trepctl command (i.e. from /opt/continuent, not /opt/replicator) on the Primary hosts within each cluster:

    shell> trepctl heartbeat -name offline_for_upg
  • Get (keep) the current tracking position schema for the specified service and save it as json to a text file (default: ~/position-{service}-YYYYMMDDHHMMSS.txt)

    shell> tungsten_prep_upgrade -g
    ~or~
    shell> tungsten_prep_upgrade --service nyc --get
    (NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
    shell> tungsten_prep_upgrade --service tokyo --get
    (NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
  • Gracefully shut down the cross-site replicator process:

    shell> tungsten_prep_upgrade --stop 
  • Backup (dump) tracking databases using mysqldump (default: tungsten_{service})

    shell> tungsten_prep_upgrade -d --alldb 
    ~or~
    shell> tungsten_prep_upgrade --service london --dump
    shell> tungsten_prep_upgrade --service tokyo --dump
  • Load (restore) the tracking database backup for the specified service. --all is unavailable with --restore.

    shell> tungsten_prep_upgrade -s nyc -u tungsten -w secret -r
    shell> tungsten_prep_upgrade -s tokyo -u tungsten -w secret -r
    ~or~
    shell> tungsten_prep_upgrade --service nyc --user tungsten --password secret --restore
    shell> tungsten_prep_upgrade --service tokyo --user tungsten --password secret --restore

    Note

    A restore may take place after the cross-site replicator is uninstalled, and so certain information is required on the command line (i.e. service, user and password).

8.28. The tungsten_provision_thl Command

The tungsten_provision_thl command can be used to generate the THL required to provision a database with information from a MySQL Primary to a Replica. Because of the way it works, the tool is most useful in heterogeneous deployments where the data must be formatted and processed by the replicator for effective loading into the target database.

The tool operates as follows:

  1. A mysqldump of the current database is taken from the current Primary.

  2. The generated SQL from mysqldump is then modified so that the data is loaded into tables using the BLACKHOLE engine type. These statements still generate information within the MySQL binary log, but do not create any data.

  3. A sandbox MySQL server is started, using the MySQL Sandbox tool.

  4. A duplicate replicator is started, pointing to the sandbox MySQL instance, but sharing the same THL port and THL directory.

  5. The modified SQL from mysqldump is loaded, generating events in the binary log which are extracted by the sandbox replicator.

Because the sandbox replicator works on the same THL port as the standard Primary replicator, the Replicas will read the THL from the sandbox replicator. Also, because it uses the same THL directory, the THL will be written into additional THL files. It does not matter whether there are existing THL data files; the new THL will be appended to files in the same directory.
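
The BLACKHOLE engine is what makes step 2 possible: statements against a BLACKHOLE table are recorded in the binary log but store no rows. A minimal sketch demonstrating the principle; the schema and table names here are examples only:

shell> mysql -e 'CREATE TABLE test.bh_demo (id INT) ENGINE=BLACKHOLE'
shell> mysql -e 'INSERT INTO test.bh_demo VALUES (1)'
shell> mysql -e 'SELECT COUNT(*) FROM test.bh_demo'
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+

The INSERT is written to the binary log, and is therefore extracted into the THL, even though no row is stored.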

The tool has the following pre-requisites, in addition to the main Appendix B, Prerequisites for Tungsten Replicator:

  • A tarball of the Tungsten Replicator must be available so that the duplicate replicator can be created. The full path to the file should be used.

  • The MySQL Sandbox tool must have been installed. For more information, see MySQL Sandbox.

    Installing MySQL Sandbox requires the ExtUtils::MakeMaker and Test::Simple Perl modules. You may install these through CPAN or a package manager:

    shell> yum install -y perl-ExtUtils-MakeMaker perl-Test-Simple

    After those packages are available, you can proceed with building MySQL Sandbox and installing it. If you do not have sudo access, make sure that ~/MySQL-Sandbox-3.0.44/bin is added to $PATH.

    shell> cd ~
    shell> wget https://launchpad.net/mysql-sandbox/mysql-sandbox-3/mysql-sandbox-3/+download/MySQL-Sandbox-3.0.44.tar.gz
    shell> tar -xzf MySQL-Sandbox-3.0.44.tar.gz
    shell> cd MySQL-Sandbox-3.0.44
    shell> perl Makefile.PL
    shell> make
    shell> make test
    shell> sudo make install
  • A tarball of a MySQL release must be available to create the sandbox MySQL environment. The release should match the installed version of MySQL. The full path to the file should be used.

  • The replicator deployment should already be installed. The Primary should be OFFLINE, but the command can place the replicator offline automatically as part of the provisioning process.

Once these prerequisites have been met, the basic method of executing the command is to specify the location of the Tungsten Replicator tarball, MySQL tarball and the databases that you want to provision:

shell> tungsten_provision_thl \
      --tungsten-replicator-package=/home/tungsten/tungsten-replicator-6.1.24-6.tar.gz \
      --mysql-package=/home/tungsten/mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz \
      --schemas=test
NOTE >>The THL has been provisioned to mysql-bin.000025:493 on host1:3306

The command reports the MySQL binary log point and host on which the THL has been provisioned. Put the Tungsten Replicator back online from the reported position:

shell> trepctl online -from-event 000025:493

The Tungsten Replicator will start extracting from that position and continue with any additional changes. Check all Replicas to be sure they are online. The Replicas services will process all extracted entries.

8.28.1. Provisioning from RDS

The tungsten_provision_thl script is designed to run from a replication Primary connected to a standard MySQL instance. The standard commands will not work if you are using RDS as a Primary.

The simplest method is to add the --extract-from argument to your command. This will make the script compatible with RDS. The drawback is that we are not able to guarantee a consistent provisioning snapshot in RDS unless changes to the database are stopped. The script will monitor the binary log position during the provisioning process and alert you if there are changes. After the script completes, run trepctl online to resume extraction from the Primary at the current binary log position.

shell> tungsten_provision_thl \
      --extract-from=rds \
      --tungsten-replicator-package=/home/tungsten/tungsten-replicator-6.1.24-6.tar.gz \
      --mysql-package=/home/tungsten/mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz \
      --schemas=test

If you aren't able to stop access to the database, the script can provision from an RDS Read Replica. Before running tungsten_provision_thl, replication to the replica must be stopped. This may be done by running CALL mysql.rds_stop_replication; in an RDS shell. Call tungsten_provision_thl with the --extract-from and --extract-from-host arguments. The script will read the correct Primary position based on the Replica replication position. After completion, resume extraction from the Primary using the standard procedure.

# Run `CALL mysql.rds_stop_replication();` on the RDS Read Replica
shell> tungsten_provision_thl \
      --extract-from=rds-read-replica \
      --extract-from-host=rds-host2 \
      --tungsten-replicator-package=/home/tungsten/tungsten-replicator-6.1.24-6.tar.gz \
      --mysql-package=/home/tungsten/mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz \
      --schemas=test
NOTE >>The THL has been provisioned to mysql-bin.000025:493 on rds-host1:3306
# Run `CALL mysql.rds_start_replication();` on the RDS Read Replica

8.28.2. tungsten_provision_thl Reference

The format of the command is:

tungsten_provision_thl [ --cleanup-on-failure  ] [ --clear-logs  ] [ --directory ] [ --extract-from mysql-native-slave | rds | rds-read-replica | tungsten-slave ] [ --extract-from-host ] [ --extract-from-port ] [ --help, -h ] [ --info, -i ] [ --java-file-encoding ] [ --json ] [ --mysql-package ] [ --net-ssh-option  ] [ --notice, -n ] [ --offline  ] [ --online  ] [ --quiet, -q ] [ --sandbox-directory ] [ --sandbox-mysql-port  ] [ --sandbox-password  ] [ --sandbox-rmi-port  ] [ --sandbox-user  ] [ --schemas ] [ --service  ] [ --tungsten-replicator-package ] [ --validate  ] [ --verbose, -v ]

Where:

--cleanup-on-failure

Option: --cleanup-on-failure
Description: Cleanup the sandbox installations when the provision process fails
Value Type: boolean
Default: false

--clear-logs

Option: --clear-logs
Description: Delete all THL and relay logs for the service
Value Type: boolean
Default: false

--directory

Option: --directory
Description: Use this installed Tungsten directory as the base for all operations
Value Type: string

--extract-from

Option: --extract-from
Description: The type of server you are going to extract from
Value Type: string
Valid Values:
    mysql-native-slave   A MySQL native Replica with binary logging enabled
    rds                  An Amazon RDS instance
    rds-read-replica     An Amazon RDS read replica instance
    tungsten-slave       An instance with Tungsten Cluster already installed with generated THL

--extract-from-host

Option: --extract-from-host
Description: The hostname of a different MySQL server that will be used as the source for mysqldump
Value Type: string

The hostname of a different MySQL server that will be used as the source for mysqldump. When given, the script will use SHOW SLAVE STATUS to determine the binary log position on the Primary server. You must run STOP SLAVE prior to executing tungsten_provision_thl.

--extract-from-port

Option: --extract-from-port
Description: The listening port of a different MySQL server that will be used as the source for mysqldump
Value Type: numeric

--help

Option: --help
Aliases: -h
Description: Display the help message
Value Type: string

--info

Option: --info
Aliases: -i
Description: Provide information-level messages
Value Type: string

--java-file-encoding

Option: --java-file-encoding
Description: Java platform charset
Value Type: string

--json

Option: --json
Description: Provide return code and logging messages as a JSON object after the script finishes
Value Type: string

--mysql-package

Option: --mysql-package
Description: The location of the MySQL tar.gz package
Value Type: string

--net-ssh-option

Option: --net-ssh-option
Description: Sets additional options for SSH usage by the system, such as port numbers and passwords.
Value Type: string
Default: default

Sets options for the Net::SSH Ruby module. This allows you to set explicit SSH options, such as changing the default network communication port, password, or other information. For example, using --net-ssh-option=port=80 will use port 80 for SSH communication in place of the default port 22.

For more information on the options, see http://net-ssh.github.com/ssh/v2/api/classes/Net/SSH.html#M000002.

--notice

Option: --notice
Aliases: -n
Description: Provide notice-level messages
Value Type: string

--offline

Option: --offline
Description: Put required replication services offline before processing
Value Type: boolean
Default: false

--online

Option: --online
Description: Put required replication services online after successful processing
Value Type: boolean
Default: false

--quiet

Option: --quiet
Aliases: -q
Description: Execute with the minimum of output
Value Type: string

--sandbox-directory

Option: --sandbox-directory
Description: The location to use for storing the temporary replicator and MySQL server
Value Type: string

--sandbox-mysql-port

Option: --sandbox-mysql-port
Description: The listening port for the MySQL Sandbox
Value Type: string
Default: 3307

--sandbox-password

Option: --sandbox-password
Description: The password for the MySQL sandbox user
Value Type: string
Default: secret

--sandbox-rmi-port

Option: --sandbox-rmi-port
Description: The listening port for the temporary Tungsten Replicator
Value Type: string
Default: 10002

--sandbox-user

Option: --sandbox-user
Description: The MySQL user to create and use in the MySQL Sandbox
Value Type: string
Default: tungsten

--schemas

Option: --schemas
Description: The provision process will be limited to these schemas
Value Type: string

--service

Option: --service
Description: Replication service to read information from
Value Type: string
Default: alpha

--tungsten-replicator-package

Option: --tungsten-replicator-package
Description: The location of a fresh Tungsten Replicator tar.gz package
Value Type: string

--validate

Option: --validate
Description: Run the script validation for the provided options and files
Value Type: boolean
Default: false

--verbose

Option: --verbose
Aliases: -v
Description: Provide verbose-level error messages
Value Type: string

8.29. The tungsten_provision_slave Script

This version of tungsten_provision_slave was released in 6.1.4 and the usage instructions on this page are specific to that release.

For instructions for releases older than this, please refer to the documentation here

The tungsten_provision_slave script allows you to easily provision, or reprovision, a database server using information from a remote host.

tungsten_provision_slave [ -c  ] [ --help, -h ] [ --method, -m  ] [ --port, -p  ] [ --source, -s ] [ --threads, -t  ]

Where:

Table 8.45. tungsten_provision_slave Command-line Options

Option          Description
------          -----------
-c              Flags to the script that the node being provisioned needs to be a Primary. Valid for Composite Active/Active only. Forces the provision of a failed Primary and will reset services.
--help, -h      Show help text
--method, -m    Backup method to use. Valid methods are mysqldump or xtrabackup.
--port, -p      Port to use to connect to MySQL when using mysqldump, or ssh port when using xtrabackup.
--source, -s    Server to use as a source for the backup
--threads, -t   Number of parallel threads to use for xtrabackup. Increasing this number on large databases may improve backup speeds. If not supplied, default will be based on the default for the revision of xtrabackup in use.

Important

It is recommended to run this script in a utility such as screen in case the terminal gets disconnected.

For both mysqldump and xtrabackup methods, the script will perform a streaming backup from the source node to the target node.

The script will automatically put all replication services offline prior to beginning. If the services were online, the script will put them back online following a successful completion. All THL logs will be cleared prior to going online. The replicator will start replication from the position reflected on the source host.

Provisioning will fail from a Replica that is stopped, or if the Replica is not in the ONLINE state.
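
You can confirm that the proposed source is suitable before starting. A minimal sketch, where db2 is an example source host:

shell> ssh db2 trepctl status | grep state
state                  : ONLINE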

Using xtrabackup

The script will run validation prior to starting to make sure the needed scripts are available. The provision process will run xtrabackup on the source server and stream the contents to the server you are provisioning. After taking the backup, the script will prepare the directory and restart the MySQL server.

Using mysqldump

The script will run mysqldump by default.

Compatibility

The script only works with MySQL at this time.

Logging

The script will log output to the /opt/continuent/service_logs directory.

Example

To reprovision the Replica db3 from another Replica, db2, using xtrabackup:

db3-shell> tungsten_provision_slave --source db2 --method xtrabackup

To reprovision the Replica db3 from another Replica, db2, using the default method, mysqldump:

db3-shell> tungsten_provision_slave --source db2

To reprovision the Primary db1 from the Primary, db4, in the remote cluster, using xtrabackup (applicable to Composite Active/Active topologies only):

db1-shell> tungsten_provision_slave --source db4 -c --method xtrabackup

8.30. The tungsten_read_master_events Script

The tungsten_read_master_events script displays the raw contents of the Primary datasource for the given THL records. It implements the Tungsten Script Interface as well as these additional options.

tungsten_read_master_events [ --directory ] [ --force  ] [ --help, -h ] [ --high ] [ --info, -i ] [ --json ] [ --low ] [ --net-ssh-option ] [ --notice, -n ] [ --service  ] [ --source  ] [ --validate  ] [ --verbose, -v ]

Where:

Table 8.46. tungsten_read_master_events Command-line Options

Option             Description
------             -----------
--directory        The $CONTINUENT_ROOT directory to use for running this command. It will default to the directory you use to run the script.
--force            Continue operation even if script validation fails
--help, -h         Show help text
--high             Display events ending with this sequence number
--info, -i         Display info, notice, warning, and error messages
--json             Output all messages and the return code as a JSON object
--low              Display events starting with this sequence number
--net-ssh-option   Provide custom SSH options to use for SSH communication to other hosts.
--notice, -n       Display notice, warning, and error messages
--service          Replication service to read information from
--source           Determine metadata for the --after, --low, --high statements from this host
--validate         Only run script validation
--verbose, -v      Show verbose information during processing

Display all information after a specific sequence number

This may be used when you have had a Primary failover or would like to see everything that happened after a certain event. It will read the start position from the sequence number passed and allow you to see all events, even if they were not extracted by the replication service.

shell> tungsten_read_master_events --after=1792

If you provide the --source option, the script will SSH to the host in question and read its THL information.

Display information between two sequence numbers

This will show the raw Primary data between the two sequence numbers. It is inclusive so the information for the --low option will be included. This will only work if the sourceId for both sequence numbers is the same.

shell> tungsten_read_master_events --low=4582 --high=4725

Compatibility

The script only works with MySQL at this time.

8.31. The tungsten_send_diag Script

The tungsten_send_diag command is a utility script which assists in the upload of files to Continuent support.

tungsten_send_diag may be used in place of the Section 9.5.4, “tpm diag Command” to generate a diagnostic package.

tungsten_send_diag [ --args, -a  ] [ --case, -c  ] [ --contentType  ] [ --debug  ] [ --diag, -d  ] [ --email, -e  ] [ --file, -f  ] [ --help, -h  ] [ --tpm, -t ] [ --verbose, -v  ]

Where:

Table 8.47. tungsten_send_diag Command-line Options

Option           Description
------           -----------
--args, -a       Specify arguments to be passed to tpm diag. Arguments must be passed in quotes. Requires the --diag option.
--case, -c       Specify the support case number
--contentType    Specify the Content-Type for a file you are uploading
--debug          Debug mode is VERY chatty, avoid it unless you really need it.
--diag, -d       Automatically generate a tpm diag zip file and upload it
--email, -e      Email address to embed into the uploaded file name
--file, -f       File name to upload
--help, -h       Show help text
--tpm, -t        Full path to the tpm command you wish to use to execute a tpm diag
--verbose, -v    Show verbose output

You must specify one of --diag, --tpm, or --file. For example:

shell> tungsten_send_diag --diag -c 1234

To have tpm diag gather all nodes, add the --args '--all', for example:

shell> tungsten_send_diag --diag -c 1234 --args '--all'

You must specify either --email or --case, and you may provide both if you wish. For example:

shell> tungsten_send_diag -f example.zip -e you@yourdomain.com -c 1234

Using --tpm to specify one or more tpm commands implies the --diag option; you do not need to specify --diag if you use --tpm (or -t). For example:

shell> tungsten_send_diag -c 1234 -t /opt/replicator/tungsten/tools/tpm

You may generate multiple diags by specifying multiple tpm binaries with multiple arguments, for example:

shell> tungsten_send_diag -c 1234 -t /opt/continuent/tungsten/tools/tpm -t /opt/replicator/tungsten/tools/tpm

8.32. The tungsten_set_position Script

The tungsten_set_position script updates the trep_commit_seqno table to reflect the given THL sequence number or provided information. It implements the Tungsten Script Interface as well as these additional options.

tungsten_set_position [ --clear-logs  ] [ --epoch ] [ --event-id ] [ --high ] [ --low ] [ --offline  ] [ --offline-timeout  ] [ --online  ] [ --replicate-statements  ] [ --seqno ] [ --service  ] [ --source  ] [ --source-directory ] [ --source-id ] [ --sql  ]

Where:

Table 8.48. tungsten_set_position Command-line Options

Option                   Description
------                   -----------
--clear-logs             Delete all THL and relay logs for the service
--epoch                  The epoch number to use for updating the trep_commit_seqno table
--event-id               The event id to use for updating the trep_commit_seqno table
--high                   Display events ending with this sequence number
--low                    Display events starting with this sequence number
--offline                Put required replication services offline before processing
--offline-timeout        Put required replication services offline before processing
--online                 Put required replication services online after successful processing
--replicate-statements   Execute the events so they will be replicated if the service is a Primary
--seqno                  The sequence number to use for updating the trep_commit_seqno table
--service                Replication service to read information from
--source                 Determine metadata for the --after, --low, --high statements from this host
--source-directory       Directory on --source to find installed software
--source-id              The source id to use for updating the trep_commit_seqno table
--sql                    Only output the SQL statements needed to update the schema

General Operation

In order to update the trep_commit_seqno table, the replication service must be offline. You may pass the --offline option to do that for you. The --online option will put the replication services back online at successful completion.

In most cases you will want to pass the --clear-logs argument so that all THL and relay logs are deleted from the server following provisioning. This ensures that any corrupted or inconsistent THL records are removed prior to replication coming back online.

The --service argument is used to determine which database server should be provisioned.

If the installation directory on --source is different from the target, specify --source-directory to specify where it can be found. This option should point to an installation that is running the --service replication service. The --source-directory option is not required if the software is installed to the same directory on both servers.

This command will fail if there is more than one record in the trep_commit_seqno table. This may happen if parallel replication does not stop cleanly. You may bypass that error with the --force option.
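
To check for this condition in advance, the tracking table can be inspected directly. A minimal sketch, assuming the default service name alpha; the tracking schema is named tungsten_{service}:

shell> mysql -e 'SELECT COUNT(*) FROM tungsten_alpha.trep_commit_seqno'

A count greater than one indicates that parallel replication did not stop cleanly; either resolve the extra rows first, or bypass the error with the --force option.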

Update trep_commit_seqno with information from a THL event

This will read the THL information from the host specified as --source.

shell> tungsten_set_position --seqno=5273 --source=db1

Update trep_commit_seqno with specific information

The script will also accept specific values to update the trep_commit_seqno table. This may be used when bringing a new Primary service online or when the THL event is no longer available.

shell> tungsten_set_position --seqno=5273 --epoch=5264 \
    --source-id=db1
shell> tungsten_set_position --seqno=5273 --epoch=5264 \
    --source-id=db1 --event-id=mysql-bin.000025:0000000000000421

Compatibility

The script only works with MySQL at this time.

8.33. The tungsten_skip_seqno Script

The tungsten_skip_seqno script allows events to be skipped based on filters, allowing the Tungsten Replicator to come back online with less manual intervention.

tungsten_skip_seqno [ --auto  ] [ --debug, -d  ] [ --filter, -f  ] [ --filters  ] [ --help, -h  ] [ --info, -i  ] [ --max, -m  ] [ --path, -p  ] [ --quiet, -q  ] [ --service, -s  ] [ --tpm  ] [ --trepctl  ] [ --verbose, -v  ]

Where:

Table 8.49. tungsten_skip_seqno Command-line Options

Option           Description
------           -----------
--auto           Automatically skip when seqno is an error state - ONLY USE THIS WHEN ASKED TO BY CONTINUENT SUPPORT
--debug, -d      Implies -v and -i also. Debug mode is VERY chatty, avoid it unless you really need it.
--filter, -f     Specify a single regex to match in the pendingExceptionMessage. If a filter is specified, then the new behavior will be to skip if the regex matches the pendingExceptionMessage, and not to skip if there is no match. The regex will be matched against the pendingExceptionMessage like this: /yourRegexValue/gm Example, to match foreign key errors in a specific table: foreign key constraint fails.*?`yourSchemaName`.`yourTableName`
--filters        Specify a plain text file with one or more regexes, one per line, to match in the pendingExceptionMessage.
--help, -h       Show help text
--info, -i       Display additional information, but not fully verbose
--max, -m        Specify the maximum number of iterations
--path, -p       Specify the path to the trepctl executable if not installed in default location.
--quiet, -q      Execute with the minimum of output
--service, -s    Specify the {service_name} to act against if more than one service running
--tpm            Specify tpm executable name and path if changed from default.
--trepctl        Specify trepctl executable name and path if changed from default.
--verbose, -v    Display additional information (implies --info)

General Operation

By default, the tungsten_skip_seqno command will:

  • Gather a list of replicator service names using trepctl services | grep serviceName

  • Start an infinite loop

  • Loop through all services, or use the service specified on the command line via the --service option

  • Check the service status via trepctl -service {serviceName_here} status -json

  • If the pendingErrorSeqno is not -1, then process the error state

  • By default, if there is an error condition, a detailed message is displayed, and the user may skip the seqno interactively

  • If the tungsten_skip_seqno command is called with --auto, then the seqno with the error will be skipped automatically

  • If the maximum number of loops has been reached (default: 100), the script will exit. Use --max to adjust this value

  • Sleep for 3 seconds by default, then iterate
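For example, a hedged sketch of two common invocations; the service name alpha, the iteration limit, and the regex (including the schema and table names) are placeholders:

shell> tungsten_skip_seqno -s alpha -m 10 -i
shell> tungsten_skip_seqno -s alpha -f 'foreign key constraint fails.*?`test`.`mytable`'

The first form reviews each error interactively with additional detail and stops after 10 iterations; the second form only skips events whose pendingExceptionMessage matches the supplied regex.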

8.34. The tungsten_skip_all Command

This command was introduced in version 6.1.13.

The tungsten_skip_all command assists with skipping replicator errors that you deem safe to skip.

Warning

Blindly skipping replication errors without fully understanding the consequences could lead to data drift. This action should only be performed once the error has been fully analyzed and deemed safe to skip by you and/or your business.

The tungsten_skip_all command performs the following steps:

  • Gather a list of replicator service names using trepctl services | grep serviceName.

  • Starts an infinite loop.

  • Loops through all services, or uses the service specified on the command line.

  • Checks the service status via trepctl -service {serviceName_here} status -json.

  • If the pendingErrorSeqno is not -1, then processes the error state.

  • By default, if there is an error condition, a detailed message is displayed, and the user may skip the seqno interactively.

  • If the tungsten_skip_all command is called with --auto then the seqno with the error will be skipped automatically.

  • If the maximum number of loops has been reached (default: 100), the script will exit.

  • Sleeps for 3 seconds by default, then iterates.
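For example, a minimal hedged sketch; by default the command displays each error and prompts before skipping:

shell> tungsten_skip_all

To skip matching events automatically without prompting, and only after the analysis described in the warning above:

shell> tungsten_skip_all --auto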

8.35. The undeployall Command

The undeployall command removes the startup and reboot scripts created by deployall, disabling automatic startup and shutdown of available services.

To use, the tool should be executed with superuser privileges, either directly using sudo, or by logging in as the superuser and running the command directly:

shell> sudo undeployall
 Removing any system startup links for /etc/init.d/treplicator ...
   /etc/rc0.d/K80treplicator
   /etc/rc1.d/K80treplicator
   /etc/rc2.d/S80treplicator
   /etc/rc3.d/S80treplicator
   /etc/rc4.d/S80treplicator
   /etc/rc5.d/S80treplicator
   /etc/rc6.d/K80treplicator

To enable the scripts on the system, use deployall.

Chapter 9. The tpm Deployment Command

Table of Contents

9.1. Comparing Staging and INI tpm Methods
9.2. Processing Installs and Upgrades
9.3. tpm Staging Configuration
9.3.1. Configuring default options for all services
9.3.2. Configuring a single service
9.3.3. Configuring a single host
9.3.4. Reviewing the current configuration
9.3.5. Installation
9.3.5.1. Installing a set of specific services
9.3.5.2. Installing a set of specific hosts
9.3.6. Upgrades from a Staging Directory
9.3.7. Configuration Changes from a Staging Directory
9.3.8. Converting from INI to Staging
9.4. tpm INI File Configuration
9.4.1. Creating an INI file
9.4.2. Installation with INI File
9.4.3. Upgrades with an INI File
9.4.4. Configuration Changes with an INI file
9.4.5. Converting from Staging to INI
9.4.6. Using the translatetoini.pl Script
9.5. tpm Commands
9.5.1. tpm ask Command
9.5.2. tpm configure Command
9.5.3. tpm delete-service Command
9.5.4. tpm diag Command
9.5.5. tpm fetch Command
9.5.6. tpm firewall Command
9.5.7. tpm help Command
9.5.8. tpm install Command
9.5.9. tpm mysql Command
9.5.10. tpm query Command
9.5.10.1. tpm query config
9.5.10.2. tpm query dataservices
9.5.10.3. tpm query deployments
9.5.10.4. tpm query manifest
9.5.10.5. tpm query modified-files
9.5.10.6. tpm query staging
9.5.10.7. tpm query version
9.5.11. tpm reset Command
9.5.12. tpm reset-thl Command
9.5.13. tpm reverse Command
9.5.14. tpm uninstall Command
9.5.15. tpm update Command
9.5.16. tpm validate Command
9.5.17. tpm validate-update Command
9.6. tpm Common Options
9.7. tpm Validation Checks
9.8. tpm Configuration Options
9.8.1. A tpm Options
9.8.2. B tpm Options
9.8.3. C tpm Options
9.8.4. D tpm Options
9.8.5. E tpm Options
9.8.6. F tpm Options
9.8.7. H tpm Options
9.8.8. I tpm Options
9.8.9. J tpm Options
9.8.10. L tpm Options
9.8.11. M tpm Options
9.8.12. N tpm Options
9.8.13. O tpm Options
9.8.14. P tpm Options
9.8.15. R tpm Options
9.8.16. S tpm Options
9.8.17. T tpm Options
9.8.18. U tpm Options
9.8.19. V tpm Options
9.8.20. W tpm Options

tpm, or the Tungsten Package Manager, is a complete configuration, installation and deployment tool for Tungsten Replicator. It includes some utility commands to simplify those and other processes. In order to provide a stable system, all configuration changes must be completed using tpm. tpm makes use of ssh-enabled communication and sudo support, as required by Appendix B, Prerequisites.

tpm can operate in two different ways when performing a deployment:

  • tpm staging configuration — a tpm configuration is created by defining the command-line arguments that define the deployment type, structure and any additional parameters. tpm then installs all the software on all the required hosts by using ssh to distribute Tungsten Cluster and the configuration, and optionally automatically starts the services on each host. tpm manages the entire deployment, configuration and upgrade procedure.

  • tpm INI configuration — tpm uses an INI file to configure the service on the local host. The INI file must be created on each host that will run Tungsten Cluster. tpm only manages the services on the local host; in a multi-host deployment, upgrades, updates, and configuration must be handled separately on each host.

For a more detailed comparison of the two systems, see Section 9.1, “Comparing Staging and INI tpm Methods”.

During the staging-based configuration, installation and deployment, the tpm tool works as follows:

  • tpm creates a local configuration file that contains the basic configuration information required by tpm. This configuration declares the basic parameters, such as the list of hosts, topology requirements, username and password information. These parameters describe top-level information, which tpm translates into more detailed configuration according to the topology and other settings.

  • Within staging-based configuration, each host is accessed (using ssh), and various checks are performed, for example, checking database configuration, whether certain system parameters match required limits, and that the environment is suitable for running Tungsten Replicator.

  • During an installation or upgrade, tpm copies the current distribution to each remote host.

  • The core configuration file is then used to translate a number of template files within the configuration of each component of the system into the configuration properties files used by Tungsten. The configuration information is shared on every configured host within the service; this ensures that in the event of a host failure, the configuration can be recovered.

  • The components of Tungsten Replicator are then started (installation) or restarted according to the configuration options.

Where possible, these steps are conducted in parallel to speed up the process and limit the interruption to services and operations.

This method of operation ensures:

  • Active configurations and properties are not updated until validation is completed. This prevents a running installation from being affected by an incompatible or potentially dangerous change to the configuration.

  • Changes can be made to the staging configuration before the configuration is deployed.

  • Services are not stopped/restarted unnecessarily.

  • During an upgrade or update, the time required to reconfigure and restart is kept to a minimum.

Because of this safe approach to performing configuration, downtime is minimized, and the configuration is always based on files that are separate from, and independent of, the live configuration.

Important

tpm always creates the active configuration from the combination of the template files and parameters given to tpm. This means that changes to the underlying property files within the configuration are overwritten by tpm when the service is configured or updated.

In addition to the commands that tpm supports for the installation and configuration, the command also supports a number of other utility and information modes, for example, the fetch command retrieves existing configuration information to your staging directory, while query returns information about an active configuration.

Using tpm is divided up between the commands that define the operation the command will perform, which are covered in Section 9.5, “tpm Commands”; configuration options, which determine the parameters that configure individual services, which are detailed in Section 9.8, “tpm Configuration Options”; and the options that alter the way tpm operates, covered in Section 9.3, “tpm Staging Configuration”.

9.1. Comparing Staging and INI tpm Methods

tpm supports two different deployment methodologies. Both configure one or more Tungsten services, in a safe and secure manner, but differ in the steps and process used to complete the installation. The two methods are:

  • Staging Directory

    When using the staging directory method, a single configuration that defines all services and hosts within the deployment is created. tpm then communicates with all the hosts you are configuring to install and configure the different services required. This is best when you have a consistent configuration for all hosts and do not have any configuration management tools for your systems.

    Figure 9.1. tpm Staging Based Deployment


  • INI File

    When using the INI file method, configuration for each service must be made individually using an INI configuration file on each host. This is ideal for deployments where you have a configuration management system (e.g. Puppet and Chef) to manage the INI file. It also works very well for deployments where the configuration for each system is different from the others.

    Figure 9.2. tpm INI Based Deployment


Table 9.1. TPM Deployment Methods

Feature                              Staging Directory  INI File
Deploy Multiple Services             Yes                Yes
Deploy to Multiple Hosts             Yes                No
Individual Host-based Configuration  Yes                Yes
Single-Step Upgrade                  Yes                No
Requires SSH Configuration           Yes                No
RPM Support                          Yes                Yes

Note

Check the output of tpm query staging to determine which method your current installation uses. The output for an installation from a staging directory will start with # Installed from tungsten@staging-host:/opt/continuent/software/tungsten-replicator-6.1.24-6. An installation based on an INI file may include this line but the hostname will reference the current host and there will be an /etc/tungsten/tungsten.ini file present.

To install a three-node service using the staging method:

  1. Extract Tungsten Cluster on your staging server.

  2. On each host:

    1. Complete all the Appendix B, Prerequisites, including setting the ssh keys.

  3. Execute the tpm configure and tpm install commands to configure and deploy the service from the staging server.

To install a three-node service using the INI method:

  1. On each host:

    1. Extract Tungsten Cluster.

    2. Complete all the Appendix B, Prerequisites.

    3. Create the INI file containing your configuration.

    4. Execute the tpm install command to deploy the service.

When using the staging method, upgrades and updates to the configuration must be made using tpm from the staging directory. Configuration methods can be swapped from staging to INI only by manually recreating the INI file with the new configuration and running tpm update.

9.2. Processing Installs and Upgrades

The tpm command is designed to coordinate the deployment activity across all hosts in a dataservice. This is done by completing a stage on all hosts before moving on. These operations happen on each host in parallel, and tpm waits for the results to come back before starting the next stage.

  • Copy deployment files to each server

    At this stage, only the tpm command is copied over so we can run validation checks locally on each machine.

    The configuration is also transferred to each server and checked for completeness. This will run some commands to make sure that we have all of the settings needed to run a full validation.

  • Validate the configuration settings

    Each host will validate the configuration based on validation classes. This will do things like check file permissions and database credentials. If errors are found during this stage, they will be summarized and the script will exit.

    #####################################################################
    # Validation failed
    #####################################################################
    #####################################################################
    # Errors for host3
    #####################################################################
    ERROR >> host3 >> Password specified for app@% does not match the running instance on »
       tungsten@host3:13306 (WITH PASSWORD). This may indicate that the user has a password »
       using the old format. (MySQLConnectorPermissionsCheck)
    #####################################################################
    # Errors for host2
    #####################################################################
    ERROR >> host2 >> Password specified for app@% does not match the running instance on »
       tungsten@host2:13306 (WITH PASSWORD). This may indicate that the user has a password »
       using the old format. (MySQLConnectorPermissionsCheck)
    #####################################################################
    # Errors for host1
    #####################################################################
    ERROR >> host1 >> Password specified for app@% does not match the running instance on »
       tungsten@host1:13306 (WITH PASSWORD). This may indicate that the user has a password »
       using the old format. (MySQLConnectorPermissionsCheck)

    At this point you should verify the configuration settings and retry the tpm install command. Any errors found during this stage may be skipped by running tpm configure alpha --skip-validation-check=MySQLConnectorPermissionsCheck. When re-running the tpm install command this check will be bypassed.

  • Deploy and write configuration files

    If validation is successful, we will move on to deploying and writing the actual configuration files. The tpm command uses a JSON file that summarizes the configuration. The Tungsten processes use many different files to store the configuration and tpm is responsible for writing them.

    The /opt/continuent/releases directory will start to collect multiple directories after you have run multiple upgrades. We keep the previous versions in case a downgrade is needed or for review at a later date. If your upgrade has been successful, you can remove old directories. Make sure you do not remove the directory that is linked to by the /opt/continuent/tungsten symlink.

    Note

Do not change configuration files by hand. This will cause future updates to fail. One of the validation checks compares the file that tpm wrote with the current file. If there are differences, validation will fail.

    This is done to make sure that any configuration changes made by hand are not wiped out without giving you a chance to save them. You can run tpm query modified-files to see what, if any, changes have been made.

  • Start Tungsten services

After the installation is fully configured, the tpm command will start services on all of the hosts if the tpm option --start was set. This process is slightly different depending on whether you are doing a clean install or an upgrade.

    • Install

      1. Check if --start or --start-and-report were provided in the configuration

      2. Start Tungsten Replicator on all hosts

    • Upgrade

      1. Put all dataservices into MAINTENANCE mode

      2. Stop the Tungsten Replicator on all nodes

9.3. tpm Staging Configuration

Before installing your hosts, you must provide the desired configuration. This will be done with one or more calls to tpm configure as seen in Chapter 2, Deployment Overview. These calls place the given parameters into a staging configuration file that will be used during installation. This is done for dataservices, composite dataservices and replication services.

In place of a subcommand, tpm configure accepts a service name or the word defaults. This identifies what you are configuring.

When configuring defaults, the defaults affect all configured services, with individual services able to override or set their own parameters.

shell> tpm configure [service_name|defaults] [tpm options] [service configuration options]

In addition to the Section 9.8, “tpm Configuration Options”, the common options in Table 9.5, “tpm Common Options” may be given.

The tpm command will store the staging configuration in the staging directory that you run it from. This behavior is changed if you have $CONTINUENT_PROFILES or $REPLICATOR_PROFILES defined in the environment. If present, tpm will store the staging configuration in that directory. Doing this will allow you to upgrade to a new version of the software without having to run the tpm fetch command.

If you are running Tungsten Replicator, the tpm command will use $REPLICATOR_PROFILES if it is available, before using $CONTINUENT_PROFILES.
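For example, a minimal hedged sketch; the profiles directory shown is a placeholder and may be any directory writable by the tungsten system user:

shell> mkdir -p /opt/continuent/profiles
shell> export REPLICATOR_PROFILES=/opt/continuent/profiles
shell> ./tools/tpm configure defaults --replication-user=tungsten

With the variable exported, tpm writes the staging configuration to /opt/continuent/profiles rather than the current staging directory.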

9.3.1. Configuring default options for all services

shell> ./tools/tpm configure defaults \
    --replication-user=tungsten \
    --replication-password=secret \
    --replication-port=13306

These options will apply to all services in the configuration file. This is useful when working with a composite dataservice or multiple independent services. These options may be overridden by calls to tpm configure service_name or tpm configure service_name --hosts.

9.3.2. Configuring a single service

shell> ./tools/tpm configure alpha \
    --master=host1 \
    --members=host1,host2,host3 \
    --home-directory=/opt/continuent \
    --user=tungsten

The configuration options provided following the service name will be associated with the 'alpha' dataservice. These options will override any given with tpm configure defaults.

9.3.3. Configuring a single host

shell> ./tools/tpm configure alpha \
    --hosts=host3 \
    --backup-method=xtrabackup-incremental

This will apply the --repl-backup-method option to just the host3 server. Multiple hosts may be given as a comma-separated list. The names used in the --members, --slaves, and --master options should be used when calling --hosts. These values will override any given in tpm configure defaults or tpm configure alpha.

9.3.4. Reviewing the current configuration

You may run the tpm reverse command to review the list of configuration options. This will run in the staging directory and in your installation directory. It is a good idea to run this command prior to installation and upgrades to validate the current settings.

# Installed from tungsten@host1:/home/tungsten/tungsten-replicator-6.1.24-6
# Options for the alpha data service
tools/tpm configure alpha \
--enable-thl-ssl=true \
--install-directory=/opt/continuent \
--java-keystore-password=password \
--java-truststore-password=password \
--master=host1 \
--members=host1,host2,host3 \
--replication-password=password \
--replication-user=tungsten \
--start=true \
--topology=master-slave

The output includes all of the tpm configure commands necessary to rebuild the configuration. It includes all default, dataservice and host specific configuration settings. Review this output and make changes as needed until you are satisfied.

9.3.5. Installation

After you have prepared the configuration file, it is time to install.

shell> ./tools/tpm install

This will install all services defined in configuration. The installation will be done as explained in Section 9.2, “Processing Installs and Upgrades”. This will include the full set of --members, --slaves, and --master.

9.3.5.1. Installing a set of specific services

shell> ./tools/tpm install alpha,bravo

All hosts included in the alpha and bravo services will be installed. The installation will be done as explained in Section 9.2, “Processing Installs and Upgrades”.

9.3.5.2. Installing a set of specific hosts

shell> ./tools/tpm install --hosts=host1,host2

Only host1 and host2 will be installed. The installation will be done as explained in Section 9.2, “Processing Installs and Upgrades”.

9.3.6. Upgrades from a Staging Directory

This process must be run from the staging directory in order to run properly. Determine where the current software was installed from.

shell> tpm query staging
tungsten@staging-host:/opt/continuent/software/tungsten-replicator-6.1.24-6

This outputs the hostname and directory where the software was installed from. Make your way to that host and the parent directory before proceeding. Unpack the new software into the /opt/continuent/software directory and make it your current directory.

shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
shell> cd tungsten-replicator-6.1.24-6

Warning

Before performing an upgrade, please ensure that you have checked the Appendix B, Prerequisites, as software and system requirements may have changed between versions and releases.

Before any update, the current configuration must be known. If the $CONTINUENT_PROFILES or $REPLICATOR_PROFILES environment variables were used in the original deployment, these can be set to the directory location where the configuration was stored.

Alternatively, the update can be performed by fetching the existing configuration from the deployed directory by using the tpm fetch command:

shell> ./tools/tpm fetch --reset --directory=/opt/continuent \
    --hosts=host1,autodetect

This will load the configuration into the local staging directory. Review the current configuration before making any configuration changes or deploying the new software.

shell> ./tools/tpm reverse

This will output the current configuration of all services defined in the staging directory. You can then make changes using tpm configure before pushing out the upgrade. Run tpm reverse again before tpm update to confirm your changes were loaded correctly.

shell> ./tools/tpm configure service_name ...
shell> ./tools/tpm update --replace-release

Important

The use of --replace-release is not mandatory for minor configuration changes; however, it is highly recommended when upgrading between versions.

Using this option will ensure that underlying metadata and property files are cleanly rebuilt, thus ensuring any new or deprecated properties between releases are correctly added or removed accordingly.

This will update the configuration file and then push the updates to all hosts. No additional arguments are needed for the tpm update command since the configuration has already been loaded.

9.3.7. Configuration Changes from a Staging Directory

Where, and how, you make configuration changes depends on where you want the changes to be applied.

Making Configuration Changes to the Current Host

You may make changes to a specific host from the /opt/continuent/tungsten directory.

shell> ./tools/tpm update service_name --thl-log-retention=14d

This will update the local configuration with the new settings and restart the replicator. You can use the tpm help update command to see which components will be restarted.

shell> ./tools/tpm help update | grep thl-log-retention
--thl-log-retention How long do you want to keep THL files?

If you make changes in this way then you must be sure to run tpm fetch from your staging directory prior to any further changes. Skipping this step may result in you pushing an old configuration from the staging directory.

Making Configuration Changes to all hosts

This process must be run from the staging directory in order to run properly. Determine where the current software was installed from.

shell> tpm query staging
tungsten@staging-host:/opt/continuent/software/tungsten-replicator-6.1.24-6

This outputs the hostname and directory where the software was installed from. Make your way to that host and directory before proceeding.

shell> ./tools/tpm fetch --reset --directory=/opt/continuent \
    --hosts=host1,autodetect

This will load the configuration into the local staging directory. Review the current configuration before making any configuration changes or deploying the new software.

shell> ./tools/tpm reverse

This will output the current configuration of all services defined in the staging directory. You can then make changes using tpm configure before pushing out the upgrade. Run tpm reverse again before tpm update to confirm your changes were loaded correctly.

shell> ./tools/tpm configure service_name ...
shell> ./tools/tpm update

This will update the configuration file and then push the updates to all hosts. No additional arguments are needed for the tpm update command since the configuration has already been loaded.

9.3.8. Converting from INI to Staging

If you currently use the INI installation method and wish to convert to using the Staging method, there is currently no easy way to do that. The procedure involves uninstalling fully on each node, then reinstalling from scratch.

If you still wish to convert from the INI installation method to using the Staging method, use the following procedure:

  1. On the staging node, extract the software into /opt/continuent/software/{extracted_dir}

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  2. Create the text file config.sh based on the output from tpm reverse:

    shell> cd tungsten-replicator-6.1.24-6
    shell> tpm reverse > config.sh

    Review the new config.sh script to confirm everything is correct, making any needed edits. When ready, create the new configuration:

    shell> sh config.sh

    Review the new configuration:

    shell> tools/tpm reverse

    See Section 9.3, “tpm Staging Configuration” for more information.

  3. On all nodes, uninstall the Tungsten software:

    Warning

    Executing this step WILL cause an interruption of service.

    shell> tpm uninstall --i-am-sure
  4. On all nodes, rename the tungsten.ini file:

    shell> mv /etc/tungsten/tungsten.ini /etc/tungsten/tungsten.ini.old
  5. On the staging node only, change to the extracted directory and execute the tpm install command:

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
    shell> ./tools/tpm install

9.4. tpm INI File Configuration

tpm can use an INI file to manage host configuration. This is a fundamental difference from the normal model for using tpm. When using an INI configuration, the tpm command will only work with the local server.

In order to configure Tungsten on your server using an INI file you must still complete all of the Appendix B, Prerequisites. Copying SSH keys between your servers is optional but setting them up makes sure that certain scripts packaged with Continuent Tungsten will still work.

9.4.1. Creating an INI file

When using an INI configuration, installation and updates will still be done using the tpm command. Instead of providing configuration information on the command line, the tpm command will look for an INI file in the following locations:

tpm will automatically search all tungsten*.ini files within the /etc/tungsten directory.

An alternative directory can be searched using --ini option to tpm. This option can also be used to specify a specific ini file if you choose to name the file something different, for example --ini /my/directory/myconfig.ini

The INI file(s) must be readable by the tungsten system user.

Here is an example of a tungsten.ini file that would set up a simple dataservice.

shell> vi /etc/tungsten/tungsten.ini
[alpha]
master=host1
members=host1,host2,host3
connectors=host1,host2,host3

[defaults]
application-user=app_user
application-password=secret
application-port=3306
replication-user=tungsten
replication-password=secret
replication-port=13306
start-and-report=true
user=tungsten

The property names in the INI file are the same as what is used on the command line. Simply remove the leading -- characters and add it to the proper section. Each section in the INI file replaces a single tpm configure call. The section name inside of the square brackets is used as the service name. In the case of the [defaults] section, this will act like the tpm configure defaults command.

Include any host-specific options in the appropriate section. This configuration will only apply to the local server, so there is no need to put host-specific settings in a different section.
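For example, a hedged sketch of adding one such option to the local host's INI file; the thl-log-retention option and its value are placeholders drawn from the staging examples elsewhere in this chapter:

shell> vi /etc/tungsten/tungsten.ini
[alpha]
master=host1
members=host1,host2,host3
thl-log-retention=14d

Because the file only configures the local server, the option goes directly in the relevant service section rather than a separate host-specific block.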

9.4.2. Installation with INI File

Once you have created the tungsten.ini file, the tpm command will recognize it and use it for configuration. Unpack the software into /opt/continuent/software and run the tpm install command.

shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
shell> ./tools/tpm install

or

shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
shell> ./tools/tpm install --ini /my/directory/myconfig.ini

The tpm command will read the tungsten.ini file and setup all dataservices on the current server.

9.4.3. Upgrades with an INI File

Use the tpm update command to upgrade to the latest version.

shell> cd /opt/continuent/software
shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
shell> cd tungsten-replicator-6.1.24-6
shell> ./tools/tpm update --replace-release

Important

The use of --replace-release is not mandatory for minor configuration changes; however, it is highly recommended when upgrading between versions.

Using this option will ensure that underlying metadata and property files are cleanly rebuilt, thus ensuring any new or deprecated properties between releases are correctly added or removed accordingly.

After unpacking the new software into the staging directory, the tpm update command will read the tungsten.ini configuration and install the new software. All services will be stopped and the new services will be started.

During the lifetime of the cluster, switches may happen and the current Primary may well be a different node than what is reflected in the static ini file in the master= line. Normally, this difference is ignored during an update or an upgrade.

However, if you have some kind of procedure (e.g., automation) which hand-edits the master= line of the INI configuration file at some point, and such hand-edits do not reflect the current reality at the time of the update/upgrade, an update/upgrade will fail and the cluster may be left in an indeterminate state.

Warning

The best practice is to NOT change the master= line in the INI configuration file after installation.

There is still a window of opportunity for failure. The update will continue, passing the CurrentTopologyCheck test and potentially leaving the cluster in an indeterminate state if the master= option is set to a hostname that is not the current Primary or the current host.

9.4.4. Configuration Changes with an INI file

The tpm update command also allows you to apply any configuration changes. Start by making any necessary changes to the tungsten.ini file, then run tpm update.

shell> cd /opt/continuent/tungsten
shell> ./tools/tpm update

This will read the tungsten.ini file and apply the settings. The tpm command will identify what services likely need to be restarted and will just restart those. You can manually restart the desired services if you are unsure if the new configuration has been applied.
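If you do choose to restart by hand, a minimal hedged sketch using the replicator control script shipped with the installation:

shell> replicator restart

You can then check trepctl services to confirm the replicator has come back online with the new settings.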

9.4.5. Converting from Staging to INI

If you currently use the Staging installation method and wish to convert to using INI files, use the following procedure.

You can also try using the script in Section 9.4.6, “Using the translatetoini.pl Script”.

  1. Create the text file /etc/tungsten/tungsten.ini on each node. They will normally all be the same.

    shell> sudo mkdir /etc/tungsten
    shell> sudo chown -R tungsten: /etc/tungsten
    shell> chmod 700 /etc/tungsten
    shell> touch /etc/tungsten/tungsten.ini
    shell> chmod 600 /etc/tungsten/tungsten.ini

    Each section in the INI file replaces a single tpm configure call. The section name inside of [square brackets] is used as the service name. In the case of the [defaults] section, this will act like the tpm configure defaults command. The property names in the INI file are the same as what is used on the command line. Simply remove the leading -- characters and add it to the proper section.

    For example, to seed the tungsten.ini file, use the output of tpm reverse:

    shell> tpm reverse > /etc/tungsten/tungsten.ini

    Edit the new ini file and clean it up as per the rules above. For example, using vim:

    shell> vim /etc/tungsten/tungsten.ini
    :%s/tools\/tpm configure /[/g
    :%s/^--//g
    :%s/\s*\\$//g

    Important

    In the above example, you MUST manually add the trailing square bracket ] to the end of the defaults tag and to the end of every service name section. Just search for the opening square bracket [ and make sure there is a matching closing square bracket for every one.

    See Section 9.4.1, “Creating an INI file” for more information.

  2. On every node, extract the software into /opt/continuent/software/{extracted_dir}

    Warning

    Make sure you have the same release that is currently installed.

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-replicator-6.1.24-6.tar.gz
  3. On each node, change to the extracted directory and execute the tpm command:

    shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
    shell> ./tools/tpm update

    This will read the tungsten.ini file and apply the settings. The tpm command will identify what services likely need to be restarted and will just restart those. You can manually restart the desired services if you are unsure if the new configuration has been applied.

9.4.6. Using the translatetoini.pl Script

You can download a script from the documentation library, translatetoini.pl. You must have a copy of Perl installed to be able to execute the script.

To use the script, you can either run the script and paste in the staging output, or pipe the output from tpm reverse directly into the script. When supplying the staging output, you should supply the output from within the configured staging directory. For example:

shell> ./tools/tpm reverse|../translatetoini.pl 

The script will create the file tungsten.ini in the current directory containing the converted output.

To change the destination, use the --filename option:

shell> ./tools/tpm reverse|../translatetoini.pl --filename=t.ini

You can also combine multiple staging configurations into a single INI conversion by appending to an existing INI file by adding the --append option:

shell> ./tools/tpm reverse|../translatetoini.pl --append

You should always check the INI file before using it for a live installation to ensure that all of the options and parameters have been identified and configured properly.

A training video is available showing how to perform the staging to INI file conversion using the translatetoini.pl script, covering the full process from start to finish.

9.5. tpm Commands

All calls to tpm will follow a similar structure, made up of the command, which defines the type of operation, and one or more options.

shell> tpm command [sub command] [tpm options] [command options]

The command options will vary for each command. The core tpm options are:

Table 9.2. tpm Core Options

Option          Description
--force, -f     Do not display confirmation prompts or stop the configure process for errors
--help, -h      Displays help message
--info, -i      Display info, notice, warning and error messages
--notice, -n    Display notice, warning and error messages
--preview, -p   Displays the help message and preview the effect of the command line options
--profile file  Sets name of config file
--quiet, -q     Only display warning and error messages
--verbose, -v   Display debug, info, notice, warning and error messages

--force

Forces the deployment process to complete even if there are warning or error messages that would normally cause the process to fail. Forcing the installation also ignores all confirmation prompts during installation and always attempts to complete the process.

--help

Displays the help message for tpm showing the current options, commands and version information.

--info

Changes the reporting level to include information, notice, warning and error messages. Information level messages include annotations of the current process and stage in the deployment, such as configuration or generating files and configurations. This shows slightly more information than the default, but less than the full debug level offered by --verbose.

--notice

Sets the output level to include notice, warning, and error messages. Notice level messages include information about further steps or actions that should be taken, or things that should be noted, without indicating a failure or error with the selected configuration options.

--preview

Displays the help message and previews the effect of the command-line options.

--profile file

Specify the name of the configuration file to be used. This can be useful if you are performing multiple configurations or deployments from the same staging directory. The entire configuration and deployment information is stored in the file before installation is started. By specifying a different file you can have multiple deployments and configurations without requiring separate staging directories.

--quiet

Changes the error reporting level so that only warning and error messages are displayed. This mode can be useful in automated deployments as it provides output only when a warning or error exists. All other messages, including informational ones, are suppressed.

--verbose

Displays a much more detailed output of the status and progress of the deployment. In verbose mode, tpm annotates the entire process describing both what it is doing and all debug, warning and other messages in the output.

The tpm utility handles operations across all hosts in the dataservice. This is true for simple and composite dataservices. The coordination requires SSH connections between the hosts according to the Appendix B, Prerequisites. There are two exceptions to this:

  1. When the --hosts argument is provided to a command, that command will only be carried out on the hosts listed. Multiple hosts may be given as a comma-separated list. The names used in the --members, --slaves, and --master arguments should be used when calling --hosts.

  2. When you are using an INI configuration file (see Section 9.4, “tpm INI File Configuration”) all calls to tpm will only affect the current host.
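For example, a hedged sketch restricting an update to two hosts; the hostnames are placeholders:

shell> ./tools/tpm update --hosts=host1,host2

All other hosts in the dataservice are left untouched until tpm is run without the restriction.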

The installation process starts in a staging directory. This is different from the installation directory where Tungsten Cluster will ultimately be placed but may be a sub-directory. In most cases we will install to /opt/continuent but use /opt/continuent/software as a staging directory. The release package should be unpacked in the staging directory before proceeding. See the Section B.2, “Staging Host Configuration” for instructions on selecting a staging directory.

Table 9.3. tpm Commands

Option           Description
configure        Configure a data service within the global configuration
diag             Obtain diagnostic information
fetch            Fetch configuration information from a running service
firewall         Display firewall information for the configured services
help             Show command help information
install          Install a data service based on the existing and runtime parameters
mysql            Open a connection to the configured MySQL server
query            Query the active configuration for information
reset            Reset the cluster on each host
reset-thl        Reset the THL for a host
uninstall        Uninstall software from host(s)
update           Update an existing configuration or software version
validate         Validate the current configuration
validate-update  Validate the current configuration and update

9.5.1. tpm ask Command

tpm ask can be used to query values from the common configuration.

Usage:

tpm ask [args] [context] {value_or_function_name} [argument]

[context] : The optional context may be one of: common (default), keys, all, functions, or function

If you specify a context of:

  • common or no context: Extract the variable from the perl common object

  • all: Extract every available variable from the perl common object

  • keys: Extract all available variable names from the perl common object

  • functions: Extract the available function names from the perl common object

  • function: Extract the return value of the named function from within the perl common object. You may specify an optional argument to the function.

Examples:

shell> tpm ask keys
shell> tpm ask all
shell> tpm ask functions
shell> tpm ask services
shell> tpm ask isCAA
shell> tpm ask function managerEnabled
shell> tpm ask function cleanBool false


9.5.2. tpm configure Command

The configure command to tpm creates a configuration file within the current profiles directory.

9.5.3. tpm delete-service Command

This command was introduced in version 6.1.13.

The tpm delete-service command allows you to cleanly remove a dataservice from your cluster, or a single replication service from a standalone replicator installation.

The tpm delete-service command will know if it is being run from the Clustering software or the Replicator software, and will act accordingly.

See the table below for a list of valid arguments:

Table 9.4. tpm delete-service Common Options

Option                      Description
--api                       Use the v2 API REST interface instead of the command line when possible
--auto, -A                  Automatically execute any needed commands that it is possible to handle.
--debug, -d                 Debug Mode.
--i-have-run-tpm-update     For Staging-method installations, pass this flag to confirm that the `tools/tpm update` command has already been run from the staging directory
--help, -h                  Display the help message
--i-am-sure                 Bypass the 'Are You Sure?' prompt when using --auto.
--info, -i
-n {file}, --newini {file}  Pass the full path (and filename) to an INI file to be used at the 'Edit INI' step
-f                          Pass the force flag to tpm.
-p {path}, --path {path}    Pass the full path to the replicator executables, e.g. /opt/replicator/tungsten/tungsten-replicator/bin
--quiet, -q
--test, -t
--thl {file}                Path and name of the thl executable (ignores --path if also supplied)
--trepctl {file}            Path and name of the trepctl executable (ignores --path if also supplied)
-v                          Verbose output.

A number of options determine the behavior of this command and these are outlined below.

Usage for tpm delete-service

shell> tpm delete-service [args] {service_name} [configuration_service_name]

{service_name} is the Replicator service name as seen in trepctl services.

[configuration_service_name] is the Replicator service name as seen in tpm reverse, and only needed for standalone Replicator installations where the service name in the configuration is not the same as the actual replication service name, i.e. in cluster-extractor topologies where there is a cluster-alias employed.

The default behavior is to display the needed commands for the admin to execute manually.

To use this tool in a fully automated manner:

  • specify --auto

  • include --i-am-sure to bypass most interactive prompts

  • include --newini {filename} otherwise you will be prompted to edit the INI file in the vi editor before proceeding

Use-cases

  • Tungsten Replicator: i.e. for Standalone, Cluster-Extractor, Fan-In and Multi-Site/Active-Active topologies where there is a discrete Replicator running outside of a Cluster (needs Replicator workflow)

Workflows

  1. Replicator Service

    shell> trepctl -all-services offline
    shell> rm tungsten-replicator/conf/static-SERVICE.properties
    shell> rm tungsten-replicator/conf/.static-SERVICE.properties.orig
    shell> replicator restart
    shell> trepctl -all-services online
    
    # Remove the various directories after the restart so that the replicator no longer has the dirs open :
    shell> rm -rf thl/SERVICE/
    shell> rm -rf relay/SERVICE/
    shell> trepctl -all-services online

9.5.4. tpm diag Command

The tpm diag command will create a TGZ file including log files, current dataservice status and a number of OS metrics.

shell> tpm diag
NOTE  >> host1 >> Diagnostic information written to /home/tungsten/tungsten-diag-2013-10-09-21-04-23.tgz

The operation of tpm diag differs between installation types (Staging vs INI). This is outlined below:

  • With Staging-method deployments, the tpm diag command can be issued in two ways:

    • The tpm diag command alone will obtain diagnostics from all hosts in the cluster.

    • The tpm diag --hosts host1,host2,hostN command will obtain diagnostics from the specified host(s) only.

  • Within an INI installation, the behaviour will depend on a number of factors; these are outlined below:

    • For versions prior to 5.3.7, and version 6.0.0 to 6.0.4

      • The tpm diag command alone will attempt to obtain diagnostics from all hosts in the cluster if ssh has been configured and the other hosts can be reached.

    • For versions 5.3.7 to 5.4.0, and versions 6.0.5 onwards

      • The tpm diag command alone will ONLY obtain diagnostics from the local host on which the command is executed.

      • The tpm diag --hosts host1,host2,hostN command will obtain diagnostics from the specified host(s) only.

      • The tpm diag -a|--allhosts command will attempt to obtain diagnostics from all hosts in the cluster if ssh has been configured and the other hosts can be reached. The output of tpm diag will provide feedback detailing the hosts that were reached.

The structure of the created file will depend on the configured hosts, but will include all the logs for each accessible host configured in individual directories for each host.

Additional options

It is possible to limit the amount of information gathered by tpm diag by optionally skipping individual gather subroutines, or skipping entire groups. These are outlined below

  • --list : Print all diagnostic gathering groups and associated subroutines for use with --skip and --skipgroups

  • --skip : Specify a comma-separated list of subroutines to skip (NO spaces)

  • --skipgroups : Specify a comma-separated list of subroutine groups to skip (NO spaces)

Examples

shell> tpm diag --list

GROUP: SUBROUTINE(s)
cluster:    cctrlClusterValidate cctrlHistoryFile cctrlLong cctrlPing cctrlStatus
general:    confDirs logFiles miscFiles
mysql:      etcMysql etcMycnf etcMyInclude etcMyIncludedirs getMysqlCommands mysqlErrorLog
os:         getOSCommands etcHosts cpuInfo etcSystemRelease
replicator: thlInfo thlIndex trepctlPerf trepctlQuick trepctlStatus trepctlStatusJSON
tpm:        etcTungsten tpmDiff tpmReverse tpmValidate

shell> tpm diag --skip=getOSCommands

shell> tpm diag --skipgroups=os

tungsten_send_diag

If the host you are running the diag from has external internet connectivity, you may also wish to consider using tungsten_send_diag. This will run tpm diag for you and automatically upload the resulting file to Continuent Support. For more information on using this, see Section 8.31, “The tungsten_send_diag Script”

9.5.5. tpm fetch Command

There are some cases where you would like to review the configuration or make changes prior to the upgrade. In these cases it is possible to fetch the configuration and process the upgrade as different steps.

shell> ./tools/tpm fetch \
    --directory=/opt/continuent \
    --hosts=host1,autodetect

This will load the configuration into the local staging directory. You can then make changes using tpm configure before pushing out the upgrade.

The tpm fetch command supports the following arguments:

  • --hosts

    A comma-separated list of the known hosts in the cluster. If autodetect is included, then tpm will attempt to determine other hosts in the cluster by checking the configuration files for host values.

  • --user

    The username to be used when logging in to other hosts.

  • --directory

    The installation directory of the current Tungsten Cluster installation. If autodetect is specified, then tpm will look for the installation directory by checking any running Tungsten Cluster processes.

9.5.6. tpm firewall Command

The tpm firewall command displays port information required to configure a firewall. When used, the information shown is for the current host:

shell> tpm firewall
To host1
---------------------------------------------------------------------------------
From application servers
From connector servers        13306
From database servers         2112, 13306

The information shows which ports, on which hosts, should be opened to enable communication.

9.5.7. tpm help Command

The tpm help command outputs the help information for tpm showing the list of supported commands and options.

shell> tpm help
Usage: tpm help [commands,config-file,template-file] [general-options] [command-options]
----------------------------------------------------------------------------------------
General options:
-f, --force                   Do not display confirmation prompts or stop the configure »
                              process for errors
-h, --help                    Displays help message
--profile file                Sets name of config file (default: tungsten.cfg)
-p, --preview                 Displays the help message and preview the effect of the »
                              command line options
-q, --quiet                   Only display warning and error messages
-n, --notice                  Display notice, warning and error messages
-i, --info                    Display info, notice, warning and error messages
-v, --verbose                 Display debug, info, notice, warning and error messages
...

To get a list of available configuration options, use the config-file subcommand:

shell> tpm help config-file
#####################################################################
# Config File Options
#####################################################################
config_target_basename        [tungsten-replicator-6.1.24-6_pid10926]
deployment_command            Current command being run
remote_package_path           Path on the server to use for running tpm commands
deploy_current_package        Deploy the current Tungsten package
deploy_package_uri            URL for the Tungsten package to deploy
deployment_host               Host alias for the host to be deployed here
staging_host                  Host being used to install
...

9.5.8. tpm install Command

The tpm install command performs an installation based on the current configuration (if one has been previously created), or using the configuration information provided on the command-line.

For example:

shell> ./tools/tpm install alpha \
    --topology=master-slave \
    --master=host1 \
    --replication-user=tungsten \
    --replication-password=password \
    --home-directory=/opt/continuent \
    --members=host1,host2,host3 \
    --start

Installs a service using the command-line configuration.

shell> ./tools/tpm configure alpha \
    --topology=master-slave \
    --master=host1 \
    --replication-user=tungsten \
    --replication-password=password \
    --home-directory=/opt/continuent \
    --members=host1,host2,host3 
shell> ./tools/tpm install alpha

Configures the service first, then performs the installation steps.

During installation, tpm checks for any host configuration problems and issues, copies the Tungsten Cluster software to each machine, creates the necessary configuration files, and if requested, starts and reports the status of the service.

If any of these steps fail, changes are backed out and installation is stopped.

9.5.9. tpm mysql Command

This will open a MySQL CLI connection to the local MySQL server using the current values for --replication-user, --replication-password and --replication-port.

shell> tpm mysql

This command will fail if the mysql utility is not available or if the local server does not have a running database server.

9.5.10. tpm query Command

The query command provides information about the current tpm installation. There are a number of subcommands to query specific information:

9.5.10.1. tpm query config

Returns a list of all of the configuration values, both user-specified and implied, within the current configuration. The information is returned in the form of a JSON value:

shell> tpm query config
{
  "__system_defaults_will_be_overwritten__": {
...
  "staging_directory": "/home/tungsten/tungsten-replicator-6.1.24-6",
  "staging_host": "tr-ms1",
  "staging_user": "tungsten"
}

9.5.10.2. tpm query dataservices

Returns the list of configured dataservices that have, or will be, installed:

shell> tpm query dataservices
alpha                         : PHYSICAL

9.5.10.3. tpm query deployments

Returns a list of all the individual deployment hosts and configuration information, returned in the form of a JSON object for each installation host:

shell> tpm query deployments
{
  "config_target_basename": "tungsten-replicator-6.1.24-6_pid22729",
  "dataservice_host_options": {
    "alpha": {
      "start": "true"
    }
...
  "staging_directory": "/home/tungsten/tungsten-replicator-6.1.24-6",
  "staging_host": "tr-ms1",
  "staging_user": "tungsten"
}

9.5.10.4. tpm query manifest

Returns the manifest information for the identified release of Tungsten Replicator, including the build, source and component versions, returned in the form of a JSON value:

shell> tpm query manifest
{
  "date": "Wed Jun  3 16:54:45 UTC 2020",
  "fullVersion": "6.1.4",
  "git": {
    "URL": "file:///volumes/data/bamboo/home/xml-data/build-dir/_git-repositories-cache/dfe910bfa6ded0f410e78a8215885afe1a539172",
    "branch": "cnt-6.1.4-merge",
    "revision": "8b7939809ae685cf459d76d052a3c9916112306b"
  },
  "host": "bamboo",
  "hudson": {
    "URL": "",
    "buildId": 44,
    "buildNumber": 44,
    "buildTag": "",
    "jobName": ""
  },
  "product": "Tungsten Replicator",
  "productCode": "tungsten.replicator",
  "release": "tungsten-replicator-6.1.4-44",
  "releaseBaseName": "tungsten-replicator",
  "userAccount": "bamboo",
  "version": {
    "major": 6,
    "minor": 1,
    "revision": 4
  }
}

9.5.10.5. tpm query modified-files

Shows the list of configuration files that have been modified since the installation was completed. Modified configuration files cannot be overwritten during an upgrade process; using this command enables you to identify which files contain changes so that these modifications can be manually migrated to the new installation. To restore or replace files with their original installation versions, copy the .filename.orig file.

9.5.10.6. tpm query staging

Returns the host and directory from which the current installation was created:

shell> tpm query staging
tungsten@host1:/home/tungsten/tungsten-replicator-6.1.24-6

This can be useful when the installation host and directory from which the original configuration was made need to be updated or modified.

9.5.10.7. tpm query version

Returns the version for the identified release of Tungsten Cluster:

shell>  tpm query version
6.1.24-6

9.5.11. tpm reset Command

This command will clear the current state for all Tungsten services:

  • Management metadata

  • Replication metadata

  • THL files

  • Relay log files

  • Replication position

If you run the command from an installed directory, it will only apply to the current server. If you run it from a staging directory, it will apply to all servers unless you specify the --hosts option.

shell> {STAGING_DIR}/tools/tpm reset
or
shell> tpm reset
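
When run from a staging directory, the operation can be limited to specific hosts with the --hosts option; for example (the hostnames are illustrative):

shell> {STAGING_DIR}/tools/tpm reset --hosts=host1,host2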

9.5.12. tpm reset-thl Command

This command will clear the current replication state for the Tungsten Replicator:

  • THL files

  • Relay log files

  • Replication position

If you run the command from an installed directory, it will only apply to the current server. If you run it from a staging directory, it will apply to all servers unless you specify the --hosts option.

shell> {STAGING_DIR}/tools/tpm reset-thl
or
shell> tpm reset-thl

9.5.13. tpm reverse Command

The tpm reverse command will show you the commands required to rebuild the configuration for the current directory. This is useful for doing an upgrade or when copying the deployment to another server.

shell> tpm reverse
# Defaults for all data services and hosts
tools/tpm configure defaults \
--application-password=secret \
--application-port=3306 \
--application-user=app \
--replication-password=secret \
--replication-port=13306 \
--replication-user=tungsten \
--start-and-report=true \
--user=tungsten
# Options for the alpha data service
tools/tpm configure alpha \
--connectors=host1,host2,host3 \
--master=host1 \
--members=host1,host2,host3
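
Because the output is a runnable set of tpm commands, it can also be captured to a file as a record of the current configuration before an upgrade; for example:

shell> tpm reverse > /home/tungsten/tungsten-config-backup.sh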


9.5.14. tpm uninstall Command

The tpm uninstall command is used to remove the installation.

Warning

The uninstall command must be used with care. This is a destructive and irreversible command.

To uninstall the software, issue the following command from the installed software directory on every host for INI installs, or from the staging host only for staging installs. For installations performed via the staging method, running the command on the staging host will cascade through all nodes in the topology.

shell> {STAGING_DIR}/tools/tpm uninstall --i-am-sure
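
For INI-based installations, run the equivalent command from the installed directory on each host in turn; for example, assuming the software has been installed under /opt/continuent:

shell> /opt/continuent/tungsten/tools/tpm uninstall --i-am-sure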

9.5.15. tpm update Command

The tpm update command is used when applying configuration changes or upgrading to a new version. The process is designed to be simple and maintain availability of all services. The actual process will be performed as described in Section 9.2, “Processing Installs and Upgrades”. The behavior of tpm update is dependent on two factors.

  1. Are you upgrading to a new version or applying configuration changes to the current version?

  2. The installation method used during deployment.

Note

Check the output of tpm query staging to determine which method your current installation uses. The output for an installation from a staging directory will start with # Installed from tungsten@staging-host:/opt/continuent/software/tungsten-replicator-6.1.24-6. An installation based on an INI file may include this line, but there will also be an /etc/tungsten/tungsten.ini file on each node.

Upgrading to a new version

If a staging directory was used; see Section 9.3.6, “Upgrades from a Staging Directory”.

If an INI file was used; see Section 9.4.3, “Upgrades with an INI File”

Applying configuration changes to the current version

If a staging directory was used; see Section 9.3.7, “Configuration Changes from a Staging Directory”.

If an INI file was used; see Section 9.4.4, “Configuration Changes with an INI file”.

Special Considerations for the Connector

The tpm command will use connector graceful-stop 30 followed by connector start when upgrading versions. If that command fails then a regular connector stop is run.

This behavior is also applied when using tools/tpm update --replace-release.

The tpm command will use connector reconfigure when changing connector settings without a version upgrade.

The use of connector reconfigure is disabled for the following:

  • --application-port

  • --application-readonly-port

  • --router-gateway-port

  • --router-jmx-port

  • --conn-java-mem-size

If connector reconfigure can't be used, connector graceful-stop 30 and connector start are used.

9.5.16. tpm validate Command

The tpm validate command validates the current configuration before installation. The validation checks all prerequisites that apply before an installation, and assumes that the configured hosts are currently not configured for any Tungsten services, and no Tungsten services are currently running.

shell> {STAGING_DIR}/tools/tpm validate
.........
...
#####################################################################
# Validation failed
#####################################################################
...

The command can be run after performing a tpm configure and before a tpm install to ensure that any prerequisite or configuration issues are addressed before installation occurs.
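
For example, a typical staging-directory sequence might look like the following; the service name and options shown are illustrative:

shell> ./tools/tpm configure alpha \
    --master=host1 \
    --members=host1,host2,host3
shell> ./tools/tpm validate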

9.5.17. tpm validate-update Command

The tpm validate-update command checks whether the configured hosts are ready to be updated. It checks the prerequisites and configuration of the dataserver and hosts, performing the same checks that tpm makes during a tpm install operation. Since there may have been changes to the requirements or required configuration, this check can be useful before attempting an update.

Using tpm validate-update is different from tpm validate in that it checks the environment based on the updated configuration, including the status of any existing services.

shell> {STAGING_DIR}/tools/tpm validate-update
....
WARN  >> host1 >> The process limit is set to 7812, we suggest a value»
    of at least 8096. Add 'tungsten       -    nproc  8096' to your »
    /etc/security/limits.conf and restart Tungsten processes. (ProcessLimitCheck)

WARN  >> host2 >> The process limit is set to 7812, we suggest a value»
     of at least 8096. Add 'tungsten       -    nproc  8096' to your »
    /etc/security/limits.conf and restart Tungsten processes. (ProcessLimitCheck)

WARN  >> host3 >> The process limit is set to 7812, we suggest a value »
    of at least 8096. Add 'tungsten       -    nproc  8096' to your »
    /etc/security/limits.conf and restart Tungsten processes. (ProcessLimitCheck)
.WARN  >> host3 >> MyISAM tables exist within this instance - These »
    tables are not crash safe and may lead to data loss in a failover »
    (MySQLMyISAMCheck)


NOTE  >> Command successfully completed

Any problems noted should be addressed before you perform the update using tpm update.
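
For example, the ProcessLimitCheck warnings shown above could be addressed on each host before re-running the update; a minimal sketch, assuming sudo access:

shell> echo 'tungsten       -    nproc  8096' | sudo tee -a /etc/security/limits.conf

The Tungsten processes must then be restarted for the new limit to take effect.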

9.6. tpm Common Options

tpm accepts these options along with those in Section 9.8, “tpm Configuration Options”.

Table 9.5. tpm Common Options

CmdLine Option | INI File Option | Description
--enable-validation-check | enable-validation-check | Enable a specific validation check, overriding any configured skipped checks
--enable-validation-warnings | enable-validation-warnings | Enable a specific validation warning, overriding any configured skipped warning
--ini | ini | Specify the location of the directory where INI files will be located, or specify a specific filename
--net-ssh-option | net-ssh-option | Set the Net::SSH option for remote system calls
--property, --property=key+=value, --property=key=value, --property=key~=/match/replace/ | property, property=key+=value, property=key=value, property=key~=/match/replace/ | Modify specific property values for the key in any file that the configure script touches.
--remove-property | remove-property | Remove the setting for a previously configured property
--replace-release | replace-release | Used when upgrading or making configuration changes. This option will rebuild the entire release and recreate all metadata configuration.
--skip-validation-check | skip-validation-check | Do not run the specified validation check.
--skip-validation-warnings | skip-validation-warnings | Do not display warnings for the specified validation check.

--enable-validation-check

Option: --enable-validation-check
Config File Options: enable-validation-check
Description: Enable a specific validation check, overriding any configured skipped checks
Value Type: string

The --enable-validation-check option specifically enables a given validation check if the check had previously been set to be ignored in a previous invocation of the configuration through tpm. If a check fails, installation is canceled.

Setting both --skip-validation-check and --enable-validation-check is equivalent to explicitly disabling the specified check.

--enable-validation-warnings

Option: --enable-validation-warnings
Config File Options: enable-validation-warnings
Description: Enable a specific validation warning, overriding any configured skipped warning
Value Type: string

The --enable-validation-warnings option specifically enables a given validation warning if the warning had previously been set to be ignored in a previous invocation of the configuration through tpm.

Setting both --skip-validation-warnings and --enable-validation-warnings is equivalent to explicitly disabling the specified check.

--ini

Option: --ini
Config File Options: ini
Description: Specify the location of the directory where INI files will be located, or specify a specific filename
Value Type: string
Default: /etc/tungsten/tungsten.ini

Specifies an alternative location, or file, for the INI files from the default.

--net-ssh-option

Option: --net-ssh-option
Config File Options: net-ssh-option
Description: Set the Net::SSH option for remote system calls
Value Type: string

Enables you to set a specific Net::SSH option. For example:

shell> tpm update ... --net-ssh-option=compression=zlib

--property

Option: --property
Aliases: --property=key+=value, --property=key=value, --property=key~=/match/replace/
Config File Options: property, property=key+=value, property=key=value, property=key~=/match/replace/
Description: Modify specific property values for the key in any file that the configure script touches.
Value Type: string

The --property option enables you to explicitly set property values in the target files. A number of different models are supported:

  • key=value

    Set the property defined by key to the specified value without evaluating any template values or other rules.

  • key+=value

    Add the value to the property defined by key. Template values and other options append their settings to the end of the specified property.

  • key~=/match/replace/

    Evaluate any template values and other settings, and then perform the specified Ruby regex operation on the property defined by key. For example, --property=replicator.key~=/(.*)/somevalue,\1/ will prepend somevalue before the template value for replicator.key.
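
For example, a simple key=value assignment can be supplied directly to tpm update; the property name used here is purely illustrative:

shell> tpm update alpha --property=replicator.some.key=somevalue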

--remove-property

Option: --remove-property
Config File Options: remove-property
Description: Remove the setting for a previously configured property
Value Type: string

Remove a previous explicit property setting. For example:

shell> tpm configure --remove-property=replicator.filter.pkey.addPkeyToInserts

--replace-release

Option: --replace-release
Config File Options: replace-release
Description: Used when upgrading or making configuration changes. This option will rebuild the entire release and recreate all metadata configuration.
Value Type: boolean

This property can only be used with tools/tpm update and can only be executed from within the software staging tree.

This option is highly recommended when upgrading between versions as it will ensure all metadata is correctly rebuilt for the version being installed.

It is not necessary when applying configuration changes; however, it can be useful if properties are not being applied correctly, or if the underlying metadata has become corrupt.
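
For example, when performing an upgrade from within the software staging tree:

shell> cd /opt/continuent/software/tungsten-replicator-6.1.24-6
shell> ./tools/tpm update --replace-release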

--skip-validation-check

Option: --skip-validation-check
Config File Options: skip-validation-check
Description: Do not run the specified validation check.
Value Type: string

The --skip-validation-check option disables a given validation check. Without this option, if any validation check fails, the installation, validation or configuration will automatically stop.

Warning

Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.

You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:

...
ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » 
 uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck)
...

The check in this case is MySQLDefaultTableTypeCheck, and could be ignored using --skip-validation-check=MySQLDefaultTableTypeCheck.

Setting both --skip-validation-check and --enable-validation-check is equivalent to explicitly disabling the specified check.

--skip-validation-warnings

Option: --skip-validation-warnings
Config File Options: skip-validation-warnings
Description: Do not display warnings for the specified validation check.
Value Type: string

The --skip-validation-warnings option disables the warning output for a given validation check.

You can identify a given check by examining the warnings generated during configuration. For example, the Linux swappiness warning:

...
WARN >> centos >> Linux swappiness is currently set to 60, on restart it will be 60, » 
 consider setting this to 10 or under to avoid swapping. (SwappinessCheck)
...

The check in this case is SwappinessCheck, and could be ignored using --skip-validation-warnings=SwappinessCheck.

Setting both --skip-validation-warnings and --enable-validation-warnings is equivalent to explicitly disabling the specified warning.

9.7. tpm Validation Checks

During configuration and installation, tpm runs a number of configuration, operating system, datasource, and other validation checks to ensure that the correct environment, prerequisites and other settings will produce a valid, working, configuration.

All relevant checks are executed automatically unless specifically ignored (warnings) or disabled (checks) using the corresponding --skip-validation-warnings or --skip-validation-check options.
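
Both settings can also be applied from within an INI file in the same way as on the command line; the check names below are illustrative selections from the table that follows:

[defaults]
skip-validation-check=MySQLMyISAMCheck
skip-validation-warnings=SwappinessCheck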

Table 9.6. tpm Validation Checks

Option | Description
BackupDirectoryWriteableCheck | Checks that the configured backup directory is writeable
BackupDumpDirectoryWriteableCheck | Checks the backup temp directory is writeable
BackupScriptAvailableCheck | Checks that the configured backup script exists and can be executed
ClusterDiagnosticCheck
ClusterStatusCheck
CommitDirectoryCheck
ConfigurationStorageDirectoryCheck
ConfigureValidationCheck
ConfiguredDirectoryCheck
ConflictingReplicationServiceTHLPortsCheck
ConnectorChecks | Ensures that the configured connector selection is valid
ConnectorDBVersionCheck
ConnectorListenerAddressCheck
ConnectorRWROAddressesCheck | Ensure the RW and RO addresses are different
ConnectorSmartScaleAllowedCheck | Confirms whether SmartScale is valid within the current configured parameters
ConnectorUserCheck
ConsistentReplicationCredentialsCheck
CurrentCommandCoordinatorCheck
CurrentConnectorCheck
CurrentReleaseDirectoryIsSymlink
CurrentTopologyCheck
CurrentVersionCheck
DatasourceBootScriptCheck
DifferentMasterSlaveCheck
DirectOracleServiceSIDCheck
EncryptionCheck
EncryptionKeystoreCheck
FileValidationCheck
FirewallCheck
GlobalHostAddressesCheck
GlobalHostOracleLibrariesFoundCheck
GlobalMatchingPingMethodCheck
GlobalRestartComponentsCheck
GroupValidationCheck
HdfsValidationCheck
HostLicensesCheck
HostOracleLibrariesFoundCheck
HostReplicatorServiceRunningCheck
HostSkippedChecks
HostnameCheck
HostsFileCheck
InstallServicesCheck
InstallationScriptCheck
InstallerMasterSlaveCheck | Checks whether a Primary host has been defined for the configured service.
InstallingOverExistingInstallation
JavaUserTimezoneCheck
JavaVersionCheck
KeystoresCheck
KeystoresToCommitCheck
ManagerActiveWitnessConversionCheck
ManagerChecks
ManagerHeapThresholdCheck
ManagerListenerAddressCheck
ManagerPingMethodCheck
ManagerWitnessAvailableCheck
ManagerWitnessNeededCheck
MatchingHomeDirectoryCheck
MissingReplicationServiceConfigurationCheck
ModifiedConfigurationFilesCheck
MySQLAllowIntensiveChecks | Enables searching MySQL INFORMATION_SCHEMA for validation checks
MySQLApplierLogsCheck
MySQLApplierPortCheck
MySQLApplierServerIDCheck
MySQLAvailableCheck | Checks if MySQL is installed
MySQLBinaryLogsEnabledCheck | Checks that binary logging has been enabled on MySQL
MySQLBinlogDoDbCheck
MySQLClientCheck | Checks whether the MySQL client command tool is available
MySQLConfigFileCheck | Checks the existence of a MySQL configuration file
MySQLConnectorBridgeModePermissionsCheck
MySQLConnectorPermissionsCheck
MySQLDefaultTableTypeCheck | Checks the default table type for MySQL
MySQLDumpCheck | Checks that the mysqldump command version matches the installed MySQL
MySQLGeneratedColumnCheck | Checks whether MySQL virtual/generated columns are defined
MySQLInnoDBEnabledCheck
MySQLJsonDataTypeCheck
MySQLLoadDataInfilePermissionsCheck
MySQLLoginCheck | Checks whether Tungsten Cluster can connect to MySQL using the configured credentials
MySQLMyISAMCheck | Checks for the existence of MyISAM tables
MySQLNoMySQLReplicationCheck
MySQLPasswordSettingCheck
MySQLPermissionsCheck
MySQLReadableLogsCheck
MySQLSettingsCheck
MySQLSuperReadOnlyCheck | Checks whether super_read_only has been enabled on MySQL
MySQLTriggerCheck
MySQLUnsupportedDataTypesCheck
MysqlConnectorCheck
MysqldumpAvailableCheck
MysqldumpSettingsCheck
NewDirectoryRequiredCheck
NtpdRunningCheck
OSCheck
OldServicesRunningCheck
OpenFilesLimitCheck
OpensslLibraryCheck
OracleLoginCheck
OraclePermissionsCheck
OracleRedoReaderMinerDirectoryCheck
OracleServiceSIDCheck
OracleVersionCheck
PGAvailableCheck
ParallelReplicationCheck
ParallelReplicationCountCheck
PgControlAvailableCheck
PgStandbyAvailableCheck
PgdumpAvailableCheck
PgdumpallAvailableCheck
PingSyntaxCheck
PortAvailabilityCheck
ProfileScriptCheck
RMIListenerAddressCheck
RelayDirectoryWriteableCheck | Checks that the relay log directory can be written to
ReplicatorChecks
RestartComponentsCheck
RouterAffinityCheck
RouterBridgeModeDefaultCheck
RouterDelayBeforeOfflineCheck
RouterKeepAliveTimeoutCheck
RowBasedBinaryLoggingCheck | Checks that Row-based binary logging has been enabled for heterogeneous deployments
RsyncAvailableCheck
RubyVersionCheck
SSHLoginCheck | Checks connectivity to other hosts over SSH
ServiceTransferredLogStorageCheck
StartingStoppedServices
SudoCheck
SwappinessCheck | Checks the swappiness OS configuration is within a recommended range
THLDirectoryWriteableCheck
THLListenerAddressCheck
THLSchemaChangeCheck | Ensures that the existing THL format is compatible with the new release
THLStorageCheck | Confirms the THL storage directory exists, is empty and writeable
THLStorageChecksum
TargetDirectoryDoesNotExist
TransferredLogStorageCheck
UpgradeSameProductCheck | Ensures that the same product is being updated
VIPEnabledHostAllowsRootCommands
VIPEnabledHostArpPath
VIPEnabledHostIfconfigPath
VerticaUserGroupsCheck | Checks that the Vertica user has the correct OS group membership
WhichAvailableCheck | Checks the existence of a working which command
WriteableHomeDirectoryCheck | Ensures the home directory can be written to
WriteableTempDirectoryCheck | Ensures the temporary directory can be written to
XtrabackupAvailableCheck
XtrabackupDirectoryWriteableCheck
XtrabackupSettingsCheck

BackupDirectoryWriteableCheck

Option: BackupDirectoryWriteableCheck
Description: Checks that the configured backup directory is writeable

Confirms that the directory defined in --backup-dir exists and can be written to.

BackupDumpDirectoryWriteableCheck

Option: BackupDumpDirectoryWriteableCheck
Description: Checks the backup temp directory is writeable

Confirms that the directory defined in --backup-dump-dir exists and can be written to.

BackupScriptAvailableCheck

Option: BackupScriptAvailableCheck
Description: Checks that the configured backup script exists and can be executed

Confirms that the script defined in --backup-script exists and is executable.

ClusterDiagnosticCheck

Option: ClusterDiagnosticCheck
Description:

ClusterStatusCheck

Option: ClusterStatusCheck
Description:

CommitDirectoryCheck

Option: CommitDirectoryCheck
Description:

ConfigurationStorageDirectoryCheck

ConfigureValidationCheck

Option: ConfigureValidationCheck
Description:

ConfiguredDirectoryCheck

Option: ConfiguredDirectoryCheck
Description:

ConflictingReplicationServiceTHLPortsCheck

ConnectorChecks

Option: ConnectorChecks
Description: Ensures that the configured connector selection is valid

Checks that the list of connectors and the corresponding list of data services is valid.

ConnectorDBVersionCheck

Option: ConnectorDBVersionCheck
Description:

ConnectorListenerAddressCheck

ConnectorRWROAddressesCheck

Option: ConnectorRWROAddressesCheck
Description: Ensure the RW and RO addresses are different

For environments where the connector has been configured to use different hosts and ports for RW and RO operations, ensure that the settings are in fact different.

ConnectorSmartScaleAllowedCheck

Option: ConnectorSmartScaleAllowedCheck
Description: Confirms whether SmartScale is valid within the current configured parameters

Checks that both SmartScale and Read/Write splitting have been enabled.

ConnectorUserCheck

Option: ConnectorUserCheck
Description:

ConsistentReplicationCredentialsCheck

CurrentCommandCoordinatorCheck

CurrentConnectorCheck

Option: CurrentConnectorCheck
Description:

CurrentReleaseDirectoryIsSymlink

CurrentTopologyCheck

Option: CurrentTopologyCheck
Description:

CurrentVersionCheck

Option: CurrentVersionCheck
Description:

DatasourceBootScriptCheck

Option: DatasourceBootScriptCheck
Description:

DifferentMasterSlaveCheck

Option: DifferentMasterSlaveCheck
Description:

DirectOracleServiceSIDCheck

Option: DirectOracleServiceSIDCheck
Description:

EncryptionCheck

Option: EncryptionCheck
Description:

EncryptionKeystoreCheck

Option: EncryptionKeystoreCheck
Description:

FileValidationCheck

Option: FileValidationCheck
Description:

FirewallCheck

Option: FirewallCheck
Description:

GlobalHostAddressesCheck

Option: GlobalHostAddressesCheck
Description:

GlobalHostOracleLibrariesFoundCheck

GlobalMatchingPingMethodCheck

GlobalRestartComponentsCheck

Option: GlobalRestartComponentsCheck
Description:

GroupValidationCheck

Option: GroupValidationCheck
Description:

HdfsValidationCheck

Option: HdfsValidationCheck
Description:

HostLicensesCheck

Option: HostLicensesCheck
Description:

HostOracleLibrariesFoundCheck

HostReplicatorServiceRunningCheck

HostSkippedChecks

Option: HostSkippedChecks
Description:

HostnameCheck

Option: HostnameCheck
Description:

HostsFileCheck

Option: HostsFileCheck
Description:

InstallServicesCheck

Option: InstallServicesCheck
Description:

InstallationScriptCheck

Option: InstallationScriptCheck
Description:

InstallerMasterSlaveCheck

Option: InstallerMasterSlaveCheck
Description: Checks whether a Primary host has been defined for the configured service.

InstallingOverExistingInstallation

JavaUserTimezoneCheck

Option: JavaUserTimezoneCheck
Description:

JavaVersionCheck

Option: JavaVersionCheck
Description:

KeystoresCheck

Option: KeystoresCheck
Description:

KeystoresToCommitCheck

Option: KeystoresToCommitCheck
Description:

ManagerActiveWitnessConversionCheck

ManagerChecks

Option: ManagerChecks
Description:

ManagerHeapThresholdCheck

Option: ManagerHeapThresholdCheck
Description:

ManagerListenerAddressCheck

Option: ManagerListenerAddressCheck
Description:

ManagerPingMethodCheck

Option: ManagerPingMethodCheck
Description:

ManagerWitnessAvailableCheck

Option: ManagerWitnessAvailableCheck
Description:

ManagerWitnessNeededCheck

Option: ManagerWitnessNeededCheck
Description:

MatchingHomeDirectoryCheck

Option: MatchingHomeDirectoryCheck
Description:

MissingReplicationServiceConfigurationCheck

ModifiedConfigurationFilesCheck

MySQLAllowIntensiveChecks

Option: MySQLAllowIntensiveChecks
Description: Enables searching MySQL INFORMATION_SCHEMA for validation checks

Enables tpm to make use of the MySQL INFORMATION_SCHEMA to perform various validation checks. These include, but are not limited to:

  • Tables not configured to use transactional tables

  • Unsupported datatypes in MySQL tables

MySQLApplierLogsCheck

Option: MySQLApplierLogsCheck
Description:

MySQLApplierPortCheck

Option: MySQLApplierPortCheck
Description:

MySQLApplierServerIDCheck

Option: MySQLApplierServerIDCheck
Description:

MySQLAvailableCheck

Option: MySQLAvailableCheck
Description: Checks if MySQL is installed

MySQLBinaryLogsEnabledCheck

Option: MySQLBinaryLogsEnabledCheck
Description: Checks that binary logging has been enabled on MySQL

Examines whether the log_bin variable has been defined within the running MySQL server. Binary logging must be enabled for replication to work.
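
A quick way to confirm the current setting is to query the variable from the shell; the connection options used here are illustrative:

shell> mysql -utungsten -p -e "SHOW VARIABLES LIKE 'log_bin'"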

MySQLBinlogDoDbCheck

Option: MySQLBinlogDoDbCheck
Description:

MySQLClientCheck

Option: MySQLClientCheck
Description: Checks whether the MySQL client command tool is available

MySQLConfigFileCheck

Option: MySQLConfigFileCheck
Description: Checks the existence of a MySQL configuration file

MySQLConnectorBridgeModePermissionsCheck

MySQLConnectorPermissionsCheck

MySQLDefaultTableTypeCheck

Option: MySQLDefaultTableTypeCheck
Description: Checks the default table type for MySQL

Checks that the default table type configured for MySQL is a compatible transactional storage engine, such as InnoDB.

MySQLDumpCheck

Option: MySQLDumpCheck
Description: Checks that the mysqldump command version matches the installed MySQL

Checks whether the mysqldump command within the configured PATH matches the version of MySQL being configured as a source or target. A mismatch could indicate that multiple MySQL versions are installed.

A mismatch could create invalid or corrupt backups. Either correct your PATH or use --preferred-path to point to the correct MySQL installation.
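
For example, if the matching binaries are located in /usr/local/mysql/bin (an illustrative path), the configuration could be pointed at them with:

shell> tpm configure defaults --preferred-path=/usr/local/mysql/bin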

MySQLGeneratedColumnCheck

Option: MySQLGeneratedColumnCheck
Description: Checks whether MySQL virtual/generated columns are defined

Checks whether any tables contain generated or virtual columns. The test is only executed on MySQL 5.7, and only if --mysql-allow-intensive-checks has been enabled.

MySQLInnoDBEnabledCheck

Option: MySQLInnoDBEnabledCheck
Description:

MySQLJsonDataTypeCheck

Option: MySQLJsonDataTypeCheck
Description:

Checks whether any tables contain JSON columns. The test is only executed on MySQL 5.7, and only if --mysql-allow-intensive-checks has been enabled.

MySQLLoadDataInfilePermissionsCheck

MySQLLoginCheck

Option: MySQLLoginCheck
Description: Checks whether Tungsten Cluster can connect to MySQL using the configured credentials

MySQLMyISAMCheck

Option: MySQLMyISAMCheck
Description: Checks for the existence of MyISAM tables

Checks for the existence of MyISAM tables within the database. Use of MyISAM tables is not supported since MyISAM is not transactionally consistent. This can cause problems for both extraction and applying data.

In order to check for the existence of MyISAM tables, tpm uses two techniques:

  • Looking for .MYD files within the MySQL directory; these are the files that contain MyISAM data. tpm must be able to read and see the contents of the MySQL data directory. If the configured user does not already have access, you can use the --root-command-prefix=true option so that root privileges are used to access the filesystem.

  • Using the MySQL INFORMATION_SCHEMA to look for tables defined with the MyISAM engine. For this option to work, intensive checks must have been enabled using --mysql-allow-intensive-checks.

If neither of these methods is available, the check will fail and installation will stop.
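
For example, both mechanisms could be enabled at configuration time so that at least one of them is available; a minimal sketch:

shell> tpm configure defaults \
    --mysql-allow-intensive-checks=true \
    --root-command-prefix=true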

MySQLNoMySQLReplicationCheck

Option: MySQLNoMySQLReplicationCheck
Description:

MySQLPasswordSettingCheck

Option: MySQLPasswordSettingCheck
Description:

MySQLPermissionsCheck

Option: MySQLPermissionsCheck
Description:

MySQLReadableLogsCheck

Option: MySQLReadableLogsCheck
Description:

MySQLSettingsCheck

Option: MySQLSettingsCheck
Description:

MySQLSuperReadOnlyCheck

Option: MySQLSuperReadOnlyCheck
Description: Checks whether super_read_only has been enabled on MySQL

Checks whether the super_read_only variable within MySQL has been enabled. If enabled, replication will not work. The check will test both the running server and the configuration file to determine whether the value has been enabled.
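
For example, the variable can be checked and, if necessary, disabled on the running server from the shell; the credentials shown are illustrative:

shell> mysql -uroot -p -e "SET GLOBAL super_read_only=OFF"

Remember to also remove or change the corresponding setting in the MySQL configuration file, since the check examines both.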

MySQLTriggerCheck

Option: MySQLTriggerCheck
Description:

MySQLUnsupportedDataTypesCheck

MysqlConnectorCheck

Option: MysqlConnectorCheck
Description:

MysqldumpAvailableCheck

Option: MysqldumpAvailableCheck
Description:

MysqldumpSettingsCheck

Option: MysqldumpSettingsCheck
Description:

NewDirectoryRequiredCheck

Option: NewDirectoryRequiredCheck
Description:

NtpdRunningCheck

Option: NtpdRunningCheck
Description:

OSCheck

Option: OSCheck
Description:

OldServicesRunningCheck

Option: OldServicesRunningCheck
Description:

OpenFilesLimitCheck

Option: OpenFilesLimitCheck
Description:

OpensslLibraryCheck

Option: OpensslLibraryCheck
Description:

OracleLoginCheck

Option: OracleLoginCheck
Description:

OraclePermissionsCheck

Option: OraclePermissionsCheck
Description:

OracleRedoReaderMinerDirectoryCheck

OracleServiceSIDCheck

Option: OracleServiceSIDCheck
Description:

OracleVersionCheck

Option: OracleVersionCheck
Description:

PGAvailableCheck

Option: PGAvailableCheck
Description:

ParallelReplicationCheck

Option: ParallelReplicationCheck
Description:

ParallelReplicationCountCheck

PgControlAvailableCheck

Option: PgControlAvailableCheck
Description:

PgStandbyAvailableCheck

Option: PgStandbyAvailableCheck
Description:

PgdumpAvailableCheck

Option: PgdumpAvailableCheck
Description:

PgdumpallAvailableCheck

Option: PgdumpallAvailableCheck
Description:

PingSyntaxCheck

Option: PingSyntaxCheck
Description:

PortAvailabilityCheck

Option: PortAvailabilityCheck
Description:

ProfileScriptCheck

Option: ProfileScriptCheck
Description:

RMIListenerAddressCheck

Option: RMIListenerAddressCheck
Description:

RelayDirectoryWriteableCheck

Option: RelayDirectoryWriteableCheck
Description: Checks that the relay log directory can be written to

Confirms that the directory defined in --relay-log-dir exists and can be written to.

ReplicatorChecks

Option: ReplicatorChecks
Description:

RestartComponentsCheck

Option: RestartComponentsCheck
Description:

RouterAffinityCheck

Option: RouterAffinityCheck
Description:

RouterBridgeModeDefaultCheck

Option: RouterBridgeModeDefaultCheck
Description:

RouterDelayBeforeOfflineCheck

RouterKeepAliveTimeoutCheck

Option: RouterKeepAliveTimeoutCheck
Description:

RowBasedBinaryLoggingCheck

Option: RowBasedBinaryLoggingCheck
Description: Checks that Row-based binary logging has been enabled for heterogeneous deployments

For heterogeneous deployments, row-based binary logging must have been enabled. For all services where heterogeneous support has been enabled, for example due to --enable-heterogeneous-service or --enable-batch-service, row-based logging within MySQL must have been switched on. The test looks for the value of binlog_format=ROW.
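
For example, row-based logging can be enabled permanently through the MySQL configuration file; the server must be restarted for a change made this way to take effect:

[mysqld]
binlog_format = ROW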

RsyncAvailableCheck

Option: RsyncAvailableCheck
Description:

RubyVersionCheck

Option: RubyVersionCheck
Description:

SSHLoginCheck

Option: SSHLoginCheck
Description: Checks connectivity to other hosts over SSH

Confirms that SSH logins to the other hosts in the cluster work without requiring a password, and without returning additional rows of information when running a command directly on the remote host.

In the event of the check failing, the following items should be checked:

  • Confirm that it is possible to SSH to the remote site using the username provided, and without requiring a password. For example:

    host1-shell> ssh tungsten@host2
    Last login: Wed Aug 9 09:55:23 2017 from fe80::1042:8aee:61da:a20%en0
    host2-shell>
  • Remove any remote messages returned when the user logs in. This includes the output from the Banner argument within /etc/ssh/sshd_config, or text or files output by the user's shell login script or profile.

  • Ensure that your remote shell has not been configured to output text or a message when a logout is attempted, for example by using:

    shell> trap "echo logout" 0

ServiceTransferredLogStorageCheck

StartingStoppedServices

Option: StartingStoppedServices
Description:

SudoCheck

Option: SudoCheck
Description:

SwappinessCheck

Option: SwappinessCheck
Description: Checks the swappiness OS configuration is within a recommended range

Checks whether the Linux swappiness parameter has been set to a value of 10 or less, both in the current setting and when the system reboots. A value greater than 10 may allow running programs to be swapped out, which will affect the performance of the Tungsten Cluster when running. Change the value in sysctl.conf.
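
For example, to apply the recommended value both immediately and across reboots; a minimal sketch, assuming sudo access:

shell> sudo sysctl -w vm.swappiness=10
shell> echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf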

THLDirectoryWriteableCheck

Option: THLDirectoryWriteableCheck
Description:

THLListenerAddressCheck

Option: THLListenerAddressCheck
Description:

THLSchemaChangeCheck

Option: THLSchemaChangeCheck
Description: Ensures that the existing THL format is compatible with the new release

Checks that the format of the current THL is compatible with the schema and format of the new software. A difference may mean that the THL needs to be reset before installation can continue.

THLStorageCheck

Option: THLStorageCheck
Description: Confirms the THL storage directory exists, is empty and writeable

Confirms that the directory configured for THL storage using --log-dir exists, is writeable, and is empty.

THLStorageChecksum

Option: THLStorageChecksum
Description:

TargetDirectoryDoesNotExist

Option: TargetDirectoryDoesNotExist
Description:

TransferredLogStorageCheck

Option: TransferredLogStorageCheck
Description:

UpgradeSameProductCheck

Option: UpgradeSameProductCheck
Description: Ensures that the same product is being updated

Updates must occur with the same product, for example, Tungsten Replicator to Tungsten Replicator. It is not possible to update replicator to cluster, or cluster to replicator.

VIPEnabledHostAllowsRootCommands

VIPEnabledHostArpPath

Option: VIPEnabledHostArpPath
Description:

VIPEnabledHostIfconfigPath

Option: VIPEnabledHostIfconfigPath
Description:

VerticaUserGroupsCheck

Option: VerticaUserGroupsCheck
Description: Checks that the Vertica user has the correct OS group membership

Checks whether the user running Vertica is a member of the tungsten user's primary group. Without this setting, the CSV files generated by the replicator would not be readable by Vertica when importing them into the database during batch loading.
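
For example, if Vertica runs as the dbadmin user (a common default; adjust for your environment), membership can be granted with:

shell> sudo usermod -a -G tungsten dbadmin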

WhichAvailableCheck

Option: WhichAvailableCheck
Description: Checks the existence of a working which command

Checks the existence of a working which command.

WriteableHomeDirectoryCheck

Option: WriteableHomeDirectoryCheck
Description: Ensures the home directory can be written to

Checks that the home directory for the configured user can be written to.

WriteableTempDirectoryCheck

Option: WriteableTempDirectoryCheck
Description: Ensures the temporary directory can be written to

The temporary directory is used during installation to store a variety of information. This check ensures that the directory is writeable, and that files can be created and deleted correctly.

XtrabackupAvailableCheck

Option: XtrabackupAvailableCheck
Description:

XtrabackupDirectoryWriteableCheck

XtrabackupSettingsCheck

Option: XtrabackupSettingsCheck
Description:

9.8. tpm Configuration Options

tpm supports a large range of configuration options, which can be specified either on the command line or within an INI file.

A full list of all the available options supported is provided in Table 9.7, “tpm Configuration Options”.

Table 9.7. tpm Configuration Options

CmdLine Option | INI File Option | Description
--auto-enable, --repl-auto-enable | auto-enable, repl-auto-enable | Auto-enable services after start-up
--auto-recovery-delay-interval, --repl-auto-recovery-delay-interval | auto-recovery-delay-interval, repl-auto-recovery-delay-interval | Delay (in seconds) between going OFFLINE and attempting to go ONLINE
--auto-recovery-max-attempts, --repl-auto-recovery-max-attempts | auto-recovery-max-attempts, repl-auto-recovery-max-attempts | Maximum number of attempts at automatic recovery
--auto-recovery-reset-interval, --repl-auto-recovery-reset-interval | auto-recovery-reset-interval, repl-auto-recovery-reset-interval | Delay (in seconds) before autorecovery is deemed to have succeeded
--backup-directory, --repl-backup-directory | backup-directory, repl-backup-directory | Permanent backup storage directory
--backup-dump-directory, --repl-backup-dump-directory | backup-dump-directory, repl-backup-dump-directory | Backup temporary dump directory
--backup-method, --repl-backup-method | backup-method, repl-backup-method | Database backup method
--backup-online, --repl-backup-online | backup-online, repl-backup-online | Does the backup script support backing up a datasource while it is ONLINE
--backup-options | backup-options | Space separated list of options to pass directly to the underlying backup binary in use.
--backup-retention, --repl-backup-retention | backup-retention, repl-backup-retention | Number of backups to retain
--backup-script, --repl-backup-script | backup-script, repl-backup-script | What is the path to the backup script
--batch-enabled | batch-enabled | Should the replicator service use a batch applier
--batch-load-language | batch-load-language | Which script language to use for batch loading
--batch-load-template | batch-load-template | Value for the loadBatchTemplate property
--repl-buffer-size | repl-buffer-size | Replicator queue size between stages (min 1)
--channels, --repl-channels | channels, repl-channels | Number of replication channels to use for parallel apply.
--cluster-slave-auto-recovery-delay-interval, --cluster-slave-repl-auto-recovery-delay-interval | cluster-slave-auto-recovery-delay-interval, cluster-slave-repl-auto-recovery-delay-interval | Default value for --auto-recovery-delay-interval when --topology=cluster-slave
--cluster-slave-auto-recovery-max-attempts, --cluster-slave-repl-auto-recovery-max-attempts | cluster-slave-auto-recovery-max-attempts, cluster-slave-repl-auto-recovery-max-attempts | Default value for --auto-recovery-max-attempts when --topology=cluster-slave
--cluster-slave-auto-recovery-reset-interval, --cluster-slave-repl-auto-recovery-reset-interval | cluster-slave-auto-recovery-reset-interval, cluster-slave-repl-auto-recovery-reset-interval | Default value for --auto-recovery-reset-interval when --topology=cluster-slave
--config-file | config-file | Display help information for content of the config file
--consistency-policy, --repl-consistency-policy | consistency-policy, repl-consistency-policy | Should the replicator stop or warn if a consistency check fails?
--dataservice-name | dataservice-name | Limit the command to the hosts in this dataservice. Multiple data services may be specified by providing a comma separated list
--dataservice-relay-enabled | dataservice-relay-enabled | Make this dataservice a replica of another
--dataservice-schema | dataservice-schema | The db schema to hold dataservice details
--dataservice-thl-port | dataservice-thl-port | Port to use for THL operations
--dataservice-vip-enabled | dataservice-vip-enabled | Is VIP management enabled?
--dataservice-vip-ipaddress | dataservice-vip-ipaddress | VIP IP address
--dataservice-vip-netmask | dataservice-vip-netmask | VIP netmask
--datasource-boot-script, --repl-datasource-boot-script | datasource-boot-script, repl-datasource-boot-script | Database start script
--datasource-enable-ssl, --repl-datasource-enable-ssl | datasource-enable-ssl, repl-datasource-enable-ssl | Enable SSL connection to DBMS server
--datasource-log-directory, --repl-datasource-log-directory | datasource-log-directory, repl-datasource-log-directory | Primary log directory
--datasource-log-pattern, --repl-datasource-log-pattern | datasource-log-pattern, repl-datasource-log-pattern | Primary log filename pattern
--datasource-mysql-conf, --repl-datasource-mysql-conf | datasource-mysql-conf, repl-datasource-mysql-conf | MySQL config file
--datasource-mysql-data-directory, --repl-datasource-mysql-data-directory | datasource-mysql-data-directory, repl-datasource-mysql-data-directory | MySQL data directory
--datasource-mysql-ibdata-directory, --repl-datasource-mysql-ibdata-directory | datasource-mysql-ibdata-directory, repl-datasource-mysql-ibdata-directory | MySQL InnoDB data directory
--datasource-mysql-iblog-directory, --repl-datasource-mysql-iblog-directory | datasource-mysql-iblog-directory, repl-datasource-mysql-iblog-directory | MySQL InnoDB log directory
--datasource-mysql-ssl-ca, --repl-datasource-mysql-ssl-ca | datasource-mysql-ssl-ca, repl-datasource-mysql-ssl-ca | MySQL SSL CA file
--datasource-mysql-ssl-cert, --repl-datasource-mysql-ssl-cert | datasource-mysql-ssl-cert, repl-datasource-mysql-ssl-cert | MySQL SSL certificate file
--datasource-mysql-ssl-key, --repl-datasource-mysql-ssl-key | datasource-mysql-ssl-key, repl-datasource-mysql-ssl-key | MySQL SSL key file
--datasource-oracle-service, --repl-datasource-oracle-service | datasource-oracle-service, repl-datasource-oracle-service | Oracle Service Name
--datasource-systemctl-service, --repl-datasource-systemctl-service | datasource-systemctl-service, repl-datasource-systemctl-service | Database systemctl script
--datasource-type, --repl-datasource-type | datasource-type, repl-datasource-type | Database type
--delete | delete | Delete the named data service from the configuration
--direct-datasource-log-directory, --repl-direct-datasource-log-directory | direct-datasource-log-directory, repl-direct-datasource-log-directory | Primary log directory
--direct-datasource-log-pattern, --repl-direct-datasource-log-pattern | direct-datasource-log-pattern, repl-direct-datasource-log-pattern | Primary log filename pattern
--direct-datasource-type, --repl-direct-datasource-type | direct-datasource-type, repl-direct-datasource-type | Database type
--direct-replication-host, --direct-datasource-host, --repl-direct-datasource-host | direct-datasource-host, direct-replication-host, repl-direct-datasource-host | Database server hostname
--direct-replication-password, --direct-datasource-password, --repl-direct-datasource-password | direct-datasource-password, direct-replication-password, repl-direct-datasource-password | Password for datasource connection
--direct-replication-port, --direct-datasource-port, --repl-direct-datasource-port | direct-datasource-port, direct-replication-port, repl-direct-datasource-port | Database server port
--direct-replication-user, --direct-datasource-user, --repl-direct-datasource-user | direct-datasource-user, direct-replication-user, repl-direct-datasource-user | Database login for Tungsten
--directory | directory | Set the directory of an existing installation used during fetching an existing configuration
--disable-relay-logs, --repl-disable-relay-logs | disable-relay-logs, repl-disable-relay-logs | Disable the use of relay-logs?
--disable-security-controls | disable-security-controls | Disables all forms of security, including SSL, TLS and authentication
--disable-slave-extractor, --repl-disable-slave-extractor | disable-slave-extractor, repl-disable-slave-extractor | Should replica servers support the primary role?
--drop-static-columns-in-updates | drop-static-columns-in-updates | This will modify UPDATE transactions in row-based replication and eliminate any columns that were not modified.
--enable-active-witnesses, --active-witnesses | active-witnesses, enable-active-witnesses | Enable active witness hosts
--enable-batch-master | enable-batch-master | Enable batch operation for the primary
--enable-batch-service | enable-batch-service | Enables batch mode for a service
--enable-batch-slave | enable-batch-slave | Enable batch operation for the Replica
--enable-heterogeneous-master | enable-heterogeneous-master | Enable heterogeneous operation for the primary
--enable-heterogeneous-service | enable-heterogeneous-service | Enable heterogeneous operation
--enable-heterogeneous-slave | enable-heterogeneous-slave | Enable heterogeneous operation for the replica
--enable-jgroups-ssl, --jgroups-ssl | enable-jgroups-ssl, jgroups-ssl | Enable SSL encryption of JGroups communication on this host
--enable-rmi-authentication, --rmi-authentication | enable-rmi-authentication, rmi-authentication | Enable RMI authentication for the services running on this host
--enable-rmi-ssl, --rmi-ssl | enable-rmi-ssl, rmi-ssl | Enable SSL encryption of RMI communication on this host
--enable-slave-thl-listener, --repl-enable-slave-thl-listener | enable-slave-thl-listener, repl-enable-slave-thl-listener | Should this service allow THL connections?
--enable-sudo-access, --root-command-prefix | enable-sudo-access, root-command-prefix | Run root commands using sudo
--enable-thl-ssl, --repl-enable-thl-ssl, --thl-ssl | enable-thl-ssl, repl-enable-thl-ssl, thl-ssl | Enable SSL encryption of THL communication for this service
--executable-prefix | executable-prefix | Adds a prefix to command aliases
--file-protection-level | file-protection-level | Protection level for Continuent files
--file-protection-umask | file-protection-umask | Protection umask for Continuent files
--host-name | host-name | DNS hostname
--hosts | hosts | Limit the command to the hosts listed. You must use the hostname as it appears in the configuration.
--hub, --dataservice-hub-host | dataservice-hub-host, hub | What is the hub host for this all-masters dataservice?
--hub-service, --dataservice-hub-service | dataservice-hub-service, hub-service | The data service to use for the hub of a star topology
--install | install | Install service start scripts
--install-directory, --home-directory | home-directory, install-directory | Installation directory
--java-enable-concurrent-gc, --repl-java-enable-concurrent-gc | java-enable-concurrent-gc, repl-java-enable-concurrent-gc | Replicator Java uses concurrent garbage collection
--java-external-lib-dir, --repl-java-external-lib-dir | java-external-lib-dir, repl-java-external-lib-dir | Directory for 3rd party Jar files required by replicator
--java-file-encoding, --repl-java-file-encoding | java-file-encoding, repl-java-file-encoding | Java platform charset (esp. for heterogeneous replication)
--java-jgroups-key | java-jgroups-key | The alias to use for the JGroups TLS key in the keystore.
--java-jgroups-keystore-path | java-jgroups-keystore-path | Local path to the JGroups Java Keystore file.
--java-jmxremote-access-path | java-jmxremote-access-path | Local path to the Java JMX Remote Access file.
--java-keystore-password | java-keystore-password | Set the password for unlocking the tungsten_keystore.jks file in the security directory. Specific for intra cluster communication.
--java-keystore-path | java-keystore-path | Local path to the Java Keystore file. Specific for intra cluster communication. NOTE: When java-keystore-path is passed to tpm, the keystore must contain both tls and mysql certs when appropriate. tpm will NOT add mysql cert nor generate tls cert when this flag is found, so both certs must be manually imported already.
--java-mem-size, --repl-java-mem-size | java-mem-size, repl-java-mem-size | Replicator Java heap memory size in Mb (min 128)
--java-passwordstore-path | java-passwordstore-path | Local path to the Java Password Store file.
--java-tls-alias | java-tls-alias | The alias to use for the TLS key/certificate in the keystore and truststore.
--java-tls-key-lifetime | java-tls-key-lifetime | Lifetime for the Java TLS key
--java-tls-keystore-path | java-tls-keystore-path | The keystore holding a certificate to use for all Continuent TLS encryption.
--java-truststore-password | java-truststore-password | The password for unlocking the tungsten_truststore.jks file in the security directory
--java-truststore-path | java-truststore-path | Local path to the Java Truststore file.
--java-user-timezone, --repl-java-user-timezone | java-user-timezone, repl-java-user-timezone | Java VM Timezone (esp. for cross-site replication)
--log | log | Write all messages, visible and hidden, to this file. You may specify a filename, 'pid' or 'timestamp'.
--log-slave-updates | log-slave-updates | Should replicas log updates to binlog
--master, --dataservice-master-host, --masters, --relay | dataservice-master-host, master, masters, relay | Hostname of the primary (or relay) host within this service
--master-preferred-role, --repl-master-preferred-role | master-preferred-role, repl-master-preferred-role | Preferred role for primary THL when connecting as a replica
--master-services, --dataservice-master-services | dataservice-master-services, master-services | Data service names that should be used on each Primary
--master-thl-host | master-thl-host | Primary THL Hostname
--master-thl-port | master-thl-port | Primary THL Port
--members, --dataservice-hosts | dataservice-hosts, members | Hostnames for the dataservice members
--metadata-directory, --repl-metadata-directory | metadata-directory, repl-metadata-directory | Replicator metadata directory
--mysql-allow-intensive-checks | mysql-allow-intensive-checks | For MySQL installation, enables detailed checks on the supported data types within the MySQL database to confirm compatibility.
--mysql-driver | mysql-driver | MySQL Driver Vendor
--mysql-enable-ansiquotes, --repl-mysql-enable-ansiquotes | mysql-enable-ansiquotes, repl-mysql-enable-ansiquotes | Enables ANSI_QUOTES mode for incoming events?
--mysql-enable-enumtostring, --repl-mysql-enable-enumtostring | mysql-enable-enumtostring, repl-mysql-enable-enumtostring | Enable a filter to convert ENUM values to strings
--mysql-enable-noonlykeywords, --repl-mysql-enable-noonlykeywords | mysql-enable-noonlykeywords, repl-mysql-enable-noonlykeywords | Enables a filter to translate DELETE FROM ONLY to DELETE FROM and UPDATE ONLY to UPDATE.
--mysql-enable-settostring, --repl-mysql-enable-settostring | mysql-enable-settostring, repl-mysql-enable-settostring | Enable a filter to convert SET types to strings
--mysql-ro-slave, --repl-mysql-ro-slave | mysql-ro-slave, repl-mysql-ro-slave | Replicas are read-only?
--mysql-server-id, --repl-mysql-server-id | mysql-server-id, repl-mysql-server-id | Explicitly set the MySQL server ID
--mysql-use-bytes-for-string, --repl-mysql-use-bytes-for-string | mysql-use-bytes-for-string, repl-mysql-use-bytes-for-string | Transfer strings as their byte representation?
--mysql-xtrabackup-dir, --repl-mysql-xtrabackup-dir | mysql-xtrabackup-dir, repl-mysql-xtrabackup-dir | Directory to use for storing xtrabackup full & incremental backups
--native-slave-takeover, --repl-native-slave-takeover | native-slave-takeover, repl-native-slave-takeover | Takeover native replication
--no-connectors | no-connectors | When issued during an update, connectors will not be restarted. Restart of the connectors will then need to be performed manually for updates to take effect.
--no-deployment | no-deployment | Skip deployment steps that create the install directory
--no-validation | no-validation | Skip validation checks that run on each host
--optimize-row-events | optimize-row-events | Enables or disables optimized row updates. Enabled by default.
--postgresql-dbname, --repl-postgresql-dbname | postgresql-dbname, repl-postgresql-dbname | Name of the database to replicate
--preferred-path | preferred-path | Additional command path
--prefetch-enabled | prefetch-enabled | Should the replicator service be setup as a prefetch applier
--prefetch-max-time-ahead | prefetch-max-time-ahead | Maximum number of seconds that the prefetch applier can get in front of the standard applier
--prefetch-min-time-ahead | prefetch-min-time-ahead | Minimum number of seconds that the prefetch applier must be in front of the standard applier
--prefetch-schema | prefetch-schema | Schema to watch for timing prefetch progress
--prefetch-sleep-time | prefetch-sleep-time | How long to wait when the prefetch applier gets too far ahead
--privileged-master | privileged-master | Does the login for the Primary database service have superuser privileges
--privileged-slave | privileged-slave | Does the login for the Replica database service have superuser privileges
--profile-script | profile-script | Append commands to include env.sh in this profile script
--protect-configuration-files | protect-configuration-files | When enabled, configuration files are protected to be only readable and updatable by the configured user
--redshift-dbname, --repl-redshift-dbname | redshift-dbname, repl-redshift-dbname | Name of the Redshift database to replicate into
--relay-directory, --repl-relay-directory | relay-directory, repl-relay-directory | Directory for logs transferred from the Primary
--relay-enabled | relay-enabled | Should the replicator service be setup as a relay.
--relay-source, --dataservice-relay-source, --master-dataservice | dataservice-relay-source, master-dataservice, relay-source | Dataservice name to use as a relay source
--repl-allow-bidi-unsafe | repl-allow-bidi-unsafe | Allow unsafe SQL from remote service
--replace-tls-certificate | replace-tls-certificate | Replace the TLS certificate
--replication-host, --datasource-host, --repl-datasource-host | datasource-host, repl-datasource-host, replication-host | Hostname of the datasource
--replication-password, --datasource-password, --repl-datasource-password | datasource-password, repl-datasource-password, replication-password | Database password
--replication-port, --datasource-port, --repl-datasource-port | datasource-port, repl-datasource-port, replication-port | Database network port
--replication-user, --datasource-user, --repl-datasource-user | datasource-user, repl-datasource-user, replication-user | User for database connection
--reset | reset | Clear the current configuration before processing any arguments
--rmi-port, --repl-rmi-port | repl-rmi-port, rmi-port | Replication RMI listen port
--rmi-user | rmi-user | The username for RMI authentication
--role, --repl-role | repl-role, role | What is the replication role for this service?
--security-directory | security-directory | Storage directory for the Java security/encryption files
--service-alias, --dataservice-service-alias | dataservice-service-alias, service-alias | Replication alias of this dataservice
--service-name | service-name | Set the service name
--service-type, --repl-service-type | repl-service-type, service-type | What is the replication service type?
--skip-statemap | skip-statemap | Do not copy the cluster-home/conf/statemap.properties from the previous install
--slaves, --dataservice-slaves, --members | dataservice-slaves, members, slaves | What are the Replicas for this dataservice?
--start | start | Start the services after configuration
--start-and-report | start-and-report | Start the services and report out the status after configuration
--svc-allow-any-remote-service, --repl-svc-allow-any-remote-service | repl-svc-allow-any-remote-service, svc-allow-any-remote-service | Replicate from any service
--svc-applier-block-commit-interval, --repl-svc-applier-block-commit-interval | repl-svc-applier-block-commit-interval, svc-applier-block-commit-interval | Minimum interval between commits
--svc-applier-block-commit-size, --repl-svc-applier-block-commit-size | repl-svc-applier-block-commit-size, svc-applier-block-commit-size | Applier block commit size (min 1)
--svc-applier-filters, --repl-svc-applier-filters | repl-svc-applier-filters, svc-applier-filters | Replication service applier filters
--svc-extractor-filters, --repl-svc-extractor-filters | repl-svc-extractor-filters, svc-extractor-filters | Replication service extractor filters
--svc-fail-on-zero-row-update, --repl-svc-fail-on-zero-row-update | repl-svc-fail-on-zero-row-update, svc-fail-on-zero-row-update | How should the replicator behave when a Row-Based Replication UPDATE or DELETE does not affect any rows.
--svc-parallelization-type, --repl-svc-parallelization-type | repl-svc-parallelization-type, svc-parallelization-type | Method for implementing parallel apply
--svc-remote-filters, --repl-svc-remote-filters | repl-svc-remote-filters, svc-remote-filters | Replication service remote download filters
--svc-reposition-on-source-id-change, --repl-svc-reposition-on-source-id-change | repl-svc-reposition-on-source-id-change, svc-reposition-on-source-id-change | The Primary will come ONLINE from the current position if the stored source_id does not match the value in the static properties
--svc-shard-default-db, --repl-svc-shard-default-db | repl-svc-shard-default-db, svc-shard-default-db | Mode for setting the shard ID from the default db
--svc-systemd-config-replicator | svc-systemd-config-replicator | Used to provide custom systemd configuration that will be used in place of the default generated one.
--svc-table-engine, --repl-svc-table-engine | repl-svc-table-engine, svc-table-engine | Replication service table engine
--svc-thl-filters, --repl-svc-thl-filters | repl-svc-thl-filters, svc-thl-filters | Replication service THL filters
--target-dataservice, --slave-dataservice | slave-dataservice, target-dataservice | Dataservice to use to determine the value of host configuration
--temp-directory | temp-directory | Temporary Directory
--template-file | template-file | Display the keys that may be used in configuration template files
--template-search-path | template-search-path | Adds a new template search path for configuration file generation
--thl-directory, --repl-thl-directory | repl-thl-directory, thl-directory | Replicator log directory
--thl-do-checksum, --repl-thl-do-checksum | repl-thl-do-checksum, thl-do-checksum | Execute checksum operations on THL log files
--thl-interface, --repl-thl-interface | repl-thl-interface, thl-interface | Listen interface to use for THL operations
--thl-log-connection-timeout, --repl-thl-log-connection-timeout | repl-thl-log-connection-timeout, thl-log-connection-timeout | Number of seconds to wait for a connection to the THL log
--thl-log-file-size, --repl-thl-log-file-size | repl-thl-log-file-size, thl-log-file-size | File size in bytes for THL disk logs
--thl-log-fsync, --repl-thl-log-fsync | repl-thl-log-fsync, thl-log-fsync | Fsync THL records on commit. More reliable operation but adds latency to replication when using low-performance storage
--thl-log-retention, --repl-thl-log-retention | repl-thl-log-retention, thl-log-retention | How long do you want to keep THL files.
--thl-port, --repl-thl-port | repl-thl-port, thl-port | Port to use for THL Operations
--thl-protocol, --repl-thl-protocol | repl-thl-protocol, thl-protocol | Protocol to use for THL communication with this service
--topology, --dataservice-topology | dataservice-topology, topology | Replication topology for the dataservice.
--track-schema-changes | track-schema-changes | This will enable filters that track DDL statements and write the resulting change to files on Replica hosts. The feature is intended for use in some batch deployments.
--user | user | System User
--vertica-dbname, --repl-vertica-dbname | repl-vertica-dbname, vertica-dbname | Name of the database to replicate into
--witnesses, --dataservice-witnesses | dataservice-witnesses, witnesses | Witness hosts for the dataservice

9.8.1.  A tpm Options

--auto-enable

Option--auto-enable
Aliases--repl-auto-enable
Config File Optionsauto-enable, repl-auto-enable
DescriptionAuto-enable services after start-up
Value Typeboolean
Valid Valuesfalse 
 true 

--auto-recovery-delay-interval

Option--auto-recovery-delay-interval
Aliases--repl-auto-recovery-delay-interval
Config File Optionsauto-recovery-delay-interval, repl-auto-recovery-delay-interval
DescriptionDelay (in seconds) between going OFFLINE and attempting to go ONLINE
Value Typenumeric
Default5

The delay between the replicator identifying that autorecovery is needed, and autorecovery being attempted. For busy MySQL installations, larger numbers may be needed to allow time for MySQL servers to restart or recover from their failure.

--auto-recovery-max-attempts

Option--auto-recovery-max-attempts
Aliases--repl-auto-recovery-max-attempts
Config File Optionsauto-recovery-max-attempts, repl-auto-recovery-max-attempts
DescriptionMaximum number of attempts at automatic recovery
Value Typenumeric
Default0

Specifies the number of attempts the replicator will make to go back online. When the number of attempts has been reached, the replicator will remain in the OFFLINE state.

Autorecovery is not enabled until the value of this parameter is set to a non-zero value. The state of autorecovery can be determined using the autoRecoveryEnabled status parameter. The number of attempts made to autorecover can be tracked using the autoRecoveryTotal status parameter.
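
For example, the current autorecovery state and counters can be checked from the standard trepctl status output (a minimal sketch; the exact output layout shown is illustrative):

shell> trepctl status | grep -i autorecovery
autoRecoveryEnabled    : true
autoRecoveryTotal      : 2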

--auto-recovery-reset-interval

Option--auto-recovery-reset-interval
Aliases--repl-auto-recovery-reset-interval
Config File Optionsauto-recovery-reset-interval, repl-auto-recovery-reset-interval
DescriptionDelay (in seconds) before autorecovery is deemed to have succeeded
Value Typenumeric
Default5

The time in ONLINE state that indicates to the replicator that the autorecovery procedure has succeeded. For servers with very large transactions, this value should be increased to allow the transaction to be successfully applied.
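
The three auto-recovery options are typically set together. A minimal tungsten.ini sketch (the service name alpha and the values shown are illustrative only):

shell> vi /etc/tungsten/tungsten.ini

[alpha]
...
repl-auto-recovery-max-attempts=5
repl-auto-recovery-delay-interval=30
repl-auto-recovery-reset-interval=300
...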

9.8.2.  B tpm Options

--backup-directory

Option--backup-directory
Aliases--repl-backup-directory
Config File Optionsbackup-directory, repl-backup-directory
DescriptionPermanent backup storage directory
Value Typestring
Default{home directory}/backups

--backup-dump-directory

Option--backup-dump-directory
Aliases--repl-backup-dump-directory
Config File Optionsbackup-dump-directory, repl-backup-dump-directory
DescriptionBackup temporary dump directory
Value Typestring

--backup-method

Option--backup-method
Aliases--repl-backup-method
Config File Optionsbackup-method, repl-backup-method
DescriptionDatabase backup method
Value Typestring
Valid Values ebs-snapshot
 file-copy-snapshot
 mariabackup (use mariabackup; available from v7.0.0 only)
 mariabackup-incremental (use mariabackup incremental; available from v7.0.0 only)
 mysqldump (use mysqldump)
 none
 script (use a custom script)
 xtrabackup (use Percona XtraBackup)
 xtrabackup-full (use Percona XtraBackup Full)
 xtrabackup-incremental (use Percona XtraBackup Incremental)

The default, if not supplied, depends on the environment. During installation, tpm detects which tools are available, favouring xtrabackup-full (or mariabackup-full). If neither is found, mysqldump is used as the default.
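
To fix the backup method explicitly rather than rely on auto-detection, set the option at configuration time; for example (service name and host list are illustrative):

shell> ./tools/tpm update alpha --hosts=host1,host2,host3 \
    --backup-method=xtrabackup-full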

--backup-online

Option--backup-online
Aliases--repl-backup-online
Config File Optionsbackup-online, repl-backup-online
DescriptionDoes the backup script support backing up a datasource while it is ONLINE
Value Typeboolean
Valid Valuesfalse 
 true 

--backup-options

Option--backup-options
Config File Optionsbackup-options
DescriptionSpace separated list of options to pass directly to the underlying backup binary in use.
Value Typestring

Options passed are not validated by Tungsten; they are specific to the backup binary in use, so you must ensure the syntax is accurate.

--backup-retention

Option--backup-retention
Aliases--repl-backup-retention
Config File Optionsbackup-retention, repl-backup-retention
DescriptionNumber of backups to retain
Value Typenumeric

--backup-script

Option--backup-script
Aliases--repl-backup-script
Config File Optionsbackup-script, repl-backup-script
DescriptionWhat is the path to the backup script
Value Typefilename

--batch-enabled

Option--batch-enabled
Config File Optionsbatch-enabled
DescriptionShould the replicator service use a batch applier
Value Typeboolean
Defaultfalse
Valid Valuestrue 

--batch-load-language

Option--batch-load-language
Config File Optionsbatch-load-language
DescriptionWhich script language to use for batch loading
Value Typestring
Valid Values js (JavaScript)
 sql (SQL)

--batch-load-template

Option--batch-load-template
Config File Optionsbatch-load-template
DescriptionValue for the loadBatchTemplate property
Value Typestring

--repl-buffer-size

Option--repl-buffer-size
Config File Optionsrepl-buffer-size
DescriptionReplicator queue size between stages (min 1)
Value Typenumeric
Default10

9.8.3.  C tpm Options

--channels

Option--channels
Aliases--repl-channels
Config File Optionschannels, repl-channels
DescriptionNumber of replication channels to use for parallel apply.
Value Typenumeric
Default1

--cluster-slave-auto-recovery-delay-interval

Option--cluster-slave-auto-recovery-delay-interval
Aliases--cluster-slave-repl-auto-recovery-delay-interval
Config File Optionscluster-slave-auto-recovery-delay-interval, cluster-slave-repl-auto-recovery-delay-interval
DescriptionDefault value for --auto-recovery-delay-interval when --topology=cluster-slave
Value Typestring

--cluster-slave-auto-recovery-max-attempts

Option--cluster-slave-auto-recovery-max-attempts
Aliases--cluster-slave-repl-auto-recovery-max-attempts
Config File Optionscluster-slave-auto-recovery-max-attempts, cluster-slave-repl-auto-recovery-max-attempts
DescriptionDefault value for --auto-recovery-max-attempts when --topology=cluster-slave
Value Typestring

--cluster-slave-auto-recovery-reset-interval

Option--cluster-slave-auto-recovery-reset-interval
Aliases--cluster-slave-repl-auto-recovery-reset-interval
Config File Optionscluster-slave-auto-recovery-reset-interval, cluster-slave-repl-auto-recovery-reset-interval
DescriptionDefault value for --auto-recovery-reset-interval when --topology=cluster-slave
Value Typestring

--config-file

Option--config-file
Config File Optionsconfig-file
DescriptionDisplay help information for content of the config file
Value Typestring

--consistency-policy

Option--consistency-policy
Aliases--repl-consistency-policy
Config File Optionsconsistency-policy, repl-consistency-policy
DescriptionShould the replicator stop or warn if a consistency check fails?
Value Typestring

9.8.4.  D tpm Options

--dataservice-name

Option--dataservice-name
Config File Optionsdataservice-name
DescriptionLimit the command to the hosts in this dataservice. Multiple dataservices may be specified as a comma-separated list.
Value Typestring

--dataservice-relay-enabled

Option--dataservice-relay-enabled
Config File Optionsdataservice-relay-enabled
DescriptionMake this dataservice a replica of another
Value Typestring

--dataservice-schema

Option--dataservice-schema
Config File Optionsdataservice-schema
DescriptionThe db schema to hold dataservice details
Value Typestring

--dataservice-thl-port

Option--dataservice-thl-port
Config File Optionsdataservice-thl-port
DescriptionPort to use for THL operations
Value Typenumeric
Default2112

--dataservice-vip-enabled

Option--dataservice-vip-enabled
Config File Optionsdataservice-vip-enabled
DescriptionIs VIP management enabled?
Value Typeboolean

--dataservice-vip-ipaddress

Option--dataservice-vip-ipaddress
Config File Optionsdataservice-vip-ipaddress
DescriptionVIP IP address
Value Typestring

--dataservice-vip-netmask

Option--dataservice-vip-netmask
Config File Optionsdataservice-vip-netmask
DescriptionVIP netmask
Value Typestring

--datasource-boot-script

Option--datasource-boot-script
Aliases--repl-datasource-boot-script
Config File Optionsdatasource-boot-script, repl-datasource-boot-script
DescriptionDatabase start script
Value Typestring

--datasource-enable-ssl

Option--datasource-enable-ssl
Aliases--repl-datasource-enable-ssl
Config File Optionsdatasource-enable-ssl, repl-datasource-enable-ssl
DescriptionEnable SSL connection to DBMS server
Value Typeboolean

--datasource-log-directory

Option--datasource-log-directory
Aliases--repl-datasource-log-directory
Config File Optionsdatasource-log-directory, repl-datasource-log-directory
DescriptionPrimary log directory
Value Typestring

--datasource-log-pattern

Option--datasource-log-pattern
Aliases--repl-datasource-log-pattern
Config File Optionsdatasource-log-pattern, repl-datasource-log-pattern
DescriptionPrimary log filename pattern
Value Typestring

--datasource-mysql-conf

Option--datasource-mysql-conf
Aliases--repl-datasource-mysql-conf
Config File Optionsdatasource-mysql-conf, repl-datasource-mysql-conf
DescriptionMySQL config file
Value Typestring

--datasource-mysql-data-directory

--datasource-mysql-ibdata-directory

--datasource-mysql-iblog-directory

--datasource-mysql-ssl-ca

Option--datasource-mysql-ssl-ca
Aliases--repl-datasource-mysql-ssl-ca
Config File Optionsdatasource-mysql-ssl-ca, repl-datasource-mysql-ssl-ca
DescriptionMySQL SSL CA file
Value Typestring

--datasource-mysql-ssl-cert

Option--datasource-mysql-ssl-cert
Aliases--repl-datasource-mysql-ssl-cert
Config File Optionsdatasource-mysql-ssl-cert, repl-datasource-mysql-ssl-cert
DescriptionMySQL SSL certificate file
Value Typestring

--datasource-mysql-ssl-key

Option--datasource-mysql-ssl-key
Aliases--repl-datasource-mysql-ssl-key
Config File Optionsdatasource-mysql-ssl-key, repl-datasource-mysql-ssl-key
DescriptionMySQL SSL key file
Value Typestring

--datasource-oracle-service

Option--datasource-oracle-service
Aliases--repl-datasource-oracle-service
Config File Optionsdatasource-oracle-service, repl-datasource-oracle-service
DescriptionOracle Service Name
Value Typestring

--datasource-systemctl-service

Option--datasource-systemctl-service
Aliases--repl-datasource-systemctl-service
Config File Optionsdatasource-systemctl-service, repl-datasource-systemctl-service
DescriptionDatabase systemctl script
Value Typestring

Specifies the command name or full path of the command used to control the database service, including startup, shutdown and restart. Tungsten uses this to control the underlying database service. By default, it is configured according to your environment, based on the service detected during installation; for example, the service command or /etc/init.d/mysql.

--datasource-type

Option--datasource-type
Aliases--repl-datasource-type
Config File Optionsdatasource-type, repl-datasource-type
DescriptionDatabase type
Value Typestring
Defaultmysql
Valid Values file (File)
 hdfs (HDFS/Hadoop)
 kafka (Kafka)
 mongodb (MongoDB)
 mysql (MySQL)
 oracle (Oracle)
 postgres (PostgreSQL)
 vertica (Vertica)

--delete

Option--delete
Config File Optionsdelete
DescriptionDelete the named data service from the configuration
Value Typestring

--direct-datasource-log-directory

--direct-datasource-log-pattern

Option--direct-datasource-log-pattern
Aliases--repl-direct-datasource-log-pattern
Config File Optionsdirect-datasource-log-pattern, repl-direct-datasource-log-pattern
DescriptionPrimary log filename pattern
Value Typestring

--direct-datasource-type

Option--direct-datasource-type
Aliases--repl-direct-datasource-type
Config File Optionsdirect-datasource-type, repl-direct-datasource-type
DescriptionDatabase type
Value Typestring
Defaultmysql
Valid Values file (File)
 hdfs (HDFS/Hadoop)
 mongodb (MongoDB)
 mysql (MySQL)
 oracle (Oracle)
 vertica (Vertica)

--direct-replication-host

--direct-replication-password

--direct-replication-port

--direct-replication-user

--directory

Option--directory
Config File Optionsdirectory
DescriptionSet the directory of an existing installation used when fetching an existing configuration

--disable-relay-logs

Option--disable-relay-logs
Aliases--repl-disable-relay-logs
Config File Optionsdisable-relay-logs, repl-disable-relay-logs
DescriptionDisable the use of relay-logs?
Value Typeboolean
Defaulttrue
Valid Valuesfalse 
 true 

--disable-security-controls

Option--disable-security-controls
Config File Optionsdisable-security-controls
DescriptionDisables all forms of security, including SSL, TLS and authentication
Value Typestring
Valid Values false (default in version 7.x)
 true (default in versions 5.x and 6.x)

--disable-slave-extractor

Option--disable-slave-extractor
Aliases--repl-disable-slave-extractor
Config File Optionsdisable-slave-extractor, repl-disable-slave-extractor
DescriptionShould replica servers support the primary role?
Value Typestring

--drop-static-columns-in-updates

Option--drop-static-columns-in-updates
Config File Optionsdrop-static-columns-in-updates
DescriptionThis will modify UPDATE transactions in row-based replication and eliminate any columns that were not modified.
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

9.8.5.  E tpm Options

--enable-active-witnesses

Option--enable-active-witnesses
Aliases--active-witnesses
Config File Optionsactive-witnesses, enable-active-witnesses
DescriptionEnable active witness hosts
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

--enable-batch-master

Option--enable-batch-master
Config File Optionsenable-batch-master
DescriptionEnable batch operation for the primary
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

--enable-batch-service

Option--enable-batch-service
Config File Optionsenable-batch-service
DescriptionEnables batch mode for a service
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

This option enables batch mode for a service, ensuring that replication services writing to a target database use batch mode, as required in heterogeneous deployments (for example, Hadoop, Amazon Redshift or Vertica). Setting this option enables a number of related settings on each host.

--enable-batch-slave

Option--enable-batch-slave
Config File Optionsenable-batch-slave
DescriptionEnable batch operation for the Replica
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

--enable-heterogeneous-master

Option--enable-heterogeneous-master
Config File Optionsenable-heterogeneous-master
DescriptionEnable heterogeneous operation for the primary
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

--enable-heterogeneous-service

Option--enable-heterogeneous-service
Config File Optionsenable-heterogeneous-service
DescriptionEnable heterogeneous operation
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

--enable-heterogeneous-slave

Option--enable-heterogeneous-slave
Config File Optionsenable-heterogeneous-slave
DescriptionEnable heterogeneous operation for the replica
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

--enable-jgroups-ssl

Option--enable-jgroups-ssl
Aliases--jgroups-ssl
Config File Optionsenable-jgroups-ssl, jgroups-ssl
DescriptionEnable SSL encryption of JGroups communication on this host
Value Typeboolean

--enable-rmi-authentication

Option--enable-rmi-authentication
Aliases--rmi-authentication
Config File Optionsenable-rmi-authentication, rmi-authentication
DescriptionEnable RMI authentication for the services running on this host
Value Typeboolean

--enable-rmi-ssl

Option--enable-rmi-ssl
Aliases--rmi-ssl
Config File Optionsenable-rmi-ssl, rmi-ssl
DescriptionEnable SSL encryption of RMI communication on this host
Value Typeboolean

--enable-slave-thl-listener

Option--enable-slave-thl-listener
Aliases--repl-enable-slave-thl-listener
Config File Optionsenable-slave-thl-listener, repl-enable-slave-thl-listener
DescriptionShould this service allow THL connections?
Value Typeboolean

--enable-sudo-access

Option--enable-sudo-access
Aliases--root-command-prefix
Config File Optionsenable-sudo-access, root-command-prefix
DescriptionRun root commands using sudo
Value Typeboolean
Valid Valuesfalse 
 true 

The default behavior of this property differs between install types:

For a cluster node install that INCLUDES the manager process, AND where the install OS user is NOT root, the default will be true.

For a Replicator-only or Connector-only install, the default will be false.

When set to true, the property has the following effect:

  • During staging tpm installs, when the tungsten user is different from the ssh user on remote hosts

  • All startup scripts (replicator, connector, manager) call systemctl prefixed with sudo when using systemctl, for example: sudo -n systemctl start treplicator

  • The tprovision script (tps.pl) requires sudo access for mysql and xtrabackup calls

  • The replicator backup script uses sudo for xtrabackup and other backup utilities

  • The check_tungsten.sh utility uses sudo to call xinetd

  • tmonitor starts the exporter service with sudo

  • The manager uses sudo to restart the mysql service when it is found stopped

  • The tpm diagnostic operation (tpm diag) uses sudo
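
Where sudo access is required, the configured OS user needs passwordless sudo. A minimal /etc/sudoers.d entry, assuming the OS user is tungsten (a sketch only; narrow the command list according to your security policy):

tungsten ALL=(ALL) NOPASSWD: ALL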

--enable-thl-ssl

Option--enable-thl-ssl
Aliases--repl-enable-thl-ssl, --thl-ssl
Config File Optionsenable-thl-ssl, repl-enable-thl-ssl, thl-ssl
DescriptionEnable SSL encryption of THL communication for this service
Value Typeboolean

--executable-prefix

Option--executable-prefix
Config File Optionsexecutable-prefix
DescriptionAdds a prefix to command aliases
Value Typestring

When enabled, the supplied prefix is added to each command alias that is generated for a given installation. This enables multiple installations to co-exist and be accessible through a unique alias. For example, if the executable prefix is configured as east, then an alias east_trepctl will be created for trepctl.

Alias information for executable prefix data is stored within the $CONTINUENT_ROOT/share/aliases.sh file for each installation.
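
For example, after installing with the prefix east, the aliases can be loaded into the current shell and used directly (a sketch assuming the default /opt/continuent installation directory):

shell> source /opt/continuent/share/aliases.sh
shell> east_trepctl status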

9.8.6.  F tpm Options

--file-protection-level

Option--file-protection-level
Config File Optionsfile-protection-level
DescriptionProtection level for Continuent files
Value Typestring

--file-protection-umask

Option--file-protection-umask
Config File Optionsfile-protection-umask
DescriptionProtection umask for Continuent files
Value Typestring

9.8.7.  H tpm Options

--host-name

Option--host-name
Config File Optionshost-name
DescriptionDNS hostname
Value Typestring

--hosts

Option--hosts
Config File Optionshosts
DescriptionLimit the command to the hosts listed You must use the hostname as it appears in the configuration.
Value Typestring

--hub

Option--hub
Aliases--dataservice-hub-host
Config File Optionsdataservice-hub-host, hub
DescriptionWhat is the hub host for this all-masters dataservice?
Value Typestring

--hub-service

Option--hub-service
Aliases--dataservice-hub-service
Config File Optionsdataservice-hub-service, hub-service
DescriptionThe data service to use for the hub of a star topology
Value Typestring

9.8.8.  I tpm Options

--install

Option--install
Config File Optionsinstall
DescriptionInstall service start scripts
Value Typestring

--install-directory

Option--install-directory
Aliases--home-directory
Config File Optionshome-directory, install-directory
DescriptionInstallation directory
Value Typestring

Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

9.8.9.  J tpm Options

--java-enable-concurrent-gc

Option--java-enable-concurrent-gc
Aliases--repl-java-enable-concurrent-gc
Config File Optionsjava-enable-concurrent-gc, repl-java-enable-concurrent-gc
DescriptionReplicator Java uses concurrent garbage collection
Value Typeboolean

--java-external-lib-dir

Option--java-external-lib-dir
Aliases--repl-java-external-lib-dir
Config File Optionsjava-external-lib-dir, repl-java-external-lib-dir
DescriptionDirectory for 3rd party Jar files required by replicator
Value Typestring

--java-file-encoding

Option--java-file-encoding
Aliases--repl-java-file-encoding
Config File Optionsjava-file-encoding, repl-java-file-encoding
DescriptionJava platform charset (esp. for heterogeneous replication)
Value Typestring

--java-jgroups-key

Option--java-jgroups-key
Config File Optionsjava-jgroups-key
DescriptionThe alias to use for the JGroups TLS key in the keystore.
Value Typestring

--java-jgroups-keystore-path

Option--java-jgroups-keystore-path
Config File Optionsjava-jgroups-keystore-path
DescriptionLocal path to the JGroups Java Keystore file.
Value Typefilename

--java-jmxremote-access-path

Option--java-jmxremote-access-path
Config File Optionsjava-jmxremote-access-path
DescriptionLocal path to the Java JMX Remote Access file.
Value Typefilename

--java-keystore-password

Option--java-keystore-password
Config File Optionsjava-keystore-password
DescriptionSet the password for unlocking the tungsten_keystore.jks file in the security directory. Specific for intra cluster communication.
Value Typestring

--java-keystore-path

Option--java-keystore-path
Config File Optionsjava-keystore-path
DescriptionLocal path to the Java Keystore file. Specific to intra-cluster communication. NOTE: When java-keystore-path is passed to tpm, the keystore must contain both the tls and mysql certs when appropriate. tpm will NOT add the mysql cert nor generate the tls cert when this flag is found, so both certs must already have been imported manually.
Value Typefilename
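
For example, a certificate can be imported into the keystore manually with keytool before running tpm (a sketch; the alias, file names and password are illustrative):

shell> keytool -import -trustcacerts -alias mysql \
    -file /etc/mysql/certs/ca.pem \
    -keystore /home/tungsten/tungsten_keystore.jks \
    -storepass mykeystorepass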

--java-mem-size

Option--java-mem-size
Aliases--repl-java-mem-size
Config File Optionsjava-mem-size, repl-java-mem-size
DescriptionReplicator Java heap memory size in Mb (min 128)
Value Typenumeric

--java-passwordstore-path

Option--java-passwordstore-path
Config File Optionsjava-passwordstore-path
DescriptionLocal path to the Java Password Store file.
Value Typefilename

--java-tls-alias

Option--java-tls-alias
Config File Optionsjava-tls-alias
DescriptionThe alias to use for the TLS key/certificate in the keystore and truststore.
Value Typestring

--java-tls-key-lifetime

Option--java-tls-key-lifetime
Config File Optionsjava-tls-key-lifetime
DescriptionLifetime for the Java TLS key
Value Typenumeric

--java-tls-keystore-path

Option--java-tls-keystore-path
Config File Optionsjava-tls-keystore-path
DescriptionThe keystore holding a certificate to use for all Continuent TLS encryption.
Value Typestring

--java-truststore-password

Option--java-truststore-password
Config File Optionsjava-truststore-password
DescriptionThe password for unlocking the tungsten_truststore.jks file in the security directory
Value Typestring

--java-truststore-path

Option--java-truststore-path
Config File Optionsjava-truststore-path
DescriptionLocal path to the Java Truststore file.
Value Typefilename

--java-user-timezone

Option--java-user-timezone
Aliases--repl-java-user-timezone
Config File Optionsjava-user-timezone, repl-java-user-timezone
DescriptionJava VM Timezone (esp. for cross-site replication)
Value Typenumeric

9.8.10.  L tpm Options

--log

Option--log
Config File Optionslog
DescriptionWrite all messages, visible and hidden, to this file. You may specify a filename, 'pid' or 'timestamp'.
Value Typestring

--log-slave-updates

Option--log-slave-updates
Config File Optionslog-slave-updates
DescriptionShould replicas log updates to binlog
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

9.8.11.  M tpm Options

--master

Option--master
Aliases--dataservice-master-host, --masters, --relay
Config File Optionsdataservice-master-host, master, masters, relay
DescriptionHostname of the primary (or relay) host within this service
Value Typestring

The hostname of the primary (extractor) within the current service.

--master-preferred-role

Option--master-preferred-role
Aliases--repl-master-preferred-role
Config File Optionsmaster-preferred-role, repl-master-preferred-role
DescriptionPreferred role for primary THL when connecting as a replica
Value Typestring
Valid Values master (Primary role)
 slave (Replica role)

--master-services

Option--master-services
Aliases--dataservice-master-services
Config File Optionsdataservice-master-services, master-services
DescriptionData service names that should be used on each Primary
Value Typestring

--master-thl-host

Option--master-thl-host
Config File Optionsmaster-thl-host
DescriptionPrimary THL Hostname
Value Typestring

--master-thl-port

Option--master-thl-port
Config File Optionsmaster-thl-port
DescriptionPrimary THL Port
Value Typestring

--members

Option--members
Aliases--dataservice-hosts
Config File Optionsdataservice-hosts, members
DescriptionHostnames for the dataservice members
Value Typestring

--metadata-directory

Option--metadata-directory
Aliases--repl-metadata-directory
Config File Optionsmetadata-directory, repl-metadata-directory
DescriptionReplicator metadata directory
Value Typestring

--mysql-allow-intensive-checks

Option--mysql-allow-intensive-checks
Config File Optionsmysql-allow-intensive-checks
DescriptionFor MySQL installation, enables detailed checks on the supported data types within the MySQL database to confirm compatibility.
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

For MySQL installation, enables detailed checks on the supported data types within the MySQL database to confirm compatibility. This includes checking each table definition individually for any unsupported data types.

--mysql-driver

Option--mysql-driver
Config File Optionsmysql-driver
DescriptionMySQL Driver Vendor
Value Typestring
Defaultdrizzle

--mysql-enable-ansiquotes

Option--mysql-enable-ansiquotes
Aliases--repl-mysql-enable-ansiquotes
Config File Optionsmysql-enable-ansiquotes, repl-mysql-enable-ansiquotes
DescriptionEnables ANSI_QUOTES mode for incoming events?
Value Typeboolean

--mysql-enable-enumtostring

Option--mysql-enable-enumtostring
Aliases--repl-mysql-enable-enumtostring
Config File Optionsmysql-enable-enumtostring, repl-mysql-enable-enumtostring
DescriptionEnable a filter to convert ENUM values to strings
Value Typeboolean

--mysql-enable-noonlykeywords

Option--mysql-enable-noonlykeywords
Aliases--repl-mysql-enable-noonlykeywords
Config File Optionsmysql-enable-noonlykeywords, repl-mysql-enable-noonlykeywords
DescriptionEnables a filter to translate DELETE FROM ONLY to DELETE FROM and UPDATE ONLY to UPDATE.
Value Typeboolean

--mysql-enable-settostring

Option--mysql-enable-settostring
Aliases--repl-mysql-enable-settostring
Config File Optionsmysql-enable-settostring, repl-mysql-enable-settostring
DescriptionEnable a filter to convert SET types to strings
Value Typeboolean

--mysql-ro-slave

Option--mysql-ro-slave
Aliases--repl-mysql-ro-slave
Config File Optionsmysql-ro-slave, repl-mysql-ro-slave
DescriptionReplicas are read-only?
Value Typeboolean

--mysql-server-id

Option--mysql-server-id
Aliases--repl-mysql-server-id
Config File Optionsmysql-server-id, repl-mysql-server-id
DescriptionExplicitly set the MySQL server ID
Value Typenumeric

Setting this option explicitly sets the server-id information normally located in the MySQL configuration (my.cnf). This is useful in situations where there may be multiple MySQL installations and the server ID needs to be identified to prevent collisions when reading from the same Primary.
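
For example, to set an explicit server ID for a host in an ini deployment (service name and value are illustrative):

[alpha]
...
repl-mysql-server-id=4001
...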

--mysql-use-bytes-for-string

Option--mysql-use-bytes-for-string
Aliases--repl-mysql-use-bytes-for-string
Config File Optionsmysql-use-bytes-for-string, repl-mysql-use-bytes-for-string
DescriptionTransfer strings as their byte representation?
Value Typeboolean

--mysql-xtrabackup-dir

Option--mysql-xtrabackup-dir
Aliases--repl-mysql-xtrabackup-dir
Config File Optionsmysql-xtrabackup-dir, repl-mysql-xtrabackup-dir
DescriptionDirectory to use for storing xtrabackup full & incremental backups
Value Typestring

9.8.12.  N tpm Options

--native-slave-takeover

Option--native-slave-takeover
Aliases--repl-native-slave-takeover
Config File Optionsnative-slave-takeover, repl-native-slave-takeover
DescriptionTakeover native replication
Value Typeboolean

--no-connectors

Option--no-connectors
Config File Optionsno-connectors
DescriptionWhen issued during an update, connectors will not be restarted. The connectors must then be restarted manually for updates to take effect.
Value Typestring

--no-deployment

Option--no-deployment
Config File Optionsno-deployment
DescriptionSkip deployment steps that create the install directory
Value Typestring

--no-validation

Option--no-validation
Config File Optionsno-validation
DescriptionSkip validation checks that run on each host
Value Typestring

9.8.13.  O tpm Options

--optimize-row-events

Option--optimize-row-events
Config File Optionsoptimize-row-events
DescriptionEnables or disables optimized row updates. Enabled by default.
Value Typeboolean
Defaulttrue
Valid Values false (disable)

Bundles multiple row-based events into a single INSERT or DELETE statement. This increases the throughput of large batches of row-based events.
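
For illustration, with this option enabled a run of single-row events against the same table may be applied as one bundled statement (the SQL below is a sketch of the effect, not literal replicator output):

INSERT INTO db.t (id, v) VALUES (1, 'a');
INSERT INTO db.t (id, v) VALUES (2, 'b');
INSERT INTO db.t (id, v) VALUES (3, 'c');

becomes:

INSERT INTO db.t (id, v) VALUES (1, 'a'), (2, 'b'), (3, 'c');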

9.8.14.  P tpm Options

--postgresql-dbname

Option--postgresql-dbname
Aliases--repl-postgresql-dbname
Config File Optionspostgresql-dbname, repl-postgresql-dbname
DescriptionName of the database to replicate
Value Typestring

--preferred-path

Option--preferred-path
Config File Optionspreferred-path
DescriptionAdditional command path
Value Typefilename

Specifies one or more additional directories that will be added before the current PATH environment variable when external commands are run from within the backup environment. This affects all external tools used by Tungsten Cluster, including MySQL, Ruby, Java, and backup/restore tools such as Percona XtraBackup.

One or more paths can be specified by separating each directory with a colon. For example:

shell> tpm ... --preferred-path=/usr/local/bin:/opt/bin:/opt/percona/bin

The --preferred-path information is propagated to all remote servers within the tpm configuration. However, if the staging server is one of the servers to which you are deploying, the PATH must be manually updated.

--prefetch-enabled

Option--prefetch-enabled
Config File Optionsprefetch-enabled
DescriptionShould the replicator service be setup as a prefetch applier
Value Typeboolean

--prefetch-max-time-ahead

Option--prefetch-max-time-ahead
Config File Optionsprefetch-max-time-ahead
DescriptionMaximum number of seconds that the prefetch applier can get in front of the standard applier
Value Typenumeric

--prefetch-min-time-ahead

Option--prefetch-min-time-ahead
Config File Optionsprefetch-min-time-ahead
DescriptionMinimum number of seconds that the prefetch applier must be in front of the standard applier
Value Typenumeric

--prefetch-schema

Option--prefetch-schema
Config File Optionsprefetch-schema
DescriptionSchema to watch for timing prefetch progress
Value Typestring
Defaulttungsten_

--prefetch-sleep-time

Option--prefetch-sleep-time
Config File Optionsprefetch-sleep-time
DescriptionHow long to wait when the prefetch applier gets too far ahead
Value Typestring

--privileged-master

Option--privileged-master
Config File Optionsprivileged-master
DescriptionDoes the login for the Primary database service have superuser privileges
Value Typeboolean

--privileged-slave

Option--privileged-slave
Config File Optionsprivileged-slave
DescriptionDoes the login for the Replica database service have superuser privileges
Value Typeboolean

--profile-script

Option--profile-script
Config File Optionsprofile-script
DescriptionAppend commands to include env.sh in this profile script
Value Typestring

--protect-configuration-files

Option--protect-configuration-files
Config File Optionsprotect-configuration-files
DescriptionWhen enabled, configuration files are protected to be only readable and updatable by the configured user
Value Typestring
Valid Values false (make configuration files readable by any user)
 true

When enabled (the default), the configuration files that contain user, password and other sensitive information are set so that they are only readable by the configured user. For example:

shell> ls -al /opt/continuent/tungsten/tungsten-replicator/conf/
total 148
drwxr-xr-x 2 tungsten mysql 4096 May 14 14:32 ./
drwxr-xr-x 11 tungsten mysql 4096 May 14 14:32 ../
-rw-r--r-- 1 tungsten mysql 33 May 14 14:32 dynamic-alpha.role
-rw-r--r-- 1 tungsten mysql 5059 May 14 14:32 log4j.properties
-rw-r--r-- 1 tungsten mysql 3488 May 14 14:32 log4j-thl.properties
-rw-r--r-- 1 tungsten mysql 972 May 14 14:32 mysql-java-charsets.properties
-rw-r--r-- 1 tungsten mysql 420 May 14 14:32 replicator.service.properties
-rw-r----- 1 tungsten mysql 1590 May 14 14:35 services.properties
-rw-r----- 1 tungsten mysql 1590 May 14 14:35 .services.properties.orig
-rw-r--r-- 1 tungsten mysql 896 May 14 14:32 shard.list
-rw-r----- 1 tungsten mysql 43842 May 14 14:35 static-alpha.properties
-rw-r----- 1 tungsten mysql 43842 May 14 14:35 .static-alpha.properties.orig
-rw-r----- 1 tungsten mysql 5667 May 14 14:35 wrapper.conf
-rw-r----- 1 tungsten mysql 5667 May 14 14:35 .wrapper.conf.orig

When disabled, the files are readable by all users:

shell> ll /opt/continuent/tungsten/tungsten-replicator/conf/
total 148
drwxr-xr-x 2 tungsten mysql 4096 May 14 14:32 ./
drwxr-xr-x 11 tungsten mysql 4096 May 14 14:32 ../
-rw-r--r-- 1 tungsten mysql 33 May 14 14:32 dynamic-alpha.role
-rw-r--r-- 1 tungsten mysql 5059 May 14 14:32 log4j.properties
-rw-r--r-- 1 tungsten mysql 3488 May 14 14:32 log4j-thl.properties
-rw-r--r-- 1 tungsten mysql 972 May 14 14:32 mysql-java-charsets.properties
-rw-r--r-- 1 tungsten mysql 420 May 14 14:32 replicator.service.properties
-rw-r--r-- 1 tungsten mysql 1590 May 14 14:32 services.properties
-rw-r--r-- 1 tungsten mysql 1590 May 14 14:32 .services.properties.orig
-rw-r--r-- 1 tungsten mysql 896 May 14 14:32 shard.list
-rw-r--r-- 1 tungsten mysql 43842 May 14 14:32 static-alpha.properties
-rw-r--r-- 1 tungsten mysql 43842 May 14 14:32 .static-alpha.properties.orig
-rw-r--r-- 1 tungsten mysql 5667 May 14 14:32 wrapper.conf
-rw-r--r-- 1 tungsten mysql 5667 May 14 14:32 .wrapper.conf.orig

9.8.15.  R tpm Options

--redshift-dbname

Option--redshift-dbname
Aliases--repl-redshift-dbname
Config File Optionsredshift-dbname, repl-redshift-dbname
DescriptionName of the Redshift database to replicate into
Value Typestring

--relay-directory

Option--relay-directory
Aliases--repl-relay-directory
Config File Optionsrelay-directory, repl-relay-directory
DescriptionDirectory for logs transferred from the Primary
Value Typestring
Default{home directory}/relay

--relay-enabled

Option--relay-enabled
Config File Optionsrelay-enabled
DescriptionShould the replicator service be setup as a relay.
Value Typeboolean

--relay-source

Option--relay-source
Aliases--dataservice-relay-source, --master-dataservice
Config File Optionsdataservice-relay-source, master-dataservice, relay-source
DescriptionDataservice name to use as a relay source
Value Typestring

--repl-allow-bidi-unsafe

Option--repl-allow-bidi-unsafe
Config File Optionsrepl-allow-bidi-unsafe
DescriptionAllow unsafe SQL from remote service
Value Typeboolean
Defaultfalse
Valid Valuesfalse 
 true 

--replace-tls-certificate

Option--replace-tls-certificate
Config File Optionsreplace-tls-certificate
DescriptionReplace the TLS certificate

--replication-host

Option--replication-host
Aliases--datasource-host, --repl-datasource-host
Config File Optionsdatasource-host, repl-datasource-host, replication-host
DescriptionHostname of the datasource
Value Typestring

Hostname of the datasource where the database is located. If the specified hostname matches the current host or member name, the database is assumed to be local. If the hostnames do not match, extraction is assumed to be via remote access. For MySQL hosts, this configures a remote replication Replica (relay) connection.

--replication-password

The password to be used when connecting to the database using the corresponding --replication-user.

--replication-port

Option--replication-port
Aliases--datasource-port, --repl-datasource-port
Config File Optionsdatasource-port, repl-datasource-port, replication-port
DescriptionDatabase network port
Value Typestring
Valid Values 1521 (Oracle default)
 3306 (MySQL default)
 5432 (PostgreSQL default)
 5433 (Vertica default)
 5439 (Redshift default)
 8020 (HDFS default)
 9092 (Kafka default)
 27017 (MongoDB default)

The network port used to connect to the database server. The default port used depends on the database being configured.

--replication-user

Option--replication-user
Aliases--datasource-user, --repl-datasource-user
Config File Optionsdatasource-user, repl-datasource-user, replication-user
DescriptionUser for database connection
Value Typestring

For databases that require authentication, the username to use when connecting to the database using the corresponding connection method (native, JDBC, etc.).
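
A typical set of connection options for a MySQL datasource in an ini deployment (hostname and credentials are illustrative):

[alpha]
...
replication-host=db1
replication-port=3306
replication-user=tungsten
replication-password=secret
...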

--reset

Option--reset
Config File Optionsreset
DescriptionClear the current configuration before processing any arguments
Value Typestring

For staging configurations, deletes all pre-existing configuration information before updating with the new configuration values.

--rmi-port

Option--rmi-port
Aliases--repl-rmi-port
Config File Optionsrepl-rmi-port, rmi-port
DescriptionReplication RMI listen port
Value Typestring
Default10001

--rmi-user

Option--rmi-user
Config File Optionsrmi-user
DescriptionThe username for RMI authentication
Value Typestring

--role

Option--role
Aliases--repl-role
Config File Optionsrepl-role, role
DescriptionWhat is the replication role for this service?
Value Typestring
Valid Valuesmaster 
 relay 
 slave 

9.8.16.  S tpm Options

--security-directory

Option--security-directory
Config File Optionssecurity-directory
DescriptionStorage directory for the Java security/encryption files
Value Typestring

--service-alias

Option--service-alias
Aliases--dataservice-service-alias
Config File Optionsdataservice-service-alias, service-alias
DescriptionReplication alias of this dataservice
Value Typestring

--service-name

Option--service-name
Config File Optionsservice-name
DescriptionSet the service name

--service-type

Option--service-type
Aliases--repl-service-type
Config File Optionsrepl-service-type, service-type
DescriptionWhat is the replication service type?
Value Typestring
Valid Valueslocal 
 remote 

--skip-statemap

Option--skip-statemap
Config File Optionsskip-statemap
DescriptionDo not copy the cluster-home/conf/statemap.properties from the previous install
Value Typeboolean

--slaves

Option--slaves
Aliases--dataservice-slaves, --members
Config File Optionsdataservice-slaves, members, slaves
DescriptionWhat are the Replicas for this dataservice?
Value Typestring

--start

Option--start
Config File Optionsstart
DescriptionStart the services after configuration
Value Typestring

--start-and-report

Option--start-and-report
Config File Optionsstart-and-report
DescriptionStart the services and report out the status after configuration
Value Typestring

--svc-allow-any-remote-service

Option--svc-allow-any-remote-service
Aliases--repl-svc-allow-any-remote-service
Config File Optionsrepl-svc-allow-any-remote-service, svc-allow-any-remote-service
DescriptionReplicate from any service
Value Typeboolean
Defaultfalse
Valid Valuestrue 

--svc-applier-block-commit-interval

Option--svc-applier-block-commit-interval
Aliases--repl-svc-applier-block-commit-interval
Config File Optionsrepl-svc-applier-block-commit-interval, svc-applier-block-commit-interval
DescriptionMinimum interval between commits
Value Typestring
Valid Values 0 (when batch service is not enabled)
 #d (number of days)
 #h (number of hours)
 #m (number of minutes)
 #s (number of seconds)

--svc-applier-block-commit-size

Option--svc-applier-block-commit-size
Aliases--repl-svc-applier-block-commit-size
Config File Optionsrepl-svc-applier-block-commit-size, svc-applier-block-commit-size
DescriptionApplier block commit size (min 1)
Value Typenumeric

--svc-applier-filters

Option--svc-applier-filters
Aliases--repl-svc-applier-filters
Config File Optionsrepl-svc-applier-filters, svc-applier-filters
DescriptionReplication service applier filters
Value Typestring

--svc-extractor-filters

Option--svc-extractor-filters
Aliases--repl-svc-extractor-filters
Config File Optionsrepl-svc-extractor-filters, svc-extractor-filters
DescriptionReplication service extractor filters
Value Typestring

--svc-fail-on-zero-row-update

Option--svc-fail-on-zero-row-update
Aliases--repl-svc-fail-on-zero-row-update
Config File Optionsrepl-svc-fail-on-zero-row-update, svc-fail-on-zero-row-update
DescriptionHow should the replicator behave when a Row-Based Replication UPDATE or DELETE does not affect any rows.
Value Typestring
Defaultstop
Valid Values ignore (no warnings in the log file; replication continues)
 stop (default; the replicator stops with an error)
 warn (log a warning in the log file, but continue anyway)

Warning

From release 7.0.1, the default for this property was changed to stop; previously, the default was warn.

The change in the default value may cause unexpected behavior in Active/Active topologies due to the asynchronous nature of replication; however, care should be taken if changing back to the original default of warn.

If you notice many entries in your replicator logs indicating zero row updates, and these warnings are being ignored, you may encounter data drift.
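
To restore the pre-7.0.1 behavior, set the option back to warn in the configuration, bearing in mind the data-drift caveat above (service name illustrative):

[alpha]
...
svc-fail-on-zero-row-update=warn
...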

--svc-parallelization-type

Option--svc-parallelization-type
Aliases--repl-svc-parallelization-type
Config File Optionsrepl-svc-parallelization-type, svc-parallelization-type
DescriptionMethod for implementing parallel apply
Value Typestring
Valid Valuesdisk 
 memory 
 none 

--svc-remote-filters

Option--svc-remote-filters
Aliases--repl-svc-remote-filters
Config File Optionsrepl-svc-remote-filters, svc-remote-filters
DescriptionReplication service remote download filters
Value Typestring

--svc-reposition-on-source-id-change

Option--svc-reposition-on-source-id-change
Aliases--repl-svc-reposition-on-source-id-change
Config File Optionsrepl-svc-reposition-on-source-id-change, svc-reposition-on-source-id-change
DescriptionThe Primary will come ONLINE from the current position if the stored source_id does not match the value in the static properties
Value Typestring

--svc-shard-default-db

Option--svc-shard-default-db
Aliases--repl-svc-shard-default-db
Config File Optionsrepl-svc-shard-default-db, svc-shard-default-db
DescriptionMode for setting the shard ID from the default db
Value Typestring
Valid Valuesrelaxed 
 stringent 

--svc-systemd-config-replicator

Option--svc-systemd-config-replicator
Config File Optionssvc-systemd-config-replicator
DescriptionUsed to provide custom systemd configuration that will be used in place of the default generated one.
Value Typestring

This property can be used to supply an alternative, custom, systemd configuration that will be used in place of the default generated one.

The value of the parameter should be specified as the name and location of the custom script to use instead, for example:

svc-systemd-config-replicator=/home/tungsten/replicator.service

Where the replicator.service file could look something like the following:

[Unit]
Description=Tungsten Replicator
After=mysqld.service mysql.service mariadb.service

[Service]
User=tungsten
Type=forking
ExecStart=/opt/continuent/tungsten/tungsten-replicator/bin/replicator start sysd
ExecStop=/opt/continuent/tungsten/tungsten-replicator/bin/replicator stop sysd
ExecStartPre=/opt/my-pre-start-script.sh

[Install]
WantedBy=multi-user.target

--svc-table-engine

Option--svc-table-engine
Aliases--repl-svc-table-engine
Config File Optionsrepl-svc-table-engine, svc-table-engine
DescriptionReplication service table engine
Value Typestring
Defaultinnodb

--svc-thl-filters

Option--svc-thl-filters
Aliases--repl-svc-thl-filters
Config File Optionsrepl-svc-thl-filters, svc-thl-filters
DescriptionReplication service THL filters
Value Typestring

9.8.17.  T tpm Options

--target-dataservice

Option--target-dataservice
Aliases--slave-dataservice
Config File Optionsslave-dataservice, target-dataservice
DescriptionDataservice to use to determine the value of host configuration
Value Typestring

--temp-directory

Option--temp-directory
Config File Optionstemp-directory
DescriptionTemporary Directory
Value Typestring

--template-file

Option--template-file
Config File Optionstemplate-file
DescriptionDisplay the keys that may be used in configuration template files
Value Typestring

--template-search-path

Option--template-search-path
Config File Optionstemplate-search-path
DescriptionAdds a new template search path for configuration file generation
Value Typefilename

--thl-directory

Option--thl-directory
Aliases--repl-thl-directory
Config File Optionsrepl-thl-directory, thl-directory
DescriptionReplicator log directory
Value Typestring
Default{home directory}/thl

--thl-do-checksum

Option--thl-do-checksum
Aliases--repl-thl-do-checksum
Config File Optionsrepl-thl-do-checksum, thl-do-checksum
DescriptionExecute checksum operations on THL log files
Value Typestring

--thl-interface

Option--thl-interface
Aliases--repl-thl-interface
Config File Optionsrepl-thl-interface, thl-interface
DescriptionListen interface to use for THL operations
Value Typestring

--thl-log-connection-timeout

Option--thl-log-connection-timeout
Aliases--repl-thl-log-connection-timeout
Config File Optionsrepl-thl-log-connection-timeout, thl-log-connection-timeout
DescriptionNumber of seconds to wait for a connection to the THL log
Value Typenumeric

--thl-log-file-size

Option--thl-log-file-size
Aliases--repl-thl-log-file-size
Config File Optionsrepl-thl-log-file-size, thl-log-file-size
DescriptionFile size in bytes for THL disk logs
Value Typenumeric

--thl-log-fsync

Option--thl-log-fsync
Aliases--repl-thl-log-fsync
Config File Optionsrepl-thl-log-fsync, thl-log-fsync
DescriptionFsync THL records on commit. More reliable operation but adds latency to replication when using low-performance storage
Value Typestring

--thl-log-retention

Option--thl-log-retention
Aliases--repl-thl-log-retention
Config File Optionsrepl-thl-log-retention, thl-log-retention
DescriptionHow long to keep THL files
Value Typestring
Default7d
Valid Values #d (number of days)
 #h (number of hours)
 #m (number of minutes)
 #s (number of seconds)
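
For example, to keep THL files for three days only (service name illustrative):

[alpha]
...
thl-log-retention=3d
...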

--thl-port

Option--thl-port
Aliases--repl-thl-port
Config File Optionsrepl-thl-port, thl-port
DescriptionPort to use for THL Operations
Value Typenumeric
Default2112

--thl-protocol

Option--thl-protocol
Aliases--repl-thl-protocol
Config File Optionsrepl-thl-protocol, thl-protocol
DescriptionProtocol to use for THL communication with this service
Value Typestring

--topology

Option--topology
Aliases--dataservice-topology
Config File Optionsdataservice-topology, topology
DescriptionReplication topology for the dataservice.
Value Typestring
Valid Valuesall-masters 
 cluster-alias 
 cluster-slave 
 clustered 
 direct 
 fan-in 
 master-slave 
 star 

--track-schema-changes

Option--track-schema-changes
Config File Optionstrack-schema-changes
DescriptionThis will enable filters that track DDL statements and write the resulting change to files on Replica hosts. The feature is intended for use in some batch deployments.
Value Typestring

9.8.18.  U tpm Options

--user

Option--user
Config File Optionsuser
DescriptionSystem User
Value Typestring

9.8.19.  V tpm Options

--vertica-dbname

Option--vertica-dbname
Aliases--repl-vertica-dbname
Config File Optionsrepl-vertica-dbname, vertica-dbname
DescriptionName of the database to replicate into
Value Typestring

9.8.20.  W tpm Options

--witnesses

Option--witnesses
Aliases--dataservice-witnesses
Config File Optionsdataservice-witnesses, witnesses
DescriptionWitness hosts for the dataservice
Value Typestring

Chapter 10. Replication Filters

Table of Contents

10.1. Enabling/Disabling Filters
10.2. Enabling Additional Filters
10.3. Filter Status
10.4. Filter Reference
10.4.1. ansiquotes.js Filter
10.4.2. BidiRemoteSlave (BidiSlave) Filter
10.4.3. breadcrumbs.js Filter
10.4.4. CaseTransform Filter
10.4.5. ColumnName Filter
10.4.6. ConvertStringFromMySQL Filter
10.4.7. DatabaseTransform (dbtransform) Filter
10.4.8. dbrename.js Filter
10.4.9. dbselector.js Filter
10.4.10. dbupper.js Filter
10.4.11. dropcolumn.js Filter
10.4.12. dropcomments.js Filter
10.4.13. dropddl.js Filter
10.4.14. dropmetadata.js Filter
10.4.15. droprow.js Filter
10.4.16. dropstatementdata.js Filter
10.4.17. dropsqlmode.js Filter
10.4.18. dropxa.js Filter
10.4.19. Dummy Filter
10.4.20. EnumToString Filter
10.4.21. EventMetadata Filter
10.4.22. foreignkeychecks.js Filter
10.4.23. Heartbeat Filter
10.4.24. insertsonly.js Filter
10.4.25. Logging Filter
10.4.26. MySQLSessionSupport (mysqlsessions) Filter
10.4.27. mapcharset Filter
10.4.28. NetworkClient Filter
10.4.28.1. Network Client Configuration
10.4.28.2. Network Filter Protocol
10.4.28.3. Sample Network Client
10.4.29. nocreatedbifnotexists.js Filter
10.4.30. OptimizeUpdates Filter
10.4.31. PrimaryKey Filter
10.4.31.1. Setting Custom Primary Key Definitions
10.4.32. PrintEvent Filter
10.4.33. Rename Filter
10.4.33.1. Rename Filter Examples
10.4.34. Replicate Filter
10.4.35. ReplicateColumns Filter
10.4.36. Row Add Database Name Filter
10.4.37. Row Add Transaction Info Filter
10.4.38. SetToString Filter
10.4.39. Shard Filter
10.4.40. shardbyrules.js Filter
10.4.41. shardbyseqno.js Filter
10.4.42. shardbytable.js Filter
10.4.43. SkipEventByType Filter
10.4.44. TimeDelay (delay) Filter
10.4.45. TimeDelayMsFilter (delayInMS) Filter
10.4.46. tosingledb.js Filter
10.4.47. truncatetext.js Filter
10.4.48. zerodate2null.js Filter
10.5. Standard JSON Filter Configuration
10.5.1. Rule Handling and Processing
10.5.2. Schema, Table, and Column Selection
10.6. JavaScript Filters
10.6.1. Writing JavaScript Filters
10.6.1.1. Implementable Functions
10.6.1.2. Getting Configuration Parameters
10.6.1.3. Logging Information and Exceptions
10.6.1.4. Exposed Data Structures
10.6.2. Installing Custom JavaScript Filters
10.6.2.1. Step 1: Copy JavaScript files
10.6.2.2. Step 2: Create Template Files
10.6.2.3. Step 3: (Optional) Copy json files
10.6.2.4. Step 4: Update Configuration

Filtering operates by applying the filter within one or more of the stages configured within the replicator. Stages are the individual steps that occur within a pipeline; each takes information from a source (such as the MySQL binary log) and writes that information to an internal queue or the transaction history log, or applies it to a database. Where the filters are applied ultimately affects how the information is stored, used, or represented to the next stage or pipeline in the system.

For example, a filter that removed all the tables from a specific database would have different effects depending on the stage in which it was applied. If the filter were applied on the Extractor before writing the information into the THL, then no Applier could ever access the table data, because the information would never be stored in the THL to be transferred to the Targets. However, if the filter were applied on the Applier, then some Appliers could replicate the table and database information, while other Appliers could choose to ignore it. The filtering process also has an impact on other elements of the system. For example, filtering on the Extractor may reduce network overhead, albeit at a reduction in the flexibility of the data transferred.

In a standard replicator configuration with MySQL, the following stages are configured in the Extractor, as shown in Figure 10.1, “Filters: Pipeline Stages on Extractors”.

Figure 10.1. Filters: Pipeline Stages on Extractors

Filters: Pipeline Stages on Extractors

Where:

  • binlog-to-q Stage

    The binlog-to-q stage reads information from the MySQL binary log and stores the information within an in-memory queue.

  • q-to-thl Stage

    The in-memory queue is written out to the THL file on disk.

Within the Applier, the stages configured by default are shown in Figure 10.2, “Filters: Pipeline Stages on Appliers”.

Figure 10.2. Filters: Pipeline Stages on Appliers

Filters: Pipeline Stages on Appliers

  • remote-to-thl Stage

Remote THL information is read from the Extractor datasource and written to a local file on disk.

  • thl-to-q Stage

    The THL information is read from the file on disk and stored in an in-memory queue.

  • q-to-dbms Stage

    The data from the in-memory queue is written to the target database.

Filters can be applied during any configured stage, and where a filter is applied alters the content and availability of the information. The staging and filtering mechanism can also be used to apply multiple filters to the data, altering content both when it is read and when it is applied.
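
For example, the same filter can be attached to different stages by naming it in the corresponding stage property. A sketch using the stage names shown above (which stage is appropriate depends on whether the filter should act at extraction or apply time):

# Apply the replicate filter while the binary log data is written to the THL
replicator.stage.binlog-to-q.filters=replicate

# Or instead apply it as the data is written to the target database
replicator.stage.q-to-dbms.filters=replicate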

Where more than one filter is configured for a pipeline, each filter is executed in the order it appears in the configuration. For example, within the following fragment:

...
replicator.stage.binlog-to-q.filters=settostring,enumtostring,pkey,colnames
...

settostring is executed first, followed by enumtostring, pkey and finally colnames.

For certain filter combinations this order can be significant. Some filters rely on the information provided by earlier filters.

10.1. Enabling/Disabling Filters

A number of standard filter configurations are created and defined by default within the static properties file for the Tungsten Replicator configuration.

Filters can be enabled through tpm to update the filter configuration.

Properties and options for an individual filter can be specified by setting the corresponding property value on the tpm command-line.

For example, to ignore a database schema on an Applier, the replicate filter can be enabled, and the replicator.filter.replicate.ignore property specifies the names of the schemas to be ignored. To ignore the schema contacts:

For staging deployments:

shell> ./tools/tpm update alpha --hosts=host1,host2,host3 \
    --repl-svc-applier-filters=replicate \
    --property=replicator.filter.replicate.ignore=contacts

For ini deployments:

shell> vi /etc/tungsten/tungsten.ini

[servicename]
...
repl-svc-applier-filters=replicate
property=replicator.filter.replicate.ignore=contacts
...

shell> tpm update

A bad filter configuration will not stop the replicator from starting, but the replicator will be placed into the OFFLINE state.

To disable a previously enabled filter for staging deployments, empty the filter specification and (optionally) unset the corresponding property or properties. For example:

shell> ./tools/tpm update alpha --hosts=host1,host2,host3 \
    --repl-svc-applier-filters= \
    --remove-property=replicator.filter.replicate.ignore

To disable a previously enabled filter for ini deployments, remove the values from the tungsten.ini file and issue tpm update.

Multiple filters can be applied on any stage, and the filters will be processed and called within the order defined within the configuration. For example, the following configuration:

shell> ./tools/tpm update alpha --hosts=host1,host2,host3 \
    --repl-svc-applier-filters=enumtostring,settostring,pkey \
    --remove-property=replicator.filter.replicate.ignore

The filters are called in order: enumtostring first, then settostring, and finally pkey.

The order and sequence can be important if operations performed on the data are relied on later in the stage. For example, if data is being filtered by a value that exists in a SET column within the source data, the settostring filter must be defined before the data is filtered; otherwise, the actual string value will not be identified.

Warning

In some cases, the filter order and sequence can also introduce errors. For example, when using the pkey and optimizeupdates filters together, pkey may remove KEY information from the THL before optimizeupdates attempts to optimize the ROW event, causing the filter to raise a failure condition.

The currently active filters can be determined by using the trepctl status -name stages command:

shell> trepctl status -name stages
Processing status command (stages)...
...
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier
applier.name       : dbms
blockCommitRowCount: 10
committedMinSeqno  : 3600
extractor.class    : com.continuent.tungsten.replicator.thl.THLParallelQueueExtractor
extractor.name     : parallel-q-extractor
filter.0.class     : com.continuent.tungsten.replicator.filter.MySQLSessionSupportFilter
filter.0.name      : mysqlsessions
filter.1.class     : com.continuent.tungsten.replicator.filter.PrimaryKeyFilter
filter.1.name      : pkey
filter.2.class     : com.continuent.tungsten.replicator.filter.BidiRemoteSlaveFilter
filter.2.name      : bidiSlave
name               : q-to-dbms
processedMinSeqno  : -1
taskCount          : 5
Finished status command (stages)...

The above output is from a standard Applier replication installation showing the default filters enabled. The filter order can be determined by the number against each filter definition.

10.2. Enabling Additional Filters

The Tungsten Replicator configuration includes a number of filter configurations by default. However, not all filters are given a default configuration, and for some filters, multiple configurations may be needed to achieve more complex filtering requirements. Internally, filter configuration is defined through a property file that defines the filter name and corresponding parameters.

For example, the rename configuration is defined as follows:

replicator.filter.rename=com.continuent.tungsten.replicator.filter.RenameFilter
replicator.filter.rename.definitionsFile=${replicator.home.dir}/samples/extensions/java/rename.csv

The first line creates a new filter configuration using the corresponding Java class. In this case, the filter is named rename, as defined by the string replicator.filter.rename.

Configuration parameters for the filter are defined as values after the filter name. In this example, definitionsFile is the name of the property examined by the class to set the CSV file where the rename definitions are located.

To create an entirely new filter based on an existing filter class, a new property should created with the new filter definition in the configuration file.

Additional properties from this base should then be used. For example, to create a second rename filter definition called custom:

replicator.filter.rename.custom=com.continuent.tungsten.replicator.filter.RenameFilter
replicator.filter.rename.custom.definitionsFile=${replicator.home.dir}/samples/extensions/java/renamecustom.csv

The filter can be enabled against the desired stage using the filter name custom:

shell> ./tools/tpm configure \
    --repl-svc-applier-filters=custom 
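For ini deployments, a minimal sketch of the equivalent configuration would be to add the filter name to the appropriate service stanza and run tpm update (servicename is a placeholder):

shell> vi /etc/tungsten/tungsten.ini

[servicename]
...
repl-svc-applier-filters=custom
...

shell> tpm update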

10.3. Filter Status

To determine which filters are currently being applied within a replicator, use the trepctl status -name stages command. This outputs a list of the current stages and their configuration. For example:

shell> trepctl status -name stages
Processing status command (stages)...
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.thl.THLStoreApplier
applier.name       : thl-applier
blockCommitRowCount: 1
committedMinSeqno  : 15
extractor.class    : com.continuent.tungsten.replicator.thl.RemoteTHLExtractor
extractor.name     : thl-remote
name               : remote-to-thl
processedMinSeqno  : -1
taskCount          : 1
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.thl.THLParallelQueueApplier
applier.name       : parallel-q-applier
blockCommitRowCount: 10
committedMinSeqno  : 15
extractor.class    : com.continuent.tungsten.replicator.thl.THLStoreExtractor
extractor.name     : thl-extractor
name               : thl-to-q
processedMinSeqno  : -1
taskCount          : 1
NAME                 VALUE
----                 -----
applier.class      : com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier
applier.name       : dbms
blockCommitRowCount: 10
committedMinSeqno  : 15
extractor.class    : com.continuent.tungsten.replicator.thl.THLParallelQueueExtractor
extractor.name     : parallel-q-extractor
filter.0.class     : com.continuent.tungsten.replicator.filter.TimeDelayFilter
filter.0.name      : delay
filter.1.class     : com.continuent.tungsten.replicator.filter.MySQLSessionSupportFilter
filter.1.name      : mysqlsessions
filter.2.class     : com.continuent.tungsten.replicator.filter.PrimaryKeyFilter
filter.2.name      : pkey
name               : q-to-dbms
processedMinSeqno  : -1
taskCount          : 5
Finished status command (stages)...

In the output, the filters applied to the applier stage are shown in the last block of output. Filters are listed in the order in which they appear within the configuration.

For information about the filter operation and any modifications or changes made, check the trepsvc.log log file.

10.4. Filter Reference

10.4.1. ansiquotes.js Filter
10.4.2. BidiRemoteSlave (BidiSlave) Filter
10.4.3. breadcrumbs.js Filter
10.4.4. CaseTransform Filter
10.4.5. ColumnName Filter
10.4.6. ConvertStringFromMySQL Filter
10.4.7. DatabaseTransform (dbtransform) Filter
10.4.8. dbrename.js Filter
10.4.9. dbselector.js Filter
10.4.10. dbupper.js Filter
10.4.11. dropcolumn.js Filter
10.4.12. dropcomments.js Filter
10.4.13. dropddl.js Filter
10.4.14. dropmetadata.js Filter
10.4.15. droprow.js Filter
10.4.16. dropstatementdata.js Filter
10.4.17. dropsqlmode.js Filter
10.4.18. dropxa.js Filter
10.4.19. Dummy Filter
10.4.20. EnumToString Filter
10.4.21. EventMetadata Filter
10.4.22. foreignkeychecks.js Filter
10.4.23. Heartbeat Filter
10.4.24. insertsonly.js Filter
10.4.25. Logging Filter
10.4.26. MySQLSessionSupport (mysqlsessions) Filter
10.4.27. mapcharset Filter
10.4.28. NetworkClient Filter
10.4.28.1. Network Client Configuration
10.4.28.2. Network Filter Protocol
10.4.28.3. Sample Network Client
10.4.29. nocreatedbifnotexists.js Filter
10.4.30. OptimizeUpdates Filter
10.4.31. PrimaryKey Filter
10.4.31.1. Setting Custom Primary Key Definitions
10.4.32. PrintEvent Filter
10.4.33. Rename Filter
10.4.33.1. Rename Filter Examples
10.4.34. Replicate Filter
10.4.35. ReplicateColumns Filter
10.4.36. Row Add Database Name Filter
10.4.37. Row Add Transaction Info Filter
10.4.38. SetToString Filter
10.4.39. Shard Filter
10.4.40. shardbyrules.js Filter
10.4.41. shardbyseqno.js Filter
10.4.42. shardbytable.js Filter
10.4.43. SkipEventByType Filter
10.4.44. TimeDelay (delay) Filter
10.4.45. TimeDelayMsFilter (delayInMS) Filter
10.4.46. tosingledb.js Filter
10.4.47. truncatetext.js Filter
10.4.48. zerodate2null.js Filter

The different filter types configured and available within Tungsten Replicator are designed to provide a range of functionality and operations. Since the information exchanged through the THL system contains a copy of the statement or the row data that is being updated, the filters allow schemas, table and column names, as well as the actual data, to be converted at the stage in which they are applied.

Filters are identified according to the underlying Java class that defines their operation. For different filters, further configuration and naming is applied according to the templates used when Tungsten Replicator is installed through tpm.

Tungsten Replicator also comes with a number of JavaScript filters that can either be used directly, or that can be modified and adapted to suit individual requirements. These filter scripts are located in tungsten-replicator/support/filters-javascript.

For the purposes of classification, the different filters have been categorised according to their main purpose:

  • Auditing

    These filters provide methods for tracking database updates alongside the original table data. For example, in a financial database, the actual data has to be updated in the corresponding tables, but the individual changes that lead to that update must also be logged individually.

  • Content

    Content filters modify or update the content of the transaction events. These may alter information, for the purposes of interoperability (such as updating enumerated or integer values to their string equivalents), or remove or filter columns, tables, and entire schemas.

  • Logging

    Logging filters record information about the transactions into the standard replicator log, either for auditing or debugging purposes.

  • Optimization

    The optimization filters are designed to simplify and optimize statements and row updates to improve the speed at which those updates can be applied to the destination dataserver.

  • Transformation

    Transformation filters rename or reformat schemas and tables according to a set of rules. For example, multiple schemas can be merged to a single schema, or tables and column names can be updated.

  • Validation

    Provide validation or consistency checking of either the data or the replication process.

  • Miscellaneous

    Other filters that cannot be allocated to one of the existing filter classes.

In the following reference sections:

  • Pre-configured filter name is the filter name that can be used against a stage without additional configuration.

  • Property prefix is the prefix string for the filter to be used when assigning property values.

  • Classname is the Java class name of the filter.

  • Parameter is the name of a filter parameter that can be set as a property within the configuration.

  • Data compatibility indicates whether the filter is compatible with row-based events, statement-based events, or both.

10.4.1. ansiquotes.js Filter

The ansiquotes filter operates by inserting an SQL mode change to ANSI_QUOTES into the replication stream before a statement is executed, and returning to an empty SQL mode afterwards.

Pre-configured filter name ansiquotes
JavaScript Filter File tungsten-replicator/support/filters-javascript/ansiquotes.js
Property prefix replicator.filter.ansiquotes
Stage compatibility binlog-to-q
tpm Option compatibility --svc-extractor-filters
Data compatibility Any event
Parameters
Parameter Type Default Description

This changes a statement such as:

INSERT INTO notepad VALUES ('message',0);

To:

SET sql_mode='ANSI_QUOTES';
INSERT INTO notepad VALUES ('message',0);
SET sql_mode='';

This is achieved within the JavaScript by processing the incoming events and adding a new statement before the first DBMSData object in each event:

query = "SET sql_mode='ANSI_QUOTES'";
newStatement = new com.continuent.tungsten.replicator.dbms.StatementData(
    query,
    null,
    null
    );
data.add(0, newStatement);

A corresponding statement is appended to the end of the event:

query = "SET sql_mode=''";
newStatement = new com.continuent.tungsten.replicator.dbms.StatementData(
    query,
    null,
    null
    );
data.add(data.size(), newStatement);

10.4.2. BidiRemoteSlave (BidiSlave) Filter

The BidiRemoteSlaveFilter is used by Tungsten Replicator to prevent statements that originated from this service (i.e. where the data was extracted) from being re-applied to the database. This is a requirement to prevent data that may be transferred between hosts from being re-applied, particularly in Active/Active and other bi-directional replication deployments.

Pre-configured filter name bidiSlave
Classname com.continuent.tungsten.replicator.filter.BidiRemoteSlaveFilter
Property prefix replicator.filter.bidiSlave
Stage compatibility  
tpm Option compatibility  
Data compatibility Any event
Parameters
Parameter Type Default Description
localServiceName string ${local.service.name} Local service name of the service that reads the binary log
allowBidiUnsafe boolean false If true, allows statements that may be unsafe for bi-directional replication
allowAnyRemoteService boolean false If true, allows statements from any remote service, not just the current service

The filter works by comparing the server ID of the THL event that was created when the data was extracted against the server ID of the current server.

When deploying through the tpm service the filter is automatically enabled for remote Appliers. For complex deployments, particularly those with bi-directional replication (including active/active), the allowBidiUnsafe parameter may need to be enabled to allow certain statements to be re-executed.
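For example, the parameter could be enabled by setting the corresponding property, following the conventions shown in Section 10.1 (a sketch for ini deployments):

property=replicator.filter.bidiSlave.allowBidiUnsafe=true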

Important

Known Issue for Active/Active installations with AWS Aurora

Due to a change in behaviour from MySQL v5.7 onwards, USER, VIEW, TRIGGER information is logged differently in the binary logs.

When AWS Aurora is in use as a Source, this change in behaviour prevents the filter from working correctly for DDL specific to USER (CREATE USER, GRANT, etc.), VIEWS (CREATE and DROP) and TRIGGERS (CREATE and DROP).

A current workaround would be to additionally use the dropddl filter until a future Tungsten Replicator release addresses the issue.

10.4.3. breadcrumbs.js Filter

The breadcrumbs filter records regular 'breadcrumb' points into a MySQL table for systems that do not have global transaction IDs. This can be useful if recovery needs to be made to a specific point. The example also shows how metadata information for a given event can be updated based on the information from a table.

Pre-configured filter name breadcrumbs
JavaScript Filter File tungsten-replicator/support/filters-javascript/breadcrumbs.js
Property prefix replicator.filter.breadcrumbs
Stage compatibility binlog-to-q
tpm Option compatibility --svc-extractor-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
server_id numeric (none) MySQL server ID of the current host

To use the filter:

  1. A table is created and populated with one or more rows on the Target server. For example:

    CREATE TABLE `tungsten_svc1`.`breadcrumbs` (
     `id` int(11) NOT NULL PRIMARY KEY,
     `counter` int(11) DEFAULT NULL,
     `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP) ENGINE=InnoDB;
    INSERT INTO tungsten_svc1.breadcrumbs(id, counter) values(@@server_id, 1);
  2. Now set an event to update the table regularly. For example, within MySQL an event can be created for this purpose:

    CREATE EVENT breadcrumbs_refresh
      ON SCHEDULE EVERY 5 SECOND
      DO
         UPDATE tungsten_svc1.breadcrumbs SET counter=counter+1;
    SET GLOBAL event_scheduler = ON;

The filter will extract the value of the counter each time it sees the table, and then mark each transaction with a particular server ID with the counter value plus an offset. For convenience, we assume row replication is enabled.

If you need to failover to another server that has different logs, you can figure out the restart point by looking in the THL for the breadcrumb metadata on the last transaction. Use this to search the binary logs on the new server for the correct restart point.

The filter itself works in two stages, and operates because the JavaScript instance is persistent for as long as the Replicator is running. This means that data extracted during replication stays in memory and can be applied to later transactions. Hence the breadcrumb ID and offset information can be identified and used on each call to the filter function.

The first part of the filter event identifies the breadcrumb table and extracts the identified breadcrumb counter:

if (table.compareToIgnoreCase("breadcrumbs") == 0)
{
  columnValues = oneRowChange.getColumnValues();
  for (row = 0; row < columnValues.size(); row++)
  {
    values = columnValues.get(row);
    server_id_value = values.get(0);
    if (server_id == null || server_id == server_id_value.getValue())
    {
      counter_value = values.get(1);
      breadcrumb_counter = counter_value.getValue();
      breadcrumb_offset = 0;
    }
  }
}

The second part updates the event metadata using the extracted breadcrumb information:

topLevelEvent = event.getDBMSEvent();
if (topLevelEvent != null)
{
  xact_server_id = topLevelEvent.getMetadataOptionValue("mysql_server_id");
  if (server_id == xact_server_id)
  {
    topLevelEvent.setMetaDataOption("breadcrumb_counter", breadcrumb_counter);
    topLevelEvent.setMetaDataOption("breadcrumb_offset", breadcrumb_offset);
  }
}

To calculate the offset (i.e. the number of events since the last breadcrumb value was extracted), the filter determines if the event was the last fragment processed, and updates the offset counter:

if (event.getLastFrag())
{
  breadcrumb_offset = breadcrumb_offset + 1;
}

10.4.4. CaseTransform Filter

The CaseTransform filter can be used to force convert Schema, Table and Column names to either upper or lower case.

Pre-configured filter name casetransform
Classname com.continuent.tungsten.replicator.filter.CaseMappingFilter
Property prefix replicator.filter.casetransform
Stage compatibility  
tpm Option compatibility  
Data compatibility Any Event
Parameters
Parameter Type Default Description
to_upper_case boolean true If true, converts object names to upper case; if false converts them to lower case

This filter can be useful when replicating between environments that have different case sensitivity settings in place.

Usage Example

To force Upper Case on extractor:

svc-extractor-filters=casetransform
property=replicator.filter.casetransform.to_upper_case=true

To force Lower Case on applier:

svc-applier-filters=casetransform
property=replicator.filter.casetransform.to_upper_case=false

10.4.5. ColumnName Filter

The ColumnNameFilter loads the table specification information for tables, and adds this information to the THL data for information extracted using row-based replication.

Pre-configured filter name colnames
Classname com.continuent.tungsten.replicator.filter.ColumnNameFilter
Property prefix replicator.filter.colnames
Stage compatibility binlog-to-q
tpm Option compatibility --svc-extractor-filters
Data compatibility Row events
Keeps Cached Data Yes
Cache Refreshed When? Emptied when going OFFLINE; updated when an ALTER statement is seen
Metadata Updated Yes; tungsten_filter_columnname=true
Parameters
Parameter Type Default Description
user string ${replicator.global.extract.db.user} The username for the connection to the database for looking up column definitions
password string ${replicator.global.extract.db.password} The password for the connection to the database for looking up column definitions
url string jdbc:mysql:thin://${replicator.global.extract.db.host}:${replicator.global.extract.db.port}/${replicator.schema}?createDB=true JDBC URL of the database connection to use for looking up column definitions
addSignedFlag boolean true Determines whether the signed flag information for columns should be added to the metadata for each column.
ignoreMissingTables boolean true When true, tables that do not exist will not trigger metadata and column names to be added to the THL data.

Note

This filter is designed to be used for testing and with heterogeneous replication where the field name information can be used to construct and build target data structures.

The filter is required for the correct operation of heterogeneous replication, for example when replicating to MongoDB. The filter works by using the replicator username and password to access the underlying database and obtain the table definitions. The table definition information is cached within the replication during operation to improve performance.

When extracting data from the binary log using row-based replication, the column names for each row of changed data are added to the THL.
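The filter can be enabled on the extractor in the usual way; for example, in an ini deployment (a minimal sketch following the conventions in Section 10.1):

svc-extractor-filters=colnames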

Enabling this filter changes the THL data from the following example, shown without the column names:

SEQ# = 27 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 18:29:38.0
- EPOCH# = 11
- EVENTID = mysql-bin.000012:0000000000004369;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = sales
 - ROW# = 0
  - COL(1: ) = 1
  - COL(2: ) = 23
  - COL(3: ) = 45
  - COL(4: ) = 45000.00

To a version where the column names are included as part of the THL record:

SEQ# = 43 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 18:34:18.0
- EPOCH# = 28
- EVENTID = mysql-bin.000012:0000000000006814;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = sales
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 23
  - COL(3: city) = 45
  - COL(4: value) = 45000.00

When the row-based data is applied to a non-MySQL database, the column name information is used by the applier to specify the column, or the key when the column and value are used as a key/value pair in a document-based store.

10.4.6. ConvertStringFromMySQL Filter

The ConvertStringFromMySQLFilter is designed to be used in replicators that operate in conjunction with either existing native MySQL to MySQL replication deployments, or clustering deployments where the replication has been configured to use native MySQL byte storage for strings. These are incompatible with heterogeneous deployments, as the string is stored internally and in the THL in a format that is useful only within similarly configured replicators.

Conversion can be selected to happen for all valid columns (VARCHAR or CHAR column types only), or for selected columns within specific tables and schemas. All conversions are made with the relevant character set for the table and THL event.

Note

Conversion will not occur on incompatible columns. For example, conversion will not be applied to INT columns. This is the case even if the column has been explicitly set to convert the column.

Pre-configured filter name convertstringfrommysql
Classname com.continuent.tungsten.replicator.ConvertStringFromMySQLFilter
Property prefix Not defined
Stage compatibility any
tpm Option compatibility  
Data compatibility Row events only
Parameters
Parameter Type Default Description
definitionsFile string support/filters-config/convertstringfrommysql.json JSON file containing the definition of which events and which tables to skip

Configuration of the filter is made using a generic JSON file, which supports a default setting applied to all tables not otherwise explicitly specified. The default JSON file converts all valid (VARCHAR or CHAR) column types only:

{
    "__default": {
        "*" : "true"
    },
    "SCHEMA" : {
        "TABLE" : {
            "COLUMN" : "true"
        }
    }
}

Warning

For column specific selection to work, the column names must be included within the THL. The colnames filter must have been enabled either before this filter, or on the extractor where the data was originally extracted.

The default section handles the default response when an explicit schema or table name does not appear. Further sections are then organised by schema, table and column name. Where the setting is true, conversion will take place. A false disables conversion.

To enable conversion on a single column DESCRIPTION within the SALES.INVOICE schema/table while disabling conversion on all other columns:

{
    "__default": {
        "*" : "false"
    },
    "SALES" : {
        "INVOICE" : {
            "DESCRIPTION" : "true"
        }
    }
}

To convert all compatible columns in all tables within a schema:

{
    "__default": {
        "*" : "false"
    },
    "SALES" : {
        "*" : {
            "*" : "true"
        }
    }
}

A primary use case for this filter is for Cluster-Extractor replication from a cluster to a datawarehouse. For more details, please see Replicating from a Cluster to a Datawarehouse.

Source Cluster Example

For Cluster-Extractor replication to a datawarehouse, the source cluster nodes must use ROW-based MySQL binary logging, and also must have two extractor filters enabled, colnames and pkey.

For example, on every cluster node the lines below would be added to the /etc/tungsten/tungsten.ini file in the service stanza, then tpm update would be executed:

repl-svc-extractor-filters=colnames,pkey
property=replicator.filter.pkey.addColumnsToDeletes=true
property=replicator.filter.pkey.addPkeyToInserts=true

For staging deployments, prepend two hyphens to each line and include them on the command line.

For more details about configuring the source cluster, please see Section 3.4, “Replicating Data Out of a Cluster”.

Target Cluster-Extractor Example

On the replication Applier node, copy the convertstringfrommysql.json filter configuration sample file into the /opt/continuent/share directory, then edit it to suit:

shell> cp /opt/continuent/tungsten/tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/continuent/share/
shell> vi /opt/continuent/share/convertstringfrommysql.json

Once the convertstringfrommysql JSON configuration file has been edited, update the /etc/tungsten/tungsten.ini file to add and configure the convertstringfrommysql filter.

For example, configure a service named omega on host6 to read from the cluster nodes defined by the cluster-alias alpha:

[alpha]
topology=cluster-alias
master=host1
members=host1,host2,host3
thl-port=2112

[omega]
topology=cluster-slave
relay=host6
relay-source=alpha
repl-svc-remote-filters=convertstringfrommysql
property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/continuent/share/convertstringfrommysql.json

For more details about configuring the target Cluster-Extractor node, please see Section 3.4, “Replicating Data Out of a Cluster”.

10.4.7. DatabaseTransform (dbtransform) Filter

This filter can be used to rename databases (schemas) and/or tables between source and targets.

Pre-configured filter name dbtransform
Classname com.continuent.tungsten.replicator.filter.DatabaseTransformFilter
Property prefix replicator.filter.dbtransform
Stage compatibility  
tpm Option compatibility  
Data compatibility ROW Events only
Parameters
Parameter Type Default Description
transformTables boolean false If set to true, forces the rename transformations to operate on tables, not databases
from_regex1 string foo The search regular expression to use when renaming databases or tables (group 1); corresponds to to_regex1
to_regex1 string bar The replace regular expression to use when renaming databases or tables (group 1); corresponds to from_regex1
from_regex2 string   The search regular expression to use when renaming databases or tables (group 2); corresponds to to_regex2
to_regex2 string   The replace regular expression to use when renaming databases or tables (group 2); corresponds to from_regex2
from_regex3 string   The search regular expression to use when renaming databases or tables (group 3); corresponds to to_regex3
to_regex3 string   The replace regular expression to use when renaming databases or tables (group 3); corresponds to from_regex3
from_regex4 string   The search regular expression to use when renaming databases or tables (group 4); corresponds to to_regex4
to_regex4 string   The replace regular expression to use when renaming databases or tables (group 4); corresponds to from_regex4

The dbtransform filter can be used to apply standard Java regular expressions to rename databases and/or tables between source and target.

Up to 4 from/to regex patterns can be provided. By default the transformation will be applied to database schema names. To use the transformation for table names, specify the transformTables=true option.

The filter will only transform databases or tables, not a mix of the two. For more advanced transformations you may want to consider the rename filter instead.

The filter only works with ROW events. Statement-based transformations are not supported with this filter.
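For example, to rename schemas matching sales_(.*) to archive_$1 on the applier, the filter could be configured as follows (a hypothetical sketch; the schema pattern and replacement are illustrative only):

svc-applier-filters=dbtransform
property=replicator.filter.dbtransform.from_regex1=sales_(.*)
property=replicator.filter.dbtransform.to_regex1=archive_$1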

10.4.8. dbrename.js Filter

The dbrename JavaScript filter renames databases (schemas) using two parameters from the properties file, dbsource and dbtarget. Each event is then processed, and the statement or row based schema information is updated to dbtarget when the dbsource schema is identified.

Pre-configured filter name dbrename
JavaScript Filter File tungsten-replicator/support/filters-javascript/dbrename.js
Property prefix replicator.filter.dbrename
Stage compatibility binlog-to-q
tpm Option compatibility --svc-extractor-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
dbsource string (none) Source schema name (the schema to be renamed)
dbtarget string (none) New schema name

To configure the filter you would add the following to your properties:

replicator.filter.dbrename=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.dbrename.script=${replicator.home.dir}/samples/extensions/javascript/dbrename.js
replicator.filter.dbrename.dbsource=SOURCE
replicator.filter.dbrename.dbtarget=TEST

The operation of the filter is straightforward, because the schema name is exposed and settable within the statement and row change objects:

function filter(event)
{
    sourceName = filterProperties.getString("dbsource");
    targetName = filterProperties.getString("dbtarget");

    data = event.getData();

    for(i=0;i<data.size();i++)
    {
        d = data.get(i);

        if(d instanceof
           com.continuent.tungsten.replicator.dbms.StatementData)
        {
            if(d.getDefaultSchema() != null &&
               d.getDefaultSchema().compareTo(sourceName)==0)
            {
                d.setDefaultSchema(targetName);
            }
        }
        else if(d instanceof
                com.continuent.tungsten.replicator.dbms.RowChangeData)
        {
            rowChanges = data.get(i).getRowChanges();

            for(j=0;j<rowChanges.size();j++)
            {
                oneRowChange = rowChanges.get(j);

                if(oneRowChange.getSchemaName().compareTo(sourceName)==0)
                {
                    oneRowChange.setSchemaName(targetName);
                }
            }
        }
    }
}

10.4.9. dbselector.js Filter

Filtering only a single database schema can be useful when you want to extract a single schema for external processing, or for sharding information across multiple replication targets. The dbselector filter deletes all statement and row changes, except those for the selected database. To configure the filter, the db parameter specifies the schema to be replicated.

Pre-configured filter name dbselector
JavaScript Filter File tungsten-replicator/support/filters-javascript/dbselector.js
Property prefix replicator.filter.dbselector
Stage compatibility binlog-to-q, q-to-thl, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
db string (none) Database to be selected
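Configuration follows the same pattern as the dbrename filter above (a sketch; the schema name sales is hypothetical):

replicator.filter.dbselector=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.dbselector.script=${replicator.home.dir}/samples/extensions/javascript/dbselector.js
replicator.filter.dbselector.db=sales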

Within the filter, statement changes look for the schema in the StatementData object and remove it from the array:

if (d instanceof com.continuent.tungsten.replicator.dbms.StatementData)
{
    if(d.getDefaultSchema().compareTo(db)!=0)
    {
        data.remove(i);
        i--;
    }
}

Because entries are being removed from the list of statements, the loop counter used to process each item must be explicitly decremented by 1 so that processing continues from the correct position.

Similarly, when looking at row changes in the RowChangeData:

else if(d instanceof com.continuent.tungsten.replicator.dbms.RowChangeData)
{
    rowChanges = data.get(i).getRowChanges();

    for(j=0;j<rowChanges.size();j++)
    {
        oneRowChange = rowChanges.get(j);

        if(oneRowChange.getSchemaName().compareTo(db)!=0)
        {
            rowChanges.remove(j);
            j--;
        }
    }
}

10.4.10. dbupper.js Filter

The dbupper filter changes the case of the schema name for all schemas to uppercase. The schema information is easily identified in the statement and row based information, and therefore easy to update.

Pre-configured filter name dbupper
JavaScript Filter File tungsten-replicator/support/filters-javascript/dbupper.js
Property prefix replicator.filter.dbupper
Stage compatibility binlog-to-q
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
from string (none) Database name to be converted to uppercase
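Configuration follows the same pattern as the dbrename filter above (a sketch; the schema name sales is hypothetical):

replicator.filter.dbupper=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.dbupper.script=${replicator.home.dir}/samples/extensions/javascript/dbupper.js
replicator.filter.dbupper.from=sales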

For example, within statement data:

from = d.getDefaultSchema();
if (from != null)
{
    to   = from.toUpperCase();
    d.setDefaultSchema(to);
}

10.4.11. dropcolumn.js Filter

The dropcolumn filter enables columns in the THL to be dropped. This can be useful when replicating Personal Identification Information, such as email addresses, phone numbers, or personal identification numbers, which exists within the THL but needs to be filtered out on the Target.

Pre-configured filter name dropcolumn
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropcolumn.js
Property prefix replicator.filter.dropcolumn
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
definitionsFile Filename ~/dropcolumn.json Location of the definitions file for dropping columns
fillGaps Boolean false When true, re-order THL Column Index IDs to sequential numbers

The filter is available by default as dropcolumn, and the filter is configured through a JSON file that defines the list of columns to be dropped. The filter relies on the colnames filter being enabled.

To enable the filter:

shell> tpm update --svc-extractor-filters=colnames,dropcolumn \
    --property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/dropcolumn.json

A sample configuration file is provided in tungsten-replicator/support/filters-config/dropcolumn.json. The format of the file is a JSON array of schema/table/column specifications:

[
    {
        "schema": "vip",
        "table": "clients",
        "columns": [
            "personal_code",
            "birth_date",
            "email"
        ]
    },
    ...
]

Where:

  • schema specifies the name of the schema on which to apply the filtering. If * is given, all schemas are matched.

  • table specifies the name of the table on which to apply the filtering. If * is given, all tables are matched.

  • columns is an array of column names to be matched.

For example, the configuration shown above filters the columns email, birth_date, and personal_code within the clients table in the vip schema.

To filter the telephone column in any table and any schema:

[
    {
        "schema": "*",
        "table": "*",
        "columns": [
            "telephone"
        ]
    }
]

Care should be taken when dropping columns on the Target and Source when the column order is different or when the names of the columns differ:

If the filter is enabled on the extractor, the columns will be removed from the THL and the resulting THL Index will appear to have gaps in the THL column index ID, for example:

...
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = sample
 - TABLE = demotable
 - ROW# = 0
  - COL(1: id) = 11
  - COL(2: col_a) = UK
  - COL(4: col_c) = 5678
  - COL(5: col_d) = ABC
  - COL(7: col_g) = 2019-09-05 07:21:48.0
  - KEY(1: id) = NULL
...

For JDBC targets, this is expected and required to ensure accurate column mapping; however, for Batch Appliers the gaps in the index IDs will cause the applier to fail with a CSV Column Mismatch error. Therefore, if your target is a Batch target (e.g. Hadoop, Redshift, Vertica) you need to add the following property to your configuration:

property=replicator.filter.dropcolumn.fillGaps=true

With this property in place, the resulting THL from the above example would now look like the following:

...
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = sample
 - TABLE = demotable
 - ROW# = 0
  - COL(1: id) = 11
  - COL(2: col_a) = UK
  - COL(3: col_c) = 5678
  - COL(4: col_d) = ABC
  - COL(5: col_g) = 2019-09-05 07:21:48.0
  - KEY(1: id) = NULL
...

10.4.12. dropcomments.js Filter

The dropcomments filter removes comments from statements within the event data.

Pre-configured filter name dropcomments
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropcomments.js
Property prefix replicator.filter.dropcomments
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description

Row changes do not have comments, so the filter only has to change the statement information, which is achieved by using a regular expression:

sqlOriginal = d.getQuery();
sqlNew = sqlOriginal.replaceAll("/\\*(?:.|[\\n\\r])*?\\*/","");
d.setQuery(sqlNew);

To handle the case where the statement could only be a comment, the statement is removed:

if(sqlNew.trim().length()==0)
{
    data.remove(i);
    i--;
}
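To enable the filter, add it to the extractor or applier stage in the usual way; for example, on the applier in an ini deployment (a minimal sketch following the conventions in Section 10.1):

svc-applier-filters=dropcomments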

10.4.13. dropddl.js Filter

Note

Note that this filter is only intended for use with MySQL<>MySQL replication and will not have any benefit at this time for heterogeneous replication.

There may be occasions where you do not require specific DDL statements to be replicated. For example, you may NOT want to allow TRIGGERS or VIEWS to be replicated, but you do want TABLES, INDEXES, FUNCTIONS and PROCEDURES. The dropddl filter is intended for this purpose.

The dropddl filter allows you to configure which, if any, ddl statements to drop from THL and not be replicated.

Pre-configured filter name dropddl
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropddl.js
Property prefix replicator.filter.dropddl
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
dropTableDDL boolean false Drop CREATE TABLE and DROP TABLE statements
dropViewDDL boolean false Drop CREATE VIEW and DROP VIEW statements
dropIndexDDL boolean false Drop CREATE INDEX and DROP INDEX statements
dropTriggerDDL boolean false Drop CREATE TRIGGER and DROP TRIGGER statements
dropFunctionDDL boolean false Drop CREATE FUNCTION and DROP FUNCTION statements
dropProcedureDDL boolean false Drop CREATE PROCEDURE and DROP PROCEDURE statements
dropUserDDL boolean false Drop CREATE USER and ALTER USER statements
dropGrantDDL boolean false Drop GRANT and REVOKE statements

Example:

If you wish to drop VIEW and TRIGGER DDL on the applier, the following options need to be added to your configuration:

svc-applier-filters=dropddl
property=replicator.filter.dropddl.dropViewDDL=true
property=replicator.filter.dropddl.dropTriggerDDL=true

10.4.14. dropmetadata.js Filter

All events within the replication stream contain metadata about each event. This information can be individually processed and manipulated. The dropmetadata filter removes specific metadata from each event, configured through the option parameter to the filter.

Pre-configured filter name dropmetadata
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropmetadata.js
Property prefix replicator.filter.dropmetadata
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
option string (none) Name of the metadata field to be dropped

Metadata information can be processed at the event top-level:

metaData = event.getDBMSEvent().getMetadata();
for(m = 0; m < metaData.size(); m++)
{
    option = metaData.get(m);
    if(option.getOptionName().compareTo(optionName)==0)
    {
        metaData.remove(m);
        break;
    }
}
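For example, to drop a single metadata field on the applier (a sketch; the field name shard is hypothetical and should be replaced with the metadata option to be removed):

svc-applier-filters=dropmetadata
property=replicator.filter.dropmetadata.option=shard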

10.4.15. droprow.js Filter

The droprow filter can be used to selectively filter out transactions based on matching column values at the ROW level.

Pre-configured filter name droprow
JavaScript Filter File tungsten-replicator/support/filters-javascript/droprow.js
Property prefix replicator.filter.droprow
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Row Events
Parameters
Parameter Type Default Description
definitionsFile Filename ~/droprow.json Location of the definitions file for row filtering.
matchCase Boolean true Controls whether text based comparisons are Case Sensitive (true) or not (false)
rule String matchany Valid Values : matchany | matchall. If more than one column/value pair are defined, this property will determine whether there must be a match for all columns (matchall) or only one (matchany).

The filter is available by default as droprow, and the filter is configured through a JSON file that defines the list of column/values to match in a row to be dropped.

This filter has the following requirements and caveats:

  • The filter relies on the colnames filter being enabled.

  • The filter will only work on ROW events, therefore the source database needs to be running in Row-Based binary logging mode.

  • The filter will only compare values as part of an INSERT statement and NEW values as part of an UPDATE statement.

  • Rows that are affected by UPDATE and DELETE statements where the values are part of the WHERE clause will not be removed from THL.

  • If the filter is applied to tables with Foreign Keys, be sure to include all tables in the hierarchy as part of the filter to avoid referential integrity errors.

To enable the filter, for staging based deployments:

shell> tpm update --svc-extractor-filters=colnames,droprow \
    --property=replicator.filter.droprow.definitionsFile=/opt/continuent/share/droprow.json

Additional parameters that override the defaults can be also supplied, for example:

shell> tpm update --svc-extractor-filters=colnames,droprow \
    --property=replicator.filter.droprow.definitionsFile=/opt/continuent/share/droprow.json \
    --property=replicator.filter.droprow.rule=matchall

To enable the filter, for ini based deployments:

shell> vi /etc/tungsten/tungsten.ini

[servicename]
...
svc-extractor-filters=colnames,droprow
property=replicator.filter.droprow.definitionsFile=/opt/continuent/share/droprow.json
...

shell> tpm update

Additional parameters that override the defaults can be also supplied, for example:

shell> vi /etc/tungsten/tungsten.ini

[servicename]
...
svc-extractor-filters=colnames,droprow
property=replicator.filter.droprow.definitionsFile=/opt/continuent/share/droprow.json
property=replicator.filter.droprow.rule=matchall
...

shell> tpm update

A sample configuration file is provided in tungsten-replicator/support/filters-config/droprow.json. The format of the file is a JSON array of schema/table/column/value specifications. The match rule can also be supplied on a per table basis as per the examples below. If not supplied the global default for the filter, as described above, will be used:

[
  {
    "schema": "vip",
    "table": "clients",
    "rule": "matchall",
    "columns": [
      {
        "column": "city",
        "value": "London"
      },
      {
        "column": "country",
        "value": "UK"
      }
    ]
  }
]

Where:

  • schema specifies the name of the schema on which to apply the filtering.

  • table specifies the name of the table on which to apply the filtering.

  • rule specifies the matching rule to apply the filtering. Setting the value in the JSON will override the global default specified in the main configuration (See above).

    matchany (default) will remove a row if any one of the conditions matches. Equivalent to an OR condition.

    matchall will remove a row only if all of the conditions match. Equivalent to an AND condition.

  • columns is an array of column/values to be matched.

  • column is the name of the column in the table to match.

  • value is the value of the column to match. When the rule is matchany, this can also be supplied as an array of values (see the second example below).

For example:

[
  {
    "schema": "vip",
    "table": "customers",
    "rule": "matchany",
    "columns": [
      {
        "column": "country",
        "value": [ "Australia", "UK" ]
      }
    ]
  }
]

The above example filters rows from the customers table, in the vip schema, where the country column is either "Australia" OR "UK".

10.4.16. dropstatementdata.js Filter

Within certain replication deployments, enforcing that only row-based information is replicated is important to ensure that the row data is replicated properly. For example, when replicating to databases that do not accept statements, these events must be filtered out.

Pre-configured filter name dropstatementdata
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropstatementdata.js
Property prefix replicator.filter.dropstatementdata
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description

This is achieved by checking for statements, and then removing them from the event:

data = event.getData();

for(i = 0; i < data.size(); i++)
{
    d = data.get(i);

    if(d instanceof com.continuent.tungsten.replicator.dbms.StatementData)
    {
        data.remove(i);
        i--;
    }
}
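The filter can be enabled in the usual way; for example, on the applier in an ini deployment (a minimal sketch):

svc-applier-filters=dropstatementdata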

10.4.17. dropsqlmode.js Filter

Different releases of MySQL occasionally add or remove various sql_modes; as a result, when replicating between different versions you may receive errors during replication, as MySQL will reject unknown sql_modes.

This filter was specifically added to handle replication between MySQL 5.7 and MySQL 8, where a number of sql_modes were removed.

Pre-configured filter name dropsqlmode
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropsqlmode.js
Property prefix replicator.filter.dropsqlmode
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
modes string See below Pipe-separated (|) string of sql_modes to drop

By default, the filter is configured to drop the following sql_modes: NO_AUTO_CREATE_USER,NO_FIELD_OPTIONS,NO_KEY_OPTIONS,NO_TABLE_OPTIONS

If you wish to add/remove sql_modes from this list, you will need to override the property as follows:

property=replicator.filter.dropsqlmode.modes=NO_AUTO_CREATE_USER|NO_FIELD_OPTIONS|NO_KEY_OPTIONS|NO_TABLE_OPTIONS|TIME_TRUNCATE_FRACTIONAL

10.4.18. dropxa.js Filter

The dropxa filter removes XA Transaction Events within the event data.

Pre-configured filter name dropxa
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropxa.js
Property prefix replicator.filter.dropxa
Stage compatibility binlog-to-q
tpm Option compatibility --svc-extractor-filters
Data compatibility Any event
Parameters
Parameter Type Default Description

XA Transaction support was added to the replicator in version 6.1.14; however, this filter can still be used, if required, to remove XA Transaction events when they are not needed for replication.
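If required, the filter can be enabled on the extractor in the usual way (a minimal sketch for ini deployments):

svc-extractor-filters=dropxa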

10.4.19. Dummy Filter

Pre-configured filter name dummy
Classname com.continuent.tungsten.replicator.filter.DummyFilter
Property prefix replicator.filter.dummy
Stage compatibility  
tpm Option compatibility  
Data compatibility Any event
Parameters
(none)

As the name suggests, the dummy filter does nothing; however, it can be used as a simple mechanism to ensure that filters load correctly.

10.4.20. EnumToString Filter

The EnumToString filter translates ENUM datatypes within MySQL tables into their string equivalent within the THL.

Pre-configured filter name enumtostring
Classname com.continuent.tungsten.replicator.filter.EnumToStringFilter
Property prefix replicator.filter.enumtostring
Stage compatibility binlog-to-q
tpm Option compatibility --repl-svc-extractor-filters
Data compatibility Row events
Metadata Updated Yes; tungsten_filter_enumtostring=true
Parameters
Parameter Type Default Description
user string ${replicator.global.extract.db.user} The username for the connection to the database for looking up column definitions
password string ${replicator.global.extract.db.password} The password for the connection to the database for looking up column definitions
url string jdbc:mysql:thin://${replicator.global.extract.db.host}:${replicator.global.extract.db.port}/${replicator.schema}?createDB=true JDBC URL of the database connection to use for looking up column definitions
reconnectTimeout integer 3600 Timeout for refreshing connection to database

The EnumToString filter should be used with heterogeneous replication to ensure that the data is represented as the string value, not the internal numerical representation.

In the THL output below, the table has a ENUM column, country:

mysql> describe salesadv;
+----------+--------------------------------------+------+-----+---------+----------------+
| Field    | Type                                 | Null | Key | Default | Extra          |
+----------+--------------------------------------+------+-----+---------+----------------+
| id       | int(11)                              | NO   | PRI | NULL    | auto_increment |
| country  | enum('US','UK','France','Australia') | YES  |     | NULL    |                |
| city     | int(11)                              | YES  |     | NULL    |                |
| salesman | set('Alan','Zachary')                | YES  |     | NULL    |                |
| value    | decimal(10,2)                        | YES  |     | NULL    |                |
+----------+--------------------------------------+------+-----+---------+----------------+

When extracted in the THL, the representation uses the internal value (for example, 1 for the first enumerated value). This can be seen in the THL output below.

SEQ# = 138 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:09:35.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000021434;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 35000.00

For the country column, the corresponding value in the THL is 1. With the EnumToString filter enabled, the value is expanded to the corresponding string value:

SEQ# = 121 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:05:14.0
- EPOCH# = 102
- EVENTID = mysql-bin.000012:0000000000018866;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 1
  - COL(2: country) = US
  - COL(3: city) = 8374
  - COL(4: salesman) = Alan
  - COL(5: value) = 35000.00

This information is critical when applying the data to a dataserver that is not aware of the table definition, such as when replicating to a non-MySQL target.

The examples here also show the Section 10.4.38, “SetToString Filter” and Section 10.4.5, “ColumnName Filter” filters.

10.4.21. EventMetadata Filter

Filters events based on metadata; used by default within sharding and Active/Active topologies.

Pre-configured filter name eventmetadata
Classname com.continuent.tungsten.replicator.filter.EventMetadataFilter
Property prefix replicator.filter.eventmetadata
Stage compatibility  
tpm Option compatibility  
Data compatibility Row events
Metadata Updated Yes; tungsten_filter_settostring=true
Parameters
(none)

10.4.22. foreignkeychecks.js Filter

The foreignkeychecks filter switches off foreign key checks for statements using the following statements:

CREATE TABLE
DROP TABLE
ALTER TABLE
RENAME TABLE
Pre-configured filter name foreignkeychecks
JavaScript Filter File tungsten-replicator/support/filters-javascript/foreignkeychecks.js
Property prefix replicator.filter.foreignkeychecks
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description

The process checks the statement data and parses the content of the SQL statement by first trimming any extraneous space, and then converting the statement to upper case:

upCaseQuery = d.getQuery().trim().toUpperCase();

Then comparing the string for the corresponding statement types:

if(upCaseQuery.startsWith("CREATE TABLE") ||
    upCaseQuery.startsWith("DROP TABLE") ||
    upCaseQuery.startsWith("ALTER TABLE") ||
    upCaseQuery.startsWith("RENAME TABLE")
)
{

If they match, a new statement is inserted into the event that disables foreign key checks:

query = "SET foreign_key_checks=0";
newStatement = new com.continuent.tungsten.replicator.dbms.StatementData(
     d.getDefaultSchema(),
     null,
     query
     );
data.add(0, newStatement);
i++;

The use of 0 in the add() method inserts the new statement before the others within the current event.

10.4.23. Heartbeat Filter

The heartbeat filter detects heartbeat events on Sources or Targets.

Pre-configured filter name (none)
Classname com.continuent.tungsten.replicator.filter.HeartbeatFilter
Property prefix (none)
Stage compatibility  
tpm Option compatibility  
Data compatibility Any event
Parameters
Parameter Type Default Description
heartbeatInterval numeric 3000 Interval in milliseconds when a heartbeat event is inserted into the THL

10.4.24. insertsonly.js Filter

The insertsonly filter filters events to only include ROW-based events using INSERT.

Pre-configured filter name insertsonly
JavaScript Filter File tungsten-replicator/support/filters-javascript/insertonly.js
Property prefix replicator.filter.insertonly
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility Row events
Parameters
Parameter Type Default Description

This is achieved by examining each row and removing row changes that do not match the INSERT action type:

if(oneRowChange.getAction()!="INSERT")
{
    rowChanges.remove(j);
    j--;
}
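The filter can be enabled on the applier in the usual way (a minimal sketch for ini deployments):

svc-applier-filters=insertsonly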

10.4.25. Logging Filter

Logs filtered events through the standard replicator logging mechanism

Pre-configured filter name logger
Classname com.continuent.tungsten.replicator.filter.LoggingFilter
Property prefix replicator.filter.logger
Stage compatibility  
tpm Option compatibility  
Data compatibility Any event
Parameters
(none)

10.4.26. MySQLSessionSupport (mysqlsessions) Filter

Filters transactions for session-specific temporary tables and variables.

Pre-configured filter name mysqlsessions
Classname com.continuent.tungsten.replicator.filter.MySQLSessionSupportFilter
Property prefix replicator.filter.mysqlsession
Stage compatibility  
tpm Option compatibility  
Data compatibility Any event
Parameters
(none)

10.4.27. mapcharset Filter

Note

This filter was introduced in version 6.1.12

Maps charsets in newer MySQL 8 environments allowing THL to be applied to older versions of MySQL.

Pre-configured filter name mapcharset
Classname com.continuent.tungsten.replicator.filter.mapcharset
Property prefix replicator.filter.mapcharset
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility Any event
Parameters
(none)

Warning

The filter must not be used if applying into MySQL 8 or greater.

Once a replica has been upgraded to MySQL 8 or greater, the filter MUST be disabled.

The filter utilises the tungsten-replicator/support/filters-config/mapcharset.json file to determine the source/target charset mappings.

The JSON file is a very simple structure in the format of <SourceCharset>: <TargetCharset>.

The default installation of this filter should cover most, if not all, scenarios.

This, however, can be changed by updating the mapcharset.json file, which maps MySQL collation IDs, as seen in the following example MySQL query:

mysql> select * from INFORMATION_SCHEMA.COLLATIONS where character_set_name like 'utf8mb4' order by id;

+----------------------------+--------------------+-----+------------+-------------+---------+---------------+
| COLLATION_NAME             | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
+----------------------------+--------------------+-----+------------+-------------+---------+---------------+
| utf8mb4_general_ci         | utf8mb4            |  45 |            | Yes         |       1 | PAD SPACE     |
| utf8mb4_bin                | utf8mb4            |  46 |            | Yes         |       1 | PAD SPACE     |
| utf8mb4_unicode_ci         | utf8mb4            | 224 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_icelandic_ci       | utf8mb4            | 225 |            | Yes         |       8 | PAD SPACE     |
...
| utf8mb4_0900_ai_ci         | utf8mb4            | 255 | Yes        | Yes         |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4            | 256 |            | Yes         |       0 | NO PAD        |
| utf8mb4_is_0900_ai_ci      | utf8mb4            | 257 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lv_0900_ai_ci      | utf8mb4            | 258 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ro_0900_ai_ci      | utf8mb4            | 259 |            | Yes         |       0 | NO PAD        |
...
| utf8mb4_ru_0900_as_cs      | utf8mb4            | 307 |            | Yes         |       0 | NO PAD        |
| utf8mb4_zh_0900_as_cs      | utf8mb4            | 308 |            | Yes         |       0 | NO PAD        |
| utf8mb4_0900_bin           | utf8mb4            | 309 |            | Yes         |       1 | NO PAD        |
+----------------------------+--------------------+-----+------------+-------------+---------+---------------+
75 rows in set (0.00 sec)

For example, the default mapping contains:

{
...
  "255": "45"
...
}

This means that collation utf8mb4_0900_ai_ci is mapped to utf8mb4_general_ci.

tungsten-replicator/support/filters-config/mapcharset.readme contains a full list of the default mappings.
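
To enable the filter, update the configuration through tpm; a minimal tungsten.ini sketch, assuming an INI-based installation and a service named alpha:

shell> vi /etc/tungsten/tungsten.ini
[alpha]
...
svc-applier-filters=mapcharset
...

shell> tpm update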

10.4.28. NetworkClient Filter

The NetworkClientFilter processes data in selected columns by sending it to an external network server for processing.

Pre-configured filter name networkclient
Classname com.continuent.tungsten.replicator.filter.NetworkClientFilter
Property prefix replicator.filter.networkclient
Stage compatibility Any
tpm Option compatibility --svc-extractor-filters, --svc-thl-filters, --svc-applier-filters
Data compatibility Row events
Parameters
Parameter Type Default Description
definitionsFile pathname ${replicator.home.dir}/samples/extensions/java/networkclient.json The name of a file containing the definitions for how columns should be processed by filters
serverPort number 3112 The network port to use when communicating with the network client
timeout number 10 Timeout in seconds before treating the network client as failed when waiting to send or receive content.

The network filter operates by sending field data, as defined in the corresponding filter configuration file, out to a network server that processes the information and sends it back to be re-introduced in place of the original field data. This can be used to translate and reformat information during replication.

The filter operation works as follows:

  • All filtered data will be sent to a single network server, at the configured port.

  • A single network server can be used to provide multiple transformations.

  • The JSON configuration file for the filter supports multiple types and multiple column definitions.

  • The protocol used by the network filter must be followed to effectively process the information. A failure in the network server or communication will cause the replicator to raise an error and replication to go OFFLINE.

  • The network server must be running before the replicator is started. If the network server cannot be found, replication will go OFFLINE.

Correct operation requires building a suitable network filter using the defined protocol, and creating the JSON configuration file. A sample filter is provided for reference.
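
For example, a minimal configuration sketch that enables the filter in the applier stage and points it at a custom definitions file (the path shown is illustrative only):

svc-applier-filters=networkclient
property=replicator.filter.networkclient.definitionsFile=/opt/continuent/share/networkclient.json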

10.4.28.1. Network Client Configuration

The format of the configuration file defines the translation operation to be requested from the network client, in addition to the schema, table and column name. The format for the file is JSON, with the top-level hash defining the operation, and an array of field selections for each field that should be processed accordingly. For example:

{
   "String_to_HEX_v1" : [
      {
         "table" : "hextable",
         "schema" : "hexdb",
         "columns" : [
            "hexcol"
         ]
      }
   ]
}

The operation in this case is String_to_HEX_v1; this will be sent to the network server as part of the request. The column definition follows.

To send multiple columns from different tables to the same translation:

{
   "String_to_HEX_v1" : [
      {
         "table" : "hextable",
         "schema" : "hexdb",
         "columns" : [
            "hexcol"
         ]
      },
      {
         "table" : "hexagon",
         "schema" : "sourcetext",
         "columns" : [
            "itemtext"
         ]
      }
   ]
}

Alternatively, to configure different operations for the same two tables:

{
   "String_to_HEX_v1" : [
      {
         "table" : "hextable",
         "schema" : "hexdb",
         "columns" : [
            "hexcol"
         ]
      }
   ],
   "HEX_to_String_v1" : [
      {
         "table" : "hexagon",
         "schema" : "sourcetext",
         "columns" : [
            "itemtext"
         ]
      }
   ]
}

10.4.28.2. Network Filter Protocol

The network filter protocol has been designed to be both lightweight and binary data compatible, as it is designed to work with data that may be heavily encoded, binary, or compressed in nature.

The protocol operates through a combined JSON and optional binary payload structure that communicates the information. The JSON defines the communication type and metadata, while the binary payload contains the raw or translated information.

The filter communicates with the network server using the following packet types:

  • prepare

    The prepare message is called when the filter goes online, and is designed to initialize the connection to the network server and confirm the supported filter types and operation. The format of the connection message is:

    {
       "payload" : -1,
       "type" : "prepare",
       "service" : "firstrep",
       "protocol" : "v0_9"
    }

    Where:

    • protocol

      The protocol version.

    • service

      The name of the replicator service that called the filter.

    • type

      The message type.

    • payload

      The size of the payload; a value of -1 indicates that there is no payload.

    The format of the response should be a JSON object and payload with the list of supported filter types in the payload section. The payload immediately follows the JSON, with the size of the list defined within the payload field of the returned JSON object:

    {
       "payload" : 22,
       "type" : "acknowledged",
       "protocol" : "v0_9",
       "service" : "firstrep",
       "return" : 0
    }Perl_BLOB_to_String_v1

    Where:

    • protocol

      The protocol version.

    • service

      The name of the replicator service that called the filter.

    • type

      The message type; when acknowledging the original prepare request it should be acknowledged.

    • return

      The return value. A value of 0 (zero) indicates no faults. Any non-zero value indicates there was an issue.

    • payload

      The length of the appended payload information in bytes. This is used by the filter to identify how much additional data to read after the JSON object has been read.

    The payload should be a comma-separated list of the supported transformation types within the network server.

  • filter

    The filter message type is sent by Tungsten Replicator for each value from the replication stream that needs to be filtered and translated in some way. The format of the request is a JSON object with a trailing block of data, the payload, that contains the information to be filtered. For example:

    {
       "schema" : "hexdb",
       "transformation" : "String_to_HEX_v1",
       "service" : "firstrep",
       "type" : "filter",
       "payload" : 22,
       "row" : 0,
       "column" : "hexcol",
       "table" : "hextable",
       "seqno" : 145196,
       "fragments" : 1,
       "protocol" : "v0_9",
       "fragment" : 1
    }48656c6c6f20576f726c64

    Where:

    • protocol

      The protocol version.

    • service

      The name of the service that requested the filter.

    • type

      The message type, in this case, filter.

    • row

      The row of the source information from the THL that is being filtered.

    • schema

      The schema of the source information from the THL that is being filtered.

    • table

      The table of the source information from the THL that is being filtered.

    • column

      The column of the source information from the THL that is being filtered.

    • seqno

      The sequence number of the event from the THL that is being filtered.

    • fragments

      The number of fragments in the THL that is being filtered.

    • fragment

      The fragment number within the THL that is being filtered. The fragments may be sent individually and sequentially to the network server, so they may need to be retrieved, merged, and reconstituted depending on the nature of the source data and the filter being applied.

    • transformation

      The transformation to be performed on the supplied payload data. A single network server can support multiple transformations, so this information is provided to perform the correct operation. The actual transformation to be performed is taken from the JSON configuration file for the filter.

    • payload

      The length, in bytes, of the payload data that will immediately follow the JSON filter request.

    The payload that immediately follows the JSON block is the data from the column that should be processed by the network filter.

    The response package should contain a copy of the supplied information from the requested filter, with the payload size updated to the size of the returned information, the message type changed to filtered, and the payload containing the translated data. For example:

    {
       "transformation" : "String_to_HEX_v1",
       "fragments" : 1,
       "type" : "filtered",
       "fragment" : 1,
       "return" : 0,
       "seqno" : 145198,
       "table" : "hextable",
       "service" : "firstrep",
       "protocol" : "v0_9",
       "schema" : "hexdb",
       "payload" : 8,
       "column" : "hexcol",
       "row" : 0
    }FILTERED

10.4.28.3. Sample Network Client

The following sample network server script is written in Perl, and is designed to translate packed hex strings (two hex characters per byte) from their hex representation into their character representation.

#!/usr/bin/perl

use Switch;
use IO::Socket::INET;
use JSON qw( decode_json encode_json);
use Data::Dumper;

# auto-flush on socket
$| = 1;

my $serverName = "Perl_BLOB_to_String_v1";

while(1)
{
# creating a listening socket
my $socket = new IO::Socket::INET (
    LocalHost => '0.0.0.0',
    LocalPort => '3112',
    Proto => 'tcp',
    Listen => 5,
    Reuse => 1
);
die "Cannot create socket $!\n" unless $socket;
print "********\nServer waiting for client connection on port 3112\n******\n\n\n";

# Waiting for a new client connection
my $client_socket = $socket->accept();

# Get information about the newly connected client
my $client_address = $client_socket->peerhost();
my $client_port = $client_socket->peerport();
print "Connection from $client_address:$client_port\n";

my $data = "";

while ($data = $client_socket->getline())
{
    # Read a line from the connected client, stripping the trailing newline
    chomp($data);

    print "\n\nReceived: <$data>\n";

    # Decode the JSON part
    my $msg = decode_json($data);

    # Extract payload
    my $payload = undef;

    if ($msg->{payload} > 0)
    {
        print STDERR "Reading $msg->{payload} bytes\n";
        $client_socket->read($payload, $msg->{payload});
        print "Payload: <$payload>\n";
    }

    switch( $msg->{'type'} )
    {
        case "prepare"
        {
            print STDERR "Received prepare request\n";

            # Send acknowledged message
            my $out = '{ "protocol": "v0_9", "type": "acknowledged", ' .
                      '"return": 0, "service": "' . $msg->{'service'} . '", "payload": ' .
                      length($serverName) . '}' . "\n" . $serverName;

            print $client_socket "$out";
            print "Sent: <$out>\n";
            print STDERR "Sent acknowledge request\n";
        }
        case "release"
        {
            # Send acknowledged message
            my $out = '{ "protocol": "v0_9", "type": "acknowledged", ' . 
                      '"return": 0, "service": "' . $msg->{'service'} . '", "payload": 0}';

            print $client_socket "$out\n";
            print "Sent: <$out>\n";
        }
        case "filter"
        {
            # Send filtered message

            print STDERR "Sending filtered payload\n";

            my $filtered = "FILTERED";
            my $out = <<END;
{
"protocol": "v0_9",
"type": "filtered",
"transformation": "$msg->{'transformation'}",
"return": 0,
"service": "$msg->{'service'}",
"seqno": $msg->{'seqno'},
"row": $msg->{'row'},
"schema": "$msg->{'schema'}",
"table": "$msg->{'table'}",
"column": "$msg->{'column'}",
"fragment": 1,
"fragments": 1,
"payload": @{[length($filtered)]}
}
END

            $out =~ s/\n//g;
            print "About to send: <$out>\n";
            $client_socket->send("$out\n" . $filtered);
            print("Response sent\n");
        }
    }

    print("End of loop, hoping for next packet\n");
}

    # Notify client that we're done writing
    shutdown($client_socket, 1);

$socket->close();
}

10.4.29. nocreatedbifnotexists.js Filter

The nocreatedbifnotexists filter removes statements that start with:

CREATE DATABASE IF NOT EXISTS

Pre-configured filter name nocreatedbifnotexists
JavaScript Filter File tungsten-replicator/support/filters-javascript/nocreatedbifnotexists.js
Property prefix replicator.filter.nocreatedbifnotexists
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description

This can be useful in heterogeneous replication where Tungsten Replicator specific databases need to be removed from the replication stream.

The filter works in two phases. The first phase creates a global variable within the prepare() function that defines the string to be examined:

function prepare()
{
  beginning = "CREATE DATABASE IF NOT EXISTS";
}

Row-based changes can be ignored, but for statement-based events, the SQL is examined and the statement removed if the SQL starts with the text in the beginning variable:

sql = d.getQuery();
// Remove the statement if it starts with the configured text
if(sql.startsWith(beginning))
{
    data.remove(i);
    i--; // compensate for the removed entry
}
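
The filter can be enabled on the applier through tpm; a minimal example, assuming a service named alpha:

shell> ./tools/tpm update alpha --svc-applier-filters=nocreatedbifnotexists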

10.4.30. OptimizeUpdates Filter

The optimizeupdates filter works with row-based events to simplify the update statement and remove columns/values that have not changed. This reduces the workload and row data exchanged between replicators.

Pre-configured filter name optimizeupdates
Classname com.continuent.tungsten.replicator.filter.OptimizeUpdatesFilter
Property prefix replicator.filter.optimizeupdates
Stage compatibility  
tpm Option compatibility  
Data compatibility Row events
Parameters
(none)

The filter operates by removing column values for keys in the update statement that do not change. For example, when replicating the row event from the statement:

mysql>  update testopt set msg = 'String1', string = 'String3' where id = 1;

Generates the following THL event data:

- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = testopt
 - ROW# = 0
  - COL(1: id) = 1
  - COL(2: msg) = String1
  - COL(3: string) = String3
  - KEY(1: id) = 1

Column 1 (id) in this case is automatically implied by the KEY entry required for the update.

With the optimizeupdates filter enabled, the data in the THL is simplified to:

- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = testopt
 - ROW# = 0
  - COL(2: msg) = String1
  - COL(3: string) = String3
  - KEY(1: id) = 1

In tables where there are multiple keys the stored THL information can be reduced further.

Warning

The filter works by comparing the value of each KEY and COL entry in the THL and determining whether the value has changed or not. If the number of keys and columns does not match, then the filter will fail with the following error message:

Caused by: java.lang.Exception: Column and key count is different in this event! Cannot filter

This may be due to a filter earlier within the filter configuration that has optimized or simplified the data. For example, the pkey filter removes KEY entries from the THL that are not primary keys, and the dropcolumn filter drops column data.

The following error message may appear in the logs and in the output from trepctl status to indicate that this ordering issue may be the problem:

OptimizeUpdatesFilter cannot filter, because column and key count is different.
Make sure that it is defined before filters which remove keys (eg. PrimaryKeyFilter).
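
A configuration sketch that addresses this ordering, assuming both filters are placed in the extractor stage and that filters execute in the order in which they are listed:

svc-extractor-filters=optimizeupdates,pkey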

10.4.31. PrimaryKey Filter

The PrimaryKey filter adds primary key information to row-based replication data. This is required by heterogeneous environments to ensure that the primary key is identified when updating or deleting tables. Without this information, the primary key to use, for example as the document ID in a document store such as MongoDB, is generated dynamically. In addition, without this filter in place, when performing update or delete operations a full table scan is performed on the target dataserver to determine the record that must be updated.

Pre-configured filter name pkey
Classname com.continuent.tungsten.replicator.filter.PrimaryKeyFilter
Property prefix replicator.filter.pkey
Stage compatibility binlog-to-q
tpm Option compatibility --repl-svc-extractor-filters
Data compatibility Row events
Keeps Cached Data Yes
Cached Refreshed When? Emptied when going OFFLINE; Updated when ALTER statement seen
Metadata Updated Yes; tungsten_filter_primarykey=true
Parameters
Parameter Type Default Description
user string ${replicator.global.extract.db.user} The username for the connection to the database for looking up column definitions
password string ${replicator.global.extract.db.password} The password for the connection to the database for looking up column definitions
url string jdbc:mysql:thin://${replicator.global.extract.db.host}:» ${replicator.global.extract.db.port}/${replicator.schema}?createDB=true JDBC URL of the database connection to use for looking up column definitions
addPkeyToInsert boolean false If set to true, primary keys are added to INSERT operations. This setting is required for batch loading
addColumnsToDeletes boolean false If set to true, full column metadata is added to DELETE operations. This setting is required for batch loading
custompkey filename (empty) If set to the path of a JSON file, the file is used to determine custom primary key specifications in place of database derived versions. See Section 10.4.31.1, “Setting Custom Primary Key Definitions”.
reconnectTimeout integer 3600 Timeout for refreshing connection to database

Note

This filter is designed to be used for testing and with heterogeneous replication where the field name information can be used to construct and build target data structures.

For example, in the following THL fragment, the key information includes data for all columns, which is the default behavior for UPDATE and DELETE operations.

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2
  - KEY(2: country) = 1
  - KEY(3: city) = 8374
  - KEY(4: salesman) = 1
  - KEY(5: value) = 89000.00

When the PrimaryKey filter is enabled, the key information has been optimized to only contain the actual primary keys in the row-based THL record:

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2

The final line shows that the KEY information in the THL event now contains only the primary key, id.

Important

The filter determines primary key information by examining the DDL for the table, and keeping that information in an internal cache. If the DDL for a table is not known, or an ALTER TABLE statement is identified, the cache information is updated before the THL is then modified with the primary key information.

In the situation where you enable the filter, but have not created primary key information on the tables, it is possible that creating or adding other index types (such as UNIQUE) on a table could lead to incorrect primary key information being written into the THL, particularly if there are active transactions taking place during and/or immediately after the ALTER statement.

The safest way to perform an index update in this case remains the same as for any safe DDL update:

  • Put the replicator offline

  • Change the DDL for the table or tables

  • Put the replicator online

The two options, addPkeyToInsert and addColumnsToDeletes, add the primary key information to INSERT and DELETE operations respectively. In a heterogeneous environment, these options should be enabled to prevent full-table scans during updates and deletes.
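
The following tungsten.ini sketch enables the filter with both options; this is a minimal outline assuming an INI-based installation, with the filter placed in the extractor stage as indicated in the table above:

shell> vi /etc/tungsten/tungsten.ini
[serviceName]
...
svc-extractor-filters=pkey
property=replicator.filter.pkey.addPkeyToInsert=true
property=replicator.filter.pkey.addColumnsToDeletes=true
...

shell> tpm update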

10.4.31.1. Setting Custom Primary Key Definitions

Custom primary key configuration support is available in 6.0.0 or later.

Not all tables and databases set or provide explicit primary key information, and in some cases, it is not possible to change the index definition on the source table to include primary key information. Without primary key information in the THL, replicating data into heterogeneous targets can fail, because there is no way for the target environment to correctly identify the primary key information, and therefore the specific record or records to be updated.

The pkey filter supports defining custom columns to make up a primary key in this situation, avoiding the need to explicitly set index information within the database.

The custom primary key setting works as follows:

  • When processing a THL entry, if the custom primary key configuration file has been set and exists, and the incoming schema/table name has been defined in the configuration file, the custom pkey configuration is used.

  • Otherwise, the primary key information is taken from the database if it exists.

If neither of these conditions is met, then no primary key data is added to the THL during processing.

To configure a custom primary key for one or more tables:

  1. Copy the sample primary key configuration file, located within the distribution as tungsten/tungsten-replicator/support/filters-config/custompkey.json into the share directory within the installation. For example, /opt/continuent/share.

  2. Update the configuration to include the specific primary key settings for the incoming table. The format of the file is JSON, using a structured layout based on the common JSON filter configuration format (see Section 10.5, “Standard JSON Filter Configuration”). The sample file contains an example for the schema test and the table msg:

    {
        "__default": {
            "IGNORE" : "pkey"
        },
        "test" : {
            "msg" : {
                "msg" : "pkey"
            }
        }
    }

    In the above example, msg is the name of the column to be specified as the primary key. The pkey indicates that this column must be a primary key field. You can also specify multiple columns:

    {
        "__default": {
            "IGNORE" : "pkey"
        },
        "test" : {
            "msg" : {
                "id"  : "pkey",
                "msg" : "pkey"
            }
        }
    }

    To include another table within the same schema:

    {
        "__default": {
            "IGNORE" : "pkey"
        },
        "test" : {
            "msg" : {
                "msg" : "pkey"
            },
            "orders" : {
                "orderid" : "pkey"
            }
        }
    }

    Note

    The __default section must remain in place; although it has no impact on processing, it is used to ensure the file is valid for the filter configuration.

  3. Update the configuration of your installation through tpm to specify the custom primary key file in the properties:

    shell> tpm update alpha --property=replicator.filter.pkey.custompkey=/opt/continuent/share/custompkey.json

Once this process has been completed, the replicator will add the custom primary key fields to the THL during processing. For example:

- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = msg
 - ROW# = 0
  - COL(1: id) = 6
  - COL(2: msg) = new test
  - KEY(1: id) = NULL

10.4.32. PrintEvent Filter

Outputs transaction event information to the replication logging system.

Pre-configured filter name printevent
Classname com.continuent.tungsten.replicator.filter.PrintEventFilter
Property prefix replicator.filter.printevent
Stage compatibility  
tpm Option compatibility  
Data compatibility Any event
Parameters
(none)

10.4.33. Rename Filter

The rename filter enables schemas to be renamed at the database, table and column levels, and for complex combinations of these renaming operations. Configuration is through a CSV file that defines the rename parameters. A single CSV file can contain multiple rename definitions. The rename operations occur only on ROW based events.

The rename filter also performs schema renames on statement data.

Pre-configured filter name rename
Classname com.continuent.tungsten.replicator.filter.RenameFilter
Property prefix replicator.filter.rename
Stage compatibility  
tpm Option compatibility  
Data compatibility Row events.
Parameters
Parameter Type Default Description
definitionsFile string {replicator.home.dir}/samples/extensions/java/rename.csv Location of the CSV file that contains the rename definitions.

The CSV file is only read when an explicit reconfigure operation is triggered. If the file is changed, a configure operation (using tpm update) must be initiated to force reconfiguration.

To enable using the default CSV file:

shell> ./tools/tpm update alpha --svc-applier-filters=rename

The CSV consists of multiple lines, one line for each rename specification. Comments are supported using the # character.

The format of each line of the CSV is:

originalSchema,originalTable,originalColumn,newSchema,newTable,newColumn

Where:

  • originalSchema, originalTable, originalColumn define the original schema, table and column.

    Definition can either be:

    • Explicit schema, table or column name

    • * character, which indicates that all entries should match.

  • newSchema, newTable, newColumn define the new schema, table and column for the corresponding original specification.

    Definition can either be:

    • Explicit schema, table or column name

    • - character, which indicates that the corresponding object should not be updated.

For example, the specification:

*,chicago,*,-,newyork,-

Would rename the table chicago in every database schema to newyork. The schema and column names are not modified.

The specification:

*,chicago,destination,-,-,source

Would match all schemas, but update the column destination in the table chicago to the column name source, without changing the schema or table name.

Processing of the individual rules is executed in a specific order to allow for complex matching and application of the rename changes.

  • Rules are case sensitive.

  • Schema names are looked up in the following order:

    1. schema.table (explicit schema/table)

    2. schema.* (explicit schema, wildcard table)

  • Table names are looked up in the following order:

    1. schema.table (explicit schema/table)

    2. *.table (wildcard schema, explicit table)

  • Column names are looked up in the following order:

    1. schema.table (explicit schema/table)

    2. schema.* (explicit schema, wildcard table)

    3. *.table (wildcard schema, explicit table)

    4. *.* (wildcard schema, wildcard table)

  • Rename operations match the first specification according to the above rules, and only one matching rule is executed.

10.4.33.1. Rename Filter Examples

When processing multiple entries that would match the same definition, the above ordering rules are applied. For example, the definition:

asia,*,*,america,-,-
asia,shanghai,*,europe,-,-

Would rename asia.shanghai to europe.shanghai, while renaming all other tables in the schema asia to the schema america. This is because the explicit schema.table rule is matched first and then executed.

Complex renames involving multiple schemas, tables and columns can be achieved by writing multiple rules into the same CSV file. For example, given a schema where all the tables currently reside in a single schema, but must be renamed to specific continents or to a 'miscellaneous' schema, while also updating the column names to be more neutral, a detailed rename definition is required.

Existing tables are in the schema sales:

chicago
newyork
london
paris
munich
moscow
tokyo
shanghai
sydney

Need to be renamed to:

northamerica.chicago
northamerica.newyork
europe.london
europe.paris
europe.munich
misc.moscow
asiapac.tokyo
asiapac.shanghai
misc.sydney

Meanwhile, the table definition needs to be updated to support more complex structure:

id
area
country
city
value
type

The area is being updated to contain the region within the country, while the value should be renamed to the three-letter currency code, for example, the london table would rename the value column to gbp.

The definition can be divided up into simple definitions at each object level, relying on the processing order to handle the individual exceptions. Starting with the table renames for the continents:

sales,chicago,*,northamerica,-,-
sales,newyork,*,northamerica,-,-
sales,london,*,europe,-,-
sales,paris,*,europe,-,-
sales,munich,*,europe,-,-
sales,tokyo,*,asiapac,-,-
sales,shanghai,*,asiapac,-,-

A single rule to handle the renaming of any table not explicitly mentioned in the list above into the misc schema:

*,*,*,misc,-,-

Now a rule to change the area column for all tables to region. This requires a wildcard match against the schema and table names:

*,*,area,-,-,region

And finally the explicit changes for the value column to the corresponding currency:

*,chicago,value,-,-,usd
*,newyork,value,-,-,usd
*,london,value,-,-,gbp
*,paris,value,-,-,eur
*,munich,value,-,-,eur
*,moscow,value,-,-,rub
*,tokyo,value,-,-,jpy
*,shanghai,value,-,-,cny
*,sydney,value,-,-,aud
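
Because rules are matched according to the lookup order described earlier, rather than by their position in the file, the sections above can simply be combined into a single rename.csv:

# Table renames to the continent schemas
sales,chicago,*,northamerica,-,-
sales,newyork,*,northamerica,-,-
sales,london,*,europe,-,-
sales,paris,*,europe,-,-
sales,munich,*,europe,-,-
sales,tokyo,*,asiapac,-,-
sales,shanghai,*,asiapac,-,-
# Any table not matched above moves to the misc schema
*,*,*,misc,-,-
# Rename the area column to region in every table
*,*,area,-,-,region
# Rename the value column to the local currency code
*,chicago,value,-,-,usd
*,newyork,value,-,-,usd
*,london,value,-,-,gbp
*,paris,value,-,-,eur
*,munich,value,-,-,eur
*,moscow,value,-,-,rub
*,tokyo,value,-,-,jpy
*,shanghai,value,-,-,cny
*,sydney,value,-,-,aud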

10.4.34. Replicate Filter

The replicate filter enables explicit inclusion or exclusion of tables and schemas. Each specification supports wildcards and multiple entries.

Pre-configured filter name replicate
Classname com.continuent.tungsten.replicator.filter.ReplicateFilter
Property prefix replicator.filter.replicate
Stage compatibility Any
tpm Option compatibility  
Data compatibility Any event
Parameters
Parameter Type Default Description
ignore string empty Comma separated list of database/tables to ignore during replication
do string empty Comma separated list of database/tables to replicate

Rules using the supplied parameters are evaluated as follows:

  • When both do and ignore are empty, updates are allowed to any table.

  • When only do is specified, only the schemas (or schemas and tables) mentioned in the list are replicated.

  • When only ignore is specified, all schemas/tables are replicated except those defined.

For each parameter, a comma-separated list of schema or schema and table definitions is supported, and wildcards using * (any number of characters) and ? (single character) are also honored. For example:

  • do=sales

    Replicates only the schema sales.

  • ignore=sales

    Replicates everything, ignoring the schema sales.

  • ignore=sales.*

    Replicates everything, ignoring the schema sales.

  • ignore=sales.quarter?

    Replicates everything, ignoring all tables within the sales schema whose name starts with quarter followed by a single character. This would ignore sales.quarter1 but replicate sales.quarterlytotals.

  • ignore=sales.quarter*

    Replicates everything, ignoring all tables in the schema sales starting with quarter.

  • do=*.quarter

    Replicates only the table named quarter within any schema.

  • do=sales.*totals,invoices

    Replicates only tables in the sales schema that end with totals, and the entire invoices schema.
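
For example, a minimal sketch that replicates only the sales and invoices schemas, assuming the filter is placed in the applier stage:

svc-applier-filters=replicate
property=replicator.filter.replicate.do=sales,invoices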

10.4.35. ReplicateColumns Filter

Removes selected columns from row-based transaction data.

Pre-configured filter name replicatecolumns
Classname com.continuent.tungsten.replicator.filter.ReplicateColumnsFilter
Property prefix replicator.filter.replicatecolumns
Stage compatibility  
tpm Option compatibility  
Data compatibility Row events
Parameters
Parameter Type Default Description
ignore string empty Comma separated list of tables and optional column names to ignore during replication
do string empty Comma separated list of tables and optional column names to replicate

10.4.36. Row Add Database Name Filter

The rowadddbname filter adds a new column to every incoming row of data containing the schema name of the table. This can be used in combination with analytics replication targets where the information is being written into a single schema, concentrating data from multiple source databases into a single database. Optionally, the filter can also add the information as a primary key (required for some heterogeneous targets), and the hash of the source database name.

Pre-configured filter name rowadddbname
Classname com.continuent.tungsten.replicator.filter.JavaScriptFilter
Property prefix replicator.filter.rowadddbname
Stage compatibility Any
tpm Option compatibility  
Data compatibility Row-events only
Parameters
Parameter Type Default Description
adddbhash boolean true Add a hash column, computed using the Java hash value of the string for the incoming source database name
addkey boolean true Add the information to the primary key data in the THL, in addition to the column data
fieldname string dbname The name of the database name column that will be added
fieldhashname string dbname The name of the database hash name column that will be added

The rowadddbname filter adds a column to every row of THL processed by the filter. This can be used when data is being written into a single target schema within an analytics environment, and where the source database can be used to identify a customer, project or dataset, and therefore queried within the analytics platform either specifically or with other datasets.

The filter is able to perform the following modifications to every row of incoming data:

  • Add the source database or schema name.

  • Add a numerical hash value of the string of the source database or schema name.

  • Add the database name (and hash name) to the primary key data.

For example, the source THL:

- ROW# = 0
  - COL(1: id) = 12
  - COL(2: msg) = Hello
  - COL(3: msg2) = World
  - KEY(1: id) = NULL

And after the filter has been applied:

- ROW# = 0
  - COL(1: id) = 12
  - COL(2: msg) = Hello
  - COL(3: msg2) = World
  - COL(4: dbname) = msg
  - COL(5: dbname_hash) = 108417
  - KEY(1: id) = NULL
  - KEY(4: dbname) = NULL
  - KEY(5: dbname_hash) = NULL

This filter is a required component of deployments when replicating into a single schema within Vertica (see Section 4.3, “Deploying the Vertica Applier”) and Amazon Redshift (see Section 4.2, “Deploying the Amazon Redshift Applier”).
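
A minimal configuration sketch, assuming the filter is placed in the extractor stage; the column name sourcedb is illustrative only:

svc-extractor-filters=rowadddbname
property=replicator.filter.rowadddbname.fieldname=sourcedb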

10.4.37. Row Add Transaction Info Filter

The rowaddtxninfo filter examines an entire row-based event within the THL and then builds a list of transaction information which is embedded into the metadata for the event. This can be combined with functionality in corresponding appliers, such as Section 4.4, “Deploying the Kafka Applier” which can then be used and embedded into messages or document-based databases.

Pre-configured filter name rowaddtxninfo
Classname com.continuent.tungsten.replicator.filter.JavaScriptFilter
Property prefix replicator.filter.rowaddtxninfo
Stage compatibility Any
tpm Option compatibility  
Data compatibility Row-events only
Parameters
Parameter Type Default Description

The rowaddtxninfo filter processes an entire transaction to determine the following information:

  • List of schemas and tables affected by the transaction.

  • A count of the rows inserted, updated, or deleted per schema/table combination.

  • A total count of all the rows inserted, updated, or deleted across the entire transaction.

Once the information about the transaction has been collected, the data is then formulated into key/value pairs that are incorporated into the metadata for the entire THL event.

For example, when inserting information into the same schema across three separate statements but within the same overall transaction, the THL with the filter enabled looks like the example below:

shell> thl list -last
SEQ# = 162 / FRAG# = 0 (last frag)
- TIME = 2018-03-02 08:47:14.0
- EPOCH# = 162
- EVENTID = mysql-bin.000065:0000000000000534;-1
- SOURCEID = trfiltera
- METADATA = [mysql_server_id=366;dbms_type=mysql;tz_aware=true;strings=utf8;service=alpha;shard=msg;
» tungsten_filter_columnname=true; tungsten_filter_primarykey=true;tungsten_filter_enumtostring=true;
» txinfo_rowcount_msg.msg=1;txinfo_rowcount_msg.msgsub=2;txinfo_rowcount=3]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00']
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = msg
 - TABLE = msg
 - ROW# = 0
  - COL(1: id) = 108
  - COL(2: msg) = txinfo
  - COL(3: msg2) = txinfo
  - KEY(1: id) = NULL
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00']
- SQL(1) =
 - ACTION = INSERT
 - SCHEMA = msg
 - TABLE = msgsub
 - ROW# = 0
  - COL(1: id) = 108
  - COL(2: msg) = subtx
 - ROW# = 1
  - COL(1: id) = 108
  - COL(2: msg) = subxt

Examining the metadata more closely you can see the transaction information:

...   txinfo_rowcount_msg.msg=1;txinfo_rowcount_msg.msgsub=2;txinfo_rowcount=3 ...

This can be further extrapolated as:

  • txinfo_rowcount_msg.msg=1

    One row has been inserted into the msg.msg schema/table.

  • txinfo_rowcount_msg.msgsub=2

    Two rows have been inserted into the msg.msgsub schema/table.

  • txinfo_rowcount=3

    A total of three rows have been updated within the transaction.

Note

If transaction information is needed within a Kafka deployment, this filter must be enabled for the transaction information to be included. See Section 4.4.2.1, “Optional Configuration Parameters for Kafka”, for more information.

10.4.38. SetToString Filter

The SetToString filter converts the SET column type from the internal representation to a string-based representation in the THL. This is achieved by accessing the extractor database, obtaining the table definitions, and modifying the THL data before it is written into the THL file.

Pre-configured filter name settostring
Classname com.continuent.tungsten.replicator.filter.SetToStringFilter
Property prefix replicator.filter.settostring
Stage compatibility binlog-to-q
tpm Option compatibility --repl-svc-extractor-filters
Data compatibility Row events
Parameters
Parameter Type Default Description
user string ${replicator.global.extract.db.user} The username for the connection to the database for looking up column definitions
password string ${replicator.global.extract.db.password} The password for the connection to the database for looking up column definitions
url string jdbc:mysql:thin://${replicator.global.extract.db.host}: » ${replicator.global.extract.db.port}/${replicator.schema}?createDB=true JDBC URL of the database connection to use for looking up column definitions

The SetToString filter should be used with heterogeneous replication to ensure that the data is represented as the string value, not the internal numerical representation.

In the THL output below, the table has a SET column, salesman:

mysql> describe salesadv;
+----------+--------------------------------------+------+-----+---------+----------------+
| Field    | Type                                 | Null | Key | Default | Extra          |
+----------+--------------------------------------+------+-----+---------+----------------+
| id       | int(11)                              | NO   | PRI | NULL    | auto_increment |
| country  | enum('US','UK','France','Australia') | YES  |     | NULL    |                |
| city     | int(11)                              | YES  |     | NULL    |                |
| salesman | set('Alan','Zachary')                | YES  |     | NULL    |                |
| value    | decimal(10,2)                        | YES  |     | NULL    |                |
+----------+--------------------------------------+------+-----+---------+----------------+

When extracted in the THL, the representation uses the internal value (for example, 1 for the first element of the set description). This can be seen in the THL output below.

SEQ# = 138 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:09:35.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000021434;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 35000.00

For the salesman column, the corresponding value in the THL is 1. With the SetToString filter enabled, the value is expanded to the corresponding string value:

SEQ# = 121 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:05:14.0
- EPOCH# = 102
- EVENTID = mysql-bin.000012:0000000000018866;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 1
  - COL(2: country) = US
  - COL(3: city) = 8374
  - COL(4: salesman) = Alan
  - COL(5: value) = 35000.00

The examples here also show the Section 10.4.20, “EnumToString Filter” and Section 10.4.5, “ColumnName Filter” filters.

10.4.39. Shard Filter

Used to enforce database schema sharding between specific Primaries.

Pre-configured filter name shardfilter
Classname com.continuent.tungsten.replicator.filter.ShardFilter
Property prefix replicator.filter.shardfilter
Stage compatibility  
tpm Option compatibility  
Data compatibility Any event
Parameters
Parameter Type Default Description
enabled boolean false If set to true, enables the shard filter
unknownShardPolicy string error Select the filter policy when the shard is unknown; valid values are accept, drop, warn, and error
unwantedShardPolicy string error Select the filter policy when the shard is unwanted; valid values are accept, drop, warn, and error
enforcedHome boolean false If true, enforce the home for the shard
allowWhitelisted boolean false If true, allow explicitly whitelisted shards
autoCreate boolean false If true, allow shard rules to be created automatically
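
For example, a minimal sketch that switches the filter on and silently drops events for unknown shards, using the parameters listed above:

property=replicator.filter.shardfilter.enabled=true
property=replicator.filter.shardfilter.unknownShardPolicy=drop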

10.4.40. shardbyrules.js Filter

The shardbyrules filter allows you to specify granular schema and table level rules for sharding of transactions through the replicator. This can provide enhanced performance where regular schema-only based sharding would not suit the profile of your application.

Pre-configured filter name shardbyrules
JavaScript Filter File tungsten-replicator/support/filters-javascript/shardbyrules.js
Property prefix replicator.filter.shardbyrules
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility ROW
Parameters
Parameter Type Default Description
definitionsFile string support/filters-config/shards.json JSON file containing the definition of which events and which tables to skip

Note

For this filter to function, you will need to ensure your database is configured for ROW based binary logging

The key part of the filter is configuring the rules to suit your sharding requirements. Start by copying the sample file and then editing to suit:

shell> cp /opt/continuent/tungsten/tungsten-replicator/support/filters-config/shards.json /opt/continuent/share/shards.json

Within the configuration, you would then specify this definitionsFile:

svc-applier-filters=shardbyrules
property=replicator.filter.shardbyrules.definitionsFile=/opt/continuent/share/shards.json

Example:

{
  "default": "defaultShardName",
  "schemas": [
    {
      "schema": "schemaA",
      "shardId": "MyShard1"
    },
    {
      "schema": "schemaB",
      "shardId": "MyShard2"
    }
  ],
  "tables": [
    {
      "schema": "schemaB",
      "table": "MyTable1",
      "shardId": "MyShard1"
    }
  ]
}

  • schemaA gets assigned to MyShard1

  • schemaB gets assigned to MyShard2

  • With the exception of schemaB.MyTable1, which also gets assigned to MyShard1.

With this behavior, transactions against schemaB.MyTable1 would be able to execute concurrently with transactions hitting other tables in schemaB, or tables in other schemas.

10.4.41. shardbyseqno.js Filter

Shards within the replicator enable data to be parallelized when it is applied on the Target.

Pre-configured filter name shardbyseqno
JavaScript Filter File tungsten-replicator/support/filters-javascript/shardbyseqno.js
Property prefix replicator.filter.shardbyseqno
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
shards numeric (none) Number of shards to be used by the applier

The shardbyseqno filter updates the shard ID, which is embedded into the event metadata, by a configurable number of shards, set by the shards parameter in the configuration:

replicator.filter.shardbyseqno=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.shardbyseqno.script=${replicator.home}/samples/extensions/javascript/shardbyseqno.js
replicator.filter.shardbyseqno.shards=10

The filter works by setting the shard ID in the event using the setShardId() method on the event object:

event.setShardId(event.getSeqno() % shards);

Note

Care should be taken with this filter, as it assumes that the events can be applied in a completely random order by blindly updating the shard ID to a computed value. Sharding in this way is best used when provisioning new Targets.

10.4.42. shardbytable.js Filter

An alternative to sharding by sequence number is to create a shard ID based on the individual database and table. The shardbytable filter achieves this at a row level by combining the schema and table information to form the shard ID. For all other events, including statement based events, the shard ID #UNKNOWN is used.

Pre-configured filter name shardbytable
JavaScript Filter File tungsten-replicator/support/filters-javascript/shardbytable.js
Property prefix replicator.filter.shardbytable
Stage compatibility remote-to-thl
tpm Option compatibility --svc-remote-filters
Data compatibility ROW
Parameters
Parameter Type Default Description

Note

For this filter to function, you will need to ensure your database is configured for ROW based binary logging

The key part of the filter is the extraction and construction of the ID, which occurs during row processing:

oneRowChange = rowChanges.get(j);
schemaName = oneRowChange.getSchemaName();
tableName  = oneRowChange.getTableName();

id = schemaName + "_" + tableName;
if (proposedShardId == null)
{
  proposedShardId = id;
}

10.4.43. SkipEventByType Filter

The SkipEventByType filter enables you to skip individual events based on the event type, schema and table. For example, if you want to skip all DELETE events on the schema/table SALES.INVOICES (to prevent deletion of invoice data), this filter will skip the event entirely and it will not be applied to the target.

Pre-configured filter name skipeventbytype
Classname com.continuent.tungsten.replicator.filter.SkipEventByTypeFilter
Property prefix replicator.filter.skipeventbytype
Stage compatibility any
tpm Option compatibility --repl-svc-extractor-filters, --repl-svc-applier-filters
Data compatibility Row events
Parameters
Parameter Type Default Description
definitionsFile string support/filters-config/skipeventbytype.json JSON file containing the definition of which events and which tables to skip

Configuration of the filter is made using the generic JSON file, which supports default options applied to all tables not otherwise explicitly specified. The default JSON file allows all operations:

{
    "__default": {
        "INSERT" : "allow",
        "DELETE" : "allow",
        "UPDATE" : "allow"
    },
    "SCHEMA" : {
        "TABLE" : {
            "INSERT" : "allow",
            "DELETE" : "deny",
            "UPDATE" : "deny"
        }
    }
}

The default section handles the default response when an explicit schema or table name does not appear. Further sections are then organised by schema and then table name. Where the setting is allow, the operation will be processed. A deny skips the entire event.

To disable all DELETE operations, regardless of which table they occur in:

{
    "__default": {
        "INSERT" : "allow",
        "DELETE" : "deny",
        "UPDATE" : "allow"
    },
    "SCHEMA" : {
        "TABLE" : {
            "INSERT" : "allow",
            "DELETE" : "deny",
            "UPDATE" : "deny"
        }
    }
}

To normally allow all operations, except on the SALES.INVOICE schema/table:

{
    "__default": {
        "INSERT" : "allow",
        "DELETE" : "allow",
        "UPDATE" : "allow"
    },
    "SALES" : {
        "INVOICE" : {
            "INSERT" : "allow",
            "DELETE" : "deny",
            "UPDATE" : "deny"
        }
    }
}
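
As with the shardbyrules filter, start by copying the sample file and editing it to suit; a sketch assuming the standard /opt/continuent installation layout:

shell> cp /opt/continuent/tungsten/tungsten-replicator/support/filters-config/skipeventbytype.json /opt/continuent/share/skipeventbytype.json

Then reference the edited copy within the configuration:

svc-applier-filters=skipeventbytype
property=replicator.filter.skipeventbytype.definitionsFile=/opt/continuent/share/skipeventbytype.json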

10.4.44. TimeDelay (delay) Filter

The TimeDelayFilter delays writing events to the THL and should be used only on Appliers in the remote-to-thl stage. This delays writing the transactions into the THL files, but allows the application of the data to the database to continue without further intervention.

From version 6.1.4 the delayInMs filter is also available, which allows delay precision in milliseconds. See Section 10.4.45, “TimeDelayMsFilter (delayInMS) Filter” for more detail.

Pre-configured filter name delay
Classname com.continuent.tungsten.replicator.filter.TimeDelayFilter
Property prefix replicator.filter.delay
Stage compatibility remote-to-thl
tpm Option compatibility --repl-svc-thl-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
delay numeric 300 Number of seconds to delay transaction processing

The TimeDelayFilter delays the application of transactions recorded in the THL. The delay can be used to allow point-in-time recovery of DML operations before the transaction has been applied to the Target, or where data may need to be audited or checked before transactions are committed.

Note

For effective operation, Source and Targets should be synchronized using NTP or a similar protocol.

To enable the TimeDelayFilter, update the tungsten.ini configuration file to enable the filter. For example, to enable the delay for 600 seconds:

shell> vi /etc/tungsten/tungsten.ini
[serviceName]
...
svc-applier-filters=delay
property=replicator.filter.delay.delay=600
...

shell> tpm update

Time delay of transaction events should be performed with care, since the delay will prevent a Target from being up to date compared to the Source. In the event of a node failure, an up to date Target is required to ensure that data is safe.

10.4.45. TimeDelayMsFilter (delayInMS) Filter

The TimeDelayMsFilter delays writing events to the THL and should be used only on Appliers in the remote-to-thl stage. This delays writing the transactions into the THL files, but allows the application of the data to the database to continue without further intervention.

This filter allows delay precision in milliseconds. If you wish to delay to second precision, then the TimeDelay filter would also be appropriate. See Section 10.4.44, “TimeDelay (delay) Filter” for more detail.

Pre-configured filter name delayInMs
Classname com.continuent.tungsten.replicator.filter.TimeDelayMsFilter
Property prefix replicator.filter.delayInMs
Stage compatibility remote-to-thl
tpm Option compatibility --repl-svc-thl-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
delay numeric 60000 Number of milliseconds to delay transaction processing

The TimeDelayMsFilter delays the application of transactions recorded in the THL. The delay can be used to allow point-in-time recovery of DML operations before the transaction has been applied to the Target, or where data may need to be audited or checked before transactions are committed.

Note

For effective operation, Source and Targets should be synchronized using NTP or a similar protocol.

To enable the TimeDelayMsFilter, update the tungsten.ini configuration file to enable the filter. For example, to enable the delay for 120000 milliseconds:

shell> vi /etc/tungsten/tungsten.ini
[serviceName]
...
svc-applier-filters=delayInMs
property=replicator.filter.delayInMs.delay=120000
...

shell> tpm update

Time delay of transaction events should be performed with care, since the delay will prevent a Target from being up to date compared to the Source. In the event of a node failure, an up to date Target is required to ensure that data is safe.

10.4.46. tosingledb.js Filter

This filter updates the replicated information so that it goes to an explicit schema, as defined by the user. The filter can be used to combine data from multiple schemas into a single schema.

Pre-configured filter name tosingledb
JavaScript Filter File tungsten-replicator/support/filters-javascript/tosingledb.js
Property prefix replicator.filter.tosingledb
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
db string (none) Database name into which to replicate all tables
skip string (none) Comma-separated list of databases to be ignored

A database can be optionally ignored through the skip parameter within the configuration:

--property=replicator.filter.tosingledb.db=centraldb \
--property=replicator.filter.tosingledb.skip=tungsten

The above configures all data to be written into centraldb, but skips the database tungsten.

Similar to other filters, the filter operates by explicitly changing the schema name to the configured schema, unless the skipped schema is in the event data. For example, at a statement level:

if(oldDb!=null && oldDb.compareTo(skip)!=0)
{
  d.setDefaultSchema(db);
}

10.4.47. truncatetext.js Filter

The truncatetext filter truncates a MySQL BLOB field.

Pre-configured filter name truncatetext
JavaScript Filter File tungsten-replicator/support/filters-javascript/truncatetext.js
Property prefix replicator.filter.truncatetext
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Row events
Parameters
Parameter Type Default Description
length numeric (none) Maximum size of truncated field (bytes)

The length is determined by the length parameter in the properties:

replicator.filter.truncatetext=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.truncatetext.script=${replicator.home.dir}/samples/extensions/javascript/truncatetext.js
replicator.filter.truncatetext.length=4000

Statement-based events are ignored, but row-based events are processed for each column value, checking the column type using the isBlob() method and then truncating the contents when they are identified as larger than the configured length. To confirm the type, it is compared against the Java class com.continuent.tungsten.replicator.extractor.mysql.SerialBlob, the class for a serialized BLOB value. These need to be processed differently as they are not exposed as a single variable.

// Check whether the value is a serialized BLOB value
if (value.getValue() instanceof com.continuent.tungsten.replicator.extractor.mysql.SerialBlob)
{
  blob = value.getValue();
  if (blob != null)
  {
    // The raw bytes can be retrieved if required
    valueBytes = blob.getBytes(1, blob.length());
    // Truncate the content in place when it exceeds the configured length
    if (blob.length() > truncateTo)
    {
      blob.truncate(truncateTo);
    }
  }
}

10.4.48. zerodate2null.js Filter

The zerodate2null filter looks complicated, but is very simple. It processes row data looking for date columns. If the corresponding value is zero within the column, the value is updated to NULL. This is required for MySQL to Oracle replication scenarios.

Pre-configured filter name zerodate2null
JavaScript Filter File tungsten-replicator/support/filters-javascript/zerodate2null.js
Property prefix replicator.filter.zerodate2null
Stage compatibility q-to-dbms
tpm Option compatibility --svc-applier-filters
Data compatibility Row events
Parameters (none)

The filter works by examining the column specification using the getColumnSpec() method. Each column is then checked to see if the column type is a DATE, DATETIME or TIMESTAMP by looking up the type ID against stored values for the date types.

Because the column index and corresponding value index match, when the value is zero, the column value is explicitly set to NULL using the setValueNull() method.

for(j = 0; j < rowChanges.size(); j++)
{
  oneRowChange = rowChanges.get(j);
  columns = oneRowChange.getColumnSpec();
  columnValues = oneRowChange.getColumnValues();
  for (c = 0; c < columns.size(); c++)
  {
    columnSpec = columns.get(c);
    type = columnSpec.getType();
    if (type == TypesDATE || type == TypesTIMESTAMP)
    {
      for (row = 0; row < columnValues.size(); row++)
      {
        values = columnValues.get(row);
        value = values.get(c);

        if (value.getValue() == 0)
        {
          value.setValueNull()
        }
      }
    }
  }
}

10.5. Standard JSON Filter Configuration

A number of the filters that are included as part of Tungsten Cluster use a standardised form of configuration file that is designed to be easy to use and familiar, while being flexible enough to support the needs of each filter. For the majority of filter configurations, the core focus of the configuration is based on a 'default' setting, and settings that are specific to a schema or table.

The JSON configuration follows this basic model and is supported by a number of the filters supplied with the replicator.

The basic format of the configuration is a JSON file that is split into two sections:

  • A default section, which determines what will happen in the absence of a schema/table specific rule.

  • A collection of schema and table specific entries that determine what happens for a specific schema/table combination.

Depending on the filter and use case, the information within both sections can then either be further divided into column-specific information, or the information may be configured as key/value pairs, or objects, to configure individual parts of the filter configuration.

For example, the following configuration file is from the pkey filter:

{
    "__default": {
        "IGNORE" : "pkey"
    },
    "test" : {
        "msg" : {
            "msg" : "pkey"
        }
    }
}

The above shows the defaults section, and the schema/table specific section.

Note

Depending on the filter, the default section may merely be a placeholder to indicate the format of the file. The __default section should never be removed.

The sample shows a full schema name, table name, and then column name configuration.

By comparison, the sample below has only schema and table name information, with the configuration within that section being used to define the key/value pairs for specific operations as part of the skipeventbytype filter:

{
    "__default": {
        "INSERT" : "allow",
        "DELETE" : "allow",
        "UPDATE" : "allow"
    },
    "SCHEMA" : {
        "TABLE" : {
            "INSERT" : "allow",
            "DELETE" : "deny",
            "UPDATE" : "deny"
        }
    }
}

The selection and execution of the rules is determined by some specific rules, as detailed in Section 10.5.1, “Rule Handling and Processing” and Section 10.5.2, “Schema, Table, and Column Selection”.

10.5.1. Rule Handling and Processing

The processing of the rules and the selection of the tables and appropriate response and operation is configured through the combination of the default and schema/table settings according to explicit rules:

  • If the incoming data matches the schema and table (and optionally column) according to the rules, use the configuration information in that section.

  • If the schema/table is not specified or does not have explicit configuration, use the configuration within the __default section instead.

The default rule is always processed and followed if there is no match for an explicit schema, table, or column definition. For example, with the skipeventbytype configuration shown above, a DELETE on SCHEMA.TABLE is denied by the explicit rule, while a DELETE on any other schema/table falls through to the __default section and is allowed.

10.5.2. Schema, Table, and Column Selection

The format of the JSON configuration and the selection of the schema, table, and column information takes the form of a nested structure of JSON objects: the schema first, then the table, and then optionally the column. For example:

"test" : {
    "msg" : {
        "id" : "pkey"
    }
}

In the above example:

  • test is the schema name

  • msg is the table name within the test schema

  • id is the column name within the test.msg table

For different tables within the same schema, place another entry at the same level:

"test" : {
    "msg" : {
        "id" : "pkey"
    },
    "orders" : {
        "id" : "pkey"
    },
}

The above now handles the tables msg and orders within the test schema.

Wildcards are also supported, using the * operator. For example:

"orders" : {
    "*" : {
        "INSERT" : "allow",
        "DELETE" : "deny",
        "UPDATE" : "deny"
    }
}

Would match all tables within the orders schema. If multiple definitions exist, then the matching operates on the closest match first. For example:

"orders" : {
    "sales" : {
        "INSERT" : "deny",
        "DELETE" : "deny",
        "UPDATE" : "deny"
    },
    "*" : {
        "INSERT" : "allow",
        "DELETE" : "deny",
        "UPDATE" : "deny"
    }
}

In the above, if the schema/table combination orders.sales is seen, the rule for that combination is always used first as it is explicitly stated. Only tables that do not match an explicit entry will use the wildcard entry. If neither an explicit schema/table nor a wildcard entry exists, the default is used.

10.6. JavaScript Filters

In addition to the supplied Java filters, Tungsten Replicator also includes support for custom script-based filters written in JavaScript and supported through the JavaScript filter. This filter provides a JavaScript environment that exposes the transaction information as it is processed internally through an object-based JavaScript API.

The JavaScript implementation is provided through the Rhino open-source implementation. Rhino provides a direct interface between the underlying Java classes used to implement the replicator code and a full JavaScript environment. This enables scripts to be developed that have access to the replicator constructs and data structures, and allows information to be updated, reformatted, combined, extracted and reconstructed.

At the simplest level, this allows for operations such as database renames and filtering. More complex solutions allow for modification of the individual data, such as removing nulls, bad dates, and duplication of information.

Warning

If you previously implemented custom filters with older releases of Tungsten Replicator or with the now deprecated Open Source (OSS) release, you would have edited the static-SERVICE.properties file.

This is no longer a supported method of implementing custom filters, and doing so will break automated upgrades through tpm.

To enable custom filters, follow the process here: Section 10.6.2, “Installing Custom JavaScript Filters”

10.6.1. Writing JavaScript Filters

The JavaScript interface to the replicator enables filters to be written using standard JavaScript with a complete object-based interface to the internal Java objects and classes that make up the THL data.

For more information on the Rhino JavaScript implementation, see Rhino.

The basic structure of a JavaScript filter is as follows:

// Prepare the filter and setup structures

function prepare()
{

}

// Perform the filter process; function is called for each event in the THL

function filter(event)
{
    // Get the array of DBMSData objects
    data = event.getData();

    // Iterate over the individual DBMSData objects
    for(i = 0; i < data.size(); i++)
    {
      // Get a single DBMSData object
      d = data.get(i);

      // Process a Statement Event; event type is identified by comparing
      // the object class type

      if (d instanceof com.continuent.tungsten.replicator.dbms.StatementData)
      {
        // Do statement processing
      }
      else if (d instanceof com.continuent.tungsten.replicator.dbms.RowChangeData)
      {
        // Get an array of all the row changes
        rows = d.getRowChanges();

        // Iterate over row changes
        for(j = 0; j < rows.size(); j++)
        {
          // Get the single row change
          rowchange = rows.get(j);

          // Identify the row change type
          if (rowchange.getAction() == "INSERT")
          {
          }
          // ... handle other action types
        }
      }
    }
}

The following sections will examine the different data structures, functions, and information available when processing these individual events.

10.6.1.1. Implementable Functions

Each JavaScript filter must define one or more functions that are used to operate the filter process. The filter() function must be defined, as it contains the primary operation sequence for the defined filter. The function is supplied the event from the THL as the events are processed by the replicator.

In addition, two other JavaScript functions can optionally be defined that are executed before and after the filter process. Additional user-specific functions can be defined within the filter context to support the filter operations. A minimal skeleton combining these functions is shown after the list below.

  • prepare()

    The prepare() function is called when the replicator is first started, and initializes the configured filter with any values that may be required during the filter process. These can include loading and identifying configuration values, creating lookup, exception or other reference tables and other internal JavaScript tables based on the configuration information, and reporting the generated configuration or operation for debugging.

  • filter(event)

    The filter() function is the main function that is called each time an event is loaded from the THL. The event is passed as the only parameter to the function and is an object containing all the statement or row data for a given event.

  • release()

    The release() function is called when the filter is deallocated and removed, typically during shutdown of the replicator, although it may also occur when a processing thread is restarted.
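
The following is a minimal sketch showing how the three functions fit together. The filter name and the message property are illustrative only, and are not part of the standard distribution:

// Called once when the replicator starts; load configuration here
function prepare()
{
    // Illustrative filter-specific property, loaded using the
    // filterProperties class described in the next section
    message = filterProperties.getString("message");
    logger.info("samplefilter: initialized");
}

// Called for each event read from the THL
function filter(event)
{
    data = event.getData();
    if (data != null)
    {
        for (i = 0; i < data.size(); i++)
        {
            d = data.get(i);
            // Statement or row processing would be performed here
        }
    }
}

// Called when the filter is deallocated, typically at replicator shutdown
function release()
{
    logger.info("samplefilter: released");
}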

10.6.1.2. Getting Configuration Parameters

The JavaScript interface enables you to get two different sets of configuration properties: the filter-specific properties, and the general replicator properties. The filter-specific properties should be used to configure and specify configuration information unique to that instance of the filter configuration. Since multiple filter configurations using the same filter definition can be created, using the filter-specific content is the simplest method for obtaining this information.

  • Getting Filter Properties

    To obtain the properties configured for the filter within the static configuration file according to the context of the filter configuration, use the filterProperties class with the getString() method. For example, the dbrename filter uses two properties, dbsource and dbtarget to identify the database to be renamed and the new name. The definition for the filter within the configuration file might be:

    replicator.filter.jsdbrename=com.continuent.tungsten.replicator.filter.JavaScriptFilter
    replicator.filter.jsdbrename.script=${replicator.home.dir}/support/filters-javascript/dbrename.js
    replicator.filter.jsdbrename.dbsource=contacts
    replicator.filter.jsdbrename.dbtarget=nyc_contacts

    Within the JavaScript filter, they are retrieved using:

    sourceName = filterProperties.getString("dbsource");
    targetName = filterProperties.getString("dbtarget");
  • Generic Replicator Properties

    General properties can be retrieved using the properties class and the getString() method:

    master = properties.getString("replicator.thl.remote_uri");

10.6.1.3. Logging Information and Exceptions

Information about the filtering process can be reported into the standard trepsvc.log file by using the logger object. This supports different methods according to the configured logging level:

  • logger.info() — information level entry, used to indicate configuration, loading or progress.

  • logger.debug() — information will be logged when debugging is enabled, used when showing progress during development.

  • logger.error() — used to log an error that would cause a problem with processing or cause replication to stop.

For example, to log an informational entry that includes data from the filter process:

logger.info("regexp: Translating string " + valueString.valueOf());

To raise an exception that causes replication to stop, a new ReplicatorException object must be created that contains the error message:

if(col == null)
{
  throw new com.continuent.tungsten.replicator.ReplicatorException(
    "dropcolumn.js: column name in " + schema + "." + table +
    " is undefined - is colnames filter enabled and is it before the dropcolumn filter?"
    );
}

The error string provided will be used as the error reported through trepctl, in addition to raising an exception and backtrace within the log.

10.6.1.4. Exposed Data Structures

Within the filter() function that must be defined within the JavaScript filter, a single event object is supplied as the only argument. That event object contains all of the information about a single event as recorded within the THL as part of the replication process. Each event contains metadata information that can be used to identify or control the content, and individual statement and row data that contain the database changes.

The content of the information is a compound set of data that contains one or more further blocks of data changes, which in turn contain one or more blocks of SQL statements or row data. These blocks are defined using the Java objects that describe their internal format, and are exposed within the JavaScript wrapper as JavaScript objects that can be parsed and manipulated.

At the top level, the Java object provided to the filter() function as the event argument is ReplDBMSEvent. The ReplDBMSEvent class provides the core event information with additional management metadata such as the global transaction ID (seqno), latency of the event and sharding information.

That object contains one or more DBMSData objects. Each DBMSData object contains either a StatementData object (in the case of a statement based event), or a RowChangeData object (in the case of row-based events). For row-based events, there will be one or more OneRowChange objects for each individual row that was changed.

When processing the event information, the data that is processed is live and should be updated in place. For example, when examining statement data, the statement needs only be updated in place, not re-submitted. Statements and rows can also be explicitly removed or added by deleting or extending the arrays that make up the objects.

A single event can contain both statement and row change information within the list of individual DBMSData objects.

10.6.1.4.1. ReplDBMSEvent Objects

The base object from which all of the data about replication can be obtained is the ReplDBMSEvent class. The class contains all of the information about each event, including the global transaction ID and statement or row data.

The interface to the underlying information is through a series of methods that provide the embedded information or data structures, described in the table below.

Method Description
getAppliedLatency() Returns the latency of the embedded event. See Section E.2.8, “Terminology: Fields appliedLatency”
getData() Returns an array of the DBMSData objects within the event
getDBMSEvent() Returns the original DBMSEvent object
getEpochNumber() Get the Epoch number of the stored event. See THL EPOCH#
getEventId() Returns the native event ID. See THL EVENTID
getExtractedTstamp() Returns the timestamp of the event.
getFragno() Returns the fragment ID. See THL SEQNO
getLastFrag() Returns true if the fragment is the last fragment in the event.
getSeqno() Returns the native sequence number. See THL SEQNO
getShardId() Returns the shard ID for the event.
getSourceId() Returns the source ID of the event. See THL SOURCEID
setShardId() Sets the shard ID for the event, which can be used by the filter to set the shard.

The primary method used is getData(), which returns an array of the individual DBMSData objects contained in the event:

function filter(event)
{
  data = event.getData();

  if(data != null)
  {
    for (i = 0; i < data.size(); i++)
    {
      change = data.get(i);
...

Access to the underlying array structure uses the get() method to request individual objects from the array. The size() method returns the length of the array.

Removing or Adding Data Changes

Individual DBMSData objects can be removed from the replication stream by using the remove() method, supplying the index of the object to remove:

data.remove(1);

The add() method can be used to add new data changes into the stream. For example, data can be duplicated across schemas by creating and adding a new version of the event:

if(d.getDefaultSchema() != null &&
   d.getDefaultSchema().compareTo(sourceName)==0)
{
  newStatement = new
     com.continuent.tungsten.replicator.dbms.StatementData(d.getQuery(),
                                                           null,
                                                           targetName);
  data.add(data.size(),newStatement);
}

The above code looks for statements within the sourceName schema and creates a copy of each statement into the targetName schema.

The first argument to add() is the index position to add the statement. Zero (0) indicates before any existing changes, while using size() on the array effectively adds the new statement change at the end of the array.
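
For example, to insert the new statement ahead of all the existing changes instead:

data.add(0, newStatement);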

Updating the Shard ID

The setShardId() method can also be used to set the shard ID within an event. This can be used in filters where the shard ID is updated by examining the schema or table being updated within the embedded SQL or row data. An example of this is provided in Section 10.4.42, “shardbytable.js Filter”.
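
The following is a simple sketch that sets the shard ID from the default schema of each statement within the event; the logic shown is illustrative rather than a copy of the supplied filter:

function filter(event)
{
    data = event.getData();
    for (i = 0; i < data.size(); i++)
    {
        d = data.get(i);
        if (d instanceof com.continuent.tungsten.replicator.dbms.StatementData)
        {
            // Use the schema in which the statement executed as the shard ID
            if (d.getDefaultSchema() != null)
            {
                event.setShardId(d.getDefaultSchema());
            }
        }
    }
}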

10.6.1.4.2. DBMSData Objects

The DBMSData object provides encapsulation of either the SQL or row change data within the THL. The class provides no methods for interacting with the content; instead, the real underlying object should be identified and processed accordingly. Using the JavaScript instanceof operator, the underlying type can be determined:

if (d != null &&
    d instanceof com.continuent.tungsten.replicator.dbms.StatementData)
{
   // Process Statement data
}
else if (d != null &&
    d instanceof com.continuent.tungsten.replicator.dbms.RowChangeData)
{
   // Process Row data
}

Note the use of the full object class for the different DBMSData types.

For information on processing StatementData, see Section 10.6.1.4.3, “StatementData Objects”. For row data, see Section 10.6.1.4.4, “RowChangeData Objects”.

10.6.1.4.3. StatementData Objects

The StatementData class contains information about data that has been replicated as an SQL statement, as opposed to information that is replicated as row-based data.

Processing and filtering statement information relies on editing the original SQL query statement, or the metadata recorded with it in the THL, such as the schema name or character set. Care should be taken when modifying SQL statement data to ensure that you are modifying the right part of the original statement. For example, a search and replace on an SQL statement should be made with care to ensure that embedded data is not altered by the process.

The key methods used for interacting with a StatementData object are listed below:

Method Description
getQuery() Returns the SQL statement
setQuery() Updates the SQL statement
appendToQuery() Appends a string to an existing query
getDefaultSchema() Returns the default schema in which the statement was executed. The schema may be null for explicit or multi-schema queries.
setDefaultSchema() Set the default schema for the SQL statement
getTimestamp() Gets the timestamp of the query. This is required if data must be applied with a relative value by combining the timestamp with the relative value

Updating the SQL

The primary method of processing statement-based data is to load and identify the original SQL statement (using getQuery()), update or modify the SQL statement string, and then update the statement within the THL again using setQuery(). For example:

sqlOriginal = d.getQuery();
sqlNew = sqlOriginal.replaceAll('NOTEPAD','notepad');
d.setQuery(sqlNew);

The above replaces the uppercase 'NOTEPAD' with a lowercase version in the query before updating the stored query in the object.

Changing the Schema Name

Some schema and other information is also provided in this structure. For example, the schema name is provided within the statement data and can be explicitly updated. In the example below, the schema products is updated to nyc_products:

if (change.getDefaultSchema().compareTo("products") == 0)
{
  change.setDefaultSchema("nyc_products");
}

A similar operation should be performed for any row-based changes. A more complete example can be found in Section 10.4.8, “dbrename.js Filter”.

10.6.1.4.4. RowChangeData Objects

RowChangeData is information that has been written into the THL in row format, and therefore consists of rows of individual data divided into the individual columns that make up each row-based change. Processing of these individual changes must be performed one row at a time using the list of OneRowChange objects provided.

The following methods are supported for the RowChangeData object:

Method Description
appendOneRowChange(rowChange) Appends a single row change to the event, using the supplied OneRowChange object.
getRowChanges() Returns an array list of all the changes as OneRowChange objects.
setRowChanges(rowChanges) Sets the row changes within the event using the supplied list of OneRowChange objects.

For example, a typical row-based process will operate as follows:

if (d != null && d instanceof com.continuent.tungsten.replicator.dbms.RowChangeData)
{
  rowChanges = d.getRowChanges();

  for(j = 0; j < rowChanges.size(); j++)
  {
    oneRowChange = rowChanges.get(j);
    // Do row filter
  }
}

The OneRowChange object contains the changes for just one row within the event. The class contains the information about the tables, field names and field values. The following methods are supported:

Method Description
getAction() Returns the row action type, i.e. whether the row change is an INSERT, UPDATE or DELETE
getColumnSpec() Returns the specification of each column within the row change
getColumnValues() Returns the value of each column within the row change
getSchemaName() Gets the schema name of the row change
getTableName() Gets the table name of the row change
setColumnSpec() Sets the column specification using an array of column specifications
setColumnValues() Sets the column values
setSchemaName() Sets the schema name
setTableName() Sets the table name

Changing Schema or Table Names

The schema, table and column names are exposed at different levels within the OneRowChange object. Updating the schema name can be achieved by getting and setting the name through the getSchemaName() and setSchemaName() methods. For example, to add a prefix to a schema name:

rowchange.setSchemaName('prefix_' + rowchange.getSchemaName());

To update a table name, the getTableName() and setTableName() can be used in the same manner:

oneRowChange.setTableName('prefix_' + oneRowChange.getTableName());

Getting Action Types

Row operations are categorised according to the action of the row change, i.e. whether the change was an insert, update or delete operation. This information can be extracted from each row change by using the getAction() method:

action = oneRowChange.getAction();

The action information is returned as a string, i.e. INSERT, UPDATE, or DELETE. This enables information to be filtered according to the changes; for example by selectively modifying or altering events.

For example, DELETE events could be removed from the list of row changes:

for(j=0;j<rowChanges.size();j++)
{
  oneRowChange = rowChanges.get(j);
  if (oneRowChange.getAction() == "DELETE")
  {
    rowChanges.remove(j);
    j--;
  }
}

The j-- is required because as each row change is removed, the size of the array changes and the current index within the array needs to be explicitly adjusted.
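
An equivalent approach that avoids adjusting the index is to iterate over the list in reverse, so that a removal never shifts the position of an entry that has not yet been visited:

// Iterating backwards means remove() only shifts already-processed entries
for(j = rowChanges.size() - 1; j >= 0; j--)
{
  if (rowChanges.get(j).getAction() == "DELETE")
  {
    rowChanges.remove(j);
  }
}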

Extracting Column Definitions

To extract the row data, the getColumnValues() method returns an array containing the value of each column in the row change. Obtaining the column specification information using getColumnSpec() returns a corresponding specification for each column. The column data can be used to obtain the column type information.

To change column names or values, the column information should first be identified, then retrieved and/or updated. The getColumnSpec() method returns the column specification of the row change, as an array of the individual columns and their specifications:

columns = oneRowChange.getColumnSpec();

For each column specification a ColumnSpec object is returned, which supports the following methods:

Method Description
getIndex() Gets the index of the column within the row change
getLength() Gets the length of the column
getName() Returns the column name if available
getType() Gets the type number of the column
getTypeDescription() Gets the column type description
isBlob() Returns true if the column is a blob
isNotNull() Returns true if the column is configured as NOT NULL
isUnsigned() Returns true if the column is unsigned.
setBlob() Set the column blob specification
setIndex() Set the column index order
setLength() Set the column length
setName() Set the column name
setNotNull() Set whether the column is configured as NOT NULL
setSigned() Set whether the column data is signed
setType() Set the column type
setTypeDescription() Set the column type description

To identify the column type, use the getType() method which returns an integer matching the underlying data type. There are no predefined types, but common values include:

Type Value Notes
INT 4  
CHAR or VARCHAR 12  
TEXT or BLOB 2004 Use isBlob() to identify if the column is a blob or not
TIME 92  
DATE 91  
DATETIME or TIMESTAMP 93  
DOUBLE 8  

Other information about the column, such as the length, and value types (unsigned, null, etc.) can be determined using the other functions against the column specification.
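
For example, the fragment below logs the name, type, and length of each non-BLOB column in a row change; the log text is illustrative:

columns = oneRowChange.getColumnSpec();
for (c = 0; c < columns.size(); c++)
{
  columnSpec = columns.get(c);

  // BLOB columns need different handling, so skip them here
  if (columnSpec.isBlob())
  {
    continue;
  }

  logger.debug("column " + columnSpec.getName() +
               " type " + columnSpec.getType() +
               " length " + columnSpec.getLength());
}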

Extracting Row Data

The getColumnValues() method returns an array that corresponds to the information returned by the getColumnSpec() method. That is, the method returns a complementary array of the row change values, one element for each row, where each row is itself a further array of each column:

values = oneRowChange.getColumnValues();

This means that index 0 of the array from getColumnSpec() refers to the same column as index 0 of the array for a single row from getColumnValues().

getColumnSpec()     msgid   message                 msgdate
getColumnValues()
[0]                 1       Hello New York!         Thursday, June 13, 2013
[1]                 2       Hello San Francisco!    Thursday, June 13, 2013
[2]                 3       Hello Chicago!          Thursday, June 13, 2013

This enables the script to identify the column type by the index, and then update the corresponding value using the same index. In the above example, the message field will always be at index 1 within the corresponding values.

Each value object supports the following methods:

Method Description
getValue() Get the current column value
setValue() Set the column value to the supplied value
setValueNull() Set the column value to NULL

For example, within the zerodate2null sample, dates with a zero value are set to NULL using the following code:

columns = oneRowChange.getColumnSpec();
columnValues = oneRowChange.getColumnValues();
for (c = 0; c < columns.size(); c++)
{
  columnSpec = columns.get(c);
  type = columnSpec.getType();

  if (type == TypesDATE || type == TypesTIMESTAMP)
  {
    for (row = 0; row < columnValues.size(); row++)
    {
      values = columnValues.get(row);
      value = values.get(c);

      if (value.getValue() == 0)
      {
        value.setValueNull()
      }
    }
  }
}

In the above example, the column specification is retrieved to determine which columns are date types. The list of embedded row values is then extracted, and the filter iterates over each row, setting the value of a date that is zero (0) to NULL using the setValueNull() method.

An alternative would be to update to an explicit value using the setValue() method.
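
For example, to rewrite a zero date to a fixed, purely illustrative, date value instead:

value.setValue("1970-01-01");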

10.6.2. Installing Custom JavaScript Filters

Once you have written your JavaScript filter and are ready to install it, you need to follow the steps below. This will allow you to configure and apply the filter to your installation using the standard tpm procedure.

For this example, we will assume your new JavaScript file is called number2binary.js, and that the filter has two additional boolean configuration properties, 'roundup' and 'debug'.
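
To make the steps concrete, a minimal sketch of what such a filter might contain is shown below; the number-to-binary conversion logic is purely illustrative and is not part of the installation procedure:

// number2binary.js - illustrative sketch only

function prepare()
{
    // Load the two boolean properties defined in the template file
    roundup = filterProperties.getString("roundup") == "true";
    debug   = filterProperties.getString("debug") == "true";
}

function filter(event)
{
    data = event.getData();
    for (i = 0; i < data.size(); i++)
    {
        d = data.get(i);
        if (d instanceof com.continuent.tungsten.replicator.dbms.RowChangeData)
        {
            // Row processing that converts numeric values to binary
            // strings would be performed here
            if (debug)
            {
                logger.debug("number2binary: processing row change data");
            }
        }
    }
}

function release()
{
}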

10.6.2.1. Step 1: Copy JavaScript files

By default, the software package will be contained in /opt/continuent/software/tungsten-replicator-6.1.24-6. Adjust the path in the examples accordingly if your environment differs.

The JavaScript file for your new filter(s) needs copying to the following location:

/opt/continuent/software/tungsten-replicator-6.1.24-6/tungsten-replicator/samples/extensions/javascript

10.6.2.2. Step 2: Create Template Files

You need to create a template file which contains the location of the JavaScript file and the additional configuration properties with the appropriate default values.

Create a file called number2binary.tpl that contains the following:

replicator.filter.number2binary=com.continuent.tungsten.replicator.filter.JavaScriptFilter
replicator.filter.number2binary.script=${replicator.home.dir}/samples/extensions/javascript/number2binary.js
replicator.filter.number2binary.roundup=true
replicator.filter.number2binary.debug=false

This tpl file needs to be copied into the following directory:

/opt/continuent/software/tungsten-replicator-6.1.24-6/tungsten-replicator/samples/conf/filters/default

10.6.2.3. Step 3: (Optional) Copy json files

If your filter uses JSON files to load configuration data, these need to be copied into the /opt/continuent/share directory and also referenced in the tpl file created in Step 2. An example is as follows:

replicator.filter.{FILTERNAME}.definitionsFile=/opt/continuent/share/{FILTERNAME}.json
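
Within the filter itself, the configured file can then be loaded during prepare(). The following is a minimal sketch, assuming the file contains a single JSON object and that the Rhino JSON object is available:

function prepare()
{
    // Path configured through the definitionsFile property in the tpl file
    definitionsFile = filterProperties.getString("definitionsFile");

    // Read the file contents using standard Java I/O exposed through Rhino
    reader = new java.io.BufferedReader(new java.io.FileReader(definitionsFile));
    contents = "";
    while ((line = reader.readLine()) != null)
    {
        contents += line;
    }
    reader.close();

    // Parse the JSON into a JavaScript object for use within filter()
    definitions = JSON.parse(contents);
}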

10.6.2.4. Step 4: Update Configuration

Now that all the files are in place you can include the custom filter in your configuration.

Any properties set with a default value in the tpl file only need to be included if you wish to override the default value.

The following examples show how you can now include this in your tpm configuration:

For ini installations, add the following to your tungsten.ini:

svc-extractor-filters={existing filter definitions},number2binary
property=replicator.filter.number2binary.roundup=false
property=replicator.filter.number2binary.debug=true

For staging installations:

shell> cd {staging-dir}

shell> tools/tpm configure SERVICENAME \
{other-configuration-values-as-required} \
--svc-extractor-filters={existing filter definitions},number2binary \
--property=replicator.filter.number2binary.roundup=false \
--property=replicator.filter.number2binary.debug=true

shell> tools/tpm install

In the above examples we used the svc-extractor-filters property for the extractor replicator. If you are applying your custom filters to your applier, then use svc-applier-filters instead.

Your custom filters are now installed in a clean and easy-to-manage process, allowing you to use tpm for all future update processes.

If there is a problem with the JavaScript filter during restart, the replicator will be placed into the OFFLINE state and the reason for the error will be provided within the replicator trepsvc.log log.

Chapter 11. Performance and Tuning

Whilst in most cases, very little tuning is required within the replicator, there may be times when you need to tweak the replicator to better suit your workloads. The sections in this chapter cover the most common tuning tasks you may wish to consider.

11.1. Block Commit

The replicator commits changes read from the THL to Replicas during the applier stage according to the block commit size or interval. These settings replace the single replicator.global.buffer.size parameter that controls the size of the buffers used within each stage of the replicator.

When applying transactions to the database, the decision to commit a block of transactions is controlled by two parameters: the block commit size (a transaction count) and the block commit interval (a time limit).

The default operation is for block commits to take place based on the transaction count; commits by the timer are disabled. The default block commit size is 10 transactions from the incoming stream of THL data; the default block commit interval is zero (0), which indicates that the interval is disabled.

When both parameters are configured, block commit occurs when either value limit is reached. For example, if the event count is set to 10 and the commit interval to 50s, events will be committed by the applier either when the event count reaches 10 or every 50 seconds, whichever is reached first. This means, for example, that even if only one transaction exists, when the 50 seconds are up, that single transaction will be applied.

In addition, the execution of implied commits during specific events within the replicator can also be controlled to prevent fragmented block commits by using the replicator.stage.q-to-dbms.blockCommitPolicy property. This property can have either of the following values:

  • strict — Commit block on service name changes, multiple fragments in a transaction, or unsafe_for_block_commit. This is the default setting.

  • lax — Don't commit in any of these cases.

The block commit size can be controlled using the --repl-svc-applier-block-commit-size option to tpm, or through the blockCommitRowCount property.

The block commit interval can be controlled using the --repl-svc-applier-block-commit-interval option to tpm, or through the blockCommitInterval property. If only a number is supplied, it is used as the interval in milliseconds. Suffixes of s, m, h, and d for seconds, minutes, hours and days are also supported.

shell> ./tools/tpm update alpha \
    --repl-svc-applier-block-commit-size=20 \
    --repl-svc-applier-block-commit-interval=100s

Note

The block commit parameters are supported only in applier stages; they have no effect in other stages.

Modification of the block commit interval should be made only when the commit window needs to be altered. The setting can be particularly useful in heterogeneous deployments where the nature and behaviour of the target database is different to that of the source extractor.

For example, when replicating to Oracle, reducing the number of transactions within commits reduces the locks and overheads:

shell> ./tools/tpm update alpha \
    --repl-svc-applier-block-commit-interval=500

This would apply two commits every second, regardless of the block commit size.

When replicating to a data warehouse engine, particularly when using batch loading, such as Redshift, Vertica and Hadoop, larger block commit sizes and intervals may improve performance during the batch loading process:

shell> ./tools/tpm update alpha \
    --repl-svc-applier-block-commit-size=100000 \
    --repl-svc-applier-block-commit-interval=60s

This sets a large block commit size and interval enabling large batch loading.

11.1.1. Monitoring Block Commit Status

The block commit status can be monitored using the trepctl status -name tasks command. This outputs the lastCommittedBlockSize and lastCommittedBlockTime values, which indicate the size and interval (in seconds) of the last block commit.

shell> trepctl status -name tasks
Processing status command (tasks)...
...
NAME                    VALUE
----                    -----
appliedLastEventId    : mysql-bin.000015:0000000000001117;0
appliedLastSeqno      : 5271
appliedLatency        : 4656.231
applyTime             : 0.066
averageBlockSize      : 0.500     
cancelled             : false
commits               : 10
currentBlockSize      : 0
currentLastEventId    : mysql-bin.000015:0000000000001117;0
currentLastFragno     : 0
currentLastSeqno      : 5271
eventCount            : 5
extractTime           : 0.394
filterTime            : 0.017
lastCommittedBlockSize: 1
lastCommittedBlockTime: 0.033
otherTime             : 0.001
stage                 : q-to-dbms
state                 : extract
taskId                : 0
Finished status command (tasks)...

11.2. Improving Network Performance

The performance of the network can be critical when replicating data. The information transferred over the network contains the full content of the THL in addition to a small protocol overhead. Improving your network performance can have a significant impact on the overall performance of the replication process.

The following network parameters should be configured within your /etc/sysctl.conf and can safely be applied to all the hosts within your cluster deployments:

# Increase size of file handles and inode cache 
fs.file-max = 2097152 

# tells the kernel how many TCP sockets that are not attached to any
# user file handle to maintain. In case this number is exceeded,
# orphaned connections are immediately reset and a warning is printed.
net.ipv4.tcp_max_orphans = 60000

# Do not cache metrics on closing connections 
net.ipv4.tcp_no_metrics_save = 1 

# Turn on window scaling which can enlarge the transfer window: 
net.ipv4.tcp_window_scaling = 1 

# Enable timestamps as defined in RFC1323: 
net.ipv4.tcp_timestamps = 1 

# Enable selective acknowledgments: 
net.ipv4.tcp_sack = 1 

# Maximum number of remembered connection requests, which did not yet
# receive an acknowledgment from connecting client.
net.ipv4.tcp_max_syn_backlog = 10240

# recommended default congestion control is htcp 
net.ipv4.tcp_congestion_control=htcp 

# recommended for hosts with jumbo frames enabled 
net.ipv4.tcp_mtu_probing=1 

# Number of times SYNACKs are retried for a passive TCP connection. 
net.ipv4.tcp_synack_retries = 2 

# Allowed local port range 
net.ipv4.ip_local_port_range = 1024 65535 

# Protect Against TCP Time-Wait 
net.ipv4.tcp_rfc1337 = 1 

# Decrease the time default value for tcp_fin_timeout connection 
net.ipv4.tcp_fin_timeout = 15 

# Increase number of incoming connections 
# somaxconn defines the number of request_sock structures 
# allocated per each listen call. The
# queue is persistent through the life of the listen socket.
net.core.somaxconn = 1024

# Increase number of incoming connections backlog queue 
# Sets the maximum number of packets, queued on the INPUT 
# side, when the interface receives packets faster than
# kernel can process them. 
net.core.netdev_max_backlog = 65536 

# Increase the maximum amount of option memory buffers 
net.core.optmem_max = 25165824 

# Increase the maximum total buffer-space allocatable 
# This is measured in units of pages (4096 bytes) 
net.ipv4.tcp_mem = 65536 131072 262144 
net.ipv4.udp_mem = 65536 131072 262144 

### Set the max OS send buffer size (wmem) and receive buffer
# size (rmem) to 12 MB for queues on all protocols. In other 
# words set the amount of memory that is allocated for each
# TCP socket when it is opened or created while transferring files

# Default Socket Receive Buffer 
net.core.rmem_default = 25165824 

# Maximum Socket Receive Buffer 
net.core.rmem_max = 25165824 

# Increase the read-buffer space allocatable (minimum size, 
# initial size, and maximum size in bytes) 
net.ipv4.tcp_rmem = 20480 12582912 25165824 
net.ipv4.udp_rmem_min = 16384 

# Default Socket Send Buffer 
net.core.wmem_default = 25165824 

# Maximum Socket Send Buffer 
net.core.wmem_max = 25165824 

# Increase the write-buffer-space allocatable 
net.ipv4.tcp_wmem = 20480 12582912 25165824 
net.ipv4.udp_wmem_min = 16384 

# Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks 
net.ipv4.tcp_max_tw_buckets = 1440000 
# net.ipv4.tcp_tw_recycle = 1 
net.ipv4.tcp_tw_reuse = 1 

11.3. Tungsten Replicator Block Commit and Memory Usage

Replicators are implemented as Java processes, which use two types of memory: stack space, which is allocated per running thread and holds objects that are allocated within individual execution stack frames, and heap memory, which is where objects that persist across individual method calls live. Stack space is rarely a problem for Tungsten as replicators rarely run more than 200 threads and use limited recursion. The Java defaults are almost always sufficient. Heap memory on the other hand runs out if the replicator has too many transactions in memory at once. This results in the dreaded Java OutOfMemory exception, which causes the replicator to stop operating. When this happens you need to look at tuning the replicator memory size.

To understand replicator memory usage, we need to look into how replicators work internally. Replicators use a "pipeline" model of execution that streams transactions through one or more concurrently executing stages. For example, an Applier pipeline might have a stage to read transactions from the Extractor and put them in the THL, a stage to read them back out of the THL into an in-memory queue, and a stage to apply those transactions to the Target. This model ensures high performance as the stages work independently. This streaming model is quite efficient and normally permits Tungsten to transfer even exceedingly large transactions, as the replicator breaks them up into smaller pieces called transaction fragments.

The pipeline model has consequences for memory management. First of all, replicators are doing many things at once, and hence need enough memory to hold all current objects. Second, the replicator works fastest if the in-memory queues between stages are large enough that they do not ever become empty. This keeps delays in upstream processing from delaying things at the end of the pipeline. Also, it allows replicators to make use of block commit. Block commit is an important performance optimization in which stages try to commit many transactions at once on Targets to amortize the cost of commit. In block commit the end stage continues to commit transactions until it either runs out of work (i.e., the upstream queue becomes empty) or it hits the block commit limit. Larger upstream queues help keep the end stage from running out of work, hence increase efficiency.

Bearing this in mind, we can alter replicator behavior in a number of ways to make it use less memory or to handle larger amounts of traffic without getting a Java OutOfMemory error. You should look at each of these when tuning memory; an illustrative configuration example follows this list:

  • Property wrapper.java.memory in file wrapper.conf. This controls the amount of heap memory available to replicators. 1024 MB is the minimum setting for most replicators. Busy replicators, those that have multiple services, or replicators that use parallel apply should consider using 2048 MB instead. If you get a Java OutOfMemory exception, you should first try raising the current setting to a higher value. This is usually enough to get past most memory-related problems. You can set this at installation time as the --repl-java-mem-size parameter.

  • Property replicator.global.buffer.size in the replicator properties file. This controls two things, the size of in-memory queues in the replicator as well as the block commit size. If you still have problems after increasing the heap size, try reducing this value. It reduces the number of objects simultaneously stored on the Java heap. A value of 2 is a good setting to try to get around temporary problems. This can be set at installation time as the --repl-buffer-size parameter.

  • Property replicator.stage.q-to-dbms.blockCommitRowCount in the replicator properties file. This parameter sets the block commit count in the final stage in the Applier pipeline. If you reduce the global buffer size, it is a good idea to set this to a fixed size, such as 10, to avoid reducing the block commit effect too much. Very low block commit values in this stage can cut update rates on Targets by 50% or more in some cases. This is available at installation time as the --repl-svc-applier-block-commit-size parameter.

  • Property replicator.extractor.dbms.transaction_frag_size in the replicator.properties file. This parameter controls the size of fragments for long transactions. Tungsten automatically breaks up long transactions into fragments. This parameter controls the number of bytes of binlog per transaction fragment. You can try making this value smaller to reduce overall memory usage if many transactions are simultaneously present. Normally however this value has minimal impact.
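
For example, an ini-based configuration combining these settings might look like the following sketch; the values shown are illustrative starting points, not recommendations:

shell> vi /etc/tungsten/tungsten.ini
[serviceName]
...
# Heap memory available to the replicator (MB)
repl-java-mem-size=2048
# In-memory queue size between stages (also the default block commit size)
repl-buffer-size=10
# Keep a fixed block commit count in the final applier stage
property=replicator.stage.q-to-dbms.blockCommitRowCount=10
# Bytes of binlog per transaction fragment for long transactions
property=replicator.extractor.dbms.transaction_frag_size=1000000
...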

Finally, it is worth mentioning that the main cause of out-of-memory conditions in replicators is large transactions. In particular, Tungsten cannot fragment individual statements or row changes, so changes to very large column values can also result in OutOfMemory conditions. For now the best approach is to raise memory, as described above, and change your application to avoid such transactions.


Appendix A. Release Notes

A.1. Tungsten Replicator 6.1.24 GA (11 Dec 2023)

Version End of Life. 15 Aug 2024

Release 6.1.24 was released to address a bug in the Tungsten Clustering product and there are no specific changes in Tungsten Replicator.

A.2. Tungsten Replicator 6.1.23 GA (31 Aug 2023)

Version End of Life. 15 Aug 2024

Release 6.1.23 was released to address a minor bug in the Tungsten Clustering product and there are no specific changes in Tungsten Replicator.

A.3. Tungsten Replicator 6.1.22 GA (6 July 2023)

Version End of Life. 15 Aug 2024

Release 6.1.22 contains a number of minor bug fixes.

Bug Fixes

  • Installation and Deployment

    • Fixed RPM package script to run tpm install instead of tpm update when installing the rpm

      Issues: CT-2130

  • PostgreSQL Applier

    • Fixed a NullPointerException error occurring at startup when applying to PostgreSQL

      Issues: CT-2021

  • Core Replicator

    • A change in the way MySQL logs CREATE TABLE AS SELECT in the Binary Logs from v8.0.20 onwards meant that these transactions would previously fail.

      Warning

      Whilst these statements will now replicate, it must be noted that in the event of a failure during the data load, the initial CREATE statement won't be rolled back, and therefore care must be taken when using this type of DDL.

      Note

      This only affects customers using MySQL v8.0.20+ running with ROW-based replication. An alternative workaround to ensure correct rollback on failure would be to run the statement with STATEMENT-based replication for the session. This will also provide better performance for larger tables.

      Issues: CT-1301

    • Fixed an issue where an extracted JSON field could be invalid, with doubled commas in JSON arrays.

      Issues: CT-2049

    • Improved a query that is run by Tungsten when fetching table metadata (column names, datatypes, etc). While it is not generally needed, the unoptimized query can run badly, especially against old MySQL versions with a lot of databases and tables. For now, the new optimized query is not used by default, but this could change in a future version.

      This can be enabled by using the following property:

      property=replicator.datasource.global.connectionSpec.usingOptimizedMetadataQuery=true

      Issues: CT-2077

A.4. Tungsten Replicator 6.1.21 GA (18 Apr 2023)

Version End of Life. 15 Aug 2024

Release 6.1.21 contains a number of critical bug fixes and improvements.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • The tungsten_skip_all command (along with alias tungsten_skip_seqno) now shows the full pendingExceptionMessage instead of just pendingError, and the More choice shows the pendingErrorEventId and the pendingError.

      Issues: CT-2030

Improvements, new features and functionality

  • Platform Specific Deployments

    • ARM 64 bit processor support added (linux aarch64)

      Note

      Note that at time of release, there is currently no xtrabackup binary available for ARM.

      Issues: CT-1619, CT-1620

Bug Fixes

  • Command-line Tools

    • The tpm ask keys command now lists all available keys properly.

      Issues: CT-2041

A.5. Tungsten Replicator 6.1.20 GA (19 Dec 2022)

Version End of Life. 15 Aug 2024

Release 6.1.20 contains a number of critical bug fixes and improvements.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • Both the tungsten_get_ports and tpm report commands have been updated to use the ss OS command when the netstat OS command is unavailable or deprecated.

      Issues: CT-2007

Improvements, new features and functionality

  • Command-line Tools

    • A new command has been added called tungsten_get_ports, which will display the running Tungsten processes and the associated TCP ports that those processes are listening on.

      Issues: CT-1970

    • The tungsten_get_ports command now supports the /opt/continuent/tungsten/tungsten-connector/conf/interfaces.json file, if found.

      Issues: CT-2003

    • The tungsten_get_ports command now uses values obtained from tpm query values when possible, for the greatest output accuracy.

      Issues: CT-2008

Bug Fixes

  • Command-line Tools

    • Worked around a Ruby issue where DNS name resolution was sometimes preferred to file-based resolution in tpm. The following logic now applies: file-based host name resolution is tried first; upon failure, DNS is used, and finally the ping command.

      Issues: CT-1653

  • Core Replicator

    • Fixed an issue in the applier to ensure it is not committing too often with a multi-service replicator (for example active/active, more specifically with an AWS Aurora or unprivileged MySQL backend).

      Issues: CT-2004

  • Filters

    • Fixed an issue where the dropsqlmodes filter would fail to remove invalid sql modes from a multi-statement event

      Issues: CT-1993

A.6. Tungsten Replicator 6.1.19 GA (17 Oct 2022)

Version End of Life. 15 Aug 2024

Release 6.1.19 contains a number of critical bug fixes and improvements, specifically for customers using Parallel Replication.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • The check_tungsten.sh script was deprecated in release 6.1.18 and has now been removed in this release.

      Issues: CT-1939

  • Core Replicator

    • repl_svc_extractor_multi_frag_service_detection is now turned ON by default. Event shards are determined at extraction time. With fragmented events, the shard cannot be determined by reading only the first fragment; the last fragment needs to be checked as well. With this setting turned OFF, there is no issue with pipelines that don't need it, i.e. no parallel apply on downstream replicas. However, as this is done at extract time, the THL contains this information, and adding or changing a replica using parallel apply could introduce issues.

      Note

      It can be disabled if you see a performance overhead but this should be done with caution. For Aurora<>Aurora Active/Active deployments it is essential that this property be left ON.

      Issues: CT-1959

Improvements, new features and functionality

  • Command-line Tools

    • The tpm command calls to glob have been improved to be more strict and compliant.

      Issues: CT-1940

    • The tpm ask stages and tpm ask allstages commands have been added to display the Replicator stages for the current node (stages) and the stages for each role (allstages).

      Issues: CT-1943

    • The tpm ask command has five new variables available: dsrole and dsstate for the current datasource, trrole and trstate for the current replicator, and nodeinfo, which displays all four of the preceding variables.

      Issues: CT-1944

Bug Fixes

  • Command-line Tools

    • Fixes an issue where the pause state would be displayed incorrectly under no load when the pause reaches the defined time. Note that this is only a display issue: once an event is received, the pause state and displayed remaining time will be reset correctly.

      Issues: CT-1942

    • The tpm mysql command will now gracefully handle being run on a non-database node.

      Issues: CT-1946

  • Core Replicator

    • Fixes an issue that would leave a transaction uncommitted longer than necessary.

      Issues: CT-1958

    • Fixed an issue where the shard_id stored in the trep_commit_seqno table would show an incorrect shard after processing a skipped event (an event from another channel).

      Issues: CT-1967

    • Fixed an issue where the replicator would hang after applying a DROP TABLE event that had originally failed on the Primary but was still logged into the binlog.

      Issues: CT-1973

  • Filters

    • The BidiRemoteSlaveFilter could fail to correctly flag fragmented events in unprivileged environments (Aurora, for example). In such an environment (multi-active, unprivileged database access), a new setting was introduced to force the extraction process to read ahead to the last fragment to detect the service name (false by default). It is enabled with repl_svc_extractor_multi_frag_service_detection=true

      Issues: CT-1351

A.7. Tungsten Replicator 6.1.18 GA (7 Sept 2022)

Version End of Life. 15 Aug 2024

Release 6.1.18 contains a number of minor bug fixes and improvements.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • The check_tungsten.sh script has been deprecated and is no longer maintained. It will be removed in the next patch release, and users should move to the newer tmonitor commands.

      Issues: CT-1038

Improvements, new features and functionality

  • Command-line Tools

    • The tungsten_merge_logs command now supports the --before TIMESTAMP and --after TIMESTAMP filters; see the example below.

      Issues: CT-1869
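
      A hypothetical invocation combining both filters; the timestamp format shown is an assumption, so check the command help for the accepted format:

      shell> tungsten_merge_logs --after '2022-09-01 00:00:00' --before '2022-09-02 00:00:00'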

  • Core Replicator

    • The thl list command now displays an approximate field size in bytes for row-based replication.

      Issues: CT-1847

    • Added a new feature that enables pausing a replicator stage for some amount of time.

      This will pause the given stage for 100 seconds:

      trepctl pause -stage thl-to-q -time 100

      This will pause the stage indefinitely (or until restart, etc.). Add -y to skip the confirmation prompt:

      trepctl pause -stage thl-to-q

      For either of the previous two commands, running a pause command again will override the previous one.

      This will resume the suspended stage (Note that if the stage is not paused, this will have no effect):

      trepctl resume -stage thl-to-q

      Note

      Please note this pause does not survive a replicator restart or a service offline/online.

      Issues: CT-1912

  • Other Issues

    • General improvements in MySQL 8.0 support.

      Issues: CT-1346

Bug Fixes

  • Installation and Deployment

    • Fixes issues where fixed properties and filters passed to tpm in service stanzas were not being configured correctly

      Issues: CT-1463

  • Command-line Tools

    • replicator start offline would not properly pass the offline argument to systemd scripts when installed through deployall.

      Issues: CT-1836

    • The undeployall script would not properly uninstall systemd services on some distributions, including Amazon Linux.

      Issues: CT-1845

    • The tpm diag command now passes when the nodename defined in the tungsten.ini is the shortname, and DNS returns the FQDN.

      Issues: CT-1908

    • The tpm diag command now gathers the mysql.log file when SSL is enabled in the server

      Issues: CT-1933

    • The tpm diag command now behaves properly when the MySQL server is logging to STDERR.

      Issues: CT-1935

  • Backup and Restore

    • Fixed an issue where an xtrabackup backup generated by the replicator would fail to be restored using the trepctl restore command.

      Issues: CT-1575

  • Core Replicator

    • Fixes an issue that would prevent a service from going offline at a specified time (trepctl online -until-time) when parallel apply is enabled. This is a rework of CT-1243.

      Issues: CT-1684

    • Fixed a possible issue when recovering an old Primary as a Replica after failover with parallel apply enabled, which could leave the Replica unable to come online and require it to be reprovisioned.

      Issues: CT-1890

    • Fixes an issue that prevented geometry datatypes with SRID from being replicated.

      Issues: CT-1904

    • Fixed an issue where filtered events would trigger an unnecessary update to the service trep_commit_seqno table; the value is overwritten anyway once the last statement of the applied event is done, just prior to committing the whole block.

      Issues: CT-1931

A.8. Tungsten Replicator 6.1.17 GA (16 May 2022)

Version End of Life. 15 Aug 2024

Release 6.1.17 contains a number of minor bug fixes and improvements.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • Changed the output of the thl purge command when no lower or upper bounds are given, from 'Deleting events where' to 'Deleting all events'.

      Issues: CT-1738

    • The tpm diag command now gathers the output of the tpm ask summary command

      Issues: CT-1827

Improvements, new features and functionality

  • Command-line Tools

    • Added the ability to turn auto recovery on or off dynamically, removing the need to run tpm update.

      This is done by running the following command:

      shell> trepctl -service servicename setdynamic -property replicator.autoRecoveryMaxAttempts -value <number>

      Note

      The service must be offline before changing the property

      Issues: CT-1088

  • Core Replicator

    • Added a new property to allow configuration of character set used during batch loading into MySQL targets.

      The default is utf8mb4. Override it using the following:

      property=replicator.applier.dbms.loadCharset=utf8

      Issues: CT-1647

  • Security

    • Replaced the official log4j library with a secured version from which all vulnerable classes have been removed. This prevents the software from being exposed following a (user) misconfiguration of the log4j properties.

      Issues: CT-1810

  • Other Issues

    • IPv6 host addresses are now fully supported.

      Can be enabled with the following configuration property:

      prefer-ip-stack="6"

      By default, IPv4 is enabled, which equates to the value of "4" in the above property.

      Issues: CT-1537

Bug Fixes

  • Installation and Deployment

    • When services were deployed with systemd and MySQL could not start due to an error, tpm would not be able to start MySQL later.

      Issues: CT-1734

    • The deployall script now properly displays the executable prefix for restarting services, typically mm_treplicator in Multi-Site/Active-Active replicator-only installs, as well as the full path to component binaries.

      Issues: CT-1835

  • Command-line Tools

    • When MySQL services were badly installed, some distributions could show a "not-found" status within systemctl, confusing tpm.

      Issues: CT-1677

    • The tpm command now communicates properly when there is no INI configuration file or staging-method deploy.cfg configuration defined.

      Issues: CT-1712

    • tungsten_monitor.rb script no longer uses sudo to send emails if the configuration doesn't allow it.

      Issues: CT-1737

    • Improves the tpm diag command so that it waits 2 seconds for replicator thread dump to complete.

      Issues: CT-1792

    • tpm now properly reports errors when commands time out. For example, when a host is down during an upgrade or install, tpm will now properly report that pinging that host failed.

      Issues: CT-1819

  • Core Replicator

    • Due to a change in binlogging behavior from MySQL 5.7 onwards (including Aurora), when using the standalone Tungsten Replicator to configure bi-directional replication with unprivileged access (i.e. unable to control binlogging behavior), certain DDL statements failed to be tagged correctly, causing them to be re-applied to the originating source database and resulting in replicator errors.

      Issues: CT-1083

    • When connecting to a THL server, a client will now connect to the next available host in its THL URI if the first does not have the sequence number that the client requires. The client will then fail only if none of the hosts from the URI can provide the needed sequence number.

      Issues: CT-1558

    • Fixed an issue when using parallel apply where a NullPointerException would be shown if an event was missing or corrupted in the THL. A correct message, Missing or corrupted event from storage, is now displayed.

      Issues: CT-1722

    • Fixed an issue with the batch loader where DATETIME and TIME datatypes with milliseconds would be replicated as .000 instead of the real millisecond value.

      Issues: CT-1739

    • Fixed an issue where trepctl was leaving JMX connections open.

      Issues: CT-1752

    • Added more debug information for detecting possible hanging connections while a THL client connects to the THL server. Also added a socket timeout for the connection initialization.

      Issues: CT-1760

    • Fixed the EnumToString and pkey filters to renew their database connections (every hour by default). This can be changed with the following properties:

      property=replicator.filter.enumtostring.reconnectTimeout=3600
      property=replicator.filter.pkey.reconnectTimeout=3600

      Issues: CT-1786

A.9. Tungsten Replicator 6.1.16 GA (20 Dec 2021)

Version End of Life. 15 Aug 2024

Release 6.1.16 contains a number of minor bug fixes and improvements.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • The Perl-based command-line tools now include the latest version of JSON::PP

      Issues: CT-1679

    • The tungsten_show_processlist command has been relocated from tools/ to cluster-home/bin.

      Issues: CT-1695

  • Documentation

    • Updated the open_source_licenses.txt file to include the most up-to-date information for the included Perl modules.

      Issues: CT-1675

Improvements, new features and functionality

  • Command-line Tools

    • A new command, tungsten_merge_logs, has been added which will gather Replicator logs into one file, sorted by timestamp, for easier troubleshooting.

      Issues: CT-1667

    • A new command, tungsten_get_mysql_datadir, has been added which will gather and display the actual running value for the MySQL data directory (datadir), with any sym-links to the physical destination resolved.

      Issues: CT-1690

Bug Fixes

  • Command-line Tools

    • Now properly adds executable-prefix (when specified) to systemd boot scripts for the replicator.

      Issues: CT-1663

    • The tpm ask tool now works properly on all supported Linux distros.

      Issues: CT-1682

    • The tpm diag command now uses sudo to read the mysqld.log file.

      Issues: CT-1685

    • The tpm --help text now includes the --ini argument.

      Issues: CT-1693

  • Core Replicator

    • Fixes an issue where the replicator would loop forever if a wrong startup option was provided, e.g. replicator start wrong.

      The replicator will now stop with a fatal error message logged: Unrecognized option: wrong - Shutting down

      Issues: CT-1672

    • Fixes a replication issue with some JSON fields that contained a single quote within the JSON text.

      Issues: CT-1691

  • Documentation

    • The documentation for the tpm command now describes the use of the --ini argument fully.

      Issues: CT-1692

A.10. Tungsten Replicator 6.1.15 GA (19 Oct 2021)

Version End of Life. 15 Aug 2024

Release 6.1.15 contains a number of minor bug fixes and improvements.

Improvements, new features and functionality

  • Core Replicator

    • Added the ability for the Batch Applier to apply to a remote MySQL instance using LOAD DATA LOCAL INFILE.

      This is disabled by default, and can be enabled by the following property:

      property=replicator.applier.dbms.useLoadDataLocalInfile=true

      Issues: CT-1622

Bug Fixes

  • Command-line Tools

    • The tpm update command now handles the --replace-release argument properly when --ini is also specified.

      Issues: CT-1623

    • Improved Perl-based command-line tools to use routines with no external dependencies in place of Data::Dumper

      Issues: CT-1630

  • Core Replicator

    • Improved error reporting when the replicator is not able to open a file. Previously, this caused a NullPointerException to be raised.

      Issues: CT-1618

  • Filters

    • Fixed an issue where the ColumnNameFilter was not resetting its internal connection to the database.

      The filter would then fail if the database connection timed out (e.g. when the MySQL wait_timeout was reached). The connection is now renewed every hour by default.

      This can be reconfigured to a different value by adding the following tpm property:

      property=replicator.filter.colnames.reconnectTimeout=<time in seconds>

      Issues: CT-1646

A.11. Tungsten Replicator 6.1.14 GA (17 Aug 2021)

Version End of Life. 15 Aug 2024

Release 6.1.14 contains a number of bug fixes and improvements to the Batch loader for MySQL. In addition, support for XA transactions is now available.

Improvements, new features and functionality

  • Command-line Tools

    • Added tpm support for tuning maxDelayInterval, used to tune parallel apply. Usage example:

      property=replicator.store.parallel-queue.maxDelayInterval=120

      This sets the max delay interval to 2 minutes

      Issues: CT-1541

    • A new tpm flag has been added:

      • svc-systemd-config-replicator

      This flag can be used to provide a custom systemd configuration that will be used in place of the default, generated one.

      This is typically useful when ExecStartPre or ExecStartPost commands are needed; see the example below.

      Issues: CT-1555
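
      For example, a hypothetical tungsten.ini entry pointing at a custom unit configuration (the path shown is illustrative only):

      svc-systemd-config-replicator=/etc/systemd/system/treplicator-custom.service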

    • Improved the tpm command's internal logic and introduced a new sub-command, tpm ask.

      Issues: CT-1573

    • Improved the check_tungsten_online command:

      • determine the cluster name by default if none is provided via -s

      • always check ls resources even if a dataservice name is provided

      • add an option to be node-specific, ignoring any status from other nodes

      Issues: CT-1578

    • A new ddlscan template (ddl-mysql-staging.vm) is available to generate the DDL required for staging tables when deploying the batch applier on a MySQL target.

      Issues: CT-1587

  • Backup and Restore

    • A new tpm option, --backup-options, has been added.

      This option can be used to pass options directly to the backup binary (xtrabackup, mariabackup, etc.). This can be especially useful when the xtrabackup and MySQL versions differ. Generally, MySQL is always ahead of xtrabackup, and in some situations xtrabackup will fail if the underlying MySQL patch release does not match the xtrabackup patch release. In this scenario, you could utilise the new option to force xtrabackup not to check the versions, for example:

      backup-options=--no-server-version-check

      Multiple options can be passed as a space-separated list, for example:

      backup-options=--option1 --option2 --option3

      Note

      Options passed are specific to the underlying binary and are not validated by Tungsten, therefore you must ensure you only use options that are valid, and check syntax accurately.

      Issues: CT-1595

  • Core Replicator

    • Support for XA transactions has been added.

      Issues: CT-925

    • When using the MySQL Batch Applier, you can now choose to drop all deletes by specifying the following tpm property:

      property=replicator.applier.dbms.skipDeletes=true

      This is disabled (false) by default.

      Issues: CT-1588

    • When using the MySQL Batch Applier, you can now choose to log rows that would otherwise have generated a duplicate key error into an error table.

      This is disabled by default, but can be enabled and configured by using the following two new properties:

      property=replicator.applier.dbms.errorTablePrefix=error_xxx_ 
      property=replicator.applier.dbms.batchLogDuplicateRows=true

      In addition, if your tables contain Foreign Keys, you can choose to disable them during the data loading process using the following property:

      property=replicator.applier.dbms.disableForeignKeys=true

      Issues: CT-1589

Bug Fixes

  • Command-line Tools

    • tpm was failing to check the application password when it was single-quoted.

      Issues: CT-1257

    • The tpm update command now properly parses the --ini argument.

      Issues: CT-1551

A.12. Tungsten Replicator 6.1.13 GA (27 May 2021)

Version End of Life. 15 Aug 2024

Release 6.1.13 contains a number of bug fixes, and contains updated drivers for replicating to newer releases of PostgreSQL.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • Ensure that all tools are available in the appropriate software packages.

      Issues: CT-1518

Improvements, new features and functionality

  • Command-line Tools

    • A new tpm option, delete-service, is now available to simplify the removal of replicator services; see the example below.

      Issues: CT-327
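
      A minimal sketch of the invocation, assuming a replicator service named alpha (check the tpm reference for the exact syntax):

      shell> tpm delete-service alpha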

    • A new utility, tungsten_skip_seqno, has been added which allows events to be skipped based on filters, allowing the Tungsten Replicator to come back online with less manual intervention.

      Issues: CT-42, CT-1087, CT-1453, CT-1511

  • Heterogeneous Replication

    • Upgraded PostgreSQL JDBC driver to the latest release to allow compatibility with PostgreSQL 12.

      Issues: CT-1521

Bug Fixes

  • Command-line Tools

    • This release includes a wide range of improvements and bug fixes for tpm and other command-line tools.

      Issues: CT-1509

    • Fixes an issue where the undeployall command would create a root owned wrapper.log in the ./tools directory.

      Issues: CT-1516

    • Fixed a tpm call to the systemd services listing which was making tpm wrongly fall back to init.d. This issue was found when testing MariaDB 10.0.

      Issues: CT-1524

    • tpm now supports the new MariaDB my_print_defaults flag, --defaults-file.

      Issues: CT-1528

  • Core Replicator

    • Fixes an issue where null values in JSON structures were incorrectly quoted once replicated.

      For example:

      [{"example":null}]

      would be replicated as

      [{"example":"null"}]

      Issues: CT-1523

A.13. Tungsten Replicator 6.1.12 GA (8 Mar 2021)

Version End of Life. 15 Aug 2024

Release 6.1.12 is a minor bug fix release.

Improvements, new features and functionality

  • Monitoring

    • Support has now been added to allow monitoring via the New Relic Insights API, using a new script tungsten_newrelic_event

      Issues: CT-892

Bug Fixes

  • Command-line Tools

    • tpm install would fail on FIPS-enabled environments.

      Issues: CT-1451

  • Core Replicator

    • A new mapcharset filter is included, which fixes collation incompatibilities when replicating from MySQL 8 to an older release (5.7 or 5.6).

      This new filter should be enabled within any MySQL applier where the source is generated from a MySQL 8 database and the target is MySQL 5.7 or lower.

      Issues: CT-1404

    • If SSL was enabled between replicators for THL transfer, and a source node failed and became unresponsive, or was slow/overloaded, the replicator on the Replica could fail to go offline due to a timeout, remaining in a GOING-ONLINE:SYNCHRONIZING state.

      Issues: CT-1459

A.14. Tungsten Replicator 6.1.11 GA (21 Jan 2021)

Version End of Life. 15 Aug 2024

Release 6.1.11 is a major bug fix release specifically for clustering deployments.

This Tungsten Replicator release contains no changes, but was released as part of Continuent's standard release process to maintain consistency with version numbers.

A.15. Tungsten Replicator 6.1.10 GA (15 Dec 2020)

Version End of Life. 15 Aug 2024

Release 6.1.10 is a minor bug fix release.

Improvements, new features and functionality

  • Command-line Tools

    • tpm diag now collects routing table information via route -n , and has two new command-line arguments: --include and --groups.

      --include specifies a comma-separated list of subroutines to include. Any gather subroutine not listed will be skipped.

      --groups specifies a comma-separated list of subroutine groups to include. Any group not listed will be skipped.

      Issues: CT-1399
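
      A sketch of typical usage; the subroutine and group names shown are illustrative only, and the real names can be discovered with the --list argument added in release 6.1.4:

      shell> tpm diag --include=route,net
      shell> tpm diag --groups=os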

A.16. Tungsten Replicator 6.1.9 GA (23 Nov 2020)

Version End of Life. 15 Aug 2024

Release 6.1.9 is a critical bug fix release containing a fix for a critical bug within the Replicator related to the handling of timezones.

Bug Fixes

  • Command-line Tools

    • tpm update no longer fails when using the staging method to upgrade to a new version.

      Issues: CT-1381

  • Core Replicator

    • In some edge-case scenarios, the replicator was not setting the session time_zone correctly when preceding sessions had a different time_zone applied. This could lead to situations where TIMESTAMP values were applied on Replica nodes with an incorrect time_zone offset.

      Issues: CT-1390

A.17. Tungsten Replicator 6.1.8 GA (2 Nov 2020)

Version End of Life. 15 Aug 2024

Release 6.1.8 is a minor bug fix release.

Bug Fixes

  • Core Replicator

    • Fixes an issue that would prevent a service from going offline at a specified time (trepctl online -until-time) or at a specific seqno (trepctl online -until-seqno) when parallel apply is enabled.

      Issues: CT-1243

    • In MySQL releases using the old row events format (MySQL 5.6 or earlier), Delete_rows_v1 events were badly handled, leading to an extraction error when such an event type was encountered.

      Issues: CT-1358

A.18. Tungsten Replicator 6.1.7 GA (5 Oct 2020)

Version End of Life. 15 Aug 2024

Release 6.1.7 was a minor bug fix release containing a fix for SSL Communications specific to clustering deployments.

This Tungsten Replicator release contains no new changes, but was released as part of Continuent's standard release process to maintain consistency with version numbers.

A.19. Tungsten Replicator 6.1.6 GA (20 Aug 2020)

Version End of Life. 15 Aug 2024

Release 6.1.6 is a minor bug fix release containing a fix for bi-directional standalone Replicator deployments.

Bug Fixes

  • Core Replicator

    • Allows multiple service names to be supplied to property=local.service.name when configuring bi-directional replication between a Composite Active/Active cluster topology and a MySQL target to prevent loopback of transactions.

      Issues: CT-1308

A.20. Tungsten Replicator 6.1.5 GA (5 Aug 2020)

Version End of Life. 15 Aug 2024

Release 6.1.5 is a small interim bug fix release with a number of issues resolved within the Core Replicator, specifically for heterogeneous environments.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Heterogeneous Replication

    • When using the batch applier in heterogeneous pipelines (Redshift, Vertica, Hadoop), the batch applier removes DDL statements.

      There may be occasions when you intentionally want DDL to pass through, such as if you have a custom filter that injects custom DDL statements into the pipeline; however, the batch applier would always remove them.

      A new property is now available to control this behaviour. Set the following to allow the batch applier to retain DDL statements; the default value of false retains the original behaviour of removing DDL:

      property=replicator.applier.dbms.applyStatements=true

      Issues: CT-1270

Known Issues

The following issues are known within this release, but are not considered critical and do not impact the operation of Tungsten Replicator. They will be addressed in a subsequent patch release.

  • Core Replicator

    • In MySQL release 8.0.21 the behavior of CREATE TABLE ... AS SELECT ... has changed, resulting in the transactions being logged differently in the binary log. This change in behavior will cause the replicators to fail.

      Until a fix is implemented within the replicator, the workaround for this will be to split the action into separate CREATE TABLE ... followed by INSERT INTO ... SELECT FROM... statements.

      If this is not possible, then you will need to manually create the table on all nodes, and then skip the resulting error in the replicator, allowing the subsequent loading of the data to continue.

      Issues: CT-1301

Bug Fixes

  • Core Replicator

    • When replicating data that included timestamps, the replicator would update the timestamp value to the time within the commit from the incoming THL. When using statement-based replication, times would be correctly replicated; but with a mixture of statement- and row-based replication, the timestamp value would not be set back to the default time when switching between statement- and row-based events. This would not cause problems on the applied host, except when log_slave_updates was enabled. In that case, all row-based events after a statement-based event would have the same timestamp value applied.

      This was most commonly seen when using the standalone replicator to replicate into a cluster, either from a standalone source or a cluster source.

      Issues: CT-1255

    • For offboard extraction, the replicator would appear to be ONLINE but not actually processing new events.

      This is due to the relay client getting an incomplete packet from the remote database and going into a WAITING state.

      To handle these situations, a new property has been included that sets a timeout; if the replicator does not process an event within the given timeout period, it assumes the link to the remote database has been lost and places the replicator into an OFFLINE:ERROR state.

      Providing auto-recovery has been enabled using the auto-recovery-max-attempts parameter, the replicator will then restart and proceed successfully.

      The new property to include is:

      property=replicator.extractor.dbms.relayLogIdleTimeout

      The default value (0) disables the timeout. Values are provided in seconds, so 300 would be 5 minutes.

      Setting the timeout too low on quieter systems may result in unnecessary replicator restarts, so the value should be set according to the activity levels of your database. If the source is very active with constant updates, a low value would be appropriate; quieter systems that may have long periods of inactivity should have a timeout set no lower than the longest normal period of inactivity within your system.

      Issues: CT-1262
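
      For example, to have the replicator consider the link lost after 5 minutes of inactivity:

      property=replicator.extractor.dbms.relayLogIdleTimeout=300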

    • If filtering is in use, and a space appeared on either side of the delimiter in a "schema.table" definition in your SQL, the replicator would fail to parse the statement correctly.

      For example, all of the examples below are valid SQL but would cause a failure in the replicator:

      sql> CREATE TABLE myschema. mytable (....
      
      sql> CREATE TABLE myschema .mytable (....
      
      sql> CREATE TABLE myschema . mytable (....

      Issues: CT-1278

    • Fixes a bug in the Drizzle Driver whereby a failing prepared statement that exceeds 1000 characters would report a String index out of range: 999 error rather than the actual error.

      Issues: CT-1303

A.21. Tungsten Replicator 6.1.4 GA (4 June 2020)

Version End of Life. 15 Aug 2024

Release 6.1.4 contains a number of improvements and bug fixes, specifically for the tpm command line tool and improvements to the Redshift Applier. In addition this release now fully supports the latest binlog compression feature of MySQL 8.0.20.

Improvements, new features and functionality

  • Command-line Tools

    • Improves tpm performance by using more efficient routines to calculate paths.

      Issues: CT-1168

    • Added the ability for tpm diag to skip both individual gather subroutines along with entire groups of gather subroutines.

      Also added ability to list all gather groups and subroutines using --list for use with the --skip and --skipgroups cli arguments.

      Issues: CT-1176

    • tungsten_provision_slave has been rewritten, fixing a number of issues in the previous release. This version was previously released as the Beta script prov-sl.sh.

      Issues: CT-1208

    • A number of performance improvements and fixes have been incorporated into the Drizzle Driver used for communication between components and MySQL, these include:

      • Better handling of large queries close to max network packet size.

      • Batch Support. Instead of sending statements one by one, the driver will be able to send multiple statements at once, avoiding round trips between the driver and MySQL server.

      • Fixes issues with interpreting useSSL on connect string URLs.

      Issues: CT-1215, CT-1216, CT-1217, CT-1228

    • The tungsten_send_diag command now accepts arguments for the tpm diag command using --args or -a for short.

      You must enclose the arguments in quotes, for example:

      shell> tungsten_send_diag -c 9999 -d --args '--all -v'

      Issues: CT-1220

  • Core Replicator

    • debug has been disabled by default in the schemachange filter, resulting in reduced noise in the replicator log file.

      Issues: CT-1013

    • A new replicator role thl-client has been added. This new role allows the replicator to download THL from a Primary, but not apply to the target database.

      Issues: CT-1123

    • A new delayInMs filter has been added which allows the applying of THL to a Replica to be delayed with millisecond precision. This filter works in the same way as the TimeDelayFilter, however that filter only allows second precision. See the configuration sketch below.

      Issues: CT-1191
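
      A hypothetical configuration sketch only; the filter alias and property name shown here are assumptions, so check the filter reference for the exact names:

      svc-applier-filters=delayinms
      property=replicator.filter.delayinms.delayMs=500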

    • A new droprow JavaScript filter has been added.

      The filter works at ROW level and allows the filtering out of rows based on one or more column/value matches

      Issues: CT-1213

    • When configuring the Redshift applier, you can now configure which tool the applier will use for posting CSV files to S3. Options are s3cmd (default), s4cmd or aws.

      Issues: CT-1218

    • A number of improvements have been made to the Redshift Applier to allow optional levels of table locking.

      This is particularly useful when you have multiple Redshift Appliers in a Fan-In topology, and/or very high volumes of data to process.

      The additional locking options reduce the risk of Redshift Serializable Isolation Violation errors occurring.

      Full details of how to utilise the new options can be read at Section 4.2.2.4, “Handling Concurrent Writes from Multiple Appliers”

      Issues: CT-1221

    • Tungsten Replicator can now correctly extract and parse Binary Log entries when the MySQL option binlog-transaction-compression has been enabled.

      binlog-transaction-compression is a new parameter introduced from MySQL 8.0.20.

      Issues: CT-1223

Bug Fixes

  • Command-line Tools

    • In certain edge cases, tungsten_provision_slave would fail to detect whether mysql was shut down.

      Issues: CT-1096

    • tpm diag now collects directories specified with !includedir in the /etc/my.cnf file.

      Issues: CT-1153

    • Fixes the tpm update command, which would exit with the error:

      Argument " (error code 1)" isn't numeric

      Issues: CT-1165

    • tpm diag now collects any files specified by !include directives in the /etc/my.cnf and /etc/mysql/my.cnf files.

      tpm diag also looks in /etc/mysql/my.cnf for !includedir directives

      Issues: CT-1169

    • Fixes a bug which prevented tungsten_send_diag from uploading a self-generated diagnostic zip file.

      Issues: CT-1170

    • tpm diag now properly derives the correct target path to the releases directory if the home directory in the configuration points to a sym-link.

      Issues: CT-1172

    • Removed tpm diag call to sudo for gathering ifconfig and lsb_release commands.

      Issues: CT-1175

    • tpm update would fail and report errors if deployall had been executed.

      Issues: CT-1179

    • tpm diag no longer requires the mysql command-line client when running on non-MySQL Applier nodes, and no longer attempts to gather any MySQL diagnostic information.

      Issues: CT-1188

    • Removes the requirement to execute component start/stop commands with sudo when called through systemd.

      This is specific to start/stop actions following the use of the deployall scripts.

      Issues: CT-1193

    • Fixes cases where tpm fails when the OS hostname command returns a different string than what is used in the configuration (i.e. hostname returns a FQDN, yet the configuration contains shortnames like db1, db2, etc.).

      Issues: CT-1206

    • In certain cases, after a reprovision, tungsten_provision_slave didn't always run the steps to reset the local replicator service. This caused the replicator to go into an error state after the provision had completed.

      Issues: CT-1210

    • The tpm diag command now handles the cluster-slave topology more gracefully, and properly handles cluster nodes without the Connector installed.

      Improved output text clarity by converting multiple verbose-level outputs to debug, and warnings to notice-level.

      Issues: CT-1222

    • tpm diag now gathers sym-linked files correctly.

      Issues: CT-1232

    • ddlscan now sets the datatype for sequence number columns to BIGINT when generating staging table DDL for Redshift deployments.

      Issues: CT-1235

    • Fixes a situation where tpm update exits with a Data::Dumper error.

      Issues: CT-1249

  • Core Replicator

    • In heterogeneous replicator deployments, the convertstringfrommysql filter would fail to convert strings for alphanumeric key values.

      Issues: CT-1128

    • Tungsten Replicator could wrongly determine that it was already in DEGRADED mode when attempting to place it into the DEGRADED state.

      Issues: CT-1131

    • Tungsten Replicator now recognises Amazon AWS SSL Certificates to enable SSL communication with AWS Aurora.

      Issues: CT-1173

    • When host server time (and thus MySQL time) is not configured as UTC, issuing cluster heartbeat or trepctl heartbeat in the first hours around daylight savings time would create an invalid time in MySQL.

      For more information on timezones when issuing heartbeats, see Section 8.19.3.7.1, “trepctl heartbeat Time Zone Handling”

      Issues: CT-1174

    • The replicator would fail to apply into Vertica when configured as an offboard installation due to the applier incorrectly expecting the csv files to exist locally on the remote Vertica host.

      Issues: CT-1194

    • Due to a change in the Binary log structure introduced in MySQL 8.0.16, the replicator would fail to extract transactions for Partitioned tables.

      Issues: CT-1201

    • The replicator would fail to correctly parse statements with leading comment blocks in excess of 200 characters.

      Issues: CT-1236

A.22. Tungsten Replicator 6.1.3 GA (17 February 2020)

Version End of Life. 15 Aug 2024

Release 6.1.3 contains a small number of improvements and fixes to common command line tools, and introduces compatibility with MongoDB Atlas.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Command-line Tools

    • tpm diag has been updated to provide additional feedback detailing the hosts that were gathered during the execution, and also provides examples of how to handle failures

      When running on a single host configured via the ini method:

      shell> tpm diag
      Collecting localhost diagnostics.
      Note: to gather for all hosts please use the "-a" switch and ensure that you have paswordless »
      ssh access set between the hosts.
      Collecting diag information on db1.....
      Diagnostic information written to /home/tungsten/tungsten-diag-2020-02-06-19-34-25.zip

      When running on a staging host, or with the -a flag:

      shell> tpm diag [-a|--allhosts]
      Collecting full cluster diagnostics
      Note: if ssh access to any of the cluster hosts is denied, use "--local" or "--hosts=<host1,host2,...>"
      Collecting diag information on db1.....
      Collecting diag information on db2.....
      Collecting diag information on db3.....
      Diagnostic information written to /home/tungsten/tungsten-diag-2020-02-06-19-34-25.zip

      Issues: CT-1137

Bug Fixes

  • Command-line Tools

    • tpm would fail to run on some Operating Systems due to a missing realpath.

      tpm has been changed to use readlink, which is commonly installed by default on most operating systems; however, if it is not available, you may be required to install GNU coreutils to satisfy this dependency.

      Issues: CT-1124

    • Removed dependency on perl module Time::HiRes from tpm

      Issues: CT-1126

    • Added support for handling missing dependency (Data::Dumper) within various tpm subcommands

      Issues: CT-1130

    • tpm will now work on MacOS/X systems, provided greadlink is installed.

      Issues: CT-1147

    • tpm install will no longer report that the linux distribution cannot be determined on SUSE platforms.

      Issues: CT-1148

    • Fixes a condition where tpm diag would fail to set the working path correctly, especially on Debian 8 hosts.

      Issues: CT-1150

    • tpm diag now checks for OS commands in additional paths (/bin, /sbin, /usr/bin and /usr/sbin)

      Issues: CT-1160

    • Fixes an issue introduced in v6.1.2 where the use of the undeployall script would stop services as it removed them from systemctl control

      Issues: CT-1166

  • Core Replicator

    • The MongoDB Applier has been updated to use the latest MongoDB JDBC Driver

      Issues: CT-1134

    • The MongoDB Applier now supports MongoDB Atlas as a target

      Issues: CT-1142

    • The replicator would fail with Unknown column '' in 'where clause' when replicating between MySQL 8 hosts where the client application wrote data into the source database host using a different collation from the default on the target database.

      The replicator would fail due to a mismatch in these collations when querying the information_schema.columns view to gather metadata ahead of applying to the target.

      Issues: CT-1145

A.23. Tungsten Replicator 6.1.2 GA (20 January 2020)

Version End of Life. 15 Aug 2024

Release 6.1.2 contains both significant improvements as well as some needed bugfixes.

Behavior Changes

The following changes have been made to Tungsten Replicator and may affect existing scripts and integration tools. Any scripts or environment which make use of these tools should check and update for the new configuration:

  • Behavior Changes

    • Certified the Tungsten product suite with Java 11.

      A small set of minor issues have been found and fixed (CT-1091, CT-1076) along with this certification.

      The code is now compiled with Java compiler v11 while keeping Java 8 compatibility.

      Java 9 and 10 have been tested and validated but certification and support will only cover Long Term releases.

      Note

      Known Issue

      With Java 11, command line tools are slower. There is no impact on the overall clustering or replication performance but this can affect manual operations using CLI tools such as cctrl and trepctl

      Issues: CT-1052

Improvements, new features and functionality

  • Command-line Tools

    • The tpm command was originally written in Ruby. This improvement converts tpm to Perl over time, starting with the tpm shell wrapper and refactoring each sub-command one-by-one.

      For this release, we have redone the diag and mysql sub-commands.

      Issues: CT-1048

  • Core Replicator

    • A new Replicator role, thl-server, has been added.

      This new feature allows your Replica replicators to still pull generated THL from a Primary even when the Primary replicator has stopped extracting from the binlogs.

      If used in Tungsten Clustering, this feature must only be enabled when the cluster is in MAINTENANCE mode.

      Issues: CT-58

      For more information, see Section 7.3, “Understanding Replicator Roles”.

    • A new JavaScript filter dropddl.js has been added to allow selective removal of specific object DDL from THL.

      Issues: CT-1092

Bug Fixes

  • Behavior Changes

    • If you need to reposition the extractor, there are a number of ways to do this, including the use of the options -from-event or -base-seqno.

      These options are mutually exclusive; however, in some situations, such as when positioning against an Aurora source, you may need to issue both of them together. Previously this was not possible. In this release, both options can now be supplied, providing that you include the additional -force option, for example:

      shell> trepctl -service serviceName online -base-seqno 53 -from-event 000412:762897 -force

      Issues: CT-1065

    • When the Replicator inserts a heartbeat there is an associated timezone. Previously, the heartbeat would be inserted using the GMT timezone, which fails during the DST switch window. The new default uses the Replicator host's timezone instead.

      This default change corrects an edge case where inserting a heartbeat would fail during the DST switch window when the MySQL server is running in a different timezone than the Replicator (which runs in GMT).

      For example, on 31st March 2019, the time switch occurred at 2 AM in the Europe/Paris timezone. When inserting a heartbeat in the window between 4 and 5 AM (say at 4:15 AM), the corresponding GMT time would be 2:15 AM, which is invalid in the Europe/Paris timezone. The Replicator would then fail if the MySQL timezone was set to Europe/Paris, as it would try to insert an invalid timestamp.

      A new option, -tz has been added into the trepctl heartbeat command to force the use of a specific timezone.

      For example, use GMT as the timezone when inserting a heartbeat:

      shell> trepctl heartbeat -tz NONE

      Use the Replicator host's timezone to insert the heartbeat:

      shell> trepctl heartbeat -tz HOST

      Use the given timezone to insert the heartbeat:

      shell> trepctl heartbeat -tz {valid timezone id}

      If the MySQL server timezone is different from the host timezone (which is strongly discouraged), then -tz {valid timezone id} should be used instead, where {valid timezone id} should be the same as the MySQL server timezone.

      Issues: CT-1066

    • Corrected resource leak when loading Java keystores

      Issues: CT-1091

  • Command-line Tools

    • Fixed error message to indicate the need to specify a service on Composite Active/Active clusters for the tungsten_find_position and tungsten_find_seqno commands.

      Issues: CT-1098

    • The tpm command no longer reports warnings about existing system triggers with MySQL 8+

      Issues: CT-1099

  • Core Replicator

    • When configuring a Kafka Applier, the Kafka port was set incorrectly.

      Issues: CT-693

    • If a JSON field contained a single quote, the replicator would break during the apply stage whilst running the generated SQL into MySQL.

      Single quotes are now properly escaped to solve this issue.

      Issues: CT-983

    • Under rare circumstances (network packet loss or a MySQL Server hang), the replicator could hang until restarted.

      This issue has been fixed by using specific network timeouts in both the replicator and the Drizzle JDBC driver connection logic.

      Issues: CT-1034

    • When configuring Active/Active standalone replicators with the BidiSlave filter enabled, the replicator was incorrectly parsing certain DDL statements and marking them as unsafe; as a result, they were being dropped by the applier and ignored.

      The full list of DDL commands fixed in this release is as follows:

      • CREATE|DROP TRIGGER

      • CREATE|DROP FUNCTION

      • CREATE|DROP|ALTER|RENAME USER

      • GRANT|REVOKE

      Issues: CT-1084, CT-1117

    • The following warning would appear in the replicator log due to GTID events not being handled:

      WARN extractor.mysql.LogEvent Skipping unrecognized binlog event type 33, 34 or 35)

      The WARN message will no longer appear; GTID events are still not handled in this release, but will be fully extracted in a future release.

      Issues: CT-1114

A.24. Tungsten Replicator 6.1.1 GA (28 October 2019)

Version End of Life. 15 Aug 2024

Release 6.1.1 contains both significant improvements as well as some needed bugfixes.

Improvements, new features and functionality

  • Core Replicator

    • Added Clickhouse applier support.

      Issues: CT-383

    • If using the dropcolumn filter during extraction, in conjunction with the Batch Applier (e.g. replicating to Redshift, Hadoop, or Vertica), writes would fail with a CSV mismatch error due to gaps in the THL index.

      However, for JDBC appliers, the gaps are required to ensure the correct column mapping.

      To handle the two different requirements, a new property has been added to the filter to control whether to leave the THL index untouched (the default) or to re-order the index IDs.

      If applying to Batch targets, the following property should be added to your configuration. The property is not required for JDBC targets.

      --property=replicator.filter.dropcolumn.fillGaps=true

      Issues: CT-1025

Bug Fixes

  • Command-line Tools

    • Fixed an issue that would prevent reading remote binary logs when using SSL.

      Issues: CT-958

    • Fixed an issue where the command trepctl -all-services status -name watches fails.

      Issues: CT-977

    • Restored previously-removed log file symbolic links under $CONTINUENT_ROOT/service_logs/

      Issues: CT-1026

    • Fixed a bug where tpm diag would generate an empty zip file if the hostnames contain hyphens (-) or periods (.)

      Issues: CT-1032

    • Improved the ability to find needed binaries for the tungsten_find_position, tungsten_find_seqno and tungsten_get_rtt commands.

      Issues: CT-1054

A.25. Tungsten Replicator 6.1.0 GA (31 July 2019)

Version End of Life. 15 Aug 2024

Release 6.1.0 contains both significant improvements as well as some needed bugfixes. One of the main features of this release is MySQL 8 support.

Improvements, new features and functionality

  • Command-line Tools

    • Two new utility scripts have been added to the release to help with setting the Replicator position:

      - tungsten_find_position, which assists with locating information in the THL based on the provided MySQL binary log event position, and generates a dsctl set command as output.

      - tungsten_find_seqno, which assists with locating information in the THL based on the provided sequence number, and generates a dsctl set command as output.

      Issues: CT-934
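
      A sketch of typical usage, assuming the sequence number of interest is 4512 (the exact argument syntax is an assumption; check the command help):

      shell> tungsten_find_seqno 4512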

  • Core Replicator

    • A new, beta-quality command has been included called prov-sl.sh which is intended to eventually replace the current tungsten_provision_slave script.

      Currently, prov-sl.sh supports provisioning Replicas using mysqldump and xtrabackup tools, and is MySQL 8-compatible. 

      The prov-sl.sh command is written in Bash, has fewer dependencies than the current version, and is meant to fix a number of issues with the current version.

      Backups are streamed from source to target so that an intermediate write to disk is not performed, resulting in faster provisioning times.

      Logs are written to $CONTINUENT_ROOT/service_logs/prov-sl.log (i.e. /opt/continuent/service_logs/prov-sl.log).

      For example, provision a Replica from [source db] using mysqldump (default):

      shell> prov-sl.sh -s {source db}

      As another example, use xtrabackup for the backup method, with 10 parallel threads (default is 4), and ssh is listening on port 2222:

      shell> prov-sl.sh -s {source db} -n xtrabackup -t 10 -p 2222

      Warning

      At the moment, prov-sl.sh does not support Composite Active/Active topologies when used with Tungsten Clustering; support will be included in a future release.

      Issues: CT-614, CT-723, CT-809, CT-855, CT-963

    • Upgraded the Drizzle driver to support MySQL 8 authentication protocols (SHA256, caching_sha2).

      Issues: CT-914, CT-931, CT-966

    • The Redshift Applier now allows AWS authentication using IAM Roles. Previously authentication was possible via Access and Secret Key pairs only.

      Issues: CT-980

      For more information, see Section 4.2.2.1, “Redshift Preparation for Amazon Redshift Deployments”.

Bug Fixes

  • Command-line Tools

    • When executing mysqldump, Tungsten tools no longer use the --add-drop-database flag, as it would prevent MySQL 8+ from restoring the dump.

      Issues: CT-935

    • Fixed a bug where tpm diag would generate an empty zip file if the hostnames contain hyphens (-) or periods (.)

      Issues: CT-1032

  • Core Replicator

    • Added support for missing charset GB18030 to correct WARN extractor.mysql.MysqlBinlog Unknown charset errors.

      Issues: CT-915, CT-932

    • Loading data into Redshift would fail with the following error if a row of data contained a specific control character (0x00 (NULL)):

      Missing newline: Unexpected character 0x30 found at location nnn

      Issues: CT-984

    • Now properly extracting the Geometry datatype.

      Issues: CT-997

    • The ddl_map.json file used by the apply_schema_changes filter was missing a rule to handle ALTER TABLE statements when replicating between MySQL and Redshift

      Issues: CT-1002

    • The extract_schema_change filter wasn't escaping double-quotes ("), and the generated JSON would then cause the applier to error with:

      pendingExceptionMessage: SyntaxError: missing } after property list »
      (../../tungsten-replicator//support/filters-javascript/apply_schema_changes.js#236(eval)#1)

      Issues: CT-1011

Appendix B. Prerequisites

Before you install Tungsten Replicator, there are a number of setup and prerequisite installation and configuration steps that must take place. Section B.2, “Staging Host Configuration” and Section B.3, “Host Configuration” must be performed on every host within your chosen cluster or replication configuration. Additional steps are required to configure explicit databases, such as Section B.4, “MySQL Database Setup”, and will need to be performed on each appropriate host.

B.1. Requirements

B.1.1. Operating Systems Support

The following Operating Systems are supported for installation of the Tungsten Replicator and are part of our regular QA testing processes. Other variants of Linux may work at your own risk, but their use in production should be avoided, and any issues that arise may not be supported; if in doubt, we recommend that you contact Continuent Support for clarification. Windows/Mac OS is NOT supported for direct installation of Tungsten Replicator; however, we do support an "offboard" installation of the replicator on any supported OS to extract from/apply to remote databases running on Windows or Mac OS.

Virtual Environments running any of the supported distributions listed are supported, although only recommended for Development/Testing environments.

The list below also includes EOL dates published by the providers; these should be taken into consideration when configuring your deployment.

Table B.1. Tungsten OS Support

Distribution Published EOL Notes
Amazon Linux 2 30th June 2024
CentOS 7 30th June 2024
Debian GNU/Linux 10 (Buster) June 2024
Debian GNU/Linux 11 (Bullseye) June 2026
Oracle Linux 8.4 July 2029
RHEL 7 30th June 2024
RHEL 8.4.0 31st May 2029
Rocky Linux 8 31st May 2029
Rocky Linux 9 31st May 2032
SUSE Linux Enterprise Server 15 21st June 2028
Ubuntu 20.04 LTS (Focal Fossa) April 2025
Ubuntu 22.04 LTS (Jammy Jellyfish) April 2027

B.1.2. Database Support

Unless stated, MySQL refers to the following variants:

  • MySQL Community Edition
  • MySQL Enterprise Edition
  • Percona MySQL

Version Support Matrix

Table B.2. MySQL/Tungsten Version Support

Database MySQL Version Tungsten Version Notes
MySQL 5.7 All non-EOL Versions Full Support
MySQL 8.0.0-8.0.34 6.1.0-6.1.3 Supported, but does not support Partitioned Tables or the use of binlog-transaction-compression=ON introduced in 8.0.20
MySQL 8.0.0-8.0.34 6.1.4 onwards Fully Supported.
MariaDB 10.0, 10.1 All non-EOL Versions Full Support
MariaDB 10.2, 10.3 6.1.13-6.1.20 Partial Support. See note below.
MariaDB 10.2, 10.3 7.x Full Support

Known Issue affecting use of MySQL 8.0.21

In MySQL release 8.0.21 the behavior of CREATE TABLE ... AS SELECT ... has changed, resulting in the transactions being logged differently in the binary log. This change in behavior will cause the replicators to fail.

Until a fix is implemented within the replicator, the workaround for this will be to split the action into separate CREATE TABLE ... followed by INSERT INTO ... SELECT FROM... statements.

If this is not possible, then you will need to manually create the table on all nodes, and then skip the resulting error in the replicator, allowing the subsequent loading of the data to continue.

MariaDB 10.3+ Support

Full support for MariaDB version 10.3 has been certified in v7.0.0 onwards of the Tungsten products.

Version 6.1.13 onwards of Tungsten will also work; however, should you choose to deploy these versions, you do so at your own risk. There are a number of issues noted below that are all resolved from v7.0.0 onwards; therefore, if you choose to use an earlier release, you should do so with the following limitations acknowledged:

  • tungsten_find_orphaned may fail, introducing the risk of data loss (Fixed in v6.1.13 onwards)

  • SSL from Tungsten Components TO the MariaDB is not supported.

  • Geometry data type is not supported.

  • Tungsten backup tools will fail as they rely on xtrabackup, which will not work with newer releases of MariaDB.

  • tpm might fail to find the correct MySQL configuration file. (Fixed in 6.1.13 onwards)

  • MariaDB specific event types trigger lots of warnings in the replicator log file.

MySQL "Innovation" Releases

In 2023, Oracle announced a new MySQL version schema that introduced "Innovation" releases. From this point on, patch releases (for example, the various 8.0.x releases) would only contain bug fixes, whereas new features (along with bug fixes) would only be introduced in the "Innovation" releases, such as 8.1, 8.2, etc.

"Innovation" releases will be released quartlery, and Oracle aim to make an LTS release every two years which will bundle all of the new features, behavior changes and bug fixes from all the previous "Innovation" releases.

Oracle do not advise the use of the "Innovation" releases in a production environment where a known behavior is expected to ensure system stability. We have chosen to follow this advice, and as such we do not certify any release of Tungsten against "Innovation" releases for use in production. We will naturally test against these releases in our QA environment so that we are able to certify and support the LTS release as soon as is practical. Any modifications needed to support an LTS release will not be backported to older Tungsten releases.

For more information on Oracle's release policy, please read their blog post on the subject.

B.1.3. RAM Requirements

RAM requirements are dependent on the workload being used and applied, but the following provide some guidance on the basic RAM requirements:

  • Tungsten Replicator requires 2GB of VM space for the Java execution, including the shared libraries, with approximately 1GB of Java VM heap space. This can be adjusted as required, for example, to handle larger transactions or bigger commit blocks and large packets.

    Performance can be improved within the Tungsten Replicator if there is 2-3GB available in the OS page cache. Replicators work best when pages written to replicator log files remain memory-resident for a period of time, so that no file system I/O is required to read that data back within the replicator. This is the biggest potential point of contention between replicators and DBMS servers.
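
    As an illustrative sketch only, the Java heap can be raised by editing the replicator's wrapper.conf and restarting; the property name follows the Tanuki wrapper convention used by the Tungsten components, and the path assumes a default installation in /opt/continuent:

    shell> vi /opt/continuent/tungsten/tungsten-replicator/conf/wrapper.conf
    # Maximum Java heap size in MB (example value; size to your workload)
    wrapper.java.maxmemory=2048
    shell> replicator restart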

B.1.4. Disk Requirements

Disk space usage is based on the space used by the core application, the staging directory used for installation, and the space used for the THL files:

  • The staging directory containing the core installation is approximately 150MB. When performing a staging-directory based installation, this space requirement will be used once. When using an INI-file based deployment, this space will be required on each server. For more information on the different methods, see Section 9.1, “Comparing Staging and INI tpm Methods”.

  • Deployment of a live installation also requires approximately 150MB.

  • The THL files required for installation are based on the size of the binary logs generated by MySQL. THL size is typically twice the size of the binary log. This space will be required on each machine in the cluster. The retention times and rotation of THL data can be controlled, see Section D.1.5, “The thl Directory” for more information, including how to change the retention time and move files during operation.

A dedicated partition for THL and/or Tungsten Software is recommended to ensure that a full disk does not impact your OS or DBMS. Local disk, SAN, iSCSI and AWS EBS are suitable for storing THL. NFS is NOT recommended.
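
As a simple illustration, current THL disk consumption can be checked with du; the path below assumes the default installation directory and a service named alpha:

shell> du -sh /opt/continuent/thl/alpha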

Because the replicator reads and writes information using buffered I/O in a serial fashion, there is no random-access or seeking.

B.1.5. Java Requirements

All components of Tungsten are certified with Java using the following versions:

  • Oracle JRE 8

  • Oracle JRE 11 (From release 6.1.2 only)

  • OpenJDK 8

  • OpenJDK 11 (From release 6.1.2 only)

  • Java 9, 10 and 13 have been tested and validated but certification and support will only cover Long Term releases.

Important

There are a number of known issues in earlier Java revisions that may cause performance degradation, high CPU, and/or component hangs, specifically when SSL is enabled. It is strongly advised that you ensure your Java version is one of the following MINIMUM releases:

  • Oracle JRE 8 Build 261
  • OpenJDK 8 Build 222

All Java versions from 8u265 up to, but not including, version 13 contain a bug that can trigger unusually high CPU and/or system timeouts and hangs within the SSL protocol. To avoid this, you should add the following entry to the wrapper.conf file for all relevant components. This will be included by default from version 6.1.15 onwards of all Tungsten products.

wrapper.conf can be found in the following path {INSTALLDIR}/tungsten/tungsten-component/conf, for example: /opt/continuent/tungsten/tungsten-manager/conf

wrapper.java.additional.{next available number}=-Djdk.tls.acknowledgeCloseNotify=true

For example:

wrapper.java.additional.16=-Djdk.tls.acknowledgeCloseNotify=true

After editing the file, each component will need to be restarted.

Important

If your original installation was performed with Java 8 installed, and you wish to upgrade to Java 11, you will need to issue tools/tpm update --replace-release on all nodes from within the software staging path.

This is to allow the components to detect the newer Java version and adjust to avoid calls to functions that were deprecated/renamed between version 8 and version 11.
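
For example, assuming the software was originally staged in /home/tungsten/tungsten-replicator-6.1.24-6 (use tpm query staging to confirm the actual path on your hosts):

shell> cd /home/tungsten/tungsten-replicator-6.1.24-6
shell> tools/tpm update --replace-release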

B.1.6. Cloud Deployment Requirements

Cloud deployments require a different set of considerations over and above the general requirements. The following is a guide only, and where specific cloud environment requirements are known, they are explicitly included:

Instance Types/Configuration

  • Instance Type: Instance sizes and types are dependent on the workload, but larger instances are recommended for transactional databases. Amazon example: m4.xlarge or better.

  • Instance Boot Volume: Use block, not ephemeral storage. Amazon example: EBS.

  • Instance Deployment: Use standard Linux distributions and bases. For ease of deployment and configuration, Ansible, Puppet or other script-based solutions could be used. Amazon example: Amazon Linux AMIs.

Development/QA nodes should always match the expected production environment.

AWS/EC2 Deployments

  • Use Virtual Private Cloud (VPC) deployments, as these provide consistent IP address support.

  • Multiple EBS-optimized volumes for data, using Provisioned IOPS for the EBS volumes depending on workload:

    • / (root): no specific tpm or my.cnf settings.
    • MySQL Data: tpm option datasource-mysql-data-directory=/volumes/mysql/data; my.cnf option datadir=/volumes/mysql/data.
    • MySQL Binary Logs: tpm option datasource-log-directory=/volumes/mysql/binlogs; my.cnf option log-bin=/volumes/mysql/binlogs/mysql-bin.
    • Transaction History Logs (THL): tpm option thl-directory=/volumes/mysql/thl; no my.cnf option.
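
    Expressed as an INI-based configuration fragment, the tpm options above would appear as follows; the service name alpha is illustrative:

    [alpha]
    datasource-mysql-data-directory=/volumes/mysql/data
    datasource-log-directory=/volumes/mysql/binlogs
    thl-directory=/volumes/mysql/thl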

Recommended Replication Formats

  • MIXED is recommended for MySQL Primary/Replica topologies (e.g., either single clusters or primary/data-recovery setups).

  • ROW is strongly recommended for Composite Active/Active setups. Without ROW, data drift is a possible problem when using MIXED or STATEMENT. Even with ROW there are still cases where drift is possible but the window is far smaller.

  • ROW is required for heterogeneous replication.

B.1.7. Docker Support Policy

B.1.7.1. Overview

Continuent has traditionally had a relaxed policy about Linux platform support for customers using our products.

While it is possible to install and run Continuent Tungsten products (i.e. Clustering/Replicator/etc.) inside Docker containers, there are many reasons why this is not a good idea.

B.1.7.2. Background

As background, every database node in a Tungsten Cluster runs at least three (3) layers or services:

  • MySQL Server (i.e. MySQL Community or Enterprise, MariaDB or Percona Server)

  • Tungsten Manager, which handles health-checking, signaling and failover decisions (Java-based)

  • Tungsten Replicator, which handles the movement of events from the MySQL Primary server binary logs to the Replica databases nodes (Java-based)

Optionally, a fourth service, the Tungsten Connector (Java-based), may be installed as well, and often is.

B.1.7.3. Current State

As such, this means that the Docker container would also need to support these 3 or 4 layers and all the resources needed to run them.

This is not what containers were designed to do. In a proper containerized architecture, each container would contain one single layer of the operation, so there would be 3-4 containers per “node”. This sort of architecture is best managed by some underlying technology like Swarm, Kubernetes, or Mesos.

More reasons to avoid using Docker containers with Continuent Tungsten solutions:

  • Our product is designed to run on a full Linux OS. By design Docker does not have a full init system like SystemD, SysV init, Upstart, etc. This means that if we have a process (Replicator, Manager, Connector, etc.) that process will run as PID 1, and if this process dies the container will die. There are some solutions that allow a Docker container to have a ‘full init’ system so the container can start more processes (ssh, replicator, manager, and so on) all at once; however, this is almost a heavyweight VM kind of behavior, and Docker wasn’t designed this way.

  • Requires a mutable container – to use Tungsten Clustering inside a Docker container, the Docker container must be launched as a mutable Linux instance, which is not the classic, nor proper way to use containers.

  • Our services are not designed as “serverless”. Serverless containers are totally stateless. Tungsten Cluster and Tungsten Replicator do not support this type of operation.

  • Until we make the necessary changes to our software, using Docker as a cluster node results in a minimum 1.2GB docker image.

  • Once Tungsten Cluster and Tungsten Replicator have been refactored using a microservices-based architecture, it will be much easier to scale our solution using containers.

  • A Docker container would need to allow for updates in order for the Tungsten Cluster and Tungsten Replicator software to be re-configured as needed. Otherwise, a new Docker container would need to be launched every time a config change was required.

  • There are known I/O and resource constraints for Docker containers; containers must therefore be carefully deployed to avoid those pitfalls.

  • We test on CentOS-derived Linux platforms.

B.1.7.4. Summary

In closing, Continuent’s position on container support is as follows:

  • Unsupported at this time for all products (i.e. Clustering/Replicator/etc.)

  • Use at your own risk

B.2. Staging Host Configuration

The staging host will form the base of your operation for creating your cluster. The primary role of the staging host is to hold the Tungsten Cluster™ software, and to install, transfer, and initiate the Tungsten Cluster™ service on each of the nodes within the cluster. The staging host can be a separate machine, or a machine that will be part of the cluster.

The recommended way to use Tungsten Cluster™ is to configure SSH on each machine within the cluster and allow the tpm tool to connect and perform the necessary installation and setup operations to create your cluster environment, as shown in Figure B.1, “Tungsten Deployment”.

Figure B.1. Tungsten Deployment

Tungsten Deployment

The staging host will be responsible for pushing and configuring each machine. For this to operate correctly, you should configure SSH on the staging server and each host within the cluster with a common SSH key. This will allow both the staging server, and each host within the cluster to communicate with each other.

You can use an existing login as the base for your staging operations. For the purposes of this guide, we will create a unique user, tungsten, from which the staging process will be executed.

  1. Create a new Tungsten user that will be used to manage and install Tungsten Cluster™. The recommended choice for MySQL installations is to create a new user, tungsten. You will need to create this user on each host in the cluster. You can create the new user using adduser:

    shell> sudo adduser tungsten

    You can add the user to the mysql group using usermod:

    shell> sudo usermod -G mysql -a tungsten
  2. Login as the tungsten user:

    shell> su - tungsten
  3. Create an SSH key file, but do not configure a password:

    tungsten:shell> ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/tungsten/.ssh/id_rsa): 
    Created directory '/home/tungsten/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/tungsten/.ssh/id_rsa.
    Your public key has been saved in /home/tungsten/.ssh/id_rsa.pub.
    The key fingerprint is:
    e3:fa:e9:7a:9d:d9:3d:81:36:63:85:cb:a6:f8:41:3b tungsten@staging
    The key's randomart image is:
    +--[ RSA 2048]----+
    |                 |
    |                 |
    |             .   |
    |            . .  |
    |        S .. +   |
    |       . o .X .  |
    |        .oEO + . |
    |       .o.=o. o  |
    |      o=+..    . |
    +-----------------+

    This creates both a public and private keyfile; the public keyfile will be shared with the hosts in the cluster to allow hosts to connect to each other.

  4. Within the staging server, profiles for the different cluster configurations are stored within a single directory. You can simplify the management of these different services by configuring a specific directory where these configurations will be stored. To set the directory, specify it within the $CONTINUENT_PROFILES environment variable (and, for replicator deployments, $REPLICATOR_PROFILES), adding these variables to your shell startup script (.bashrc, for example) within your staging server.

    shell> mkdir -p /opt/continuent/software/conf
    shell> mkdir -p /opt/continuent/software/replicator.conf
    shell> export CONTINUENT_PROFILES=/opt/continuent/software/conf
    shell> export REPLICATOR_PROFILES=/opt/continuent/software/replicator.conf

We now have a staging server set up and an SSH keypair for our login, and are ready to start setting up each host within the cluster.

B.3. Host Configuration

Each host in your cluster must be configured with the tungsten user, have the SSH key added, and then be configured to ensure the system and directories are ready for the Tungsten services to be installed and configured.

There are a number of key steps to the configuration process:

  • Creating a user environment for the Tungsten service

  • Creating the SSH authorization for the user on each host

  • Configuring the directories and install locations

  • Installing necessary software and tools

  • Configuring sudo access to enable the configured user to perform administration commands

Important

The operations in the following sections must be performed on each host within your cluster. Failure to perform each step may prevent the installation and deployment of Tungsten Cluster.

B.3.1. Creating the User Environment

The tungsten user should be created with a home directory that will be used to hold the Tungsten distribution files (not the installation files), and will be used to execute and create the different Tungsten services.

For Tungsten to work correctly, the tungsten user must be able to open a large number of files/sockets for communication between the different components and processes. You can check the current limits by using ulimit:

shell> ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

The system should be configured to allow a minimum of 65535 open files. You should configure both the tungsten user and the database user with this limit by editing the /etc/security/limits.conf file:

tungsten    -    nofile    65535
mysql       -    nofile    65535

In addition, the number of running processes supported should be increased to ensure that there are no restrictions on the running processes or threads:

tungsten    -    nproc    8096
mysql       -    nproc    8096

You must logout and log back in again for the ulimit changes to take effect.
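
After logging back in, confirm that the new limits are active:

shell> ulimit -n
65535
shell> ulimit -u
8096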

You may also need to set the limit settings on the specific service if your operating system uses the systemd service management framework. To configure your file limits for the specific service:

  1. Copy the MySQL service configuration file template to the configuration directory if it does not already exist:

    shell> sudo cp /lib/systemd/system/mysql.service /etc/systemd/system/

    Note

    Please note that the filename mysql.service will vary based on multiple factors. Do check to be sure you are using the correct file. For example, in some cases the filename would be mysqld.service.

  2. Edit the proper file used above, and append to or edit the existing entry to ensure the value of infinity for the key LimitNOFILE:

    LimitNOFILE=infinity

    This configures an unlimited number of open files; you can also specify a number, for example:

    LimitNOFILE=65535
  3. Reload the systemctl daemon configuration:

    shell> sudo systemctl daemon-reload
  4. Now restart the MySQL service:

    shell> service mysql restart
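
    To confirm that the limit has been applied to the running service, check the unit property (substitute mysqld for mysql if that is your service name):

    shell> systemctl show -p LimitNOFILE mysql
    LimitNOFILE=infinity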

Warning

On Debian/Ubuntu hosts, limits are not inherited when using su/sudo. This may lead to problems when remotely starting or restarting services. To resolve this issue, uncomment the following line within /etc/pam.d/su:

session required pam_limits.so

B.3.2. Configuring Network and SSH Environment

The hostname, DNS, IP address and accessibility of this information must be consistent. For the cluster to operate successfully, each host must be identifiable and accessible to each other host, either by name or IP address.

Individual hosts within your cluster must be reachable and must conform to the following:

  • Do not use the localhost or 127.0.0.1 addresses.

  • Do not use Zeroconf (.local) addresses. These may not resolve properly or fully on some systems.

  • The server hostname (as returned by the hostname command) must match the names you use when configuring your service.

  • The hostname for each host must resolve to that host's real IP address (not 127.0.0.1). The default configuration for many Linux installations is for the hostname to resolve to the same as localhost:

    127.0.0.1 localhost
    127.0.0.1 host1
  • Each host in the cluster must be able to resolve the address for all the other hosts in the cluster. To prevent errors within the DNS system causing timeouts or bad resolution, all hosts in the cluster, in addition to the witness host, should be added to /etc/hosts:

    127.0.0.1     localhost
    192.168.1.60  host1
    192.168.1.61  host2
    192.168.1.62  host3
    192.168.1.63  host4

    In addition to explicitly adding hostnames to /etc/hosts, the name server switch file, /etc/nsswitch.conf should be updated to ensure that hosts are searched first before using DNS services. For example:

    hosts:          files dns

    Important

    Failure to add explicit hosts and change this resolution order can lead to transient DNS resolving errors triggering timeouts and failsafe switching of hosts within the cluster.

  • The IP address of each host within the cluster must resolve to the same IP address on each node. For example, if host1 resolves to 192.168.0.69 on host1, the same IP address must be returned when looking up host1 on the host host2.

To double check this, you should perform the following tests:

  1. Confirm the hostname:

    shell> uname -n

    Warning

    The hostname cannot contain underscores.

  2. Confirm the IP address:

    shell> hostname --ip-address
  3. Confirm that the hostnames of the other hosts in the cluster resolve correctly to a valid IP address. You should confirm on each host that you can identify and connect to each other host in the planned cluster:

    shell> nslookup host1
    shell> ping host1

    If the host does not resolve, either ensure that the hosts are added to the DNS service, or explicitly add the information to the /etc/hosts file.

    Warning

    If using /etc/hosts then you must ensure that the information is correct and consistent on each host, and double check using the above method that the IP address resolves correctly for every host in the cluster.

B.3.2.1. Network Ports

The following network ports should be open between specific hosts to allow communication between the different components:

Component         Source         Destination    Port         Purpose
All Services      All Nodes      All Nodes      ICMP Ping    Checking availability (default method)
                                                7            Checking availability
Database Service  Database Host  Database Host  2112         THL replication
                                                10000-10001  Replication connection listener port

If a system has a firewall enabled, in addition to enabling communication between hosts as in the table above, the localhost must allow port-to-port traffic on the loopback connection without restrictions. For example, using iptables this can be enabled using the following command rule:

shell> iptables -A INPUT -i lo -m state --state NEW -j ACCEPT

B.3.2.2. SSH Configuration

For password-less SSH to work between the different hosts in the cluster, you need to copy both the public and private keys between the hosts in the cluster. This will allow the staging server, and each host, to communicate directly with each other using the designated login.

To achieve this, on each host in the cluster:

  1. Copy the public (.ssh/id_rsa.pub) and private key (.ssh/id_rsa) from the staging server to the ~tungsten/.ssh directory.
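
    For example, from the staging server; the hostname host1 is illustrative, and the copy should be repeated for each host:

    tungsten:shell> ssh tungsten@host1 'mkdir -p ~/.ssh && chmod 700 ~/.ssh'
    tungsten:shell> scp ~/.ssh/id_rsa ~/.ssh/id_rsa.pub tungsten@host1:~/.ssh/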

  2. Add the public key to the .ssh/authorized_keys file.

    shell> cat .ssh/id_rsa.pub >> .ssh/authorized_keys
  3. Ensure that the file permissions on the .ssh directory are correct:

    shell> chmod 700 ~/.ssh
    shell> chmod 600 ~/.ssh/*

With each host configured, you should try connecting to each host from the staging server to confirm that the SSH information has been correctly configured. You can do this by connecting to the host using ssh:

tungsten:shell> ssh tungsten@host

You should be logged into the tungsten user's home directory on the host, and that directory should be writable by the tungsten user.

B.3.3. Directory Locations and Configuration

On each host within the cluster you must pick, and configure, a number of directories to be used by Tungsten Cluster™, as follows:

  • /tmp Directory

    The /tmp directory must be accessible and executable, as it is the location where some software will be extracted and executed during installation and setup. The directory must be writable by the tungsten user.

    On some systems, the /tmp filesystem is mounted as a separate filesystem and explicitly configured to be non-executable (using the noexec filesystem option). Check the output from the mount command.
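
    For example, to check whether /tmp has been mounted with noexec (the output shown is a sample):

    shell> mount | grep ' /tmp '
    /dev/mapper/vg0-tmp on /tmp type ext4 (rw,noexec,relatime)

    If noexec appears in the mount options, remount the filesystem with exec enabled before installation.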

  • Installation Directory

    Tungsten Cluster™ needs to be installed in a specific directory. The recommended solution is to use /opt/continuent. This information will be required when you configure the cluster service.

    The directory should be created, and the owner and permissions set for the configured user:

    shell> sudo mkdir /opt/continuent
    shell> sudo chown -R tungsten: /opt/continuent
    shell> sudo chmod 700 /opt/continuent
  • Home Directory

    The home directory of the tungsten user must be writable by that user.

B.3.4. Configure Software

Tungsten Cluster™ relies on the following software. Each host must use the same version of each tool.

  • rsync. Check using rsync --help.

  • Ruby 1.8.7, 1.9.3, or 2.0.0 to 2.7.0 [a]. JRuby is not supported.

  • Ruby OpenSSL Module. Check using ruby -ropenssl -e 'p "works"'.

  • Ruby Gems.

  • Ruby io-console module. Install using gem install io-console [b].

  • Ruby net-ssh module. Install using gem install net-ssh [c].

  • Ruby net-scp module. Install using gem install net-scp [d].

  • GNU tar. gtar is required for Solaris due to limitations in the native tar command.

  • Java Runtime Environment: Java SE 8 (or compatible); Java SE 11 (or compatible) is supported in 6.1.2 and higher. Java 9 and 10 have been tested and validated but certification and support will only cover Long Term releases. See note below for more detail.

  • zip. Required by tpm diag in 6.1.2 and higher.

  • lsb_release. Required by tpm diag in 6.1.2 and higher. Use sudo yum whatprovides lsb_release to find the appropriate package to install for your operating system.

  • readlink (GNU coreutils). On most platforms, this should be available. readlink is required by tpm diag.

[a] Ruby 1.9.1 and 1.9.2 are not supported; these releases remove the execute bit during installation.

[b] io-console is only needed for SSH activities, and only needed for Ruby v2.0 and greater.

[c] For Ruby 1.8.7 the minimum version of net-ssh is 2.5.2, install using gem install net-ssh -v 2.5.2

[d] For Ruby 1.8.7 the minimum version of net-scp is 1.0.4, install using gem install net-scp -v 1.0.4

These tools must be installed, running, and available to all users on each host.

To check the current version for any installed tool, login as the configured user (e.g. tungsten), and execute the command to get the latest version. For example:

  • Java

    Run java -version:

    shell> java -version
    openjdk version "1.8.0_102"
    OpenJDK Runtime Environment (build 1.8.0_102-b14)
    OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)

    Important

    See Section B.1.5, “Java Requirements” for more detail on Java requirements and known issues with certain builds.

    On certain environments, a separate tool such as alternatives (RedHat/CentOS) or update-alternatives (Debian/Ubuntu) may need to be used to switch Java versions globally or for individual users. For example, within CentOS:

    shell> alternatives --display
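
  • Ruby

    Run ruby --version; the output shown is a sample:

    shell> ruby --version
    ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]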

Important

It is recommended to switch off all automated software and operating system update procedures. These can automatically install and restart different services which may be identified as failures by Tungsten Replicator. Software and Operating System updates should be handled by following the appropriate Section 7.13, “Performing Database or OS Maintenance” procedures.

It is also recommended to install ntp or a similar time synchronization tool so that each host in the cluster has the same physical time.

B.3.5. sudo Configuration

Tungsten requires that the user you have configured to run the server has sudo credentials so that it can run and install services as root.

Within Linux environments you can do this by editing the /etc/sudoers file using visudo and adding the following lines:

## Allow tungsten to run any command
tungsten ALL=(ALL) NOPASSWD: ALL

Warning

The above syntax is applicable to most Linux environments; however, double check whether your environment uses a different syntax.

sudo can also be configured to handle only specific directories or files. For example, when using xtrabackup, or additional tools in the Tungsten toolkit, such as tungsten_provision_slave, additional commands must be added to the permitted list:

tungsten ALL=(ALL) NOPASSWD: /sbin/service, /usr/bin/innobackupex, /bin/rm, »
    /bin/mv, /bin/chown, /bin/chmod, /usr/bin/scp, /bin/tar, /usr/bin/which, »
    /etc/init.d/mysql, /usr/bin/test, /usr/bin/systemctl, »
    /opt/continuent/tungsten/tungsten-replicator/scripts/xtrabackup.sh, »
    /opt/continuent/tungsten/tools/tpm, /usr/bin/innobackupex-1.5.1, »
    /bin/cat, /bin/find, /usr/bin/whoami, /bin/sh, /bin/rmdir, /bin/mkdir, »
    /usr/bin/mysql_install_db, /usr/bin/mysqld, /usr/bin/xtrabackup

Note

On some versions of sudo, use of sudo is deliberately disabled for ssh sessions. To enable support via ssh, comment out the requirement for requiretty:

#Defaults    requiretty

B.3.6. SELinux Configuration

To determine the current state of SELinux enforcement, use the getenforce command. For example:

shell> getenforce
Disabled

To disable SELinux, use the setenforce command. For example:

shell> setenforce 0

Should your company policy enforce the use of SELinux, then you will need to configure various SELinux contexts to allow Tungsten to operate.

When SELinux is enabled, systemctl may refuse to start mysqld if the listener port or location on disk have been changed. The solution is to inform SELinux about any changed or additional resources.

Tungsten best practice is to change the default MySQL port from 3306 to 13306 so that requesting clients do not accidentally connect directly to the database without being routed by the Connector.

If using a non-standard port for MySQL and SELinux is enabled, you must also change the port context, for example:

shell> semanage port -a -t mysqld_port_t -p tcp 13306

Ensure the file contexts are set correctly for SELinux. For example, to allow MySQL data to be stored in a non-standard location (i.e. /data):

shell> semanage fcontext -a -t etc_runtime_t /data
shell> restorecon -Rv /data/

shell> semanage fcontext -a -t mysqld_db_t "/data(/.*)?"
shell> restorecon -Rv /data/*
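
The port assignment can then be verified; sample output is shown, and the actual values will vary by system:

shell> semanage port -l | grep mysqld_port_t
mysqld_port_t                  tcp      13306, 1186, 3306, 63132-63164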

B.4. MySQL Database Setup

For replication between MySQL hosts, you must configure each MySQL database server to support the required user names and core MySQL configuration.

Important

For MySQL extraction, Tungsten Cluster must have write access to the database so that status and progress information can be recorded correctly.

Note

Native MySQL replication should not be running when you install Tungsten Cluster™. The replication service will be completely handled by Tungsten Cluster™, and the normal replication, management and monitoring techniques will not provide you with the information you need.

B.4.1. MySQL Version Support

For a full list of MySQL Versions supported, see Table B.2, “MySQL/Tungsten Version Support”

B.4.2. MySQL Configuration

Each MySQL Server should be configured identically within the system. Although binary logging must be enabled on each host, replication should not be configured, since Tungsten Replicator will be handling that process.

The configured tungsten user must be able to read the MySQL configuration file (for installation) and the binary logs. Either the tungsten user should be a member of the appropriate group (i.e. mysql), or the permissions altered accordingly.

Important

Parsing of mysqld_multi configuration files is not currently supported. To use a mysqld_multi installation, copy the relevant portion of the configuration file to a separate file to be used during installation.

To setup your MySQL servers, you need to do the following:

  • Configure your my.cnf settings. The following changes should be made to the [mysqld] section of your my.cnf file:

    • By default, MySQL is configured only to listen on the localhost address (127.0.0.1). The bind-address parameter should be checked to ensure that it is either set to a valid value, or commented to allow listening on all available network interfaces:

      #bind-address = 127.0.0.1
    • Specify the server id

      Each server must have a unique server id:

      server-id = 1

      The best practice is for all servers to have a unique ID across all clusters. For example, use a numbering scheme like 0101, 0102, 0201, 0202, where the leading two digits are the cluster number and the last two digits are the node number, which allows for 99 participating clusters with 99 nodes each.

    • Ensure that the maximum number of open files matches the configuration of the database user. This was configured earlier at 65535 files.

      open_files_limit = 65535
    • Enable binary logs

      Tungsten Replicator operates by reading the binary logs on each machine, so logging must be enabled:

      log-bin = mysql-bin
    • Set the sync_binlog parameter to 1 (one).

      Note

      In MySQL 5.7, the default value is 1.

      The MySQL sync_binlog parameter sets the frequency at which the binary log is flushed to disk. A value of zero indicates that the binary log should not be synchronized to disk, which implies that only standard operating system flushing of writes will occur. A value greater than one configures the binary log to be flushed only after sync_binlog events have been written. This can introduce a delay into writing information to the binary log, and therefore replication, but also opens the system to potential data loss if the binary log has not been flushed when a fatal system error occurs.

      Setting a value of 1 (one) will synchronize the binary log on disk after each event has been written.

      sync_binlog = 1
    • Increase MySQL protocol packet sizes

      The replicator can apply statements up to the maximum size of a single transaction, so the maximum allowed protocol packet size must be increased to support this:

      max_allowed_packet = 52m
    • Configure InnoDB as the default storage engine

      Tungsten Cluster needs to use a transaction-safe storage engine to ensure the validity of the database. The InnoDB storage engine also provides automatic recovery in the event of a failure. Using MyISAM can lead to table corruption and, in the event of a switchover or failure, an inconsistent state of the database, making it difficult to recover or restart replication effectively.

      InnoDB should therefore be the default storage engine for all tables, and any existing tables should be converted to InnoDB before deploying Tungsten Cluster.

      default-storage-engine = InnoDB
    • Configure InnoDB Settings

      Tungsten Replicator creates tables and must use InnoDB tables to store the status information for replication configuration and application:

      The MySQL option innodb_flush_log_at_trx_commit configures how InnoDB writes and confirms writes to disk during a transaction. The available values are:

      • A value of 0 (zero) provides the best performance, but it does so at the potential risk of losing information in the event of a system or hardware failure. For use with Tungsten Cluster™ the value should never be set to 0, otherwise the cluster health may be affected during a failure or failover scenario.

      • A value of 1 (one) provides the best transaction stability by ensuring that all writes to disk are flushed and committed before the transaction is returned as complete. Using this setting implies an increased disk load and so may impact the overall performance.

        When using Tungsten Cluster in a Composite Active/Active, fan-in or data-critical cluster, the value of innodb_flush_log_at_trx_commit should be set to 1. This not only ensures that the transactional data being stored in the cluster is safely written to disk, but also ensures that the metadata written by Tungsten Cluster™ describing the cluster and replication status is written to disk and therefore available in the event of a failover or recovery situation.

      • A value of 2 (two) ensures that transactions are committed to disk, but data loss may occur if the disk data is not flushed from any OS or hardware-based buffering before a hardware failure, but the disk overhead is much lower and provides higher performance.

        This setting must be used as a minimum for all Tungsten Cluster™ installations, and should be the setting for all configurations that do not require innodb_flush_log_at_trx_commit set to 1.

      At a minimum innodb_flush_log_at_trx_commit should be set to 2; a warning will be generated if this value is set to zero:

      innodb_flush_log_at_trx_commit = 2

      MySQL configuration settings can be modified on a running cluster, providing you switch your host to maintenance mode before reconfiguring and restarting MySQL Server. See Section 7.13, “Performing Database or OS Maintenance”.

    Optional configuration changes that can be made to your MySQL configuration:

    • InnoDB Flush Method

      innodb_flush_method=O_DIRECT

      The InnoDB flush method can affect the performance of writes within MySQL and the system as a whole.

      O_DIRECT is generally recommended as it eliminates double-buffering of InnoDB writes through the OS page cache. Otherwise, MySQL will be contending with Tungsten and other processes for pages there; since MySQL is quite active and has a lot of hot pages for indexes and the like, this can result in lower I/O throughput for other processes.

      Tungsten particularly depends on the page cache being stable when using parallel apply. There is one thread that scans forward over the THL pages to coordinate the channels and keep them from getting too far ahead. We then depend on those pages staying in cache for a while so that all the channels can read them; parallel apply works like a set of parallel table scans traveling, like a school of sardines, over the same part of the THL. If pages get kicked out again before all the channels see them, parallel replication will start to serialize as it has to wait for the OS to read them back in again. If they stay in memory, on the other hand, the reads on the THL are in-memory, and fast. For more information on parallel replication, see Section 5.5, “Deploying Parallel Replication”.

    • Increase InnoDB log file size

      The default InnoDB Redo Log file size is 48MB. This should be increased to a larger file size for performance and other reasons. Values of 512MB are common.

      To change the file size, read the corresponding information in the MySQL manual for configuring the file size information. Please see both "MySQL Redo Log" and "Optimizing MySQL InnoDB Redo Logging".

    • Binary Logging Format

      Tungsten Replicator works with both statement and row-based logging, and therefore also mixed-based logging. The chosen format is entirely up to your systems and preferences, and there are no differences or changes required for Tungsten Replicator to operate. For native MySQL to MySQL Primary/Replica replication, either format will work fine.

      Depending on the exact use case and deployment, different binary log formats imply different requirements and settings. Certain deployment types and environments require different settings:

      • For Composite Active/Active deployments, use row-based logging. This will help to avoid data drift where statements make fractional changes to the data in place of explicit updates.

      • Use row-based logging for heterogeneous deployments. All deployments to Oracle, MongoDB, Vertica and others rely on row-based logging.

      • Use mixed replication if warnings are raised within the MySQL log indicating that statement only is transferring possibly dangerous statements.

      • Use statement or mixed replication for transactions that update many rows; this reduces the size of the binary log and improves performance when the transactions are applied on the Replica.

      • Use row replication for transactions that use temporary tables. Temporary tables are replicated if statement or mixed based logging is in effect, and use of temporary tables can stop replication as the table is unavailable between transactions. Using row-based logging also prevents these tables entering the binary log, which means they do not clog and delay replication.

      The configuration of the MySQL server can be permanently changed to use an explicit replication format by modifying the configuration in the configuration file:

      binlog-format = row

      Note

      In MySQL 5.7, the default format is ROW.

      For temporary changes during execution of explicit statements, the binlog format can be changed by executing the following statement:

      mysql> SET binlog_format = ROW;
    • innodb_stats_on_metadata=0

      Although optional, we would highly recommend setting this property as it has been shown to improve performance by preventing statistics updates every time the information_schema is queried.

    You must restart MySQL after any changes have been made.
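
    As a recap, a minimal [mysqld] section covering the required settings above might look like the following sketch; the server-id and log paths are examples and must be adjusted per host:

    [mysqld]
    server-id = 1
    log-bin = mysql-bin
    sync_binlog = 1
    max_allowed_packet = 52m
    open_files_limit = 65535
    default-storage-engine = InnoDB
    innodb_flush_log_at_trx_commit = 2
    binlog-format = row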

  • Ensure the tungsten user can access the MySQL binary logs by either opening up the directory permissions, or adding the tungsten user to the group owner for the directory.
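
    For example, assuming the binary logs are stored in /var/lib/mysql (the path varies by distribution and configuration), the tungsten user can be added to the mysql group and the logs checked for group readability:

    shell> sudo usermod -a -G mysql tungsten
    shell> ls -l /var/lib/mysql/mysql-bin.*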

B.4.3. MySQL Configuration for Active/Active Deployments

If you are inserting to the same table at the same time at two or more different sites, and using bi-directional or active/active replication, then special care must be taken to avoid primary key conflicts. Either the auto-increment keys on each site need to be offset so they do not conflict, or the application needs to be able to generate unique keys taking multiple sites into account.

Important

The following configuration is required if your application is relying upon the MySQL-native auto-increment primary key feature:

Use the auto-increment-increment and auto-increment-offset variables to affect the way that MySQL generates the next value in an auto-increment field.

For example, edit my.cnf on all servers:

# for all servers at site 1
auto-increment-increment = 10
auto-increment-offset = 1

# for all servers at site 2
auto-increment-increment = 10
auto-increment-offset = 2

# for all servers at site 3
auto-increment-increment = 10
auto-increment-offset = 3

Important

Restart MySQL on all servers.
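
After the restart, the configured values can be confirmed on each server; the offset shown will differ per site:

mysql> SHOW VARIABLES LIKE 'auto_increment%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| auto_increment_increment | 10    |
| auto_increment_offset    | 1     |
+--------------------------+-------+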

B.4.4. MySQL Configuration for Heterogeneous Deployments

The following are required for replication to heterogeneous targets to ensure that MySQL has been configured and generating row change information correctly:

  • MySQL must be using row-based replication for information to be replicated to heterogeneous targets. For the best results, you should change the global binary log format, ideally in the configuration file (my.cnf):

    binlog-format = row

    Alternatively, the global binlog format can be changed by executing the following statement:

    mysql> SET GLOBAL binlog_format = ROW;

    For MySQL 5.6.2 and later, you must enable full row log images:

    binlog-row-image = full

    This information will be forgotten when the MySQL server is restarted; placing the configuration in the my.cnf file will ensure this option is permanently enabled.

  • Table format should be updated to UTF8 by updating the MySQL configuration (my.cnf):

    character-set-server=utf8
    collation-server=utf8_general_ci

    Tables must also be configured as UTF8 tables, and existing tables should be updated to UTF8 support before they are replicated to prevent character set corruption issues.

  • To prevent timezone configuration storing zone adjusted values and exporting this information to the binary log and PostgreSQL, fix the timezone configuration to use UTC within the configuration file (my.cnf):

    default-time-zone='+00:00'
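
    Once set, the active value can be confirmed from within MySQL:

    mysql> SELECT @@global.time_zone;
    +--------------------+
    | @@global.time_zone |
    +--------------------+
    | +00:00             |
    +--------------------+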

B.4.5. MySQL User Configuration

  • Tungsten User Login

    It is possible to use users with a lower-privilege level and without as many rights. For more information, see Section B.4.6, “MySQL Unprivileged Users”.

    The tungsten user connects to the MySQL database and applies the data from the replication stream from other datasources in the dataservice. The user must therefore be able to execute any SQL statement on the server, including grants for other users. The user must have the following privileges in addition to privileges for creating, updating and deleting DDL and data within the database:

    • SUPER privilege is required so that the user can perform all administrative operations including setting global variables.

    • GRANT OPTION privilege is required so that users and grants can be updated.

    To create a user with suitable privileges:

    mysql> CREATE USER tungsten@'%' IDENTIFIED BY 'password';
    mysql> GRANT ALL ON *.* TO tungsten@'%'  WITH GRANT OPTION;

    The connection will be made from the host to the local MySQL server. You may also need to create an explicit entry for this connection. For example, on the host host1, create the user with an explicit host reference:

    mysql> CREATE USER tungsten@'host1' IDENTIFIED BY 'password';
    mysql> GRANT ALL ON *.* TO tungsten@'host1'  WITH GRANT OPTION;

    The above commands enable logins from any host using the user name/password combination. If you want to limit the configuration to only include the hosts within your cluster you must create and grant individual user/host combinations:

    mysql> CREATE USER tungsten@'client1' IDENTIFIED BY 'password';
    mysql> GRANT ALL ON *.* TO tungsten@'client1'  WITH GRANT OPTION;

    Note

    If you later change the cluster configuration and add more hosts, you will need to update this configuration with each new host in the cluster.

  • If you configure the connector to run in Proxy mode, and you issue the SHOW SLAVE STATUS command, then any user executing this statement will require the SELECT privilege on the tracking schema table trep_commit_seqno. The following DDL can be used as an example:

    GRANT SELECT ON tungsten_<servicename>.trep_commit_seqno TO '<user>'@'<host>';

    This will need to be executed after installation, following the initial creation of the tracking schema and tables.

B.4.6. MySQL Unprivileged Users

By default, the tungsten user needs to be given SUPER privileges within MySQL so that the user can apply, create and access all the tables and data within the MySQL database. In some situations, this level of access is not available within the MySQL environment, for example, when using a server that is heavily secured, or Amazon's RDS service.

For this situation, the Tungsten Cluster can be configured to use an 'unprivileged' user configuration. This configuration does not require the SUPER privilege, but instead needs explicit privileges on the schema created by Tungsten Cluster, and on the schemas that it will update when applying events.

The capability can be enabled by using the following two options and behaviors:

  • --privileged-master=false

    When privileged_master is disabled:

    • A Primary replicator will not attempt to suppress binlog writes during operations.

    • A Primary replicator will not issue a FLUSH LOGS command when the replicator starts.

    • The current replicator position is not updated within the trep_commit_seqno table.

    The tungsten user that connects to the database must be configured to work with the MySQL service using the following grants:

    mysql> GRANT ALL ON tungsten_alpha.* to tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT SELECT ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT REPLICATION SLAVE ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> REVOKE SUPER ON *.* FROM tungsten@'%';
  • --privileged-slave=false

    When privileged_slave is disabled, the tungsten user that connects to the database must be configured to work with the MySQL service using the following grants:

    mysql> GRANT ALL ON tungsten_batch.* to tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT SELECT,INSERT,UPDATE ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT REPLICATION SLAVE ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> REVOKE SUPER ON *.* FROM tungsten@'%';

    Optionally, INSERT and UPDATE privileges can be explicitly added to the user permissions for the tables/databases that will be updated during replication.

B.5. Prerequisite Checklist

To simplify the process of preparing your hosts, the checklist below is designed to provide a quick summary of the main prerequisites required.

A PDF version of this checklist can also be downloaded here

Host Specific

  • Create OS User – Typically called tungsten
  • Set ulimit for OS User
  • Configure sudoers
  • Disable SELinux
  • Compile /etc/hosts
  • Setup SSH between hosts
  • Create directory for installation (Typically, /opt/continuent)
  • Create directory for software package if using tar bundle (Typically, /opt/continuent/software)
  • Create directory for configuration file if INI Install (/etc/tungsten)
  • Check ownership of new directories set to new OS user
  • Install Ruby
  • Install Ruby gems: net-ssh
  • Install Ruby gems: net-scp
  • Install Ruby gems: io-console
  • Install Java 8
  • Install rsync

Network Specific

  • Ensure Network ports open

Database Specific (All Topologies)

  • Ensure server-id unique amongst all nodes
  • Increase Open Files limits
  • Ensure bin-logging enabled for cluster nodes, or source replicator nodes
  • Review sync_binlog parameter
  • Increase, if required, max_allowed_packet
  • Review InnoDB settings
  • Set binlog_format to ROW (Essential for Active/Active or heterogeneous deployments)
  • Ensure auto_increment offsets adjusted for Active/Active deployments
  • Create DB user with FULL privileges and GRANT OPTION – typically called tungsten (Used by managers and replicators)

Appendix C. Troubleshooting

Table of Contents

C.1. Contacting Support
C.1.1. Support Request Procedure
C.1.2. Creating a Support Account
C.1.3. Open a Support Ticket
C.1.4. Open a Support Ticket via Email
C.1.5. Getting Updates for all Company Support Tickets
C.1.6. Support Severity Level Definitions
C.2. Support Tools
C.2.1. Generating Diagnostic Information
C.2.2. Generating Advanced Diagnostic Information
C.2.3. Using tungsten_upgrade_manager
C.3. Error/Cause/Solution
C.3.1. MySQLExtractException: unknown data type 0
C.3.2. Services requires a reset
C.3.3. OptimizeUpdatesFilter cannot filter, because column and key count is different. Make sure that it is defined before filters which remove keys (eg. PrimaryKeyFilter)
C.3.4. Unable to update the configuration of an installed directory
C.3.5. Too many open processes or files
C.3.6. There were issues configuring the sandbox MySQL server
C.3.7. Unexpected failure while extracting event
C.3.8. Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768
C.3.9. The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements correctly on the Replica.
C.3.10. Replicator runs out of memory
C.4. Known Issues
C.4.1. Triggers
C.5. Troubleshooting Timeouts
C.6. Troubleshooting Backups
C.7. Running Out of Diskspace
C.8. Troubleshooting SSH and tpm
C.9. Troubleshooting Data Differences
C.9.1. Identify Structural Differences
C.9.2. Identify Data Differences
C.10. Comparing Table Data
C.11. Troubleshooting Memory Usage

The following sections contain both general and specific help for identifying, troubleshooting and resolving problems. Key sections include:

C.1. Contacting Support

The support portal may be accessed at https://continuent.zendesk.com.

Continuent offers paid support contracts for Continuent Tungsten and Tungsten Replicator. If you are interested in purchasing support, contact our sales team at sales@continuent.com.

C.1.1. Support Request Procedure

Please use the following procedure when requesting support so we can provide prompt service. If we are unable to understand the issue due to lack of required information, it will prevent us from providing a timely response.

  1. Please provide a clear description of the problem

  2. Which environment is having the issue? (Prod, QA, Dev, etc.)

  3. What is the impact upon the affected environment?

  4. Identify the problem host or hosts and the role (Primary, Replica, etc)

  5. Provide the steps you took to see the problem in your environment

  6. Upload the resulting zip file from tpm diag, potentially run more than once on different hosts as needed. Alternatively, use the tungsten_send_diag command.

  7. Provide steps already taken and commands already run to resolve the issue

  8. Have you searched your previous support cases? https://continuent.zendesk.com.

  9. Have you checked the Continuent documentation? https://docs.continuent.com

  10. Have you checked our general knowledge base? For our Error/Cause/Solution guidance on specific issues and error messages, and how the reason can be identified and resolved, see Section C.3, “Error/Cause/Solution”.

C.1.2. Creating a Support Account

You can create a support account by logging into the support portal at https://continuent.zendesk.com. Please use your work email address so that we can recognize it and provide prompt service. If we are unable to recognize your company name it may delay our ability to provide a response.

Be sure to allow email from helpdesk@continuent.com and notifications-helpdesk@continuent.com. These addresses will be used for sending messages from Zendesk.

C.1.3. Open a Support Ticket

Login to the support portal and click on 'Submit a Request' at the top of the screen. You can access this page directly at https://continuent.zendesk.com/requests/new.

C.1.4. Open a Support Ticket via Email

Send an email to helpdesk@continuent.com from the email address that you used to create your support account. You can include a description and attachments to help us diagnose the problem.

C.1.5. Getting Updates for all Company Support Tickets

If multiple people in your organization have created support tickets, it is possible to get updates on any support tickets they open. You should see your organization name along the top of the support portal. It will be listed after the Check Your Existing Requests tab.

To see all updates for your organization, click on the organization name and then click the Subscribe link.

If you do not see your organization name listed in the headers, open a support ticket asking us to create the organization and list the people that should be included.

C.1.6. Support Severity Level Definitions

Summary of the support severity levels with initial response targets:

  • Urgent: initial response within an hour

    Represents a reproducible emergency condition (i.e. a condition that involves either data loss, data corruption, or lack of data availability) that makes the use or continued use of any one or more functions impossible. The condition requires an immediate solution. Continuent guarantees a maximum one (1) hour initial response time. Continuent will continue to work with Customer until Customer’s database is back in production. The full resolution and the full root cause analysis will be provided when available.

  • High: initial response within four (4) hours

    Represents a reproducible, non-emergency condition (i.e. a condition that does not involve either data loss, data corruption or lack of database availability) that makes the use or continued use of any one or more functions difficult, and cannot be circumvented or avoided on a temporary basis by Customer. Continuent guarantees a maximum four (4) hours initial response time.

  • Normal: initial response within one (1) business day

    Represents a reproducible, limited problem condition that may be circumvented or avoided on a temporary basis by Customer. Continuent guarantees a maximum one (1) business day initial response time.

  • Low: no guaranteed initial response interval

    Represents minor problem conditions or documentation errors that are easily avoided or circumvented by Customer. Additional request for new feature suggestions, which are defined as new functionality in existing product, are also classified as low severity level. Continuent does not guarantee any particular initial response time, or a commitment to fix in any particular time frame unless Customer engages Continuent for professional services work to create a fix or a new feature.

C.2. Support Tools

C.2.1. Generating Diagnostic Information

To aid in the diagnosis of issues, a copy of the logs and diagnostic information will help the support team to identify and trace the problem. There are two methods of providing this information:

  • Using tpm diag

    The tpm diag command will collect the logs and configuration information from the active installation and generate a Zip file with the diagnostic information for all hosts within it. The command should be executed from the staging directory. Use tpm query staging to determine this directory:

    shell> tpm query staging
    tungsten@host1:/home/tungsten/tungsten-replicator-6.1.24-6
    shell> cd /home/tungsten/tungsten-replicator-6.1.24-6
    shell> ./tools/tpm diag

    The process will create a file called tungsten-diag-2014-03-20-10-21-29.zip, with the corresponding date and time information replaced. This file should be included in the reported support issue as an attachment.

    For a staging directory installation, tpm diag will collect together all of the information from each of the configured hosts in the cluster. For an INI file based installation, tpm diag will connect to all configured hosts if ssh is available. If a warning that ssh is not available is generated, tpm diag must be run individually on each host in the cluster.

  • Manually Collecting Logs

    If tpm diag cannot be used, or fails to return all the information, the information can be collected manually:

    1. Run tpm reverse on all the hosts in the cluster:

      shell> tpm reverse
    2. Collect the logs from each host. Logs are available within the service_logs directory. This contains symbolic links to the actual log files. The original files can be included within a tar archive by using the -h option. For example:

      shell> cd /opt/continuent
      shell> tar zcfh host1-logs.tar.gz ./service_logs

      The tpm reverse and log archives can then be submitted as attachments with the support query.

C.2.2. Generating Advanced Diagnostic Information

To aid in the diagnosis of difficult issues, below are tools and procedures to assist in the data collection.

Warning

ONLY execute the below commands and procedures when requested by Continuent support staff.

  • Manager Memory Usage Script

    We have provided a script to easily tell us how much memory a given manager is consuming.

    Place the script on all of your manager hosts (i.e. into the tungsten OS user home directory).

    Note

    The script assumes that 'cctrl' is in the path. If not, then change the script to provide a full path for cctrl.

    shell> su - tungsten
    shell> vi tungsten_manager_memory
    #!/bin/bash
    # Ask the manager for its memory usage via the cctrl 'gc' command and
    # extract the last reported "used" value (in bytes)
    memval=`echo gc | cctrl | grep used | tail -1 | awk -F: '{print $2}' | tr -d ' |'`
    # Convert bytes to megabytes
    megabytes=`expr $memval / 1000000`
    # Timestamp formatted as YYYY/MM/DD HH:MM:SS
    timestamp=`date +"%F %T" | tr '-' '/'`
    echo "$timestamp | `hostname` | $megabytes MB"
    
    shell> chmod 750 tungsten_manager_memory
    shell> ./tungsten_manager_memory

    This script is ideally run from cron and the output redirected to time-stamped log files for later correlation with manager issues.
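
    For example, a crontab entry such as the following (a sketch; adjust the paths to suit your environment) runs the script every five minutes and appends the output to a log file for later correlation:

    # crontab entry (edit with: crontab -e)
    */5 * * * * /home/tungsten/tungsten_manager_memory >> /home/tungsten/manager_memory.log 2>&1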

  • Manager Thread Dump Procedure

    This procedure creates a Manager thread dump for detailed analysis.

    Run this command on manager hosts when requested by Continuent support.

    This will append the detailed thread dump information to the log file named tmsvc.log in the /opt/continuent/tungsten/tungsten-manager/log directory.

    shell> su - tungsten
    shell> manager dump
    shell> tungsten_send_diag -f /opt/continuent/tungsten/tungsten-manager/log/tmsvc.log -c {case_number}
  • Manager Heap Dump Procedure

    This procedure creates a Manager memory heap dump for detailed analysis.

    Run this command on manager hosts when requested by Continuent support.

    This will create a file named {hostname}.hprof in the directory where you run it.

    shell> su - tungsten
    shell> jmap -dump:format=b,file=`hostname`.hprof `ps aux | grep JANINO | grep -v grep | awk '{print $2}'`
    shell> tungsten_send_diag -f `hostname`.hprof -c {case_number}
  • Configuring Connector Debug Logging

    This procedure allows the Connector to be configured for debug logging.

    Perform this procedure on Connector hosts when requested by Continuent support.

    Warning

    Enabling Connector debug logging will decrease performance dramatically. Disk writes will increase as will disk space consumption. Do not use in production environments unless instructed to do so by Continuent support. In any case, run in this mode for as short a period of time as possible - just long enough to gather the needed debug information. After that is done, disable debug logging.

    To enable debug logging, edit the Connector configuration file tungsten-connector/conf/log4j.properties and turn Connector logging from INFO to DEBUG (or to TRACE for verbose logging):

    shell> su - tungsten
    shell> vi /opt/continuent/tungsten/tungsten-connector/conf/log4j.properties
    # Enable debug for the connector only
    logger.Connector.name=com.continuent.tungsten.connector
    # WAS: logger.Connector.level=INFO
    logger.Connector.level=DEBUG
    logger.Connector.additivity=false
    shell> connector reconfigure

    To disable debug logging, edit the Connector configuration file tungsten-connector/conf/log4j.properties and revert the change from DEBUG to INFO.

C.2.3. Using tungsten_upgrade_manager

tungsten_upgrade_manager is used to correct a cctrl display bug in the Manager that causes the useSSL value shown via cctrl> ls -l to be false when it should be true after an upgrade from v6 to v7.

Warning

Only use the tungsten_upgrade_manager command when instructed to do so by Continuent Support!

C.3. Error/Cause/Solution

C.3.1.  MySQLExtractException: unknown data type 0

Last Updated: 2014-04-15

Condition or Error

Replication fails to extract the data from the MySQL binary log, and the replicator will not go online again.

Causes

  • The format of DECIMAL types changed between MySQL 4.x and MySQL 5.0; however, the datatype was not automatically modified during an upgrade process. This means that tables that were created in MySQL 4.x and now exist within MySQL 5.0 using the DECIMAL data type generate an incompatible entry within the MySQL binary log. The upgrade and mysql_upgrade commands do not update the tables correctly. More detailed information on the change and issue can be located in Bug #57166.

Rectifications

  • The table definition must be manually upgraded to force the change of the columns using the older DECIMAL type. The recommended correction is to explicitly upgrade the DECIMAL columns. For example:

    mysql> ALTER TABLE faulty MODIFY COLUMN faulty_column DECIMAL;

    This should be performed on the Primary within your topology. To correct the error, you must use tpm reset-thl to regenerate the THL.
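
    To locate candidate columns before altering them, a query such as the following (a sketch using the standard information_schema views) lists all DECIMAL columns outside the system schemas:

    mysql> SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
        FROM information_schema.COLUMNS
        WHERE DATA_TYPE = 'decimal'
        AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema');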

C.3.2. Service requires a reset

Last Updated: 2016-05-18

Condition or Error

The replicator service needs to be reset, for example if your MySQL service has been reconfigured, or when resetting a data warehouse or batch loading service after a significant change to the configuration.

Causes

  • The replicator has stopped replicating effectively, or the configuration and/or schema of a source or target in a data warehouse loading solution has changed significantly. Resetting the service restarts extraction from the current point, and the target/Replica from the new Primary position; it also resets all the positions for reading and writing.

Rectifications

  • To reset a service entirely, without having to perform a re-installation, follow these steps. This will reset the THL, the source database binary log reading position, and the target THL and starting point.

    1. Take the Replica offline:

      Replica-shell> trepctl offline
    2. Take the Primary offline:

      Primary-shell> trepctl offline
    3. Use trepctl to reset the service on the Primary and Replica. You must use the service name explicitly on the command-line:

      Primary-shell> trepctl -service alpha reset -y
      Replica-shell> trepctl -service alpha reset -y
    4. Put the Replica online:

      Replica-shell> trepctl online
    5. Put the Primary online:

      Primary-shell> trepctl online

C.3.3. OptimizeUpdatesFilter cannot filter, because column and key count is different. Make sure that it is defined before filters which remove keys (eg. PrimaryKeyFilter)

Last Updated: 2014-07-28

Condition or Error

When using the optimizeupdates filter, replication stops with the error message in the output from trepctl status or when examining the log file.

Causes

  • The optimizeupdates filter works by removing indexed columns from updates that are unnecessary when a primary key exists to locate the record. If the key information has already been removed (for example, by the pkey filter), then the columns cannot be effectively compared and optimized.

Rectifications

  • If the pkey filter is required, change the order of the filters within the specified stage within the replicator so that the optimizeupdates filter is called before the pkey filter.

More Information

Section 10.4.31, “PrimaryKey Filter”

C.3.4.  Unable to update the configuration of an installed directory

Last Updated: 2013-08-07

Condition or Error

Running an update or configuration with tpm returns the error 'Unable to update the configuration of an installed directory'

Causes

  • Updates to the configuration of a running cluster must be performed from the staging directory where Tungsten Cluster was originally installed.

Rectifications

  • Change to the staging directory and perform the necessary commands with tpm. To determine the staging directory, use:

    shell> tpm query staging

    Then change to the staging directory and perform the updates:

    shell> ./tools/tpm configure ....

More Information

Chapter 2, Deployment Overview

C.3.5. Too many open processes or files

Last Updated: 2013-10-09

Condition or Error

The operating system or environment reports that the tungsten or designated Tungsten Cluster user has too many open files, processes, or both.

Causes

  • User limits for processes or files have either been exhausted, or recommended limits for user configuration have not been set.

Rectifications

  • Check the output of ulimit and check the configured file and process limits:

    shell> ulimit -a
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    file size (blocks, -f) unlimited
    max locked memory (kbytes, -l) unlimited
    max memory size (kbytes, -m) unlimited
    open files (-n) 256
    pipe size (512 bytes, -p) 1
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 709
    virtual memory (kbytes, -v) unlimited

    If the figures reported are less than the recommended settings, see Section B.3.1, “Creating the User Environment” for guidance on how these values should be changed.

More Information

Section B.3.1, “Creating the User Environment”

C.3.6. There were issues configuring the sandbox MySQL server

Last Updated: 2016-04-20

Condition or Error

  • The command tungsten_provision_thl fails when using Percona Server.

  • When running the command tungsten_provision_thl, you see the error:

    There were issues configure the sandbox MySQL server
  • MySQL Sandbox fails when using Percona Server.

  • In the $CONTINUENT_ROOT/service_logs/provision_thl.log file, you see entries similar to:

    mysqld: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
  • In the $CONTINUENT_ROOT/provision_thl.log file, you see entries similar to:

    mysql_install_db Error in my_thread_global_end(): 1 threads didn't exit

Causes

  • This issue occurs because of a problem in Percona Server tarball distributions.

    There are two issues with Percona Server tarball distributions, depending on the version you have downloaded.

    Look in the log file $CONTINUENT_ROOT/service_logs/provision_thl.log for:

    • mysqld: error while loading shared libraries: libssl.so.6
    • mysql_install_db Error in my_thread_global_end()

Rectifications

  • To resolve this issue on CentOS, install openssl by running the command:

    shell> sudo yum install openssl098e

    Alternatively, use Oracle MySQL or MariaDB, which do not experience these issues.

    Note

    VMware does not endorse or recommend any particular third party utility.

More Information

Section 8.28, “The tungsten_provision_thl Command”

C.3.7. Unexpected failure while extracting event

Last Updated: 2020-10-13

Condition or Error

Replicator (extractor) is unable to stay online and extract an event. Error logs consistently show a stack trace similar to the following:

2020/07/24 15:06:14.637 | Event extraction failed 
2020/07/24 15:06:14.637 | com.continuent.tungsten.replicator.extractor.ExtractorException: Unexpected failure 
» while extracting event myhost-db-04.qa.mydomain.local (1334) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.extractEvent(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.extract(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.ExtractorWrapper.extract(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.ExtractorWrapper.extract(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(Unknown Source) 
2020/07/24 15:06:14.637 | at java.lang.Thread.run(Thread.java:748) 
2020/07/24 15:06:14.637 | Caused by: java.lang.IndexOutOfBoundsException 
2020/07/24 15:06:14.637 | at java.io.DataInputStream.readFully(DataInputStream.java:192) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.common.io.BufferedFileDataInput.readFully(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.BinlogReader.read(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.LogEvent.readDataFromBinlog(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.LogEvent.readLogEvent(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.processFile(Unknown Source) 
2020/07/24 15:06:14.637 | ... 7 more

Causes

  • You could be hitting a MySQL bug where the binlog is over-writing itself due to periods in the log-bin my.cnf entry. See https://bugs.mysql.com/bug.php?id=75507 for more details.

    Example of my.cnf entry that may trigger this bug:

    log-bin = /data/mysql/myhost-db-04.qa.mydomain.local.com-bin

Rectifications

  • Replace the dots with hyphens and restart MySQL.

    Example of a fixed my.cnf entry:

    log-bin = /data/mysql/myhost-db-04-qa-mydomain-local-com-bin

    Adjusting the binlog pattern within MySQL may also require a configuration change to the replicator if the pattern is changed after installation.

    To do this, add the repl-datasource-log-pattern property to your configuration and issue tpm update.
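
    For example, with an INI-based installation (a sketch, assuming a service named alpha, the default /etc/tungsten/tungsten.ini location, and the corrected pattern above):

    shell> vi /etc/tungsten/tungsten.ini
    [alpha]
    ...
    repl-datasource-log-pattern=myhost-db-04-qa-mydomain-local-com-bin
    shell> tpm update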

C.3.8. Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768

Last Updated: 2016-05-18

Condition or Error

The maximum number of fragments in a single transaction has been exceeded.

Causes

  • The maximum number of fragments within a single transaction within the network protocol is limited to 32768. If there is a very large transaction that exceeds this number of fragments, the replicator can stop and be unable to continue. The total transaction size is a combination of the fragment size (default is 1,000,000 bytes, or 1MB), and this maximum number (approximately 32GB).

Rectifications

  • It is not possible to change the maximum number of fragments in a single transaction, but the size of each fragment can be increased to handle much larger single transactions. To change the fragment size, configure the replicator.extractor.dbms.transaction_frag_size parameter. For example, by doubling the size, a transaction of up to 64GB could be handled:

    replicator.extractor.dbms.transaction_frag_size=2000000

    If you change the fragment size in this way, the service on the extractor must be reset so that the transaction can be reprocessed and the binary log is parsed again. You can reset the service by using the trepctl reset command.
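
    For example (assuming a service named alpha, as used elsewhere in this appendix):

    shell> trepctl -service alpha reset -y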

C.3.9.  The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements correctly on the Replica.

Last Updated: 2013-07-17

Condition or Error

Replication fails due to an incorrect SQL mode, INVALID_DATES, being applied for a specific transaction.

Causes

  • Due to a problem with the code, the SQL_MODE variable in MySQL when set to include ALLOW_INVALID_DATES would be identified incorrectly as INVALID_DATES from the information in the binary log.

Rectifications

  • In affected versions, these statements can be bypassed by explicitly ignoring that value in the event by editing tungsten-replicator/conf/replicator.properties to include the following property line:

    replicator.applier.dbms.ignoreSessionVars=autocommit|INVALID_DATES

C.3.10. Replicator runs out of memory

Last Updated: 2016-05-18

Condition or Error

The replicator runs out of memory, triggers a stack trace indicating a memory condition, or fails to extract the transaction information from the MySQL binary log.

Causes

  • The replicator operates by extracting (or applying) an entire transaction. This means that when extracting data from the binary log, and writing that to THL, or extracting from the THL in preparation for applying to the target, the entire transaction, or an entire statement within a multi-statement transaction, must be held in memory.

    In the event of a very large transaction having to be extracted, this can cause a problem with the memory configuration. The actual configuration of how much memory is used is determined through a combination of the number of fragments, the size of the internal buffer used to store those fragments, and the overall fragment size.

Rectifications

  • Although you can increase the overall memory allocated to the replicator, changing the internal sizes used can also improve the performance and ability to extract data.

    First, try reducing the size of the buffer (replicator.global.buffer.size) used to hold the transaction fragments. The default for this value is 10; reducing it to 5 or less will ease the required memory:

    replicator.global.buffer.size=5

    Altering the size of each fragment can also help, as it reduces the memory required to hold the data before it is written to disk and sent out over the network to Replica replicators. Reducing the fragment size reduces the memory footprint; for example, halving the default of 1,000,000 bytes. The size is controlled by the replicator.extractor.dbms.transaction_frag_size parameter:

    replicator.extractor.dbms.transaction_frag_size=500000

    Note that if you change the fragment size, you may need to reset the service on the extractor so that the binary log is parsed again. You can reset the service by using the trepctl reset command.

C.4. Known Issues

C.4.1. Triggers

Tungsten Replicator does not automatically shut off triggers on Replicas. This can create problems on Replicas as the trigger will run twice. Typical symptoms are duplicate key errors, though other problems may appear.

There is no simple one-answer-fits-all solution as the behaviour of MySQL and Triggers will differ based on various conditions.

  • When using ROW Based Binary Logging, MySQL will log all data changes in the binary log, including any data changes performed as a result of a trigger firing.

  • When using MIXED Based Binary Logging...

    • if the Trigger is deemed to be non-deterministic then MySQL will behave based on the ROW Based Logging rules and log all data changes, including any data changes performed as a result of a trigger firing.

    • if the Trigger is deemed to be deterministic, then MySQL will behave based on STATEMENT Based Logging rules and ONLY log the statement issued by the client, and NOT log any changes as a result of the trigger firing.

The mixed behaviour outlined above presents challenges for Tungsten Replicator because MySQL does not flag transactions as being the result of a trigger firing or a client application. Therefore, it is not possible for the replicator to make a decision either.

This means that if you are running with MIXED Based Binary Logging enabled, there may be times when you would want the triggers on the target to fire, and times when you don't. Therefore, the recommendations are as follows:

Tungsten Clustering Deployments

  • Switch to ROW Based Binary Logging, and either

    • Implement the is_Primary() function outlined below, or

    • Use the replicate.ignore filter to ignore data changes to tables altered by Triggers (ONLY suitable if the filtered tables are solely managed by the Trigger)

Tungsten Replicator Deployments

  • If the source instance is running in ROW Based Binary Logging mode

    • Drop triggers on target. This is practical in fan-in topologies for reporting or other cases where you do not need to failover to the Replica at a later time. Optionally also implement the dropddl.js JavaScript filter (Available in Tungsten Replicator v6.1.2 onwards) to prevent CREATE/DROP TRIGGER DDL being replicated, or

    • Implement the is_Primary() function outlined below, or

    • Use the replicate.ignore filter to ignore data changes to tables altered by Triggers (ONLY suitable if the filtered tables are solely managed by the Trigger)

  • If the source instance is running in MIXED Based Binary Logging mode

    • Use the replicate.ignore filter to ignore data changes to tables altered by Triggers (ONLY suitable if the filtered tables are solely managed by the Trigger), or

    • Switch to ROW Based Binary Logging and follow recommendations above

The is_Primary() approach is simple to implement. First, create a function like the following that returns false if we are using the Tungsten user, as would be the case on a Replica.

create function is_Primary()
    returns boolean
    deterministic
    return if(substring_index(user(),'@',1) != 'tungsten',true, false); 

Next, add this to triggers that should not run on the Replica, as shown in the next example. This suppresses the trigger action to insert into table bar except on the Primary.

delimiter //
create trigger foo_insert after insert on foo
  for each row begin
    if is_Primary() then 
      insert into bar set id=NEW.id; 
    end if; 
  end;
//

As long as applications do not use the Tungsten account on the Primary, the preceding approach will be sufficient to suppress trigger operation.

Alternatively, if you are implementing is_Primary() within a clustering deployment, you could check the database read_only parameter. In a clustered deployment, the Replica databases will be in read_only mode, and the trigger could therefore be coded to fire only when the database read_only mode is OFF, as shown in the sketch below.
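
A minimal sketch of this alternative, assuming the Replica databases run with the global read_only variable enabled:

create function is_Primary()
    returns boolean
    deterministic
    -- read_only is 0 (writable) on the Primary and 1 on the Replicas
    return if(@@global.read_only = 0, true, false);

With this variant, the trigger example above works unchanged, firing only where the database is writable.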

C.5. Troubleshooting Timeouts

C.6. Troubleshooting Backups

  • Operating system command failed

    Backup directory does not exist.

    ...
    INFO | jvm 1 | 2013/05/21 09:36:47 | Process timed out: false
    INFO | jvm 1 | 2013/05/21 09:36:47 | Process exception null
    INFO | jvm 1 | 2013/05/21 09:36:47 | Process stderr: Error: »
        The directory '/opt/continuent/backups/xtrabackup' is not writeable
    ...
  • Backup Retention

    The number of backups retained is set in the backup retention field. To set it at installation time, use the --backup-retention=N option where N is the number of backups to retain.

    You can check the number of currently retained backups by looking at the replicator.properties file and searching for the following property:

    replicator.storage.agent.fs.retention=3

    The default is 3 backups retained at any given time.

C.7. Running Out of Diskspace

...
pendingError           : Event application failed: seqno=156847 »
    fragno=0 message=Unable to store event: seqno=156847
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000025:0000000024735754;0
pendingErrorSeqno      : 156847
pendingExceptionMessage: Unable to store event: seqno=156847
...

The above indicates that the THL information could not be stored on disk. To recover from this error, make space available on the disk, or move the THL files to a different device with more space, then set the replicator service online again.

For more information on moving THL files to a different disk, see Section D.1.5.3, “Moving the THL File Location”; for information on moving the backup file location, see Section D.1.1.4, “Relocating Backup Storage”.

C.8. Troubleshooting SSH and tpm

When executing tpm, ssh is used to connect and install the software on other hosts in the cluster. If this fails, and the public key information is correct, there are a number of operations and settings that can be checked. Ensure that you have followed the Section B.3.2.2, “SSH Configuration” instructions.

  • The most likely representation of this error will be when executing tpm during a deployment:

    Error:
    #####################################################################
    
    Validation failed
    #####################################################################
    #####################################################################
    Errors for host1
    #####################################################################
    ERROR>>host1>>Unable to SSH to host1 as root. (SSHLoginCheck)
    Ensure that the host is running and that you can login as root via SSH using key authentication
    tungsten-configure.log shows:
    2012-05-23T11:10:37+02:00 DEBUG>>Execute `whoami` on host1 as root
    2012-05-23T11:10:38+02:00 DEBUG>>RC: 0, Result: stdin: is not a tty

    Try running the following command:

    shell> ssh tungsten@host1 sudo whoami

    If the SSH and sudo configurations have been configured correctly, it should return root. Any other value indicates a failure to configure the prerequisites properly.

  • Check that none of the profile scripts (.profile, .bash_profile, .bashrc, etc.) contain a call to mesg n. This may fool the non-interactive ssh call; the call to this command should be changed so that it is only executed in interactive shells:

    if tty -s; then
       mesg n
    fi
  • Check that firewalls and/or antivirus software are not blocking or preventing connectivity on port 22.

    If ssh has been enabled on a non-standard port, use the --net-ssh-option=port option to specify the alternative port.
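
    For example, if SSH listens on port 2222 (a sketch; add the option to your existing tpm command line):

    --net-ssh-option=port=2222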

  • Make sure that the user specified in the --user option to tpm is allowed to connect to your cluster nodes.

C.9. Troubleshooting Data Differences

It can sometimes become necessary to identify table and data differences due to unexpected behaviour or failures. There are a number of third-party tools that can help identify and fix such differences; however, many of them assume native replication is in place. The following explains the recommended methods for troubleshooting a Tungsten environment with MySQL as both the source and target technology.

C.9.1. Identify Structural Differences

If you suspect that there are differences to a table structure, a simple method to resolve this is to compare the schema DDL.

Extract DDL on the Primary node, specifying the schema in place of {DB}:

shell> mysqldump -u root -p --no-data -h localhost --databases {DB} >Primary.sql

Repeat the same on the Replica node:

shell> mysqldump -u root -p --no-data -h localhost --databases {DB} >Replica.sql

Now, using diff, you can compare the results:

shell> diff Primary.sql Replica.sql

Using the output of diff, you can then craft the necessary SQL statements to re-align your structure.
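
For example (hypothetical column and table names), if the diff shows a column present on the Primary but missing on the Replica, the structure could be re-aligned by adding it on the Replica:

mysql> ALTER TABLE employees.departments ADD COLUMN dept_notes VARCHAR(255) DEFAULT NULL;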

C.9.2. Identify Data Differences

It is possible to use pt-table-checksum from the Percona Toolkit to identify data differences, providing you use the syntax described below for bypassing the native replication checks. First of all, it is advisable to familiarise yourself with the product by reading through the provider's own documentation here:

https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

Once you are ready, ensure you install the latest version of the Percona Toolkit on all nodes, then execute the following on the Primary node:

shell> pt-table-checksum --set-vars innodb_lock_wait_timeout=500 \
--recursion-method=none \
--ignore-databases=mysql \
--ignore-databases-regex=tungsten* \
h=localhost,u=tungsten,p=secret

On first run, this will create a database called percona, and within that database a table called checksums. The process will gather checksum information on every table in every database, excluding the mysql and tungsten-related schemas. You can now execute the following SQL statement on the Replica to identify tables with data differences:

SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (
 master_cnt <> this_cnt
 OR master_crc <> this_crc
 OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;

This SELECT will return any tables that it detects are different; it won't show you the differences, or indeed how many there are. This is just a basic check. To identify and fix the changes, you could use pt-table-sync; however, this product would by default assume native replication and also try to fix the problems for you, which is not recommended in a Tungsten environment. By using the --print switch, though, you can gather the SQL that would need to be executed to fix the mistakes. You should run this and review the output to determine whether you want to manually patch the data together, or consider using tungsten_provision_slave to reprovision a node in the case of large quantities of differences.

To use pt-table-sync, first identify the tables with differences on each Replica. In this example, the SELECT statement above identified a data difference in the departments table within the employees database on db2. Execute the pt-table-sync script on the Primary, passing in the database name, table name, and the Replica host where the difference exists:

shell> pt-table-sync --databases employees --tables departments --print h=db1,u=tungsten,p=secret,P=13306 h=db2

The first h= option should be the Primary (also the node you run the script from); the second h= option relates to the Replica where the difference exists. Executing the script outputs SQL statements that can be used to patch the data; for example, the above statement produces the following output:

UPDATE `employees`.`departments`
SET `dept_name`='Sales'
WHERE `dept_no`='d007'
LIMIT 1
/*percona-toolkit src_db:employees src_tbl:departments src_dsn:P=13306,h=db1,p=...,u=tungsten
dst_db:employees dst_tbl:departments dst_dsn:P=13306,h=db2,p=...,u=tungsten
lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:24524 user:tungsten host:db1*/;

The UPDATE statements could now be issued directly on the Replica to correct the problem.

Warning

Generally, changing data directly on a Replica is not recommended, but every environment is different. Before making any changes like this, always ensure you have a FULL backup, and consider shunning the Replica node (if in a clustered environment) before making any changes so as not to cause any potential interruption to connected clients.

C.10. Comparing Table Data

The Percona Toolkit includes a tool called pt-table-checksum that enables you to compare data on different hosts using a checksum comparison. This can be executed by running the checksum generation process on the Primary:

shell> pt-table-checksum --set-vars innodb_lock_wait_timeout=500 \
    --recursion-method=none \
    --ignore-databases=mysql \
    --ignore-databases-regex=tungsten* \
    h=localhost,u=tungsten,p=secret

Using MySQL, the following statement must then be executed to check the checksums generated on the Primary:

mysql> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks \
    FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc \
    <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;

Any differences will be reported and will need to be corrected manually.

C.11. Troubleshooting Memory Usage

Appendix D. Files, Directories, and Environment

D.1. The Tungsten Cluster Install Directory

Any Tungsten Cluster™ installation creates an installation directory that contains the software and the additional directories where active information, such as the transaction history log and backup data, is stored. A sample of the directory is shown below, and a description of the individual directories is provided in Table D.1, “Continuent Tungsten Directory Structure”.

shell> ls -al /opt/continuent
total 40
drwxr-xr-x 9 tungsten root     4096 Mar 21 18:47 .
drwxr-xr-x 3 root     root     4096 Mar 21 18:00 ..
drwxrwxr-x 2 tungsten tungsten 4096 Mar 21 18:44 backups
drwxrwxr-x 2 tungsten tungsten 4096 Mar 21 18:47 conf
drwxrwxr-x 3 tungsten tungsten 4096 Mar 21 18:44 relay
drwxrwxr-x 4 tungsten tungsten 4096 Mar 21 18:47 releases
drwxrwxr-x 2 tungsten tungsten 4096 Mar 21 18:47 service_logs
drwxrwxr-x 2 tungsten tungsten 4096 Mar 21 18:47 share
drwxrwxr-x 3 tungsten tungsten 4096 Mar 21 18:44 thl
lrwxrwxrwx 1 tungsten tungsten   62 Mar 21 18:47 tungsten -> /opt/continuent/releases/tungsten-replicator-6.1.24-6_pid31409

The directories shown in the table are relative to the installation directory; the recommended location is /opt/continuent. For example, the THL files would be located in /opt/continuent/thl.

Table D.1. Continuent Tungsten Directory Structure

Directory Description
backups Default directory for backup file storage.
conf Configuration directory with a copy of the current and past configurations.
relay Location for relay logs, if relay logs have been enabled.
releases Contains one or more active installations of the Continuent Tungsten software, referenced according to the version number and active process ID.
service_logs Logging information for the active installation.
share Active installation information, including the active JAR for the MySQL connection.
thl The Transaction History Log files, stored in a directory named after each active service.
tungsten Symbolic link to the currently active release in releases.

Advice for the contents of specific directories within the main installation directory is provided in the following sections.

D.1.1. The backups Directory

The backups directory is the default location for the data and metadata from any backup performed manually or automatically by Tungsten Cluster™. The backup data and metadata for each backup will be stored in this directory.

An example of the directory content is shown below:

shell> ls -al /opt/continuent/backups/
total 130788
drwxrwxr-x 2 tungsten tungsten      4096 Apr  4 16:09 .
drwxrwxr-x 3 tungsten tungsten      4096 Apr  4 11:51 ..
-rw-r--r-- 1 tungsten tungsten        71 Apr  4 16:09 storage.index
-rw-r--r-- 1 tungsten tungsten 133907646 Apr  4 16:09 store-0000000001-mysqldump_2013-04-04_16-08_42.sql.gz
-rw-r--r-- 1 tungsten tungsten       317 Apr  4 16:09 store-0000000001.properties

The storage.index file contains the backup file index information. The actual backup data is stored in the GZipped file. The properties of the backup file, including the tool used to create the backup and the checksum information, are located in the corresponding .properties file. Note that each backup and property file is uniquely numbered so that you can identify and restore a specific backup.

Different backup scripts and methods may place their backup information in a separate subdirectory. For example, xtrabackup stores backup data in /opt/continuent/backups/xtrabackup.

D.1.1.1. Automatically Deleting Backup Files

The Tungsten Replicator will automatically remove old backup files. This is controlled by the --repl-backup-retention setting and defaults to 3. Use the tpm update command to modify this setting. Following the successful creation of a new backup, the number of backups will be compared to the retention value. Any excess backups will be removed from the /opt/continuent/backups directory or whatever directory is configured for --repl-backup-directory.
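
For example, to retain five backups (a sketch; run from the staging directory as with other tpm commands):

shell> tpm update --repl-backup-retention=5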

The backup retention will only remove files starting with store. If you are using a backup method that creates additional information, then those files may not be fully removed until the next backup process begins. This includes xtrabackup-full, xtrabackup-incremental and any snapshot-based backup methods. You may manually clean these excess files if space is needed before the next backup process begins. If you delete information associated with an existing backup, any attempts to restore it will fail.

D.1.1.2. Manually Deleting Backup Files

If you no longer need one or more backup files, you can delete the files from the filesystem. You must delete both the SQL data, and the corresponding properties file. For example, from the following directory:

shell> ls -al /opt/continuent/backups
total 764708
drwxrwxr-x 2 tungsten tungsten      4096 Apr 16 13:57 .
drwxrwxr-x 3 tungsten tungsten      4096 Apr 16 13:54 ..
-rw-r--r-- 1 tungsten tungsten        71 Apr 16 13:56 storage.index
-rw-r--r-- 1 tungsten tungsten    517170 Apr 15 18:02 store-0000000004-mysqldump-1332463738918435527.sql
-rw-r--r-- 1 tungsten tungsten       311 Apr 15 18:02 store-0000000004.properties
-rw-r--r-- 1 tungsten tungsten    517170 Apr 15 18:06 store-0000000005-mysqldump-2284057977980000458.sql
-rw-r--r-- 1 tungsten tungsten       310 Apr 15 18:06 store-0000000005.properties
-rw-r--r-- 1 tungsten tungsten 781991444 Apr 16 13:57 store-0000000006-mysqldump-3081853249977885370.sql
-rw-r--r-- 1 tungsten tungsten       314 Apr 16 13:57 store-0000000006.properties

To delete the backup files for index 4:

shell> rm /opt/continuent/backups/store-0000000004*

See Section D.1.1.3, “Copying Backup Files” for information about additional files related to a single backup; there may be additional files associated with the backup that you will need to remove manually.

Warning

Removing a backup should only be performed if you know that the backup is safe to be removed and will not be required. If the backup data is required, copy the backup files from the backup directory before deleting the files in the backup directory to make space.

D.1.1.3. Copying Backup Files

The files created during any backup can be copied to another directory or system using any suitable means. Once the backup has been completed, the files will not be modified or updated and are therefore safe to move or actively copy to another location without fear of corrupting the backup information.

There are multiple files associated with each backup. The number of files will depend on the backup method that was used. All backups will use at least two files in the /opt/continuent/backups directory.

shell> cd /opt/continuent/backups
shell> scp store-[0]*6[\.-]* host3:$PWD/
store-0000000001-full_xtrabackup_2014-08-16_15-44_86                   100%   70     0.1KB/s   00:00
store-0000000001.properties                                            100%  314     0.3KB/s   00:00

Note

Check the ownership of files if you have trouble transferring files or restoring the backup. They should be owned by the Tungsten system user to ensure proper operation.

If the xtrabackup-full method was used, you must transfer the corresponding directory from /opt/continuent/backups/xtrabackup. In this example that would be /opt/continuent/backups/xtrabackup/full_xtrabackup_2014-08-16_15-44_86.

shell> cd /opt/continuent/backups/xtrabackup
shell> rsync -aze ssh full_xtrabackup_2014-08-16_15-44_86 host3:$PWD/

If the xtrabackup-incremental method was used, you must transfer multiple directories. In addition to the corresponding directory from /opt/continuent/backups/xtrabackup, you must transfer all xtrabackup-incremental directories since the most recent xtrabackup-full backup, and then transfer that xtrabackup-full directory. See the example below for further explanation:

shell> ls -altr /opt/continuent/backups/xtrabackup/
total 32
drwxr-xr-x 7 tungsten tungsten 4096 Oct 16 20:55 incr_xtrabackup_2014-10-16_20-55_73
drwxr-xr-x 7 tungsten tungsten 4096 Oct 17 20:55 full_xtrabackup_2014-10-17_20-55_1
drwxr-xr-x 7 tungsten tungsten 4096 Oct 18 20:55 incr_xtrabackup_2014-10-18_20-55_38
drwxr-xr-x 7 tungsten tungsten 4096 Oct 19 20:57 incr_xtrabackup_2014-10-19_20-57_76
drwxr-xr-x 7 tungsten tungsten 4096 Oct 20 20:58 full_xtrabackup_2014-10-20_20-57_41
drwxr-xr-x 8 tungsten tungsten 4096 Oct 21 20:58 .
drwxr-xr-x 7 tungsten tungsten 4096 Oct 21 20:58 incr_xtrabackup_2014-10-21_20-58_97
drwxrwxr-x 3 tungsten tungsten 4096 Oct 21 20:58 ..

In this example there are two instances of xtrabackup-full backups and four xtrabackup-incremental backups.

  • To restore either of the xtrabackup-full backups then they would be copied to the target host on their own.

  • To restore incr_xtrabackup_2014-10-21_20-58_97, it must be copied along with full_xtrabackup_2014-10-20_20-57_41.

  • To restore incr_xtrabackup_2014-10-19_20-57_76, it must be copied along with incr_xtrabackup_2014-10-18_20-55_38 and full_xtrabackup_2014-10-17_20-55_1.

D.1.1.4. Relocating Backup Storage

If the filesystem on which the main installation directory resides is running out of space and you need to increase the space available for backup files without interrupting the service, you can use symbolic links to relocate the backup information.

Note

When using an NFS mount point when backing up with xtrabackup, the command must have the necessary access rights and permissions to change the ownership of files within the mounted directory. Failure to update the permissions and ownership will cause the xtrabackup command to fail. The following settings should be made on the directory:

  • Ensure the no_root_squash option on the NFS export is not set.

  • Change the group and owner of the mount point to the tungsten user and mysql group:

    shell> chown tungsten /mnt/backups
    shell> chgrp mysql /mnt/backups

    Owner and group IDs on NFS directories must match across all the hosts using the NFS mount point. Inconsistencies in the owner and group IDs may lead to backup failures.

  • Change the permissions to permit at least owner and group modifications:

    shell> chmod 770 /mnt/backups
  • Mount the directory:

    shell> mount host1:/exports/backups /mnt/backups

The backup directory can be changed using two different methods:

D.1.1.4.1. Relocating Backup Storage using Symbolic Links

To relocate the backup directory using symbolic links:

  1. Ensure that no active backup is taking place on the current host. Your service does not need to be offline to complete this operation.

  2. Create a new directory, or attach a new filesystem and location on which the backups will be located. You can use a directory on another filesystem or connect to a SAN, NFS or other filesystem where the new directory will be located. For example:

    shell> mkdir /mnt/backupdata/continuent
  3. Optional

    Copy the existing backup directory to the new directory location. For example:

    shell> rsync -r /opt/continuent/backups/* /mnt/backupdata/continuent/
  4. Move the existing directory to a temporary location:

    shell> mv /opt/continuent/backups /opt/continuent/old-backups
  5. Create a symbolic link from the new directory to the original directory location:

    shell> ln -s /mnt/backupdata/continuent /opt/continuent/backups

The backup directory has now been moved. If you want to verify that the new backup directory is working, you can optionally run a backup and ensure that the backup process completes correctly.

D.1.1.4.2. Relocating Backup Storage using Configuration Changes

To relocate the backup directory by reconfiguration:

  1. Ensure that no active backup is taking place on the current host. Your service does not need to be offline to complete this operation.

  2. Create a new directory, or attach a new filesystem and location on which the backups will be located. You can use a directory on another filesystem or connect to a SAN, NFS or other filesystem where the new directory will be located. For example:

    shell> mkdir /mnt/backupdata/continuent
  3. Optional

    Copy the existing backup directory to the new directory location. For example:

    shell> rsync -r /opt/continuent/backups/* /mnt/backupdata/continuent/
  4. Follow the directions for tpm update to apply the --backup-directory=/mnt/backupdata/continuent setting.
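
    For example (a sketch; run from the staging directory as with other tpm commands):

    shell> tpm update --backup-directory=/mnt/backupdata/continuent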

The backup directory has now been moved. If you want to verify that the new backup directory is working, you can optionally run a backup and ensure that the backup process completes correctly.

D.1.2. The releases Directory

The releases directory contains a copy of each installed release. As new versions are installed and updated (through tpm update), a new directory is created with the corresponding version of the software.

For example, a number of releases are listed below:

shell> ll /opt/continuent/releases/
total 20
drwxr-xr-x  5 tungsten mysql 4096 May 23 16:19 ./
drwxr-xr-x  9 tungsten mysql 4096 May 23 16:19 ../
drwxr-xr-x 10 tungsten mysql 4096 May 23 16:19 tungsten-replicator-6.1.24-6_pid16184/
drwxr-xr-x 10 tungsten mysql 4096 May 23 16:19 tungsten-replicator-6.1.24-6_pid14577/
drwxr-xr-x 10 tungsten mysql 4096 May 23 16:19 tungsten-replicator-6.1.24-6_pid23747/
drwxr-xr-x 10 tungsten mysql 4096 May 23 16:19 tungsten-replicator-6.1.24-6_pid24978/

The latest release currently in use can be determined by checking the symbolic link, tungsten within the installation directory. For example:

shell> ll /opt/continuent
total 40
drwxr-xr-x 9 tungsten mysql 4096 May 23 16:19 ./
drwxr-xr-x 3 root     root  4096 Apr 29 16:09 ../
drwxr-xr-x 2 tungsten mysql 4096 May 30 13:27 backups/
drwxr-xr-x 2 tungsten mysql 4096 May 23 16:19 conf/
drwxr-xr-x 3 tungsten mysql 4096 May 10 19:09 relay/
drwxr-xr-x 5 tungsten mysql 4096 May 23 16:19 releases/
drwxr-xr-x 2 tungsten mysql 4096 May 10 19:09 service_logs/
drwxr-xr-x 2 tungsten mysql 4096 May 23 16:18 share/
drwxr-xr-x 3 tungsten mysql 4096 May 10 19:09 thl/
lrwxrwxrwx 1 tungsten mysql   63 May 23 16:19 tungsten -> /opt/continuent/releases/tungsten-replicator-6.1.24-6_pid24978/

If multiple services are running on the host, search for .pid files within the installation directory to determine which release directories are currently in use by an active service:

shell> find /opt/continuent -name "*.pid"
/opt/continuent/releases/tungsten-replicator-6.1.24-6_pid24978/tungsten-replicator/var/treplicator.pid
/opt/continuent/releases/tungsten-replicator-6.1.24-6_pid24978/tungsten-connector/var/tconnector.pid
/opt/continuent/releases/tungsten-replicator-6.1.24-6_pid24978/tungsten-manager/var/tmanager.pid

Directories within the releases directory that are no longer being used can be safely removed.
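
For example, to remove a release directory that is no longer referenced by the tungsten symbolic link or any active .pid file (directory name taken from the listing above):

shell> rm -rf /opt/continuent/releases/tungsten-replicator-6.1.24-6_pid16184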

D.1.3. The service_logs Directory

The service_logs directory contains links to the log files for the currently active release. The directory contains the following links:

  • trepsvc.log — a link to the Tungsten Replicator log.

D.1.4. The share Directory

The share directory contains information that is shared among all installed releases and instances of Tungsten Cluster. Unlike other directories, the share directory is not overwritten or replaced during installation or update using tpm. This means that the directory can be used to hold information, such as filter configurations, without the contents being removed when the installation is updated.

D.1.5. The thl Directory

The transaction history log (THL) retains a copy of the SQL statements from each Primary host, and it is the information within the THL that is transferred between hosts and applied to the database. The THL information is written to disk and stored in the thl directory:

shell> ls -al /opt/continuent/thl/alpha/
total 2291984
drwxrwxr-x 2 tungsten tungsten      4096 Apr 16 13:44 .
drwxrwxr-x 3 tungsten tungsten      4096 Apr 15 15:53 ..
-rw-r--r-- 1 tungsten tungsten         0 Apr 15 15:53 disklog.lck
-rw-r--r-- 1 tungsten tungsten 100137585 Apr 15 18:13 thl.data.0000000001
-rw-r--r-- 1 tungsten tungsten 100134069 Apr 15 18:18 thl.data.0000000002
-rw-r--r-- 1 tungsten tungsten 100859685 Apr 15 18:26 thl.data.0000000003
-rw-r--r-- 1 tungsten tungsten 100515215 Apr 15 18:28 thl.data.0000000004
-rw-r--r-- 1 tungsten tungsten 100180770 Apr 15 18:31 thl.data.0000000005
-rw-r--r-- 1 tungsten tungsten 100453094 Apr 15 18:34 thl.data.0000000006
-rw-r--r-- 1 tungsten tungsten 100379260 Apr 15 18:35 thl.data.0000000007
-rw-r--r-- 1 tungsten tungsten 100294561 Apr 16 12:21 thl.data.0000000008
-rw-r--r-- 1 tungsten tungsten 100133258 Apr 16 12:24 thl.data.0000000009
-rw-r--r-- 1 tungsten tungsten 100293278 Apr 16 12:32 thl.data.0000000010
-rw-r--r-- 1 tungsten tungsten 100819317 Apr 16 12:34 thl.data.0000000011
-rw-r--r-- 1 tungsten tungsten 100250972 Apr 16 12:35 thl.data.0000000012
-rw-r--r-- 1 tungsten tungsten 100337285 Apr 16 12:37 thl.data.0000000013
-rw-r--r-- 1 tungsten tungsten 100535387 Apr 16 12:38 thl.data.0000000014
-rw-r--r-- 1 tungsten tungsten 100378358 Apr 16 12:40 thl.data.0000000015
-rw-r--r-- 1 tungsten tungsten 100198421 Apr 16 13:32 thl.data.0000000016
-rw-r--r-- 1 tungsten tungsten 100136955 Apr 16 13:34 thl.data.0000000017
-rw-r--r-- 1 tungsten tungsten 100490927 Apr 16 13:41 thl.data.0000000018
-rw-r--r-- 1 tungsten tungsten 100684346 Apr 16 13:41 thl.data.0000000019
-rw-r--r-- 1 tungsten tungsten 100225119 Apr 16 13:42 thl.data.0000000020
-rw-r--r-- 1 tungsten tungsten 100390819 Apr 16 13:43 thl.data.0000000021
-rw-r--r-- 1 tungsten tungsten 100418115 Apr 16 13:43 thl.data.0000000022
-rw-r--r-- 1 tungsten tungsten 100388812 Apr 16 13:44 thl.data.0000000023
-rw-r--r-- 1 tungsten tungsten  38275509 Apr 16 13:47 thl.data.0000000024

THL files are created on both the Primary and Replicas within the cluster. THL data can be examined using the thl command.

The THL is written into individual files, which are, by default, no more than 1 GByte in size each. From the listing above, you can see that each file has a unique file index number. A new file is created when the file size limit is reached, and is given the next THL log file number. To determine the range of sequence numbers stored within each log file, use the thl command:

shell> thl index
LogIndexEntry thl.data.0000000001(0:106)
LogIndexEntry thl.data.0000000002(107:203)
LogIndexEntry thl.data.0000000003(204:367)
LogIndexEntry thl.data.0000000004(368:464)
LogIndexEntry thl.data.0000000005(465:561)
LogIndexEntry thl.data.0000000006(562:658)
LogIndexEntry thl.data.0000000007(659:755)
LogIndexEntry thl.data.0000000008(756:1251)
LogIndexEntry thl.data.0000000009(1252:1348)
LogIndexEntry thl.data.0000000010(1349:1511)
LogIndexEntry thl.data.0000000011(1512:1609)
LogIndexEntry thl.data.0000000012(1610:1706)
LogIndexEntry thl.data.0000000013(1707:1803)
LogIndexEntry thl.data.0000000014(1804:1900)
LogIndexEntry thl.data.0000000015(1901:1997)
LogIndexEntry thl.data.0000000016(1998:2493)
LogIndexEntry thl.data.0000000017(2494:2590)
LogIndexEntry thl.data.0000000018(2591:2754)
LogIndexEntry thl.data.0000000019(2755:2851)
LogIndexEntry thl.data.0000000020(2852:2948)
LogIndexEntry thl.data.0000000021(2949:3045)
LogIndexEntry thl.data.0000000022(3046:3142)
LogIndexEntry thl.data.0000000023(3143:3239)
LogIndexEntry thl.data.0000000024(3240:3672)

The THL files are retained for seven days by default, although this parameter is configurable. Due to the potentially large amount of space required to store the THL information, you should monitor the disk space and usage.

The purge is continuous and is based on the date the log file was written. Each time the replicator finishes the current THL log file, it checks for files that have exceeded the defined retention configuration and spawns a job within the replicator to delete files older than the retention policy. Old files are only removed when the current THL log file rotates.

D.1.5.1. Purging THL Log Information on a Replica

Warning

Purging the THL on a Replica node can potentially remove information that has not yet been applied to the database. Please check and ensure that the THL data that you are purging has been applied to the database before continuing.

The THL files can be explicitly purged to recover disk space, but you should ensure that the currently applied sequence number in the database is not purged, and that additional hosts are not reading the THL information.

To purge the logs on a Replica node:

  1. Determine the highest sequence number from the THL that you want to delete. To purge the logs up until the latest sequence number, you can use trepctl to determine the highest applied sequence number:

    shell> trepctl services
    Processing services command...
    NAME              VALUE
    ----              -----
    appliedLastSeqno: 3672
    appliedLatency  : 331.0
    role            : slave
    serviceName     : alpha
    serviceType     : local
    started         : true
    state           : ONLINE
    Finished services command...
  2. Put the replication service offline using trepctl:

    shell> trepctl -service alpha offline
  3. Use the thl command to purge the logs up to the specified transaction sequence number. You will be prompted to confirm the operation:

    shell> thl purge -high 3670
    WARNING: The purge command will break replication if you delete all events or »
       delete events that have not reached all slaves.
    Are you sure you wish to delete these events [y/N]?
    y
    Deleting events where SEQ# <=3670
    2013-04-16 14:09:42,384 [ - main] INFO  thl.THLManagerCtrl Transactions deleted
  4. Put the replication service online using trepctl:

    shell> trepctl -service alpha online

You can now check the current THL file information:

shell> thl index
LogIndexEntry thl.data.0000000024(3240:3672)

For more information on purging events using thl, see Section 8.18.4, “thl purge Command”.

D.1.5.2. Purging THL Log Information on a Primary

Warning

Purging the THL on a Primary node can potentially remove information that has not yet been applied to the Replica databases. Please check and ensure that the THL data that you are purging has been applied to the database on all Replicas before continuing.

Important

If the situation allows, it may be better to switch the Primary role to a current, up-to-date Replica, then perform the steps to purge THL from a Replica on the old Primary host using Section D.1.5.1, “Purging THL Log Information on a Replica”.

Warning

Follow the below steps with great caution! Failure to follow best practices will result in Replicas unable to apply transactions, forcing a full re-provisioning. For those steps, please see Section 7.6, “Provision or Reprovision a Replica”.

The THL files can be explicitly purged to recover disk space, but you should ensure that the currently applied sequence number in the database is not purged, and that additional hosts are not reading the THL information.

To purge the logs on a Primary node:

  1. Determine the highest sequence number from the THL that you want to delete. To purge the logs up until the latest sequence number, you can use trepctl to determine the highest applied sequence number:

    shell> trepctl services
    Processing services command...
    NAME              VALUE
    ----              -----
    appliedLastSeqno: 3675
    appliedLatency  : 0.835
    role            : master
    serviceName     : alpha
    serviceType     : local
    started         : true
    state           : ONLINE
    Finished services command...
  2. Put the replication service offline using trepctl:

    shell> trepctl -service alpha offline
  3. Use the thl command to purge the logs up to the specified transaction sequence number. You will be prompted to confirm the operation:

    shell> thl purge -high 3670
    WARNING: The purge command will break replication if you delete all events or »
       delete events that have not reached all slaves.
    Are you sure you wish to delete these events [y/N]?
    y
    Deleting events where SEQ# <=3670
    2013-04-16 14:09:42,384 [ - main] INFO  thl.THLManagerCtrl Transactions deleted
  4. Put the replication service online using trepctl:

    shell> trepctl -service alpha online

You can now check the current THL file information:

shell> thl index
LogIndexEntry thl.data.0000000024(3240:3672)

For more information on purging events using thl, see Section 8.18.4, “thl purge Command”.

D.1.5.3. Moving the THL File Location

The location of the THL directory where THL files are stored can be changed, either by using a symbolic link or by changing the configuration to point to the new directory:

D.1.5.3.1. Relocating THL Storage using Symbolic Links

In an emergency, the directory currently holding the THL information can be moved using symbolic links to relocate the files to a location with more space.

Moving the THL location requires temporarily setting the Replica offline, updating the THL location, and re-enabling the Replica back into the cluster:

  1. Put the replication service offline using trepctl:

    shell> trepctl -service alpha offline
  2. Create a new directory, or attach a new filesystem and location on which the THL content will be located. You can use a directory on another filesystem or connect to a SAN, NFS or other filesystem where the new directory will be located. For example:

    shell> mkdir /mnt/data/thl
  3. Copy the existing THL directory to the new directory location. For example:

    shell> rsync -r /opt/continuent/thl/* /mnt/data/thl/
  4. Move the existing directory to a temporary location:

    shell> mv /opt/continuent/thl /opt/continuent/old-thl
  5. Create a symbolic link from the new directory to the original directory location:

    shell> ln -s /mnt/data/thl /opt/continuent/thl
  6. Put the replication service online using trepctl:

    shell> trepctl -service alpha online
D.1.5.3.2. Relocating THL Storage using Configuration Changes

To permanently change the location, the directory currently holding the THL information can be reconfigured to point to a new directory location.

To update the location for a Replica, temporarily set the Replica offline, update the THL location, and re-enable the Replica back into the cluster:

  1. Put the replication service offline using trepctl:

    shell> trepctl -service alpha offline
  2. Create a new directory, or attach a new filesystem and location on which the THL content will be located. You can use a directory on another filesystem or connect to a SAN, NFS or other filesystem where the new directory will be located. For example:

    shell> mkdir /mnt/data/thl
  3. Copy the existing THL directory to the new directory location. For example:

    shell> rsync -r /opt/continuent/thl/* /mnt/data/thl/
  4. Change the directory location using tpm to update the configuration for a specific host:

    shell> tpm update --thl-directory=/mnt/data/thl --host=host1
  5. Put the replication service online using trepctl:

    shell> trepctl -service alpha online

D.1.5.4. Changing the THL Retention Times

THL files are retained for seven days by default, but the retention period can be adjusted according to the requirements of the service. Longer retention periods keep the THL information available for longer, at the cost of increased disk space usage; shorter periods reduce disk space usage, but also reduce the amount of log data available.

Note

The files are automatically managed by Tungsten Cluster. Old THL files are deleted only when new data is written to the current files. If there has been no THL activity, the log files remain until new THL information is written.

Use the tpm update command to apply the --repl-thl-log-retention setting. The replication service will be restarted on each host with updated retention configuration.
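
For example, to extend the retention period to ten days on a single host (a minimal sketch; the 10d value and host name are illustrative):

shell> tpm update --repl-thl-log-retention=10d --host=host1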

D.1.6. The tungsten Directory

shell> ls -l /opt/continuent/tungsten/
total 72
drwxr-xr-x  9 tungsten mysql  4096 May 23 16:18 bristlecone
drwxr-xr-x  6 tungsten mysql  4096 May 23 16:18 cluster-home
drwxr-xr-x  4 tungsten mysql  4096 May 23 16:18 cookbook
-rw-r--r--  1 tungsten mysql   681 May 23 16:18 INSTALL
-rw-r--r--  1 tungsten mysql 19974 May 23 16:18 README.LICENSES
drwxr-xr-x  3 tungsten mysql  4096 May 23 16:18 tools
-rw-r--r--  1 tungsten mysql 19724 May 23 16:18 tungsten.cfg
drwxr-xr-x 11 tungsten mysql  4096 May 23 16:18 tungsten-replicator

Table D.2. Continuent Tungsten tungsten Sub-Directory Structure

Directory Description
bristlecone Contains the bristlecone load-testing tools.
cluster-home Home directory for the main tools, configuration and libraries of the Tungsten Cluster installation.
cookbook Cookbook installation and testing tools.
INSTALL Text file describing the basic installation process for Tungsten Cluster
README.LICENSES Software license information.
tools Directory containing the tools for installing and configuring Tungsten Cluster.
tungsten-replicator Installed directory of the Tungsten Replicator installation.

D.1.6.1. The tungsten-replicator Directory

This directory holds all of the files, libraries, configuration and other information used to support the installation of Tungsten Replicator.

D.1.6.1.1. The tungsten-replicator/lib Directory

This directory holds library files specific to Tungsten Replicator. When applying patches or extending functionality specifically for Tungsten Replicator, for example when adding JDBC libraries for other databases, the JAR files can be placed into this directory.

D.1.6.1.2. The tungsten-replicator/scripts Directory

This directory contains scripts used to support Tungsten Replicator operation.

D.2. Log Files

The replicator generates its own log files. These log files are all written into their own directory within the installation directory structure. In addition, symbolic links are generated for easier access to lightweight logs more suited to general use.

For example, this is the listing of the default log directory, /opt/continuent/service_logs:

mysqldump.log -> /opt/continuent/tungsten/tungsten-replicator/log/mysqldump.log
replicator-user.log -> /opt/continuent/tungsten/tungsten-replicator/log/replicator-user.log
trepsvc.log -> /opt/continuent/tungsten/tungsten-replicator/log/trepsvc.log
xtrabackup.log -> /opt/continuent/tungsten/tungsten-replicator/log/xtrabackup.log

As you can see, each entry is a symbolic link to the corresponding log file, including the user-level log and the more detailed trepsvc.log, along with logs for backup tools, if they exist.
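
For example, to follow the detailed replicator log in real time, assuming the default installation directory:

shell> tail -f /opt/continuent/service_logs/trepsvc.log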

D.3. Environment Variables

  • $CONTINUENT_PROFILES

    This environment variable is used by tpm as the location for storing the deploy.cfg file that is created by tpm during a tpm configure or tpm install operation. For more information, see Section 9.3, “tpm Staging Configuration”.

  • $REPLICATOR_PROFILES

    When using tpm with Tungsten Replicator, $REPLICATOR_PROFILES is used for storing the deploy.cfg file during configuration and installation. If $REPLICATOR_PROFILES does not exist, then $CONTINUENT_PROFILES is used if it exists. For more information, see Section 9.3, “tpm Staging Configuration”.

  • $CONTINUENT_ROOT

    The $CONTINUENT_ROOT variable is created by the env.sh file that is created when installing Tungsten Cluster. When defined, the variable will contain the installation directory of the corresponding Tungsten Cluster installation.

    On hosts where multiple installations have been created, the variable can be used to point to different installations.
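
    For example, assuming a default installation in /opt/continuent where the env.sh file has been generated within the share directory, sourcing the file sets the variable for the current shell:

    shell> source /opt/continuent/share/env.sh
    shell> echo $CONTINUENT_ROOT
    /opt/continuent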

Appendix E. Terminology Reference

Table of Contents

E.1. Transaction History Log (THL)
E.1.1. THL Format
E.2. Generated Field Reference
E.2.1. Terminology: Fields masterConnectUri
E.2.2. Terminology: Fields masterListenUri
E.2.3. Terminology: Fields accessFailures
E.2.4. Terminology: Fields active
E.2.5. Terminology: Fields activeSeqno
E.2.6. Terminology: Fields appliedLastEventId
E.2.7. Terminology: Fields appliedLastSeqno
E.2.8. Terminology: Fields appliedLatency
E.2.9. Terminology: Fields applier.class
E.2.10. Terminology: Fields applier.name
E.2.11. Terminology: Fields applyTime
E.2.12. Terminology: Fields autoRecoveryEnabled
E.2.13. Terminology: Fields autoRecoveryTotal
E.2.14. Terminology: Fields averageBlockSize
E.2.15. Terminology: Fields blockCommitRowCount
E.2.16. Terminology: Fields cancelled
E.2.17. Terminology: Fields channel
E.2.18. Terminology: Fields channels
E.2.19. Terminology: Fields clusterName
E.2.20. Terminology: Fields commits
E.2.21. Terminology: Fields committedMinSeqno
E.2.22. Terminology: Fields criticalPartition
E.2.23. Terminology: Fields currentBlockSize
E.2.24. Terminology: Fields currentEventId
E.2.25. Terminology: Fields currentLastEventId
E.2.26. Terminology: Fields currentLastFragno
E.2.27. Terminology: Fields currentLastSeqno
E.2.28. Terminology: Fields currentTimeMillis
E.2.29. Terminology: Fields dataServerHost
E.2.30. Terminology: Fields discardCount
E.2.31. Terminology: Fields doChecksum
E.2.32. Terminology: Fields estimatedOfflineInterval
E.2.33. Terminology: Fields eventCount
E.2.34. Terminology: Fields extensions
E.2.35. Terminology: Fields extractTime
E.2.36. Terminology: Fields extractor.class
E.2.37. Terminology: Fields extractor.name
E.2.38. Terminology: Fields filter.#.class
E.2.39. Terminology: Fields filter.#.name
E.2.40. Terminology: Fields filterTime
E.2.41. Terminology: Fields flushIntervalMillis
E.2.42. Terminology: Fields fsyncOnFlush
E.2.43. Terminology: Fields headSeqno
E.2.44. Terminology: Fields intervalGuard
E.2.45. Terminology: Fields lastCommittedBlockSize
E.2.46. Terminology: Fields lastCommittedBlockTime
E.2.47. Terminology: Fields latestEpochNumber
E.2.48. Terminology: Fields logConnectionTimeout
E.2.49. Terminology: Fields logDir
E.2.50. Terminology: Fields logFileRetainMillis
E.2.51. Terminology: Fields logFileSize
E.2.52. Terminology: Fields maxChannel
E.2.53. Terminology: Fields maxDelayInterval
E.2.54. Terminology: Fields maxOfflineInterval
E.2.55. Terminology: Fields maxSize
E.2.56. Terminology: Fields maximumStoredSeqNo
E.2.57. Terminology: Fields minimumStoredSeqNo
E.2.58. Terminology: Fields name
E.2.59. Terminology: Fields offlineRequests
E.2.60. Terminology: Fields otherTime
E.2.61. Terminology: Fields pendingError
E.2.62. Terminology: Fields pendingErrorCode
E.2.63. Terminology: Fields pendingErrorEventId
E.2.64. Terminology: Fields pendingErrorSeqno
E.2.65. Terminology: Fields pendingExceptionMessage
E.2.66. Terminology: Fields pipelineSource
E.2.67. Terminology: Fields processedMinSeqno
E.2.68. Terminology: Fields queues
E.2.69. Terminology: Fields readOnly
E.2.70. Terminology: Fields relativeLatency
E.2.71. Terminology: Fields resourcePrecedence
E.2.72. Terminology: Fields rmiPort
E.2.73. Terminology: Fields role
E.2.74. Terminology: Fields seqnoType
E.2.75. Terminology: Fields serializationCount
E.2.76. Terminology: Fields serialized
E.2.77. Terminology: Fields serviceName
E.2.78. Terminology: Fields serviceType
E.2.79. Terminology: Fields shard_id
E.2.80. Terminology: Fields simpleServiceName
E.2.81. Terminology: Fields siteName
E.2.82. Terminology: Fields sourceId
E.2.83. Terminology: Fields stage
E.2.84. Terminology: Fields started
E.2.85. Terminology: Fields state
E.2.86. Terminology: Fields stopRequested
E.2.87. Terminology: Fields store.#
E.2.88. Terminology: Fields storeClass
E.2.89. Terminology: Fields syncInterval
E.2.90. Terminology: Fields taskCount
E.2.91. Terminology: Fields taskId
E.2.92. Terminology: Fields timeInCurrentEvent
E.2.93. Terminology: Fields timeInStateSeconds
E.2.94. Terminology: Fields timeoutMillis
E.2.95. Terminology: Fields totalAssignments
E.2.96. Terminology: Fields transitioningTo
E.2.97. Terminology: Fields uptimeSeconds
E.2.98. Terminology: Fields version

Tungsten Cluster uses a range of terminology to define different parts of the product and specific areas of the output information from different commands. Some of this information is shared across different tools and systems.

This appendix includes a reference to the most common terms and terminology used across Tungsten Cluster.

E.1. Transaction History Log (THL)

The Transaction History Log (THL) stores transactional data from different data servers in a universal format that is then used to exchange and transfer the information between replicator instances. Because the THL is stored and managed independently of the data servers that it reads from and writes to, the data can be moved, exchanged, and transmuted during processing.

The THL is created by any replicator service acting as a Primary, where the information is read from the database using the native format, such as the MySQL binary log, or Oracle Change Data Capture (CDC), writing the information to the THL. Once in the THL, the THL data can be exchanged with other processes, including transmission over the network, and then applied to a destination database. Within Tungsten Replicator, this process is handled through the pipeline stages that read and write information between the THL and internal queues.

Information stored in THL is recorded in a series of event records in sequential format. The THL therefore acts as a queue of the transactions. On a replicator reading data from a database, the THL represents the queue of transactions applied on the source database. On a replicator applying that information to a database, the THL represents the list of the transactions to be written. The THL has the following properties:

  • THL is a sequential list of events

  • THL events are written to a THL file through a single thread (to enforce the sequential nature)

  • THL events can be read individually or sequentially, and multiple threads can read the same THL at the same time

  • THL events are immutable; once stored, the contents of the THL are never modified or individually deleted (although entire files may be deleted)

  • THL is written to disk without application-level buffering, so that a software failure cannot lose buffered log data; only the operating system buffers are used.

THL data is stored on disk within the thl directory of your Tungsten Replicator installation. The exact location can be configured using the logDir parameter of the THL component. A sample directory is shown below:

total 710504
-rw-r--r-- 1 tungsten tungsten         0 May  2 10:48 disklog.lck
-rw-r--r-- 1 tungsten tungsten 100042900 Jun  4 10:10 thl.data.0000000013
-rw-rw-r-- 1 tungsten tungsten 101025311 Jun  4 11:41 thl.data.0000000014
-rw-rw-r-- 1 tungsten tungsten 100441159 Jun  4 11:43 thl.data.0000000015
-rw-rw-r-- 1 tungsten tungsten 100898492 Jun  4 11:44 thl.data.0000000016
-rw-rw-r-- 1 tungsten tungsten 100305613 Jun  4 11:44 thl.data.0000000017
-rw-rw-r-- 1 tungsten tungsten 100035516 Jun  4 11:44 thl.data.0000000018
-rw-rw-r-- 1 tungsten tungsten 101690969 Jun  4 11:45 thl.data.0000000019
-rw-rw-r-- 1 tungsten tungsten  23086641 Jun  5 21:55 thl.data.0000000020

The THL files have the format thl.data.#########, and the sequence number increases for each new log file. The size of each log file is controlled by the --thl-log-file-size configuration parameter. The log files are automatically managed by Tungsten Replicator, with old files automatically removed according to the retention policy set by the --thl-log-retention configuration parameter. The files can be manually purged or moved. See Section D.1.5.1, “Purging THL Log Information on a Replica”.

The THL can be viewed and managed by using the thl command. For more information, see Section 8.18, “The thl Command”.
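
For example, to display a single transaction from the local THL by sequence number (assuming the event has not yet been purged):

shell> thl list -seqno 3583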

E.1.1. THL Format

The THL is stored on disk in a specific format that combines the SQL and row data, metadata about the environment in which the row changes and SQL changes were made, and log-specific information, including the source, database, and timestamp of the information.

A sample of the output is shown below; the information is taken from the output of the thl command:

SEQ# = 0 / FRAG# = 0 (last frag)
- TIME = 2013-03-21 18:47:39.0
- EPOCH# = 0
- EVENTID = mysql-bin.000010:0000000000000439;0
- SOURCEID = host1
- METADATA = [mysql_server_id=10;dbms_type=mysql;is_metadata=true;service=dsone;»
    shard=tungsten_firstcluster;heartbeat=MASTER_ONLINE]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0, »
    foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, »
    collation_connection = 8, collation_server = 8]
- SCHEMA = tungsten_dsone
- SQL(0) = UPDATE tungsten_dsone.heartbeat SET source_tstamp= '2013-03-21 18:47:39', salt= 1, »
    name= 'MASTER_ONLINE'  WHERE id= 1 /* ___SERVICE___ = [firstcluster] */

The sample above shows the information for the SQL executed on a MySQL server. The EVENTID shows the MySQL binary log from which the statement has been read. The MySQL server has stored the information in the binary log using STATEMENT or MIXED mode; log events written in ROW mode store the individual row differences. A summary of the THL stored format information, including both hidden values and the information included in the thl command output is provided in Table E.1, “THL Event Format”.

Table E.1. THL Event Format

Displayed Field Internal Name Data type Size Description
- record_length Integer 4 bytes Length of the full record information, including this field
- record_type Byte 1 byte Event record type identifier
- header_length Unsigned int 4 bytes Length of the header information
SEQ# seqno Unsigned long 8 bytes Log sequence number, a sequential value given to each log entry
FRAG# fragno Unsigned short 2 bytes Event fragment number. An event can consist of multiple fragments of SQL or row log data
- last_frag Byte 1 byte Indicates whether the fragment is the last fragment in the sequence
EPOCH# epoch_number Unsigned long 8 bytes Event epoch number. Used to identify log sections within the Primary THL
SOURCEID source_id UTF-8 String Variable (null terminated) Event source ID, the hostname or identity of the dataserver that generated the event
EVENTID event_id UTF-8 String Variable (null terminated) Event ID; in MySQL, for example, the binlog filename and position that contained the original event
SHARDID shard_id UTF-8 String Variable (null terminated) Shard ID to which the event belongs
TIME tstamp Unsigned long 8 bytes Time of the commit that triggered the event
FILE - String - Filename of the THL file containing the event
- data_length Unsigned int 4 bytes Length of the included event data
- event Binary Variable Serialized Java object containing the SQL or ROW data
METADATA Part of event - - Metadata about the event
TYPE Part of event - - Internal storage type of the event
OPTIONS Part of event - - Options about the event operation
SCHEMA Part of event - - Schema used in the event
SQL Part of event - - SQL statement or row data
- crc_method Byte 1 byte Method used to compute the CRC for the event.
- crc Unsigned int 4 bytes CRC of the event record (not including the CRC value)

  • SEQ# and FRAG#

    Individual events within the log are identified by a sequential SEQUENCE number. Events are further divided into individual fragments. Fragments are numbered from 0 within a given sequence number. Events are applied to the database wholesale; fragments are used to divide up the size of the statement or row information within the log file. The fragments are stored internally in memory before being applied to the database, and therefore memory usage is directly affected by the size and number of fragments held in memory.

    The sequence number as generated during this process is unique and therefore acts as a global transaction ID across a cluster. It can be used to determine whether the Replicas and Primary are in sync, and can be used to identify individual transactions within the replication stream.

  • EPOCH#

    The EPOCH value is used as a check to ensure that the logs on the Replica and the Primary match. The EPOCH is stored in the THL, and a new EPOCH is generated each time a Primary goes online. The EPOCH value is then written and stored in the THL alongside each individual event. The EPOCH acts as an additional check, beyond the sequence number, to validate the information between the Replica and the Primary. The EPOCH value is used to prevent the following situations:

    • In the event of a failover where there are events stored in the Primary log which did not make it to a Replica, the EPOCH acts as a check so that when the old Primary rejoins as a Replica, the EPOCH numbers of the Replica and the new Primary will not match. The trapped transactions can then be identified by examining the THL output.

    • When a Replica joins a Primary, the existence of the EPOCH prevents the Replica from accepting events that happen to match only the sequence number, but not the corresponding EPOCH.

    Each time a Tungsten Replicator Primary goes online, the EPOCH number is incremented. When the Replica connects, it requests the SEQUENCE and EPOCH, and the Primary confirms that the requested SEQUENCE has the requested EPOCH. If not, the request is rejected and the Replica gets a validation error:

    pendingExceptionMessage: Client handshake failure: Client response validation failed: »
        Log epoch numbers do not match: client source ID=west-db2 seqno=408129 » 
        server epoch number=408128 client epoch number=189069

    When this error occurs, the THL should be examined and compared between the Primary and Replica to determine if there really is a mismatch between the two databases. For more information, see Section 7.5, “Managing Transaction Failures”.

  • SOURCEID

    The SOURCEID is a string identifying the source of the event stored in the THL. Typically it is the hostname or host identifier.

  • EVENTID

    The EVENTID is a string identifying the source of the event information in the log. Within a MySQL installation, the EVENTID contains the binary log name and position which provided the original statement or row data.

    Note

    The event ID shown is the end of the corresponding event stored in the THL, not the beginning. When examining the binary log with mysqlbinlog for a sequence ID in the THL, you should check the EVENTID of the previous THL sequence number to determine where to start looking within the binary log.

  • TIME

    When the source information is committed to the database, that information is stored into the corresponding binary log (MySQL) or CDC (Oracle). That information is stored in the THL. The time recorded in the THL is the time the data was committed, not the time the data was recorded into the log file.

    The TIME value as stored in the THL is used to compute latency information when reading and applying data on a Replica.

  • METADATA

    Part of the binary EVENT payload stored within the event fragment, the metadata is collected and stored in the fragment based on information generated by the replicator. The information is stored as a series of key/value pairs. Examples of the information stored include:

    • MySQL server ID

    • Source database type

    • Name of the Replicator service that generated the THL

    • Any 'heartbeat' operations sent through the replicator service, including those automatically generated by the service, such as when the Primary goes online

    • The name of the shard to which the event belongs

    • Whether the contained data is safe to be applied through a block commit operation

  • TYPE

    The stored event type. The replicator has the potential to use a number of different stored formats for the THL data. The default type is com.continuent.tungsten.replicator.event.ReplDBMSEvent.

  • OPTIONS

    Part of the EVENT binary payload, the OPTIONS include information about the individual event that has been extracted from the database. These include settings such as the autocommit status, character set and other information, which is used when the information is applied to the database.

    There will be one OPTIONS block for each SQL statement stored in the event.

  • SCHEMA

    Part of the EVENT structure, the SCHEMA provides the database or schema name in which the statement or row data was applied.

  • SHARDID

    When using parallel apply, provides the generated shard ID for the event when it is applied by the parallel applier thread.

  • SQL

    For statement based events, the SQL of the statement that was recorded. Multiple individual SQL statements as part of a transaction can be contained within a single event fragment.

    For example, the MySQL statement:

    mysql> INSERT INTO user VALUES (null, 'Charles', now());
    Query OK, 1 row affected (0.01 sec)

    Stores the following into the THL:

    SEQ# = 3583 / FRAG# = 0 (last frag)
    - TIME = 2013-05-27 11:49:45.0
    - EPOCH# = 2500
    - EVENTID = mysql-bin.000007:0000000625753960;0
    - SOURCEID = host1
    - METADATA = [mysql_server_id=1687011;dbms_type=mysql;service=firstrep;shard=test]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - SQL(0) = SET INSERT_ID = 3
    - OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0, » 
        foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, » 
        collation_connection = 8, collation_server = 8]
    - SCHEMA = test
    - SQL(1) = INSERT INTO user VALUES (null, 'Charles', now()) /* ___SERVICE___ = [firstrep] */

    For row based events, the information is further defined by the individual row data, including the action type (UPDATE, INSERT or DELETE), SCHEMA, TABLE and individual ROW data. For each ROW, there may be one or more COL (column) and identifying KEY event to identify the row on which the action is to be performed.

    The same statement when recorded in ROW format:

    SEQ# = 3582 / FRAG# = 0 (last frag)
    - TIME = 2013-05-27 11:45:19.0
    - EPOCH# = 2500
    - EVENTID = mysql-bin.000007:0000000625753710;0
    - SOURCEID = host1
    - METADATA = [mysql_server_id=1687011;dbms_type=mysql;service=firstrep;shard=test]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - SQL(0) =
     - ACTION = INSERT
     - SCHEMA = test
     - TABLE = user
     - ROW# = 0
      - COL(1: ) = 2
      - COL(2: ) = Charles
      - COL(3: ) = 2013-05-27 11:45:19.0

E.2. Generated Field Reference

E.2.1. Terminology: Fields masterConnectUri
E.2.2. Terminology: Fields masterListenUri
E.2.3. Terminology: Fields accessFailures
E.2.4. Terminology: Fields active
E.2.5. Terminology: Fields activeSeqno
E.2.6. Terminology: Fields appliedLastEventId
E.2.7. Terminology: Fields appliedLastSeqno
E.2.8. Terminology: Fields appliedLatency
E.2.9. Terminology: Fields applier.class
E.2.10. Terminology: Fields applier.name
E.2.11. Terminology: Fields applyTime
E.2.12. Terminology: Fields autoRecoveryEnabled
E.2.13. Terminology: Fields autoRecoveryTotal
E.2.14. Terminology: Fields averageBlockSize
E.2.15. Terminology: Fields blockCommitRowCount
E.2.16. Terminology: Fields cancelled
E.2.17. Terminology: Fields channel
E.2.18. Terminology: Fields channels
E.2.19. Terminology: Fields clusterName
E.2.20. Terminology: Fields commits
E.2.21. Terminology: Fields committedMinSeqno
E.2.22. Terminology: Fields criticalPartition
E.2.23. Terminology: Fields currentBlockSize
E.2.24. Terminology: Fields currentEventId
E.2.25. Terminology: Fields currentLastEventId
E.2.26. Terminology: Fields currentLastFragno
E.2.27. Terminology: Fields currentLastSeqno
E.2.28. Terminology: Fields currentTimeMillis
E.2.29. Terminology: Fields dataServerHost
E.2.30. Terminology: Fields discardCount
E.2.31. Terminology: Fields doChecksum
E.2.32. Terminology: Fields estimatedOfflineInterval
E.2.33. Terminology: Fields eventCount
E.2.34. Terminology: Fields extensions
E.2.35. Terminology: Fields extractTime
E.2.36. Terminology: Fields extractor.class
E.2.37. Terminology: Fields extractor.name
E.2.38. Terminology: Fields filter.#.class
E.2.39. Terminology: Fields filter.#.name
E.2.40. Terminology: Fields filterTime
E.2.41. Terminology: Fields flushIntervalMillis
E.2.42. Terminology: Fields fsyncOnFlush
E.2.43. Terminology: Fields headSeqno
E.2.44. Terminology: Fields intervalGuard
E.2.45. Terminology: Fields lastCommittedBlockSize
E.2.46. Terminology: Fields lastCommittedBlockTime
E.2.47. Terminology: Fields latestEpochNumber
E.2.48. Terminology: Fields logConnectionTimeout
E.2.49. Terminology: Fields logDir
E.2.50. Terminology: Fields logFileRetainMillis
E.2.51. Terminology: Fields logFileSize
E.2.52. Terminology: Fields maxChannel
E.2.53. Terminology: Fields maxDelayInterval
E.2.54. Terminology: Fields maxOfflineInterval
E.2.55. Terminology: Fields maxSize
E.2.56. Terminology: Fields maximumStoredSeqNo
E.2.57. Terminology: Fields minimumStoredSeqNo
E.2.58. Terminology: Fields name
E.2.59. Terminology: Fields offlineRequests
E.2.60. Terminology: Fields otherTime
E.2.61. Terminology: Fields pendingError
E.2.62. Terminology: Fields pendingErrorCode
E.2.63. Terminology: Fields pendingErrorEventId
E.2.64. Terminology: Fields pendingErrorSeqno
E.2.65. Terminology: Fields pendingExceptionMessage
E.2.66. Terminology: Fields pipelineSource
E.2.67. Terminology: Fields processedMinSeqno
E.2.68. Terminology: Fields queues
E.2.69. Terminology: Fields readOnly
E.2.70. Terminology: Fields relativeLatency
E.2.71. Terminology: Fields resourcePrecedence
E.2.72. Terminology: Fields rmiPort
E.2.73. Terminology: Fields role
E.2.74. Terminology: Fields seqnoType
E.2.75. Terminology: Fields serializationCount
E.2.76. Terminology: Fields serialized
E.2.77. Terminology: Fields serviceName
E.2.78. Terminology: Fields serviceType
E.2.79. Terminology: Fields shard_id
E.2.80. Terminology: Fields simpleServiceName
E.2.81. Terminology: Fields siteName
E.2.82. Terminology: Fields sourceId
E.2.83. Terminology: Fields stage
E.2.84. Terminology: Fields started
E.2.85. Terminology: Fields state
E.2.86. Terminology: Fields stopRequested
E.2.87. Terminology: Fields store.#
E.2.88. Terminology: Fields storeClass
E.2.89. Terminology: Fields syncInterval
E.2.90. Terminology: Fields taskCount
E.2.91. Terminology: Fields taskId
E.2.92. Terminology: Fields timeInCurrentEvent
E.2.93. Terminology: Fields timeInStateSeconds
E.2.94. Terminology: Fields timeoutMillis
E.2.95. Terminology: Fields totalAssignments
E.2.96. Terminology: Fields transitioningTo
E.2.97. Terminology: Fields uptimeSeconds
E.2.98. Terminology: Fields version

When using any of the tools within Tungsten Cluster, status information is output using a common set of fields that describe different status information. These field names and terms are constant throughout all of the different tools. A description of each of these fields is provided below.

E.2.1. Terminology: Fields masterConnectUri

The URI being used to extract THL information. On a Primary, the information may be empty, or may contain the reference to the underlying extractor source where information is being read.

On a Replica, the URI indicates the host from which THL data is being read:

masterConnectUri : thl://host1:2112/

In a secure installation where SSL is being used to exchange data, the URI protocol will be thls:

masterConnectUri : thls://host1:2112/

E.2.2. Terminology: Fields masterListenUri

The URI on which the replicator is listening for incoming Replica requests. On a Primary, this is the URI used to distribute THL information.

masterListenUri : thls://host1:2112/

E.2.3. Terminology: Fields accessFailures

E.2.4. Terminology: Fields active

E.2.5. Terminology: Fields activeSeqno

E.2.6. Terminology: Fields appliedLastEventId

The event ID from the source database of the last corresponding event from the stage that has been applied to the database.

MySQL

When extracting from MySQL, the output from trepctl shows the MySQL binary log file and the byte position within the log where the transaction was extracted:

shell> trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : mysql-bin.000064:0000000002757461;0
...

Oracle CDC

When extracting from Oracle using the CDC method, the event ID is composed of the Oracle SCN number:

NAME VALUE
---- -----
appliedLastEventId : ora:16626156

Oracle Redo Reader

When extracting from Oracle using the Redo Reader method, the event ID is composed of a combination of Oracle SCN, transaction, and PLOG file numbers, separated by a hash symbol:

NAME VALUE
---- -----
appliedLastEventId : 8931871791244#0018.002.000196e1#LAST#8931871791237#100644

The format is:

COMMITSCN#XID#LCR#MINSCN#PLOGSEQ
  • COMMITSCN

    Last committed Oracle System Change Number (SCN).

  • XID

    Transaction ID.

  • LCR

    Last committed record number.

  • MINSCN

    Minimum stored Oracle SCN.

  • PLOGSEQ

    PLOG file sequence number.

E.2.7. Terminology: Fields appliedLastSeqno

The last sequence number for the transaction from the Tungsten stage that has been applied to the database. This indicates the last actual transaction information written into the Replica database.

appliedLastSeqno : 212

When using parallel replication, this parameter returns the minimum applied sequence number among all the channels applying data.

E.2.8. Terminology: Fields appliedLatency

The appliedLatency is the latency between the commit time of the source event and the time the last committed transaction reached the end of the corresponding pipeline within the replicator.

Within a Primary, this indicates the latency between the transaction commit time and when it was written to the THL. In a Replica, it indicates the latency between the commit time on the Primary database and when the transaction has been committed to the destination database. Clocks must be synchronized across hosts for this information to be accurate.

appliedLatency : 0.828

The latency is measured in seconds. Increasing latency may indicate that the destination database is unable to keep up with the transactions from the Primary.

In replicators that are operating with parallel apply, appliedLatency indicates the latency of the trailing channel. Because the parallel apply mechanism does not update all channels simultaneously, the figure shown may trail significantly from the actual latency.

E.2.9. Terminology: Fields applier.class

Classname of the current applier engine

E.2.10. Terminology: Fields applier.name

Name of the current applier engine

E.2.11. Terminology: Fields applyTime

E.2.12. Terminology: Fields autoRecoveryEnabled

Indicates whether autorecovery has been enabled by setting the --auto-recovery-max-attempts option. The field shows the value as either true or false accordingly.
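
For example, to enable autorecovery with up to five attempts (a sketch; the value shown is illustrative):

shell> tpm update --auto-recovery-max-attempts=5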

E.2.13. Terminology: Fields autoRecoveryTotal

A count of the number of times the replicator has used autorecovery to go back online since the replicator was started. This can be used to determine if autorecovery has been used. More details on autorecovery can be found in the trepsvc.log file.

The counter is reset when the replicator determines that the replicator has successfully gone online after an autorecovery.

E.2.14. Terminology: Fields averageBlockSize

E.2.15. Terminology: Fields blockCommitRowCount

E.2.16. Terminology: Fields cancelled

E.2.17. Terminology: Fields channel

E.2.18. Terminology: Fields channels

The number of channels being used to apply transactions to the target dataserver. In a standard replication setup there is typically only one channel. When parallel replication is in effect, there will be more than one channel used to apply transactions.

channels : 1

E.2.19. Terminology: Fields clusterName

The name of the cluster. This information is different to the service name and is used to identify the cluster, rather than the individual service information being output.

E.2.20. Terminology: Fields commits

E.2.21. Terminology: Fields committedMinSeqno

E.2.22. Terminology: Fields criticalPartition

E.2.23. Terminology: Fields currentBlockSize

E.2.24. Terminology: Fields currentEventId

Event ID of the transaction currently being processed

E.2.25. Terminology: Fields currentLastEventId

E.2.26. Terminology: Fields currentLastFragno

E.2.27. Terminology: Fields currentLastSeqno

E.2.28. Terminology: Fields currentTimeMillis

The current time on the host, in milliseconds since the epoch. This information can be used to confirm that the time on different hosts is within a suitable limit. Internally, the information is used to record the time when transactions are applied, and may therefore affect the appliedLatency figure.

E.2.29. Terminology: Fields dataServerHost

E.2.30. Terminology: Fields discardCount

E.2.31. Terminology: Fields doChecksum

E.2.32. Terminology: Fields estimatedOfflineInterval

E.2.33. Terminology: Fields eventCount

E.2.34. Terminology: Fields extensions

E.2.35. Terminology: Fields extractTime

E.2.36. Terminology: Fields extractor.class

E.2.37. Terminology: Fields extractor.name

E.2.38. Terminology: Fields filter.#.class

E.2.39. Terminology: Fields filter.#.name

E.2.40. Terminology: Fields filterTime

E.2.41. Terminology: Fields flushIntervalMillis

E.2.42. Terminology: Fields fsyncOnFlush

E.2.43. Terminology: Fields headSeqno

E.2.44. Terminology: Fields intervalGuard

E.2.45. Terminology: Fields lastCommittedBlockSize

The lastCommittedBlockSize contains the size of the last block that was committed as part of the block commit procedure. The value is only displayed on appliers and defines the number of events in the last block. By comparing this value to the configured block commit size, the commit type can be determined.

For more information, see Section 11.1, “Block Commit”.

E.2.46. Terminology: Fields lastCommittedBlockTime

The lastCommittedBlockTime contains the duration since the last committed block. The value is only displayed on appliers and defines the number of seconds since the last block was committed. By comparing this value to the configured block interval, the commit type can be determined.

For more information, see Section 11.1, “Block Commit”.

E.2.47. Terminology: Fields latestEpochNumber

E.2.48. Terminology: Fields logConnectionTimeout

E.2.49. Terminology: Fields logDir

E.2.50. Terminology: Fields logFileRetainMillis

E.2.51. Terminology: Fields logFileSize

E.2.52. Terminology: Fields maxChannel

E.2.53. Terminology: Fields maxDelayInterval

E.2.54. Terminology: Fields maxOfflineInterval

E.2.55. Terminology: Fields maxSize

E.2.56. Terminology: Fields maximumStoredSeqNo

The maximum transaction ID that has been stored locally on the machine in the THL. Because Tungsten Replicator operates in stages, it is sometimes important to compare the sequence and latency between information being read from the source into the THL, and then from the THL into the database. You can compare this value to the appliedLastSeqno, which indicates the last sequence committed to the database. The information is provided at a resolution of milliseconds.

maximumStoredSeqNo : 25
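
For example, to compare the stored and applied positions within a single status request:

shell> trepctl status | grep -E 'appliedLastSeqno|maximumStoredSeqNo'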

E.2.57. Terminology: Fields minimumStoredSeqNo

The minimum transaction ID stored locally in the THL on the host:

minimumStoredSeqNo : 0

The figure should match the lowest transaction ID as output by the thl index command. On a busy host, or one where the THL information has been purged, the figure will show the corresponding transaction ID as stored in the THL.

E.2.58. Terminology: Fields name

E.2.59. Terminology: Fields offlineRequests

Contains the specifications of one or more future offline events that have been configured for the replicator. Multiple events are separated by a semicolon:

shell> trepctl status
...
minimumStoredSeqNo : 0
offlineRequests : Offline at sequence number: 5262;Offline at time: 2014-01-01 00:00:00 EST
pendingError : NONE
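
Deferred offline requests such as these are scheduled using the trepctl offline-deferred command; for example, to create the sequence number request shown above:

shell> trepctl offline-deferred -at-seqno 5262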

E.2.60. Terminology: Fields otherTime

E.2.61. Terminology: Fields pendingError

E.2.62. Terminology: Fields pendingErrorCode

E.2.63. Terminology: Fields pendingErrorEventId

E.2.64. Terminology: Fields pendingErrorSeqno

The sequence number where the current error was identified

E.2.65. Terminology: Fields pendingExceptionMessage

The current error message that caused the replicator to go offline

E.2.66. Terminology: Fields pipelineSource

The source for data for the current pipeline. On a Primary, the pipeline source is the database that the Primary is connected to and extracting data from. Within a Replica, the pipeline source is the Primary replicator that is providing THL data.

E.2.67. Terminology: Fields processedMinSeqno

E.2.68. Terminology: Fields queues

E.2.69. Terminology: Fields readOnly

E.2.70. Terminology: Fields relativeLatency

The relativeLatency is the latency between now and the timestamp of the last event written into the local THL. This information gives an indication of how fresh the incoming THL information is. On a Primary, it indicates whether the Primary is keeping up with transactions generated on the Primary database. On a Replica, it indicates how up to date the THL read from the Primary is.

A large value can indicate that the database is not busy, that a large transaction is currently being read from the source database or from the Primary replicator, or that the replicator has stalled for some reason.

An increasing relativeLatency on the Replica may indicate that the replicator may have stalled and stopped applying changes to the dataserver.

E.2.71. Terminology: Fields resourcePrecedence

E.2.72. Terminology: Fields rmiPort

E.2.73. Terminology: Fields role

The current role of the host in the corresponding service specification. The principal roles are master and slave.

E.2.74. Terminology: Fields seqnoType

The internal class used to store the transaction ID. In MySQL replication, the sequence number is typically stored internally as a Java Long (java.lang.Long). In heterogeneous replication environments, the type used may be different to match the required information from the source database.

E.2.75. Terminology: Fields serializationCount

E.2.76. Terminology: Fields serialized

E.2.77. Terminology: Fields serviceName

The name of the configured service, as defined when the deployment was first created through tpm.

serviceName : alpha

A replicator may support multiple services. The information is output to confirm the service information being displayed.

E.2.78. Terminology: Fields serviceType

The configured service type. Where the replicator is on the same host as the database, the service is considered to be local. When reading or writing to a remote dataserver, the service is remote.

E.2.79. Terminology: Fields shard_id

E.2.80. Terminology: Fields simpleServiceName

A simplified version of the serviceName.

E.2.81. Terminology: Fields siteName

E.2.82. Terminology: Fields sourceId

E.2.83. Terminology: Fields stage

E.2.84. Terminology: Fields started

E.2.85. Terminology: Fields state

E.2.86. Terminology: Fields stopRequested

E.2.87. Terminology: Fields store.#

E.2.88. Terminology: Fields storeClass

E.2.89. Terminology: Fields syncInterval

E.2.90. Terminology: Fields taskCount

E.2.91. Terminology: Fields taskId

E.2.92. Terminology: Fields timeInCurrentEvent

Shows the time that the replicator has been processing the current event. When processing very large transactions this can be used to determine whether the replicator has stalled or is still actively extracting or applying the information.

E.2.93. Terminology: Fields timeInStateSeconds

E.2.94. Terminology: Fields timeoutMillis

E.2.95. Terminology: Fields totalAssignments

E.2.96. Terminology: Fields transitioningTo

E.2.97. Terminology: Fields uptimeSeconds

E.2.98. Terminology: Fields version

Appendix F. Internals

Tungsten Cluster includes a number of different systems and elements to provide the core services and functionality. Some of these are designed only to be customer-configured. Others should be changed only on the advice of Continuent or Continuent support. This chapter covers a range of different systems that are designated as internal features and functionality.

This chapter contains information on the following sections of Tungsten Cluster:

F.1. Extending Backup and Restore Behavior

The backup and restore system within Tungsten Cluster is handled entirely by the replicator. When a backup is initiated, the replicator on the specified datasource is asked to start the backup process.

Both backup and restore use a modular mechanism to perform the actual backup or restore operation. This can be configured to use specific backup tools or a custom script.

F.1.1. Backup Behavior

When a backup is requested, Tungsten Replicator performs a number of separate, discrete operations to complete the backup.

The backup operation performs the following steps:

  1. Tungsten Replicator identifies the filename where properties about the backup will be stored. The file is used as the primary interface between the underlying backup script and Tungsten Replicator.

  2. Tungsten Replicator executes the configured backup/restore script, supplying any configured arguments, and the location of a properties file, which the script updates with the location of the backup file created during the process.

  3. If the backup completes successfully, the file generated by the backup process is copied into the configured Tungsten Cluster directory (for example, /opt/continuent/backups).

  4. Tungsten Replicator updates the property information with a CRC value for the backup file and the standard metadata for backups, including the tool used to create the backup.

A log of the backup process is written to a file according to the configured backup method. For example, when backing up using mysqldump the log is written to the log directory as mysqldump.log. When using a custom script, the log is written to script.log.

As standard, Tungsten Replicator supports two primary backup types, mysqldump and xtrabackup. A third option is based on the incremental version of the xtrabackup tool. The use of an external backup script enables additional backup tools and methods to be supported.

To create a custom backup script, see Section F.1.3, “Writing a Custom Backup/Restore Script” for a list of requirements and samples.

F.1.2. Restore Behavior

The restore operation works in a similar manner to the backup operation. The same script is called, but supplied with the -restore command-line option.

The restore operation performs the following steps:

  1. Tungsten Replicator creates a temporary properties file, which contains the location of the backup file to be restored.

  2. Tungsten Replicator executes the configured backup/restore script in restore mode, supplying any configured arguments, and the location of the properties file.

  3. The script used during the restore process should read the supplied properties file to determine the location of the backup file.

  4. The script performs all the necessary steps to achieve the restore process, including stopping the dataserver, restoring the data, and restarting the dataserver.

  5. The replicator will remain in the OFFLINE state once the restore process has finished.

F.1.3. Writing a Custom Backup/Restore Script

The synopsis of the custom script is as follows:

SCRIPT {-backup|-restore} -properties FILE -options OPTIONS

Where:

  • -backup — indicates that the script should work in the backup mode and create a backup.

  • -restore — indicates that the script should work in the restore mode and restore a previous backup.

  • -properties — defines the name of the properties file. When called in backup mode, the properties file should be updated by the script with the location of the generated backup file. When called in restore mode, the file should be examined by the script to determine the backup file that will be used to perform the restore operation.

  • -options — specifies any unique options to the script.

The custom script must support the following:

  • The script must be capable of performing both the backup and the restore operation. Tungsten Replicator selects the operation by providing the -backup or -restore option to the script on the command-line.

  • The script must parse command-line arguments to extract the operation type, properties file and other settings.

  • Accept the name of the properties file to be used during the backup process. This is supplied on the command-line using the format:

    -properties FILENAME

    The properties file is used by Tungsten Replicator to exchange information about the backup or restore.

  • Must parse any additional options supplied on the command-line using the format:

    -options ARG1=VAL1&ARG2=VAL2
  • Must be responsible for executing whatever steps are required to create a consistent snapshot of the dataserver

  • Must place the contents of the database backup into a single file. If the backup process generates multiple files, then the contents should be packaged using tar or zip.

    The script has to determine the files that were generated during the backup process and collect them into a single file as appropriate.

  • Must update the supplied properties with the name of the backup file generated, as follows:

    file=BACKUPFILE

    If the file has not been updated with the information, or the file cannot be found, then the backup is considered to have failed.

    Once the backup process has completed, the backup file specified in the properties file will be moved to the configured backup location (for example /opt/continuent/backups).

  • Tungsten Replicator will forward all STDOUT and STDERR from the script to the log file script.log within the log directory. This file is recreated each time a backup is executed.

  • The script should have an exit (return) value of 0 for success, and 1 for failure. The script is responsible for handling any errors in the underlying backup tool or script used to perform the backup, but it must then pass the corresponding success or failure condition using the exit code.

A sample Ruby script is shown below; it creates a simple text file as the backup content, but demonstrates the core operations required of the script:

#!/usr/bin/env ruby
require "/opt/continuent/tungsten/cluster-home/lib/ruby/tungsten"
require "/opt/continuent/tungsten/tungsten-replicator/lib/ruby/backup"

class MyCustomBackupScript < TungstenBackupScript
  def backup
    TU.info("Take a backup with arg1 = #{@options[:arg1]} and myarg = #{@options[:myarg]}")
    storage_file = "/opt/continuent/backups/backup_" +
      Time.now.strftime("%Y-%m-%d_%H-%M") + "_" + rand(100).to_s()

    # Take a backup of the server and store the information to storage_file
    TU.cmd_result("echo 'my backup' > #{storage_file}")

    # Write the filename to the properties file
    TU.cmd_result("echo \"file=#{storage_file}\" > #{@options[:properties]}")
  end

  def restore
    storage_file = TU.cmd_result(". #{@options[:properties]}; echo $file")
    TU.info("Restore a backup from #{storage_file} with arg1 = #{@options[:arg1]} and myarg = #{@options[:myarg]}")
    # Process the contents of storage_file to restore into the database server
  end
end

An alternative script using Perl is provided below:

#!/usr/bin/perl 

use strict;
use warnings;
use Getopt::Long;
use IO::File;

my $argstring = join(' ',@ARGV);

my ($backup,$restore,$properties,$options) = (0,0,'','');

my $result = GetOptions("backup" => \$backup,
			"restore" => \$restore,
			"properties=s" => \$properties,
			"options=s" => \$options,
    );

if ($backup)
{
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    # localtime() returns a zero-based month value, so adjust by one
    my $backupfile = sprintf('mcbackup.%04d%02d%02d-%02d%02d%02d-%02d.dump',
			     ($year+1900),($mon+1),$mday,$hour,$min,$sec,$$);
    
    my $out = IO::File->new($backupfile,'w') or die "Couldn't open the backup file: $backupfile";
    
# Fake backup data

    print $out "Backup data!\n";

    $out->close();
    
# Update the properties file
    my $propfile = IO::File->new($properties,'w') or die "Couldn't write to the properties file";
    print $propfile "file=$backupfile\n";
    $propfile->close();
}

if ($restore)
{
    warn "Would be restoring information using $argstring\n";
}

exit 0;

F.1.4. Enabling a Custom Backup Script

To enable a custom backup script, the installation must be updated through tpm to use the script backup method. To update the configuration:

  1. Create or copy the backup script into a suitable location, for example /opt/continuent/share.

  2. Copy the script to each of the datasources within your dataservice.

  3. Update the configuration using tpm. The --repl-backup-method should be set to script, and the directory location set using the --repl-backup-script option:

    shell> ./tools/tpm update --repl-backup-method=script \
        --repl-backup-script=/opt/continuent/share/mcbackup.pl \
        --repl-backup-online=true

    The --repl-backup-online option indicates whether the backup script operates in online or offline mode. If set to false, the replicator must be in the offline state before the backup process is started.

    To pass additional arguments or options to the script, use the replicator.backup.agent.script.options property to supply a list of ampersand-separated key/value pairs, for example:

    --property=replicator.backup.agent.script.options="arg1=val1&myarg=val2"

    These are the custom parameters which are supplied to the script as the value of the -options parameter when the script is called.

Once the backup script has been enabled within the configuration it can be used when performing a backup through the standard backup or restore interface:

shell> trepctl -host host2 backup -backup script

Note

Note that the name of the backup method is script, not the actual name of the script being used.
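
Once a backup created using the script method exists, it can be restored through the standard restore interface; for example:

shell> trepctl -host host2 restore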

F.2. Character Sets in Database and Tungsten Cluster

Character sets within the databases and within the configuration for Java and the wrappers for Tungsten Cluster must match to enable the information to be extracted and viewed.

For example, if you are extracting with the UTF-8 character set, the data must be applied to the target database using the same character set. In addition, the Tungsten Replicator should be configured with a corresponding matching character set. For installations where replication is between identical database flavours (for example, MySQL and MySQL) no explicit setting should be made. For heterogeneous deployments, the character set should be set explicitly.

When installing and using Tungsten Cluster, be aware of the following aspects when using character sets:

  • When installing Tungsten Cluster, use the --java-file-encoding option to tpm to configure the character set (see the example following this list).

  • When using the thl command, the character set may need to be explicitly stated to view the content correctly:

    shell> thl list -charset utf8
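
For example, to set the encoding to UTF-8 when updating an existing installation (a sketch; UTF8 is the Java name for the encoding):

shell> tpm update --java-file-encoding=UTF8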

For more information on setting character sets within your database, see the documentation for your database.

For more information on the character set names and support within Java, see the Java documentation on supported encodings.

F.3. Understanding Replication of Date/Time Values

  • Replicator processes default to UTC internally by setting the Java VM default time zone to UTC. This default can be changed by setting the replicator.time_zone property in the replicator services.properties file, but this is not recommended other than for problem diagnosis or specialized testing.

  • Replicas store a time zone on statements and row changes extracted from MySQL.

  • Replicators use UTC as the session time zone when applying to MySQL replicas.

  • Replicators similarly default to UTC when applying transactions to data warehouses like Hadoop, Vertica, or Amazon Redshift.

  • The thl utility prints time-related data using the default GMT time zone. This can be altered using the -timezone option, as shown in the example following this list.
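
For example, to display THL timestamps in a different time zone (a sketch; the zone name shown assumes that Java time zone identifiers are accepted):

shell> thl list -timezone America/New_York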

Best Practices

We recommend the following steps to ensure successful replication of time-related data.

  • Standardize all DBMS server and host time zones to UTC. This minimizes time zone inconsistencies between applications and data stores. The recommendation is particularly important when replicating between different DBMS types, such as MySQL to Hadoop.

  • Use the default time zone settings for Tungsten Replicator. Do not change the time zones unless specifically recommended by Continuent support.

  • If you cannot standardize on UTC at least ensure that time zones are set consistently on all hosts and applications.

Arbitrary time zone settings create a number of corner cases for database management beyond replication. Standardizing on UTC helps minimize them, hence is strongly recommended.

F.4. Memory Tuning and Performance

Different areas of Tungsten Cluster use memory in different ways, according to the operation and requirements of the component. Specific information on how memory is used by the different components is available below:

F.4.1. Understanding Tungsten Replicator Memory Tuning

Replicators are implemented as Java processes, which use two types of memory: stack space, which is allocated per running thread and holds objects that are allocated within individual execution stack frames, and heap memory, which is where objects that persist across individual method calls live. Stack space is rarely a problem for Tungsten as replicators rarely run more than 200 threads and use limited recursion. The Java defaults are almost always sufficient. Heap memory on the other hand runs out if the replicator has too many transactions in memory at once. This results in the dreaded Java OutOfMemory exception, which causes the replicator to stop operating. When this happens you need to look at tuning the replicator memory size.

To understand replicator memory usage, we need to look into how replicators work internally. Replicators use a "pipeline" model of execution that streams transactions through one or more concurrently executing stages. For example, a Replica pipeline might have a stage to read transactions from the Primary and put them in the THL, a stage to read them back out of the THL into an in-memory queue, and a stage to apply those transactions to the Replica. This model ensures high performance as the stages work independently. This streaming model is quite efficient and normally permits Tungsten to transfer even exceedingly large transactions, as the replicator breaks them up into smaller pieces called transaction fragments.

The pipeline model has consequences for memory management. First of all, replicators are doing many things at once, hence need enough memory to hold all current objects. Second, the replicator works fastest if the in-memory queues between stages are large enough that they do not ever become empty. This keeps delays in upstream processing from delaying things at the end of the pipeline. Also, it allows replicators to make use of block commit. Block commit is an important performance optimization in which stages try to commit many transactions at once on Replicas to amortize the cost of commit. In block commit the end stage continues to commit transactions until it either runs out of work (i.e., the upstream queue becomes empty) or it hits the block commit limit. Larger upstream queues help keep the end stage from running out of work, hence increase efficiency.

Bearing this in mind, we can alter replicator behavior in a number of ways to make it use less memory or to handle larger amounts of traffic without getting a Java OutOfMemory error. You should look at each of these when tuning memory; a combined tpm example follows the list:

  • Property wrapper.java.memory in file wrapper.conf. This controls the amount of heap memory available to replicators. 1024 MB is the minimum setting for most replicators. Busy replicators, those that have multiple services, or replicators that use parallel apply should consider using 2048 MB instead. If you get a Java OutOfMemory exception, you should first try raising the current setting to a higher value. This is usually enough to get past most memory-related problems. You can set this at installation time as the --repl-java-mem-size parameter.

    If you set the heap memory to a very large value (e.g., over 3 GB), you should also consider enabling concurrent garbage collection. By default, Java uses mark-and-sweep garbage collection, which may result in long pauses during which network calls to the replicator may fail. Concurrent garbage collection uses more CPU cycles and reduces ongoing performance slightly, but it avoids periods during which the replicator is non-responsive. You can set this using the --repl-java-enable-concurrent-gc parameter at installation time.

  • Property replicator.global.buffer.size. This controls two things: the size of the in-memory queues within the replicator, and the block commit size. If you still have problems after increasing the heap size, try reducing this value. It reduces the number of objects simultaneously stored on the Java heap. A value of 2 is a good setting to try to get around temporary problems. This can be set at installation time as the --repl-buffer-size parameter.

  • Property replicator.stage.q-to-dbms.blockCommitRowCount in the replicator properties file. This parameter sets the block commit count in the final stage in a Replica pipeline. If you reduce the global buffer size, it is a good idea to set this to a fixed size, such as 10, to avoid reducing the block commit effect too much. Very low block commit values in this stage can cut update rates on Replicas by 50% or more in some cases. This is available at installation time as the --repl-svc-applier-buffer-size parameter.

  • Property replicator.extractor.dbms.transaction_frag_size in the replicator.properties file. This parameter controls the size of fragments for long transactions. Tungsten automatically breaks up long transactions into fragments. This parameter controls the number of bytes of binlog per transaction fragment. You can try making this value smaller to reduce overall memory usage if many transactions are simultaneously present. Normally however this value has minimal impact.
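Bringing these options together, the following is a hedged sketch of setting all of the above at configuration time using tpm from the staging directory. The service name alpha and the values shown are placeholders, not recommendations for any specific workload:

shell> ./tools/tpm configure alpha \
    --repl-java-mem-size=2048 \
    --repl-java-enable-concurrent-gc=true \
    --repl-buffer-size=10 \
    --repl-svc-applier-buffer-size=10
shell> ./tools/tpm update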

Finally, it is worth mentioning that the main cause of out-of-memory conditions in replicators is large transactions. In particular, Tungsten cannot fragment individual statements or row changes, so changes to very large column values can also result in OutOfMemory conditions. For now the best approach is to raise memory, as described above, and change your application to avoid such transactions.

F.5. Tungsten Replicator Pipelines and Stages

A pipeline (or service) acts upon data.

Pipelines consist of a variable number of stages.

Every stage's workflow consists of three (3) actions, which are:

  • Extract: the source for extraction could be the MySQL server binary logs on a Primary, or the local THL on disk on a Replica

  • Filter: any configured filters are applied here

  • Apply: the apply target can be the THL on disk on a Primary, or the database server on a Replica

Stages can be customized with filters, and filters are invoked on a per-stage basis.
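For example, the following hedged sketch attaches filters to specific stages at configuration time; the service name alpha and the filter names chosen (colnames, pkey) are purely illustrative:

shell> ./tools/tpm configure alpha \
    --repl-svc-extractor-filters=colnames \
    --repl-svc-applier-filters=pkey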

By default, there are two pipeline services defined:

  • Primary replication service, which contains two (2) stages:

    • binlog-to-q: reads information from the MySQL binary log and stores the information within an in-memory queue.

    • q-to-thl: the content of the in-memory queue is written out to the THL file on disk.

  • Replica replication service, which contains three (3) stages:

    • remote-to-thl: remote THL information is read from a Primary datasource and written to a local file on disk.

    • thl-to-q: THL information is read from the file on disk and stored in an in-memory queue.

    • q-to-dbms: data from the in-memory queue is written to the target database.
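To make the relationship between pipelines and stages concrete, the following is a simplified, hedged sketch of how the Primary pipeline above might appear in the replicator static properties file. The exact component and property names vary by release and configuration, so treat this purely as an illustration of the model:

replicator.pipelines=master,slave
# The Primary (master) pipeline chains its two stages in order
replicator.pipeline.master=binlog-to-q,q-to-thl
# Each stage declares where it extracts from and where it applies to
replicator.stage.binlog-to-q.extractor=dbms
replicator.stage.binlog-to-q.applier=queue
replicator.stage.q-to-thl.extractor=queue
replicator.stage.q-to-thl.applier=autoflush-thl-applier
# Filters are attached per stage, for example:
# replicator.stage.q-to-thl.filters=colnames,pkey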

F.6. Tungsten Cluster Schemas

Appendix G. Frequently Asked Questions (FAQ)

G.1. On a Tungsten Replicator Replica, how do I set both the local Replica THL listener port and the upstream Primary's THL listener port?
G.2. Why is one of my hosts regularly a number of seconds behind my other Replicas?
G.3. Does the replicate filter (i.e. replicate.do and replicate.ignore) address both DML and DDL?
G.4. How do you change the replicator heap size after installation?

G.1.

On a Tungsten Replicator Replica, how do I set both the local Replica THL listener port and the upstream Primary's THL listener port?

You need to specify two options: thl-port to set the local Replica THL listener port, and master-thl-port to define the upstream Primary's THL listener port. If only thl-port is specified, that single value is used for both.
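For example, a hedged illustration (the service name alpha and the port numbers are placeholders):

shell> ./tools/tpm configure alpha \
    --thl-port=2112 \
    --master-thl-port=2113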

G.2.

Why is one of my hosts regularly a number of seconds behind my other Replicas?

The most likely culprit for this issue is that the system time on the machine in question differs from the other hosts. If you have ntp or a similar network time tool installed, use it to update the current time across all the hosts within your deployment:

shell> ntpdate pool.ntp.org

Once the command has been executed across all the hosts, try sending a heartbeat on the Primary and check the latency on the Replicas:

shell> trepctl heartbeat
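The latency can then be checked on each Replica; appliedLatency is one of the standard fields in the trepctl status output:

shell> trepctl status | grep appliedLatency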

G.3.

Does the replicate filter (i.e. replicate.do and replicate.ignore) address both DML and DDL?

Yes: both replicate.do and replicate.ignore act on both DML and DDL.

DDL is currently ONLY replicated for MySQL to MySQL or Oracle to Oracle topologies, or within MySQL Clusters; however, it is advisable not to use the ignore/do filters in a clustered environment where data and structural integrity is key.

With replicate.do, all DML and DDL will be replicated ONLY for any database or table listed as part of the do filter.

With replicate.ignore, all DML and DDL will be replicated except for any database or table listed as part of the ignore filter.
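For example, the following hedged sketch enables the replicate filter on the applier stage at configuration time; the service name alpha and the schema/table names are placeholders:

shell> ./tools/tpm configure alpha \
    --repl-svc-applier-filters=replicate \
    --property=replicator.filter.replicate.do=demo,demo2.table1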

G.4.

How do you change the replicator heap size after installation?

You can change the configuration by running tpm update from the staging directory, specifying the new heap size:

shell> ./tools/tpm update --host=host1 --java-mem-size=2048

Appendix H. Ecosystem Support

In addition to the core utilities provided by Tungsten Cluster, additional tools and scripts are available that augment the core code, for example by integrating with third-party monitoring systems, or by providing extra functionality designed to be used and adapted for specific needs and requirements.

Documentation and information is available for the following tools:

  • Github — a selection of tools and utilities is provided on Github to further support and expand the functionality of Tungsten Cluster during deployment, monitoring, and management.

H.1. Continuent Github Repositories

In addition to the core product releases, Continuent also supports a number of repositories on Github.

To access these repositories and use the tools and information within them, use the git command (available from git-scm.com). To copy the repository to a machine, use the clone command, specifying the repository URL:
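For example (the repository name shown here is illustrative):

shell> git clone https://github.com/continuent/continuent-tools-core.git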

Appendix I. Configuration Property Reference