Tungsten Replicator 5.4 Manual

Continuent Ltd


This manual documents Tungsten Replicator 5.4. This includes information for:

  • Tungsten Replicator

  • Tungsten Replicator for Analytics and Big Data

Build date: 2021-10-22 (3f4f503b)

Up-to-date builds of this document: Tungsten Replicator 5.4 Manual (Online), Tungsten Replicator 5.4 Manual (PDF)

Table of Contents

1. Legal Notice
2. Conventions
3. Quickstart Guide
1. Introduction
1.1. Tungsten Replicator
1.1.1. Extractor
1.1.2. Appliers
1.1.3. Transaction History Log (THL)
1.1.4. Filtering
2. Deployment Overview
2.1. Deployment Sources
2.1.1. Using the TAR/GZipped files
2.1.2. Using the RPM package files
2.2. Best Practices
2.2.1. Best Practices: Deployment
2.2.2. Best Practices: Operations
2.2.3. Best Practices: Maintenance
2.3. Common tpm Options During Deployment
2.4. Starting and Stopping Tungsten Replicator
2.5. Configuring Startup on Boot
2.6. Removing Datasources from a Deployment
2.6.1. Removing a Datasource from an Existing Deployment
2.7. Understanding Deployment Styles and Topologies
2.7.1. Tungsten Replicator Extraction Operation
2.7.2. Understanding Deployment Models
2.7.3. Understanding Deployment Topologies
2.8. Understanding Heterogeneous Deployments
2.8.1. How Heterogeneous Replication Works
3. Deploying MySQL Extractors
3.1. MySQL Replication Pre-Requisites
3.2. Deploying a Primary/Replica Topology
3.2.1. Monitoring the MySQL Extractor
3.3. Deploying an Extractor for Amazon Aurora
3.3.1. Changing Amazon RDS/Aurora Instance Configurations
3.4. Replicating Data Out of a Cluster
3.4.1. Prepare: Replicating Data Out of a Cluster
3.4.2. Deploy: Replicating Data Out of a Cluster
4. Deploying Appliers
4.1. Deploying the MySQL Applier
4.1.1. Preparing for MySQL Replication
4.1.2. Prepare Amazon RDS/Amazon Aurora
4.1.3. Install MySQL Applier
4.1.4. Management and Monitoring of MySQL Deployments
4.2. Deploying the Amazon Redshift Applier
4.2.1. Redshift Replication Operation
4.2.2. Preparing for Amazon Redshift Replication
4.2.3. Install Amazon Redshift Applier
4.2.4. Verifying your Redshift Installation
4.2.5. Keeping CDC Information
4.2.6. Management and Monitoring of Amazon Redshift Deployments
4.3. Deploying the Vertica Applier
4.3.1. Preparing for Vertica Deployments
4.3.2. Install Vertica Applier
4.3.3. Management and Monitoring of Vertica Deployments
4.3.4. Troubleshooting Vertica Installations
4.4. Deploying the Kafka Applier
4.4.1. Preparing for Kafka Replication
4.4.2. Install Kafka Applier
4.4.3. Management and Monitoring of Kafka Deployments
4.5. Deploying the MongoDB Applier
4.5.1. MongoDB Atlas Replication
4.5.2. Preparing for MongoDB Replication
4.5.3. Install MongoDB Applier
4.5.4. Install MongoDB Atlas Applier
4.5.5. Management and Monitoring of MongoDB Deployments
4.6. Deploying the Hadoop Applier
4.6.1. Hadoop Replication Operation
4.6.2. Preparing for Hadoop Replication
4.6.3. Replicating into Kerberos Secured HDFS
4.6.4. Install Hadoop Replication
4.7. Deploying the Oracle Applier
4.7.1. Preparing for Oracle Replication
4.7.2. Install Oracle Applier
5. Advanced Deployments
5.1. Deploying a Fan-In Topology
5.1.1. Management and Monitoring Fan-in Deployments
5.2. Deploying Multiple Replicators on a Single Host
5.2.1. Preparing Multiple Replicators
5.2.2. Install Multiple Replicators
5.2.3. Best Practices: Multiple Replicators
5.3. Replicating Data Into an Existing Dataservice
5.4. Deploying Parallel Replication
5.4.1. Application Prerequisites for Parallel Replication
5.4.2. Enabling Parallel Apply During Install
5.4.3. Channels
5.4.4. Parallel Replication and Offline Operation
5.4.5. Adjusting Parallel Replication After Installation
5.4.6. Monitoring Parallel Replication
5.4.7. Controlling Assignment of Shards to Channels
5.4.8. Disk vs. Memory Parallel Queues
5.5. Batch Loading for Data Warehouses
5.5.1. How It Works
5.5.2. Important Limitations
5.5.3. Batch Applier Setup
5.5.4. JavaScript Batchloader Scripts
5.5.5. Staging Tables
5.5.6. Character Sets
5.5.7. Supported CSV Formats
5.5.8. Columns in Generated CSV Files
5.5.9. Batchloading Opcodes
5.5.10. Time Zones
5.5.11. Data File Partitioning
5.6. Deployment Security
5.6.1. Enabling Security
5.6.2. Disabling Security
5.6.3. Creating Suitable Certificates
5.6.4. Installing from a Staging Host with Manually Generated Certificates
5.6.5. Installing via INI File with Manually Generated Certificates
5.6.6. Replacing the TLS Certificate from a Staging Directory
5.6.7. Removing TLS Encryption from a Staging Directory
5.6.8. Handling Database Level Security
5.6.9. Configuring Connector SSL
5.6.10. Creating the Truststore and Keystore
6. Operations Guide
6.1. The Home Directory
6.2. Establishing the Shell Environment
6.3. Understanding Replicator Roles
6.4. Checking Replication Status
6.4.1. Understanding Replicator States
6.4.2. Replicator States During Operations
6.4.3. Changing Replicator States
6.5. Managing Transaction Failures
6.5.1. Identifying a Transaction Mismatch
6.5.2. Skipping Transactions
6.6. Provision or Reprovision a Replica
6.7. Creating a Backup
6.7.1. Using a Different Backup Tool
6.7.2. Using a Different Directory Location
6.7.3. Creating an External Backup
6.8. Restoring a Backup
6.8.1. Restoring a Specific Backup
6.8.2. Restoring an External Backup
6.8.3. Restoring from Another Replica
6.8.4. Manually Recovering from Another Replica
6.8.5. Reprovision a MySQL Replica using rsync
6.9. Deploying Automatic Replicator Recovery
6.10. Migrating and Seeding Data
6.10.1. Migrating from MySQL Native Replication 'In-Place'
6.10.2. Seeding Data for Heterogeneous Replication
6.11. Switching Primary Hosts
6.12. Configuring Parallel Replication
6.13. Performing Database or OS Maintenance
6.13.1. Performing Maintenance on a Single Replica
6.13.2. Performing Maintenance on a Primary
6.13.3. Performing Maintenance on an Entire Dataservice
6.13.4. Upgrading or Updating your JVM
6.14. Upgrading Tungsten Replicator
6.14.1. Upgrading Tungsten Replicator using tpm
6.14.2. Installing an Upgraded JAR Patch
6.14.3. Installing Patches
6.15. Monitoring Tungsten Cluster
6.15.1. Managing Log Files with logrotate
6.15.2. Monitoring Status Using cacti
6.15.3. Monitoring Status Using nagios
6.16. Rebuilding THL on the Primary
7. Command-line Tools
7.1. The clean_release_directory Command
7.2. The check_tungsten_latency Command
7.3. The check_tungsten_online Command
7.4. The check_tungsten_services Command
7.5. The deployall Command
7.6. The ddlscan Command
7.6.1. Optional Arguments
7.6.2. Supported Templates and Usage
7.7. The dsctl Command
7.7.1. dsctl get Command
7.7.2. dsctl set Command
7.7.3. dsctl reset Command
7.7.4. dsctl help Command
7.8. env.sh Script
7.9. The load-reduce-check Tool
7.9.1. Generating Staging DDL
7.9.2. Generating Live DDL
7.9.3. Materializing a View
7.9.4. Generating Sqoop Load Commands
7.9.5. Generating Metadata
7.9.6. Compare Loaded Data
7.10. The materialize Command
7.11. The multi_trepctl Command
7.11.1. multi_trepctl Options
7.11.2. multi_trepctl Commands
7.12. The tungsten_newrelic_event Command
7.13. The query Command
7.14. The replicator Command
7.15. The startall Command
7.16. The stopall Command
7.17. The thl Command
7.17.1. thl Position Commands
7.17.2. thl list Command
7.17.3. thl index Command
7.17.4. thl purge Command
7.17.5. thl info Command
7.17.6. thl help Command
7.18. The trepctl Command
7.18.1. trepctl Options
7.18.2. trepctl Global Commands
7.18.3. trepctl Service Commands
7.19. The tpasswd Command
7.20. The tungsten_health_check Script
7.21. The tungsten_monitor Script
7.22. The tungsten_provision_thl Command
7.22.1. Provisioning from RDS
7.22.2. tungsten_provision_thl Reference
7.23. The tungsten_provision_slave Script
7.24. The tungsten_read_master_events Script
7.25. The tungsten_send_diag Script
7.26. The tungsten_set_position Script
7.27. The tungsten_skip_seqno Script
7.28. The undeployall Command
8. The tpm Deployment Command
8.1. Comparing Staging and INI tpm Methods
8.2. Processing Installs and Upgrades
8.3. tpm Staging Configuration
8.3.1. Configuring default options for all services
8.3.2. Configuring a single service
8.3.3. Configuring a single host
8.3.4. Reviewing the current configuration
8.3.5. Installation
8.3.6. Upgrades from a Staging Directory
8.3.7. Configuration Changes from a Staging Directory
8.3.8. Converting from INI to Staging
8.4. tpm INI File Configuration
8.4.1. Creating an INI file
8.4.2. Installation with INI File
8.4.3. Upgrades with an INI File
8.4.4. Configuration Changes with an INI file
8.4.5. Converting from Staging to INI
8.4.6. Using the translatetoini.pl Script
8.5. tpm Commands
8.5.1. tpm ask Command
8.5.2. tpm configure Command
8.5.3. tpm diag Command
8.5.4. tpm fetch Command
8.5.5. tpm firewall Command
8.5.6. tpm help Command
8.5.7. tpm install Command
8.5.8. tpm mysql Command
8.5.9. tpm query Command
8.5.10. tpm reset Command
8.5.11. tpm reset-thl Command
8.5.12. tpm reverse Command
8.5.13. tpm ssh-copy-cert Command
8.5.14. tpm uninstall Command
8.5.15. tpm update Command
8.5.16. tpm validate Command
8.5.17. tpm validate-update Command
8.6. tpm Common Options
8.7. tpm Validation Checks
8.8. tpm Configuration Options
8.8.1. A tpm Options
8.8.2. B tpm Options
8.8.3. C tpm Options
8.8.4. D tpm Options
8.8.5. E tpm Options
8.8.6. F tpm Options
8.8.7. H tpm Options
8.8.8. I tpm Options
8.8.9. J tpm Options
8.8.10. L tpm Options
8.8.11. M tpm Options
8.8.12. N tpm Options
8.8.13. O tpm Options
8.8.14. P tpm Options
8.8.15. R tpm Options
8.8.16. S tpm Options
8.8.17. T tpm Options
8.8.18. U tpm Options
8.8.19. V tpm Options
8.8.20. W tpm Options
9. Replication Filters
9.1. Enabling/Disabling Filters
9.2. Enabling Additional Filters
9.3. Filter Status
9.4. Filter Reference
9.4.1. ansiquotes.js Filter
9.4.2. BidiRemoteSlave (BidiSlave) Filter
9.4.3. breadcrumbs.js Filter
9.4.4. CaseTransform Filter
9.4.5. ColumnName Filter
9.4.6. ConvertStringFromMySQL Filter
9.4.7. DatabaseTransform (dbtransform) Filter
9.4.8. dbrename.js Filter
9.4.9. dbselector.js Filter
9.4.10. dbupper.js Filter
9.4.11. dropcolumn.js Filter
9.4.12. dropcomments.js Filter
9.4.13. dropmetadata.js Filter
9.4.14. dropstatementdata.js Filter
9.4.15. Dummy Filter
9.4.16. EnumToString Filter
9.4.17. EventMetadata Filter
9.4.18. foreignkeychecks.js Filter
9.4.19. Heartbeat Filter
9.4.20. insertsonly.js Filter
9.4.21. Logging Filter
9.4.22. MySQLSessionSupport (mysqlsessions) Filter
9.4.23. NetworkClient Filter
9.4.24. nocreatedbifnotexists.js Filter
9.4.25. OptimizeUpdates Filter
9.4.26. PrimaryKey Filter
9.4.27. PrintEvent Filter
9.4.28. Rename Filter
9.4.29. Replicate Filter
9.4.30. ReplicateColumns Filter
9.4.31. Row Add Database Name Filter
9.4.32. SetToString Filter
9.4.33. Shard Filter
9.4.34. shardbyseqno.js Filter
9.4.35. shardbytable.js Filter
9.4.36. SkipEventByType Filter
9.4.37. TimeDelay (delay) Filter
9.4.38. tosingledb.js Filter
9.4.39. truncatetext.js Filter
9.4.40. zerodate2null.js Filter
9.5. Standard JSON Filter Configuration
9.5.1. Rule Handling and Processing
9.5.2. Schema, Table, and Column Selection
9.6. JavaScript Filters
9.6.1. Writing JavaScript Filters
9.6.2. Installing Custom JavaScript Filters
10. Performance and Tuning
10.1. Block Commit
10.1.1. Monitoring Block Commit Status
10.2. Improving Network Performance
10.3. Tungsten Replicator Block Commit and Memory Usage
A. Release Notes
A.1. Tungsten Replicator 5.4.1 GA (28 October 2019)
A.2. Tungsten Replicator 5.4.0 GA (31 July 2019)
B. Prerequisites
B.1. Requirements
B.1.1. Operating Systems Support
B.1.2. Database Support
B.1.3. RAM Requirements
B.1.4. Disk Requirements
B.1.5. Java Requirements
B.1.6. Cloud Deployment Requirements
B.1.7. Docker Support Policy
B.2. Staging Host Configuration
B.3. Host Configuration
B.3.1. Creating the User Environment
B.3.2. Configuring Network and SSH Environment
B.3.3. Directory Locations and Configuration
B.3.4. Configure Software
B.3.5. sudo Configuration
B.3.6. SELinux Configuration
B.4. MySQL Database Setup
B.4.1. MySQL Version Support
B.4.2. MySQL Configuration
B.4.3. MySQL Configuration for Active/Active Deployments
B.4.4. MySQL Configuration for Heterogeneous Deployments
B.4.5. MySQL User Configuration
B.4.6. MySQL Unprivileged Users
B.5. Prerequisite Checklist
C. Troubleshooting
C.1. Contacting Support
C.1.1. Support Request Procedure
C.1.2. Creating a Support Account
C.1.3. Generating Diagnostic Information
C.1.4. Open a Support Ticket
C.1.5. Open a Support Ticket via Email
C.1.6. Getting Updates for all Company Support Tickets
C.1.7. Support Severity Level Definitions
C.1.8. Generating Advanced Diagnostic Information
C.2. Error/Cause/Solution
C.2.1. MySQLExtractException: unknown data type 0
C.2.2. Services requires a reset
C.2.3. OptimizeUpdatesFilter cannot filter, because column and key count is different. Make sure that it is defined before filters which remove keys (eg. PrimaryKeyFilter)
C.2.4. Unable to update the configuration of an installed directory
C.2.5. Too many open processes or files
C.2.6. There were issues configuring the sandbox MySQL server
C.2.7. Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768
C.2.8. The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements correctly on the Replica.
C.2.9. Replicator runs out of memory
C.3. Known Issues
C.3.1. Triggers
C.4. Troubleshooting Timeouts
C.5. Troubleshooting Backups
C.6. Running Out of Diskspace
C.7. Troubleshooting SSH and tpm
C.8. Troubleshooting Data Differences
C.8.1. Identify Structural Differences
C.8.2. Identify Data Differences
C.9. Comparing Table Data
C.10. Troubleshooting Memory Usage
D. Files, Directories, and Environment
D.1. The Tungsten Cluster Install Directory
D.1.1. The backups Directory
D.1.2. The releases Directory
D.1.3. The service_logs Directory
D.1.4. The share Directory
D.1.5. The thl Directory
D.1.6. The tungsten Directory
D.2. Log Files
D.3. Environment Variables
E. Terminology Reference
E.1. Transaction History Log (THL)
E.1.1. THL Format
E.2. Generated Field Reference
E.2.1. Terminology: Fields masterConnectUri
E.2.2. Terminology: Fields masterListenUri
E.2.3. Terminology: Fields accessFailures
E.2.4. Terminology: Fields active
E.2.5. Terminology: Fields activeSeqno
E.2.6. Terminology: Fields appliedLastEventId
E.2.7. Terminology: Fields appliedLastSeqno
E.2.8. Terminology: Fields appliedLatency
E.2.9. Terminology: Fields applier.class
E.2.10. Terminology: Fields applier.name
E.2.11. Terminology: Fields applyTime
E.2.12. Terminology: Fields autoRecoveryEnabled
E.2.13. Terminology: Fields autoRecoveryTotal
E.2.14. Terminology: Fields averageBlockSize
E.2.15. Terminology: Fields blockCommitRowCount
E.2.16. Terminology: Fields cancelled
E.2.17. Terminology: Fields channel
E.2.18. Terminology: Fields channels
E.2.19. Terminology: Fields clusterName
E.2.20. Terminology: Fields commits
E.2.21. Terminology: Fields committedMinSeqno
E.2.22. Terminology: Fields criticalPartition
E.2.23. Terminology: Fields currentBlockSize
E.2.24. Terminology: Fields currentEventId
E.2.25. Terminology: Fields currentLastEventId
E.2.26. Terminology: Fields currentLastFragno
E.2.27. Terminology: Fields currentLastSeqno
E.2.28. Terminology: Fields currentTimeMillis
E.2.29. Terminology: Fields dataServerHost
E.2.30. Terminology: Fields discardCount
E.2.31. Terminology: Fields doChecksum
E.2.32. Terminology: Fields estimatedOfflineInterval
E.2.33. Terminology: Fields eventCount
E.2.34. Terminology: Fields extensions
E.2.35. Terminology: Fields extractTime
E.2.36. Terminology: Fields extractor.class
E.2.37. Terminology: Fields extractor.name
E.2.38. Terminology: Fields filter.#.class
E.2.39. Terminology: Fields filter.#.name
E.2.40. Terminology: Fields filterTime
E.2.41. Terminology: Fields flushIntervalMillis
E.2.42. Terminology: Fields fsyncOnFlush
E.2.43. Terminology: Fields headSeqno
E.2.44. Terminology: Fields intervalGuard
E.2.45. Terminology: Fields lastCommittedBlockSize
E.2.46. Terminology: Fields lastCommittedBlockTime
E.2.47. Terminology: Fields latestEpochNumber
E.2.48. Terminology: Fields logConnectionTimeout
E.2.49. Terminology: Fields logDir
E.2.50. Terminology: Fields logFileRetainMillis
E.2.51. Terminology: Fields logFileSize
E.2.52. Terminology: Fields maxChannel
E.2.53. Terminology: Fields maxDelayInterval
E.2.54. Terminology: Fields maxOfflineInterval
E.2.55. Terminology: Fields maxSize
E.2.56. Terminology: Fields maximumStoredSeqNo
E.2.57. Terminology: Fields minimumStoredSeqNo
E.2.58. Terminology: Fields name
E.2.59. Terminology: Fields offlineRequests
E.2.60. Terminology: Fields otherTime
E.2.61. Terminology: Fields pendingError
E.2.62. Terminology: Fields pendingErrorCode
E.2.63. Terminology: Fields pendingErrorEventId
E.2.64. Terminology: Fields pendingErrorSeqno
E.2.65. Terminology: Fields pendingExceptionMessage
E.2.66. Terminology: Fields pipelineSource
E.2.67. Terminology: Fields processedMinSeqno
E.2.68. Terminology: Fields queues
E.2.69. Terminology: Fields readOnly
E.2.70. Terminology: Fields relativeLatency
E.2.71. Terminology: Fields resourcePrecedence
E.2.72. Terminology: Fields rmiPort
E.2.73. Terminology: Fields role
E.2.74. Terminology: Fields seqnoType
E.2.75. Terminology: Fields serializationCount
E.2.76. Terminology: Fields serialized
E.2.77. Terminology: Fields serviceName
E.2.78. Terminology: Fields serviceType
E.2.79. Terminology: Fields shard_id
E.2.80. Terminology: Fields simpleServiceName
E.2.81. Terminology: Fields siteName
E.2.82. Terminology: Fields sourceId
E.2.83. Terminology: Fields stage
E.2.84. Terminology: Fields started
E.2.85. Terminology: Fields state
E.2.86. Terminology: Fields stopRequested
E.2.87. Terminology: Fields store.#
E.2.88. Terminology: Fields storeClass
E.2.89. Terminology: Fields syncInterval
E.2.90. Terminology: Fields taskCount
E.2.91. Terminology: Fields taskId
E.2.92. Terminology: Fields timeInCurrentEvent
E.2.93. Terminology: Fields timeInStateSeconds
E.2.94. Terminology: Fields timeoutMillis
E.2.95. Terminology: Fields totalAssignments
E.2.96. Terminology: Fields transitioningTo
E.2.97. Terminology: Fields uptimeSeconds
E.2.98. Terminology: Fields version
F. Internals
F.1. Extending Backup and Restore Behavior
F.1.1. Backup Behavior
F.1.2. Restore Behavior
F.1.3. Writing a Custom Backup/Restore Script
F.1.4. Enabling a Custom Backup Script
F.2. Character Sets in Database and Tungsten Cluster
F.3. Understanding Replication of Date/Time Values
F.3.1. Best Practices
F.4. Memory Tuning and Performance
F.4.1. Understanding Tungsten Replicator Memory Tuning
F.5. Tungsten Replicator Pipelines and Stages
F.6. Tungsten Cluster Schemas
G. Frequently Asked Questions (FAQ)
H. Ecosystem Support
H.1. Continuent Github Repositories
I. Configuration Property Reference

List of Tables

1.1. Supported Extractors
1.2. Supported Appliers
2.1. Key Terminology
4.1. Optional Kafka Applier Properties
4.2. Hadoop Replication Directory Locations
4.3. Data Type differences when replicating data from MySQL to Oracle
5.1. Continuent Tungsten Directory Structure
7.1. check_tungsten_latency Options
7.2. check_tungsten_online Options
7.3. check_tungsten_services Options
7.4. ddlscan Command-line Options
7.5. ddlscan Supported Templates
7.6. dsctl Commands
7.7. dsctl Command-line Options
7.8. dsctl Command-line Options
7.9. dsctl Command-line Options
7.10. multi_trepctl Command-line Options
7.11. multi_trepctl --output Option
7.12. multi_trepctl Commands
7.13. query Common Options
7.14. replicator Commands
7.15. replicator Commands Options for condrestart
7.16. replicator Commands Options for console
7.17. replicator Commands Options for restart
7.18. replicator Commands Options for start
7.19. thl Options
7.20. trepctl Command-line Options
7.21. trepctl Replicator Wide Commands
7.22. trepctl Service Commands
7.23. trepctl backup Command Options
7.24. trepctl clients Command Options
7.25. trepctl offline-deferred Command Options
7.26. trepctl online Command Options
7.27. trepctl purge Command Options
7.28. trepctl reset Command Options
7.29. trepctl setrole Command Options
7.30. trepctl shard Command Options
7.31. trepctl status Command Options
7.32. trepctl wait Command Options
7.33. tpasswd Common Options
7.34. tungsten_health_check Command-line Options
7.35. tungsten_monitor Command-line Options
7.36. tungsten_provision_slave Command-line Options
7.37. tungsten_read_master_events Command-line Options
7.38. tungsten_send_diag Command-line Options
7.39. tungsten_set_position Command-line Options
8.1. tpm Deployment Methods
8.2. tpm Core Options
8.3. tpm Commands
8.4. tpm Common Options
8.5. tpm Validation Checks
8.6. tpm Configuration Options
D.1. Continuent Tungsten Directory Structure
D.2. Continuent Tungsten tungsten Sub-Directory Structure
E.1. THL Event Format