Tungsten Replicator 2.1 Manual

Continuent Ltd

Abstract

This manual documents Tungsten Replicator 2.1, a high-performance database replication application for replicating data between MySQL and Oracle to MySQL, Oracle, and to data warehouse solutions inculding HP Vertica.

This manual includes information for 2.1, up to and including 2.1.1.

Build date: 2019-05-09 (3a71f04a)

Up to date builds of this document: Tungsten Replicator 2.1 Manual (Online), Tungsten Replicator 2.1 Manual (PDF)


Table of Contents

Preface
1. Legal Notice
2. Conventions
3. Differences Between Open Source and Enterprise Releases
4. Quickstart Guide
1. Introduction
1.1. Tungsten Replicator
1.1.1. Extractor
1.1.2. Appliers
1.1.3. Transaction History Log (THL)
1.1.4. Filtering
2. Deployment Overview
2.1. Best Practices
2.1.1. Best Practices: Deployment
2.1.2. Best Practices: Operations
2.1.3. Best Practices: Maintenance
2.2. Prepare Hosts
2.2.1. Prepare MySQL Hosts
2.2.2. Deploy SSH Keys
2.3. Common tpm Options During Deployment
2.4. Starting and Stopping Tungsten Replicator
2.5. Configuring Startup on Boot
2.6. Removing Datasources from a Deployment
2.6.1. Removing a Datasource from an Existing Deployment
2.7. Understanding Deployment Styles and Topologies
2.7.1. Tungsten Replicator Extraction Operation
2.8. Understanding Heterogeneous Deployments
2.8.1. How Heterogeneous Replication Works
3. Deploying MySQL Extractors
3.1. MySQL Replication Pre-Requisites
3.2. Deploying a Master/Slave Topology
3.2.1. Monitoring the MySQL Extractor
3.3. Deploying an Extractor for Amazon RDS/Amazon Aurora
3.3.1. Changing Amazon RDS/Aurora Instance Configurations
3.4. Replicating Data Out of a Cluster
3.4.1. Prepare: Replicating Data Out of a Cluster
3.4.2. Deploy: Replicating Data Out of a Cluster
4. Deploying Oracle Extractors
4.1. Deploying Oracle Replication using CDC
4.1.1. How Oracle Extraction Works
4.1.2. Data Type Differences and Limitations
4.1.3. Creating an Oracle to MySQL Deployment
4.1.4. Creating an Oracle to Oracle Deployment
4.1.5. Updating CDC after Schema Changes
4.1.6. CDC Cleanup and Correction
4.1.7. Tuning CDC Extraction
4.1.8. Troubleshooting Oracle CDC Deployments
5. Deploying Appliers
5.1. Deploying the MongoDB Applier
5.1.1. Preparing for MongoDB Replication
5.1.2. Install MongoDB Applier
5.1.3. Management and Monitoring of MongoDB Deployments
5.2. Deploying the MySQL Applier
5.2.1. Preparing for MySQL Replication
5.2.2. Prepare Amazon RDS/Amazon Aurora
5.2.3. Install MySQL Applier
5.2.4. Management and Monitoring of MySQL Deployments
5.3. Deploying the Oracle Applier
5.3.1. Preparing for Oracle Replication
5.3.2. Install Oracle Applier
5.4. Deploying the Vertica Applier
5.4.1. Preparing for Vertica Deployments
5.4.2. Install Vertica Applier
5.4.3. Management and Monitoring of Vertica Deployments
5.4.4. Troubleshooting Vertica Installations
6. Advanced Deployments
6.1. Deploying a Fan-In Topology
6.1.1. Management and Monitoring Fan-in Deployments
6.2. Deploying Multiple Replicators on a Single Host
6.2.1. Preparing Multiple Replicators
6.2.2. Install Multiple Replicators
6.2.3. Best Practices: Multiple Replicators
6.3. Replicating Data Into an Existing Dataservice
6.4. Deploying Parallel Replication
6.4.1. Application Prerequisites for Parallel Replication
6.4.2. Enabling Parallel Apply During Install
6.4.3. Channels
6.4.4. Parallel Replication and Offline Operation
6.4.5. Adjusting Parallel Replication After Installation
6.4.6. Monitoring Parallel Replication
6.4.7. Controlling Assignment of Shards to Channels
6.4.8. Disk vs. Memory Parallel Queues
6.5. Batch Loading for Data Warehouses
6.5.1. How It Works
6.5.2. Important Limitations
6.5.3. Batch Applier Setup
6.5.4. Connect and Merge Scripts
6.5.5. Staging Tables
6.5.6. Character Sets
6.5.7. Supported CSV Formats
6.5.8. Columns in Generated CSV Files
6.5.9. Batchloading Opcodes
6.5.10. Time Zones
6.5.11. Data File Partitioning
6.6. Deploying SSL Secured Replication and Administration
6.6.1. Creating the Truststore and Keystore
6.6.2. SSL and Administration Authentication
6.6.3. Configuring the Secure Service through tpm
7. Operations Guide
7.1. The Tungsten Clustering Home Directory
7.2. Establishing the Shell Environment
7.3. Replicator Roles
7.4. Checking Replication Status
7.4.1. Understanding Replicator States
7.4.2. Replicator States During Operations
7.4.3. Changing Replicator States
7.5. Managing Transaction Failures
7.5.1. Identifying a Transaction Mismatch
7.5.2. Skipping Transactions
7.6. Creating a Backup
7.6.1. Using a Different Backup Tool
7.6.2. Using a Different Directory Location
7.6.3. Creating an External Backup
7.7. Restoring a Backup
7.7.1. Restoring a Specific Backup
7.7.2. Restoring an External Backup
7.7.3. Restoring from Another Slave
7.7.4. Manually Recovering from Another Slave
7.8. Migrating and Seeding Data
7.8.1. Migrating from MySQL Native Replication 'In-Place'
7.8.2. Migrating from MySQL Native Replication Using a New Service
7.8.3. Seeding Data through MySQL
7.9. Switching Master Hosts
7.10. Configuring Parallel Replication
7.11. Performing Database or OS Maintenance
7.11.1. Performing Maintenance on a Single Slave
7.11.2. Performing Maintenance on a Master
7.11.3. Performing Maintenance on an Entire Dataservice
7.11.4. Upgrading or Updating your JVM
7.12. Upgrading Tungsten Replicator
7.12.1. Upgrading Installations using update
7.12.2. Upgrading Tungsten Clustering to use tpm
7.12.3. Upgrading Tungsten Clustering using tpm
7.12.4. Installing an Upgraded JAR Patch
7.13. Monitoring Tungsten Clustering
7.13.1. Managing Log Files with logrotate
7.13.2. Monitoring Status Using cacti
7.13.3. Monitoring Status Using nagios
7.14. Rebuilding THL on the Master
8. Command-line Tools
8.1. The check_tungsten_latency Command
8.2. The check_tungsten_online Command
8.3. The check_tungsten_services Command
8.4. The deployall Command
8.5. The ddlscan Command
8.5.1. Optional Arguments
8.5.2. Supported Templates and Usage
8.6. env.sh Script
8.7. The replicator Command
8.8. The setupCDC.sh Command
8.9. The startall Command
8.10. The stopall Command
8.11. The thl Command
8.11.1. thl Position Commands
8.11.2. thl list Command
8.11.3. thl index Command
8.11.4. thl purge Command
8.11.5. thl info Command
8.11.6. thl help Command
8.12. The trepctl Command
8.12.1. trepctl Options
8.12.2. trepctl Global Commands
8.12.3. trepctl Service Commands
8.13. The tpasswd Command
8.14. The undeployall Command
8.15. The updateCDC.sh Command
9. The tpm Deployment Command
9.1. Processing Installs and Upgrades
9.2. tpm Staging Configuration
9.2.1. Configuring default options for all services
9.2.2. Configuring a single service
9.2.3. Configuring a single host
9.2.4. Reviewing the current configuration
9.2.5. Installation
9.2.6. Upgrades from a Staging Directory
9.2.7. Configuration Changes from a Staging Directory
9.2.8. Converting from INI to Staging
9.3. tpm Commands
9.3.1. tpm configure Command
9.3.2. tpm diag Command
9.3.3. tpm fetch Command
9.3.4. tpm firewall Command
9.3.5. tpm help Command
9.3.6. tpm install Command
9.3.7. tpm mysql Command
9.3.8. tpm query Command
9.3.9. tpm reset Command
9.3.10. tpm reset-thl Command
9.3.11. tpm restart Command
9.3.12. tpm reverse Command
9.3.13. tpm ssh-copy-cert Command
9.3.14. tpm start Command
9.3.15. tpm stop Command
9.3.16. tpm update Command
9.3.17. tpm validate Command
9.3.18. tpm validate-update Command
9.4. tpm Common Options
9.5. tpm Validation Checks
9.6. tpm Configuration Options
9.6.1. A tpm Options
9.6.2. B tpm Options
9.6.3. C tpm Options
9.6.4. D tpm Options
9.6.5. E tpm Options
9.6.6. H tpm Options
9.6.7. I tpm Options
9.6.8. J tpm Options
9.6.9. L tpm Options
9.6.10. M tpm Options
9.6.11. N tpm Options
9.6.12. O tpm Options
9.6.13. P tpm Options
9.6.14. R tpm Options
9.6.15. S tpm Options
9.6.16. T tpm Options
9.6.17. U tpm Options
9.6.18. V tpm Options
9.6.19. W tpm Options
10. Replication Filters
10.1. Enabling/Disabling Filters
10.2. Enabling Additional Filters
10.3. Filter Status
10.4. Filter Reference
10.4.1. ansiquotes.js Filter
10.4.2. BidiRemoteSlave (BidiSlave) Filter
10.4.3. breadcrumbs.js Filter
10.4.4. BuildAuditTable Filter
10.4.5. BuildIndexTable Filter
10.4.6. CaseMapping (CaseTransform) Filter
10.4.7. CDCMetadata (CustomCDC) Filter
10.4.8. ColumnName Filter
10.4.9. ConsistencyCheck Filter
10.4.10. DatabaseTransform (dbtransform) Filter
10.4.11. dbrename.js Filter
10.4.12. dbselector.js Filter
10.4.13. dbupper.js Filter
10.4.14. dropcomments.js Filter
10.4.15. dropmetadata.js Filter
10.4.16. dropstatementdata.js Filter
10.4.17. Dummy Filter
10.4.18. EnumToString Filter
10.4.19. EventMetadata Filter
10.4.20. foreignkeychecks.js Filter
10.4.21. Heartbeat Filter
10.4.22. insertsonly.js Filter
10.4.23. Logging Filter
10.4.24. MySQLSessionSupport (mysqlsessions) Filter
10.4.25. NetworkClient Filter
10.4.26. nocreatedbifnotexists.js Filter
10.4.27. OptimizeUpdates Filter
10.4.28. PrimaryKey Filter
10.4.29. PrintEvent Filter
10.4.30. Rename Filter
10.4.31. ReplicateColumns Filter
10.4.32. Replicate Filter
10.4.33. SetToString Filter
10.4.34. Shard Filter
10.4.35. shardbyseqno.js Filter
10.4.36. shardbytable.js Filter
10.4.37. TimeDelay (delay) Filter
10.4.38. tosingledb.js Filter
10.4.39. truncatetext.js Filter
10.4.40. zerodate2null.js Filter
10.5. JavaScript Filters
10.5.1. Writing JavaScript Filters
11. Performance and Tuning
11.1. Improving Network Performance
11.2. Tungsten Replicator Block Commit and Memory Usage
12. Configuration Files and Format
12.1. Replicator Configuration Properties
A. Troubleshooting
A.1. Contacting Support
A.1.1. Support Request Procedure
A.1.2. Creating a Support Account
A.1.3. Generating Diagnostic Information
A.1.4. Open a Support Ticket
A.1.5. Open a Support Ticket via Email
A.1.6. Getting Updates for all Company Support Tickets
A.1.7. Support Severity Level Definitions
A.1.8. Generating Advanced Diagnostic Information
A.2. Error/Cause/Solution
A.2.1. Replicator runs out of memory
A.2.2. Unable to update the configuration of an installed directory
A.2.3. Too many open processes or files
A.2.4. Services requires a reset
A.2.5. ORA-00257: ARCHIVER ERROR. CONNECT INTERNAL ONLY, UNTIL FREED
A.2.6. Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768
A.2.7. The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements correctly on the slave.
A.3. Known Issues
A.3.1. Triggers
A.4. Troubleshooting Timeouts
A.5. Troubleshooting Backups
A.6. Running Out of Diskspace
A.7. Troubleshooting SSH and tpm
A.8. Troubleshooting Data Differences
A.8.1. Identify Structural Differences
A.8.2. Identify Data Differences
A.9. Comparing Table Data
A.10. Troubleshooting Memory Usage
B. Release Notes
B.1. Tungsten Replicator 2.1.2-44 Maintenance Release (27 November 2013)
B.2. Tungsten Replicator 2.1.2 GA (30 August 2013)
B.3. Tungsten Replicator 2.1.1 Recalled (21 August 2013)
B.4. Tungsten Replicator 2.1.0 GA (14 June 2013)
C. Prerequisites
C.1. Requirements
C.1.1. Operating Systems Support
C.1.2. Database Support
C.1.3. RAM Requirements
C.1.4. Disk Requirements
C.1.5. Java Requirements
C.1.6. Cloud Deployment Requirements
C.2. Staging Host Configuration
C.3. Host Configuration
C.3.1. Creating the User Environment
C.3.2. Configuring Network and SSH Environment
C.3.3. Directory Locations and Configuration
C.3.4. Configure Software
C.3.5. sudo Configuration
C.3.6. SELinux Configuration
C.4. MySQL Database Setup
C.4.1. MySQL Version Support
C.4.2. MySQL Configuration
C.4.3. MySQL Configuration for Multimaster Deployments
C.4.4. MySQL Configuration for Heterogeneous Deployments
C.4.5. MySQL User Configuration
C.5. Oracle Database Setup
C.5.1. Oracle Version Support
C.5.2. Oracle Environment Variables
C.6. Prerequisite Checklist
D. Terminology Reference
D.1. Transaction History Log (THL)
D.1.1. THL Format
D.2. Generated Field Reference
D.2.1. Terminology: Fields accessFailures
D.2.2. Terminology: Fields active
D.2.3. Terminology: Fields activeSeqno
D.2.4. Terminology: Fields appliedLastEventId
D.2.5. Terminology: Fields appliedLastSeqno
D.2.6. Terminology: Fields appliedLatency
D.2.7. Terminology: Fields applier.class
D.2.8. Terminology: Fields applier.name
D.2.9. Terminology: Fields applyTime
D.2.10. Terminology: Fields averageBlockSize
D.2.11. Terminology: Fields blockCommitRowCount
D.2.12. Terminology: Fields cancelled
D.2.13. Terminology: Fields channel
D.2.14. Terminology: Fields channels
D.2.15. Terminology: Fields clusterName
D.2.16. Terminology: Fields commits
D.2.17. Terminology: Fields committedMinSeqno
D.2.18. Terminology: Fields criticalPartition
D.2.19. Terminology: Fields currentBlockSize
D.2.20. Terminology: Fields currentEventId
D.2.21. Terminology: Fields currentLastEventId
D.2.22. Terminology: Fields currentLastFragno
D.2.23. Terminology: Fields currentLastSeqno
D.2.24. Terminology: Fields currentTimeMillis
D.2.25. Terminology: Fields dataServerHost
D.2.26. Terminology: Fields discardCount
D.2.27. Terminology: Fields doChecksum
D.2.28. Terminology: Fields estimatedOfflineInterval
D.2.29. Terminology: Fields eventCount
D.2.30. Terminology: Fields extensions
D.2.31. Terminology: Fields extractTime
D.2.32. Terminology: Fields extractor.class
D.2.33. Terminology: Fields extractor.name
D.2.34. Terminology: Fields filter.#.class
D.2.35. Terminology: Fields filter.#.name
D.2.36. Terminology: Fields filterTime
D.2.37. Terminology: Fields flushIntervalMillis
D.2.38. Terminology: Fields fsyncOnFlush
D.2.39. Terminology: Fields headSeqno
D.2.40. Terminology: Fields intervalGuard
D.2.41. Terminology: Fields lastCommittedBlockSize
D.2.42. Terminology: Fields lastCommittedBlockTime
D.2.43. Terminology: Fields latestEpochNumber
D.2.44. Terminology: Fields logConnectionTimeout
D.2.45. Terminology: Fields logDir
D.2.46. Terminology: Fields logFileRetainMillis
D.2.47. Terminology: Fields logFileSize
D.2.48. Terminology: Fields masterConnectUri
D.2.49. Terminology: Fields masterListenUri
D.2.50. Terminology: Fields maxChannel
D.2.51. Terminology: Fields maxDelayInterval
D.2.52. Terminology: Fields maxOfflineInterval
D.2.53. Terminology: Fields maxSize
D.2.54. Terminology: Fields maximumStoredSeqNo
D.2.55. Terminology: Fields minimumStoredSeqNo
D.2.56. Terminology: Fields name
D.2.57. Terminology: Fields offlineRequests
D.2.58. Terminology: Fields otherTime
D.2.59. Terminology: Fields pendingError
D.2.60. Terminology: Fields pendingErrorCode
D.2.61. Terminology: Fields pendingErrorEventId
D.2.62. Terminology: Fields pendingErrorSeqno
D.2.63. Terminology: Fields pendingExceptionMessage
D.2.64. Terminology: Fields pipelineSource
D.2.65. Terminology: Fields processedMinSeqno
D.2.66. Terminology: Fields queues
D.2.67. Terminology: Fields readOnly
D.2.68. Terminology: Fields relativeLatency
D.2.69. Terminology: Fields resourcePrecedence
D.2.70. Terminology: Fields rmiPort
D.2.71. Terminology: Fields role
D.2.72. Terminology: Fields seqnoType
D.2.73. Terminology: Fields serializationCount
D.2.74. Terminology: Fields serialized
D.2.75. Terminology: Fields serviceName
D.2.76. Terminology: Fields serviceType
D.2.77. Terminology: Fields shard_id
D.2.78. Terminology: Fields simpleServiceName
D.2.79. Terminology: Fields siteName
D.2.80. Terminology: Fields sourceId
D.2.81. Terminology: Fields stage
D.2.82. Terminology: Fields started
D.2.83. Terminology: Fields state
D.2.84. Terminology: Fields stopRequested
D.2.85. Terminology: Fields store.#
D.2.86. Terminology: Fields storeClass
D.2.87. Terminology: Fields syncInterval
D.2.88. Terminology: Fields taskCount
D.2.89. Terminology: Fields taskId
D.2.90. Terminology: Fields timeInStateSeconds
D.2.91. Terminology: Fields timeoutMillis
D.2.92. Terminology: Fields totalAssignments
D.2.93. Terminology: Fields transitioningTo
D.2.94. Terminology: Fields uptimeSeconds
D.2.95. Terminology: Fields version
E. Files, Directories, and Environment
E.1. The Tungsten Clustering Install Directory
E.1.1. The backups Directory
E.1.2. The releases Directory
E.1.3. The service_logs Directory
E.1.4. The share Directory
E.1.5. The thl Directory
E.1.6. The tungsten Directory
E.2. Log Files
E.3. Environment Variables
F. Internals
F.1. Extending Backup and Restore Behavior
F.1.1. Backup Behavior
F.1.2. Restore Behavior
F.1.3. Writing a Custom Backup/Restore Script
F.1.4. Enabling a Custom Backup Script
F.2. Character Sets in Database and Tungsten Clustering
F.3. Memory Tuning and Performance
F.3.1. Understanding Tungsten Replicator Memory Tuning
F.4. Tungsten Replicator Pipelines and Stages
F.5. Tungsten Clustering Schemas
G. Frequently Asked Questions (FAQ)
H. Ecosystem Support
I. Configuration Property Reference

List of Tables

1. Differences between Open Source and Enterprise Releases
1.1. Supported Extractors
1.2. Supported Appliers
2.1. Key Terminology
4.1. Data Type differences when replicating data from MySQL to Oracle
4.2. Data Type Differences when Replicating from Oracle to MySQL or Oracle
4.3. setupCDC.conf Configuration File Parameters
5.1. Data Type differences when replicating data from MySQL to Oracle
6.1. Continuent Tungsten Directory Structure
8.1. check_tungsten_latency Options
8.2. check_tungsten_online Options
8.3. check_tungsten_services Options
8.4. ddlscan Command-line Options
8.5. ddlscan Supported Templates
8.6. replicator Commands
8.7. replicator Commands Options for condrestart
8.8. replicator Commands Options for console
8.9. replicator Commands Options for restart
8.10. replicator Commands Options for start
8.11. setupCDC.conf Configuration Options
8.12. thl Options
8.13. trepctl Command-line Options
8.14. trepctl Replicator Wide Commands
8.15. trepctl Service Commands
8.16. trepctl backup Command Options
8.17. trepctl clients Command Options
8.18. trepctl offline-deferred Command Options
8.19. trepctl online Command Options
8.20. trepctl purge Command Options
8.21. trepctl reset Command Options
8.22. trepctl setrole Command Options
8.23. trepctl shard Command Options
8.24. trepctl status Command Options
8.25. trepctl wait Command Options
8.26. tpasswd Common Options
9.1. tpm Core Options
9.2. tpm Commands
9.3. tpm Common Options
9.4. tpm Validation Checks
9.5. tpm Configuration Options
D.1. THL Event Format
E.1. Continuent Tungsten Directory Structure
E.2. Continuent Tungsten tungsten Sub-Directory Structure