Tungsten Replicator 4.0 Manual

Continuent Ltd

Abstract

This manual documents Tungsten Replicator 4.0, a high-performance database replication application that replicates data from MySQL and Oracle to MySQL, Oracle, and data warehouse solutions including HP Vertica.

This manual includes information for 4.0, up to and including 4.0.7.

Build date: 2017-03-25 (1c2ec19)

Up-to-date builds of this document: Tungsten Replicator 4.0 Manual (Online), Tungsten Replicator 4.0 Manual (PDF)


Table of Contents

Preface
1. Legal Notice
2. Conventions
3. Quickstart Guide
1. Introduction
1.1. Tungsten Replicator
1.1.1. Extractor
1.1.2. Appliers
1.1.3. Transaction History Log (THL)
1.1.4. Filtering
2. Deployment
2.1. Deployment Sources
2.1.1. Using the TAR/GZipped files
2.1.2. Using the RPM and DEB package files
2.2. Best Practices
2.2.1. Best Practices: Deployment
2.2.2. Best Practices: Operations
2.2.3. Best Practices: Maintenance
2.3. Prepare Hosts
2.3.1. Prepare MySQL Hosts
2.3.2. Deploy SSH Keys
2.4. Common tpm Options During Deployment
2.5. Starting and Stopping Tungsten Replicator
2.6. Configuring Startup on Boot
2.7. Removing Datasources from a Deployment
2.7.1. Removing a Datasource from an Existing Deployment
3. Heterogeneous Deployments
3.1. How Heterogeneous Replication Works
3.2. MySQL to Oracle, Oracle to MySQL, and Oracle to Oracle Replication
3.3. Native Applier Replication (e.g. MongoDB)
3.4. Batch Loading
3.5. Schema Creation and Replication
4. MySQL-only Deployments
4.1. Deploying a Master/Slave Topology
4.1.1. Monitoring a Master/Slave Dataservice
4.2. Deploying a Multi-master Topology
4.2.1. Preparing Hosts for Multimaster
4.2.2. Installing Multimaster Deployments
4.2.3. Management and Monitoring of Multimaster Deployments
4.2.4. Alternative Multimaster Deployments
4.3. Deploying a Fan-In Topology
4.3.1. Management and Monitoring of Fan-In Deployments
4.4. Deploying Multiple Replicators on a Single Host
4.4.1. Prepare: Multiple Replicators
4.4.2. Install: Multiple Replicators
4.4.3. Best Practices: Multiple Replicators
4.5. Replicating Data Out of a Cluster
4.5.1. Prepare: Replicating Data Out of a Cluster
4.5.2. Deploy: Replicating Data Out of a Cluster
4.5.3. Best Practices: Replicating Data Out of a Cluster
4.6. Replicating Data Into an Existing Dataservice
5. Heterogeneous MySQL Deployments
5.1. Deploying MySQL to Oracle Replication
5.1.1. Prepare: MySQL to Oracle Replication
5.1.2. Install: MySQL to Oracle Replication
5.2. Deploying MySQL to Hadoop Replication
5.2.1. Hadoop Replication Operation
5.2.2. Preparing Hosts for Hadoop Replication
5.2.3. Installing Hadoop Replication
5.2.4. Generating Materialized Views
5.2.5. Accessing Generated Tables in Hive
5.2.6. Management and Monitoring of Hadoop Deployments
5.3. Deploying MySQL to Amazon Redshift Replication
5.3.1. Redshift Replication Operation
5.3.2. Preparing Hosts for Amazon Redshift Deployments
5.3.3. Installing Amazon Redshift Replication
5.3.4. Verifying your Redshift Installation
5.3.5. Keeping CDC Information
5.3.6. Management and Monitoring of Amazon Redshift Deployments
5.3.7. Troubleshooting Amazon Redshift Installations
5.4. Deploying MySQL to Vertica Replication
5.4.1. Preparing Hosts for Vertica Deployments
5.4.2. Installing Vertica Replication
5.4.3. Management and Monitoring of Vertica Deployments
5.4.4. Troubleshooting Vertica Installations
6. Heterogeneous Oracle Deployments
6.1. Deploying Oracle Replication
6.1.1. How Oracle Extraction Works
6.1.2. Data Type Differences and Limitations
6.1.3. Creating an Oracle to MySQL Deployment
6.1.4. Creating an Oracle to Oracle Deployment
6.1.5. Deployment with Provisioning
6.1.6. Updating CDC after Schema Changes
6.1.7. CDC Cleanup and Correction
6.1.8. Tuning CDC Extraction
6.1.9. Troubleshooting Oracle CDC Deployments
7. Advanced Deployments
7.1. Deploying Parallel Replication
7.1.1. Application Prerequisites for Parallel Replication
7.1.2. Enabling Parallel Apply
7.1.3. Channels
7.1.4. Disk vs. Memory Parallel Queues
7.1.5. Parallel Replication and Offline Operation
7.1.6. Adjusting Parallel Replication After Installation
7.1.7. Monitoring Parallel Replication
7.1.8. Controlling Assignment of Shards to Channels
7.2. Batch Loading for Data Warehouses
7.2.1. How It Works
7.2.2. Important Limitations
7.2.3. Batch Applier Setup
7.2.4. JavaScript Batchloader Scripts
7.2.5. Staging Tables
7.2.6. Character Sets
7.2.7. CSV Formats
7.2.8. Time Zones
7.3. Additional Configuration and Deployment Options
7.3.1. Deploying Multiple Replicators on a Single Host
7.4. Deploying SSL Secured Replication and Administration
7.4.1. Creating the Truststore and Keystore
7.4.2. SSL and Administration Authentication
7.4.3. Configuring the Secure Service through tpm
8. Operations Guide
8.1. The Tungsten Replicator Home Directory
8.2. Establishing the Shell Environment
8.3. Checking Replication Status
8.3.1. Understanding Replicator States
8.3.2. Replicator States During Operations
8.3.3. Changing Replicator States
8.4. Managing Transaction Failures
8.4.1. Identifying a Transaction Mismatch
8.4.2. Skipping Transactions
8.5. Provision or Reprovision a Slave
8.6. Creating a Backup
8.6.1. Using a Different Backup Tool
8.6.2. Using a Different Directory Location
8.6.3. Creating an External Backup
8.7. Restoring a Backup
8.7.1. Restoring a Specific Backup
8.7.2. Restoring an External Backup
8.7.3. Restoring from Another Slave
8.7.4. Manually Recovering from Another Slave
8.8. Deploying Automatic Replicator Recovery
8.9. Migrating and Seeding Data
8.9.1. Migrating from MySQL Native Replication 'In-Place'
8.9.2. Migrating from MySQL Native Replication Using a New Service
8.9.3. Seeding Data through MySQL
8.9.4. Seeding Data through tungsten_provision_thl
8.10. Using the Parallel Extractor
8.10.1. Advanced Configuration Parameters
8.11. Switching Master Hosts
8.12. Configuring Parallel Replication
8.13. Performing Database or OS Maintenance
8.13.1. Performing Maintenance on a Single Slave
8.13.2. Performing Maintenance on a Master
8.13.3. Performing Maintenance on an Entire Dataservice
8.13.4. Upgrading or Updating your JVM
8.14. Making Online Schema Changes
8.15. Upgrading Tungsten Replicator
8.15.1. Upgrading Installations using update
8.15.2. Upgrading Tungsten Replicator to use tpm
8.15.3. Upgrading Tungsten Replicator using tpm
8.15.4. Installing an Upgraded JAR Patch
8.16. Monitoring Tungsten Replicator
8.16.1. Managing Log Files with logrotate
8.16.2. Monitoring Status Using cacti
8.16.3. Monitoring Status Using nagios
9. Command-line Tools
9.1. The check_tungsten_latency Command
9.2. The check_tungsten_online Command
9.3. The check_tungsten_services Command
9.4. The deployall Command
9.5. The ddlscan Command
9.5.1. Optional Arguments
9.5.2. Supported Templates and Usage
9.6. The dsctl Command
9.6.1. dsctl get Command
9.6.2. dsctl set Command
9.6.3. dsctl reset Command
9.6.4. dsctl help Command
9.7. env.sh Script
9.8. The load-reduce-check Tool
9.8.1. Generating Staging DDL
9.8.2. Generating Live DDL
9.8.3. Materializing a View
9.8.4. Compare Loaded Data
9.9. The materialize Command
9.10. The multi_trepctl Command
9.10.1. multi_trepctl Options
9.10.2. multi_trepctl Commands
9.11. The replicator Command
9.12. The setupCDC.sh Command
9.13. The startall Command
9.14. The stopall Command
9.15. The thl Command
9.15.1. thl list Command
9.15.2. thl index Command
9.15.3. thl purge Command
9.15.4. thl info Command
9.15.5. thl help Command
9.16. The trepctl Command
9.16.1. trepctl Options
9.16.2. trepctl Global Commands
9.16.3. trepctl Service Commands
9.17. The tpasswd Command
9.18. The tungsten_provision_thl Command
9.18.1. Provisioning from RDS
9.18.2. tungsten_provision_thl Reference
9.19. The tungsten_provision_slave Script
9.20. The tungsten_read_master_events Script
9.21. The tungsten_set_position Script
9.22. The undeployall Command
9.23. The updateCDC.sh Command
10. The tpm Deployment Command
10.1. Comparing Staging and INI tpm Methods
10.2. Processing Installs and Upgrades
10.3. tpm Staging Configuration
10.3.1. Configuring default options for all services
10.3.2. Configuring a single service
10.3.3. Configuring a single host
10.3.4. Reviewing the current configuration
10.3.5. Installation
10.3.6. Upgrades from a Staging Directory
10.3.7. Configuration Changes from a Staging Directory
10.3.8. Converting from INI to Staging
10.4. tpm INI File Configuration
10.4.1. Creating an INI file
10.4.2. Installation with INI File
10.4.3. Upgrades with an INI File
10.4.4. Configuration Changes with an INI file
10.4.5. Converting from Staging to INI
10.5. tpm Commands
10.5.1. tpm configure Command
10.5.2. tpm diag Command
10.5.3. tpm fetch Command
10.5.4. tpm firewall Command
10.5.5. tpm help Command
10.5.6. tpm install Command
10.5.7. tpm mysql Command
10.5.8. tpm query Command
10.5.9. tpm reset Command
10.5.10. tpm reset-thl Command
10.5.11. tpm restart Command
10.5.12. tpm reverse Command
10.5.13. tpm ssh-copy-cert Command
10.5.14. tpm start Command
10.5.15. tpm stop Command
10.5.16. tpm update Command
10.5.17. tpm validate Command
10.5.18. tpm validate-update Command
10.6. tpm Common Options
10.7. tpm Configuration Options
10.7.1. A tpm Options
10.7.2. B tpm Options
10.7.3. C tpm Options
10.7.4. D tpm Options
10.7.5. E tpm Options
10.7.6. F tpm Options
10.7.7. H tpm Options
10.7.8. I tpm Options
10.7.9. J tpm Options
10.7.10. L tpm Options
10.7.11. M tpm Options
10.7.12. N tpm Options
10.7.13. O tpm Options
10.7.14. P tpm Options
10.7.15. Q tpm Options
10.7.16. R tpm Options
10.7.17. S tpm Options
10.7.18. T tpm Options
10.7.19. U tpm Options
10.7.20. V tpm Options
10.7.21. W tpm Options
11. Replication Filters
11.1. Enabling/Disabling Filters
11.2. Enabling Additional Filters
11.3. Filter Status
11.4. Filter Reference
11.4.1. ansiquotes.js Filter
11.4.2. BidiRemoteSlave (BidiSlave) Filter
11.4.3. breadcrumbs.js Filter
11.4.4. BuildAuditTable Filter
11.4.5. BuildIndexTable Filter
11.4.6. CaseMapping (CaseTransform) Filter
11.4.7. CDCMetadata (CustomCDC) Filter
11.4.8. ColumnName Filter
11.4.9. ConsistencyCheck Filter
11.4.10. DatabaseTransform (dbtransform) Filter
11.4.11. dbrename.js Filter
11.4.12. dbselector.js Filter
11.4.13. dbupper.js Filter
11.4.14. dropcolumn.js Filter
11.4.15. dropcomments.js Filter
11.4.16. dropmetadata.js Filter
11.4.17. dropstatementdata.js Filter
11.4.18. Dummy Filter
11.4.19. EnumToString Filter
11.4.20. EventMetadata Filter
11.4.21. foreignkeychecks.js Filter
11.4.22. Heartbeat Filter
11.4.23. insertsonly.js Filter
11.4.24. Logging Filter
11.4.25. MySQLSessionSupport (mysqlsessions) Filter
11.4.26. NetworkClient Filter
11.4.27. nocreatedbifnotexists.js Filter
11.4.28. OptimizeUpdates Filter
11.4.29. PrimaryKey Filter
11.4.30. PrintEvent Filter
11.4.31. Rename Filter
11.4.32. ReplicateColumns Filter
11.4.33. Replicate Filter
11.4.34. SetToString Filter
11.4.35. Shard Filter
11.4.36. shardbyseqno.js Filter
11.4.37. shardbytable.js Filter
11.4.38. TimeDelay (delay) Filter
11.4.39. tosingledb.js Filter
11.4.40. truncatetext.js Filter
11.4.41. zerodate2null.js Filter
11.5. JavaScript Filters
11.5.1. Writing JavaScript Filters
12. Performance and Tuning
12.1. Block Commit
12.1.1. Monitoring Block Commit Status
12.2. Improving Network Performance
12.3. Tungsten Replicator Block Commit and Memory Usage
A. Troubleshooting
A.1. Contacting Support
A.1.1. Support Request Procedure
A.1.2. Creating a Support Account
A.1.3. Generating Diagnostic Information
A.1.4. Open a Support Ticket
A.1.5. Open a Support Ticket via Email
A.1.6. Getting Updates for all Company Support Tickets
A.1.7. Support Severity Level Definitions
A.2. Error/Cause/Solution
A.2.1. ORA-00257: ARCHIVER ERROR. CONNECT INTERNAL ONLY, UNTIL FREED
A.2.2. 'subscription exists' when setting up CDC on Oracle
A.2.3. Replicator runs out of memory
A.2.4. MySQLExtractException: unknown data type 0
A.2.5. Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768
A.2.6. OptimizeUpdatesFilter cannot filter, because column and key count is different. Make sure that it is defined before filters which remove keys (eg. PrimaryKeyFilter)
A.2.7. Unable to update the configuration of an installed directory
A.2.8. The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements correctly on the slave.
A.2.9. Services requires a reset
A.2.10. There were issues configuring the sandbox MySQL server
A.2.11. Too many open processes or files
A.3. Known Issues
A.3.1. Triggers
A.4. Troubleshooting Timeouts
A.5. Troubleshooting Backups
A.6. Running Out of Diskspace
A.7. Troubleshooting SSH and tpm
A.8. Troubleshooting Data Differences
A.8.1. Identify Structural Differences
A.8.2. Identify Data Differences
A.9. Comparing Table Data
A.10. Troubleshooting Memory Usage
B. Release Notes
B.1. Tungsten Replicator 4.0.7 GA (23 Feb 2017)
B.2. Tungsten Replicator 4.0.6 GA (8 Dec 2016)
B.3. Tungsten Replicator 4.0.5 GA (4 Mar 2016)
B.4. Tungsten Replicator 4.0.4 GA (24 Feb 2016)
B.5. Tungsten Replicator 4.0.3 Not Released (NA)
B.6. Tungsten Replicator 4.0.2 GA (1 Oct 2015)
B.7. Tungsten Replicator 4.0.1 GA (20 Jul 2015)
B.8. Tungsten Replicator 4.0.0 GA (17 Apr 2015)
C. Prerequisites
C.1. Requirements
C.1.1. Operating Systems Support
C.1.2. Database Support
C.1.3. RAM Requirements
C.1.4. Disk Requirements
C.1.5. Java Requirements
C.1.6. Cloud Deployment Requirements
C.2. Staging Host Configuration
C.3. Host Configuration
C.3.1. Creating the User Environment
C.3.2. Configuring Network and SSH Environment
C.3.3. Directory Locations and Configuration
C.3.4. Configure Software
C.3.5. sudo Configuration
C.4. MySQL Database Setup
C.4.1. MySQL Version Support
C.4.2. MySQL Configuration
C.4.3. MySQL User Configuration
C.4.4. MySQL Unprivileged Users
C.5. Oracle Database Setup
C.5.1. Oracle Version Support
C.5.2. Oracle Environment Variables
D. Files, Directories, and Environment
D.1. The Tungsten Replicator Install Directory
D.1.1. The backups Directory
D.1.2. The releases Directory
D.1.3. The service_logs Directory
D.1.4. The share Directory
D.1.5. The thl Directory
D.1.6. The tungsten Directory
D.2. Log Files
D.3. Environment Variables
E. Terminology Reference
E.1. Transaction History Log (THL)
E.1.1. THL Format
E.2. Generated Field Reference
E.2.1. Terminology: Fields accessFailures
E.2.2. Terminology: Fields active
E.2.3. Terminology: Fields activeSeqno
E.2.4. Terminology: Fields appliedLastEventId
E.2.5. Terminology: Fields appliedLastSeqno
E.2.6. Terminology: Fields appliedLatency
E.2.7. Terminology: Fields applier.class
E.2.8. Terminology: Fields applier.name
E.2.9. Terminology: Fields applyTime
E.2.10. Terminology: Fields autoRecoveryEnabled
E.2.11. Terminology: Fields autoRecoveryTotal
E.2.12. Terminology: Fields averageBlockSize
E.2.13. Terminology: Fields blockCommitRowCount
E.2.14. Terminology: Fields cancelled
E.2.15. Terminology: Fields channel
E.2.16. Terminology: Fields channels
E.2.17. Terminology: Fields clusterName
E.2.18. Terminology: Fields commits
E.2.19. Terminology: Fields committedMinSeqno
E.2.20. Terminology: Fields criticalPartition
E.2.21. Terminology: Fields currentBlockSize
E.2.22. Terminology: Fields currentEventId
E.2.23. Terminology: Fields currentLastEventId
E.2.24. Terminology: Fields currentLastFragno
E.2.25. Terminology: Fields currentLastSeqno
E.2.26. Terminology: Fields currentTimeMillis
E.2.27. Terminology: Fields dataServerHost
E.2.28. Terminology: Fields discardCount
E.2.29. Terminology: Fields doChecksum
E.2.30. Terminology: Fields estimatedOfflineInterval
E.2.31. Terminology: Fields eventCount
E.2.32. Terminology: Fields extensions
E.2.33. Terminology: Fields extractTime
E.2.34. Terminology: Fields extractor.class
E.2.35. Terminology: Fields extractor.name
E.2.36. Terminology: Fields filter.#.class
E.2.37. Terminology: Fields filter.#.name
E.2.38. Terminology: Fields filterTime
E.2.39. Terminology: Fields flushIntervalMillis
E.2.40. Terminology: Fields fsyncOnFlush
E.2.41. Terminology: Fields headSeqno
E.2.42. Terminology: Fields intervalGuard
E.2.43. Terminology: Fields lastCommittedBlockSize
E.2.44. Terminology: Fields lastCommittedBlockTime
E.2.45. Terminology: Fields latestEpochNumber
E.2.46. Terminology: Fields logConnectionTimeout
E.2.47. Terminology: Fields logDir
E.2.48. Terminology: Fields logFileRetainMillis
E.2.49. Terminology: Fields logFileSize
E.2.50. Terminology: Fields masterConnectUri
E.2.51. Terminology: Fields masterListenUri
E.2.52. Terminology: Fields maxChannel
E.2.53. Terminology: Fields maxDelayInterval
E.2.54. Terminology: Fields maxOfflineInterval
E.2.55. Terminology: Fields maxSize
E.2.56. Terminology: Fields maximumStoredSeqNo
E.2.57. Terminology: Fields minimumStoredSeqNo
E.2.58. Terminology: Fields name
E.2.59. Terminology: Fields offlineRequests
E.2.60. Terminology: Fields otherTime
E.2.61. Terminology: Fields pendingError
E.2.62. Terminology: Fields pendingErrorCode
E.2.63. Terminology: Fields pendingErrorEventId
E.2.64. Terminology: Fields pendingErrorSeqno
E.2.65. Terminology: Fields pendingExceptionMessage
E.2.66. Terminology: Fields pipelineSource
E.2.67. Terminology: Fields processedMinSeqno
E.2.68. Terminology: Fields queues
E.2.69. Terminology: Fields readOnly
E.2.70. Terminology: Fields relativeLatency
E.2.71. Terminology: Fields resourcePrecedence
E.2.72. Terminology: Fields rmiPort
E.2.73. Terminology: Fields role
E.2.74. Terminology: Fields seqnoType
E.2.75. Terminology: Fields serializationCount
E.2.76. Terminology: Fields serialized
E.2.77. Terminology: Fields serviceName
E.2.78. Terminology: Fields serviceType
E.2.79. Terminology: Fields shard_id
E.2.80. Terminology: Fields simpleServiceName
E.2.81. Terminology: Fields siteName
E.2.82. Terminology: Fields sourceId
E.2.83. Terminology: Fields stage
E.2.84. Terminology: Fields started
E.2.85. Terminology: Fields state
E.2.86. Terminology: Fields stopRequested
E.2.87. Terminology: Fields store.#
E.2.88. Terminology: Fields storeClass
E.2.89. Terminology: Fields syncInterval
E.2.90. Terminology: Fields taskCount
E.2.91. Terminology: Fields taskId
E.2.92. Terminology: Fields timeInStateSeconds
E.2.93. Terminology: Fields timeoutMillis
E.2.94. Terminology: Fields totalAssignments
E.2.95. Terminology: Fields transitioningTo
E.2.96. Terminology: Fields uptimeSeconds
E.2.97. Terminology: Fields version
F. Internals
F.1. Extending Backup and Restore Behavior
F.1.1. Backup Behavior
F.1.2. Restore Behavior
F.1.3. Writing a Custom Backup/Restore Script
F.1.4. Enabling a Custom Backup Script
F.2. Character Sets in Database and Tungsten Replicator
F.3. Understanding Replication of Date/Time Values
F.3.1. Best Practices
F.4. Memory Tuning and Performance
F.4.1. Understanding Tungsten Replicator Memory Tuning
F.5. Tungsten Replicator Stages
F.6. Tungsten Replicator Schemas
G. Frequently Asked Questions (FAQ)
H. Ecosystem Support
H.1. Continuent Github Repositories
I. Configuration Property Reference

List of Tables

2.1. Key Terminology
5.1. Data Type differences when replicating data from MySQL to Oracle
5.2. Hadoop Replication Directory Locations
6.1. Data Type differences when replicating data from MySQL to Oracle
6.2. Data Type Differences when Replicating from Oracle to MySQL or Oracle
6.3. setupCDC.conf Configuration File Parameters
9.1. check_tungsten_latency Options
9.2. check_tungsten_online Options
9.3. check_tungsten_services Options
9.4. ddlscan Command-line Options
9.5. ddlscan Supported Templates
9.6. dsctl Commands
9.7. dsctl Command-line Options
9.8. dsctl Command-line Options
9.9. multi_trepctl Command-line Options
9.10. multi_trepctl --output Option
9.11. multi_trepctl Commands
9.12. replicator Commands
9.13. replicator Commands Options for condrestart
9.14. replicator Commands Options for console
9.15. replicator Commands Options for restart
9.16. replicator Commands Options for start
9.17. setupCDC.conf Configuration Options
9.18. thl Options
9.19. trepctl Command-line Options
9.20. trepctl Replicator Wide Commands
9.21. trepctl Service Commands
9.22. trepctl backup Command Options
9.23. trepctl clients Command Options
9.24. trepctl offline-deferred Command Options
9.25. trepctl online Command Options
9.26. trepctl purge Command Options
9.27. trepctl reset Command Options
9.28. trepctl setrole Command Options
9.29. trepctl shard Command Options
9.30. trepctl status Command Options
9.31. trepctl wait Command Options
9.32. tpasswd Common Options
9.33. tungsten_provision_slave Command-line Options
9.34. tungsten_read_master_events Command-line Options
9.35. tungsten_set_position Command-line Options
10.1. TPM Deployment Methods
10.2. tpm Core Options
10.3. tpm Commands
10.4. tpm Common Options
10.5. tpm Configuration Options
D.1. Continuent Tungsten Directory Structure
D.2. Continuent Tungsten tungsten Sub-Directory Structure
E.1. THL Event Format