Tungsten Replicator 3.0 Manual

Continuent

Abstract

This manual documents Tungsten Replicator 3.0.

Build date: 2014-10-13, Revision: 1163

Up to date builds of this document: Tungsten Replicator 3.0 Manual (Online), Tungsten Replicator 3.0 Manual (PDF)


Table of Contents

Preface
1. Legal Notice
2. Conventions
3. Quickstart Guide
1. Introduction
1.1. Tungsten Replicator
1.1.1. Extractor
1.1.2. Appliers
1.1.3. Transaction History Log (THL)
1.1.4. Filtering
2. Deployment
2.1. Requirements
2.1.1. Operating Systems Support
2.1.2. Database Support
2.1.3. RAM Requirements
2.1.4. Disk Requirements
2.1.5. Java Requirements
2.1.6. Cloud Deployment Requirements
2.2. Deployment Sources
2.2.1. Using the TAR/GZipped files
2.2.2. Using the RPM and DEB package files
2.3. Deploying a Master/Slave Topology
2.3.1. Monitoring a Master/Slave Dataservice
2.4. Deploying a Multi-master Topology
2.4.1. Preparing Hosts for Multimaster
2.4.2. Installing Multimaster Deployments
2.4.3. Management and Monitoring of Multimaster Deployments
2.4.4. Alternative Multimaster Deployments
2.5. Deploying a Fan-In Topology
2.5.1. Management and Monitoring Fan-in Deployments
2.6. Deploying a Star Topology
2.6.1. Management and Monitoring of Star Topology Deployments
2.7. Additional Configuration and Deployment Options
2.7.1. Deploying Multiple Replicators on a Single Host
2.8. Replicating Data Into an Existing Dataservice
2.9. Removing Datasources from a Deployment
2.9.1. Removing a Datasource from an Existing Deployment
2.10. Starting and Stopping Tungsten Replicator
2.11. Configuring Startup on Boot
2.12. Upgrading Tungsten Replicator
2.12.1. Upgrading Installations using update
2.12.2. Upgrading Continuent Tungsten to use tpm
2.12.3. Upgrading Continuent Tungsten using tpm
2.12.4. Installing an Upgraded JAR Patch
3. Heterogenous Deployments
3.1. How Heterogeneous Replication Works
3.1. MySQL to Oracle, Oracle to MySQL, and Oracle to Oracle Replication
3.1. Native Applier Replication (e.g. MongoDB)
3.1. Batch Loading
3.1. Schema Creation and Replication
3.2. Deploying Oracle Replication
3.2.1. How Oracle Extraction Works
3.2.2. Data Type Differences and Limitations
3.2.3. Creating a MySQL to Oracle Deployment
3.2.3.1. Configure the MySQL database
3.2.3.2. Configure the Oracle database
3.2.3.3. Create the Destination Schema
3.2.3.4. Install the Master Replicator Service
3.2.3.5. Install Slave Replicator
3.2.4. Creating an Oracle to MySQL Deployment
3.2.4.1. Configuring the Oracle Environment
3.2.4.2. Creating the MySQL Environment
3.2.4.3. Creating the Destination Schema
3.2.4.4. Creating the Master Replicator
3.2.4.5. Creating the Slave Replicator
3.2.5. Creating an Oracle to Oracle Deployment
3.2.5.1. Setting up the Source Oracle Environment
3.2.5.2. Setting up the Target Oracle Environment
3.2.5.3. Creating the Destination Schema
3.2.5.4. Installing the Master Replicator
3.2.5.5. Installing the Slave Replicator
3.2.6. Deployment with Provisioning
3.2.7. Updating CDC after Schema Changes
3.2.8. CDC Cleanup and Correction
3.2.9. Tuning CDC Extraction
3.2.10. Troubleshooting Oracle Deployments
3.2.10.1. 'subscription exists' when setting up CDC on Oracle
3.3. Deploying MySQL to MongoDB Replication
3.3.1. Preparing Hosts for MongoDB Replication
3.3.2. Installing MongoDB Replication
3.3.3. Management and Monitoring of MongoDB Deployments
3.4. Deploying MySQL to Hadoop Replication
3.4.1. Hadoop Replication Operation
3.4.2. Preparing Hosts for Hadoop Replication
3.4.2.1. MySQL Host
3.4.2.2. Hadoop Host
3.4.2.3. Schema Generation
3.4.3. Installing Hadoop Replication
3.4.3.1. MySQL to Hadoop Master Replicator Service
3.4.3.2. Oracle to Hadoop Master Replicator Service
3.4.3.3. Slave Replicator Service
3.4.4. Generating Materialized Views
3.4.5. Accessing Generated Tables in Hive
3.4.6. Management and Monitoring of Hadoop Deployments
3.4.6.1. Troubleshooting Hadoop Replication
3.5. Deploying MySQL to Amazon RDS Replication
3.5.1. Preparing Hosts for Amazon RDS Replication
3.5.2. Installing MySQL to Amazon RDS Replication
3.5.3. Management and Monitoring of Amazon RDS Deployments
3.5.4. Changing Amazon RDS Instance Configurations
3.6. Deploying Amazon RDS to MySQL Replication
3.6.1. Preparing Hosts for Replicating from Amazon RDS
3.6.2. Installing Amazon RDS to MySQL Replication
3.6.3. Management and Monitoring Deployment from Amazon RDS
3.6.4. Changing Amazon RDS Instance Configurations
3.7. Deploying MySQL to Amazon RedShift Replication
3.7.1. RedShift Replication Operation
3.7.2. Preparing Hosts for Amazon RedShift Deployments
3.7.2.1. MySQL Preparation for Amazon RedShift Deployments
3.7.2.2. RedShift Preparation for Amazon RedShift Deployments
3.7.2.3. Amazon RedShift DDL Generation for Amazon RedShift Deployments
3.7.3. Installing Amazon RedShift Replication
3.7.4. Verifying your RedShift Installation
3.7.5. Keeping CDC Information
3.7.6. Management and Monitoring of Amazon RedShift Deployments
3.7.7. Troubleshooting Amazon RedShift Installations
3.8. Deploying MySQL to Vertica Replication
3.8.1. Preparing Hosts for Vertica Deployments
3.8.2. Installing Vertica Replication
3.8.3. Management and Monitoring of Vertica Deployments
3.8.4. Troubleshooting Vertica Installations
3.9. Deploying Infobright Replication
3.9.1. Preparing Hosts for InfoBright Replication
3.9.2. Installing Infobright Replication
3.9.3. Management and Monitoring of Infobright Deployments
3.10. Deploying InfiniDB Replication
3.10.1. Preparing Hosts for InfiniDB Replication
3.10.2. Installing InfiniDB Replication
3.10.3. Management and Monitoring of InfiniDB Deployments
3.10.4. Troubleshooting InfiniDB Deployments
3.11. Deploying PostgreSQL Replication
3.11.1. Preparing Hosts for PostgreSQL Replication
3.11.2. Installing PostgreSQL Replication
3.11.3. Management and Monitoring of PostgreSQL Deployments
4. Advanced Deployments
4.1. Migrating and Seeding Data
4.1.1. Migrating from MySQL Native Replication 'In-Place'
4.1.2. Seeding Data through Oracle
4.2. Deploying Parallel Replication
4.2.1. Application Prerequisites for Parallel Replication
4.2.2. Enabling Parallel Apply
4.2.3. Channels
4.2.4. Disk vs. Memory Parallel Queues
4.2.5. Parallel Replication and Offline Operation
4.2.5.1. Clean Offline Operation
4.2.5.2. Tuning the Time to Go Offline Cleanly
4.2.5.3. Unclean Offline
4.2.6. Adjusting Parallel Replication After Installation
4.2.6.1. How to Change Channels Safely
4.2.6.2. How to Switch Parallel Queue Types Safely
4.2.7. Monitoring Parallel Replication
4.2.7.1. Useful Commands for Parallel Monitoring Replication
4.2.7.2. Parallel Replication and Applied Latency On Slaves
4.2.7.3. Relative Latency
4.2.7.4. Serialization Count
4.2.7.5. Maximum Offline Interval
4.2.7.6. Workload Distribution
4.2.8. Controlling Assignment of Shards to Channels
4.3. Batch Loading for Data Warehouses
4.3.1. How It Works
4.3.2. Important Limitations
4.3.3. Batch Applier Setup
4.3.4. JavaScript Batchloader Scripts
4.3.4.1. JavaScript Batchloader with Parallel Apply
4.3.5. Staging Tables
4.3.5.1. Staging Table Names
4.3.5.2. Whole Record Staging
4.3.5.3. Delete Key Staging
4.3.5.4. Staging Table Generation
4.3.6. Character Sets
4.3.7. Time Zones
4.4. Using the Parallel Extractor
4.4.1. Advanced Configuration Parameters
4.5. Deploying SSL Secured Replication and Administration
4.5.1. Creating the Truststore and Keystore
4.5.1.1. Creating Your Own Client and Server Certificates
4.5.1.2. Creating a Custom Certificate and Getting it Signed
4.5.1.3. Using an existing Certificate
4.5.1.4. Converting SSL Certificates for keytool
4.5.2. SSL and Administration Authentication
4.5.3. Configuring the Secure Service through tpm
5. Operations Guide
5.1. Checking Replication Status
5.1.1. Understanding Replicator States
5.1.2. Replicator States During Operations
5.1.3. Changing Replicator States
5.2. Managing Transaction Failures
5.2.1. Identifying a Transaction Mismatch
5.2.2. Skipping Transactions
5.3. Automatic Replicator Recovery
5.4. Provision or Reprovision a host
5.5. Creating a Backup
5.5.1. Using a Different Backup Tool
5.5.2. Backup a Different Host
5.6. Restoring a Backup
5.6.1. Restoring a Backup to a Different Node
5.7. Switching Master Hosts
5.8. Configuring Parallel Replication
5.9. Performing Database or OS Maintenance
5.9.1. Performing Maintenance on a Single Slave
5.9.2. Performing Maintenance on a Master
5.9.3. Performing Maintenance on an Entire Dataservice
5.9.4. Upgrading or Updating your JVM
5.10. Making Online Schema Changes
6. Command-line Tools
6.1. The check_tungsten_services Command
6.2. The check_tungsten_online Command
6.3. The check_tungsten_latency Command
6.4. The ddlscan Command
6.4.1. Optional Arguments
6.4.2. Supported Templates and Usage
6.4.2.1. ddl-check-pkeys.vm
6.4.2.2. ddl-mysql-hive-0.10.vm
6.4.2.3. ddl-mysql-hive-0.10-staging.vm
6.4.2.4. ddl-mysql-hive-metadata.vm
6.4.2.5. ddl-mysql-oracle.vm
6.4.2.6. ddl-mysql-oracle-cdc.vm
6.4.2.7. ddl-mysql-vertica.vm
6.4.2.8. ddl-mysql-vertica-staging.vm
6.4.2.9. ddl-oracle-mysql.vm
6.4.2.10. ddl-oracle-mysql-pk-only.vm
6.5. The load-reduce-check Tool
6.5.1. Generating Staging DDL
6.5.2. Generating Live DDL
6.5.3. Materializing a View
6.5.4. Compare Loaded Data
6.6. The materialize Command
6.7. The multi_trepctl Command
6.7.1. multi_trepctl Options
6.7.2. multi_trepctl Commands
6.7.2.1. multi_trepctl list Command
6.7.2.2. multi_trepctl run Command
6.8. The setupCDC.sh Command
6.9. The thl Command
6.9.1. thl list Command
6.9.2. thl index Command
6.9.3. thl purge Command
6.9.4. thl info Command
6.9.5. thl help Command
6.10. The tpm Command
6.10.1. Comparing Staging and INI tpm Methods
6.10.2. Processing Installs and Upgrades
6.10.3. tpm Staging Configuration
6.10.3.1. Configuring default options for all services
6.10.3.2. Configuring a single service
6.10.3.3. Configuring a single host
6.10.3.4. Reviewing the current configuration
6.10.3.5. Installation
6.10.3.6. Upgrades and Updates with tpm
6.10.3.7. Making configuration changes
6.10.4. tpm INI File Configuration
6.10.4.1. Creating an INI file
6.10.4.2. Installation with INI File
6.10.4.3. Upgrades with INI File
6.10.4.4. Making configuration changes
6.10.5. tpm Commands
6.10.5.1. tpm configure Command
6.10.5.2. tpm diag Command
6.10.5.3. tpm fetch Command
6.10.5.4. tpm firewall Command
6.10.5.5. tpm help Command
6.10.5.6. tpm install Command
6.10.5.7. tpm mysql Command
6.10.5.8. tpm query Command
6.10.5.9. tpm reset Command
6.10.5.10. tpm reset-thl Command
6.10.5.11. tpm restart Command
6.10.5.12. tpm reverse Command
6.10.5.13. tpm start Command
6.10.5.14. tpm stop Command
6.10.5.15. tpm update Command
6.10.5.16. tpm validate Command
6.10.5.17. tpm validate-update Command
6.10.6. tpm Configuration Options
6.10.7. Troubleshooting
6.11. The trepctl Command
6.11.1. trepctl Options
6.11.2. trepctl Global Commands
6.11.2.1. trepctl kill Command
6.11.2.2. trepctl services Command
6.11.2.3. trepctl version Command
6.11.3. trepctl Service Commands
6.11.3.1. trepctl backup Command
6.11.3.2. trepctl capabilities Command
6.11.3.3. trepctl check Command
6.11.3.4. trepctl clear Command
6.11.3.5. trepctl clients Command
6.11.3.6. trepctl flush Command
6.11.3.7. trepctl heartbeat Command
6.11.3.8. trepctl load Command
6.11.3.9. trepctl offline Command
6.11.3.10. trepctl offline-deferred Command
6.11.3.11. trepctl online Command
6.11.3.12. trepctl properties Command
6.11.3.13. trepctl purge Command
6.11.3.14. trepctl reset Command
6.11.3.15. trepctl restore Command
6.11.3.16. trepctl setrole Command
6.11.3.17. trepctl shard Command
6.11.3.18. trepctl status Command
6.11.3.19. trepctl unload Command
6.11.3.20. trepctl wait Command
6.12. The tungsten_provision_slave Script
6.13. The tungsten_read_master_events Script
6.14. The tungsten_set_position Script
6.15. The updateCDC.sh Command
7. Using the Cookbook
8. Replication Filters
8.1. Enabling/Disabling Filters
8.2. Enabling Additional Filters
8.3. Filter Status
8.4. Filter Reference
8.4.1. BidiRemoteSlave Filter
8.4.2. BuildAuditTable Filter
8.4.3. BuildIndexTable Filter
8.4.4. CaseMapping Filter
8.4.5. CDCMetadata Filter
8.4.6. ColumnName Filter
8.4.7. ConsistencyCheck Filter
8.4.8. DatabaseTransform Filter
8.4.9. Dummy Filter
8.4.10. EnumToString Filter
8.4.11. EventMetadata Filter
8.4.12. Heartbeat Filter
8.4.13. Logging Filter
8.4.14. MySQLSessionSupport Filter
8.4.15. OptimizeUpdates Filter
8.4.16. PrimaryKey Filter
8.4.17. PrintEvent Filter
8.4.18. Rename Filter
8.4.18.1. Rename Filter Examples
8.4.19. ReplicateColumns Filter
8.4.20. Replicate Filter
8.4.21. SetToString Filter
8.4.22. Shard Filter
8.4.23. TimeDelay Filter
8.5. JavaScript Filters
8.5.1. Writing JavaScript Filters
8.5.1.1. Implementable Functions
8.5.1.2. Getting Configuration Parameters
8.5.1.3. Logging Information and Exceptions
8.5.1.4. Exposed Data Structures
8.5.2. JavaScript Filter Reference
8.5.2.1. ansiquotes.js Filter
8.5.2.2. breadcrumbs.js Filter
8.5.2.3. dbrename.js Filter
8.5.2.4. dbselector.js Filter
8.5.2.5. dbupper.js Filter
8.5.2.6. dropcolumn.js Filter
8.5.2.7. dropcomments.js Filter
8.5.2.8. dropmetadata.js Filter
8.5.2.9. dropstatementdata.js Filter
8.5.2.10. foreignkeychecks.js Filter
8.5.2.11. insertsonly.js Filter
8.5.2.12. nocreatedbifnotexists.js Filter
8.5.2.13. noonlykeywords.js Filter
8.5.2.14. pgddl.js Filter
8.5.2.15. shardbyseqno.js Filter
8.5.2.16. shardbytable.js Filter
8.5.2.17. tosingledb.js Filter
8.5.2.18. truncatetext.js Filter
8.5.2.19. zerodate2null.js Filter
9. Performance and Tuning
9.1. Block Commit
9.1.1. Monitoring Block Commit Status
9.2. Improving Network Performance
10. Configuration Files and Format
10.1. Replicator Configuration Properties
A. Troubleshooting
A.1. Contacting Support
A.1. Creating a Support Account
A.1. Generating Diagnostic Information
A.1. Open a Support Ticket
A.1. Open a Support Ticket via Email
A.1. Getting Updates for all Company Support Tickets
A.2. Error/Cause/Solution
A.2.1. Too many open processes or files
A.2.2. OptimizeUpdatesFilter cannot filter, because column and key count is different. Make sure that it is defined before filters which remove keys (eg. PrimaryKeyFilter).
A.2.3. Unable to update the configuration of an installed directory
A.2.4. 'subscription exists' when setting up CDC on Oracle
A.2.5. MySQLExtractException: unknown data type 0
A.2.6. The session variable SQL_MODE when set to include ALLOW_INVALID_DATES does not apply statements correctly on the slave.
A.3. Known Issues
A.3.1. Triggers
A.4. Troubleshooting Timeouts
A.5. Troubleshooting Backups
A.6. Running Out of Diskspace
A.7. Troubleshooting SSH and tpm
A.8. Troubleshooting Data Differences
A.9. Comparing Table Data
A.10. Troubleshooting Memory Usage
B. Release Notes
B.1. Tungsten Replicator 3.0.0 NYR (Not Yet Released)
C. Prerequisites
C.1. Staging Host Configuration
C.2. Host Configuration
C.2.1. Creating the User Environment
C.2.2. Configuring Network and SSH Environment
C.2.2.1. Network Ports
C.2.2.2. SSH Configuration
C.2.3. Directory Locations and Configuration
C.2.4. Configure Software
C.2.5. sudo Configuration
C.3. MySQL Database Setup
C.3.1. MySQL Configuration
C.3.2. MySQL User Configuration
C.3.3. MySQL Unprivileged Users
C.4. Oracle Database Setup
C.4.1. Oracle Environment Variables
C.5. PostgreSQL Database Setup
D. Terminology Reference
D.1. Transaction History Log (THL)
D.1.1. THL Format
D.2. Generated Field Reference
D.2.1. Terminology: Fields accessFailures
D.2.2. Terminology: Fields active
D.2.3. Terminology: Fields activeSeqno
D.2.4. Terminology: Fields appliedLastEventId
D.2.5. Terminology: Fields appliedLastSeqno
D.2.6. Terminology: Fields appliedLatency
D.2.7. Terminology: Fields applier.class
D.2.8. Terminology: Fields applier.name
D.2.9. Terminology: Fields applyTime
D.2.10. Terminology: Fields autoRecoveryEnabled
D.2.11. Terminology: Fields autoRecoveryTotal
D.2.12. Terminology: Fields averageBlockSize
D.2.13. Terminology: Fields blockCommitRowCount
D.2.14. Terminology: Fields cancelled
D.2.15. Terminology: Fields channel
D.2.16. Terminology: Fields channels
D.2.17. Terminology: Fields clusterName
D.2.18. Terminology: Fields commits
D.2.19. Terminology: Fields committedMinSeqno
D.2.20. Terminology: Fields criticalPartition
D.2.21. Terminology: Fields currentBlockSize
D.2.22. Terminology: Fields currentEventId
D.2.23. Terminology: Fields currentLastEventId
D.2.24. Terminology: Fields currentLastFragno
D.2.25. Terminology: Fields currentLastSeqno
D.2.26. Terminology: Fields currentTimeMillis
D.2.27. Terminology: Fields dataServerHost
D.2.28. Terminology: Fields discardCount
D.2.29. Terminology: Fields doChecksum
D.2.30. Terminology: Fields estimatedOfflineInterval
D.2.31. Terminology: Fields eventCount
D.2.32. Terminology: Fields extensions
D.2.33. Terminology: Fields extractTime
D.2.34. Terminology: Fields extractor.class
D.2.35. Terminology: Fields extractor.name
D.2.36. Terminology: Fields filter.#.class
D.2.37. Terminology: Fields filter.#.name
D.2.38. Terminology: Fields filterTime
D.2.39. Terminology: Fields flushIntervalMillis
D.2.40. Terminology: Fields fsyncOnFlush
D.2.41. Terminology: Fields headSeqno
D.2.42. Terminology: Fields intervalGuard
D.2.43. Terminology: Fields lastCommittedBlockSize
D.2.44. Terminology: Fields lastCommittedBlockTime
D.2.45. Terminology: Fields latestEpochNumber
D.2.46. Terminology: Fields logConnectionTimeout
D.2.47. Terminology: Fields logDir
D.2.48. Terminology: Fields logFileRetainMillis
D.2.49. Terminology: Fields logFileSize
D.2.50. Terminology: Fields masterConnectUri
D.2.51. Terminology: Fields masterListenUri
D.2.52. Terminology: Fields maxChannel
D.2.53. Terminology: Fields maxDelayInterval
D.2.54. Terminology: Fields maxOfflineInterval
D.2.55. Terminology: Fields maxSize
D.2.56. Terminology: Fields maximumStoredSeqNo
D.2.57. Terminology: Fields minimumStoredSeqNo
D.2.58. Terminology: Fields name
D.2.59. Terminology: Fields offlineRequests
D.2.60. Terminology: Fields otherTime
D.2.61. Terminology: Fields pendingError
D.2.62. Terminology: Fields pendingErrorCode
D.2.63. Terminology: Fields pendingErrorEventId
D.2.64. Terminology: Fields pendingErrorSeqno
D.2.65. Terminology: Fields pendingExceptionMessage
D.2.66. Terminology: Fields pipelineSource
D.2.67. Terminology: Fields processedMinSeqno
D.2.68. Terminology: Fields queues
D.2.69. Terminology: Fields readOnly
D.2.70. Terminology: Fields relativeLatency
D.2.71. Terminology: Fields resourcePrecedence
D.2.72. Terminology: Fields rmiPort
D.2.73. Terminology: Fields role
D.2.74. Terminology: Fields seqnoType
D.2.75. Terminology: Fields serializationCount
D.2.76. Terminology: Fields serialized
D.2.77. Terminology: Fields serviceName
D.2.78. Terminology: Fields serviceType
D.2.79. Terminology: Fields shard_id
D.2.80. Terminology: Fields simpleServiceName
D.2.81. Terminology: Fields siteName
D.2.82. Terminology: Fields sourceId
D.2.83. Terminology: Fields stage
D.2.84. Terminology: Fields started
D.2.85. Terminology: Fields state
D.2.86. Terminology: Fields stopRequested
D.2.87. Terminology: Fields store.#
D.2.88. Terminology: Fields storeClass
D.2.89. Terminology: Fields syncInterval
D.2.90. Terminology: Fields taskCount
D.2.91. Terminology: Fields taskId
D.2.92. Terminology: Fields timeInStateSeconds
D.2.93. Terminology: Fields timeoutMillis
D.2.94. Terminology: Fields totalAssignments
D.2.95. Terminology: Fields transitioningTo
D.2.96. Terminology: Fields uptimeSeconds
D.2.97. Terminology: Fields version
E. Files and Directories
E.1. The Continuent Tungsten Install Directory
E.1.1. The backups Directory
E.1.1.1. Purging Backup Files
E.1.1.2. Copying Backup Files
E.1.1.3. Relocating Backup Storage
E.1.2. The confs Directory
E.1.3. The releases Directory
E.1.4. The service_logs Directory
E.1.5. The share Directory
E.1.6. The thl Directory
E.1.6.1. Purging THL Log Information
E.1.6.2. Moving the THL File Location
E.1.6.3. Changing the THL Retention Times
F. Internals
F.1. Extending Backup and Restore Behavior
F.1.1. Backup Behavior
F.1.2. Restore Behavior
F.1.3. Writing a Custom Backup/Restore Script
F.1.4. Enabling a Custom Backup Script
F.2. Character Sets in Database and Continuent Tungsten
F.3. Memory Tuning and Performance
F.3.1. Understanding Tungsten Replicator Memory Tuning
G. Frequently Asked Questions (FAQ)
H. Ecosystem Support
H.1. Continuent Github Repositories
H.2. Managing Log Files with logrotate
H.3. Monitoring Status Using cacti
H.4. Monitoring Status Using nagios