Tungsten Clustering (for MySQL) 7.1 Manual

Continuent Ltd

Abstract

This manual documents Tungsten Clustering (for MySQL), a high-performance clustering solution providing High Availability and Disaster Recovery for MySQL.

This manual includes information for 7.1, up to and including 7.1.4.

Build date: 2024-11-28 (505168f6)

Up-to-date builds of this document: Tungsten Clustering (for MySQL) 7.1 Manual (Online), Tungsten Clustering (for MySQL) 7.1 Manual (PDF)


Table of Contents

Preface
1. Legal Notice
2. Conventions
3. Quickstart Guide
1. Introduction
1.1. Tungsten Replicator
1.1.1. Transaction History Log (THL)
1.2. Tungsten Manager
1.3. Tungsten Connector
2. Deployment
2.1. Host Types
2.1.1. Manager Hosts
2.1.2. Connector (Router) Hosts
2.1.3. Replicator Hosts
2.1.4. Active Witness Hosts
2.2. Requirements
2.2.1. Operating Systems Support
2.2.2. Database Support
2.2.2.1. Version Support Matrix
2.2.2.2. MySQL "Innovation" Releases
2.2.3. RAM Requirements
2.2.4. Disk Requirements
2.2.5. Java Requirements
2.2.6. Cloud Deployment Requirements
2.2.7. Docker Support Policy
2.2.7.1. Overview
2.2.7.2. Background
2.2.7.3. Current State
2.2.7.4. Summary
2.3. Deployment Sources
2.3.1. Using the TAR/GZipped files
2.3.2. Using the RPM package files
2.4. Common tpm Options During Deployment
2.5. Best Practices
2.5.1. Best Practices: Deployment
2.5.2. Best Practices: Upgrade
2.5.3. Best Practices: Operations
2.5.4. Best Practices: Maintenance
3. Deployment: MySQL Topologies
3.1. Deploying Standalone HA Clusters
3.1.1. Prepare: Standalone HA Cluster
3.1.2. Install: Standalone HA Cluster
3.1.3. Best Practices: Standalone HA Cluster
3.2. Deploying Composite Active/Passive Clustering
3.2.1. Prepare: Composite Active/Passive Cluster
3.2.2. Install: Composite Active/Passive Cluster
3.2.3. Best Practices: Composite Active/Passive Cluster
3.2.4. Adding a remote Composite Cluster
3.3. Deploying Multi-Site/Active-Active Clustering
3.3.1. Prepare: Multi-Site/Active-Active Clusters
3.3.2. Install: Multi-Site/Active-Active Clusters
3.3.3. Best Practices: Multi-Site/Active-Active Clusters
3.3.4. Configuring Startup on Boot
3.3.5. Resetting a single dataservice
3.3.6. Resetting all dataservices
3.3.7. Provisioning during live operations
3.3.8. Adding a new Cluster/Dataservice
3.3.9. Enabling SSL for Replicators Only
3.3.10. Dataserver maintenance
3.3.10.1. Fixing Replication Errors
3.4. Deploying Composite Active/Active Clusters
3.4.1. Prepare: Composite Active/Active Clusters
3.4.2. Install: Composite Active/Active Clusters
3.4.3. Best Practices: Composite Active/Active Clusters
3.4.4. Configuring Startup on Boot
3.4.5. Resetting a single dataservice
3.4.6. Resetting all dataservices
3.4.7. Dataserver maintenance
3.4.7.1. Fixing Replication Errors
3.4.8. Adding a Cluster to a Composite Active/Active Topology
3.4.8.1. Pre-Requisites
3.4.8.2. Backup and Restore
3.4.8.3. Update Existing Configuration
3.4.8.4. New Host Configuration
3.4.8.5. Install on new nodes
3.4.8.6. Update existing nodes
3.4.8.7. Start the new cluster
3.4.8.8. Validate and check
3.5. Deploying Composite Dynamic Active/Active
3.5.1. Enabling Composite Dynamic Active/Active
3.6. Deploying Tungsten Connector Only
3.7. Deploying Additional Datasources, Managers, or Connectors
3.7.1. Adding Datasources to an Existing Deployment
3.7.2. Adding Active Witnesses to an Existing Deployment
3.7.3. Replacing an Active Witness as a Full Cluster Node
3.7.4. Replacing a Full Cluster Node as an Active Witness
3.7.5. Adding Connectors to an Existing Deployment
3.7.6. Converting from a single cluster to a composite cluster
3.7.6.1. Convert and add new nodes as a new service
3.7.6.2. Convert and move nodes to a new service
3.8. Replicating Data Into an Existing Dataservice
3.9. Replicating Data Out of a Cluster
3.9.1. Prepare: Replicating Data Out of a Cluster
3.9.2. Deploy: Replicating Data Out of a Cluster
3.10. Replicating from a Cluster to a Datawarehouse
3.10.1. Replicating from a Cluster to a Datawarehouse - Prerequisites
3.10.2. Replicating from a Cluster to a Datawarehouse - Configuring the Cluster Nodes
3.10.3. Replicating from a Cluster to a Datawarehouse - Configuring the Cluster-Extractor
3.10.3.1. Replicating Data from a Cluster to a Datawarehouse (Staging Use Case)
3.10.3.2. Replicating Data from a Cluster to a Datawarehouse (INI Use Case)
3.11. Migrating and Seeding Data
3.11.1. Migrating from MySQL Native Replication 'In-Place'
3.11.2. Migrating from MySQL Native Replication Using a New Service
3.11.3. Seeding Data through MySQL
4. Deployment: Advanced
4.1. Deploying Parallel Replication
4.1.1. Application Prerequisites for Parallel Replication
4.1.2. Enabling Parallel Apply During Install
4.1.3. Channels
4.1.4. Parallel Replication and Offline Operation
4.1.4.1. Clean Offline Operation
4.1.4.2. Tuning the Time to Go Offline Cleanly
4.1.4.3. Unclean Offline
4.1.5. Adjusting Parallel Replication After Installation
4.1.5.1. How to Enable Parallel Apply After Installation
4.1.5.2. How to Change Channels Safely
4.1.5.3. How to Disable Parallel Replication Safely
4.1.5.4. How to Switch Parallel Queue Types Safely
4.1.6. Monitoring Parallel Replication
4.1.6.1. Useful Commands for Monitoring Parallel Replication
4.1.6.2. Parallel Replication and Applied Latency On Replicas
4.1.6.3. Relative Latency
4.1.6.4. Serialization Count
4.1.6.5. Maximum Offline Interval
4.1.6.6. Workload Distribution
4.1.7. Controlling Assignment of Shards to Channels
4.1.8. Disk vs. Memory Parallel Queues
4.2. Distributed Datasource Groups
4.2.1. Introduction to DDG
4.2.2. How DDG Works
4.2.3. Configuring DDG
4.3. Starting and Stopping Tungsten Cluster
4.3.1. Restarting the Replicator Service
4.3.2. Restarting the Connector Service
4.3.3. Restarting the Manager Service
4.3.4. Restarting the Multi-Site/Active-Active Replicator Service
4.4. Configuring Startup on Boot
4.4.1. Configuring Multi-Site/Active-Active Replicator Startup on Boot
4.5. Upgrading Tungsten Cluster
4.5.1. Upgrading using the Staging Method (with ssh Access)
4.5.2. Upgrading when using INI-based configuration, or without ssh Access
4.5.2.1. Upgrading
4.5.2.2. Upgrading a Single Host using tpm
4.5.3. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Passive (CAP)
4.5.3.1. Conversion Prerequisites
4.5.3.2. Step 1: Backups
4.5.3.3. Step 2: Redirect Client Connections
4.5.3.4. Step 3: Enter Maintenance Mode
4.5.3.5. Step 4: Stop the Cross-site Replicators
4.5.3.6. Step 5: Export the tracking schema databases
4.5.3.7. Step 6: Uninstall the Cross-site Replicators
4.5.3.8. Step 7: Create Composite Tracking Schema
4.5.3.9. Step 8: Reload the tracking schema for Passive clusters
4.5.3.10. Step 9: Stop local cluster Replicators
4.5.3.11. Step 10: Remove THL
4.5.3.12. Step 11: Export the tracking schema database on Active cluster
4.5.3.13. Step 12: Reload the tracking schema for Active cluster
4.5.3.14. Step 13: Update Configuration
4.5.3.15. Step 14: Install the Software on Active Cluster
4.5.3.16. Step 15: Start Local Replicators on Active cluster
4.5.3.17. Step 16: Install the Software on remaining Clusters
4.5.3.18. Step 17: Start Local Replicators on remaining clusters
4.5.3.19. Step 18: Convert Datasource roles for Passive clusters
4.5.3.20. Step 19: Upgrade the Software on Connectors
4.5.4. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Active (CAA)
4.5.4.1. Supported Upgrade Paths
4.5.4.2. Upgrade Prerequisites
4.5.4.3. Step 1: Backups
4.5.4.4. Step 2: Stop the Cross-site Replicators
4.5.4.5. Step 3: Export the tungsten_* Databases
4.5.4.6. Step 4: Uninstall the Cross-site Replicators
4.5.4.7. Step 5: Reload the tracking schema
4.5.4.8. Step 6: Update Configuration
4.5.4.9. Step 7: Enter Maintenance Mode
4.5.4.10. Step 8: Stop Managers
4.5.4.11. Step 9: Install/Update the Software
4.5.4.12. Step 10: Start Managers
4.5.4.13. Step 11: Return to Automatic Mode
4.5.4.14. Step 12: Validate
4.5.5. Installing an Upgraded JAR Patch
4.5.6. Installing Patches
4.5.7. Upgrading to v7.0.0+
4.5.7.1. Background
4.5.7.2. Upgrade Decisions
4.5.7.3. Setup internal encryption and authentication
4.5.7.4. Enable Tungsten to Database Encryption
4.5.7.5. Enable Connector to Database Encryption
4.5.7.6. Enable MySQL SSL
4.5.7.7. Steps to upgrade using tpm
4.5.7.8. Optional Post-Upgrade steps to configure API
4.6. Removing Datasources, Managers or Connectors
4.6.1. Removing a Datasource from an Existing Deployment
4.6.2. Removing a Composite Datasource/Cluster from an Existing Deployment
4.6.3. Removing a Connector from an Existing Deployment
5. Deployment: Security
5.1. Enabling Security
5.1.1. Enabling Security using the Staging Method
5.1.2. Enabling Security using the INI Method
5.2. Disabling Security
5.3. Creating Suitable Certificates
5.3.1. Creating Tungsten Internal Certificates Using tpm cert
5.3.2. Creating Tungsten Internal Certificates Manually
5.4. Installing from a Staging Host with Custom Certificates
5.4.1. Installing from a Staging Host with Manually-Generated Certificates
5.4.2. Installing from a Staging Host with Certificates Generated by tpm cert
5.5. Installing via INI File with Custom Certificates
5.5.1. Installing via INI File with Manually-Generated Certificates
5.5.2. Installing via INI File with Certificates Generated by tpm cert
5.6. Installing via INI File with CA-Signed Certificates
5.7. Replacing the JGroups Certificate from a Staging Directory
5.8. Replacing the TLS Certificate from a Staging Directory
5.9. Removing JGroups Encryption from a Staging Directory
5.10. Removing JGroups Encryption via INI File
5.11. Removing TLS Encryption from a Staging Directory
5.12. Removing TLS Encryption via INI File
5.13. Enabling Tungsten<>Database Security
5.13.1. Enabling Database SSL
5.13.1.1. Generate the Database Certs
5.13.1.2. Common Steps for Enabling Database SSL
5.13.2. Configure Tungsten<>Database Secure Communication
5.13.3. Configuring Connector SSL
5.13.3.1. Enable and Test SSL encryption from the Connector to the Database
5.13.3.2. Test SSL encryption from the Application to the Database
6. Operations Guide
6.1. The Home Directory
6.2. Establishing the Shell Environment
6.3. Checking Dataservice Status
6.3.1. Latency or Relative Latency Display
6.3.2. Getting Detailed Information
6.3.3. Understanding Datasource Roles
6.3.4. Understanding Datasource States
6.3.4.1. ONLINE State
6.3.4.2. OFFLINE State
6.3.4.3. FAILED State
6.3.4.4. SHUNNED State
6.3.5. Understanding Replicator Roles
6.3.6. Changing Datasource States
6.3.6.1. Shunning a Datasource
6.3.6.2. Recover a Datasource
6.3.6.3. Offline a Datasource
6.3.6.4. Mark a Datasource as Standby
6.3.6.5. Mark a Datasource as Archive
6.3.7. Datasource Statuses
6.3.8. Datasource States and Policy Mode Interactions
6.4. Policy Modes
6.4.1. Setting Policy Modes
6.5. Switching Primary Hosts
6.5.1. Automatic Primary Failover
6.5.2. Manual Primary Switch
6.6. Datasource Recovery Steps
6.6.1. Recover a failed Replica
6.6.1.1. Provision or Reprovision a Replica
6.6.1.2. Recover a Replica from manually shunned state
6.6.1.3. Replica Datasource Extended Recovery
6.6.2. Recover a failed Primary
6.6.2.1. Recover when there are no Primaries
6.6.2.2. Recover a shunned Primary
6.6.2.3. Manually Failing over a Primary in MAINTENANCE policy mode
6.6.2.4. Failing over a Primary
6.6.2.5. Split-Brain Discussion
6.7. Composite Cluster Switching, Failover and Recovery
6.7.1. Composite Cluster Site Switch
6.7.2. Composite Cluster Site Failover (Forced Switch)
6.7.3. Composite Cluster Site Recovery
6.7.4. Composite Cluster Relay Recovery
6.8. Composite Active/Active Recovery
6.9. Managing Transaction Failures
6.9.1. Identifying a Transaction Mismatch
6.9.2. Skipping Transactions
6.10. Creating a Backup
6.10.1. Using a Different Backup Tool
6.10.2. Automating Backups
6.10.3. Using a Different Directory Location
6.10.4. Creating an External Backup
6.11. Restoring a Backup
6.11.1. Restoring a Specific Backup
6.11.2. Restoring an External Backup
6.11.3. Restoring from Another Replica
6.11.4. Manually Recovering from Another Replica
6.11.5. Reprovision a MySQL Replica using rsync
6.11.6. Rebuilding a Lost Datasource
6.11.7. Resetting an Entire Dataservice from Filesystem Snapshots
6.12. Migrating and Seeding Data
6.12.1. Migrating from MySQL Native Replication 'In-Place'
6.12.2. Migrating from MySQL Native Replication Using a New Service
6.13. Resetting a Tungsten Cluster Dataservice
6.13.1. Reset a Single Site in a Multi-Site/Active-Active Topology
6.13.2. Reset All Sites in a Multi-Site/Active-Active topology
6.14. Replicator Fencing
6.14.1. Fencing a Replica Node Due to a Replication Fault
6.14.2. Fencing Primary Replicators
6.15. Performing Database or OS Maintenance
6.15.1. Performing Maintenance on a Single Replica
6.15.2. Performing Maintenance on a Primary
6.15.3. Performing Maintenance on an Entire Dataservice
6.15.4. Making Schema Changes
6.15.5. Upgrading or Updating your JVM
6.15.6. Upgrading between MySQL 5.x and MySQL 8.x
6.16. Performing Tungsten Cluster Maintenance
6.17. Monitoring Tungsten Cluster
6.17.1. Managing Log Files with logrotate
6.17.2. Monitoring Status Using cacti
6.17.3. Monitoring Status Using nagios
6.17.4. Monitoring Status Using Prometheus Exporters
6.17.4.1. Monitoring Status with Exporters Overview
6.17.4.2. Customizing the Prometheus Exporter Configuration
6.17.4.3. Disabling the Prometheus Exporters
6.17.4.4. Managing and Testing Exporters Using the tmonitor Command
6.17.4.5. Monitoring Node Status Using the External node_exporter
6.17.4.6. Monitoring MySQL Server Status Using the External mysqld_exporter
6.17.4.7. Monitoring Tungsten Replicator Status Using the Built-In Exporter
6.17.4.8. Monitoring Tungsten Manager Status Using the Built-In Exporter
6.17.4.9. Monitoring Tungsten Connector Status Using the Built-In Exporter
6.17.4.10. Alerting Using Prometheus Rules
6.18. Rebuilding THL on the Primary
6.19. THL Encryption and Compression
6.19.1. In-Flight Compression
6.19.2. Encryption and Compression On-Disk
7. Tungsten Connector
7.1. Connector/Application Basics
7.1.1. Connector Control Flow
7.2. Basic Connector Configuration
7.3. Clients and Deployment
7.3.1. Connection Pools
7.4. Routing Methods
7.4.1. Connector Routing Methods
7.4.2. Primary/Replica Selection
7.4.3. Connector Load Balancers
7.4.4. Specifying Required Latency
7.4.4.1. Applied and Relative Latency Comparison
7.4.4.2. Advanced Troubleshooting for Latency-based Routing
7.4.5. Setting Read Affinity and Direct Reads
7.4.6. Setting Read/Write Affinity in Composite Active/Active Environments
7.4.7. Setting Negative Affinity
7.4.8. Routing using embedded syntax in connect string
7.4.9. Connector Datasource Selection in Composite Clusters
7.4.10. Smartscale Routing
7.4.10.1. Specifying the Session ID
7.4.10.2. Enabling SmartScale Routing
7.4.10.3. Disabling SmartScale Routing
7.4.10.4. SmartScale Exploit
7.4.11. Direct Routing
7.4.11.1. Enabling Direct Routing
7.4.11.2. Limitations of Direct Routing
7.4.12. SQL Routing
7.4.12.1. Enabling SQL Routing
7.4.12.2. Limitations of SQL Routing
7.4.13. Host-based Routing
7.4.13.1. Enabling Host-based Routing
7.4.13.2. Limitations of Host-based Routing
7.4.14. Port-based Routing
7.4.14.1. Enabling Port-based Routing
7.4.15. Read-only Routing
7.5. Using Bridge Mode
7.5.1. Enabling Bridge Mode
7.6. User Authentication
7.6.1. user.map File Format
7.6.2. Dual-Paswords and MySQL 8
7.6.3. user.map Direct Routing
7.6.4. user.map Host Options
7.6.5. user.map Updates
7.6.6. Generating user.map Entries from a Script
7.6.7. Encrypting user.map Data
7.6.8. Synchronizing user.map Data
7.6.9. user.map Limitations
7.6.10. Host-based Authentication
7.7. Testing Connectivity Via the Connector
7.7.1. Testing Connectivity in Bridge Mode
7.7.2. Testing Connectivity in Proxy Mode with No R/W Splitting Enabled
7.7.3. Testing Connectivity in Proxy Mode with R/W Splitting Enabled (SmartScale or @direct)
7.8. Connector Operational States
7.8.1. Connections During Automatic Failure/Failover
7.8.2. Connections During Manual Switch
7.8.3. Connections During Connection Failures
7.8.4. Other Errors
7.8.5. Connector Keepalive
7.8.6. Connector Change User as Ping
7.9. Connector/Manager Interface
7.10. Connector Command-Line Interface
7.11. Connector Inline Command Interface
7.11.1. Connector tungsten cluster status Command
7.11.1.1. Connector connector cluster status on the Command-line
7.11.2. Connector tungsten connection count Command
7.11.3. Connector tungsten connection status Command
7.11.4. Connector tungsten flush privileges Command
7.11.5. Connector tungsten gc Command
7.11.6. Connector tungsten help Command
7.11.7. Connector tungsten mem info Command
7.11.8. Connector tungsten show [full] processlist Command
7.11.9. Connector show slave status Command
7.11.10. Connector tungsten show variables Command
7.12. Advanced Configuration
7.12.1. Working with Proxy Protocol v1
7.12.2. Using Multiple Dataservices
7.12.3. Advanced Listeners
7.12.4. Connector Automatic Reconnect
7.12.5. Using the Connector with HA Proxy
7.12.5.1. Configuring HA Proxy using the native MySQL Check
7.12.5.2. Configuring HA Proxy with a Check Script
7.12.6. Using Fall-Back Bridge Mode
7.12.6.1. Using Fall-Back SSL To Bridge Mode
7.12.7. Using the Max Connections Feature
7.12.8. Adjusting the Client Disconnect Delay During Manual Switch
7.12.9. Adjusting the Bridge Mode Forced Client Disconnect Timeout
7.12.10. Adjusting the Connnector Response to Resource Losses
7.12.10.1. Adjusting the Connnector Response to Datasource Loss
7.12.10.2. Adjusting the Connnector Response to Manager Loss
7.12.11. Connector Logging Configuration
7.12.11.1. Connector Logging to Syslog
7.12.12. Connector Audit Logging
7.12.13. Connector SSL Advertisement Configuration
7.12.14. Connector IP Address Configuration
7.12.15. Deploying a Connector through Docker
7.13. Connector General Limitations
8. Tungsten Manager
8.1. Tungsten Manager Introduction
8.1.1. How Does Failover Work?
8.1.1.1. Roles for Nodes and Clusters
8.1.1.2. Moving the Primary Role to Another Node or Cluster
8.1.2. Best Practices for Proper Cluster Failovers
8.1.3. Command-Line Monitoring Tools
8.2. Tungsten Manager Failover Behavior
8.2.1. Failover Replication State Scenarios
8.2.2. Recovery Behavior After Failover
8.2.3. Failover Response when MySQL Server Fails
8.2.4. Failover Response when Replica Applier is Latent
8.3. Tungsten Manager Failover Tuning
8.4. Tungsten Manager Failover Internals
8.4.1. Manual Switch Versus Automatic Failover
8.4.2. Switch and Failover Steps for Local Clusters
8.4.3. Switch and Failover Steps for Composite Services
8.5. Tungsten Manager Fault Detection, Fencing and Recovery
8.5.1. Tungsten Manager Definitions
8.5.2. Cluster Monitoring and Notification Events
8.5.3. Rule Organization - Detection, Investigation, Fencing, Recovery
8.6. Cluster State Savepoints
8.7. Adjusting JVM Settings for the Manager
9. Command-line Tools
9.1. The cctrl Command
9.1.1. cctrl Command-line Options
9.1.2. cctrl Modes
9.1.3. cctrl Commands
9.1.3.1. cctrl admin Command
9.1.3.2. cctrl cd Command
9.1.3.3. cctrl cluster Command
9.1.3.4. cctrl create composite Command
9.1.3.5. cctrl datasource Command
9.1.3.6. cctrl expert Command
9.1.3.7. cctrl failover Command
9.1.3.8. cctrl help Command
9.1.3.9. cctrl ls Command
9.1.3.10. cctrl members Command
9.1.3.11. cctrl physical Command
9.1.3.12. cctrl ping Command
9.1.3.13. cctrl quit Command
9.1.3.14. cctrl recover Command
9.1.3.15. cctrl recover master using Command
9.1.3.16. cctrl recover relay using Command
9.1.3.17. cctrl recover using Command
9.1.3.18. cctrl replicator Command
9.1.3.19. cctrl rm Command
9.1.3.20. cctrl router Command
9.1.3.21. cctrl service Command
9.1.3.22. cctrl set Command
9.1.3.23. cctrl show topology Command
9.1.3.24. cctrl set master Command
9.1.3.25. cctrl switch Command
9.2. The check_tungsten_latency Command
9.3. The check_tungsten_online Command
9.4. The check_tungsten_policy Command
9.5. The check_tungsten_progress Command
9.6. The check_tungsten_services Command
9.7. The clean_release_directory Command
9.8. The cluster_backup Command
9.9. The connector Command
9.10. The ddlscan Command
9.10.1. Optional Arguments
9.10.2. Supported Templates and Usage
9.10.2.1. ddl-check-pkeys.vm
9.10.2.2. ddl-mysql-hive-0.10.vm
9.10.2.3. ddl-mysql-hive-0.10-staging.vm
9.10.2.4. ddl-mysql-hive-metadata.vm
9.10.2.5. ddl-mysql-oracle.vm
9.10.2.6. ddl-mysql-oracle-cdc.vm
9.10.2.7. ddl-mysql-redshift.vm
9.10.2.8. ddl-mysql-redshift-staging.vm
9.10.2.9. ddl-mysql-vertica.vm
9.10.2.10. ddl-mysql-vertica-staging.vm
9.10.2.11. ddl-oracle-mysql.vm
9.10.2.12. ddl-oracle-mysql-pk-only.vm
9.11. The deployall Command
9.12. The dsctl Command
9.12.1. dsctl get Command
9.12.2. dsctl set Command
9.12.3. dsctl reset Command
9.12.4. dsctl help Command
9.13. env.sh Script
9.14. The load-reduce-check Tool
9.14.1. Generating Staging DDL
9.14.2. Generating Live DDL
9.14.3. Materializing a View
9.14.4. Generating Sqoop Load Commands
9.14.5. Generating Metadata
9.14.6. Compare Loaded Data
9.15. The manager Command
9.16. The materialize Command
9.17. The multi_trepctl Command
9.17.1. multi_trepctl Options
9.17.2. multi_trepctl Commands
9.17.2.1. multi_trepctl backups Command
9.17.2.2. multi_trepctl heartbeat Command
9.17.2.3. multi_trepctl masterof Command
9.17.2.4. multi_trepctl list Command
9.17.2.5. multi_trepctl run Command
9.18. The query Command
9.19. The replicator Command
9.20. The startall Command
9.21. The stopall Command
9.22. The tapi Command
9.23. The thl Command
9.23.1. thl Position Commands
9.23.2. thl dsctl Command
9.23.3. thl list Command
9.23.4. thl tail Command
9.23.5. thl index Command
9.23.6. thl purge Command
9.23.7. thl info Command
9.23.8. thl help Command
9.24. The trepctl Command
9.24.1. trepctl Options
9.24.2. trepctl Global Commands
9.24.2.1. trepctl kill Command
9.24.2.2. trepctl services Command
9.24.2.3. trepctl servicetable Command
9.24.2.4. trepctl thl Command
9.24.2.5. trepctl version Command
9.24.3. trepctl Service Commands
9.24.3.1. trepctl backup Command
9.24.3.2. trepctl capabilities Command
9.24.3.3. trepctl check Command
9.24.3.4. trepctl clear Command
9.24.3.5. trepctl clients Command
9.24.3.6. trepctl error Command
9.24.3.7. trepctl flush Command
9.24.3.8. trepctl heartbeat Command
9.24.3.9. trepctl load Command
9.24.3.10. trepctl offline Command
9.24.3.11. trepctl offline-deferred Command
9.24.3.12. trepctl online Command
9.24.3.13. trepctl pause Command
9.24.3.14. trepctl perf Command
9.24.3.15. trepctl properties Command
9.24.3.16. trepctl purge Command
9.24.3.17. trepctl qs Command
9.24.3.18. trepctl reset Command
9.24.3.19. trepctl restore Command
9.24.3.20. trepctl resume Command
9.24.3.21. trepctl setdynamic Command
9.24.3.22. trepctl setrole Command
9.24.3.23. trepctl shard Command
9.24.3.24. trepctl status Command
9.24.3.25. trepctl unload Command
9.24.3.26. trepctl wait Command
9.25. The tmonitor Command
9.26. The tpasswd Command
9.27. The tprovision Script
9.28. The tungsten_find_orphaned Command
9.28.1. Examples
9.29. The tungsten_find_position Command
9.30. The tungsten_find_seqno Command
9.31. The tungsten_get_mysql_datadir Script
9.32. The tungsten_get_status Script
9.33. The tungsten_get_ports Script
9.34. The tungsten_health_check Script
9.35. The tungsten_merge_logs Script
9.36. The tungsten_monitor Script
9.37. The tungsten_mysql_ssl_setup Script
9.38. The tungsten_newrelic_event Command
9.39. The tungsten_nagios_backups Command
9.40. The tungsten_nagios_online Command
9.41. The tungsten_post_process Command
9.42. The tungsten_prep_upgrade Script
9.43. The tungsten_provision_thl Command
9.43.1. Provisioning from RDS
9.43.2. tungsten_provision_thl Reference
9.44. The tungsten_purge_thl Command
9.45. The tungsten_reset_manager Command
9.46. The tungsten_send_diag Script
9.47. The tungsten_skip_all Command
9.48. The tungsten_show_processlist Script
9.49. The tungsten_skip_seqno Script
9.50. The undeployall Command
9.51. The zabbix_tungsten_latency Command
9.52. The zabbix_tungsten_online Command
9.53. The zabbix_tungsten_progress Command
9.54. The zabbix_tungsten_services Command
10. The tpm Deployment Command
10.1. Comparing Staging and INI tpm Methods
10.2. Processing Installs and Upgrades
10.3. tpm Staging Configuration
10.3.1. Configuring default options for all services
10.3.2. Configuring a single service
10.3.3. Configuring a single host
10.3.4. Reviewing the current configuration
10.3.5. Installation
10.3.5.1. Installing a set of specific services
10.3.5.2. Installing a set of specific hosts
10.3.6. Upgrades from a Staging Directory
10.3.7. Configuration Changes from a Staging Directory
10.3.8. Converting from INI to Staging
10.4. tpm INI File Configuration
10.4.1. Creating an INI file
10.4.2. Installation with INI File
10.4.3. Upgrades with an INI File
10.4.4. Configuration Changes with an INI file
10.4.5. Converting from Staging to INI
10.4.6. Using the translatetoini.pl Script
10.5. tpm Commands
10.5.1. tpm ask Command
10.5.2. tpm cert Command
Introduction
10.5.2.1. tpm cert Usage
10.5.2.2. tpm cert {typeSpec}, Defined
10.5.2.3. {typeSpec} definitions
10.5.2.4. {passwordSpec} definitions
10.5.2.5. tpm cert: Getting Started - Basic Examples
10.5.2.6. tpm cert: Getting Started - Functional Database Cert Rotation Example
10.5.2.7. tpm cert: Getting Started - Conversion to Custom-Generated Security Files Example
10.5.2.8. tpm cert: Getting Started - Advanced Example
10.5.2.9. Using tpm cert add
10.5.2.10. Using tpm cert aliases
10.5.2.11. Using tpm cert ask
10.5.2.12. Using tpm cert backup
10.5.2.13. Using tpm cert cat
10.5.2.14. Using tpm cert changepass
10.5.2.15. Using tpm cert clean
10.5.2.16. Using tpm cert copy
10.5.2.17. Using tpm cert diff
10.5.2.18. Using tpm cert example
10.5.2.19. Using tpm cert info
10.5.2.20. Using tpm cert list
10.5.2.21. Using tpm cert gen
10.5.2.22. Using tpm cert remove
10.5.2.23. Using tpm cert rotate
10.5.2.24. Using tpm cert vi
10.5.3. tpm check Command
10.5.3.1. tpm check ini Command
10.5.3.2. tpm check ports Command
10.5.4. tpm configure Command
10.5.5. tpm connector Command
10.5.5.1. tpm connector --hosts Command
10.5.5.2. tpm connector --dataservice-name Command
10.5.5.3. tpm connector --samples Command
10.5.6. tpm copy Command
10.5.7. tpm delete-service Command
10.5.8. tpm diag Command
10.5.9. tpm fetch Command
10.5.10. tpm firewall Command
10.5.11. tpm find-seqno Command
10.5.12. tpm generate-haproxy-for-api Command
10.5.13. tpm help Command
10.5.14. tpm install Command
10.5.15. tpm keep Command
10.5.16. tpm mysql Command
10.5.17. tpm policy Command
10.5.18. tpm post-process Command
10.5.19. tpm promote-connector Command
10.5.20. tpm purge-thl Command
10.5.21. tpm query Command
10.5.21.1. tpm query config
10.5.21.2. tpm query dataservices
10.5.21.3. tpm query deployments
10.5.21.4. tpm query manifest
10.5.21.5. tpm query modified-files
10.5.21.6. tpm query staging
10.5.21.7. tpm query topology
10.5.21.8. tpm query usermap
10.5.21.9. tpm query version
10.5.22. tpm report Command
10.5.23. tpm reset Command
10.5.24. tpm reset-thl Command
10.5.25. tpm reverse Command
10.5.26. tpm uninstall Command
10.5.27. tpm update Command
10.5.28. tpm validate Command
10.5.29. tpm validate-update Command
10.6. tpm Common Options
10.7. tpm Validation Checks
10.8. tpm Configuration Options
10.8.1. A tpm Options
10.8.2. B tpm Options
10.8.3. C tpm Options
10.8.4. D tpm Options
10.8.5. E tpm Options
10.8.6. F tpm Options
10.8.7. H tpm Options
10.8.8. I tpm Options
10.8.9. J tpm Options
10.8.10. L tpm Options
10.8.11. M tpm Options
10.8.12. N tpm Options
10.8.13. O tpm Options
10.8.14. P tpm Options
10.8.15. R tpm Options
10.8.16. S tpm Options
10.8.17. T tpm Options
10.8.18. U tpm Options
10.8.19. W tpm Options
11. Tungsten REST API (APIv2)
11.1. Getting Started with Tungsten REST API
11.1.1. Configuring the API
11.1.1.1. Network Ports
11.1.1.2. User Management
11.1.1.3. SSL/Encryption
11.1.1.4. Enabling and Disabling the API
11.1.2. How to Access the API
11.1.2.1. CURL calls and Examples
11.1.2.2. tapi
11.1.2.3. External Tools
11.1.3. Data Structures
11.1.3.1. Generic Payloads
11.1.3.2. INPUT and OUTPUT payloads
11.1.3.3. TAPI Datastructures
11.2. Proxy (Connector) API Specifics
11.2.1. Proxy States
11.2.2. List and Set Proxy Configuration
11.2.3. User Map
11.2.4. Cluster Configuration Manipulation
11.2.4.1. Creating a Data Service
11.2.4.2. Adding a Data Source
11.2.4.3. Creating a Full Cluster Configuration
11.2.4.4. Data Source Changes, Switches and Failovers
11.2.4.5. Resetting the Cluster Configuration
11.3. Manager API Specifics
11.3.1. Manager Status
11.3.2. Cluster Topology
11.3.3. Service Status
11.3.4. Datasource Status
11.3.5. Cluster Control Commands
11.3.5.1. Ping a Datasource
11.3.5.2. Issue a Switch
11.4. Replicator API Specifics
11.4.1. Replicator Endpoints
11.4.1.1. services
11.4.1.2. status
11.4.1.3. version
11.4.1.4. offline/online
11.4.1.5. purge
11.4.1.6. reset
11.4.2. Service Endpoints
11.4.2.1. backupCapabilities
11.4.2.2. backups
11.4.2.3. backup / restore
11.4.2.4. setrole
11.4.3. Service THL Endpoints
11.4.3.1. compression / encryption
11.4.3.2. genkey
12. Replication Filters
12.1. Enabling/Disabling Filters
12.2. Enabling Additional Filters
12.3. Filter Status
12.4. Filter Reference
12.4.1. ansiquotes.js Filter
12.4.2. BidiRemoteSlave (BidiSlave) Filter
12.4.3. breadcrumbs.js Filter
12.4.4. CaseTransform Filter
12.4.5. ColumnName Filter
12.4.6. ConvertStringFromMySQL Filter
12.4.7. DatabaseTransform (dbtransform) Filter
12.4.8. dbrename.js Filter
12.4.9. dbselector.js Filter
12.4.10. dbupper.js Filter
12.4.11. dropcolumn.js Filter
12.4.12. dropcomments.js Filter
12.4.13. dropddl.js Filter
12.4.14. dropmetadata.js Filter
12.4.15. droprow.js Filter
12.4.16. dropstatementdata.js Filter
12.4.17. dropsqlmode.js Filter
12.4.18. dropxa.js Filter
12.4.19. Dummy Filter
12.4.20. EnumToString Filter
12.4.21. EventMetadata Filter
12.4.22. foreignkeychecks.js Filter
12.4.23. Heartbeat Filter
12.4.24. insertsonly.js Filter
12.4.25. Logging Filter
12.4.26. maskdata.js Filter
12.4.27. MySQLSessionSupport (mysqlsessions) Filter
12.4.28. mapcharset Filter
12.4.29. NetworkClient Filter
12.4.29.1. Network Client Configuration
12.4.29.2. Network Filter Protocol
12.4.29.3. Sample Network Client
12.4.30. nocreatedbifnotexists.js Filter
12.4.31. OptimizeUpdates Filter
12.4.32. PrimaryKey Filter
12.4.32.1. Setting Custom Primary Key Definitions
12.4.33. PrintEvent Filter
12.4.34. Rename Filter
12.4.34.1. Rename Filter Examples
12.4.35. Replicate Filter
12.4.36. ReplicateColumns Filter
12.4.37. Row Add Database Name Filter
12.4.38. Row Add Transaction Info Filter
12.4.39. SetToString Filter
12.4.40. Shard Filter
12.4.41. shardbyrules.js Filter
12.4.42. shardbyseqno.js Filter
12.4.43. shardbytable.js Filter
12.4.44. SkipEventByType Filter
12.4.45. TimeDelay (delay) Filter
12.4.46. TimeDelayMsFilter (delayInMS) Filter
12.4.47. tosingledb.js Filter
12.4.48. truncatetext.js Filter
12.4.49. zerodate2null.js Filter
12.5. Standard JSON Filter Configuration
12.5.1. Rule Handling and Processing
12.5.2. Schema, Table, and Column Selection
12.6. JavaScript Filters
12.6.1. Writing JavaScript Filters
12.6.1.1. Implementable Functions
12.6.1.2. Getting Configuration Parameters
12.6.1.3. Logging Information and Exceptions
12.6.1.4. Exposed Data Structures
12.6.2. Installing Custom JavaScript Filters
12.6.2.1. Step 1: Copy JavaScript files
12.6.2.2. Step 2: Create Template Files
12.6.2.3. Step 3: (Optional) Copy json files
12.6.2.4. Step 4: Update Configuration
13. Performance, Tuning and Testing
13.1. Block Commit
13.1.1. Monitoring Block Commit Status
13.2. Improving Network Performance
13.3. Tungsten Replicator Block Commit and Memory Usage
13.4. Connector Memory Management
13.5. Functional Testing
13.5.1. Manual and Automatic Failover
13.5.2. Backup and Restore
13.5.3. Connectivity
13.5.4. Performance and Other Tests
A. Release Notes
A.1. Tungsten Clustering 7.1.4 GA (1 Oct 2024)
A.2. Tungsten Clustering 7.1.3 GA (25 Jun 2024)
A.3. Tungsten Clustering 7.1.2 GA (3 Apr 2024)
A.4. Tungsten Clustering 7.1.1 GA (15 Dec 2023)
A.5. Tungsten Clustering 7.1.0 GA (16 Aug 2023)
B. Prerequisites
B.1. Staging Host Configuration
B.2. Host Configuration
B.2.1. Operating System Version Support
B.2.2. Creating the User Environment
B.2.3. Configuring Network and SSH Environment
B.2.3.1. Network Ports
B.2.3.2. SSH Configuration
B.2.3.3. Host Availability Checks
B.2.4. Directory Locations and Configuration
B.2.5. Configure Software
B.2.6. sudo Configuration
B.2.7. SELinux Configuration
B.3. MySQL Database Setup
B.3.1. MySQL Version Support
B.3.2. MySQL Configuration
B.3.3. MySQL Configuration for Active/Active Deployments
B.3.4. MySQL Configuration for Heterogeneous Deployments
B.3.5. MySQL User Configuration
B.3.6. MySQL Unprivileged Users
B.4. Prerequisite Checklist
C. Troubleshooting
C.1. Contacting Support
C.1.1. Support Request Procedure
C.1.2. Creating a Support Account
C.1.3. Open a Support Ticket
C.1.4. Open a Support Ticket via Email
C.1.5. Getting Updates for all Company Support Tickets
C.1.6. Support Severity Level Definitions
C.2. Support Tools
C.2.1. Generating Diagnostic Information
C.2.2. Generating Advanced Diagnostic Information
C.2.3. Using tungsten_upgrade_manager
C.3. Error/Cause/Solution
C.3.1. Lots of entries added to replicator log
C.3.2. Backup/Restore is not bringing my host back to normal
C.3.3. Services requires a reset
C.3.4. Error: could not settle on encryption_client algorithm
C.3.5. ERROR backup.BackupTask Backup operation failed: null
C.3.6. Unable to update the configuration of an installed directory
C.3.7. Cluster remains in MAINTENANCE mode after tpm update
C.3.8. Missing events, or events not extracted correctly
C.3.9. Triggers not firing correctly on Replica
C.3.10. Replicator reports an Out of Memory error
C.3.11. [S1000][unixODBC][MySQL][ODBC 5.3(w) Driver]SSL connection error: unknown error number [ISQL]ERROR: Could not SQLConnect
C.3.12. Latency is high: master:ONLINE, progress=41331580333, THL latency=78849.733
C.3.13. Connector shows errors with "java.net.SocketException: Broken pipe"
C.3.14. The Primary replicator stopped with a JDBC error.
C.3.15. cctrl reports MANAGER(state=STOPPED)
C.3.16. trepctl status hangs
C.3.17. Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768
C.3.18. Replicator runs out of memory
C.3.19. ERROR 1010 (HY000) at line 5094506: Error dropping database (can't rmdir './mysql-bin/', errno: 17)
C.3.20. ERROR >> host1 >> can't alloc thread
C.3.21. ERROR 1580 (HY000) at line 5093787: You cannot 'DROP' a log table if logging is enabled
C.3.22. WARNING: An illegal reflective access operation has occurred
C.3.23. ERROR 2013 (HY000) at line 583: Lost connection to MySQL server during query
C.3.24. pendingExceptionMessage": "Unable to update last commit seqno: Incorrect datetime value: '2016-03-13 02:02:26' for column 'update_timestamp' at row 1
C.3.25. Too many open processes or files
C.3.26. Replication latency very high
C.3.27. Backup agent name not found: xtrabackup-full
C.3.28. WARN [KeepAliveTimerTask] - Error while sending a KEEP_ALIVE query to connection.
C.3.29. Event application failed: seqno=20725782 fragno=0 message=java.sql.SQLDataException: Data too long for column 'eventid' at row 1
C.3.30. MySQL 8.0+, User Roles and Smartscale
C.3.31. element 'mysql_readonly' not found in path
C.3.32. cctrl hangs
C.3.33. Replicator fails to connect after updating password
C.3.34. There were issues configuring the sandbox MySQL server
C.3.35. Starting replication after performing a restore because of an invalid restart sequence number
C.3.36. MySQL is incorrectly configured
C.4. Known Issues
C.4.1. Triggers
C.5. Troubleshooting Timeouts
C.6. Troubleshooting Backups
C.7. Running Out of Diskspace
C.8. Troubleshooting SSH and tpm
C.9. Troubleshooting Data Differences
C.9.1. Identify Structural Differences
C.9.2. Identify Data Differences
C.10. Comparing Table Data
C.11. Troubleshooting Memory Usage
D. Files, Directories, and Environment
D.1. The Tungsten Cluster Install Directory
D.1.1. The backups Directory
D.1.1.1. Automatically Deleting Backup Files
D.1.1.2. Manually Deleting Backup Files
D.1.1.3. Copying Backup Files
D.1.1.4. Relocating Backup Storage
D.1.2. The releases Directory
D.1.3. The service_logs Directory
D.1.4. The share Directory
D.1.5. The thl Directory
D.1.5.1. Purging THL Log Information on a Replica
D.1.5.2. Purging THL Log Information on a Primary
D.1.5.3. Moving the THL File Location
D.1.5.4. Changing the THL Retention Times
D.1.6. The tungsten Directory
D.1.6.1. The tungsten-connector Directory
D.1.6.2. The tungsten-manager Directory
D.1.6.3. The tungsten-replicator Directory
D.2. Log Files
D.3. Environment Variables
E. Terminology Reference
E.1. Transaction History Log (THL)
E.1.1. THL Format
E.2. Generated Field Reference
E.2.1. Terminology: Fields masterConnectUri
E.2.2. Terminology: Fields masterListenUri
E.2.3. Terminology: Fields accessFailures
E.2.4. Terminology: Fields active
E.2.5. Terminology: Fields activeSeqno
E.2.6. Terminology: Fields appliedLastEventId
E.2.7. Terminology: Fields appliedLastSeqno
E.2.8. Terminology: Fields appliedLatency
E.2.9. Terminology: Fields applier.class
E.2.10. Terminology: Fields applier.name
E.2.11. Terminology: Fields applyTime
E.2.12. Terminology: Fields autoRecoveryEnabled
E.2.13. Terminology: Fields autoRecoveryTotal
E.2.14. Terminology: Fields averageBlockSize
E.2.15. Terminology: Fields blockCommitRowCount
E.2.16. Terminology: Fields cancelled
E.2.17. Terminology: Fields channel
E.2.18. Terminology: Fields channels
E.2.19. Terminology: Fields clusterName
E.2.20. Terminology: Fields commits
E.2.21. Terminology: Fields committedMinSeqno
E.2.22. Terminology: Fields criticalPartition
E.2.23. Terminology: Fields currentBlockSize
E.2.24. Terminology: Fields currentEventId
E.2.25. Terminology: Fields currentLastEventId
E.2.26. Terminology: Fields currentLastFragno
E.2.27. Terminology: Fields currentLastSeqno
E.2.28. Terminology: Fields currentTimeMillis
E.2.29. Terminology: Fields dataServerHost
E.2.30. Terminology: Fields discardCount
E.2.31. Terminology: Fields doChecksum
E.2.32. Terminology: Fields estimatedOfflineInterval
E.2.33. Terminology: Fields eventCount
E.2.34. Terminology: Fields extensions
E.2.35. Terminology: Fields extractTime
E.2.36. Terminology: Fields extractor.class
E.2.37. Terminology: Fields extractor.name
E.2.38. Terminology: Fields filter.#.class
E.2.39. Terminology: Fields filter.#.name
E.2.40. Terminology: Fields filterTime
E.2.41. Terminology: Fields flushIntervalMillis
E.2.42. Terminology: Fields fsyncOnFlush
E.2.43. Terminology: Fields headSeqno
E.2.44. Terminology: Fields intervalGuard
E.2.45. Terminology: Fields lastCommittedBlockSize
E.2.46. Terminology: Fields lastCommittedBlockTime
E.2.47. Terminology: Fields latestEpochNumber
E.2.48. Terminology: Fields logConnectionTimeout
E.2.49. Terminology: Fields logDir
E.2.50. Terminology: Fields logFileRetainMillis
E.2.51. Terminology: Fields logFileSize
E.2.52. Terminology: Fields maxChannel
E.2.53. Terminology: Fields maxDelayInterval
E.2.54. Terminology: Fields maxOfflineInterval
E.2.55. Terminology: Fields maxSize
E.2.56. Terminology: Fields maximumStoredSeqNo
E.2.57. Terminology: Fields minimumStoredSeqNo
E.2.58. Terminology: Fields name
E.2.59. Terminology: Fields offlineRequests
E.2.60. Terminology: Fields otherTime
E.2.61. Terminology: Fields pendingError
E.2.62. Terminology: Fields pendingErrorCode
E.2.63. Terminology: Fields pendingErrorEventId
E.2.64. Terminology: Fields pendingErrorSeqno
E.2.65. Terminology: Fields pendingExceptionMessage
E.2.66. Terminology: Fields pipelineSource
E.2.67. Terminology: Fields processedMinSeqno
E.2.68. Terminology: Fields queues
E.2.69. Terminology: Fields readOnly
E.2.70. Terminology: Fields relativeLatency
E.2.71. Terminology: Fields resourcePrecedence
E.2.72. Terminology: Fields rmiPort
E.2.73. Terminology: Fields role
E.2.74. Terminology: Fields seqnoType
E.2.75. Terminology: Fields serializationCount
E.2.76. Terminology: Fields serialized
E.2.77. Terminology: Fields serviceName
E.2.78. Terminology: Fields serviceType
E.2.79. Terminology: Fields shard_id
E.2.80. Terminology: Fields simpleServiceName
E.2.81. Terminology: Fields siteName
E.2.82. Terminology: Fields sourceId
E.2.83. Terminology: Fields stage
E.2.84. Terminology: Fields started
E.2.85. Terminology: Fields state
E.2.86. Terminology: Fields stopRequested
E.2.87. Terminology: Fields store.#
E.2.88. Terminology: Fields storeClass
E.2.89. Terminology: Fields syncInterval
E.2.90. Terminology: Fields taskCount
E.2.91. Terminology: Fields taskId
E.2.92. Terminology: Fields timeInCurrentEvent
E.2.93. Terminology: Fields timeInStateSeconds
E.2.94. Terminology: Fields timeoutMillis
E.2.95. Terminology: Fields totalAssignments
E.2.96. Terminology: Fields transitioningTo
E.2.97. Terminology: Fields uptimeSeconds
E.2.98. Terminology: Fields version
F. Internals
F.1. Extending Backup and Restore Behavior
F.1.1. Backup Behavior
F.1.2. Restore Behavior
F.1.3. Writing a Custom Backup/Restore Script
F.1.4. Enabling a Custom Backup Script
F.2. Character Sets in Database and Tungsten Cluster
F.3. Understanding Replication of Date/Time Values
F.3.1. Best Practices
F.4. Memory Tuning and Performance
F.4.1. Understanding Tungsten Replicator Memory Tuning
F.4.2. Connector Memory Management
F.5. Tungsten Replicator Pipelines and Stages
F.6. Tungsten Cluster Schemas
G. Frequently Asked Questions (FAQ)
G.1. General Questions
G.2. Cloud Deployment and Management
H. Ecosystem Support
H.1. Continuent Github Repositories
I. Configuration Property Reference

List of Figures

3.1. Topologies: Standalone HA Cluster
3.2. Topologies: Composite Active/Passive Cluster
3.3. Topologies: Multi-Site/Active-Active Clusters
3.4. Topologies: Composite Active/Active Clusters
3.5. Topologies: Composite Dynamic Active/Active Clusters
3.6. Topologies: Replicating into a Dataservice
3.7. Topologies: Replicating Data Out of a Cluster
3.8. Topologies: Replication from a Cluster to an Offboard Datawarehouse
3.9. Topologies: Replication from a Cluster to an Offboard Datawarehouse
3.10. Migration: Migrating Native Replication using a New Service
5.1. Security Internals: Cluster Communication Channels
6.1. Migration: Migrating Native Replication using a New Service
6.2. Cacti Monitoring: Example Graphs
7.1. Tungsten Connector Basic Architecture
7.2. Basic MySQL/Application Connectivity
7.3. Advanced MySQL/Application Connectivity
7.4. Using Tungsten Connector for MySQL/Application Connectivity
7.5. Tungsten Connector during a failed datasource
7.6. Tungsten Connector routing architecture
7.7. Tungsten Connector Bridge Mode Architecture
7.8. Tungsten Connector Authentication
8.1. Failover Scenario 1
8.2. Failover Scenario 2
8.3. Failover Scenario 3
10.1. tpm Staging Based Deployment
10.2. tpm INI Based Deployment
10.3. Internals: Cluster Communication Channels
12.1. Filters: Pipeline Stages on Extractors
12.2. Filters: Pipeline Stages on Appliers
B.1. Tungsten Deployment

List of Tables

1.1. Key Terminology
2.1. Key Terminology
2.2. Tungsten OS Support
2.3. MySQL/Tungsten Version Support
7.1. Routing Method Selection
7.2. Connector Command Line Sub-Commands
7.3. Inline Interface Commands
9.1. cctrl Command-line Options
9.2. cctrl Command-line Options
9.3. cctrl Command-line Options
9.4. cctrl Command-line Options
9.5. cctrl Command-line Options
9.6. cctrl Command-line Options
9.7. cctrl Command-line Options
9.8. cctrl Command-line Options
9.9. cctrl Command-line Options
9.10. cctrl Command-line Options
9.11. cctrl Command-line Options
9.12. cctrl Commands
9.13. cctrl datasource Commands
9.14. check_tungsten_latency Options
9.15. check_tungsten_online Options
9.16. check_tungsten_policy Options
9.17. check_tungsten_progress Options
9.18. check_tungsten_services Options
9.19. cluster_backup Command-line Options
9.20. connector Commands
9.21. ddlscan Command-line Options
9.22. ddlscan Supported Templates
9.23. dsctl Commands
9.24. dsctl Command-line Options
9.25. dsctl Command-line Options
9.26. dsctl Command-line Options
9.27. manager Commands
9.28. multi_trepctl Command-line Options
9.29. multi_trepctl --output Option
9.30. multi_trepctl Commands
9.31. query Common Options
9.32. replicator Commands
9.33. replicator Commands Options for condrestart
9.34. replicator Commands Options for console
9.35. replicator Commands Options for restart
9.36. replicator Commands Options for start
9.37. tapi Generic Options
9.38. tapi CURL-related Options
9.39. tapi Nagios/NRPE/Zabbix-related Options
9.40. tapi Admin-related Options
9.41. tapi Filter-related Options
9.42. tapi API-related Options
9.43. tapi Status-related Options
9.44. tapi Backup and Restore-related Options
9.45. thl Options
9.46. trepctl Command-line Options
9.47. trepctl Replicator Wide Commands
9.48. trepctl Service Commands
9.49. trepctl backup Command Options
9.50. trepctl clients Command Options
9.51. trepctl offline-deferred Command Options
9.52. trepctl online Command Options
9.53. trepctl pause Command Options
9.54. trepctl purge Command Options
9.55. trepctl reset Command Options
9.56. trepctl resume Command Options
9.57. trepctl setdynamic Command Options
9.58. trepctl setrole Command Options
9.59. trepctl shard Command Options
9.60. trepctl status Command Options
9.61. trepctl wait Command Options
9.62. tmonitor Common Options
9.63. tpasswd Common Options
9.64. tprovision Command-line Options
9.65. tungsten_find_orphaned Options
9.66. tungsten_find_position Options
9.67. tungsten_find_seqno Options
9.68. tungsten_get_mysql_datadir Command-line Options
9.69. tungsten_get_ports Options
9.70. tungsten_health_check Command-line Options
9.71. tungsten_merge_logs Command-line Options
9.72. tungsten_monitor Command-line Options
9.73. tungsten_monitor Command-line Options
9.74. tungsten_post_process Options
9.75. tungsten_prep_upgrade Command-line Options
9.76. tungsten_purge_thl Options
9.77. tungsten_reset_manager Options
9.78. tungsten_send_diag Command-line Options
9.79. tungsten_skip_all Options
9.80. tungsten_show_processlist Command-line Options
9.81. tungsten_skip_seqno Command-line Options
9.82. zabbix_tungsten_latency Options
9.83. zabbix_tungsten_online Options
9.84. zabbix_tungsten_progress Options
9.85. zabbix_tungsten_services Options
10.1. TPM Deployment Methods
10.2. tpm Core Options
10.3. tpm Commands
10.4. tpm ask Common Options
10.5. tpm cert Read-Only Actions
10.6. tpm cert Write Actions
10.7. tpm cert Arguments
10.8. Convenience tags
10.9. typeSpecs for tpm cert ask
10.10. typeSpecs for tpm cert example
10.11. typeSpecs for tpm cert gen
10.12. typeSpecs for tpm cert vi
10.13. Options for tpm check
10.14. Options for tpm check ini
10.15. Options for tpm check ports
10.16. tpm copy Common Options
10.17. tpm delete-service Common Options
10.18. tpm find-seqno Options
10.19. tpm generate-haproxy-for-api Common Options
10.20. tpm keep Options
10.21. tpm policy Options
10.22. tpm post-process Options
10.23. tpm purge-thl Options
10.24. tpm report Common Options
10.25. tpm Common Options
10.26. tpm Validation Checks
10.27. tpm Configuration Options
C.1. tungsten_upgrade_manager Options
D.1. Continuent Tungsten Directory Structure
D.2. Continuent Tungsten tungsten Sub-Directory Structure
E.1. THL Event Format

Preface

This manual documents Tungsten Cluster 7.1 up to and including 7.1.4 build 10. Differences between minor versions are highlighted by stating the explicit minor release version, such as 7.1.4.

For other versions and products, please use the appropriate manual.

1. Legal Notice

The trademarks, logos, and service marks in this Document are the property of Continuent or other third parties. You are not permitted to use these Marks without the prior written consent of Continuent or such appropriate third party. Continuent, Tungsten, uni/cluster, m/cluster, p/cluster, uc/connector, and the Continuent logo are trademarks or registered trademarks of Continuent in the United States, France, Finland and other countries.

All Materials on this Document are (and shall continue to be) owned exclusively by Continuent or other respective third party owners and are protected under applicable copyrights, patents, trademarks, trade dress and/or other proprietary rights. Under no circumstances will you acquire any ownership rights or other interest in any Materials by or through your access or use of the Materials. All right, title and interest not expressly granted is reserved to Continuent.

All rights reserved.

2. Conventions

This documentation uses a number of text and style conventions to indicate and differentiate between different types of information:

  • Text in this style is used to show an important element or piece of information. It may be used and combined with other text styles as appropriate to the context.

  • Text in this style is used to show a section heading, table heading, or particularly important emphasis of some kind.

  • Program or configuration options are formatted using this style. Options are also automatically linked to their respective documentation page when this is known. For example, tpm and --hosts both link automatically to the corresponding reference page.

  • Parameters or information explicitly used to set values to commands or options is formatted using this style.

  • Option values, for example on the command-line are marked up using this format: --help. Where possible, all option values are directly linked to the reference information for that option.

  • Commands, including sub-commands to a command-line tool are formatted using Text in this style. Commands are also automatically linked to their respective documentation page when this is known. For example, tpm links automatically to the corresponding reference page.

  • Text in this style indicates literal or character sequence text used to show a specific value.

  • Filenames, directories or paths are shown like this /etc/passwd. Filenames and paths are automatically linked to the corresponding reference page if available.

Bulleted lists are used to show lists, or detailed information for a list of items. Where this information is optional, a magnifying glass symbol enables you to expand or collapse the detailed instructions.

Code listings are used to show sample programs, code, configuration files and other elements. These can include both user input and replaceable values:

shell> cd /opt/continuent/software
shell> tar zxvf tungsten-clustering-7.1.4-10.tar.gz

In the above example, command-lines to be entered into a shell are prefixed with shell>. This shell is typically sh, ksh, or bash on Linux and Unix platforms.

If commands are to be executed using administrator privileges, each line will be prefixed with root-shell, for example:

root-shell> vi /etc/passwd

To make the selection of text easier for copy/pasting, ignorable text, such as the shell> prompt, is ignored during selection. This allows multi-line instructions to be copied without modification, for example:

mysql> create database test_selection;
mysql> drop database test_selection;

Lines prefixed with mysql> should be entered within the mysql command-line.

If a command-line or program listing entry contains lines that are too wide to be displayed within the documentation, they are marked using the » character:

the first line has been extended by using a »
    continuation line

They should be adjusted to be entered on a single line.

Text marked up with this style is information that is entered by the user (as opposed to generated by the system). Text formatted using this style should be replaced with the appropriate file, version number or other variable information according to the operation being performed.

In the HTML versions of the manual, blocks or examples that contain user input can be easily copied from the program listing. Where there are multiple entries or steps, use the 'Show copy-friendly text' link at the end of each section. This provides a copy of all the user-enterable text.

3. Quickstart Guide

Chapter 1. Introduction

Tungsten Clustering™ provides a suite of tools to aid the deployment of database clusters using MySQL. A Tungsten Cluster™ consists of three primary tools:

  • Tungsten Replicator

    Tungsten Replicator supports replication between different databases. Tungsten Replicator acts as a direct replacement for native MySQL replication, in addition to supporting connectivity to Oracle, MongoDB, Vertica and others.

  • Tungsten Manager

    The Tungsten Manager is responsible for monitoring and managing a Tungsten Cluster dataservice. The manager has a number of control and supervisory roles for the operation of the cluster, and acts both as a control and a central information source for the status and health of the dataservice as a whole.

  • Tungsten Connector (or Tungsten Proxy)

    The Tungsten Connector is a service that sits between your application server and your MySQL database. The connector routes connections from your application servers to the datasources within the cluster, automatically distributing and redirecting queries to each datasource according to load balancing and availability requirements.
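
    Because the Connector presents a standard MySQL protocol interface, applications connect to it exactly as they would connect to a MySQL server. The following is a minimal sketch; the hostname connector1, the port, and the user are illustrative assumptions for this example rather than required values:

    # Illustrative only: connect through the Connector host rather than a database node;
    # the Connector routes the session to a suitable datasource in the cluster.
    shell> mysql -h connector1 -P 3306 -u app_user -p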

While there is no specific SLA because every customer’s environment is different, we strive to deliver a very low RTO and an RPO of effectively zero. For example, a cluster failover normally takes around 30 seconds depending on load, so the RTO is typically under 1 minute. The RPO is effectively zero because copies of the database are maintained on Replica nodes, so a failover happens with zero data loss under the vast majority of conditions.

Tungsten Cluster uses key terminology for different components in the system. These are used to distinguish specific elements of the overall system at the different levels of operations.

Table 1.1. Key Terminology

Continuent Term Traditional Term Description
composite dataservice Multi-Site Cluster A configured Tungsten Cluster service consisting of multiple dataservices, typically at different physical locations.
dataservice Cluster The collection of machines that make up a single Tungsten Dataservice. Individual hosts within the dataservice are called datasources. Each dataservice is identified by a unique name, and multiple dataservices can be managed from one server.
dataserver Database The database on a host.
datasource Host or Node One member of a dataservice and the associated Tungsten components.
staging host - The machine (and directory) from which Tungsten Cluster™ is installed and configured. The machine does not need to be the same as any of the existing hosts in the dataservice.
active witness - A machine in the dataservice that runs the manager process but is not running a database server. This server will be used to establish quorum in the event that a datasource becomes unavailable.
passive witness - A witness host is a host that can be contacted using the ping protocol to act as a network check for the other nodes of the cluster. Witness hosts should be on the same network and segment as the other nodes in the dataservice.
coordinator - The datasource or active witness in a dataservice that is responsible for making decisions on the state of the dataservice. The coordinator is usually the member that has been running the longest. It will not always be the Primary. When the manager process on the coordinator is stopped, or no longer available, a new coordinator will be chosen from the remaining members.
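
Many of these terms appear directly in the output of the management tools. For example, the interactive cctrl tool (see Section 9.1) reports the coordinator, and the datasources with their roles and states, for a dataservice. As a minimal sketch (the cctrl> prefix below stands in for the interactive cctrl prompt, which displays the current dataservice name):

shell> cctrl
cctrl> ls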

1.1. Tungsten Replicator

Tungsten Replicator is a high performance replication engine that works with a number of different source and target databases to provide high-performance and improved replication functionality over the native solution. With MySQL replication, for example, the enhanced functionality and information provided by Tungsten Replicator allows for global transaction IDs, advanced topology support such as Composite Active/Active, star, and fan-in, and enhanced latency identification.

In addition to providing enhanced functionality, Tungsten Replicator is also capable of heterogeneous replication by enabling the replicated information to be transformed after it has been read from the data server to match the functionality or structure of the target server. This functionality allows for replication between MySQL and a variety of heterogeneous targets.

Understanding how Tungsten Replicator works requires looking at the overall replicator structure. There are three major components in the system that provide the core of the replication functionality:

  • Extractor

    The extractor component reads data from a MySQL data server and writes that information into the Transaction History Log (THL). The role of the extractor is to read the information from a suitable source of change information and write it into the THL in the native or defined format, either as SQL statements or row-based information.

    Information is always extracted from a source database and recorded within the THL in the form of a complete transaction. The full transaction information is recorded and logged against a single, unique, transaction ID used internally within the replicator to identify the data.

  • Applier

    Appliers within Tungsten Replicator convert the THL information and apply it to a destination data server. The role of the applier is to read the THL information and apply that to the data server.

    The applier works with a number of different target databases, and is responsible for writing the information to the database. Because the transactional data in the THL is stored either as SQL statements or row-based information, the applier has the flexibility to reformat the information to match the target data server. Row-based data can be reconstructed to match different database formats, for example, converting row-based information into an Oracle-specific table row, or a MongoDB document.

  • Transaction History Log (THL)

    The THL contains the information extracted from a data server. Information within the THL is divided up by transactions, either implied or explicit, based on the data extracted from the data server. The THL structure, format, and content provides a significant proportion of the functionality and operational flexibility within Tungsten Replicator.

    As the THL data is stored, additional information, such as the metadata and options in place when the statement or row data was extracted, is recorded. Each transaction is also recorded with an incremental global transaction ID. This ID enables individual transactions within the THL to be identified, for example to retrieve their content, or to determine whether different appliers within a replication topology have written a specific transaction to a data server.

These components will be examined in more detail as different aspects of the system are described with respect to the different systems, features, and functionality that each system provides.

From this basic overview and structure of Tungsten Replicator, the replicator allows for a number of different topologies and solutions that replicate information between different services. Straightforward replication topologies, such as Primary/Replica are easy to understand with the basic concepts described above. More complex topologies use the same core components. For example, Composite Active/Active topologies make use of the global transaction ID to prevent the same statement or row data being applied to a data server multiple times. Fan-in topologies allow the data from multiple data servers to be combined into one data server.

1.1.1. Transaction History Log (THL)

Tungsten Replicator operates by reading information from the source database and transferring that information to the Transaction History Log (THL).

Each transaction within the THL includes the SQL statement or the row-based data written to the database. The information also includes, where possible, transaction specific options and metadata, such as character set data, SQL modes and other information that may affect how the information is written when the data is applied. The combination of the metadata and the global transaction ID also enable more complex data replication scenarios to be supported, such as Composite Active/Active, without fear of duplicating statement or row data application because the source and global transaction ID can be compared.

In addition to all this information, the THL also includes a timestamp and a record of when the information was written into the database before the change was extracted. Using a combination of the global transaction ID and this timing information provides information on the latency and how up to date a dataserver is compared to the original datasource.

Depending on the underlying storage of the data, the information can be reformatted and applied to different data servers. When dealing with row-based data, this can be applied to a different type of data server, or completely reformatted and applied to non-table based services such as MongoDB.

THL information is stored for each replicator service, and can also be exchanged over the network between different replicator instances. This enables transaction data to be exchanged between different hosts within the same network or across wide-area-networks.
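
The contents of the THL can be inspected directly using the thl command provided with the replicator. For example, a brief sketch (the sequence number shown is hypothetical):

shell> thl info
shell> thl list -low 4351 -high 4351

The first command summarises the range of sequence numbers and log files currently held; the second lists the full event for the given global transaction ID, including the metadata and timestamp recorded with it.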

1.2. Tungsten Manager

The Tungsten Manager is responsible for monitoring and managing a Tungsten Cluster dataservice. The manager has a number of control and supervisory roles for the operation of the cluster, and acts both as a control and a central information source for the status and health of the dataservice as a whole.

Primarily, the Tungsten Manager handles the following tasks:

  • Monitors the replication status of each datasource (node) within the cluster.

  • Communicates and updates Tungsten Connector with information about the status of each datasource. In the event of a change of status, Tungsten Connectors are notified so that queries can be redirected accordingly.

  • Manages all the individual components of the system. Using the Java JMX system, the manager is able to directly control the different components, changing their status and controlling the replication process.

  • Checks the availability of datasources by using either the Echo TCP/IP protocol on port 7 (default), or the system ping protocol, to determine whether a host is available. The configuration of the protocol to be used can be made by adjusting the manager properties. For more information, see Section B.2.3.3, “Host Availability Checks”.

  • Includes an advanced rules engine. The rule engine is used to respond to different events within the cluster and perform the necessary operations to keep the dataservice in optimal working state. During any change in status, whether user-selected or automatically triggered due to a failure, the rules are used to make decisions about whether to restart services, swap Primaries, or reconfigure connectors.

Please see the Tungsten Manager documentation section Chapter 8, Tungsten Manager for more information.

1.3. Tungsten Connector

The Tungsten Connector (or Tungsten Proxy) is a service that sits between your application server and your MySQL database. The connector routes connections from your application servers to the datasources within the cluster, automatically distributing and redirecting queries to each datasource according to load balancing and availability requirements.

The primary goal of Tungsten Connector is to effectively route and redirect queries between the Primary and Replica datasources within the cluster. Client applications talk to the connector, while the connector determines where the packets should really go, depending on the scaling and availability. Using a connector in this way effectively hides the complexities of the cluster size and configuration, allowing your cluster to grow and shrink without interrupting your client application connectivity. Client applications remain connected even though the number, configuration and orientation of the Replicas within the cluster may change.

During failover or system maintenance Tungsten Connector takes information from Tungsten Manager to determine which hosts are up and available, and redirects queries only to those servers that are online within the cluster.

For load balancing, Tungsten Connector supports a number of different solutions for redirecting queries to the different datasources within the network. Solutions are either based on explicit routing, or an implied or automatic read/write splitting mode where data is automatically distributed between Primary hosts (writes) and Replica hosts (reads).

Basic read/write splitting uses packet inspection to determine whether a query is a read operation (SELECT) or a write (INSERT, UPDATE, DELETE). The actual selection mechanism can be fine tuned using the different modes according to your application requirements.

The supported modes are:

  • Port Based Routing

    Port based routing employs a second port on the connector host. All connections to this port are sent to an available Replica (see the connection sketch following this list).

  • Direct Reads

    Direct reads uses the read/write splitting model, but directs read queries to dedicated read-only connections on the Replica. No attempt is made to determine which host may have the most up to date version of the data. Connections are pooled between the connector and datasources, and this results in very fast execution.

  • SmartScale

    With SmartScale, data is automatically distributed among the datasources using read/write splitting. Where possible, the connector selects read queries by determining how up to date the Replica is, and using a specific session model to determine which host is up to date according to the session and replication status information. Session identification can be through predefined session types or user-defined session strings.

  • Host Based Routing

    Explicit host based routing uses different IP addresses on datasources to identify whether the operation should be directed to a Primary or a Replica. Each connector is configured with two IP addresses, connecting to one IP address triggers the connection to be routed to the current Primary, while connecting to the second IP routes queries to a Replica.

  • SQL Based Routing

    SQL based routing employs packet inspection to identify key strings within the query to determine where the packets should be routed.

These core read/write splitting modes can also be explicitly overridden at a user or host level to allow your application maximum flexibility.
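
As a simple illustration of the port-based routing mode above, the routing decision is made purely by the port to which the client connects (a sketch; the connector hostname and the second, read-only port number are examples and depend on your configuration):

Connections to the standard application port are routed according to the normal read/write rules, with writes going to the Primary:

shell> mysql -h connector1 -P 3306 -u app_user -p

Connections to the configured second port are sent to an available Replica:

shell> mysql -h connector1 -P 3307 -u app_user -p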

Chapter 2. Deployment

Creating a Tungsten Clustering (for MySQL) Dataservice using Tungsten Cluster requires careful preparation and configuration of the required components. This section provides guidance on these core operations, preparation and information such as licensing and best practice that should be used for all installations.

2.1. Host Types

Before covering the basics of creating different dataservice types, there are some key terms that will be used throughout the setup and installation process that identify different components of the system. These are summarised in Table 2.1, “Key Terminology”.

Table 2.1. Key Terminology

Tungsten Term Traditional Term Description
composite dataservice Multi-Site Cluster A configured Tungsten Cluster service consisting of multiple dataservices, typically at different physical locations.
dataservice Cluster A configured Tungsten Cluster service consisting of dataservers, datasources and connectors.
dataserver Database The database on a host. Dataservers include MySQL, PostgreSQL or Oracle.
datasource Host or Node One member of a dataservice and the associated Tungsten components.
staging host - The machine from which Tungsten Cluster is installed and configured. The machine does not need to be the same as any of the existing hosts in the cluster.
staging directory - The directory where the installation files are located and the installer is executed. Further configuration and updates must be performed from this directory.
connector - A connector is a routing service that provides management for connectivity between application services and the underlying dataserver.
Witness host - A witness host is a host that can be contacted using the ping protocol to act as a network check for the other nodes of the cluster. Witness hosts should be on the same network and segment as the other nodes in the dataservice.

2.1.1. Manager Hosts

The manager plays a key role within any dataservice, communicating between the replicator, connector and datasources to understand the current status, and controlling these components to handle failures, maintenance, and service availability.

The primary role of the manager is to monitor each of the services, identify problems, and react to those problems in the most effective way to keep the dataservice active. For example, in the case of a datasource failure, the datasource is temporarily removed from the cluster, the connector is updated to route queries to another available datasource, and the replication is disabled.

These decisions are driven by a rule-based system, which checks current status values, and performs different operations to achieve the correct result and return the dataservice to operational status.

In terms of control and management, the manager is capable of performing backup and restore operations, automatically recovering from failure (including re-provisioning from backups), and is also able to individually control the configuration, service startup and shutdown, and overall control of the system.

Within a typical Tungsten Cluster deployment there are multiple managers and these keep in constant contact with each other, and the other services. When a failure occurs, multiple managers are involved in decisions. For example, if a host is no longer visible to one manager, it does not make the decision to disable the service on its own; only when a majority of managers identify the same result is the decision made. For this reason, there should be an odd number of managers (to prevent deadlock), or managers can be augmented through the use of witness hosts.

One manager is automatically installed for each configured datasource; that is, in a three-node system with a Primary and two Replicas, three managers will be installed.

Checks to determine the availability of hosts are performed by using either the system ping protocol or the Echo TCP/IP protocol on port 7 to determine whether a host is available. The configuration of the protocol to be used can be made by adjusting the manager properties. For more information, see Section B.2.3.3, “Host Availability Checks”.
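
The same network paths can be checked manually when validating a deployment; for example, a sketch using standard operating system tools (the manager performs its own checks internally and does not rely on these commands):

shell> ping -c 1 host2
shell> nc -z -w 2 host2 7

The first command confirms the system ping path, the second that TCP port 7 (the Echo service) is reachable from the manager host.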

2.1.2. Connector (Router) Hosts

Connectors (known as routers within the dataservice) provide a routing mechanism between client applications and the dataservice. The Tungsten Connector component automatically routes database operations to the Primary or Replica, and takes account of the current cluster status as communicated to it by the Tungsten Manager. This functionality solves three primary issues that might normally need to be handled by the client application layer:

  • Datasource role redirection (i.e. Primary and Replica). This includes read/write splitting, and the ability to read data from a Replica that is up to date with a corresponding write.

  • Datasource failure (high-availability), including the ability to redirect client requests in the event of a failure or failover. This includes maintenance operations.

  • Dataservice topology changes, for example when expanding the number of datasources within a dataservice.

The primary role of the connector is to act as the connection point for applications that can remain open and active, while simultaneously supporting connectivity to the datasources. This allows for changes to the topology and active role of individual datasources without interrupting the client application. Because the operation is through one or more static connectors, the application also does not need to be modified or changed when the number of datasources is expanded or altered.

Depending on the deployment environment and client application requirements, the connector can be installed either on the client application servers, the database servers, or independent hosts. For more information, see Section 7.3, “Clients and Deployment”.

Connectors can also be installed independently on specific hosts. The list of enabled connectors is defined by the --connectors option to tpm. A Tungsten Cluster dataservice can be installed with more connector servers than datasources or managers.

2.1.3. Replicator Hosts

Tungsten Replicator provides the core replication of information between datasources and, in composite deployments, between dataservices. The replicator operates by extracting data from the 'Primary' datasource (for example, using the MySQL binary log), and then applying the data to one or more target datasources.

Different deployments use different replicators and configurations, but in a typical Tungsten Cluster deployment a Primary/Replica or active/active deployment model is used. For Tungsten Cluster deployments there will be one replicator instance installed on each datasource host.

Within the dataservice, the manager controls each replicator service and is able to alter the replicator operation and role, for example by switching between Primary and Replica roles. The replicator also provides information to the manager about the latency of the replication operation, and uses this with the connectors to control client connectivity into the dataservice.

Replication within Tungsten Cluster is supported by Tungsten Replicator™ and this supports a wide range of additional deployment topologies, and heterogeneous deployments including MongoDB, Vertica, and Oracle. Replication both to and from a dataservice is supported. For more information on replicating out of an existing dataservice, see Section 3.9, “Replicating Data Out of a Cluster”.

Replicators are automatically configured according to the datasources and topology specified when the dataservice is created.

2.1.4. Active Witness Hosts

Tungsten Cluster operates through the rules built into the manager that make decisions about different configuration and status settings for all the services within the cluster. In the event of a communication failure within the system it is vital for the manager, in automatic policy mode, to perform a switch from a failed or unavailable Primary.

Within the network, the managers communicate with each other, in addition to the connectors and dataservers to determine their availability. The managers compare states and network connectivity. In the event of an issue, managers 'vote' on whether a failover or switch should occur.

The rules are designed to prevent unnecessary switches and failovers. Managers vote, and an odd number of managers helps to prevent split-brain scenarios in which invalid failover decisions might otherwise be made.

  • Active Witness — an active witness is an instance of Tungsten Manager running on a host that is otherwise not part of the dataservice. An active witness has full voting rights within the managers and can therefore make informed decisions about the dataservice state in the event of a failure. Active witnesses can only be a member of one cluster at a time.

All managers are active witnesses, and active witnesses are the recommended solution for deployments where network availability is less certain (i.e. cloud environments), and where you have two-node deployments.

Tungsten Cluster Quorum Requirements

  • There should be at least three managers (including any active witnesses)

  • There should be, in total, an odd number of managers and witnesses, to prevent deadlocks.

  • If the dataservice contains only two hosts, at least one active witness must be installed.

These rules apply for all Tungsten Cluster installations and must be adhered to. Deployment will fail if these conditions are not met.

The rules for witness selection are as follows:

  • Active witnesses can be located beyond or across network segments, but all active witnesses must have a clear communication channel to each other, and to the other managers. Difficulties in contacting other managers and services in the network could cause unwanted failover or shunning of datasources.

To enable active witnesses, the --enable-active-witnesses=true option must be specified and the hosts that will act as active witnesses must be added to the list of hosts provided to --members. This configures all of the specified witness hosts as active witnesses:

shell> ./tools/tpm install alpha --enable-active-witnesses=true \
      --witnesses=hostC \
      --members=hostA,hostB,hostC
  ...
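
The equivalent INI-based configuration would look similar to the following sketch; the host names are examples and the remaining service options are omitted for brevity:

[alpha]
...
enable-active-witnesses=true
witnesses=hostC
members=hostA,hostB,hostC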

2.2. Requirements

2.2.1. Operating Systems Support

The following Operating Systems are supported for installation of the various Tungsten components and are part of our regular QA testing processes. Other variants of Linux may work at your own risk, but their use in production should be avoided, and any issues that arise may not be supported; if in doubt, we recommend that you contact Continuent Support for clarification. Windows and macOS are NOT supported.

Virtual Environments running any of the supported distributions listed are supported, although only recommended for Development/Testing environments.

The list below also includes the EOL dates published by the providers; these should be taken into consideration when configuring your deployment.

Table 2.2. Tungsten OS Support

Distribution Published EOL Notes
Amazon Linux 2 30th June 2024
Amazon Linux 2023
CentOS 7 30th June 2024
Debian GNU/Linux 10 (Buster) June 2024
Debian GNU/Linux 11 (Bullseye) June 2026
Debian GNU/Linux 12 (Bookworm) June 2026
Oracle Linux 8.4 July 2029
Oracle Linux 9 July 2029 Version 7.1.3 onwards
RHEL 7 30th June 2024
RHEL 8.4.0 31st May 2029
RHEL 9
Rocky Linux 8 31st May 2029
Rocky Linux 9 31st May 2032
SUSE Linux Enterprise Server 15 21st June 2028
Ubuntu 20.04 LTS (Focal Fossa) April 2025
Ubuntu 22.04 LTS (Jammy Jellyfish) April 2027

2.2.2. Database Support

Unless stated, MySQL refers to the following variants:

  • MySQL Community Edition
  • MySQL Enterprise Edition
  • Percona MySQL

Version Support Matrix

Table 2.3. MySQL/Tungsten Version Support

Database MySQL Version Tungsten Version Notes
MySQL 5.7 All non-EOL Versions Full Support
MySQL 8.0.0-8.0.34 6.1.0-6.1.3 Supported, but does not support Partitioned Tables or the use of binlog-transaction-compression=ON introduced in 8.0.20
MySQL 8.0.0-8.0.34 6.1.4 onwards Fully Supported.
MariaDB 10.0, 10.1 All non-EOL Versions Full Support
MariaDB 10.2, 10.3 6.1.13-6.1.20 Partial Support. See note below.
MariaDB Up to, and including, 10.11 7.x Full Support

Known Issue affecting use of MySQL 8.0.21

In MySQL release 8.0.21 the behavior of CREATE TABLE ... AS SELECT ... has changed, resulting in the transactions being logged differently in the binary log. This change in behavior will cause the replicators to fail.

Until a fix is implemented within the replicator, the workaround for this will be to split the action into separate CREATE TABLE ... followed by INSERT INTO ... SELECT FROM... statements.
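
For example, a statement of the following form (the table names are hypothetical):

mysql> CREATE TABLE sales_archive AS SELECT * FROM sales;

could be rewritten as two separate statements, for instance using CREATE TABLE ... LIKE:

mysql> CREATE TABLE sales_archive LIKE sales;
mysql> INSERT INTO sales_archive SELECT * FROM sales;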

If this is not possible, then you will need to manually create the table on all nodes, and then skip the resulting error in the replicator, allowing the subsequent loading of the data to continue.

MariaDB 10.3+ Support

Full support for MariaDB version 10.3 has been certified in v7.0.0 onwards of the Tungsten products.

Version 6.1.13 onwards of Tungsten will also work; however, should you choose to deploy these versions, you do so at your own risk. There are a number of issues noted below that are all resolved from v7.0.0 onwards; therefore, if you choose to use an earlier release, you should do so with the following limitations acknowledged:

  • tungsten_find_orphaned may fail, introducing the risk of data loss (Fixed in v6.1.13 onwards)

  • SSL from Tungsten Components TO the MariaDB is not supported.

  • Geometry data type is not supported.

  • Tungsten backup tools will fail as they rely on xtrabackup, which will not work with newer releases of MariaDB.

  • tpm might fail to find the correct MySQL configuration file. (Fixed in 6.1.13 onwards)

  • MariaDB specific event types trigger lots of warnings in the replicator log file.

MySQL "Innovation" Releases

In 2023, Oracle announced a new MySQL version schema that introduced "Innovation" releases. From this point on, patch releases (for example, the various 8.0.x releases) would only contain bug fixes, whereas new features, along with bug fixes, would only be introduced in the "Innovation" releases, such as 8.1, 8.2 and so on.

"Innovation" releases will be released quartlery, and Oracle aim to make an LTS release every two years which will bundle all of the new features, behavior changes and bug fixes from all the previous "Innovation" releases.

Oracle do not advise the use of the "Innovation" releases in a production environment where a known behavior is expected to ensure system stability. We have chosen to follow this advice and as such we do not certify any release of Tungsten against "Innovation" releases for use in Production. We will naturally test against these releases in our QA environment so that we are able to certify and support the LTS release as soon as is practical. Any modifications needed to support an LTS release will not be backported to older Tungsten releases.

For more information on Oracle's release policy, please read their blog post.

2.2.3. RAM Requirements

RAM requirements are dependent on the workload being used and applied, but the following provide some guidance on the basic RAM requirements:

  • Tungsten Replicator requires 2GB of VM space for the Java execution, including the shared libraries, with approximately 1GB of Java VM heapspace. This can be adjusted as required, for example, to handle larger transactions or bigger commit blocks and large packets (see the heap-size sketch after this list).

    Performance can be improved within the Tungsten Replicator if there is 2-3GB available in the OS Page Cache. Replicators work best when pages written to replicator log files remain memory-resident for a period of time, so that there is no file system I/O required to read that data back within the replicator. This is the biggest potential point of contention between replicators and DBMS servers.

  • Tungsten Manager requires approximately 500MB of VM space for execution.
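
As a sketch of how the replicator heap could be increased in an INI-based installation, the repl-java-mem-size option (value in MB) is shown below for illustration only; check the tpm reference for your release before relying on it:

shell> vi /etc/tungsten/tungsten.ini
[defaults]
...
repl-java-mem-size=2048

As with any configuration change, the new value is applied with tpm update.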

2.2.4. Disk Requirements

Disk space usage is based on the space used by the core application, the staging directory used for installation, and the space used for the THL files:

  • The staging directory containing the core installation is approximately 150MB. When performing a staging-directory based installation, this space requirement will be used once. When using an INI-file based deployment, this space will be required on each server. For more information on the different methods, see Section 10.1, “Comparing Staging and INI tpm Methods”.

  • Deployment of a live installation also requires approximately 150MB.

  • The THL files required for installation are based on the size of the binary logs generated by MySQL. THL size is typically twice the size of the binary log. This space will be required on each machine in the cluster. The retention times and rotation of THL data can be controlled, see Section D.1.5, “The thl Directory” for more information, including how to change the retention time and move files during operation.
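
As a rough sizing sketch based on the guidance above, measure the binary logs currently retained on the Primary and double the figure to approximate the THL space needed; the path and binary log file names below are examples and depend on your MySQL configuration:

shell> du -ch /var/lib/mysql/mysql-bin.* | tail -1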

A dedicated partition for THL and/or Tungsten Software is recommended to ensure that a full disk does not impact your OS or DBMS. Local disk, SAN, iSCSI and AWS EBS are suitable for storing THL. NFS is NOT recommended.

Because the replicator reads and writes information using buffered I/O in a serial fashion, there is no random-access or seeking.

2.2.5. Java Requirements

All components of Tungsten are certified with Java using the following versions:

  • Oracle JRE 8

  • Oracle JRE 11 (From release 6.1.2 only)

  • OpenJDK 8

  • OpenJDK 11 (From release 6.1.2 only)

  • Java 9, 10 and 13 have been tested and validated but certification and support will only cover Long Term releases.

Important

There are a number of known issues in earlier Java revisions that may cause performance degradation, high CPU, and/or component hangs, specifically when SSL is enabled. It is strongly advised that you ensure your Java version is one of the following MINIMUM releases:

  • Oracle JRE 8 Build 261
  • Oracle JRE 11 Build 8
  • OpenJDK 8 Build 222

All versions from 8u265, excluding version 13 onwards, contain a bug that can trigger unusually high CPU and/or system timeouts and hangs within the SSL protocol. To avoid this, you should add the following entry to the wrapper.conf file for all relevant components. This will be included by default from version 6.1.15 onwards of all Tungsten products.

wrapper.conf can be found in the following path {INSTALLDIR}/tungsten/tungsten-component/conf, for example: /opt/continuent/tungsten/tungsten-manager/conf

wrapper.java.additional.{next available number}=-Djdk.tls.acknowledgeCloseNotify=true

For example:

wrapper.java.additional.16=-Djdk.tls.acknowledgeCloseNotify=true
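
To choose the next available number, check which indexes are already in use in the relevant wrapper.conf; for example, assuming the default manager path shown above:

shell> grep 'wrapper.java.additional.' /opt/continuent/tungsten/tungsten-manager/conf/wrapper.conf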

After editing the file, each component will need to be restarted.

Important

If your original installation was performed with Java 8 installed, and you wish to upgrade to Java 11, you will need to issue tools/tpm update --replace-release on all nodes from within the software staging path.

This is to allow the components to detect the newer Java version and adjust to avoid calls to functions that were deprecated/renamed between version 8 and version 11.

2.2.6. Cloud Deployment Requirements

Cloud deployments require a different set of considerations over and above the general requirements. The following is a guide only, and where specific cloud environment requirements are known, they are explicitly included:

Instance Types/Configuration

Attribute Guidance Amazon Example
Instance Type Instance sizes and types are dependent on the workload, but larger instances are recommended for transactional databases. m4.xlarge or better
Instance Boot Volume Use block, not ephemeral storage. EBS
Instance Deployment Use standard Linux distributions and bases. For ease of deployment and configuration, Ansible, Puppet or other script-based solutions could be used. Amazon Linux AMIs

Development/QA nodes should always match the expected production environment.

AWS/EC2 Deployments

  • Use Virtual Private Cloud (VPC) deployments, as these provide consistent IP address support.

  • When using Active Witnesses, a micro instance can be used for a single cluster. For composite clusters, an instance size larger than micro must be used.

  • Multiple EBS-optimized volumes for data, using Provisioned IOPS for the EBS volumes depending on workload:

    Parameter tpm Option tpm Value MySQL my.cnf Option MySQL Value
    / (root)     
    MySQL Data datasource-mysql-data-directory /volumes/mysql/data datadir /volumes/mysql/data
    MySQL Binary Logs datasource-log-directory /volumes/mysql/binlogs log-bin /volumes/mysql/binlogs/mysql-bin
    Transaction History Logs (THL) thl-directory /volumes/mysql/thl   
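
    Expressed as a configuration sketch, the tpm options in the table map to INI entries such as the following; the paths simply mirror the example values above and should be adjusted to your volume layout, and the matching datadir and log-bin settings must also be set in my.cnf:

    datasource-mysql-data-directory=/volumes/mysql/data
    datasource-log-directory=/volumes/mysql/binlogs
    thl-directory=/volumes/mysql/thl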

Recommended Replication Formats

  • MIXED is recommended for MySQL Primary/Replica topologies (e.g., either single clusters or primary/data-recovery setups).

  • ROW is strongly recommended for Composite Active/Active setups. Without ROW, data drift is a possible problem when using MIXED or STATEMENT. Even with ROW there are still cases where drift is possible but the window is far smaller.
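
The binary log format itself is configured in MySQL; for example, a minimal my.cnf sketch for a Composite Active/Active member would include:

[mysqld]
binlog_format = ROW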

2.2.7. Docker Support Policy

2.2.7.1. Overview

Continuent has traditionally had a relaxed policy about Linux platform support for customers using our products.

While it is possible to install and run Continuent Tungsten products (i.e. Clustering/Replicator/etc.) inside Docker containers, there are many reasons why this is not a good idea.

2.2.7.2. Background

As background, every database node in a Tungsten Cluster runs at least three (3) layers or services:

  • MySQL Server (i.e. MySQL Community or Enterprise, MariaDB or Percona Server)

  • Tungsten Manager, which handles health-checking, signaling and failover decisions (Java-based)

  • Tungsten Replicator, which handles the movement of events from the MySQL Primary server binary logs to the Replica databases nodes (Java-based)

Optionally, a fourth service, the Tungsten Connector (Java-based), may be installed as well, and often is.

2.2.7.3. Current State

This means that the Docker container would also need to support these 3 or 4 layers and all the resources needed to run them.

This is not what containers were designed to do. In a proper containerized architecture, each container would contain one single layer of the operation, so there would be 3-4 containers per “node”. This sort of architecture is best managed by some underlying technology like Swarm, Kubernetes, or Mesos.

More reasons to avoid using Docker containers with Continuent Tungsten solutions:

  • Our product is designed to run on a full Linux OS. By design Docker does not have a full init system like SystemD, SysV init, Upstart, etc… This means that if we have a process (Replicator, Manager, Connector, etc…) that process will run as PID 1. If this process dies the container will die. There are some solutions that let a Docker container have a ‘full init’ system so the container can start more processes like ssh, replicator, manager, … all at once. However this is almost a heavyweight VM kind of behavior, and Docker wasn’t designed this way.

  • Requires a mutable container – to use Tungsten Clustering inside a Docker container, the Docker container must be launched as a mutable Linux instance, which is not the classic, nor proper way to use containers.

  • Our services are not designed as “serverless”. Serverless containers are totally stateless. Tungsten Cluster and Tungsten Replicator do not support this type of operation.

  • Until we make the necessary changes to our software, using Docker as a cluster node results in a minimum 1.2GB docker image.

  • Once Tungsten Cluster and Tungsten Replicator have been refactored using a microservices-based architecture, it will be much easier to scale our solution using containers.

  • A Docker container would need to allow for updates in order for the Tungsten Cluster and Tungsten Replicator software to be re-configured as needed. Otherwise, a new Docker container would need to be launched every time a config change was required.

  • There are known I/O and resource constraints for Docker containers, and containers must therefore be carefully deployed to avoid those pitfalls.

  • We test on CentOS-derived Linux platforms.

2.2.7.4. Summary

In closing, Continuent’s position on container support is as follows:

  • Unsupported at this time for all products (i.e. Clustering/Replicator/etc.)

  • Use at your own risk

2.3. Deployment Sources

Tungsten Cluster is available in a number of different distribution types, and the methods for configuration available for these different packages differs. See Section 10.1, “Comparing Staging and INI tpm Methods” for more information on the available installation methods.

Deployment Type/Package TAR/GZip RPM
Staging Installation Yes No
INI File Configuration Yes Yes
Deploy Entire Cluster Yes No
Deploy Per Machine Yes Yes

Two primary deployment sources are available: the TAR/GZipped packages and the RPM packages. Both are described in the sections that follow.

All packages are named according to the product, version number, build release and extension. For example:

tungsten-clustering-7.1.4-10.tar.gz

The version number is 7.1.4 and build number 10. Build numbers indicate which build a particular release version is based on, and may be useful when installing patches provided by support.

2.3.1. Using the TAR/GZipped files

To use the TAR/GZipped packages, download the files to your machine and unpack them:

shell> cd /opt/continuent/software
shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz

This will create a directory matching the downloaded package name, version, and build number from which you can perform an install using either the INI file or command-line configuration. To continue, use the tpm command within the tools directory of the extracted package:

shell> cd tungsten-clustering-7.1.4-10

2.3.2. Using the RPM package files

The RPM packages can be used for installation, but are primarily designed to be used in combination with the INI configuration file.

Installation

Installing the RPM package will do the following:

  1. Create the tungsten system user if it doesn't exist

  2. Make the tungsten system user part of the mysql group if it exists

  3. Create the /opt/continuent/software directory

  4. Unpack the software into /opt/continuent/software

  5. Define the $CONTINUENT_PROFILES and $REPLICATOR_PROFILES environment variables

  6. Update the profile script to include the /opt/continuent/share/env.sh script

  7. Create the /etc/tungsten directory

  8. Run tpm install if the /etc/tungsten.ini or /etc/tungsten/tungsten.ini file exists

Although the RPM packages complete a number of the pre-requisite steps required to configure your cluster, there are additional steps, such as configuring ssh, that you still need to complete. For more information, see Appendix B, Prerequisites.

By using the package files you are able to setup a new server by creating the /etc/tungsten.ini file and then installing the package. Any output from the tpm command will go to /opt/continuent/service_logs/rpm.output.
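
For example, a minimal sequence on a new server might look like the following sketch; the exact RPM file name, including any architecture suffix, depends on the package downloaded:

root-shell> vi /etc/tungsten.ini
root-shell> rpm -ivh tungsten-clustering-7.1.4-10.x86_64.rpm
root-shell> less /opt/continuent/service_logs/rpm.output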

Note

If you download the package files directly, you may need to add the signing key to your environment before the package will load properly.

For yum platforms (RHEL/CentOS/Amazon Linux), the rpm command is used:

root-shell> rpm --import http://www.continuent.com/RPM-GPG-KEY-continuent

For Ubuntu/Debian platforms, the gpg command is used:

root-shell> gpg --keyserver keyserver.ubuntu.com --recv-key 7206c924

Upgrades

If you upgrade to a new version of the RPM package it will do the following:

  1. Unpack the software into /opt/continuent/software

  2. Run tpm update if the /etc/tungsten.ini or /etc/tungsten/tungsten.ini file exists

The tpm update will restart all Continuent Tungsten services so you do not need to do anything after upgrading the package file.

2.4. Common tpm Options During Deployment

There are a variety of tpm options that can be used to alter some aspect of the deployment during configuration. Although they might not be provided within the example deployments, they may be used or required for different installation environments. These include options such as altering the ports used by different components, or the commands and utilities used to monitor or manage the installation once deployment has been completed. Some of the most common options are included within this section.

Changes to the configuration should be made with tpm update. This continues the procedure of using tpm install during installation. See Section 10.5.27, “tpm update Command” for more information on using tpm update.

  • --datasource-systemctl-service

    On some platforms and environments, the command used to manage and control the MySQL or MariaDB service is handled by a tool other than the service or /etc/init.d/mysql commands.

    Depending on the system or environment, other commands using the same basic structure may be used. For example, within CentOS 7, the command is systemctl. You can explicitly set the command to be used by using the --datasource-systemctl-service option to specify the name of the tool.

    The corresponding command that will be used is expected to follow the same format as previous commands; for example, to stop the database service:

    shell> systemctl mysql stop

    Different commands must follow the same basic structure: the command configured by --datasource-systemctl-service, the service name, and the status (i.e. stop).

2.5. Best Practices

A successful deployment depends on being mindful during deployment, operations and ongoing maintenance.

2.5.1. Best Practices: Deployment

  • Identify the best deployment method for your environment and use that in production and testing. See Section 10.1, “Comparing Staging and INI tpm Methods”.

  • Standardize the OS and database prerequisites. There are Ansible modules available for immediate use within AWS, or as a template for modifications.

    More information on the Ansible method is available in this blog article.

  • Ensure that the output of the `hostname` command and the nodename entries in the Tungsten configuration match exactly prior to installing Tungsten.

    The configuration keys that define nodenames are: --slaves, --dataservice-slaves, --members, --master, --dataservice-master-host, --masters and --relay

  • For security purposes you should ensure that you secure the following areas of your deployment:

  • Choose your topology from the deployment section and verify the configuration matches the basic settings. Additional settings may be included for custom features but the basics are needed to ensure proper operation. If your configuration is not listed or does not match our documented settings, we cannot guarantee correct operation.

  • If there are an even number of database servers in the cluster, configure the cluster with a witness host. An active witness is preferred but a passive one will ensure stability. See Section 2.1.4, “Active Witness Hosts” for an explanation of the differences and how to configure them.

  • If you are using ROW replication, any triggers that run additional INSERT/UPDATE/DELETE operations must be updated so they do not run on the Replica servers.

  • Make sure you know the structure of the Tungsten Cluster home directory and how to initialize your environment for administration. See Section 6.1, “The Home Directory” and Section 6.2, “Establishing the Shell Environment”.

  • Prior to migrating applications to Tungsten Cluster, test failover and recovery procedures from Chapter 6, Operations Guide. Be sure to try recovering a failed Primary and reprovisioning failed Replicas.

  • When deciding on the Service Name for your configurations, keep it simple and short, and only use alphanumerics (A-Z, a-z, 0-9) and underscores (_).

2.5.2. Best Practices: Upgrade

In this section we identify the best practices for performing a Tungsten Software upgrade.

  • Identify the deployment method chosen for your environment, Staging or INI. See Section 10.1, “Comparing Staging and INI tpm Methods”.

  • The best practice for Tungsten software is to upgrade All-at-Once, performing zero Primary switches.

  • The Staging deployment method automatically does an All-at-Once upgrade - this is the basic design of the Staging method.

  • For an INI upgrade, there are two possible ways, One-at-a-Time (with at least one Primary switch), and All-at-Once (no switches at all).

  • See Section 10.4.3, “Upgrades with an INI File” for more information.

  • Here is the sequence of events for a proper Tungsten upgrade on a 3-node cluster with the INI deployment method:

    • Login to the Customer Downloads Portal and get the latest version of the software.

    • Copy the file (i.e. tungsten-clustering-7.0.2-161.tar.gz) to each host that runs a Tungsten component.

    • Set the cluster to policy MAINTENANCE

    • On every host:

      • Extract the tarball under /opt/continuent/software/ (i.e. create /opt/continuent/software/tungsten-clustering-7.0.2-161)

      • cd to the newly extracted directory

      • Run the Tungsten Package Manager tool, tools/tpm update --replace-release

    • For example, here are the steps in order:

      On ONE database node:
      shell> cctrl
      cctrl> set policy maintenance
      cctrl> exit
      
      On EVERY Tungsten host at the same time:
      shell> cd /opt/continuent/software
      shell> tar xvzf tungsten-clustering-7.0.2-161.tar.gz
      shell> cd tungsten-clustering-7.0.2-161
      
      To perform the upgrade and restart the Connectors gracefully at the same time:
      shell> tools/tpm update --replace-release
      
      To perform the upgrade and delay the restart of the Connectors to a later time:
      shell> tools/tpm update --replace-release --no-connectors
      When it is time for the Connector to be promoted to the new version, perhaps after taking it out of the load balancer:
      shell> tpm promote-connector
      
      When all nodes are done, on ONE database node:
      shell> cctrl
      cctrl> set policy automatic
      cctrl> exit

WHY is it ok to upgrade and restart everything all at once?

Let’s look at each component to examine what happens during the upgrade, starting with the Manager layer.

Once the cluster is in Maintenance mode, the Managers cease to make changes to the cluster, and therefore Connectors will not reroute traffic either.

Since Manager control of the cluster is passive in Maintenance mode, it is safe to stop and restart all Managers - there will be zero impact to the cluster operations.

The Replicators function independently of client MySQL requests (which come through the Connectors and go to the MySQL database server), so even if the Replicators are stopped and restarted, there should be only a small window of delay while the replicas catch up with the Primary once upgraded. If the Connectors are reading from the Replicas, they may briefly get stale data if not using SmartScale.

Finally, when the Connectors are upgraded they must be restarted so the new version can take over. As discussed in this blog post, Zero-Downtime Upgrades, the Tungsten Cluster software upgrade process will do two key things to help keep traffic flowing during the Connector upgrade promote step:

  • Execute `connector graceful-stop 30` to gracefully drain existing connections and prevent new connections.

  • Using the new software version, initiate the start/retry feature which launches a new connector process while another one is still bound to the server socket. The new Connector process will wait for the socket to become available by retrying binding every 200ms by default (which is tunable), drastically reducing the window for application connection failures.

2.5.3. Best Practices: Operations

2.5.4. Best Practices: Maintenance

  • Your license allows for a testing cluster. Deploy a cluster that matches your production cluster and test all operations and maintenance operations there.

  • Schedule regular tests for local and DR failover. This should at least include switching the Primary server to another host in the local cluster. If possible, the DR cluster should be tested once per quarter.

  • Disable any automatic operating system patching processes. The use of automatic patching will cause issues when all database servers automatically restart without coordination. See Section 6.15.3, “Performing Maintenance on an Entire Dataservice”.

  • Regularly check for maintenance releases and upgrade your environment. Every version includes stability and usability fixes to ease the administrative process.

Chapter 3. Deployment: MySQL Topologies

Table of Contents

3.1. Deploying Standalone HA Clusters
3.1.1. Prepare: Standalone HA Cluster
3.1.2. Install: Standalone HA Cluster
3.1.3. Best Practices: Standalone HA Cluster
3.2. Deploying Composite Active/Passive Clustering
3.2.1. Prepare: Composite Active/Passive Cluster
3.2.2. Install: Composite Active/Passive Cluster
3.2.3. Best Practices: Composite Active/Passive Cluster
3.2.4. Adding a remote Composite Cluster
3.3. Deploying Multi-Site/Active-Active Clustering
3.3.1. Prepare: Multi-Site/Active-Active Clusters
3.3.2. Install: Multi-Site/Active-Active Clusters
3.3.3. Best Practices: Multi-Site/Active-Active Clusters
3.3.4. Configuring Startup on Boot
3.3.5. Resetting a single dataservice
3.3.6. Resetting all dataservices
3.3.7. Provisioning during live operations
3.3.8. Adding a new Cluster/Dataservice
3.3.9. Enabling SSL for Replicators Only
3.3.10. Dataserver maintenance
3.3.10.1. Fixing Replication Errors
3.4. Deploying Composite Active/Active Clusters
3.4.1. Prepare: Composite Active/Active Clusters
3.4.2. Install: Composite Active/Active Clusters
3.4.3. Best Practices: Composite Active/Active Clusters
3.4.4. Configuring Startup on Boot
3.4.5. Resetting a single dataservice
3.4.6. Resetting all dataservices
3.4.7. Dataserver maintenance
3.4.7.1. Fixing Replication Errors
3.4.8. Adding a Cluster to a Composite Active/Active Topology
3.4.8.1. Pre-Requisites
3.4.8.2. Backup and Restore
3.4.8.3. Update Existing Configuration
3.4.8.4. New Host Configuration
3.4.8.5. Install on new nodes
3.4.8.6. Update existing nodes
3.4.8.7. Start the new cluster
3.4.8.8. Validate and check
3.5. Deploying Composite Dynamic Active/Active
3.5.1. Enabling Composite Dynamic Active/Active
3.6. Deploying Tungsten Connector Only
3.7. Deploying Additional Datasources, Managers, or Connectors
3.7.1. Adding Datasources to an Existing Deployment
3.7.2. Adding Active Witnesses to an Existing Deployment
3.7.3. Replacing an Active Witness as a Full Cluster Node
3.7.4. Replacing a Full Cluster Node as an Active Witness
3.7.5. Adding Connectors to an Existing Deployment
3.7.6. Converting from a single cluster to a composite cluster
3.7.6.1. Convert and add new nodes as a new service
3.7.6.2. Convert and move nodes to a new service
3.8. Replicating Data Into an Existing Dataservice
3.9. Replicating Data Out of a Cluster
3.9.1. Prepare: Replicating Data Out of a Cluster
3.9.2. Deploy: Replicating Data Out of a Cluster
3.10. Replicating from a Cluster to a Datawarehouse
3.10.1. Replicating from a Cluster to a Datawarehouse - Prerequisites
3.10.2. Replicating from a Cluster to a Datawarehouse - Configuring the Cluster Nodes
3.10.3. Replicating from a Cluster to a Datawarehouse - Configuring the Cluster-Extractor
3.10.3.1. Replicating Data from a Cluster to a Datawarehouse (Staging Use Case)
3.10.3.2. Replicating Data from a Cluster to a Datawarehouse (INI Use Case)
3.11. Migrating and Seeding Data
3.11.1. Migrating from MySQL Native Replication 'In-Place'
3.11.2. Migrating from MySQL Native Replication Using a New Service
3.11.3. Seeding Data through MySQL

Creating a Tungsten Clustering (for MySQL) Dataservice using Tungsten Cluster combines a number of different components, systems, and functionality, to support a running database dataservice that is capable of handling database failures, complex replication topologies, and management of the client/database connection for both load balancing and failover scenarios.

How you choose to deploy depends on your requirements and environment. All deployments operate through the tpm command. tpm operates in two different modes:

  • tpm staging configuration — a tpm configuration is created by defining the command-line arguments that define the deployment type, structure and any additional parameters. tpm then installs all the software on all the required hosts by using ssh to distribute Tungsten Cluster and the configuration, and optionally automatically starts the services on each host. tpm manages the entire deployment, configuration and upgrade procedure.

  • tpm INI configuration — tpm uses an INI file to configure the service on the local host. The INI file must be created on each host that will be part of the cluster. tpm only manages the services on the local host; in a multi-host deployment, upgrades, updates, and configuration must be handled separately on each host.

The following sections provide guidance and instructions for creating a number of different deployment scenarios using Tungsten Cluster.

3.1. Deploying Standalone HA Clusters

Within a Primary/Replica service, there is a single Primary which replicates data to the Replicas. The Tungsten Connector handles connectivity for the application and distributes the load to the datasources in the dataservice.

Figure 3.1. Topologies: Standalone HA Cluster

Topologies: Standalone HA

3.1.1. Prepare: Standalone HA Cluster

Before continuing with deployment you will need the following:

  1. The name to use for the cluster.

  2. The list of datasources in the cluster. These are the servers which will be running MySQL.

  3. The list of servers that will run the connector.

  4. The username and password of the MySQL replication user.

  5. The username and password of the first application user. You may add more users after installation.

All servers must be prepared with the proper prerequisites. See Appendix B, Prerequisites for additional details.

3.1.2. Install: Standalone HA Cluster

  1. Install the Tungsten Cluster package or download the Tungsten Cluster tarball, and unpack it:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
  2. Change to the Tungsten Cluster directory:

    shell> cd tungsten-clustering-7.1.4-10
  3. Run tpm to perform the installation, using either the staging method or the INI method. Review Section 10.1, “Comparing Staging and INI tpm Methods” for more details on these two methods.

    Both the Staging method and the INI file method for this configuration are shown below.

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --replication-user=tungsten \
        --replication-password=password \
        --replication-port=13306 \
        --application-user=app_user \
        --application-password=secret \
        --application-port=3306 \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --topology=clustered \
        --master=host1 \
        --members=host1,host2,host3 \
        --connectors=host4
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=password
    replication-port=13306
    application-user=app_user
    application-password=secret
    application-port=3306
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    topology=clustered
    master=host1
    members=host1,host2,host3
    connectors=host4
    


    Note

    If you plan to make full use of the REST API (which is enabled by default) you will need to also configure a username and password for API Access. This must be done by specifying the following options in your configuration:

    rest-api-admin-user=tungsten
    rest-api-admin-pass=secret

    For more information on using and configuring the REST API, see Section 11.1, “Getting Started with Tungsten REST API”

    Run tpm to install the software with the configuration.

    shell> ./tools/tpm install

    During the startup and installation, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

  4. Initialize your PATH and environment.

    shell> source /opt/continuent/share/env.sh

Important

Do not include start-and-report if you are taking over for MySQL native replication. See Section 3.11.1, “Migrating from MySQL Native Replication 'In-Place'” for next steps after completing installation.

3.1.3. Best Practices: Standalone HA Cluster

Follow the guidelines in Section 2.5, “Best Practices”.

3.2. Deploying Composite Active/Passive Clustering

Tungsten Cluster supports the creation of composite clusters. This includes multiple active/passive dataservices tied together. One of the dataservices is identified as the active one, containing the Primary node; all other (passive) dataservices replicate from it.

Figure 3.2. Topologies: Composite Active/Passive Cluster

Topologies: Composite Active/Passive Cluster

3.2.1. Prepare: Composite Active/Passive Cluster

Before continuing with deployment you will need the following:

  1. The cluster name for each Active/Passive Cluster and a Composite cluster name to group them.

  2. The list of datasources in each cluster. These are the servers which will be running MySQL.

  3. The list of servers that will run the connector. Each connector will be associated with a preferred cluster but will have access to the Primary regardless of location.

  4. The username and password of the MySQL replication user.

  5. The username and password of the first application user. You may add more users after installation.

All servers must be prepared with the proper prerequisites. See Appendix B, Prerequisites for additional details.

3.2.2. Install: Composite Active/Passive Cluster

  1. Install the Tungsten Cluster package or download the Tungsten Cluster tarball, and unpack it:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
  2. Change to the Tungsten Cluster directory:

    shell> cd tungsten-clustering-7.1.4-10
  3. Run tpm to perform the installation. This method assumes you are using the Section 10.3, “tpm Staging Configuration” method:

    Both the Staging and INI methods are shown in the examples below.

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --replication-user=tungsten \
        --replication-password=secret \
        --replication-port=13306 \
        --application-user=app_user \
        --application-password=secret \
        --application-port=3306 \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --topology=clustered \
        --master=host1.alpha \
        --members=host1.alpha,host2.alpha,host3.alpha \
        --connectors=host1.alpha,host2.alpha,host3.alpha
    
    shell> ./tools/tpm configure beta \
        --topology=clustered \
        --relay=host1.beta \
        --members=host1.beta,host2.beta,host3.beta \
        --connectors=host1.beta,host2.beta,host3.beta \
        --relay-source=alpha
    
    shell> ./tools/tpm configure gamma \
        --composite-datasources=alpha,beta
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=secret
    replication-port=13306
    application-user=app_user
    application-password=secret
    application-port=3306
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    topology=clustered
    master=host1.alpha
    members=host1.alpha,host2.alpha,host3.alpha
    connectors=host1.alpha,host2.alpha,host3.alpha
    
    [beta]
    topology=clustered
    relay=host1.beta
    members=host1.beta,host2.beta,host3.beta
    connectors=host1.beta,host2.beta,host3.beta
    relay-source=alpha
    
    [gamma]
    composite-datasources=alpha,beta
    

    Note

    If you plan to make full use of the REST API (which is enabled by default) you will need to also configure a username and password for API Access. This must be done by specifying the following options in your configuration:

    rest-api-admin-user=tungsten
    rest-api-admin-pass=secret

    Run tpm to install the software with the configuration.

    shell> ./tools/tpm install

    During the startup and installation, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

  4. Initialize your PATH and environment.

    shell> source /opt/continuent/share/env.sh

The Composite Active/Passive Cluster should be installed and ready to use.

3.2.3. Best Practices: Composite Active/Passive Cluster

Follow the guidelines in Section 2.5, “Best Practices”.

3.2.4. Adding a remote Composite Cluster

Adding an entirely new cluster provides a significant increase in availability and capacity. The new nodes that form the cluster will be fully aware of the original cluster(s) and will communicate with the existing managers and datasources within the cluster.

The following steps guide you through updating the configuration to include the new hosts and services you are adding.

  1. On the new host(s), ensure the Appendix B, Prerequisites have been followed.

  2. Let's assume that we have a composite cluster dataservice called global with two clusters, east and west, with three nodes each.

    In this worked example, we show how to add an additional cluster service called north with three new nodes.

  3. Set the cluster to maintenance mode using cctrl:

    shell> cctrl
    [LOGICAL] / > use global
    [LOGICAL] /global > set policy maintenance
  4. Using the following as an example, update the configuration to include the new cluster and update the additional composite service block. If using an INI installation, copy the INI file to all the new nodes in the new cluster.

    Both the Staging and INI methods are shown in the examples below.

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
    The staging USER is tungsten
    
    shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
    The staging HOST is db1
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> ssh {STAGING_USER}@{STAGING_HOST}
    shell> cd {STAGING_DIRECTORY}
    shell> ./tools/tpm configure north \
        --connectors=db7,db8,db9 \
        --relay-source=east \
        --relay=db7 \
        --slaves=db8,db9 \
        --topology=clustered
    
    shell> ./tools/tpm configure global \
        --composite-datasources=east,west,north
    

    Run the tpm command to update the software with the Staging-based configuration:

    shell> ./tools/tpm update --no-connectors --replace-release

    For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

    shell> vi /etc/tungsten/tungsten.ini
    [north]
    ...
    connectors=db7,db8,db9
    relay-source=east
    relay=db7
    slaves=db8,db9
    topology=clustered
    
    [global]
    ...
    composite-datasources=east,west,north
    

    Run the tpm command to update the software with the INI-based configuration:

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> cd {STAGING_DIRECTORY}
    
    shell> ./tools/tpm update --no-connectors --replace-release

    For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

  5. Using the --no-connectors option updates the current deployment without restarting the existing connectors.

  6. If installed via INI, on all nodes in the new cluster, download and unpack the software, and install:

    shell> cd /opt/continuent/software
    shell> tar zxvf tungsten-clustering-7.1.4-10.tar.gz
    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm install

  7. On every node in the original clusters, make sure all replicators are online:

    shell> trepctl online; trepctl services
  8. On all nodes in the new cluster start the software

    shell> startall
  9. The next steps will involve provisioning the new cluster nodes. An alternative approach to using this method would be to take a backup of a Replica from the existing cluster, and manually restoring it to ALL nodes in the new cluster PRIOR to issuing the install step above. If you take this approach then you can skip the next two re-provision steps.

  10. Go to the relay (Primary) node of the new cluster (i.e. db7) and provision it from any Replica in the original cluster (i.e. db2):

    shell> tungsten_provision_slave --source db2
  11. Go to a Replica node of the new cluster (i.e. db8) and provision it from the relay node of the new cluster (i.e. db7):

    shell> tungsten_provision_slave --source db7
  12. Repeat the process for the remaining Replica nodes in the new cluster.

  13. Set the composite cluster to automatic mode using cctrl:

    shell> cctrl
    [LOGICAL] / > use global
    [LOGICAL] /global > set policy automatic
  14. During a period when it is safe to restart the connectors:

    shell> ./tools/tpm promote-connector

3.3. Deploying Multi-Site/Active-Active Clustering

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

A Multi-Site/Active-Active topology provides all the benefits of a typical dataservice at a single location, but with the benefit of also replicating the information to another site. The underlying configuration within Tungsten Cluster uses the Tungsten Replicator which enables operation between the two sites.

The configuration is in two separate parts:

  • Tungsten Cluster dataservice that operates the main dataservice service within each site.

  • Tungsten Replicator dataservice that provides replication between the two sites; one to replicate from site1 to site2, and one for site2 to site1.

A sample display of how this operates is provided in Figure 3.3, “Topologies: Multi-Site/Active-Active Clusters”.

Figure 3.3. Topologies: Multi-Site/Active-Active Clusters

Topologies: Multi-Site/Active-Active Clusters

The service can be described as follows:

  • Tungsten Cluster Service: east

    Replicates data between east1, east2 and east3 (not shown).

  • Tungsten Cluster Service: west

    Replicates data between west1, west2 and west3 (not shown).

  • Tungsten Replicator Service: east

    Defines the replication of data within east as a replicator service using Tungsten Replicator. This service reads from all the hosts within the Tungsten Cluster service east and writes to west1, west2, and west3. The service name is the same to ensure that we do not duplicate writes from the clustered service already running.

    Data is read from the east Tungsten Cluster and replicated to the west Tungsten Cluster dataservice. The configuration allows for changes in the Tungsten Cluster dataservice (such as a switch or failover) without upsetting the site-to-site replication.

  • Tungsten Replicator Service: west

    Defines the replication of data within west as a replicator service using Tungsten Replicator. This service reads from all the hosts within the Tungsten Cluster service west and writes to east1, east2, and east3. The service name is the same to ensure that we do not duplicate writes from the clustered service already running.

    Data is read from the west Tungsten Cluster and replicated to the east Tungsten Cluster dataservice. The configuration allows for changes in the Tungsten Cluster dataservice (such as a switch or failover) without upsetting the site-to-site replication.

  • Tungsten Replicator Service: east_west

    Replicates data from East to West, using Tungsten Replicator. This is a service alias that defines reading from the east dataservice (acting as an extractor) and writing to the servers within the destination (west) cluster.

  • Tungsten Replicator Service: west_east

    Replicates data from West to East, using Tungsten Replicator. This is a service alias that defines reading from the west dataservice (acting as an extractor) and writing to the servers within the destination (east) cluster.

Requirements.  Recommended releases for Multi-Site/Active-Active deployments are Tungsten Cluster 5.4.x and Tungsten Replicator 5.4.x; however, this topology can also be installed with the later v6+ releases.

3.3.1. Prepare: Multi-Site/Active-Active Clusters

Some considerations must be taken into account for any active/active scenario:

  • For tables that use auto-increment, collisions are possible if two hosts select the same auto-increment number. You can reduce the effects by configuring each MySQL host with different auto-increment settings, changing the offset and the increment values (see the worked example after this list). For example, adding the following lines to your my.cnf file:

    auto-increment-offset = 1
    auto-increment-increment = 4

    In this way, the increments can be staggered on each machine and collisions are unlikely to occur.

  • Use row-based replication. Update your configuration file to explicitly use row-based replication by adding the following to your my.cnf file:

    binlog-format = row
  • Beware of triggers. Triggers can cause problems during replication because if they are applied on the Replica as well as the Primary you can get data corruption and invalid data. Tungsten Cluster cannot prevent triggers from executing on a Replica, and in an active/active topology there is no sensible way to disable triggers. Instead, check at the trigger level whether you are executing on a Primary or Replica. For more information, see Section C.4.1, “Triggers”.
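
As a worked example of the auto-increment staggering described in the first item above (the host names and values here are illustrative only), giving each writable location the same increment but a unique offset keeps the two sequences from overlapping:

# my.cnf on the east Primary (for example, east1)
auto-increment-offset = 1
auto-increment-increment = 4
# generated values: 1, 5, 9, 13, ...

# my.cnf on the west Primary (for example, west1)
auto-increment-offset = 2
auto-increment-increment = 4
# generated values: 2, 6, 10, 14, ...

Choosing an increment larger than the current number of writable locations (4 for the two sites here) leaves room to add further locations later without reconfiguring the existing ones.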

3.3.2. Install: Multi-Site/Active-Active Clusters

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

Creating the configuration requires two distinct steps: the first creates the two Tungsten Cluster deployments, and the second creates the Tungsten Replicator configurations on different network ports and in different install directories.

  1. Install the Tungsten Cluster and Tungsten Replicator packages or download the tarballs, and unpack them:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
    shell> tar zxf tungsten-replicator-7.1.4-10.tar.gz
  2. Change to the Tungsten Cluster directory:

    shell> cd tungsten-clustering-7.1.4-10
  3. Run tpm to configure the installation. This method assumes you are using the Section 10.3, “tpm Staging Configuration” method:

    Both the Staging and INI methods are shown in the examples below.

    For an INI install, the INI file contains all of the configuration for both the cluster deployment and the replicator deployment.

    For a Staging install, you first use the cluster configuration shown below and then configure the replicator as a separate process. These additional steps are outlined below.

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --replication-user=tungsten \
        --replication-password=secret \
        --replication-port=3306 \
        --profile-script=~/.bashrc \
        --application-user=app_user \
        --application-password=secret \
        --skip-validation-check=MySQLPermissionsCheck \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    
    shell> ./tools/tpm configure east \
        --topology=clustered \
        --connectors=east1,east2,east3 \
        --master=east1 \
        --members=east1,east2,east3
    
    shell> ./tools/tpm configure west \
        --topology=clustered \
        --connectors=west1,west2,west3 \
        --master=west1 \
        --members=west1,west2,west3
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    replication-user=tungsten
    replication-password=secret
    replication-port=3306
    profile-script=~/.bashrc
    application-user=app_user
    application-password=secret
    skip-validation-check=MySQLPermissionsCheck
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [defaults.replicator]
    home-directory=/opt/replicator
    rmi-port=10002
    executable-prefix=mm
    
    [east]
    topology=clustered
    connectors=east1,east2,east3
    master=east1
    members=east1,east2,east3
    
    [west]
    topology=clustered
    connectors=west1,west2,west3
    master=west1
    members=west1,west2,west3
    
    [east_west]
    topology=cluster-slave
    master-dataservice=east
    slave-dataservice=west
    thl-port=2113
    
    [west_east]
    topology=cluster-slave
    master-dataservice=west
    slave-dataservice=east
    thl-port=2115
    

    Configuration group defaults.replicator

    The description of each of the options is shown below:

    • --home-directory=/opt/replicator

      home-directory=/opt/replicator

      Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.

    • --rmi-port=10002

      rmi-port=10002

      Replication RMI listen port

    • --executable-prefix=mm

      executable-prefix=mm

      When enabled, the supplied prefix is added to each command alias that is generated for a given installation. This enables multiple installations to co-exist and be accessible through a unique alias. For example, if the executable prefix is configured as east, then an alias for the installation to trepctl will be created as east_trepctl.

      Alias information for executable prefix data is stored within the $CONTINUENT_ROOT/share/aliases.sh file for each installation.

    Note

    If you plan to make full use of the REST API (which is enabled by default) you will need to also configure a username and password for API Access. This must be done by specifying the following options in your configuration:

    rest-api-admin-user=tungsten
    rest-api-admin-pass=secret

    Run tpm to install the software with the configuration.

    shell> ./tools/tpm install

    During the startup and installation, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

  4. Change to the Tungsten Replicator directory:

    shell> cd tungsten-replicator-7.1.4-10
  5. Run tpm to configure the installation. This method assumes you are using the Section 10.3, “tpm Staging Configuration” method:

  6. If you are running a Staging install, first configure the replicator using the following example. If configuring using an INI file, skip straight to the install step below.

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/replicator \
        --replication-user=tungsten \
        --replication-password=secret \
        --replication-port=3306 \
        --profile-script=~/.bashrc \
        --application-user=app_user \
        --application-password=secret \
        --skip-validation-check=MySQLPermissionsCheck \
        --rmi-port=10002 \
        --executable-prefix=mm \
        --thl-port=2113 \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure east \
        --topology=clustered \
        --connectors=east1,east2,east3 \
        --master=east1 \
        --members=east1,east2,east3
    
    shell> ./tools/tpm configure west \
        --topology=clustered \
        --connectors=west1,west2,west3 \
        --master=west1 \
        --members=west1,west2,west3
    
    shell> ./tools/tpm configure east_west \
        --topology=cluster-slave \
        --master-dataservice=east \
        --slave-dataservice=west \
        --thl-port=2113
    
    shell> ./tools/tpm configure west_east \
        --topology=cluster-slave \
        --master-dataservice=west \
        --slave-dataservice=east \
        --thl-port=2115
    
  7. Run tpm to install the software with the configuration.

    shell> ./tools/tpm install

    During the startup and installation, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

  8. Initialize your PATH and environment.

    shell> source /opt/continuent/share/env.sh
    shell> source /opt/replicator/share/env.sh

The Multi-Site/Active-Active clustering should be installed and ready to use.

3.3.3. Best Practices: Multi-Site/Active-Active Clusters

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

Note

In addition to this information, follow the guidelines in Section 2.5, “Best Practices”.

  • Running a Multi-Site/Active-Active service uses many different components to keep data updated on all servers. Monitoring the dataservice is divided into monitoring the two different clusters. Be mindful to use the correct path when running commands: either use the full path to the command under /opt/continuent and /opt/replicator, or use the aliases created by setting the --executable-prefix=mm option, which turns trepctl into mm_trepctl (see the example after this list).

  • Configure your database servers with distinct auto_increment_increment and auto_increment_offset settings. Each location that may accept writes should have a unique offset value.
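
As an example of the path and alias distinction described above (a sketch assuming the default /opt/continuent and /opt/replicator installation directories used throughout this section), the following commands all address replicator services on the local host:

shell> /opt/continuent/tungsten/tungsten-replicator/bin/trepctl status  # cluster replicator, full path
shell> trepctl status                                                   # cluster replicator, via the PATH set by env.sh
shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl status  # cross-site replicator, full path
shell> mm_trepctl status                                                # cross-site replicator, via the mm_ prefixed alias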

Using cctrl gives you the dataservice status individually for the east and west dataservice. For example, the east dataservice is shown below:

Continuent Tungsten 7.1.4 build 10
east: session established
[LOGICAL] /east > ls

COORDINATOR[east1:AUTOMATIC:ONLINE]

ROUTERS:
+----------------------------------------------------------------------------+
|connector@east1[17951](ONLINE, created=0, active=0)                         |
|connector@east2[17939](ONLINE, created=0, active=0)                         |
|connector@east3[17961](ONLINE, created=0, active=0)                         |
+----------------------------------------------------------------------------+

DATASOURCES:
+----------------------------------------------------------------------------+
|east1(master:ONLINE, progress=29, THL latency=0.739)                        |
|STATUS [OK] [2013/11/25 11:24:35 AM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=master, state=ONLINE)                                     |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

+----------------------------------------------------------------------------+
|east2(slave:ONLINE, progress=29, latency=0.721)                             |
|STATUS [OK] [2013/11/25 11:24:39 AM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=east1, state=ONLINE)                        |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

+----------------------------------------------------------------------------+
|east3(slave:ONLINE, progress=29, latency=1.143)                             |
|STATUS [OK] [2013/11/25 11:24:38 AM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=east1, state=ONLINE)                        |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

When checking the current status, it is important to compare the sequence numbers from each service correctly. There are four services to monitor: the Tungsten Cluster service east; the Tungsten Replicator service east, which reads data from the east Tungsten Cluster service and applies it to the hosts in west; and the corresponding west Tungsten Cluster and west Tungsten Replicator services.

  • When data is inserted on the Primary within the east Tungsten Cluster, use cctrl to determine the cluster status. Sequence numbers within the Tungsten Cluster east should match, and latencies between hosts in the Tungsten Cluster service are relative to each other.

  • When data is inserted on east, the sequence number of the east Tungsten Cluster service and east Tungsten Replicator service (on west{1,2,3}) should be compared.

  • When data is inserted on the Primary within the west Tungsten Cluster, use cctrl to determine the cluster status. Sequence numbers within the Tungsten Cluster west should match, and latencies between hosts in the Tungsten Cluster service are relative to each other.

  • When data is inserted on west, the sequence number of the west Tungsten Cluster service and west Tungsten Replicator service (on east{1,2,3}) should be compared.

Operation                    Tungsten Cluster Service Seqno     Tungsten Replicator Service Seqno
                             east             west              east             west
Insert/update data on east   Seqno Increment  -                 Seqno Increment  -
Insert/update data on west   -                Seqno Increment   -                Seqno Increment

Within each cluster, cctrl can be used to monitor the current status. For more information on checking the status and controlling operations, see Section 6.3, “Checking Dataservice Status”.

Note

For convenience, the shell PATH can be updated with the tools and configuration. With two separate services, both environments must be updated. To update the shell with the Tungsten Cluster service and tools:

shell> source /opt/continuent/share/env.sh

To update the shell with the Tungsten Replicator service and tools:

shell> source /opt/replicator/share/env.sh

To monitor all services and the current status, you can also use the multi_trepctl command (part of the Tungsten Replicator installation). This generates a unified status report for all the hosts and services configured:

shell> multi_trepctl --by-service
| host  | servicename | role   | state  | appliedlastseqno | appliedlatency |
| east1 | east        | master | ONLINE |               53 |        120.161 |
| east3 | east        | master | ONLINE |               44 |          0.697 |
| east2 | east        | slave  | ONLINE |               53 |        119.961 |
| west1 | east        | slave  | ONLINE |               53 |        119.834 |
| west2 | east        | slave  | ONLINE |               53 |        181.128 |
| west3 | east        | slave  | ONLINE |               53 |        204.790 |
| west1 | west        | master | ONLINE |           294327 |          0.285 |
| west2 | west        | master | ONLINE |           231595 |          0.316 |
| east1 | west        | slave  | ONLINE |           294327 |          0.879 |
| east2 | west        | slave  | ONLINE |           294327 |          0.567 |
| east3 | west        | slave  | ONLINE |           294327 |          1.046 |
| west3 | west        | slave  | ONLINE |           231595 |         22.895 |

In the above example, it can be seen that the west services have a much higher applied last sequence number than the east services; this is because all the writes have been applied within the west cluster.

To monitor individual servers and/or services, use trepctl, using the correct port number and servicename. For example, on east1 to check the status of the replicator within the Tungsten Cluster service:

shell> trepctl status

To check the Tungsten Replicator service, explicitly specify the port and service:

shell> mm_trepctl -service west status

3.3.4. Configuring Startup on Boot

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

Because there are two different Continuent services running, each must be individually configured to startup on boot:

  • For the Tungsten Cluster service, use Section 4.4, “Configuring Startup on Boot”.

  • For the Tungsten Replicator service, a custom startup script must be created, otherwise the replicator will be unable to start as it has been configured in a different directory.

    1. Create a link from the Tungsten Replicator service startup script in the operating system startup directory (/etc/init.d):

      shell> sudo ln -s /opt/replicator/tungsten/tungsten-replicator/bin/replicator /etc/init.d/mmreplicator
    2. Modify the APP_NAME variable within the startup script (/etc/init.d/mmreplicator) to mmreplicator:

      APP_NAME="mmreplicator"
    3. Update the operating system startup configuration to use the updated script.

      On Debian/Ubuntu:

      shell> sudo update-rc.d mmreplicator defaults

      On RedHat/CentOS:

      shell> sudo chkconfig --add mmreplicator

3.3.5. Resetting a single dataservice

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

Under certain conditions, dataservices in an active/active configuration may drift and/or become inconsistent with the data in another dataservice. If this occurs, you may need to re-provision the data on one or more of the dataservices after first determining the definitive source of the information.

In the following example the west service has been determined to be the definitive copy of the data. To fix the issue, all the datasources in the east service will be reprovisioned from one of the datasources in the west service.

The following is a guide to the steps that should be followed. In the example procedure it is the east service that has failed:

  1. Put the dataservice into MAINTENANCE mode. This ensures that Tungsten Cluster will not attempt to automatically recover the service.

    cctrl [east]> set policy maintenance
  2. On the failed east Tungsten Cluster service, put each Tungsten Connector offline:

    cctrl [east]> router * offline
  3. Reset the failed Tungsten Replicator service on all servers connected to the failed Tungsten Cluster service. For example, on west{1,2,3} reset the east Tungsten Replicator service:

    shell west> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service east offline
    shell west> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service east reset -all -y
  4. Reset the Tungsten Cluster service on each server within the failed region (east{1,2,3}):

    shell east> /opt/continuent/tungsten/tungsten-replicator/bin/replicator stop
    shell east> /opt/continuent/tungsten/tools/tpm reset east
    shell east> /opt/continuent/tungsten/tungsten-replicator/bin/replicator start
  5. Restore a backup on each host (east{1,2,3}) in the failed east service from a host in the west service:

    shell east> /opt/continuent/tungsten/tungsten-replicator/scripts/tungsten_provision_slave \
        --direct --source=west1
  6. Place all the Tungsten Replicator services on west{1,2,3} back online:

    shell west> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service east online
  7. On the failed east Tungsten Cluster service, put each Tungsten Connector online:

    cctrl [east]> router * online
  8. Set the policy back to automatic:

    cctrl> set policy automatic

3.3.6. Resetting all dataservices

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

To reset all of the dataservices and restart the Tungsten Cluster and Tungsten Replicator services:

On all hosts (e.g. east{1,2,3} and west{1,2,3}):

shell> /opt/replicator/tungsten/tungsten-replicator/bin/replicator stop
shell> /opt/replicator/tungsten/tools/tpm reset
shell> /opt/continuent/tungsten/tools/tpm reset
shell> /opt/replicator/tungsten/tungsten-replicator/bin/replicator start

3.3.7. Provisioning during live operations

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

In the event of a failure within one host in the service where you need to reprovision the host from another running Replica:

  • Identify the servers that have failed. All servers that are not the Primary for their region can be re-provisioned using a backup/restore of the Primary (see Section 6.10, “Creating a Backup”) or by using the tungsten_provision_slave script.

  • To re-provision an entire region, follow the steps below. The east region is used in the example statements below:

    1. To prevent application servers from reading and writing to the failed service, place the Tungsten Connector offline within the failed region:

      cctrl [east]> router * offline
    2. On all servers in other regions (west{1,2,3}):

      shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service east offline
      shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service east reset -all -y
    3. On all servers in the failed region (east{1,2,3}):

      shell> /opt/replicator/tungsten/tungsten-replicator/bin/replicator stop
      shell> /opt/replicator/tungsten/tools/tpm reset
      shell> /opt/continuent/tungsten/tungsten-replicator/scripts/tungsten_provision_slave \
          --direct --source=west1
    4. Check that Tungsten Cluster is working correctly and all hosts are up to date:

      cctrl [east]> ls
    5. Restart the Tungsten Replicator service:

      shell> /opt/replicator/tungsten/tungsten-replicator/bin/replicator start
    6. On all servers in other regions (west{1,2,3}):

      shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service east online

3.3.8. Adding a new Cluster/Dataservice

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

To add an entirely new cluster (dataservice) to the mesh, follow the simple procedure below.

Note

There is no need to set the Replicator starting points, and no downtime/maintenance window is required!

  1. Choose a cluster to take a node backup from:

    • Choose a cluster and Replica node to take a backup from.

    • Enable maintenance mode for the cluster:

      shell> cctrl
      cctrl> set policy maintenance
    • Shun the selected Replica node and stop both local and cross-site replicator services:

      shell> cctrl
      cctrl> datasource {replica_hostname_here} shun
      replica shell> trepctl offline
      replica shell> replicator stop
      replica shell> mm_trepctl offline
      replica shell> mm_replicator stop
    • Take a backup of the shunned node, then copy to/restore on all nodes in the new cluster.

    • Recover the Replica node and put cluster back into automatic mode:

      replica shell> replicator start
      replica shell> trepctl online
      replica shell> mm_replicator start
      replica shell> mm_trepctl online
      shell> cctrl
      cctrl> datasource {replica_hostname_here} online
      cctrl> set policy automatic
  2. On ALL nodes in all three (3) clusters, ensure the /etc/tungsten/tungsten.ini has all three clusters defined and all the correct cross-site combinations (see the sample INI layout after this procedure).

  3. Install the Tungsten Clustering software on new cluster nodes to create a single standalone cluster and check the cctrl command to be sure the new cluster is fully online.

  4. Install the Tungsten Replicator software on all new cluster nodes and start it.

    Replication will now be flowing INTO the new cluster from the original two.

  5. On the original two clusters, run tools/tpm update from the cross-site replicator staging software path:

    shell> mm_tpm query staging
    shell> cd {replicator_staging_directory}
    shell> tools/tpm update --replace-release
    shell> mm_trepctl online
    shell> mm_trepctl services

    Check the output from the mm_trepctl services command output above to confirm the new service appears and is online.
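
The following is a minimal sketch of the combined INI layout referred to in step 2 above, assuming the original east and west clusters are configured as in Section 3.3.2 and the new cluster is named north on hosts north1, north2 and north3 (these names and the additional THL ports are illustrative only). The existing [defaults], [defaults.replicator], [east], [west], [east_west] and [west_east] sections remain as before; the new sections define the new cluster and each new cross-site pairing:

[north]
topology=clustered
connectors=north1,north2,north3
master=north1
members=north1,north2,north3

[east_north]
topology=cluster-slave
master-dataservice=east
slave-dataservice=north
thl-port=2117

[north_east]
topology=cluster-slave
master-dataservice=north
slave-dataservice=east
thl-port=2119

[west_north]
topology=cluster-slave
master-dataservice=west
slave-dataservice=north
thl-port=2121

[north_west]
topology=cluster-slave
master-dataservice=north
slave-dataservice=west
thl-port=2123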

Note

There is no need to set the cross-site replicators at a starting position because:

  • Replicator feeds from the new cluster to the old clusters start at seqno 0.

  • The tungsten_olda and tungsten_oldb database schemas will contain the correct starting points for the INBOUND feed into the new cluster, so when the cross-site replicators are started and brought online they will read from the tracking table and carry on correctly from the stored position.

3.3.9. Enabling SSL for Replicators Only

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

It is possible to enable secure communications for just the Replicator layer in a Multi-Site/Active-Active topology. This would include both the Cluster Replicators and the Cross-Site Replicators because they cannot be SSL-enabled independently.

  1. Create a certificate and load it into a java keystore, and then load it into a truststore and place all files into the /etc/tungsten/ directory. For detailed instructions, see Chapter 5, Deployment: Security

  2. Update /etc/tungsten/tungsten.ini to include these additional lines in both the defaults section and the defaults.replicator section:

    [defaults]
    ...
    java-keystore-path=/etc/tungsten/keystore.jks
    java-keystore-password=secret
    java-truststore-path=/etc/tungsten/truststore.ts
    java-truststore-password=secret
    thl-ssl=true
    
    [defaults.replicator]
    ...
    java-keystore-path=/etc/tungsten/keystore.jks
    java-keystore-password=secret
    java-truststore-path=/etc/tungsten/truststore.ts
    java-truststore-password=secret
    thl-ssl=true 
    
  3. Put all clusters into maintenance mode.

    shell> cctrl
    cctrl> set policy maintenance
  4. On all hosts, update the cluster configuration:

    shell> tpm query staging
    shell> cd {cluster_staging_directory}
    shell> tools/tpm update
    shell> trepctl online
    shell> trepctl status | grep thl

    On all hosts, update the cross-site replicator configuration:

    shell> mm_tpm query staging
    shell> cd {replicator_staging_directory}
    shell> tools/tpm update
    shell> mm_trepctl online
    shell> mm_trepctl status | grep thl

    Important

    Please note that all replication will effectively be down until all nodes/services are SSL-enabled and online.

  5. Once all the updates are done and the Replicators are back up and running, use the various commands to check that secure communications have been enabled.

    Each datasource will show [SSL] when enabled:

    shell> cctrl
    cctrl> ls
    
    DATASOURCES:
    +----------------------------------------------------------------------------+
    |db1(master:ONLINE, progress=208950063, THL latency=0.895)                   |
    |STATUS [OK] [2018/04/10 11:47:57 AM UTC][SSL]                               |
    +----------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                     |
    |  REPLICATOR(role=master, state=ONLINE)                                     |
    |  DATASERVER(state=ONLINE)                                                  |
    |  CONNECTIONS(created=15307, active=2)                                      |
    +----------------------------------------------------------------------------+
    
    +----------------------------------------------------------------------------+
    |db2(slave:ONLINE, progress=208950061, latency=0.920)                        |
    |STATUS [OK] [2018/04/19 11:18:21 PM UTC][SSL]                               |
    +----------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                     |
    |  REPLICATOR(role=slave, master=db1, state=ONLINE)                          |
    |  DATASERVER(state=ONLINE)                                                  |
    |  CONNECTIONS(created=0, active=0)                                          |
    +----------------------------------------------------------------------------+
    
    +----------------------------------------------------------------------------+
    |db3(slave:ONLINE, progress=208950063, latency=0.939)                        |
    |STATUS [OK] [2018/04/25 12:17:20 PM UTC][SSL]                               |
    +----------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                     |
    |  REPLICATOR(role=slave, master=db1, state=ONLINE)                          |
    |  DATASERVER(state=ONLINE)                                                  |
    |  CONNECTIONS(created=0, active=0)                                          |
    +----------------------------------------------------------------------------+

    Both the local cluster replicator status command trepctl status and the cross-site replicator status command mm_trepctl status will show thls instead of thl in the values for masterConnectUri, masterListenUri and pipelineSource.

    shell> trepctl status | grep thl
    
    masterConnectUri       : thls://db1:2112/
    masterListenUri        : thls://db5:2112/
    pipelineSource         : thls://db1:2112/

3.3.10. Dataserver maintenance

Note

The procedures in this section are designed for the Multi-Site/Active-Active topology ONLY. Do NOT use these procedures for Composite Active/Active Clustering using v6 onwards.

For Composite Active/Active Clustering in version 6.x onwards, please refer to Section 3.4, “Deploying Composite Active/Active Clusters”.

Performing maintenance on the dataservice, for example updating the MySQL configuration file, can be achieved in a similar sequence to that shown in Section 6.15, “Performing Database or OS Maintenance”, except that you must also restart the corresponding Tungsten Replicator service after the main Tungsten Cluster service has been placed back online.

For example, to perform maintenance on the east service:

  1. Put the dataservice into MAINTENANCE mode. This ensures that Tungsten Cluster will not attempt to automatically recover the service.

    cctrl [east]> set policy maintenance
  2. Shun the first Replica datasource so that maintenance can be performed on the host.

    cctrl [east]> datasource east1 shun
  3. Perform the updates, such as updating my.cnf, changing schemas, or performing other maintenance.

  4. If MySQL configuration has been modified, restart the MySQL service:

    cctrl [east]> service east1/mysql restart
  5. Bring the host back into the dataservice:

    cctrl [east]> datasource east1 recover
  6. Perform a switch so that the Primary becomes a Replica and can then be shunned and have the necessary maintenance performed:

    cctrl [east]> switch
  7. Repeat the previous steps to shun the host, perform maintenance, and then switch again until all the hosts have been updated.

  8. Set the policy back to automatic:

    cctrl> set policy automatic
  9. On each host in the other region, manually restart the Tungsten Replicator service, which will have gone offline when MySQL was restarted:

    shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -host host -service east online

3.3.10.1. Fixing Replication Errors

In the event of a replication fault, the standard cctrl, trepctl and other utility commands in Chapter 9, Command-line Tools can be used to bring the dataservice back into operation. All the tools are safe to use.

If you have to perform any updates or modifications to the stored MySQL data, ensure binary logging has been disabled using:

mysql> SET SESSION SQL_LOG_BIN=0;

before running any commands. This prevents statements and operations from reaching the binary log, so that they will not be replicated to other hosts.
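
A minimal sketch of such a session (the schema, table and fix shown are purely hypothetical) disables binary logging, applies the manual correction, and then re-enables logging for the session:

mysql> SET SESSION SQL_LOG_BIN=0;
mysql> UPDATE app.orders SET status = 'SHIPPED' WHERE order_id = 1042; -- hypothetical manual fix
mysql> SET SESSION SQL_LOG_BIN=1;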

3.4. Deploying Composite Active/Active Clusters

A Composite Active/Active (CAA) Cluster topology provides all the benefits of a typical dataservice at a single location, but with the benefit of also replicating the information to another site. The underlying configuration within Tungsten Cluster uses two services within each node; one provides the replication within the cluster, and the second provides replication from the remote cluster. Both are managed by the Tungsten Manager.

Note

Composite Active/Active Clusters were previously referred to as Multi-Site/Active-Active (MSAA) clusters. The name has been updated to reflect the nature of these clusters as part of an overall active/active deployment using clusters, where the individual clusters could be in the same or different locations.

Whilst the older Multi-Site/Active-Active topology is still valid and supported, it is recommended that this newer Composite Active/Active topology is adopted from version 6 of Tungsten Cluster onwards. For details on the older topology, see Section 3.3, “Deploying Multi-Site/Active-Active Clustering”

The configuration is handled with a single configuration and deployment that configures the core cluster services and additional cross-cluster services.

A sample display of how this operates is provided in Figure 3.4, “Topologies: Composite Active/Active Clusters”.

Figure 3.4. Topologies: Composite Active/Active Clusters

Topologies: Composite Active/Active Clusters

The service can be described as follows:

  • Tungsten Cluster Service: east

    Replicates data between east1, east2 and east3.

  • Tungsten Cluster Service: west

    Replicates data between west1, west2 and west3.

  • Tungsten Replicator Service: west_from_east

    Defines the replication service using a secondary sub-service within the cluster. This service reads THL FROM east and writes it to the relay node in west; the Replica nodes within west then replicate from that relay node.

  • Tungsten Replicator Service: east_from_west

    Defines the replication service using a secondary sub-service within the cluster. This service reads THL FROM west and writes it to the relay node in east; the Replica nodes within east then replicate from that relay node.

A new Composite Dynamic Active/Active topology was introduced in version 7.0.0 of Tungsten Cluster.

Composite Dynamic Active/Active builds on the foundation of the Composite Active/Active topology and the cluster continues to operate and be configured in the same way.

The difference is, with Composite Dynamic Active/Active, the cluster instructs the Proxy layer to behave like a Composite Active/Passive cluster.

For more information on this topology and how to enable it, see Section 3.5, “Deploying Composite Dynamic Active/Active”

3.4.1. Prepare: Composite Active/Active Clusters

Some considerations must be taken into account for any active/active scenarios:

  • For tables that use auto-increment, collisions are possible if two hosts select the same auto-increment number. You can reduce the effects by configuring each MySQL host with different auto-increment settings, changing the offset and the increment values (see the quick check after this list). For example, adding the following lines to your my.cnf file:

    auto-increment-offset = 1
    auto-increment-increment = 4

    In this way, the increments can be staggered on each machine and collisions are unlikely to occur.

  • Use row-based replication. Update your configuration file to explicitly use row-based replication by adding the following to your my.cnf file:

    binlog-format = row
  • Beware of triggers. Triggers can cause problems during replication because if they are applied on the Replica as well as the Primary you can get data corruption and invalid data. Tungsten Cluster cannot prevent triggers from executing on a Replica, and in an active/active topology there is no sensible way to disable triggers. Instead, check at the trigger level whether you are executing on a Primary or Replica. For more information, see Section C.4.1, “Triggers”.
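
As a quick check of the auto-increment settings described in the first item above (these are standard MySQL session variables; the values in the SET statements are illustrative only), you can confirm and, if needed, adjust what each writable host is actually using:

mysql> SELECT @@global.auto_increment_increment, @@global.auto_increment_offset;
mysql> SET GLOBAL auto_increment_increment = 4;
mysql> SET GLOBAL auto_increment_offset = 2;

Remember to keep my.cnf in sync with any runtime change so that the values survive a restart.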

3.4.2. Install: Composite Active/Active Clusters

Deployment of Composite Active/Active clusters is only supported using the INI method of deployment.

Configuration and deployment of the cluster works as follows:

  • Creates two basic Primary/Replica clusters.

  • Creates a composite service that includes the Primary/Replica clusters within the definition.

The resulting configuration within the example builds the following deployment:

  • One cluster, east, with three hosts.

  • One cluster, west, with three hosts.

  • All six hosts in the two clusters will have a manager, replicator and connector installed.

  • Each replicator has two replication services: one that replicates data within the cluster, and a second that replicates data from the other cluster to this host.

Creating the full topology requires a single install step. This creates the Tungsten Cluster dataservices, and creates the composite dataservice with its cross-cluster replication services on different network ports.

  1. Create the combined configuration file /etc/tungsten/tungsten.ini on all cluster hosts:

    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=secret
    replication-port=13306
    application-user=app_user
    application-password=secret
    application-port=3306
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [east]
    topology=clustered
    master=east1
    members=east1,east2,east3
    connectors=east1,east2,east3
    
    [west]
    topology=clustered
    master=west1
    members=west1,west2,west3
    connectors=west1,west2,west3
    
    [usa]
    topology=composite-multi-master
    composite-datasources=east,west
    

    The configuration above defines two clusters, east and west, which are both part of a composite cluster service, usa. The configuration is divided into the four groups shown: defaults, east, west, and usa.

    Note

    If you plan to make full use of the REST API (which is enabled by default) you will need to also configure a username and password for API Access. This must be done by specifying the following options in your configuration:

    rest-api-admin-user=tungsten
    rest-api-admin-pass=secret

    Warning

    Service names should not contain the keyword from within a Composite Active/Active deployment. This keyword is used (with the underscore separator, for example, east_from_west) to denote cross-site replicators within the cluster. To avoid confusion, avoid using from so that it is easy to distinguish between replication pipelines.

    When configuring this service, tpm will automatically imply the following into the configuration:

    • A parent composite service, usa in this example, with child services as listed, east and west.

    • Replication services between each child service, using the service name a_from_b, for example, east_from_west and west_from_east.

      More child services will create more automatic replication services. For example, with three clusters, alpha, beta, and gamma, tpm would configure alpha_from_beta and alpha_from_gamma on the alpha cluster, beta_from_alpha and beta_from_gamma on the beta cluster, and so on.

    • For each additional service, the port number is automatically configured from the base port number for the first service. For example, using the default port 2112, the east_from_west service would have THL port 2113.

  2. Execute the installation on each host within the entire composite cluster. For example, on all six hosts provided in the sample configuration above.

    1. Install the Tungsten Cluster package (.rpm), or download the compressed tarball and unpack it:

      shell> cd /opt/continuent/software
      shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
    2. Change to the Tungsten Cluster staging directory:

      shell> cd tungsten-clustering-7.1.4-10
    3. Run tpm to install the Clustering software:

      shell> ./tools/tpm install

      During the installation and startup, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

  3. Initialize your PATH and environment:

    shell> source /opt/continuent/share/env.sh

The Composite Active/Active clustering should be installed and ready to use.
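
As a quick post-install check (a sketch: on a node in east, for example, you would expect to see both the local east service and the east_from_west sub-service listed), the configured replication services can be listed on any node:

shell> trepctl services | grep serviceName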

3.4.3. Best Practices: Composite Active/Active Clusters

Note

In addition to this information, follow the guidelines in Section 2.5, “Best Practices”.

  • Running a Composite Active/Active service uses many different components to keep data updated on all servers. Monitoring the dataservice is divided into monitoring the two different clusters and each cluster sub-service responsible for replication to/from remote clusters.

  • Configure your database servers with distinct auto_increment_increment and auto_increment_offset settings. Each location that may accept writes should have a unique offset value.

Using cctrl gives you the dataservice status. By default, cctrl will connect you to the cluster associated with the node from which you issue the command. To start at the top level, issue cctrl -multi instead.

At the top level, the composite cluster output shows the composite service, composite cluster members and replication services:

Tungsten Clustering 7.1.4 build 10
east: session established, encryption=false, authentication=false
[LOGICAL] / > ls
usa
  east
    east_from_west
  west
    west_from_east

To examine the overall composite cluster status, change to the composite cluster and use ls:

[LOGICAL] / > use usa
[LOGICAL] /usa > ls

COORDINATOR[west3:AUTOMATIC:ONLINE]
   east:COORDINATOR[east3:AUTOMATIC:ONLINE]
   west:COORDINATOR[west3:AUTOMATIC:ONLINE]

ROUTERS:
+---------------------------------------------------------------------------------+
|connector@east1[10583](ONLINE, created=0, active=0)                              |
|connector@east2[10548](ONLINE, created=0, active=0)                              |
|connector@east3[10540](ONLINE, created=0, active=0)                              |
|connector@west1[10589](ONLINE, created=0, active=0)                              |
|connector@west2[10541](ONLINE, created=0, active=0)                              |
|connector@west3[10547](ONLINE, created=0, active=0)                              |
+---------------------------------------------------------------------------------+

DATASOURCES:
+---------------------------------------------------------------------------------+
|east(composite master:ONLINE, global progress=1, max latency=3.489)              |
|STATUS [OK] [2019/12/24 10:21:08 AM UTC]                                         |
+---------------------------------------------------------------------------------+
|  east(master:ONLINE, progress=1, max latency=1.483)                             |
|  east_from_west(relay:ONLINE, progress=1, max latency=3.489)                    |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|west(composite master:ONLINE, global progress=1, max latency=0.909)              |
|STATUS [OK] [2019/12/24 10:21:08 AM UTC]                                         |
+---------------------------------------------------------------------------------+
|  west(master:ONLINE, progress=1, max latency=0.909)                             |
|  west_from_east(relay:ONLINE, progress=1, max latency=0.903)                    |
+---------------------------------------------------------------------------------+

For each cluster within the composite cluster, four lines of information are provided:

  • |east(composite master:ONLINE, global progress=1, max latency=3.489)              |

    This line indicates:

    • The name and type of the composite cluster, and whether the Primary in the cluster is online.

    • The global progress. This is a counter that combines the local progress of the cluster with the progress of the replication of data from the remote clusters in the composite into this cluster. For example, inserting data into west increments both the progress and the global progress of the west cluster.

    • The maximum latency within the cluster.

  • |STATUS [OK] [2019/12/24 10:21:08 AM UTC]                                         |

    The status and date within the Primary of the cluster.

  • |  east(master:ONLINE, progress=1, max latency=1.483)                             |

    The status and progress of the cluster.

  • |  east_from_west(relay:ONLINE, progress=1, max latency=3.489)                    |

    The status and progress of remote replication from the cluster.

The global progress and the progress work together to provide an indication of the overall replication status within the composite cluster:

  • Inserting data into the Primary on east will:

    • Increment the progress within the east cluster.

    • Increment the global progress within the east cluster.

  • Inserting data into the Primary on west will:

    • Increment the progress within the west cluster.

    • Increment the global progress within the west cluster.

Looking at the individual cluster shows only the cluster status, not the cross-cluster status:

[LOGICAL] /east > ls
COORDINATOR[east3:AUTOMATIC:ONLINE]

ROUTERS:
+---------------------------------------------------------------------------------+
|connector@east1[10583](ONLINE, created=0, active=0)                              |
|connector@east2[10548](ONLINE, created=0, active=0)                              |
|connector@east3[10540](ONLINE, created=0, active=0)                              |
|connector@west1[10589](ONLINE, created=0, active=0)                              |
|connector@west2[10541](ONLINE, created=0, active=0)                              |
|connector@west3[10547](ONLINE, created=0, active=0)                              |
+---------------------------------------------------------------------------------+

DATASOURCES:
+---------------------------------------------------------------------------------+
|east1(master:ONLINE, progress=1, THL latency=0.765)                              |
|STATUS [OK] [2019/12/24 10:21:12 AM UTC]                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=master, state=ONLINE)                                          |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|east2(slave:ONLINE, progress=1, latency=0.826)                                   |
|STATUS [OK] [2019/12/24 10:21:13 AM UTC]                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=east1, state=ONLINE)                             |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|east3(slave:ONLINE, progress=1, latency=0.842)                                   |
|STATUS [OK] [2019/12/24 10:21:12 AM UTC]                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=east1, state=ONLINE)                             |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+

Within each cluster, cctrl can be used to monitor the current status. For more information on checking the status and controlling operations, see Section 6.3, “Checking Dataservice Status”.

To monitor all services and the current status, you can also use the multi_trepctl command (part of the Tungsten Replicator installation). This generates a unified status report for all the hosts and services configured:

shell> multi_trepctl --by-service
| host  | servicename    | role   | state  | appliedlastseqno | appliedlatency |
| east1 | east           | master | ONLINE |                5 |          0.440 |
| east2 | east           | slave  | ONLINE |                5 |          0.538 |
| east3 | east           | slave  | ONLINE |                5 |          0.517 |
| east1 | east_from_west | relay  | ONLINE |               23 |          0.074 |
| east2 | east_from_west | slave  | ONLINE |               23 |          0.131 |
| east3 | east_from_west | slave  | ONLINE |               23 |          0.111 |
| west1 | west           | master | ONLINE |               23 |          0.021 |
| west2 | west           | slave  | ONLINE |               23 |          0.059 |
| west3 | west           | slave  | ONLINE |               23 |          0.089 |
| west1 | west_from_east | relay  | ONLINE |                5 |          0.583 |
| west2 | west_from_east | slave  | ONLINE |                5 |          0.562 |
| west3 | west_from_east | slave  | ONLINE |                5 |          0.592 |

In the above example, it can be seen that the west services have a higher applied last sequence number than the east services; this is because more of the writes have been applied within the west cluster.

To monitor individual servers and/or services, use trepctl with the correct service name. For example, on east1, use the trepctl services command to get a summary status of both the local and cross-cluster services:

shell> trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 6
appliedLatency  : 0.43
role            : master
serviceName     : east
serviceType     : local
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 4
appliedLatency  : 1837.999
role            : relay
serviceName     : east_from_west
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

To get a more detailed status, you must explicitly specify the service:

shell> trepctl -service east_from_west status

3.4.4. Configuring Startup on Boot

For the Tungsten Cluster service, use Section 4.4, “Configuring Startup on Boot”.

3.4.5. Resetting a single dataservice

Under certain conditions, dataservices in an active/active configuration may drift and/or become inconsistent with the data in another dataservice. If this occurs, you may need to re-provision the data on one or more of the dataservices after first determining the definitive source of the information.

In the following example the west service has been determined to be the definitive copy of the data. To fix the issue, all the datasources in the east service will be reprovisioned from one of the datasources in the west service.

The following is a guide to the steps that should be followed. In the example procedure it is the east service that has failed:

  1. Put the dataservice into MAINTENANCE mode. This ensures that Tungsten Cluster will not attempt to automatically recover the service.

    cctrl [east]> set policy maintenance
  2. On the failed east Tungsten Cluster service, put each Tungsten Connector offline:

    cctrl [east]> router * offline
  3. On all servers in the remaining service, reset the cross-site service that replicates from the failed service. For example, on west{1,2,3}, reset the west_from_east service:

    shell west> trepctl -service west_from_east offline
    shell west> trepctl -service west_from_east reset -all -y
  4. Reset the local service on each server within the failed region (east{1,2,3}):

    shell east> trepctl -service east offline
    shell east> trepctl -service east reset -all -y
  5. Restore a backup on each host (east{1,2,3}) in the failed east service from a host in the west service:

    shell east> tungsten_provision_slave \
        --direct --source=west1
  6. Place all the services on west{1,2,3} back online:

    shell west> trepctl -service west_from_east online
  7. On the failed east Tungsten Cluster service, put each Tungsten Connector back online:

    cctrl [east]> router * online
  8. Set the policy back to automatic:

    cctrl> set policy automatic

3.4.6. Resetting all dataservices

To reset all of the dataservices and restart the Tungsten Cluster services:

On all hosts (e.g. east{1,2,3} and west{1,2,3}):

shell> replicator stop
shell> tpm reset
shell> replicator start

3.4.7. Dataserver maintenance

Performing maintenance on the dataservice, for example updating the MySQL configuration file, can be achieved using a similar sequence to that shown in Section 6.15, “Performing Database or OS Maintenance”, except that you must also restart the corresponding Tungsten Replicator service after the main Tungsten Cluster service has been placed back online.

For example, to perform maintenance on the east service:

  1. Put the dataservice into MAINTENANCE mode. This ensures that Tungsten Cluster will not attempt to automatically recover the service.

    cctrl [east]> set policy maintenance
  2. Shun the first Replica datasource so that maintenance can be performed on the host.

    cctrl [east]> datasource east1 shun
  3. Perform the updates, such as updating my.cnf, changing schemas, or performing other maintenance.

  4. If MySQL configuration has been modified, restart the MySQL service:

    cctrl [east]> service host/mysql restart
  5. Bring the host back into the dataservice:

    cctrl [east]> datasource host recover
  6. Perform a switch so that the Primary becomes a Replica and can then be shunned and have the necessary maintenance performed:

    cctrl [east]> switch
  7. Repeat the previous steps to shun the host, perform maintenance, and then switch again until all the hosts have been updated.

  8. Set the policy back to automatic:

    cctrl> set policy automatic
  9. On each host in the other region, manually restart the Tungsten Replicator service, which will have gone offline when MySQL was restarted:

    shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -host host -service east online

3.4.7.1. Fixing Replication Errors

In the event of a replication fault, the standard cctrl, trepctl and other utility commands in Chapter 9, Command-line Tools can be used to bring the dataservice back into operation. All the tools are safe to use.

If you have to perform any updates or modifications to the stored MySQL data, ensure binary logging has been disabled for your session before running any statements:

mysql> SET SESSION SQL_LOG_BIN=0;

This prevents the statements and operations from reaching the binary log, so the operations will not be replicated to the other hosts.
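
For example, a sketch of a manual data fix session; the schema, table, and row used here are hypothetical:

mysql> SET SESSION SQL_LOG_BIN=0;
mysql> UPDATE test.t1 SET val = 'fixed' WHERE id = 42;
mysql> SET SESSION SQL_LOG_BIN=1;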

3.4.7.1.1. Recovering Cross Site Services

In a Composite Active/Active topology, a switch or a failover not only promotes a Replica to be the new Primary, but also requires the cross-site replicators to be reconfigured. This process therefore assumes that cross-site communication is online and working. In some situations, cross-site communication may be down, or cross-site replication may be in an OFFLINE:ERROR state - for example, a DDL or DML statement that worked in the local cluster may have failed to apply in the remote cluster.

If a switch or failover occurs and the process is unable to reconfigure the cross-site replicators, the local switch will still succeed; however, the associated cross-site services will be placed into a SHUNNED(SUBSERVICE-SWITCH-FAILED) state.

The following guide explains how to recover from this situation.

  • The examples are based on a 2-cluster topology, with clusters named NYC and LONDON, and a composite dataservice named GLOBAL.

  • The cluster is configured with the following dataservers:

    • NYC : db1 (Primary), db2 (Replica), db3 (Replica)

    • LONDON: db4 (Primary), db5 (Replica), db6 (Replica)

  • The cross site replicators in both clusters are in an OFFLINE:ERROR state due to failing DDL.

  • A switch was then issued, promoting db3 as the new Primary in NYC and db5 as the new Primary in LONDON.

When the cluster enters a state where the cross-site services are in an error, output from cctrl will look like the following:

shell> cctrl -expert -multi
[LOGICAL:EXPERT] / > use london_from_nyc
london_from_nyc: session established, encryption=false, authentication=false
[LOGICAL:EXPERT] /london_from_nyc > ls
COORDINATOR[db6:AUTOMATIC:ONLINE]
 
ROUTERS:
+---------------------------------------------------------------------------------+
|connector@db1[26248](ONLINE, created=0, active=0)                                |
|connector@db2[14906](ONLINE, created=0, active=0)                                |
|connector@db3[15035](ONLINE, created=0, active=0)                                |
|connector@db4[27813](ONLINE, created=0, active=0)                                |
|connector@db5[4379](ONLINE, created=0, active=0)                                 |
|connector@db6[2098](ONLINE, created=0, active=0)                                 |
+---------------------------------------------------------------------------------+
 
DATASOURCES:
+---------------------------------------------------------------------------------+
|db5(relay:SHUNNED(SUBSERVICE-SWITCH-FAILED), progress=6, latency=0.219)          |
|STATUS [SHUNNED] [2018/03/15 10:27:24 AM UTC]                                    |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=relay, master=db3, state=ONLINE)                               |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db4(slave:SHUNNED(SUBSERVICE-SWITCH-FAILED), progress=6, latency=0.252)          |
|STATUS [SHUNNED] [2018/03/15 10:27:25 AM UTC]                                    |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db5, state=ONLINE)                               |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db6(slave:SHUNNED(SUBSERVICE-SWITCH-FAILED), progress=6, latency=0.279)          |
|STATUS [SHUNNED] [2018/03/15 10:27:25 AM UTC]                                    |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db4, state=ONLINE)                               |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+

In the above example, you can see that all services are in the SHUNNED(SUBSERVICE-SWITCH-FAILED) state, and partial reconfiguration has happened.

The replicators for db4 and db6 should be Replicas of db5; db5 has been correctly reconfigured to point to the new Primary in nyc, db3. The actual state of the cluster in each scenario may be different depending upon the cause of the loss of cross-site communication. Using the steps below, apply the necessary actions that relate to your own cluster state; if in any doubt, always contact Continuent Support for assistance.

  1. The first step is to ensure the initial replication errors have been resolved and that the replicators are in an ONLINE state. The steps to resolve the replication errors will depend on the reason for the error; for further guidance on resolving these issues, see Chapter 6, Operations Guide.

  2. From one node, connect into cctrl at the expert level:

    shell> cctrl -expert -multi
  3. Next, connect to the cross-site subservice, in this example london_from_nyc:

    cctrl> use london_from_nyc
  4. Next, place the service into MAINTENANCE mode:

    cctrl> set policy maintenance
  5. Enable override of commands issued:

    cctrl> set force true
  6. Bring the relay datasource online:

    cctrl> datasource db5 online
  7. If you need to change the source for the relay replicator to the correct, new Primary in the remote cluster, take the replicator offline. If the relay source is already correct, move on to step 10.

    cctrl> replicator db5 offline
  8. Change the source of the relay replicator:

    cctrl> replicator db5 relay nyc/db3
  9. Bring the replicator online:

    cctrl> replicator db5 online
  10. For each datasource whose replicator needs altering, issue the following commands, replacing datasource with the name of the datasource:

    cctrl> replicator datasource offline
    cctrl> replicator datasource slave db5
    cctrl> replicator datasource online

    For example:

    cctrl> replicator db4 offline
    cctrl> replicator db4 slave db5
    cctrl> replicator db4 online
  11. Once all replicators are using the correct source, we can then bring the cluster back:

    cctrl> cluster welcome
  12. Some of the datasources may still be in the SHUNNED state; for each of those, issue the following, replacing datasource with the name of the datasource:

    cctrl> datasource datasource online

    For example:

    cctrl> datasource db4 online
  13. Once all nodes are online, we can then return the cluster to automatic:

    cctrl> set policy automatic
  14. Repeat this process for the other cross-site subservice if required.


3.4.8. Adding a Cluster to a Composite Active/Active Topology

This procedure explains how to add additional clusters to an existing v6.x (or newer) Composite Active/Active configuration.

The example in this procedure adds a new 3-node cluster consisting of nodes db7, db8 and db9 within a service called Tokyo. The existing cluster contains two dataservices, NYC and London, made up of nodes db1, db2, db3 and db4, db5, db6 respectively.

3.4.8.1. Pre-Requisites

Ensure the new nodes have all the necessary pre-requisites in place, specifically paying attention to the following:

  • MySQL auto_increment parameters set appropriately on existing and new clusters

  • All new nodes have full connectivity to the existing nodes and the hosts file contains correct hostnames

  • All existing nodes have full connectivity to the new nodes and hosts file contains correct hostnames

3.4.8.2. Backup and Restore

We need to provision all the new nodes in the new cluster with a backup taken from one node in any of the existing clusters. In this example we are using db6 in the London dataservice as the source for the backup.

  1. Shun and stop the services on the node used for the backup

    db6-shell> cctrl
    cctrl> datasource db6 shun
    cctrl> replicator db6 offline
    cctrl> exit
    db6-shell> stopall
    db6-shell> sudo service mysqld stop
  2. Next, use whichever method you wish to copy the mysql datafiles from db6 to all the nodes in the new cluster (scp, rsync, xtrabackup etc). Ensure ALL database files are copied.
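
    For example, a minimal rsync sketch, assuming the MySQL data directory is /var/lib/mysql, MySQL is stopped on both the source and the target hosts, and suitable SSH/root access is available between the hosts:

    db6-shell> sudo rsync -a /var/lib/mysql/ db7:/var/lib/mysql/
    db6-shell> sudo rsync -a /var/lib/mysql/ db8:/var/lib/mysql/
    db6-shell> sudo rsync -a /var/lib/mysql/ db9:/var/lib/mysql/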

  3. Once the backup has been copied across, restart the services on db6:

    db6-shell> sudo service mysqld start
    db6-shell> startall
    db6-shell> cctrl
    cctrl> datasource db6 recover
    cctrl> exit
  4. Ensure all the files copied to the target nodes have the correct file ownership.
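
    For example, assuming the default MySQL data directory, user, and group (adjust to your environment), run the following on each new node:

    db7-shell> sudo chown -R mysql:mysql /var/lib/mysql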

  5. Start MySQL on the new nodes.
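
    For example, mirroring the service command used on db6 above (the service name may differ in your environment), run the following on each new node:

    db7-shell> sudo service mysqld start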

3.4.8.3. Update Existing Configuration

Next we need to change the configuration on the existing hosts to include the configuration of the new cluster.

You need to add a new service block that includes the new nodes and append the new service to the composite-datasources parameter in the composite dataservice, all within /etc/tungsten/tungsten.ini

Example of a new service block and composite-datasource change added to existing hosts configuration:

[tokyo]
topology=clustered
master=db7
members=db7,db8,db9
connectors=db7,db8,db9
[global]
topology=composite-multi-master
composite-datasources=nyc,london,tokyo

3.4.8.4. New Host Configuration

To avoid any differences in configuration, once the changes have been made to the tungsten.ini on the existing hosts, copy this file from one of the nodes to all the nodes in the new cluster.

Ensure start-and-report is false or not set in the config.
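
For example, it can be set explicitly in the [defaults] section of the copied INI file (a minimal sketch):

[defaults]
...
start-and-report=false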

3.4.8.5. Install on new nodes

On the 3 new nodes, validate the software:

shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
shell> tools/tpm validate

This may produce Warnings that the tracking schemas for the existing cluster already exist - this is OK and they can be ignored. Assuming no other unexpected errors are reported, then go ahead and install the software:

shell> tools/tpm install

3.4.8.6. Update existing nodes

Before starting the new cluster, we need to update the existing clusters:

  1. Put the entire cluster into MAINTENANCE mode:

    shell> cctrl
    cctrl> use {composite-dataservice}
    cctrl> set policy maintenance
    cctrl> ls
    COORDINATOR[db3:MAINTENANCE:ONLINE]
       london:COORDINATOR[db4:MAINTENANCE:ONLINE]
          nyc:COORDINATOR[db3:MAINTENANCE:ONLINE]
    cctrl> exit
  2. Update the software on each node. This needs to be executed from the software staging directory using the --replace-release option, as this will ensure the new cross-site dataservices are set up correctly. Update the Primaries first, followed by the Replicas, cluster by cluster:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm update --replace-release

3.4.8.7. Start the new cluster

On all the nodes in the new cluster, start the software:

shell> startall

3.4.8.8. Validate and check

Using cctrl, check that the new cluster appears and that all services are correctly showing ONLINE. It may take a few moments for the cluster to settle down and start everything:

shell> cctrl
cctrl> use {composite-dataservice}
cctrl> ls
cctrl> exit

Check the output of trepctl and ensure all replicators are online and that the new cross-site services appear in the pre-existing clusters:

shell> trepctl -service {service} status
shell> trepctl services

Place the entire cluster back into AUTOMATIC:

shell> cctrl
cctrl> use {composite-dataservice}
cctrl> set policy automatic
cctrl> ls
COORDINATOR[db2:AUTOMATIC:ONLINE]
   london:COORDINATOR[db5:AUTOMATIC:ONLINE]
   nyc:COORDINATOR[db2:AUTOMATIC:ONLINE]
cctrl> exit

3.5. Deploying Composite Dynamic Active/Active

Composite Dynamic Active/Active builds on the foundation of the Composite Active/Active topology and the cluster continues to operate and be configured in the same way.

The difference is, with Composite Dynamic Active/Active, the cluster instructs the Proxy layer to behave like a Composite Active/Passive cluster.

Within your configuration you specify write affinity to a single cluster, meaning that all reads will continue to balance between local replicas, but all writes will be directed to only one cluster.

The diagram below shows how a Composite Dynamic Active/Active would behave in a typical 2-cluster configuration.

Figure 3.5. Topologies: Composite Dynamic Active/Active Clusters


The benefit of a Composite Dynamic Active/Active cluster is that, by directing writes to only one cluster, it avoids the inherent risks of a true Active/Active deployment, such as conflicts when the same row is altered in both clusters.

This is especially useful for deployments that do not have the ability to avoid potential conflicts programmatically.

The additional benefit this topology offers is instant failover of writes in the event of a cluster failure. In Composite Dynamic Active/Active, if the cluster with write affinity fails, writes instantly fail over to the other cluster, and because that cluster is open for writes, applications continue uninterrupted. This differs from a Composite Active/Passive cluster, where in the event of a cluster failure a manual failover process is needed to re-route write operations.

3.5.1. Enabling Composite Dynamic Active/Active

To use Composite Dynamic Active/Active you need to have a Composite Active/Active cluster deployed; it is then simply a case of specifying the required affinity within the connectors.

For the purpose of this example we will assume we have two clusters alpha and beta. Each cluster will have two connectors and it is desired that the alpha cluster be the primary write destination.

Within the configuration for the connectors, add the following:

=> On alpha nodes:
connector-write-affinity=alpha
connector-read-affinity=alpha

=> On beta nodes:
connector-write-affinity=alpha
connector-read-affinity=beta

This will have the effect of setting the write affinity to the alpha cluster primarily on both alpha and beta clusters as follows:

  • alpha cluster will get both read and write affinity to alpha

  • beta cluster will get write affinity to alpha, but maintain read affinity to beta

After recovering a failed site

As outlined above, if the site that has write affinity fails, read-write traffic will fail over to another site based on the affinity rules configured. Following recovery of the site that is configured as the primary write site, new connections will follow the write affinity rules, whereas existing connections will remain on the site that was promoted after failover.

To maintain data-integrity and to ensure writes continue to only be directed to a single site, it is therefore essential to also enable the following tpm property:

--connector-reset-when-affinity-back=true

With this enabled, following recovery of the primary write site, all connections (new and old) will revert to the original, intended, cluster configured with primary write affinity.

In the case of the alpha cluster failing, writes will fail over and be redirected to the beta cluster.
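
Taken together, a sketch of the connector options for the beta-side connectors in this example; alpha holds the primary write affinity, and the option names are as shown above:

connector-write-affinity=alpha
connector-read-affinity=beta
connector-reset-when-affinity-back=true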

Testing DAA in Bridge Mode

When using Bridge mode (the default at install), all requests are routed to the Primary by default. To test query routing, run the following query when connected through the Connector:

Route to the Primary:
mysql> select @@hostname;

In Bridge mode, the only way to verify that reads are being directed to replicas is to establish a read-only port and execute queries through it to force the QoS RO_RELAXED.

First, ensure that your INI file has the following option, then run tpm update:

connector-readonly-listen-port=3307

To test, ensure you connect to the specified read-only port:

Route to a Replica:
shell> mysql -h... -P3307
mysql> select @@hostname;

Testing DAA in Proxy Mode with No R/W Splitting Enabled

To test Connector query routing in Proxy mode, you may use the URL-based R/W splitting to test query routing:

Route to the Primary:
shell> mysql -h... -Dtest@qos=RW_STRICT -e "select @@hostname;"

Route to a Replica:
shell> mysql -h... -Dtest@qos=RO_RELAXED -e "select @@hostname;"

Testing DAA in Proxy Mode with R/W Splitting Enabled (SmartScale or @direct)

To test Connector query routing in Proxy mode when either SmartScale or @direct read/write splitting has been enabled, you may use the following:

Route to the Primary:
mysql> select @@hostname for update;

Route to a Replica:
mysql> select @@hostname;

Manual Site-Level Switch

For DAA to work properly, all writes must go to one cluster or the other, with no exceptions. When you want to move all writes to another site/cluster (as you would in a Composite Active/Passive cluster using the switch command at the composite level), note that there is no switch command available in Dynamic Active/Active.

As of version 7.0.2, we strongly recommend that you use the cctrl command datasource SERVICE drain [optional timeout in seconds] at the composite level to shun the currently selected Active cluster. This will allow the Connector to finish (drain) all in-flight queries, shun the composite dataservice once fully drained, and then move all writes to another cluster.

Please note that this is different than using the cctrl command datasource SERVICE shun (available prior to version 7.0.2) at the composite level to shun the currently selected Active cluster. Using shun instead of drain will force the Connector to immediately sever/terminate all in-flight queries, then move all writes to another cluster.

shell> cctrl -multi
Tungsten Clustering 7.0.2 build 145
beta: session established, encryption=true, authentication=true
jgroups: encrypted, database: encrypted
[LOGICAL] / > use world
[LOGICAL] /world > datasource alpha drain 30

WARNING: This is an expert-level command:
Incorrect use may cause data corruption
or make the cluster unavailable.

Do you want to continue? (y/n)> y
composite data source 'alpha' is now SHUNNED
[LOGICAL] /world > exit
Exiting...

When you are ready to resume writes to the originally-configured site, use the composite-level cctrl command datasource SERVICE welcome. If you have set --connector-reset-when-affinity-back=true, then writes will move back to the original site. If set to false, the writes will stay where they are.

shell> cctrl -multi
Tungsten Clustering 7.0.2 build 145
beta: session established, encryption=true, authentication=true
jgroups: encrypted, database: encrypted
[LOGICAL] / > use world
[LOGICAL] /world > datasource alpha welcome

WARNING: This is an expert-level command:
Incorrect use may cause data corruption
or make the cluster unavailable.

Do you want to continue? (y/n)> y
composite data source 'alpha' is now ONLINE
[LOGICAL] /world > exit
Exiting...

For more information about the datasource shun command, please visit: Section 9.1.3.5.9, “cctrl datasource shun Command”

For more information about the datasource drain command, please visit: Section 9.1.3.5.3, “cctrl datasource drain Command”

3.6. Deploying Tungsten Connector Only

An independent Tungsten Connector installation can be useful when you want to create a connector service that provides HA and load balancing, but which operates independently of the main cluster. Specifically, this solution is used within disaster recovery and multi-site operations where the connector may be operating across site-boundaries independently of the dataservice at each site.

The independent nature is in terms of the configuration of the overall service through tpm; an independent connector configured to communicate with existing cluster hosts will still be managed by the managers of the cluster, but the connector will not be updated when performing a tpm update operation within the configured cluster. This allows the connector to keep working through upgrade procedures, minimizing downtime.

To create an independent connector, tpm is used to create a definition for a cluster that includes the datasources but specifies only a single connector host; Tungsten Cluster is then installed on only the connector host. If you do not configure it in this way, tpm will install a full Tungsten Cluster service across all the implied members of the cluster.

  1. Install the Tungsten Cluster package or download the Tungsten Cluster tarball, and unpack it:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
  2. Change to the Tungsten Cluster directory:

    shell> cd tungsten-clustering-7.1.4-10
  3. Run tpm to perform the installation, using either the staging method or the INI method. Review Section 10.1, “Comparing Staging and INI tpm Methods” for more details on these two methods.

    The Staging method commands are shown first, followed by the equivalent INI configuration:

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --profile-script=~/.bashrc \
        --application-user=app_user \
        --application-password=secret \
        --application-port=3306 \
        --replication-port=13306 \
        --install-directory=/opt/continuent
    
    shell> ./tools/tpm configure alpha \
        --connectors=connectorhost1 \
        --master=host1 \
        --members=host1,host2,host3
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    profile-script=~/.bashrc
    application-user=app_user
    application-password=secret
    application-port=3306
    replication-port=13306
    install-directory=/opt/continuent
    
    [alpha]
    connectors=connectorhost1
    master=host1
    members=host1,host2,host3
    


    The above creates a configuration specifying the datasources, host{1,2,3}, and a single connector host based on the hostname of the installation host. Note that the application and datasource port configuration are the same as required by a typical Tungsten Cluster configuration. The values above are identical to those used in Section 3.1, “Deploying Standalone HA Clusters” deployment.

  4. Run tpm to install the software with the configuration.

    shell> ./tools/tpm install

    During the startup and installation, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

  5. Initialize your PATH and environment.

    shell> source /opt/continuent/share/env.sh

  6. Start the connector service:

    shell> connector start

Once started:

  • The connector will appear in, and be managed by, the cluster, and can be monitored from any manager host using the cctrl tool. For example:

    [LOGICAL] /dsone > ls
    
    COORDINATOR[host1:AUTOMATIC:ONLINE]
    
    ROUTERS:
    +----------------------------------------------------------------------------+
    |connector@connector2[16019](ONLINE, created=0, active=0)                    |
    |connector@host1[18450](ONLINE, created=19638, active=0)                     |
    |connector@host2[1995](ONLINE, created=0, active=0)                          |
    |connector@host3[8895](ONLINE, created=0, active=0)                          |
    +----------------------------------------------------------------------------+
    ...
  • The active status of the connector can be monitored using cctrl as normal.

  • Updates to the main cluster will not update the Tungsten Cluster software of the standalone connector. The standalone connector must be updated independently of the remainder of the Tungsten Cluster dataservice.

  • The connector can be accessed using the connector host and the specified port:

    shell> mysql -utungsten -p -hconnector -P3306
  • The user.map authorization file must be created and managed separately on standalone connectors. For more information, see Section 7.6, “User Authentication”

3.7. Deploying Additional Datasources, Managers, or Connectors

3.7.1. Adding Datasources to an Existing Deployment

  1. Ensure the new host that is being added has been configured following Appendix B, Prerequisites.

  2. Update the configuration using tpm, adding the new host to the list of --members, --hosts, and --connectors, if applicable.

    If using the staging method of deployment, you can use +=, which appends the host to the existing deployment, as shown in the example below. The Staging method commands are shown first, followed by the equivalent INI configuration.

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
    The staging USER is tungsten
    
    shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
    The staging HOST is db1
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> ssh {STAGING_USER}@{STAGING_HOST}
    shell> cd {STAGING_DIRECTORY}
    shell> ./tools/tpm configure alpha \
        --members+=host4 \
        --hosts+=host4 \
        --connectors+=host4
    

    Run the tpm command to update the software with the Staging-based configuration:

    shell> ./tools/tpm update --no-connectors

    For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

    shell> vi /etc/tungsten/tungsten.ini
    [alpha]
    ...
    members=host1,host2,host3,host4
    hosts=host1,host2,host3,host4
    connectors=host1,host2,host3,host4
    

    Run the tpm command to update the software with the INI-based configuration:

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> cd {STAGING_DIRECTORY}
    
    shell> ./tools/tpm update --no-connectors

    For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

  3. Using the --no-connectors option updates the current deployment without restarting the existing connectors.
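
    When convenient, the connectors can then be restarted manually on each connector host (a sketch, using the standard connector control command):

      shell> connector restart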

  4. Initially, the newly added host will attempt to read the information from the existing THL. If the full THL is not available from the Primary, the new Replica will need to be reprovisioned:

    1. Log into the new host.

    2. Execute tprovision to read the information from an existing Replica and overwrite the data within the new host:

      shell> tprovision --source=host2
      NOTE >>Put alpha replication service offline
      NOTE >>Create a mysqldump backup of host2 in /opt/continuent/backups/provision_mysqldump_2019-01-17_17-27_96
      NOTE >>host2>>Create mysqldump in /opt/continuent/backups/provision_mysqldump_2019-01-17_17-27_96/provision.sql.gz
      NOTE >>Load the mysqldump file
      NOTE >>Put the alpha replication service online
      NOTE >>Clear THL and relay logs for the alpha replication service

Once the new host has been added and re-provisioned, check the status in cctrl:

[LOGICAL] /alpha > ls

COORDINATOR[host1:AUTOMATIC:ONLINE]

ROUTERS:
+----------------------------------------------------------------------------+
|connector@host1[11401](ONLINE, created=0, active=0)                         |
|connector@host2[8756](ONLINE, created=0, active=0)                          |
|connector@host3[21673](ONLINE, created=0, active=0)                         |
+----------------------------------------------------------------------------+

DATASOURCES:
+----------------------------------------------------------------------------+
|host1(master:ONLINE, progress=219, THL latency=1.047)                       |
|STATUS [OK] [2018/12/13 04:16:17 PM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=master, state=ONLINE)                                     |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

+----------------------------------------------------------------------------+
|host2(slave:ONLINE, progress=219, latency=1.588)                            |
|STATUS [OK] [2018/12/13 04:16:17 PM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=host1, state=ONLINE)                        |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

+----------------------------------------------------------------------------+
|host3(slave:ONLINE, progress=219, latency=2.021)                            |
|STATUS [OK] [2018/12/13 04:16:18 PM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=host1, state=ONLINE)                        |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

+----------------------------------------------------------------------------+
|host4(slave:ONLINE, progress=219, latency=1.000)                            |
|STATUS [OK] [2019/01/17 05:28:54 PM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=host1, state=ONLINE)                        |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

If the host has not come up, or the progress does not match the Primary, check Section 6.6, “Datasource Recovery Steps” for more information on determining the exact status and what steps to take to enable the host operation.

3.7.2. Adding Active Witnesses to an Existing Deployment

To add active witnesses to an Existing Deployment, use tpm to update the configuration, adding the list of active witnesses and the list of all members within the updated dataservice configuration.

Active Witness hosts must have been prepared using the notes provided in Appendix B, Prerequisites. Active witnesses must be able to resolve the hostnames of the other managers and hosts in the dataservice. Installation will fail if the prerequisites, host availability, and stability cannot be confirmed.
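
As a quick sanity check of hostname resolution from the new witness host (a sketch; the hostnames shown match the alpha example below and should be adjusted to your environment):

shell> getent hosts host1 host2 host3 host4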

Update the configuration using tpm, adding the new host to the list of members and witnesses:

If using the staging method of deployment, you can use +=, which appends the host to the existing deployment, as shown in the example below. The Staging method commands are shown first, followed by the equivalent INI configuration.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --members+=host4 \
    --witnesses=host4 \
    --enable-active-witnesses=true

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update --no-connectors

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

shell> vi /etc/tungsten/tungsten.ini
[alpha]
...
members=host1,host2,host3,host4
witnesses=host4
enable-active-witnesses=true

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update --no-connectors

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

Using the --no-connectors option updates the current deployment without restarting the existing connectors.

Once installation has completed successfully, and the manager service has started on each configured active witness, the status can be determined using ls within cctrl:

[LOGICAL] /alpha > ls                  

COORDINATOR[host1:AUTOMATIC:ONLINE]

ROUTERS:
+----------------------------------------------------------------------------+
|connector@host1[20446](ONLINE, created=0, active=0)                         |
|connector@host2[21698](ONLINE, created=0, active=0)                         |
|connector@host3[30354](ONLINE, created=0, active=0)                         |
+----------------------------------------------------------------------------+

DATASOURCES:
+----------------------------------------------------------------------------+
|host1(slave:ONLINE, progress=8946, latency=0.000)                           |
|STATUS [OK] [2018/12/05 04:27:47 PM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=host3, state=ONLINE)                        |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

+----------------------------------------------------------------------------+
|host2(slave:ONLINE, progress=8946, latency=0.334)                           |
|STATUS [OK] [2018/12/05 04:06:59 PM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=slave, master=host3, state=ONLINE)                        |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+

+----------------------------------------------------------------------------+
|host3(master:ONLINE, progress=8946, THL latency=0.331)                      |
|STATUS [OK] [2018/11/20 05:39:14 PM GMT]                                    |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
|  REPLICATOR(role=master, state=ONLINE)                                     |
|  DATASERVER(state=ONLINE)                                                  |
|  CONNECTIONS(created=0, active=0)                                          |
+----------------------------------------------------------------------------+


WITNESSES:
+----------------------------------------------------------------------------+
|host4(witness:ONLINE)                                                       |
+----------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                     |
+----------------------------------------------------------------------------+

Validation of the cluster with the new witnesses can be verified by using the cluster validate command within cctrl.

3.7.3. Replacing an Active Witness as a Full Cluster Node

This section explains the simple process for converting an Active Witness into a full cluster node. This process can be used either to convert the existing node or to replace the witness with a new node.

  1. First, place the cluster into MAINTENANCE mode.

    shell> cctrl
    cctrl> set policy maintenance

  2. Stop the software on the existing Witness node

    shell> stopall

  3. Whether you are converting this host or adding a new host, ensure any additional prerequisites needed for a full cluster node are in place, for example that MySQL has been installed.

  4. INI Install

    If you are using an ini file for configuration, update the ini on all nodes (including connectors), removing the witness properties and placing the new host as part of the cluster configuration, as shown in the example below. Skip to Staging Install further down for the Staging steps.

    Before:

    [defaults]
    user=tungsten
    home-directory=/opt/continuent
    application-user=app_user
    application-password=secret
    application-port=3306
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=secret
    replication-port=13306
    mysql-allow-intensive-checks=true
    
    [nyc]
    enable-active-witnesses=true
    topology=clustered
    master=db1
    members=db1,db2,db3
    witnesses=db3
    connectors=db1,db2,db3

    After:

    [defaults]
    user=tungsten
    home-directory=/opt/continuent
    application-user=app_user
    application-password=secret
    application-port=3306
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=secret
    replication-port=13306
    mysql-allow-intensive-checks=true
    
    [nyc]
    topology=clustered
    master=db1
    members=db1,db2,db3
    connectors=db1,db2,db3

  5. Update the software on the existing cluster nodes and connector nodes (if separate). Include --no-connectors if connectors co-exist on database nodes and you want to manually restart them when convenient.

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm update --replace-release

  6. Either install on the new host or update on the previous Witness host:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm install

    or:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm update --replace-release -f

  7. Staging Install

    If you are using a staging configuration, update the configuration from the staging host, example below:

    shell> cd {STAGING_DIRECTORY}
    
    ./tools/tpm configure defaults \
    --reset \
    --user=tungsten \
    --home-directory=/opt/continuent \
    --application-user=app_user \
    --application-password=secret \
    --application-port=3306 \
    --profile-script=~/.bash_profile \
    --replication-user=tungsten \
    --replication-password=secret \
    --replication-port=13306 \
    --mysql-allow-intensive-checks=true
    
    ./tools/tpm configure nyc \
    --topology=clustered \
    --master=db1 \
    --members=db1,db2,db3 \
    --connectors=db1,db2,db3

  8. Update the software on the existing cluster nodes. Include --no-connectors if connectors co-exist on database nodes and you want to manually restart them when convenient.

    shell> cd {STAGING_DIRECTORY}
    shell> tools/tpm update --replace-release --hosts=db1,db2

  9. Either install on the new host or update on the previous Witness host:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm install --hosts=db3

    or:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm update --replace-release -f --hosts=db3

  10. Once the software has been installed, you now need to restore a backup of the database onto the node, or provision the database using the provided scripts: either restore an existing backup, create and restore a new backup, or use tprovision to provision the database on the host.
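
    For example, a minimal tprovision sketch, assuming db2 is a healthy Replica in the same cluster to provision from:

    shell> tprovision --source=db2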

  11. Start the software on the new node/old witness node

    shell> startall

  12. If you issued --no-connectors during the update, restart the connectors when convenient

    shell> connector restart

  13. Check within cctrl from one of the existing database nodes that the status returns the expected output. If it does, return the cluster to AUTOMATIC and the process is complete. If the output is not correct, this is usually due to metadata files not updating; in that case, on every node, issue the following:

    shell> tungsten_reset_manager

    This will clean the metadata files and stop the manager process. Once the script has completed on all nodes, restart the manager process on each node, one-by-one, starting with the Primary node, followed by the Replicas:

    shell> manager start

    Finally, return the cluster to AUTOMATIC. If the reset process above was performed, it may take a minute or two for the ls output of cctrl to update whilst the metadata files are refreshed.

3.7.4. Replacing a Full Cluster Node as an Active Witness

This section explains the simple process for converting a full cluster node into an Active Witness.

  1. First, place the cluster into MAINTENANCE mode.

    shell> cctrl
    cctrl> set policy maintenance

  2. Stop the software on the existing cluster node

    shell> stopall

  3. Stop MySQL on the existing cluster node (Syntax is an example and may differ in your environment)

    shell> systemctl stop mysqld

  4. INI Install

    If you are using an ini file for configuration, update the ini on all nodes (including connectors), changing the reference to the node so that it becomes a witness node, as in the example below. Skip to Staging Install further down for Staging steps.

    Before:

    [defaults]
    user=tungsten
    home-directory=/opt/continuent
    application-user=app_user
    application-password=secret
    application-port=3306
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=secret
    replication-port=13306
    mysql-allow-intensive-checks=true
    
    [nyc]
    topology=clustered
    master=db1
    members=db1,db2,db3
    connectors=db1,db2,db3

    After:

    [defaults]
    user=tungsten
    home-directory=/opt/continuent
    application-user=app_user
    application-password=secret
    application-port=3306
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=secret
    replication-port=13306
    mysql-allow-intensive-checks=true
    
    [nyc]
    enable-active-witnesses=true
    topology=clustered
    master=db1
    members=db1,db2,db3
    witnesses=db3
    connectors=db1,db2,db3

  5. Update the software on the existing cluster nodes and connector nodes (if separate). Include --no-connectors if connectors co-exist on database nodes and you want to manually restart them when convenient.

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm update --replace-release

  6. Update on the host you are converting:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm update --replace-release -f

  7. Staging Install

    If you are using a staging configuration, update the configuration from the staging host, example below:

    shell> cd {STAGING_DIRECTORY}
    
    ./tools/tpm configure defaults \
    --reset \
    --user=tungsten \
    --home-directory=/opt/continuent \
    --application-user=app_user \
    --application-password=secret \
    --application-port=3306 \
    --profile-script=~/.bash_profile \
    --replication-user=tungsten \
    --replication-password=secret \
    --replication-port=13306 \
    --mysql-allow-intensive-checks=true
    
    ./tools/tpm configure nyc \
    --enable-active-witnesses=true \
    --topology=clustered \
    --master=db1 \
    --members=db1,db2,db3 \
    --witnesses=db3 \
    --connectors=db1,db2,db3

  8. Update the software on the existing cluster nodes. Include --no-connectors if connectors co-exist on database nodes and you want to manually restart them when convenient.

    shell> cd {STAGING_DIRECTORY}
    shell> tools/tpm update --replace-release --hosts=db1,db2

  9. Update on the host you are converting:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tools/tpm update --replace-release -f --hosts=db3

  10. Once the updates have been completed, run the tungsten_reset_manager command on each node in the entire cluster. This ensures the metadata is clean and that the node is now referenced as a witness, rather than a full cluster node. On each node, simply execute the command and follow the on-screen prompts:

    shell> tungsten_reset_manager

  11. Restart the managers on the nodes you have not converted:

    shell> manager start

  12. Start the software on the node that you converted:

    shell> startall

  13. If you issued --no-connectors during the update, restart the connectors when convenient

    shell> connector restart

  14. From one of the existing database nodes, check within cctrl that the status returns the expected output, then return the cluster to AUTOMATIC. The process is complete.

3.7.5. Adding Connectors to an Existing Deployment

Adding more connectors to an existing installation allows for increased routing capacity. The new connectors will form part of the cluster and be fully aware and communicate with existing managers and datasources within the cluster.

To add more connectors to an existing deployment:

  1. On the new host, ensure the Appendix B, Prerequisites have been followed.

  2. Update the configuration using tpm, adding the new host to the list of connectors

    If using the staging method of deployment, you can use +=, which appends the host to the existing deployment as shown in the example below. Examples for both the Staging and INI deployment methods follow, Staging first.

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
    The staging USER is tungsten
    
    shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
    The staging HOST is db1
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> ssh {STAGING_USER}@{STAGING_HOST}
    shell> cd {STAGING_DIRECTORY}
    shell> ./tools/tpm configure alpha \
        --connectors+=host4

    Run the tpm command to update the software with the Staging-based configuration:

    shell> ./tools/tpm update --no-connectors

    For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

    shell> vi /etc/tungsten/tungsten.ini
    [alpha]
    ...
    connectors=host4
    

    Run the tpm command to update the software with the INI-based configuration:

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> cd {STAGING_DIRECTORY}
    
    shell> ./tools/tpm update --no-connectors

    For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

  3. Using the --no-connectors option updates the current deployment without restarting the existing connectors.

  4. During a period when it is safe to restart the connectors:

    shell> ./tools/tpm promote-connector

The status of all the connectors can be monitored using cctrl:

[LOGICAL] /alpha > ls

COORDINATOR[host1:AUTOMATIC:ONLINE]

ROUTERS:
+----------------------------------------------------------------------------+
|connector@host1[8616](ONLINE, created=0, active=0)                          |
|connector@host2[12381](ONLINE, created=0, active=0)                         |
|connector@host3[19708](ONLINE, created=0, active=0)                         |
|connector@host4[5085](ONLINE, created=0, active=0)                          |
+----------------------------------------------------------------------------+
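
To confirm that a newly added connector is accepting connections, you can also connect through it with any MySQL client. A minimal check, assuming the new connector is running on host4 and listening on the application port 3306 used elsewhere in this chapter:

shell> mysql -h host4 -P 3306 -u app_user -p -e "SELECT @@hostname;"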

3.7.6. Converting from a single cluster to a composite cluster

There are two possible scenarios for converting from a single standalone cluster to a composite cluster. The two following sections will guide you through examples of each of these.

3.7.6.1. Convert and add new nodes as a new service

The following steps guide you through updating the configuration to include the new hosts as a new service and convert to a Composite Cluster.

For the purpose of this worked example, we have a single cluster dataservice called east with three nodes, defined as db1, db2 and db3 with db1 as the Primary.

Our goal is to create a new cluster dataservice called west with three nodes, defined as db4, db5 and db6 with db4 as the relay.

We will configure a new composite dataservice called global.

The steps show two alternative approaches: creating west as a Passive cluster (Composite Active/Passive), or creating west as a second active cluster (Composite Active/Active).

  1. On the new host(s), ensure the Appendix B, Prerequisites have been followed.

    If configuring via the Staging Installation method, skip straight to Step 4:

    Note

    The staging method CANNOT be used if converting to an Active/Active cluster.

  2. On the new host(s), ensure the tungsten.ini contains the correct service blocks for both the existing cluster and the new cluster.

  3. On the new host(s), install the proper version of clustering software, ensuring that the version being installed matches the version currently installed on the existing hosts.

    shell> cd /opt/continuent/software
    shell> tar zxvf tungsten-clustering-7.1.4-10.tar.gz
    shell> cd tungsten-clustering-7.1.4-10
    shell> ./tools/tpm install

    Important

    Ensure --start-and-report is set to false in the configuration for the new hosts.

  4. Set the existing cluster to maintenance mode using cctrl:

    shell> cctrl
    [LOGICAL] / > set policy maintenance
  5. Add the definition for the new cluster service west and composite service global to the existing configuration on the existing host(s):

    For Composite Active/Passive

    Both Staging and INI examples are shown below, Staging first.

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
    The staging USER is tungsten
    
    shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
    The staging HOST is db1
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> ssh {STAGING_USER}@{STAGING_HOST}
    shell> cd {STAGING_DIRECTORY}
    shell> ./tools/tpm configure west \
        --connectors=db4,db5,db6 \
        --relay-source=east \
        --relay=db4 \
        --slaves=db5,db6 \
        --topology=clustered
    
    shell> ./tools/tpm configure global \
        --composite-datasources=east,west
    

    Run the tpm command to update the software with the Staging-based configuration:

    shell> ./tools/tpm update --no-connectors --replace-release

    For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

    shell> vi /etc/tungsten/tungsten.ini
    [west]
    ...
    connectors=db4,db5,db6
    relay-source=east
    relay=db4
    slaves=db5,db6
    topology=clustered
    
    [global]
    ...
    composite-datasources=east,west
    

    Run the tpm command to update the software with the INI-based configuration:

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> cd {STAGING_DIRECTORY}
    
    shell> ./tools/tpm update --no-connectors --replace-release

    For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

    For Composite Active/Active

    shell> vi /etc/tungsten/tungsten.ini
    [west]
    topology=clustered
    connectors=db4,db5,db6
    master=db4
    members=db4,db5,db6
    
    [global]
    topology=composite-multi-master
    composite-datasources=east,west

    shell> tpm query staging
    tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
    The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
    
    shell> cd {STAGING_DIRECTORY}
    
    shell> ./tools/tpm update --no-connectors --replace-release

    Note

    Using the optional --no-connectors option updates the current deployment without restarting the existing connectors.

    Note

    Using the --replace-release option ensures the metadata files for the cluster are correctly rebuilt. This parameter MUST be supplied.

  6. On every node in the original EAST cluster, make sure all replicators are online:

    shell> trepctl services
    shell> trepctl -all-services online
  7. On all the new hosts in the new cluster, start the manager processes ONLY

    shell> manager start
  8. From the original cluster, use cctrl to check that the new dataservice and composite dataservice have been created, and place the new dataservice into maintenance mode

    shell> cctrl
    cctrl> cd /
    cctrl> ls
    cctrl> use global
    cctrl> ls
    cctrl> datasource east online
    cctrl> set policy maintenance

    Example from a Composite Active/Passive Cluster

    tungsten@db1:~  $ cctrl
    Tungsten Clustering 7.1.4 build 10
    east: session established, encryption=false, authentication=false
    
    [LOGICAL] /east > cd /
    [LOGICAL] / > ls
    
    [LOGICAL] / > ls
    global
      east
      west
    
    [LOGICAL] / > use global
    [LOGICAL] /global > ls
    COORDINATOR[db3:MIXED:ONLINE]
       east:COORDINATOR[db3:MAINTENANCE:ONLINE]
       west:COORDINATOR[db5:AUTOMATIC:ONLINE]
    
    ROUTERS:
    +---------------------------------------------------------------------------------+
    |connector@db1[9493](ONLINE, created=0, active=0)                                 |
    |connector@db2[9341](ONLINE, created=0, active=0)                                 |
    |connector@db3[10675](ONLINE, created=0, active=0)                                |
    +---------------------------------------------------------------------------------+
    
    DATASOURCES:
    +---------------------------------------------------------------------------------+
    |east(composite master:OFFLINE)                                                    |
    |STATUS [OK] [2019/12/09 11:04:17 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    +---------------------------------------------------------------------------------+
    |west(composite slave:OFFLINE)                                                  |
    |STATUS [OK] [2019/12/09 11:04:17 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    
    REASON FOR MAINTENANCE MODE: MANUAL OPERATION
    
    [LOGICAL] /global > datasource east online
    composite data source 'east@global' is now ONLINE
    
    [LOGICAL] /global > set policy maintenance
    policy mode is now MAINTENANCE

    Example from a Composite Active/Active Cluster

    tungsten@db1:~  $ cctrl
    Tungsten Clustering 7.1.4 build 10
    east: session established, encryption=false, authentication=false
    
    [LOGICAL] /east > cd /
    [LOGICAL] / > ls
    
    [LOGICAL] / > ls
    global
      east
        east_from_west
      west
        west_from_east
    
    [LOGICAL] / > use global
    [LOGICAL] /global > ls
    COORDINATOR[db3:MIXED:ONLINE]
       east:COORDINATOR[db3:MAINTENANCE:ONLINE]
       west:COORDINATOR[db4:AUTOMATIC:ONLINE]
    
    ROUTERS:
    +---------------------------------------------------------------------------------+
    |connector@db1[23431](ONLINE, created=0, active=0)                                |
    |connector@db2[25535](ONLINE, created=0, active=0)                                |
    |connector@db3[15353](ONLINE, created=0, active=0)                                |
    +---------------------------------------------------------------------------------+
    
    DATASOURCES:
    +---------------------------------------------------------------------------------+
    |east(composite master:OFFLINE, global progress=10, max latency=1.043)            |
    |STATUS [OK] [2024/08/13 11:05:01 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    |  east(master:ONLINE, progress=10, max latency=1.043)                            |
    |  east_from_west(UNKNOWN:UNKNOWN, progress=-1, max latency=-1.000)               |
    +---------------------------------------------------------------------------------+
    +---------------------------------------------------------------------------------+
    |west(composite master:ONLINE, global progress=-1, max latency=-1.000)            |
    |STATUS [OK] [2024/08/13 11:07:56 AM UTC]                                         |
    +---------------------------------------------------------------------------------+
    |  west(UNKNOWN:UNKNOWN, progress=-1, max latency=-1.000)                         |
    |  west_from_east(UNKNOWN:UNKNOWN, progress=-1, max latency=-1.000)               |
    +---------------------------------------------------------------------------------+
    
    REASON FOR MAINTENANCE MODE: MANUAL OPERATION
    
    [LOGICAL] /global > datasource east online
    composite data source 'east@global' is now ONLINE
    
    [LOGICAL] /global > set policy maintenance
    policy mode is now MAINTENANCE
  9. Start the replicators in the new cluster ensuring they start as OFFLINE:

    shell> replicator start offline
  10. Go to the relay (or Primary) node of the new cluster (i.e. db4) and provision it from a Replica of the original cluster (i.e. db2):

    Provision the new relay in a Composite Active/Passive Cluster

    db4-shell> tprovision -s db2

    Provision the new primary in a Composite Active/Active Cluster

    db4-shell> tprovision -s db2 -c
  11. Go to each Replica node of the new cluster and provision from the relay node of the new cluster (i.e. db4):

    db5-shell> tprovision -s db4
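
    Repeat on the remaining Replica of the new cluster, for example on db6:

    db6-shell> tprovision -s db4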
  12. Bring the replicators in the new cluster online, if not already:

    shell> trepctl -all-services online
  13. From a node in the original cluster (e.g. db1), using cctrl, set the composite cluster online, if not already, and return to automatic:

    shell> cctrl
    [LOGICAL] / > use global
    [LOGICAL] / > datasource west online
    [LOGICAL] / > set policy automatic
  14. Start the connectors associated with the new cluster hosts in west:

    shell> connector start

    Note

    Depending on the mode in which the connectors are running, you may need to configure the user.map. If this is in use on the old cluster, then we recommend that you take a copy of this file and place this on the new connectors associated with the new cluster, and then adjust any affinity settings that are required. Additionally, the user.map may need adjustments on the original cluster. For more details on the user.map file, it is advised to review the relevant sections in the Connector documentation related to the mode your connectors are operating in. These can be found at Section 7.6.1, “user.map File Format”
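
    As a purely illustrative sketch, a user.map entry for an application user with affinity to the new west cluster might look like the line below; the user name and password are placeholders, and the exact format (including the affinity column) should be confirmed in Section 7.6.1, “user.map File Format”:

    app_user secret global west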

  15. If --no-connectors was issued during the update, then during a period when it is safe, restart the connectors associated with the original cluster:

    shell> ./tools/tpm promote-connector

3.7.6.2. Convert and move nodes to a new service

This method of conversion is a little more complicated, and the only safe way to accomplish it requires downtime for replication on all nodes.

To achieve this without downtime to your applications, it is recommended that all application activity be isolated to the Primary host only. Following the conversion, all activity will then be replicated to the Replica nodes.

Our example starting cluster has 5 nodes (1 Primary and 4 Replicas) and uses service name alpha. Our target cluster will have 6 nodes (3 per cluster) in 2 member clusters alpha_east and alpha_west in composite service alpha.

This means that we will reuse the existing service name alpha as the name of the new composite service, and create two new service names, one for each cluster (alpha_east and alpha_west).

To convert the above configuration, follow the steps below:

  1. On the new host, ensure the Appendix B, Prerequisites have been followed.

  2. Ensure the cluster is in MAINTENANCE mode. This will prevent the managers from performing any unexpected recovery or failovers during the process.

    cctrl> set policy maintenance
  3. Next, you must stop all services on all existing nodes.

    shell> stopall
  4. If configuring via the INI Installation Method, update tungsten.ini on all original 5 nodes, then copy the file to the new node.

    You will need to create a new service for each cluster, and change the original service stanza to represent the composite service. An example of how the complete configuration would look is below; both the Staging and INI configurations are shown, Staging first.

    shell> ./tools/tpm configure defaults \
        --reset \
        --user=tungsten \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --replication-user=tungsten \
        --replication-password=secret \
        --replication-port=13306 \
        --application-user=app_user \
        --application-password=secret \
        --application-port=3306 \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha_east \
        --topology=clustered \
        --master=db1 \
        --members=db1,db2,db3 \
        --connectors=db1,db2,db3
    
    shell> ./tools/tpm configure alpha_west \
        --topology=clustered \
        --relay=db4 \
        --members=db4,db5,db6 \
        --connectors=db4,db5,db6 \
        --relay-source=alpha_east
    
    shell> ./tools/tpm configure alpha \
        --composite-datasources=alpha_east,alpha_west
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    user=tungsten
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    replication-user=tungsten
    replication-password=secret
    replication-port=13306
    application-user=app_user
    application-password=secret
    application-port=3306
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha_east]
    topology=clustered
    master=db1
    members=db1,db2,db3
    connectors=db1,db2,db3
    
    [alpha_west]
    topology=clustered
    relay=db4
    members=db4,db5,db6
    connectors=db4,db5,db6
    relay-source=alpha_east
    
    [alpha]
    composite-datasources=alpha_east,alpha_west
    
  5. Using your preferred backup/restore method, take a backup of the MySQL database on one of the original nodes and restore it to the new node.

    If preferred, this step can be skipped, and the new node can instead be provisioned using the supplied provisioning scripts, as explained in Step 10 below.
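
    If you do take the backup/restore route, one minimal sketch using mysqldump is shown below. It is only suitable for smaller datasets, and assumes db5 is an existing node, db6 is the new node, MySQL listens on port 13306 as configured above, and app_db is a placeholder schema name; adjust all of these to your environment:

    shell> mysqldump --host=db5 --port=13306 --user=tungsten -p \
        --single-transaction --triggers --routines --databases app_db > backup.sql
    shell> mysql --host=db6 --port=13306 --user=tungsten -p < backup.sql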

  6. Invoke the conversion using the tpm command from the software extraction directory.

    If the installation was configured via the INI method, this command should be run on all 5 original nodes. If it was configured via the Staging method, this command should be run on the staging host only.

    shell> tpm query staging
    shell> cd {software_staging_dir_from_tpm_query}
    shell> ./tools/tpm update --replace-release --force
    shell> rm /opt/continuent/tungsten/cluster-home/conf/cluster/*/datasource/*

    Note

    The use of the --force option is required to force the override of the old properties.

  7. Only if the installation was configured via the INI method, proceed to install the software on the new node using the tpm command from the software extraction directory:

    shell> cd {software_staging_dir}
    shell> ./tools/tpm install

    Note

    Ensure that the version of the software you install on the new node exactly matches the version running on the existing 5 nodes.

  8. Start all services on all existing nodes.

    shell> startall
  9. Bring the clusters back into AUTOMATIC mode:

    shell> cctrl -multi
    cctrl> use alpha
    cctrl> set policy automatic
    cctrl> exit
  10. If you skipped the backup/restore step above, you now need to provision the database on the new node. To do this, use the tungsten_provision_slave script to provision the database from one of the existing nodes, for example db5:

    shell> tungsten_provision_slave --source db5

3.8. Replicating Data Into an Existing Dataservice

If you have an existing dataservice, data can be replicated from a standalone MySQL server into the service. The replication is configured by creating a service that reads from the standalone MySQL server and writes into the cluster through a connector attached to your dataservice. By writing through the connector, changes to the underlying dataservice topology can be handled.

Additionally, using a replicator that writes data into an existing data service can be used when migrating from an existing service into a new Tungsten Cluster service.

For more information on initially provisioning the data for this type of operation, see Section 6.12.2, “Migrating from MySQL Native Replication Using a New Service”.

Figure 3.6. Topologies: Replicating into a Dataservice

Topologies: Replicating into a Dataservice

In order to configure this deployment, there are two steps:

  1. Create a new replicator on the source server that extracts the data.

  2. Create a new replicator that reads the binary logs directly from the external MySQL service through the connector

There are also the following requirements:

  • The host to which you want to replicate must be running Tungsten Replicator 5.3.0 or later.

  • Hosts on both the replicator and cluster must be able to communicate with each other.

  • The replication user on the source host must have the RELOAD, REPLICATION SLAVE, and REPLICATION CLIENT privileges (see the example GRANT statement after this list).

  • The replicator must be able to connect as the tungsten user to the databases within the cluster.

  • When writing into the Primary through the connector, the user must be given the correct privileges to write and update the MySQL server. For this reason, the easiest method is to use the tungsten user, and ensure that the user has been added to the user.map:

    tungsten secret alpha
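
For reference, a GRANT statement along the following lines provides the privileges listed above, assuming the tungsten user already exists; adjust the user and host specification to your environment:

mysql> GRANT RELOAD, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'tungsten'@'%';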

Install the Tungsten Replicator package (see Section 2.3.2, “Using the RPM package files”), or download the compressed tarball and unpack it on host1:

shell> cd /opt/replicator/software
shell> tar zxf tungsten-replicator-7.1.4-10.tar.gz

Change to the Tungsten Replicator staging directory:

shell> cd tungsten-replicator-7.1.4-10

Configure the replicator on host1.

First we configure the defaults and a cluster alias that points to the Primaries and Replicas within the current Tungsten Cluster service that you are replicating from:

Both the Staging and INI methods are shown in the examples below, Staging first.

shell> ./tools/tpm configure alpha \
    --master=host1 \
    --install-directory=/opt/continuent \
    --replication-user=tungsten \
    --replication-password=password \
    --enable-batch-service=true
shell> vi /etc/tungsten/tungsten.ini
[alpha]
master=host1
install-directory=/opt/continuent
replication-user=tungsten
replication-password=password
enable-batch-service=true

This creates a configuration that specifies that the topology should read directly from the source host, host3, writing directly to host1. An alternative THL port is provided to ensure that the THL listener is not operating on the same network port as the original.

Now install the service, which will create the replicator reading direct from host3 into host1:

shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the installation has been completed, you must update the position of the replicator so that it points to the correct position within the source database to prevent errors during replication. If the replication is being created as part of a migration process, determine the position of the binary log from the external replicator service used when the backup was taken. For example:

mysql> show master status;
*************************** 1. row ***************************
            File: mysql-bin.000026
        Position: 1311
    Binlog_Do_DB: 
Binlog_Ignore_DB: 
1 row in set (0.00 sec)

Use dsctl set to update the replicator position to point to the Primary log position:

shell> /opt/replicator/tungsten/tungsten-replicator/bin/dsctl -service beta set \
    -reset -seqno 0 -epoch 0 \
    -source-id host3 -event-id mysql-bin.000026:1311

Now start the replicator:

shell> /opt/replicator/tungsten/tungsten-replicator/bin/replicator start

Replication status should be checked by explicitly using the servicename and/or RMI port:

shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service beta status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000026:0000000000001311;1252
appliedLastSeqno       : 5
appliedLatency         : 0.748
channels               : 1
clusterName            : beta
currentEventId         : mysql-bin.000026:0000000000001311
currentTimeMillis      : 1390410611881
dataServerHost         : host1
extensions             : 
host                   : host3
latestEpochNumber      : 1
masterConnectUri       : thl://host3:2112/
masterListenUri        : thl://host1:2113/
maximumStoredSeqNo     : 5
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
pipelineSource         : jdbc:mysql:thin://host3:13306/
relativeLatency        : 8408.881
resourcePrecedence     : 99
rmiPort                : 10000
role                   : master
seqnoType              : java.lang.Long
serviceName            : beta
serviceType            : local
simpleServiceName      : beta
siteName               : default
sourceId               : host3
state                  : ONLINE
timeInStateSeconds     : 8408.21
transitioningTo        : 
uptimeSeconds          : 8409.88
useSSLConnection       : false
version                : Tungsten Replicator 7.1.4 build 10
Finished status command...

3.9. Replicating Data Out of a Cluster

If you have an existing cluster and you want to replicate the data out to a separate standalone server using Tungsten Replicator then you can create a cluster alias, and use a Primary/Replica topology to replicate from the cluster. This allows for THL events from the cluster to be applied to a separate server for the purposes of backup or separate analysis.

Figure 3.7. Topologies: Replicating Data Out of a Cluster

Topologies: Replicating Data Out of a Cluster

During the installation process a cluster-alias and cluster-slave are declared. The cluster-alias describes all of the servers in the cluster and how they may be reached. The cluster-slave defines one or more servers that will replicate from the cluster.

The Tungsten Replicator will be installed on the Cluster-Extractor server. That server will download THL data and apply them to the local server. If the Cluster-Extractor has more than one server, one of them will be declared the relay (or Primary). The other members of the Cluster-Extractor may also download THL data from that server.

If the relay for the Cluster-Extractor fails, the other nodes will automatically start downloading THL data from a server in the cluster. If a non-relay server fails, it will not have any impact on the other members.

3.9.1. Prepare: Replicating Data Out of a Cluster

  1. Identify the cluster to replicate from. You will need the Primary, Replicas and THL port (if specified). Use tpm reverse from a cluster member to find the correct values.

  2. If you are replicating to a non-MySQL server, update the configuration of the cluster to include the following properties prior to beginning:

    svc-extractor-filters=colnames,pkey
    property=replicator.filter.pkey.addColumnsToDeletes=true
    property=replicator.filter.pkey.addPkeyToInserts=true

  3. Identify all servers that will replicate from the cluster. If there is more than one, a relay server should be identified to replicate from the cluster and provide THL data to other servers.

  4. Prepare each server according to the prerequisites for the DBMS platform it is serving. If you are working with multiple DBMS platforms; treat each platform as a different Cluster-Extractor during deployment.

  5. Make sure the THL port for the cluster is open between all servers.
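
    One quick way to confirm that the THL port is reachable from the Cluster-Extractor host, assuming the netcat utility is installed and the cluster uses the default THL port of 2112:

    shell> nc -zv host1 2112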

3.9.2. Deploy: Replicating Data Out of a Cluster

  1. Install the Tungsten Replicator package or download the Tungsten Replicator tarball, and unpack it:

    shell> cd /opt/continuent/software
    shell> tar zxf tungsten-replicator-7.1.4-10.tar.gz
  2. Change to the unpackaged directory:

    shell> cd tungsten-replicator-7.1.4-10
  3. Configure the replicator

    Both the Staging and INI methods are shown in the examples below, Staging first.

    shell> ./tools/tpm configure defaults \
        --install-directory=/opt/continuent \
        --profile-script=~/.bash_profile \
        --replication-password=secret \
        --replication-port=13306 \
        --replication-user=tungsten \
        --user=tungsten \
        --rest-api-admin-user=apiuser \
        --rest-api-admin-pass=secret
    
    shell> ./tools/tpm configure alpha \
        --master=host1 \
        --slaves=host2,host3 \
        --thl-port=2112 \
        --topology=cluster-alias
    
    shell> ./tools/tpm configure beta \
        --relay=host6 \
        --relay-source=alpha \
        --topology=cluster-slave
    
    shell> vi /etc/tungsten/tungsten.ini
    [defaults]
    install-directory=/opt/continuent
    profile-script=~/.bash_profile
    replication-password=secret
    replication-port=13306
    replication-user=tungsten
    user=tungsten
    rest-api-admin-user=apiuser
    rest-api-admin-pass=secret
    
    [alpha]
    master=host1
    slaves=host2,host3
    thl-port=2112
    topology=cluster-alias
    
    [beta]
    relay=host6
    relay-source=alpha
    topology=cluster-slave
    

    Important

    If you are replicating to a non-MySQL server, include the following steps in your configuration.

    shell> mkdir -p /opt/continuent/share/
    shell> cp tungsten-replicator/support/filters-config/convertstringfrommysql.json »
       /opt/continuent/share/

    Then, include the following parameters in the configuration:

    property=replicator.stage.remote-to-thl.filters=convertstringfrommysql
    property=replicator.filter.convertstringfrommysql.definitionsFile= »
       /opt/continuent/share/convertstringfrommysql.json
    

    Important

    This dataservice cluster-alias name MUST be the same as the cluster dataservice name that you are replicating from.

    Note

    Do not include start-and-report=true if you are taking over for MySQL native replication. See Section 6.12.1, “Migrating from MySQL Native Replication 'In-Place'” for next steps after completing installation.

  4. Once the configuration has been completed, you can perform the installation to set up the services using this configuration:

    shell> ./tools/tpm install

During the installation and startup, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

The cluster should be installed and ready to use.
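
To confirm that the Cluster-Extractor services came up correctly, you can list the replicator services on the Cluster-Extractor host, for example:

shell> trepctl services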

3.10. Replicating from a Cluster to a Datawarehouse

You can replicate data from an existing cluster to a datawarehouse such as Hadoop or Vertica. A replication applier node handles the datawarehouse loading by obtaining THL from the cluster. The configuration of the cluster needs to be changed to be compatible with the required target applier format.

The Cluster-Extractor deployment works by configuring the cluster replication service in heterogeneous mode, and then replicating out to the Appliers that write into the datawarehouse by using a cluster alias. This ensures that changes to the cluster topology (i.e. Primary switches during a failover or maintenance) still allow replication to continue effectively to your chosen datawarehouse.

The datawarehouse may be installed and running on the same host as the replicator, "Onboard", or on a different host entirely, "Offboard".

Figure 3.8. Topologies: Replication from a Cluster to an Offboard Datawarehouse

Topologies: Replication from Cluster to Datawarehouse

Below is a summary of the steps needed to configure the Cluster-Extractor topology, with links to the actual procedures included:

  1. Install or update a cluster, configured to operate in heterogeneous mode.

    In our example, the cluster configuration file /etc/tungsten/tungsten.ini would contain two stanzas:

    • [defaults] - contains configuration values used by all services.

    • [alpha] - contains cluster configuration parameters, and will use topology=clustered to indicate to the tpm command that nodes listed in this stanza are to be acted upon during installation and update operations.

    For more details about installing the source cluster, please see Section 3.10.2, “Replicating from a Cluster to a Datawarehouse - Configuring the Cluster Nodes”.

  2. Potentially seed the initial data. For more information about various ways to provision the initial data into the target warehouse, please see Section 3.11, “Migrating and Seeding Data”.

  3. Install the Extractor replicator:

    In our example, the Extractor configuration file /etc/tungsten/tungsten.ini would contain three stanzas:

    • [defaults] - contains configuration values used by all services.

    • [alpha] - contains the list of cluster nodes for use by the applier service as a source list. This stanza will use topology=cluster-alias to ensure that no installation or update action will ever be taken on the listed nodes by the tpm command.

    • [omega] - defines a replicator Applier service that uses topology=cluster-slave. This service will extract THL from the cluster nodes defined in the relay source cluster-alias definition [alpha] and write the events into your chosen datawarehouse.

    For more details about installing the replicator, please see Section 3.10.3, “Replicating from a Cluster to a Datawarehouse - Configuring the Cluster-Extractor”.

3.10.1. Replicating from a Cluster to a Datawarehouse - Prerequisites

These are the prerequisite requirements for Cluster-Extractor operations:

  • The Tungsten Cluster and Tungsten Replicator must be version 5.2.0 or later.

  • Hosts on both the replicator and cluster must be able to communicate with each other.

  • The replicator must be able to connect as the tungsten user to the databases within the cluster.

3.10.2. Replicating from a Cluster to a Datawarehouse - Configuring the Cluster Nodes

These are the steps to configure a cluster to act as the source for a Cluster-Extractor replicator writing into a datawarehouse:

  • Enable MySQL ROW-based Binary Logging

    All MySQL databases running in clusters replicating to non-MySQL targets must operate in ROW-based replication mode to prevent data drift.

    This is required because replication to the datawarehouse environment must send the raw row data, rather than SQL statements, which cannot be applied directly to a target datawarehouse.

    You must configure the my.cnf file to enable ROW-based binary logging:

    binlog-format = ROW

    ROW-based binary logging can also be enabled without restarting the MySQL server (note that a value set dynamically does not persist across a restart, so the my.cnf change above is still required):

    mysql> select @@global.binlog_format\G
    *************************** 1. row ***************************
    @@global.binlog_format: MIXED
    1 row in set (0.00 sec)
    
    mysql> SET GLOBAL binlog_format = 'ROW';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select @@global.binlog_format\G
    *************************** 1. row ***************************
    @@global.binlog_format: ROW
    1 row in set (0.00 sec)
  • Enable and Configure the Extractor Filters

    Heterogeneous mode should be enabled within the cluster.

    The extractor filters and two associated properties add the column names and primary key details to the THL. This is required so that the information can be replicated into the datawarehouse correctly.

    For example, on every cluster node the lines below would be added to the /etc/tungsten/tungsten.ini file, then tpm update would be executed:

    [alpha]
    ...
    repl-svc-extractor-filters=colnames,pkey
    property=replicator.filter.pkey.addColumnsToDeletes=true
    property=replicator.filter.pkey.addPkeyToInserts=true

    For staging deployments, prepend two hyphens to each line and include on the command line.
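
    For example, a Staging-method sketch equivalent to the INI lines above, run from the staging directory against the existing cluster service:

    shell> ./tools/tpm configure alpha \
        --repl-svc-extractor-filters=colnames,pkey \
        --property=replicator.filter.pkey.addColumnsToDeletes=true \
        --property=replicator.filter.pkey.addPkeyToInserts=true
    shell> ./tools/tpm update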

3.10.3. Replicating from a Cluster to a Datawarehouse - Configuring the Cluster-Extractor

Configure the replicator that will act as an Extractor, reading information from the cluster and then applying that data into the chosen datawarehouse. Multiple example targets are shown.

This node may be located either on a separate host (for example, when replicating to Amazon Redshift), or on the same node as the target datawarehouse service (e.g. HP Vertica or Hadoop).

On the following pages are the steps to configure a Cluster-Extractor target replicator writing into a datawarehouse for both staging and INI methods of installation.

Figure 3.9. Topologies: Replication from a Cluster to an Offboard Datawarehouse

Topologies: Replication from a Cluster to a Datawarehouse

3.10.3.1. Replicating Data from a Cluster to a Datawarehouse (Staging Use Case)

The following Staging-method procedure will install the Tungsten Replicator software onto target node host6, extracting from a cluster consisting of three (3) nodes (host1, host2 and host3) and applying into the target datawarehouse via host6.

Important

If you are replicating to a MySQL-specific target, please see Section 3.9, “Replicating Data Out of a Cluster” for more information.

  1. On your staging server, go to the software directory.

    shell> cd /opt/continuent/software
  2. Download the latest Tungsten Replicator version.

  3. Unpack the release package

    shell> tar xvzf tungsten-replicator-7.1.4-10.tar.gz
  4. Change to the unpackaged directory:

    shell> cd tungsten-replicator-7.1.4-10
  5. Execute the tpm command to configure defaults for the installation.

    shell> ./tools/tpm configure defaults \
    --install-directory=/opt/replicator \
    '--profile-script=~/.bashrc' \
    --replication-password=secret \
    --replication-port=13306 \
    --replication-user=tungsten \
    --start-and-report=true \
    --mysql-allow-intensive-checks=true \
    --user=tungsten

  6. Configure a cluster alias that points to the Primaries and Replicas within the current Tungsten Cluster service that you are replicating from:

    shell> ./tools/tpm configure alpha \
        --master=host1 \
        --slaves=host2,host3 \
        --thl-port=2112 \
        --topology=cluster-alias

    The description of each of the options is shown below:

    • tpm configure alpha

      This runs the tpm command. configure indicates that we are creating a new dataservice, and alpha is the name of the dataservice being created.

      This definition is for a dataservice alias, not an actual dataservice because --topology=cluster-alias has been specified. This alias is used in the cluster-slave section to define the source hosts for replication.

    • --master=host1

      Specifies the hostname of the default Primary in the cluster.

    • --slaves=host2,host3

      Specifies the name of any other servers in the cluster that may be replicated from.

    • --thl-port=2112

      The THL port for the cluster. The default value is 2112; if the cluster uses any other value, it must be specified here.

    • --topology=cluster-alias

      Define this as a cluster dataservice alias so tpm does not try to install cluster software to the hosts.

    Important

    This dataservice cluster-alias name MUST be the same as the cluster dataservice name that you are replicating from.

  7. On the Cluster-Extractor node, copy the convertstringfrommysql.json filter configuration sample file into the /opt/replicator/share directory then edit it to suit:

    cp /opt/replicator/tungsten/tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/replicator/share/
    vi /opt/replicator/share/convertstringfrommysql.json

    Once the convertstringfrommysql JSON configuration file has been edited, update the configuration to add and configure any additional options needed for the specific datawarehouse you are using.

  8. Create the configuration that will replicate from cluster dataservice alpha into the database on the host specified by --relay=host6:

    shell> ./tools/tpm configure omega \
    --relay=host6 \
    --relay-source=alpha \
    --repl-svc-remote-filters=convertstringfrommysql \
    --property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/replicator/share/convertstringfrommysql.json \
    --topology=cluster-slave

    The description of each of the options is shown below:

    • tpm configure omega

      This runs the tpm command. configure indicates that we are creating a new replication service, and omega is the unique service name for the replication stream from the cluster.

    • --relay=host6

      Specifies the hostname of the destination database into which data will be replicated.

    • --relay-source=alpha

      Specifies the name of the source cluster dataservice alias (defined above) that will be used to read events to be replicated.

    • --topology=cluster-slave

      Read source replication data from any host in the alpha dataservice.

  9. Now finish configuring the omega dataservice with the options specific to the datawarehouse target in use.

    • AWS RedShift Target

      shell> ./tools/tpm configure omega \
      --batch-enabled=true \
      --batch-load-template=redshift \
      --enable-heterogeneous-slave=true \
      --datasource-type=redshift \
      --replication-host=REDSHIFT_ENDPOINT_FQDN_HERE \
      --replication-user=REDSHIFT_USER_HERE \
      --replication-password=REDSHIFT_PASSWORD_HERE \
      --redshift-dbname=REDSHIFT_DB_NAME_HERE \
      --svc-applier-filters=dropstatementdata \
      --svc-applier-block-commit-interval=10s \
      --svc-applier-block-commit-size=5
      

      The description of each of the options is shown below:

      • tpm configure

        Configures default options that will be configured for all future services.

      • --topology=cluster-slave

        Configure the topology as a cluster-slave. This will configure the individual replicator as an Extractor of all the nodes in the cluster, as defined in the previous configuration of the cluster topology.

      • --relay

        Configure the node as the relay for the cluster which will replicate data into the datawarehouse.

      • --enable-heterogeneous-slave

        Configures the Extractor to correctly process the incoming data so that it can be written to the datawarehouse. This includes correcting the processing of text data types and configuring the appropriate filters.

      • --replication-host

        The target host for writing data. In the case of Redshift, this is the fully qualified hostname of the Redshift host.

      • --replication-user

        The user within the Redshift service that will be used to write data into the database.

      • --replication-password=password

        The password for the user within the Redshift service that will be used to write data into the database.

      • --datasource-type=redshift

        Set the datasource type to be used when storing information about the replication state.

      • --batch-enabled=true

        Enable the batch service, this configures the JavaScript batch engine and CSV writing semantics to generate the data to be applied into a datawarehouse.

      • --batch-load-template=redshift

        The batch load template to be used. Since we are replicating into Redshift, the redshift template is used.

      • --redshift-dbname=dev

        The name of the database within the Redshift service where the data will be written.

      Please see Install Amazon Redshift Applier for more information.

    • Vertica Target

      shell> ./tools/tpm configure omega \
      --batch-enabled=true  \
      --batch-load-template=vertica6 \
      --batch-load-language=js  \
      --datasource-type=vertica  \
      --disable-relay-logs=true \
      --enable-heterogeneous-service=true \
      --replication-user=dbadmin \
      --replication-password=VERTICA_DB_PASSWORD_HERE \
      --replication-host=VERTICA_HOST_NAME_HERE \
      --replication-port=5433  \
      --svc-applier-block-commit-interval=5s \
      --svc-applier-block-commit-size=500  \
      --vertica-dbname=VERTICA_DB_NAME_HERE
      

      Please see Install Vertica Applier for more information.

    • For additional targets, please see the full list at Deploying Appliers.

  10. Once the configuration has been completed, you can perform the installation to set up the Tungsten Replicator services using the tpm command run from the staging directory:

    shell> ./tools/tpm install

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

The Cluster-Extractor replicator should now be installed and ready to use.

3.10.3.2. Replicating Data from a Cluster to a Datawarehouse (INI Use Case)

The following INI-based procedure will install the Tungsten Replicator software onto target node host6, extracting from a cluster consisting of three (3) nodes (host1, host2 and host3) and applying into the target datawarehouse via host6.

Important

If you are replicating to a MySQL-specific target, please see Deploying the MySQL Applier for more information.

  1. On the Cluster-Extractor node, copy the convertstringfrommysql.json filter configuration sample file into the /opt/replicator/share directory then edit it to suit:

    cp /opt/replicator/tungsten/tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/replicator/share/
    vi /opt/replicator/share/convertstringfrommysql.json

    Once the convertstringfrommysql JSON configuration file has been edited, update the /etc/tungsten/tungsten.ini file to add and configure any additional options needed for the specific datawarehouse you are using.

  2. Create the configuration file /etc/tungsten/tungsten.ini on the destination DBMS host, i.e. host6:

    [defaults]
    user=tungsten
    install-directory=/opt/replicator
    replication-user=tungsten
    replication-password=secret
    replication-port=3306
    profile-script=~/.bashrc
    mysql-allow-intensive-checks=true
    start-and-report=true
    
    [alpha]
    topology=cluster-alias
    master=host1
    members=host1,host2,host3
    thl-port=2112
    
    [omega]
    topology=cluster-slave
    relay=host6
    relay-source=alpha
    repl-svc-remote-filters=convertstringfrommysql
    property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/replicator/share/convertstringfrommysql.json
    

    The description of each of the options is shown below:

    • [defaults]

      defaults indicates that we are setting options which will apply to all cluster dataservices.

    • user=tungsten

      The operating system user name that you have created for the Tungsten service, tungsten.

    • install-directory=/opt/replicator

      The installation directory of the Tungsten Replicator service. This is where the replicator software will be installed on the destination DBMS server.

    • replication-user=tungsten

      The MySQL user name to use when connecting to the MySQL database.

    • replication-password=secret

      The MySQL password for the user that will connect to the MySQL database.

    • replication-port=3306

      The TCP/IP port on the destination DBMS server that is listening for connections.

    • start-and-report=true

      Tells tpm to start the service and report the current configuration and status.

    • profile-script=~/.bashrc

      Tells tpm to add PATH information to the specified script to initialize the Tungsten Replicator environment.

    • [alpha]

      alpha is the name and identity of the source cluster alias being created.

      This definition is for a dataservice alias, not an actual dataservice because topology=cluster-alias has been specified. This alias is used in the cluster-slave section to define the source hosts for replication.

    • topology=cluster-alias

      Define this as a cluster dataservice alias so tpm does not try to install cluster software to the hosts.

    • members=host1,host2,host3

      A comma separated list of all the hosts that are part of this cluster dataservice.

    • master=host1

      The hostname of the server that is the current cluster Primary MySQL server.

    • thl-port=2112

      The THL port for the cluster. The default value is 2112; if the cluster uses any other value, it must be specified here.

    • [omega]

      omega is the unique service name for the replication stream from the cluster.

      This replication service will extract data from cluster dataservice alpha and apply into the database on the DBMS server specified by relay=host6.

    • topology=cluster-slave

      Tells tpm this is a Cluster-Extractor replication service which will have a list of all source cluster nodes available.

    • relay=host6

      The hostname of the destination DBMS server.

    • relay-source=alpha

      Specifies the name of the source cluster dataservice alias (defined above) that will be used to read events to be replicated.

    Important

    The cluster-alias name (i.e. alpha) MUST be the same as the cluster dataservice name that you are replicating from.

    Note

    Do not include start-and-report=true if you are taking over for MySQL native replication. See Section 6.12.1, “Migrating from MySQL Native Replication 'In-Place'” for next steps after completing installation.

  3. Now finish configuring the omega dataservice with the options specific to the datawarehouse target in use.

    Append the appropriate code snippet below to the bottom of the existing [omega] stanza:

    • AWS RedShift Target - Offboard Batch Applier

      batch-enabled=true
      batch-load-template=redshift
      datasource-type=redshift
      enable-heterogeneous-slave=true
      replication-host=REDSHIFT_ENDPOINT_FQDN_HERE
      replication-user=REDSHIFT_USER_HERE
      replication-password=REDSHIFT_PASSWORD_HERE
      redshift-dbname=REDSHIFT_DB_NAME_HERE
      svc-applier-filters=dropstatementdata
      svc-applier-block-commit-interval=1m
      svc-applier-block-commit-size=5000
      

      The description of each of the options is shown below:

      • --topology=cluster-slave

        Configure the topology as a Cluster-Extractor. This will configure the individual replicator as an Extractor of all the nodes in the cluster, as defined in the previous configuration of the cluster topology.

      • --relay

        Configure the node as the relay for the cluster which will replicate data into the datawarehouse.

      • --enable-heterogeneous-slave=true

        Configures the Extractor to correctly process the incoming data so that it can be written to the datawarehouse. This includes correcting the processing of text data types and configuring the appropriate filters.

      • --replication-host

        The target host for writing data. In the case of Redshift, this is the fully qualified hostname of the Redshift host.

      • --replication-user

        The user within the Redshift service that will be used to write data into the database.

      • --replication-password=password

        The password for the user within the Redshift service that will be used to write data into the database.

      • --datasource-type=redshift

        Set the datasource type to be used when storing information about the replication state.

      • --batch-enabled=true

        Enable the batch service, this configures the JavaScript batch engine and CSV writing semantics to generate the data to be applied into a datawarehouse.

      • --batch-load-template=redshift

        The batch load template to be used. Since we are replicating into Redshift, the redshift template is used.

      • --redshift-dbname=dev

        The name of the database within the Redshift service where the data will be written.

      Please see Install Amazon Redshift Applier for more information.

    • Vertica Target - Onboard/Offboard Batch Applier

      batch-enabled=true 
      batch-load-template=vertica6
      batch-load-language=js 
      datasource-type=vertica
      disable-relay-logs=true
      enable-heterogeneous-service=true
      replication-user=dbadmin
      replication-password=VERTICA_DB_PASSWORD_HERE
      replication-host=VERTICA_HOST_NAME_HERE
      replication-port=5433
      svc-applier-block-commit-interval=5s
      svc-applier-block-commit-size=500
      vertica-dbname=VERTICA_DB_NAME_HERE
      

      Please see Install Vertica Applier for more information.

    • For additional targets, please see the full list at Deploying Appliers.

  4. Download and install the latest Tungsten Replicator package (.rpm), or download the compressed tarball and unpack it on host6:

    shell> cd /opt/continuent/software
    shell> tar xvzf tungsten-replicator-7.1.4-10.tar.gz
  5. Change to the Tungsten Replicator staging directory:

    shell> cd tungsten-replicator-7.1.4-10
  6. Run tpm to install the Tungsten Replicator software with the INI-based configuration:

    shell> ./tools/tpm install

    During the installation and startup, tpm will notify you of any problems that need to be fixed before the service can be correctly installed and started. If the service starts correctly, you should see the configuration and current status of the service.

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

The Cluster-Extractor replicator should now be installed and ready to use.
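To verify, check the replicator status on host6 and confirm that the state shows ONLINE and that appliedLastSeqno advances as the cluster processes transactions; the service name omega matches the example configuration above:

shell> trepctl -service omega status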

3.11. Migrating and Seeding Data

3.11.1. Migrating from MySQL Native Replication 'In-Place'

If you are migrating an existing MySQL native replication deployment to use Tungsten Cluster, the configuration of the Tungsten Cluster replication must be updated to match the status of the Replica.

  1. Deploy Tungsten Cluster using the model or system appropriate according to Chapter 2, Deployment. Ensure that the Tungsten Cluster is not started automatically by excluding the --start or --start-and-report options from the tpm commands.

  2. On each Replica

    Confirm that native replication is working on all Replica nodes :

    shell> echo 'SHOW SLAVE STATUS\G' | tpm mysql | \
    egrep 'Master_Host| Last_Error| Slave_SQL_Running' 
                      Master_Host: tr-ssl1
                Slave_SQL_Running: Yes
                       Last_Error:

  3. On the Primary and each Replica

    Reset the Tungsten Replicator position on all servers :

    shell> replicator start offline
    shell> trepctl -service alpha reset -all -y

  4. On the Primary

    Login and start Tungsten Cluster services and put the Tungsten Replicator online:

    shell> startall
    shell> trepctl online
  5. On the Primary

    Put the cluster into maintenance mode using cctrl to prevent Tungsten Cluster automatically reconfiguring services:

    cctrl > set policy maintenance
  6. On each Replica

    Record the current Replica log position (as reported by the Relay_Master_Log_File and Exec_Master_Log_Pos output from SHOW SLAVE STATUS). Ideally, each Replica should be stopped at the same position:

    shell> echo 'SHOW SLAVE STATUS\G' | tpm mysql | \
    egrep 'Master_Host| Last_Error| Relay_Master_Log_File| Exec_Master_Log_Pos' 
                      Master_Host: tr-ssl1
            Relay_Master_Log_File: mysql-bin.000025
                       Last_Error: Error executing row event: 'Table 'tungsten_alpha.heartbeat' doesn't exist'
              Exec_Master_Log_Pos: 181268

    If you have multiple Replicas configured to read from this Primary, record the Replica position individually for each host. Once you have the information for all the hosts, determine the earliest log file and log position across all the Replicas, as this information will be needed when starting Tungsten Cluster replication. If one of the servers does not show an error, it may be replicating from an intermediate server. If so, you can proceed normally and assume this server stopped at the same position as the host it is replicating from.

  7. On the Primary

    Take the replicator offline and clear the THL:

    shell> trepctl offline
    shell> trepctl -service alpha reset -all -y
  8. On the Primary

    Start replication, using the lowest binary log file and log position from the Replica information determined in step 6.

    shell> trepctl online -from-event 000025:181268

    Tungsten Replicator will start reading the MySQL binary log from this position, creating the corresponding THL event data.

  9. On each Replica

    1. Disable native replication to prevent native replication being accidentally started on the Replica.

      On MySQL 5.0 or MySQL 5.1:

      shell> echo "STOP SLAVE; CHANGE MASTER TO MASTER_HOST='';" | tpm mysql

      On MySQL 5.5 or later:

      shell> echo "STOP SLAVE; RESET SLAVE ALL;" | tpm mysql
    2. If the final position of MySQL replication matches the lowest across all Replicas, start Tungsten Cluster services :

      shell> trepctl online
      shell> startall

      The Replica will start reading from the binary log position configured on the Primary.

      If the position on this Replica is different, use trepctl online -from-event to set the online position according to the recorded position when native MySQL was disabled. Then start all remaining services with startall.

      shell> trepctl online -from-event 000025:188249
      shell> startall

  10. Use cctrl to confirm that replication is operating correctly across the dataservice on all hosts.

  11. Put the cluster back into automatic mode:

    cctrl> set policy automatic
  12. Update your applications to use the installed connector services rather than a direct connection.

  13. Remove the master.info file on each Replica to ensure that when a Replica restarts, it does not connect up to the Primary MySQL server again.

Once these steps have been completed, Tungsten Cluster should be operating as the replication service for your MySQL servers. Use the information in Chapter 6, Operations Guide to monitor and administer the service.
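For example, the overall dataservice state can be reviewed from any node using the ls command within cctrl:

shell> cctrl
cctrl> ls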

3.11.2. Migrating from MySQL Native Replication Using a New Service

When running an existing MySQL native replication service that needs to be migrated to a Tungsten Cluster service, one solution is to create the new Tungsten Cluster service, synchronize the content, and then install a service that migrates data from the existing native service to the new service while applications are reconfigured to use the new service. The two can then be executed in parallel until applications have been migrated.

The basic structure is shown in Figure 3.10, “Migration: Migrating Native Replication using a New Service”. The migration consists of two steps:

  • Initializing the new service with the current database state.

  • Creating a Tungsten Replicator deployment that continues to replicate data from the native MySQL service to the new service.

Once the application has been switched and is executing against the new service, the secondary replication can be disabled by shutting down the Tungsten Replicator in /opt/replicator.

Figure 3.10. Migration: Migrating Native Replication using a New Service


To configure the service:

  1. Stop replication on a Replica for the existing native replication installation :

    mysql> STOP SLAVE;

    Obtain the current Replica position within the Primary binary log :

    mysql> SHOW SLAVE STATUS\G
    ...
                      Master_Host: host3
            Relay_Master_Log_File: mysql-bin.000002
              Exec_Master_Log_Pos: 559
    ...
  2. Create a backup using any method that provides a consistent snapshot. The MySQL Primary may be used if you do not have a Replica to back up from. Be sure to capture the binary log position as part of your backup. This is included in the output of Xtrabackup, or can be obtained by using the --master-data=2 option with mysqldump.

  3. Restart the Replica using native replication :

    mysql> START SLAVE;
  4. On the Primary and each Replica within the new service, restore the backup data and start the database service.

  5. Set up the new Tungsten Cluster deployment using the MySQL servers on which the data has been restored. For clarity, this will be called newalpha.

  6. Configure a second replication service, beta, to replicate data from the existing MySQL native replication Primary into the Primary of newalpha. The information provided in Section 3.8, “Replicating Data Into an Existing Dataservice” will help. Do not start the new service.

  7. Set the replication position for beta using tungsten_set_position to set the position to the point within the binary logs where the backup was taken:

    shell> /opt/replicator/tungsten/tungsten-replicator/bin/tungsten_set_position \
        --seqno=0 --epoch=0 --service=beta \
        --source-id=host3 --event-id=mysql-bin.000002:559
  8. Start replicator service beta:

    shell> /opt/replicator/tungsten/tungsten-replicator/bin/replicator start

Once replication has been started, use trepctl to check the status and ensure that replication is operating correctly.
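For example, using the same installation path as the commands above:

shell> /opt/replicator/tungsten/tungsten-replicator/bin/trepctl -service beta status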

The original native MySQL replication Primary can continue to be used for reading and writing from within your application, and changes will be replicated into the new service on the new hardware. Once the applications have been updated to use the new service, the old servers can be decommissioned and replicator service beta stopped and removed.

3.11.3. Seeding Data through MySQL

Once the Tungsten Replicator is installed, it can be used to provision all Replicas with the Primary data. The Replicas will need enough information in order for the installation to succeed and for Tungsten Replicator to start. The provisioning process requires dumping all data on the Primary and reloading it back into the Primary server. This will create a full set of THL entries for the Replica replicators to apply. There must be no other applications accessing the Primary server while this process is running. Every table will be emptied out and repopulated, so other applications would otherwise get an inconsistent view of the database. If the Primary is a MySQL Replica, the replication thread may be stopped while this process runs to prevent any changes, without affecting other servers.

  1. If you are using a MySQL Replica as the Primary, stop the replication thread :

    mysql> STOP SLAVE;

  2. Check Tungsten Replicator status on all servers to make sure it is ONLINE and that the appliedLastSeqno values are matching :

    shell> trepctl status

    Starting the process before all servers are consistent could cause inconsistencies. If you are trying to completely reprovision the server then you may consider running trepctl reset before proceeding. That will reset the replication position and ignore any previous events on the Primary.

  3. Use mysqldump to output all of the schemas that need to be provisioned :

    shell> mysqldump --opt --skip-extended-insert -hhost3 -utungsten -P13306 -p \
        --databases db1,db2 > ~/dump.sql

    Optionally, you can just dump a set of tables to be provisioned :

    shell> mysqldump --opt --skip-extended-insert -hhost3 -utungsten -P13306 -p \
        db1 table1 table2 > ~/dump.sql

  4. If you are using heterogeneous replication, all tables on the Replica must be empty before proceeding. The Tungsten Replicator does not replicate DDL statements such as DROP TABLE and CREATE TABLE. You may either truncate the tables on the Replica or use ddlscan to recreate them.

  5. Load the dump file back into the Primary to recreate all data :

    shell> cat ~/dump.sql | tpm mysql

    The Tungsten Replicator will read the binary log as the dump file is loaded into MySQL. The Replicas will automatically apply these statements through normal replication.

  6. If you are using a MySQL Replica as the Primary, restart the replication thread after the dump file has completed loading :

    mysql> START SLAVE;

  7. Monitor replication status on the Primary and Replicas :

    shell> trepctl status

Chapter 4. Deployment: Advanced

Table of Contents

4.1. Deploying Parallel Replication
4.1.1. Application Prerequisites for Parallel Replication
4.1.2. Enabling Parallel Apply During Install
4.1.3. Channels
4.1.4. Parallel Replication and Offline Operation
4.1.4.1. Clean Offline Operation
4.1.4.2. Tuning the Time to Go Offline Cleanly
4.1.4.3. Unclean Offline
4.1.5. Adjusting Parallel Replication After Installation
4.1.5.1. How to Enable Parallel Apply After Installation
4.1.5.2. How to Change Channels Safely
4.1.5.3. How to Disable Parallel Replication Safely
4.1.5.4. How to Switch Parallel Queue Types Safely
4.1.6. Monitoring Parallel Replication
4.1.6.1. Useful Commands for Parallel Monitoring Replication
4.1.6.2. Parallel Replication and Applied Latency On Replicas
4.1.6.3. Relative Latency
4.1.6.4. Serialization Count
4.1.6.5. Maximum Offline Interval
4.1.6.6. Workload Distribution
4.1.7. Controlling Assignment of Shards to Channels
4.1.8. Disk vs. Memory Parallel Queues
4.2. Distributed Datasource Groups
4.2.1. Introduction to DDG
4.2.2. How DDG Works
4.2.3. Configuring DDG
4.3. Starting and Stopping Tungsten Cluster
4.3.1. Restarting the Replicator Service
4.3.2. Restarting the Connector Service
4.3.3. Restarting the Manager Service
4.3.4. Restarting the Multi-Site/Active-Active Replicator Service
4.4. Configuring Startup on Boot
4.4.1. Configuring Multi-Site/Active-Active Replicator Startup on Boot
4.5. Upgrading Tungsten Cluster
4.5.1. Upgrading using the Staging Method (with ssh Access)
4.5.2. Upgrading when using INI-based configuration, or without ssh Access
4.5.2.1. Upgrading
4.5.2.2. Upgrading a Single Host using tpm
4.5.3. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Passive (CAP)
4.5.3.1. Conversion Prerequisites
4.5.3.2. Step 1: Backups
4.5.3.3. Step 2: Redirect Client Connections
4.5.3.4. Step 3: Enter Maintenance Mode
4.5.3.5. Step 4: Stop the Cross-site Replicators
4.5.3.6. Step 5: Export the tracking schema databases
4.5.3.7. Step 6: Uninstall the Cross-site Replicators
4.5.3.8. Step 7: Create Composite Tracking Schema
4.5.3.9. Step 8: Reload the tracking schema for Passive clusters
4.5.3.10. Step 9: Stop local cluster Replicators
4.5.3.11. Step 10: Remove THL
4.5.3.12. Step 11: Export the tracking schema database on Active cluster
4.5.3.13. Step 12: Reload the tracking schema for Active cluster
4.5.3.14. Step 13: Update Configuration
4.5.3.15. Step 14: Install the Software on Active Cluster
4.5.3.16. Step 15: Start Local Replicators on Active cluster
4.5.3.17. Step 16: Install the Software on remaining Clusters
4.5.3.18. Step 17: Start Local Replicators on remaining clusters
4.5.3.19. Step 18: Convert Datasource roles for Passive clusters
4.5.3.20. Step 19: Upgrade the Software on Connectors
4.5.4. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Active (CAA)
4.5.4.1. Supported Upgrade Paths
4.5.4.2. Upgrade Prerequisites
4.5.4.3. Step 1: Backups
4.5.4.4. Step 2: Stop the Cross-site Replicators
4.5.4.5. Step 3: Export the tungsten_* Databases
4.5.4.6. Step 4: Uninstall the Cross-site Replicators
4.5.4.7. Step 5: Reload the tracking schema
4.5.4.8. Step 6: Update Configuration
4.5.4.9. Step 7: Enter Maintenance Mode
4.5.4.10. Step 8: Stop Managers
4.5.4.11. Step 9: Install/Update the Software
4.5.4.12. Step 10: Start Managers
4.5.4.13. Step 11: Return to Automatic Mode
4.5.4.14. Step 12: Validate
4.5.5. Installing an Upgraded JAR Patch
4.5.6. Installing Patches
4.5.7. Upgrading to v7.0.0+
4.5.7.1. Background
4.5.7.2. Upgrade Decisions
4.5.7.3. Setup internal encryption and authentication
4.5.7.4. Enable Tungsten to Database Encryption
4.5.7.5. Enable Connector to Database Encryption
4.5.7.6. Enable MySQL SSL
4.5.7.7. Steps to upgrade using tpm
4.5.7.8. Optional Post-Upgrade steps to configure API
4.6. Removing Datasources, Managers or Connectors
4.6.1. Removing a Datasource from an Existing Deployment
4.6.2. Removing a Composite Datasource/Cluster from an Existing Deployment
4.6.3. Removing a Connector from an Existing Deployment

The following sections provide guidance and instructions for creating advanced deployments, including configuration automatic startup and shutdown during boot procedures, upgrades, downgrades, and removal of Tungsten Cluster.

4.1. Deploying Parallel Replication

Parallel apply is an important technique for achieving high speed replication and curing Replica lag. It works by spreading updates to Replicas over multiple threads that split transactions on each schema into separate processing streams. This in turn spreads I/O activity across many threads, which results in faster overall updates on the Replica. In ideal cases throughput on Replicas may improve by up to 5 times over single-threaded MySQL native replication.

Note

It is worth noting that the only thing Tungsten parallelizes is applying transactions to Replicas. All other operations in each replication service are single-threaded.

4.1.1. Application Prerequisites for Parallel Replication

Parallel replication works best on workloads that meet the following criteria:

  • ROW based binary logging must be enabled in the MySQL database.

  • Data are stored in independent schemas. If you have 100 customers per server with a separate schema for each customer, your application is a good candidate.

  • Transactions do not span schemas. Tungsten serializes such transactions, which is to say it stops parallel apply and runs them by themselves. If more than 2-3% of transactions are serialized in this way, most of the benefits of parallelization are lost.

  • Workload is well-balanced across schemas.

  • The Replica host(s) are capable and have free memory in the OS page cache.

  • The host on which the Replica runs has a sufficient number of cores to operate a large number of Java threads.

  • Not all workloads meet these requirements. If your transactions are within a single schema only, you may need to consider different approaches, such as Replica prefetch. Contact Continuent for other suggestions.

Parallel replication does not work well on underpowered hosts, such as Amazon m1.small instances. In fact, any host that is already I/O bound under single-threaded replication will typically not show much improvement with parallel apply.

Note

Currently, it is not recommended to use the SMARTSCALE connector configuration in conjunction with Parallel Apply. This is due to progress only being measured against the slowest channel.

4.1.2. Enabling Parallel Apply During Install

Parallel apply is enabled using the svc-parallelization-type and channels options of tpm. The parallelization type defaults to none, which is to say that parallel apply is disabled. You should set it to disk. The channels option sets the number of channels (i.e., threads) you propose to use for applying data. Here is a code example of a MySQL Applier installation with parallel apply enabled. The Replica will apply transactions using 10 channels.

Examples for both the Staging and INI methods are shown below.

shell> ./tools/tpm configure defaults \
    --reset \
    --install-directory=/opt/continuent \
    --user=tungsten \
    --mysql-allow-intensive-checks=true \
    --profile-script=~/.bash_profile \
    --application-port=3306 \
    --application-user=app_user \
    --application-password=secret \
    --replication-port=13306 \
    --replication-user=tungsten \
    --replication-password=secret \
    --svc-parallelization-type=disk \
    --connector-smartscale=false \
    --channels=10 \
    --rest-api-admin-user=apiuser \
    --rest-api-admin-pass=secret

shell> ./tools/tpm configure alpha \
    --master=host1 \
    --members=host1,host2,host3 \
    --connectors=host1,host2,host3 \
    --topology=clustered
shell> vi /etc/tungsten/tungsten.ini
[defaults]
install-directory=/opt/continuent
user=tungsten
mysql-allow-intensive-checks=true
profile-script=~/.bash_profile
application-port=3306
application-user=app_user
application-password=secret
replication-port=13306
replication-user=tungsten
replication-password=secret
svc-parallelization-type=disk
# parallel apply and smartscale are not compatible
connector-smartscale=false
channels=10
rest-api-admin-user=apiuser
rest-api-admin-pass=secret

[alpha]
master=host1
members=host1,host2,host3
connectors=host1,host2,host3
topology=clustered


If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

There are several additional options that default to reasonable values. You may wish to change them in special cases.

  • buffer-size — Sets the replicator block commit size, which is the number of transactions to commit at once on Replicas. Values up to 100 are normally fine.

  • native-slave-takeover — Used to allow Tungsten to take over from native MySQL replication and parallelize it.
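If you do need to adjust them, they can be set like any other tpm option. A minimal INI sketch is shown below; the values are purely illustrative, not recommendations:

shell> vi /etc/tungsten/tungsten.ini
[defaults]
...
buffer-size=100
native-slave-takeover=true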

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

Replica shell> trepctl -service alpha status| grep channels
channels               : 10

Important

The channel count for a Primary will ALWAYS be 1 because extraction is single-threaded:

Primary shell> trepctl -service alpha status| grep channels
channels               : 1

Warning

Enabling parallel apply will dramatically increase the number of connections to the database server.

Typically the calculation on a Replica would be: Connections = Channel_Count x Service_Count x 2, so for a 4-way Composite Active/Active topology with 30 channels there would be 30 x 4 x 2 = 240 connections required for the replicator alone, not counting application traffic.

You may display the currently used number of connections in MySQL:

mysql> SHOW STATUS LIKE 'max_used_connections';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Max_used_connections | 190   |
+----------------------+-------+
1 row in set (0.00 sec)

Below are suggestions for how to change the maximum connections setting in MySQL both for the running instance as well as at startup:

mysql> SET GLOBAL max_connections = 512;

mysql> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 512   |
+-----------------+-------+
1 row in set (0.00 sec)

shell> vi /etc/my.cnf
#max_connections = 151
max_connections = 512

4.1.3. Channels

Channels and Parallel Apply

Parallel apply works by using multiple threads for the final stage of the replication pipeline. These threads are known as channels. Restart points for each channel are stored as individual rows in table trep_commit_seqno if you are applying to a relational DBMS server, including MySQL, Oracle, and data warehouse products like Vertica.
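Because each channel maintains its own row, counting the rows in the tracking table is a quick way to confirm how many channels a Replica is using. A minimal check, assuming the replication service is named alpha so that the catalog schema is tungsten_alpha:

shell> echo "SELECT COUNT(*) AS channel_rows FROM tungsten_alpha.trep_commit_seqno;" | tpm mysql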

When you set the channels argument, the tpm program configures the replication service to enable the requested number of channels. A value of 1 results in single-threaded operation.

Do not change the number of channels without setting the replicator offline cleanly. See Section 4.1.5.2, “How to Change Channels Safely” for more information.

How Many Channels Are Enough?

Pick the smallest number of channels that loads the Replica fully. For evenly distributed workloads this means that you should increase channels so that more threads are simultaneously applying updates and soaking up I/O capacity. As long as each shard receives roughly the same number of updates, this is a good approach.

For unevenly distributed workloads, you may want to decrease channels to spread the workload more evenly across them. This ensures that each channel has productive work and minimizes the overhead of updating the channel position in the DBMS.

Once you have maximized I/O on the DBMS server leave the number of channels alone. Note that adding more channels than you have shards does not help performance as it will lead to idle channels that must update their positions in the DBMS even though they are not doing useful work. This actually slows down performance a little bit.

Effect of Channels on Backups

If you back up a Replica that operates with more than one channel, say 30, you can only restore that backup on another Replica that operates with the same number of channels. Otherwise, reloading the backup is the same as changing the number of channels without a clean offline.

When operating Tungsten Replicator in a Tungsten cluster, you should always set the number of channels to be the same for all replicators. Otherwise you may run into problems if you try to restore backups onto MySQL instances whose replicators are configured with a different number of channels.

If the replicator has only a single channel enabled, you can restore the backup anywhere. The same applies if you run the backup after the replicator has been taken offline cleanly.

4.1.4. Parallel Replication and Offline Operation

4.1.4.1. Clean Offline Operation

When you issue a trepctl offline command, Tungsten Replicator will bring all channels to the same point in the log and then go offline. This is known as going offline cleanly. When a Replica has been taken offline cleanly the following are true:

  • The trep_commit_seqno table contains a single row, showing that updates to the Replica have been fully serialized to a single point.

  • The trep_shard_channel table is empty, showing that there are no active shard-to-channel assignments.

When parallel replication is not enabled, you can take the replicator offline by stopping the replicator process. There is no need to issue a trepctl offline command first.

4.1.4.2. Tuning the Time to Go Offline Cleanly

Putting a replicator offline may take a while if the slowest and fastest channels are far apart, i.e., if one channel gets far ahead of another. The separation between channels is controlled by the maxOfflineInterval parameter, which defaults to 5 seconds. This sets the allowable distance between commit timestamps processed on different channels. You can adjust this value at installation or later. The following example shows how to change it after installation. This can be done at any time and does not require the replicator to go offline cleanly.

Examples for both the Staging and INI methods are shown below.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

shell> vi /etc/tungsten/tungsten.ini
[alpha]
...
property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

The offline interval is only the approximate time that Tungsten Replicator will take to go offline. Up to a point, larger values (say 60 or 120 seconds) allow the replicator to parallelize in spite of a few operations that are relatively slow. However, the downside is that going offline cleanly can become quite slow.

4.1.4.3. Unclean Offline

If you need to take a replicator offline quickly, you can either stop the replicator process or issue the following command:

shell> trepctl offline -immediate

Both of these result in an unclean shutdown. However, parallel replication is completely crash-safe provided you use transactional table types like InnoDB, so you will be able to restart without causing Replica consistency problems.

Warning

You must take the replicator offline cleanly to change the number of channels or when reverting to MySQL native replication. Failing to do so can result in errors when you restart replication.

4.1.5. Adjusting Parallel Replication After Installation

4.1.5.1. How to Enable Parallel Apply After Installation

Warning

Be sure to place the cluster into MAINTENANCE mode first so the Manager does not attempt to automatically bring the replicator online.

cctrl> set policy maintenance

To enable parallel replication after installation, take the replicator offline cleanly using the following command:

shell> trepctl offline

Modify the configuration to add two parameters:

Examples for both the Staging and INI methods are shown below.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure defaults \
    --svc-parallelization-type=disk \
    --channels=10

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

[defaults]
...
svc-parallelization-type=disk
channels=10

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

Note

You may use an actual data service name in place of the defaults keyword.

Signal the changes by a complete restart of the Replicator process:

shell> replicator restart

Warning

Be sure to place the cluster into AUTOMATIC mode as soon as all replicators are updated and back online.

cctrl> set policy automatic

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

Replica shell> trepctl -service alpha status| grep channels
channels               : 10

Important

The channel count for a Primary will ALWAYS be 1 because extraction is single-threaded:

Primary shell> trepctl -service alpha status| grep channels
channels               : 1

Warning

Enabling parallel apply will dramatically increase the number of connections to the database server.

Typically the calculation on a Replica would be: Connections = Channel_Count x Service_Count x 2, so for a 4-way Composite Active/Active topology with 30 channels there would be 30 x 4 x 2 = 240 connections required for the replicator alone, not counting application traffic.

You may display the currently used number of connections in MySQL:

mysql> SHOW STATUS LIKE 'max_used_connections';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Max_used_connections | 190   |
+----------------------+-------+
1 row in set (0.00 sec)

Below are suggestions for how to change the maximum connections setting in MySQL both for the running instance as well as at startup:

mysql> SET GLOBAL max_connections = 512;

mysql> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 512   |
+-----------------+-------+
1 row in set (0.00 sec)

shell> vi /etc/my.cnf
#max_connections = 151
max_connections = 512

4.1.5.2. How to Change Channels Safely

To change the number of channels you must take the replicator offline cleanly using the following command:

shell> trepctl offline

This command brings all channels up to the same transaction in the log, then goes offline. If you look in the trep_commit_seqno table, you will notice only a single row, which shows that updates to the Replica have been completely serialized to a single point. At this point you may safely reconfigure the number of channels on the replicator, for example using the following command:

Examples for both the Staging and INI methods are shown below.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --channels=5

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
channels=5

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

If you attempt to reconfigure channels without going offline cleanly, Tungsten Replicator will signal an error when you attempt to go online with the new channel configuration. The cure is to revert to the previous number of channels, go online, and then go offline cleanly. Note that attempting to clean up the trep_commit_seqno and trep_shard_channel tables manually can result in your Replicas becoming inconsistent and requiring full resynchronization. You should only do such cleanup under direction from Continuent support.

Warning

Failing to follow the channel reconfiguration procedure carefully may result in your Replicas becoming inconsistent or failing. The cure is usually full resynchronization, so it is best to avoid this if possible.

4.1.5.3. How to Disable Parallel Replication Safely

The following steps describe how to gracefully disable parallel apply replication.

Replication Graceful Offline (critical first step)

To disable parallel apply, you must first take the replicator offline cleanly using the following command:

shell> trepctl offline

This command brings all channels up to the same transaction in the log, then goes offline. If you look in the trep_commit_seqno table, you will notice only a single row, which shows that updates to the Replica have been completely serialized to a single point. At this point you may safely disable parallel apply on the replicator, for example using the following command:

Examples for both the Staging and INI methods are shown below.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --svc-parallelization-type=none \
    --channels=1

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
svc-parallelization-type=none
channels=1

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

Verification

You can check the number of active channels on a Replica by looking at the "channels" property once the replicator restarts.

shell> trepctl -service alpha status| grep channels
channels               : 1
Notes and Warnings

If you attempt to reconfigure channels without going offline cleanly, Tungsten Replicator will signal an error when you attempt to go online with the new channel configuration. The cure is to revert to the previous number of channels, go online, and then go offline cleanly. Note that attempting to clean up the trep_commit_seqno and trep_shard_channel tables manually can result in your Replicas becoming inconsistent and requiring full resynchronization. You should only do such cleanup under direction from Continuent support.

Warning

Failing to follow the channel reconfiguration procedure carefully may result in your Replicas becoming inconsistent or failing. The cure is usually full resynchronization, so it is best to avoid this if possible.

4.1.5.4. How to Switch Parallel Queue Types Safely

As with channels you should only change the parallel queue type after the replicator has gone offline cleanly. The following example shows how to update the parallel queue type after installation:

Examples for both the Staging and INI methods are shown below.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --svc-parallelization-type=disk \
    --channels=5

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
svc-parallelization-type=disk
channels=5

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

4.1.6. Monitoring Parallel Replication

Basic monitoring of a parallel deployment can be performed using the techniques in Chapter 6, Operations Guide. Specific operations for parallel replication are provided in the following sections.

4.1.6.1. Useful Commands for Parallel Monitoring Replication

The replicator has several helpful commands for tracking replication performance:

Command                        Description
trepctl status                 Shows basic variables including overall latency of the Replica and the number of apply channels
trepctl status -name shards    Shows the number of transactions for each shard
trepctl status -name stores    Shows the configuration and internal counters for stores between tasks
trepctl status -name tasks     Shows the number of transactions (events) and latency for each independent task in the replicator pipeline

4.1.6.2. Parallel Replication and Applied Latency On Replicas

The trepctl status appliedLastSeqno parameter shows the sequence number of the last transaction committed. Here is an example from a Replica with 5 channels enabled.

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000211:0000000020094456;0
appliedLastSeqno       : 78021
appliedLatency         : 0.216
channels               : 5
...
Finished status command...

When parallel apply is enabled, the meaning of appliedLastSeqno changes. It is the minimum recovery position across apply channels, which means it is the position where channels restart in the event of a failure. This number is quite conservative and may make replication appear to be further behind than it actually is.

  • Busy channels mark their position in table trep_commit_seqno as they commit. These positions are up-to-date with the traffic on that channel, but latency varies between channels that process many large transactions and those that are more lightly loaded.

  • Inactive channels do not receive any transactions, and hence do not mark their position. Tungsten sends a control event across all channels so that they mark their commit position in trep_commit_seqno. On lightly loaded systems it is possible to see a delay of many seconds, or even minutes, between the reported position and the true state of the Replica because idle channels have not yet marked their position.

For systems with few transactions it is useful to lower the synchronization interval to a smaller number of transactions, for example 500. The following command shows how to adjust the synchronization interval after installation:

Examples for both the Staging and INI methods are shown below.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.syncInterval=500

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
property=replicator.store.parallel-queue.syncInterval=500

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

Note that there is a trade-off between the synchronization interval value and writes on the DBMS server. With the foregoing setting, all channels will write to the trep_commit_seqno table every 500 transactions. If there were 50 channels configured, this could lead to an increase in writes of up to 10%—each channel could end up adding an extra write to mark its position every 10 transactions. In busy systems it is therefore better to use a higher synchronization interval for this reason.

You can check the current synchronization interval by running the trepctl status -name stores command, as shown in the following example:

shell> trepctl status -name stores
Processing status command (stores)...
...
NAME                      VALUE
----                      -----
...
name                    : parallel-queue
...
storeClass              : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval            : 10000
Finished status command (stores)...

You can also force all channels to mark their current position by sending a heartbeat using the trepctl heartbeat command.
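For example, with a service named alpha:

shell> trepctl -service alpha heartbeat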

4.1.6.3. Relative Latency

Relative latency is a trepctl status parameter. It indicates the latency since the last time the appliedSeqno advanced; for example:

shell> trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : mysql-bin.000211:0000000020094766;0
appliedLastSeqno       : 78022
appliedLatency         : 0.571
...
relativeLatency        : 8.944
Finished status command...

In this example the last transaction had a latency of .571 seconds from the time it committed on the Primary and committed 8.944 seconds ago. If relative latency increases significantly in a busy system, it may be a sign that replication is stalled. This is a good parameter to check in monitoring scripts.
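As an illustration, a minimal monitoring check might look like the following sketch; the 60-second threshold and the service name alpha are assumptions chosen for the example, not recommendations:

shell> cat > check_relative_latency.sh << 'EOF'
#!/bin/bash
# Warn if relativeLatency exceeds a threshold (60 seconds here, an arbitrary value).
THRESHOLD=60
# Extract the relativeLatency value from the trepctl status output.
LATENCY=$(trepctl -service alpha status | awk -F: '/relativeLatency/ {gsub(/ /,"",$2); print $2}')
if [ -n "$LATENCY" ] && [ "$(echo "$LATENCY > $THRESHOLD" | bc -l)" -eq 1 ]; then
  echo "WARNING: relativeLatency is ${LATENCY}s; replication may be stalled."
fi
EOF
shell> chmod +x check_relative_latency.sh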

4.1.6.4. Serialization Count

Serialization count refers to the number of transactions that the replicator has handled that cannot be applied in parallel because they involve dependencies across shards. For example, a transaction that spans multiple shards must serialize because it might cause an out-of-order update with respect to transactions that update a single shard only.

You can detect the number of transactions that have been serialized by looking at the serializationCount parameter using the trepctl status -name stores command. The following example shows a replicator that has processed 1512 transactions with 26 serialized.

shell> trepctl status -name stores
Processing status command (stores)...
...
NAME                      VALUE
----                      -----
criticalPartition       : -1
discardCount            : 0
estimatedOfflineInterval: 0.0
eventCount              : 1512
headSeqno               : 78022
maxOfflineInterval      : 5
maxSize                 : 10
name                    : parallel-queue
queues                  : 5
serializationCount      : 26
serialized              : false
...
Finished status command (stores)...

In this case 1.7% of transactions are serialized. Generally speaking you will lose benefits of parallel apply if more than 1-2% of transactions are serialized.

4.1.6.5. Maximum Offline Interval

The maximum offline interval (maxOfflineInterval) parameter controls the "distance" between the fastest and slowest channels when parallel apply is enabled. The replicator measures distance using the seconds between commit times of the last transaction processed on each channel. This time is roughly equivalent to the amount of time a replicator will require to go offline cleanly.

You can change the maxOfflineInterval as shown in the following example; the value is defined in seconds.

Examples for both the Staging and INI methods are shown below.

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

[alpha]
...
property=replicator.store.parallel-queue.maxOfflineInterval=30

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

You can view the configured value as well as the current estimated value using the trepctl status -name stores command, as shown in the following example:

shell> trepctl status -name stores
Processing status command (stores)...
NAME                      VALUE
----                      -----
...
estimatedOfflineInterval: 1.3
...
maxOfflineInterval      : 30
...
Finished status command (stores)...

4.1.6.6. Workload Distribution

Parallel apply works best when transactions are distributed evenly across shards and those shards are distributed evenly across available channels. You can monitor the distribution of transactions over shards using the trepctl status -name shards command. This command lists transaction counts for all shards, as shown in the following example.

shell> trepctl status -name shards
Processing status command (shards)...
...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000211:0000000020095076;0
appliedLastSeqno  : 78023
appliedLatency    : 0.255
eventCount        : 3523
shardId           : cust1
stage             : q-to-dbms
...
Finished status command (shards)...

If one or more shards have a very large eventCount value compared to the others, this is a sign that your transaction workload is poorly distributed across shards.
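A quick way to scan the per-shard counts is to filter the output down to the shard names and their event counts:

shell> trepctl status -name shards | egrep 'shardId|eventCount'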

The listing of shards also offers a useful trick for finding serialized transactions. Shards that Tungsten Replicator cannot safely parallelize are assigned the dummy shard ID #UNKNOWN. Look for this shard to find the count of serialized transactions. The appliedLastSeqno for this shard gives the sequence number of the most recent serialized transaction. As the following example shows, you can then list the contents of the transaction to see why it serialized. In this case, the transaction affected tables in different schemas.

shell> trepctl status -name shards
Processing status command (shards)...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000211:0000000020095529;0
appliedLastSeqno  : 78026
appliedLatency    : 0.558
eventCount        : 26
shardId           : #UNKNOWN
stage             : q-to-dbms
...
Finished status command (shards)...
shell> thl list -seqno 78026
SEQ# = 78026 / FRAG# = 0 (last frag)
- TIME = 2013-01-17 22:29:42.0
- EPOCH# = 1
- EVENTID = mysql-bin.000211:0000000020095529;0
- SOURCEID = logos1
- METADATA = [mysql_server_id=1;service=percona;shard=#UNKNOWN]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0, »
    foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, »
    collation_connection = 8, collation_server = 33]
- SCHEMA =
- SQL(0) = insert into mats_0.foo values(1) /* ___SERVICE___ = [percona] */
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0, »
    foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, »
    collation_connection = 8, collation_server = 33]
- SQL(1) = insert into mats_1.foo values(1)

The replicator normally distributes shards evenly across channels. As each new shard appears, it is assigned to the next channel number, which then rotates back to 0 once the maximum number has been assigned. If the shards have uneven transaction distributions, this may lead to an uneven number of transactions on the channels. To check, use the trepctl status -name tasks command and look for tasks belonging to the q-to-dbms stage.

shell> trepctl status -name tasks
Processing status command (tasks)...
...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000211:0000000020095076;0
appliedLastSeqno  : 78023
appliedLatency    : 0.248
applyTime         : 0.003
averageBlockSize  : 2.520
cancelled         : false
currentLastEventId: mysql-bin.000211:0000000020095076;0
currentLastFragno : 0
currentLastSeqno  : 78023
eventCount        : 5302
extractTime       : 274.907
filterTime        : 0.0
otherTime         : 0.0
stage             : q-to-dbms
state             : extract
taskId            : 0
...
Finished status command (tasks)...

If you see one or more channels that have a very high eventCount, consider either assigning shards explicitly to channels or redistributing the workload in your application to get better performance.

4.1.7. Controlling Assignment of Shards to Channels

Tungsten Replicator by default assigns channels using a round robin algorithm that assigns each new shard to the next available channel. The current shard assignments are tracked in table trep_shard_channel in the Tungsten catalog schema for the replication service.

For example, if you have 2 channels enabled and Tungsten processes three different shards, you might end up with a shard assignment like the following:

foo => channel 0
bar => channel 1
foobar => channel 0

This algorithm generally gives the best results for most installations and is crash-safe, since the contents of the trep_shard_channel table persist if either the DBMS or the replicator fails.
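You can inspect the current shard-to-channel assignments directly from the catalog schema; a minimal sketch, assuming the replication service is named alpha so that the catalog schema is tungsten_alpha:

shell> echo "SELECT * FROM tungsten_alpha.trep_shard_channel;" | tpm mysql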

It is possible to override the default assignment by updating the shard.list file found in the tungsten-replicator/conf directory. This file normally looks like the following:

# SHARD MAP FILE.
# This file contains shard handling rules used in the ShardListPartitioner
# class for parallel replication.  If unchanged shards will be hashed across
# available partitions.

# You can assign shards explicitly using a shard name match, where the form
# is <db>=<partition>.
#common1=0
#common2=0
#db1=1
#db2=2
#db3=3

# Default partition for shards that do not match explicit name.
# Permissible values are either a partition number or -1, in which
# case values are hashed across available partitions.  (-1 is the
# default.)
#(*)=-1

# Comma-separated list of shards that require critical section to run.
# A "critical section" means that these events are single-threaded to
# ensure that all dependencies are met.
#(critical)=common1,common2

# Method for channel hash assignments.  Allowed values are round-robin and
# string-hash.
(hash-method)=round-robin

You can update the shard.list file to make three types of custom overrides, as illustrated in the example following this list.

  1. Change the hashing method for channel assignments. Round-robin uses the trep_shard_channel table. The string-hash method just hashes the shard name.

  2. Assign shards to explicit channels. Add lines of the form shard=channel to the file as shown by the commented-out entries.

  3. Define critical shards. These are shards that must be processed in serial fashion. For example if you have a sharded application that has a single global shard with reference information, you can declare the global shard to be critical. This helps avoid applications seeing out of order information.
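For example, a customized shard.list combining all three override types might look like the following (the shard names db1, db2 and common1 are placeholders for your own schema names):

# Pin two shards to explicit channels.
db1=1
db2=2

# Serialize all transactions for the shared reference shard.
(critical)=common1

# Hash any remaining shards by name rather than using the round-robin table.
(hash-method)=string-hash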

Changes to shard.list must be made with care. The same cautions apply here as for changing the number of channels or the parallelization type. For subscription customers we strongly recommend conferring with Continuent Support before making changes.

4.1.8. Disk vs. Memory Parallel Queues

Channels receive transactions through a special type of queue, known as a parallel queue. Tungsten offers two implementations of parallel queues, which vary in their performance as well as the requirements they may place on hosts that operate parallel apply. You choose the type of queue to enable using the --svc-parallelization-type option.

Warning

Do not change the parallel queue type without setting the replicator offline cleanly. See the procedure later in this page for more information.
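As a minimal sketch (INI method; the channel count shown is illustrative only), the queue type and number of channels might be set in /etc/tungsten/tungsten.ini and then applied with tpm update:

[alpha]
...
svc-parallelization-type=disk
channels=5

As the warning above states, the replicator must be taken offline cleanly before the parallel queue type is changed.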

Disk Parallel Queue (disk option)

A disk parallel queue uses a set of independent threads to read from the Transaction History Log and feed short in-memory queues used by channels. Disk queues have the advantage that they minimize memory required by Java. They also allow channels to operate some distance apart, which improves throughput. For instance, one channel may apply a transaction that committed 2 minutes before the transaction another channel is applying. This separation keeps a single slow transaction from blocking all channels.

Disk queues minimize memory consumption of the Java VM but to function efficiently they do require pages from the Operating System page cache. This is because the channels each independently read from the Transaction History Log. As long as the channels are close together the storage pages tend to be present in the Operating System page cache for all threads but the first, resulting in very fast reads. If channels become widely separated, for example due to a high maxOfflineInterval value, or the host has insufficient free memory, disk queues may operate slowly or impact other processes that require memory.

Memory Parallel Queue (memory option)

A memory parallel queue uses a set of in-memory queues to hold transactions. One stage reads from the Transaction History Log and distributes transactions across the queues. The channels each read from one of the queues. In-memory queues have the advantage that they do not need extra threads to operate, hence reduce the amount of CPU processing required by the replicator.

When you use in-memory queues you must set the maxSize property on the queue to a relatively large value. This value sets the total number of transaction fragments that may be in the parallel queue at any given time. If the queue hits this value, it does not accept further transaction fragments until existing fragments are processed. For best performance it is often necessary to use a relatively large number, for example 10,000 or greater.

The following example shows how to set the maxSize property after installation. This value can be changed at any time and does not require the replicator to go offline cleanly:

The examples below show how to apply this change using either the Staging or the INI method.

Using the Staging method:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.maxSize=10000

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

Using the INI method, add the following to the /etc/tungsten/tungsten.ini configuration file:

[alpha]
...
property=replicator.store.parallel-queue.maxSize=10000

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

You may need to increase the Java VM heap size when you increase the parallel queue maximum size. Use the --java-mem-size option on the tpm command for this purpose or edit the Replicator wrapper.conf file directly.
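For example (a sketch; the heap size shown is illustrative only), both settings could be applied together from the staging directory:

shell> ./tools/tpm configure alpha \
    --property=replicator.store.parallel-queue.maxSize=10000 \
    --java-mem-size=2048
shell> ./tools/tpm update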

Warning

Memory queues are not recommended for production use at this time. Use disk queues.

4.2. Distributed Datasource Groups

Note

This feature was introduced in v7.1.0

4.2.1. Introduction to DDG

A Tungsten Distributed Datasource Group (DDG) is, at its core, a single Standalone cluster with an odd number of nodes, as usual.

In addition, every node in the cluster uses the same [serviceName], also as usual. The key differences here are that:

  • Each node in the cluster is assigned a Distributed Datasource Group ID (DDG-ID)

  • Nodes with the same DDG-ID will act as if they are part of a separate cluster, limiting failovers to nodes inside the group until there are no more failover candidates, at which time a node in a different group (virtual ID) will be selected as the new primary during a failover.

This means that you would assign nodes in the same region or datacenter the same DDG-ID.

There is still only a single write Primary amongst all the nodes in all the regions, just like Composite Active/Active (CAP).

Unlike CAP, if all nodes in the datacenter containing the Primary node were gone, a node in a different location would be promoted to Primary.

The networks between the datacenters or regions must be of low latency similar to LAN speed for this feature to work properly.

Also, the node in the same group with the most THL downloaded will be selected as the new Primary. If no node is available in the same group, the node with the most THL available is selected from a different group.

4.2.2. How DDG Works

To illustrate the new topology, imagine a 5-node standard cluster spanning 3 datacenters with 2 nodes in DC-A, 2 nodes in DC-B and 1 node in DC-C.

Nodes in DC-A have DDG-ID of 100, nodes in DC-B have DDG-ID of 200, and nodes in DC-C have DDG-ID of 300.

Below are the failure scenarios and resulting actions:

  • Primary fails

    • Failover to any healthy Replica in the same Region/Datacenter (virtual ID group)

  • Entire Region/Datacenter containing the Primary node fails

    • Failover to any healthy Replica in a different Region/Datacenter (virtual ID group)

  • Network partition between any two Regions/Datacenters

    • No action, quorum is maintained by the majority of Managers.

    • Application servers not in the Primary Datacenter will fail to connect

  • Network partition between all Regions/Datacenters

    • All nodes FAILSAFE/SHUNNED

  • Any two Regions/Datacenters offline

    • All nodes FAILSAFE/SHUNNED

Note

Manual intervention to recover the cluster will be required any time the cluster is placed into the FAILSAFE/SHUNNED state.

When configured as per the above example, the ls output from within cctrl will look like the following:

DATASOURCES:
+---------------------------------------------------------------------------------+
|db1-demo.continuent.com(master:ONLINE, progress=0, THL latency=0.495)            |
|STATUS [OK] [2023/06/23 05:46:52 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=100)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=master, state=ONLINE)                                          |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db2-demo.continuent.com(slave:ONLINE, progress=0, latency=0.978)                 |
|STATUS [OK] [2023/06/23 05:46:51 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=100)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db1-demo.continuent.com, state=ONLINE)           |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=4, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db3-demo.continuent.com(slave:ONLINE, progress=0, latency=0.705)                 |
|STATUS [OK] [2023/06/23 05:46:51 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=200)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db1-demo.continuent.com, state=ONLINE)           |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=4, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db4-demo.continuent.com(slave:ONLINE, progress=0, latency=2.145)                 |
|STATUS [OK] [2023/06/23 05:46:54 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=200)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db1-demo.continuent.com, state=ONLINE)           |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+

WITNESSES:
+---------------------------------------------------------------------------------+
|db5-demo.continuent.com(witness:ONLINE)                                          |
|DATASOURCE GROUP(id=300)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
+---------------------------------------------------------------------------------+

4.2.3. Configuring DDG

Configuration is straightforward: pick an integer ID for each set of nodes you wish to group together, usually based on location, such as region or datacenter.

Follow the steps for deploying and configuring a standard cluster detailed in Section 3.1, “Deploying Standalone HA Clusters”, with one simple addition to the configuration: add a new line to the [defaults] section of the /etc/tungsten/tungsten.ini file on every node, including Connector-only nodes, for example:

[defaults]
datasource-group-id=100

The new tpm configuration option datasource-group-id defines which Distributed Datasource Group the node belongs to. The entry must be placed in the [defaults] section of the configuration.

Omitting datasource-group-id from your configuration, or setting the value to 0, disables this feature. Any positive integer (>0) enables DDG.
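For example, using the 5-node scenario described in Section 4.2.2 (the host names below are assumptions matching the sample output above), the [defaults] section on each node would carry the group ID for its location:

# On db1 and db2 (DC-A):
[defaults]
datasource-group-id=100

# On db3 and db4 (DC-B):
[defaults]
datasource-group-id=200

# On db5 (DC-C, witness):
[defaults]
datasource-group-id=300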

4.3. Starting and Stopping Tungsten Cluster

To stop all of the services associated with a dataservice node, use the stopall script:

shell> stopall 
Stopping Tungsten Connector...
Stopped Tungsten Connector.
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.
Stopping Tungsten Manager Service...
Stopped Tungsten Manager Service.

To start all services, use the startall script:

shell> startall
Starting Tungsten Manager Service...

Starting Tungsten Replicator Service...

Starting Tungsten Connector...

4.3.1. Restarting the Replicator Service

Warning

Restarting a running replicator temporarily stops and restarts replication. Either set MAINTENANCE mode within cctrl (see Section 6.15, “Performing Database or OS Maintenance”) or shun the datasource before restarting the replicator (see Section 6.3.6.1, “Shunning a Datasource”).

To shut down a running Tungsten Replicator you must switch off the replicator:

shell> replicator stop
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.

To start the replicator service if it is not already running:

shell> replicator start
Starting Tungsten Replicator Service...

4.3.2. Restarting the Connector Service

Warning

Restarting the connector service will interrupt the communication of any running application or client connecting through the connector to MySQL.

To shut down a running Tungsten Connector you must switch off the connector:

shell> connector stop
Stopping Tungsten Connector Service...
Stopped Tungsten Connector Service.

To start the connector service if it is not already running:

shell> connector start
Starting Tungsten Connector Service...
Waiting for Tungsten Connector Service.....
running: PID:12338

If the cluster was configured with auto-enable=false then you will need to put each node online individually.
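For example (a sketch, assuming hosts db1, db2 and db3), each datasource can be brought online individually from within cctrl:

shell> cctrl
cctrl> datasource db1 online
cctrl> datasource db2 online
cctrl> datasource db3 online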

4.3.3. Restarting the Manager Service

The manager service is designed to monitor the status and operation of each of the datasources within the dataservice. In the event that the manager has become confused about the current configuration, for example due to a network or node failure, the managers can be restarted. This forces the managers to update their current status and topology information.

Before restarting managers, the dataservice should be placed in maintenance policy mode. In maintenance mode, the connectors will continue to service requests and the manager restart will not be treated as a failure.

To restart the managers across an entire dataservice, each manager must be restarted individually. The dataservice must be placed in maintenance policy mode first, then:

  1. To set the maintenance policy mode:

    [LOGICAL:EXPERT] /dsone > set policy maintenance
  2. On each datasource in the dataservice:

    1. Stop the service:

      shell> manager stop
    2. Then start the manager service:

      shell> manager start
  3. Once all the managers have been restarted, set the policy mode back to automatic:

    [LOGICAL:EXPERT] /alpha > set policy automatic
    policy mode is now AUTOMATIC

4.3.4. Restarting the Multi-Site/Active-Active Replicator Service

Warning

Restarting a running replicator temporarily stops and restarts replication. When using Multi-Site/Active-Active, restarting the additional replicator will stop replication between sites.

These instructions assume you have installed the additional replicator with the --executable-prefix=mm option. If not, you should go to /opt/replicator/tungsten/tungsten-replicator/bin and run the replicator command directly.
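For example, without the prefix the equivalent stop and start operations would be run from that directory:

shell> cd /opt/replicator/tungsten/tungsten-replicator/bin
shell> ./replicator stop
shell> ./replicator start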

To shut down a running Tungsten Replicator you must switch off the replicator:

shell> mm_replicator stop
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.

To start the replicator service if it is not already running:

shell> mm_replicator start
Starting Tungsten Replicator Service...

4.4. Configuring Startup on Boot

By default, Tungsten Cluster does not start automatically on boot. To enable Tungsten Cluster to start at boot time, use the deployall script provided in the installation directory to create the necessary boot scripts:

shell> sudo /opt/continuent/tungsten/cluster-home/bin/deployall
 Adding system startup for /etc/init.d/tmanager ...
   /etc/rc0.d/K80tmanager -> ../init.d/tmanager
   /etc/rc1.d/K80tmanager -> ../init.d/tmanager
   /etc/rc6.d/K80tmanager -> ../init.d/tmanager
   /etc/rc2.d/S80tmanager -> ../init.d/tmanager
   /etc/rc3.d/S80tmanager -> ../init.d/tmanager
   /etc/rc4.d/S80tmanager -> ../init.d/tmanager
   /etc/rc5.d/S80tmanager -> ../init.d/tmanager
 Adding system startup for /etc/init.d/treplicator ...
   /etc/rc0.d/K81treplicator -> ../init.d/treplicator
   /etc/rc1.d/K81treplicator -> ../init.d/treplicator
   /etc/rc6.d/K81treplicator -> ../init.d/treplicator
   /etc/rc2.d/S81treplicator -> ../init.d/treplicator
   /etc/rc3.d/S81treplicator -> ../init.d/treplicator
   /etc/rc4.d/S81treplicator -> ../init.d/treplicator
   /etc/rc5.d/S81treplicator -> ../init.d/treplicator
 Adding system startup for /etc/init.d/tconnector ...
   /etc/rc0.d/K82tconnector -> ../init.d/tconnector
   /etc/rc1.d/K82tconnector -> ../init.d/tconnector
   /etc/rc6.d/K82tconnector -> ../init.d/tconnector
   /etc/rc2.d/S82tconnector -> ../init.d/tconnector
   /etc/rc3.d/S82tconnector -> ../init.d/tconnector
   /etc/rc4.d/S82tconnector -> ../init.d/tconnector
   /etc/rc5.d/S82tconnector -> ../init.d/tconnector

To disable automatic startup at boot time, use the undeployall command:

shell> sudo /opt/continuent/tungsten/cluster-home/bin/undeployall

4.4.1. Configuring Multi-Site/Active-Active Replicator Startup on Boot

Because there is an additional Tungsten Replicator running, each replicator must be individually configured to start up on boot:

  • For the Tungsten Cluster service, use Section 4.4, “Configuring Startup on Boot”.

  • For the Tungsten Replicator service, a custom startup script must be created, otherwise the replicator will be unable to start as it has been configured in a different directory.

    1. Create a link from the Tungsten Replicator service startup script in the operating system startup directory (/etc/init.d):

      shell> sudo ln -s /opt/replicator/tungsten/tungsten-replicator/bin/replicator /etc/init.d/mmreplicator
    2. Stop the Tungsten Replicator process. Failure to do this will cause issues because the service will no longer recognize the existing PID file and report it is not running.

      shell> /etc/init.d/mmreplicator stop
    3. Modify the APP_NAME variable within the startup script (/etc/init.d/mmreplicator) to mmreplicator:

      APP_NAME="mmreplicator"
    4. Start the Tungsten Replicator process.

      shell> /etc/init.d/mmreplicator start
    5. Update the operating system startup configuration to use the updated script.

      On Debian/Ubuntu:

      shell> sudo update-rc.d mmreplicator defaults

      On RedHat/CentOS:

      shell> sudo chkconfig --add mmreplicator

4.5. Upgrading Tungsten Cluster

4.5.1. Upgrading using the Staging Method (with ssh Access)
4.5.2. Upgrading when using INI-based configuration, or without ssh Access
4.5.2.1. Upgrading
4.5.2.2. Upgrading a Single Host using tpm
4.5.3. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Passive (CAP)
4.5.3.1. Conversion Prerequisites
4.5.3.2. Step 1: Backups
4.5.3.3. Step 2: Redirect Client Connections
4.5.3.4. Step 3: Enter Maintenance Mode
4.5.3.5. Step 4: Stop the Cross-site Replicators
4.5.3.6. Step 5: Export the tracking schema databases
4.5.3.7. Step 6: Uninstall the Cross-site Replicators
4.5.3.8. Step 7: Create Composite Tracking Schema
4.5.3.9. Step 8: Reload the tracking schema for Passive clusters
4.5.3.10. Step 9: Stop local cluster Replicators
4.5.3.11. Step 10: Remove THL
4.5.3.12. Step 11: Export the tracking schema database on Active cluster
4.5.3.13. Step 12: Reload the tracking schema for Active cluster
4.5.3.14. Step 13: Update Configuration
4.5.3.15. Step 14: Install the Software on Active Cluster
4.5.3.16. Step 15: Start Local Replicators on Active cluster
4.5.3.17. Step 16: Install the Software on remaining Clusters
4.5.3.18. Step 17: Start Local Replicators on remaining clusters
4.5.3.19. Step 18: Convert Datasource roles for Passive clusters
4.5.3.20. Step 19: Upgrade the Software on Connectors
4.5.4. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Active (CAA)
4.5.4.1. Supported Upgrade Paths
4.5.4.2. Upgrade Prerequisites
4.5.4.3. Step 1: Backups
4.5.4.4. Step 2: Stop the Cross-site Replicators
4.5.4.5. Step 3: Export the tungsten_* Databases
4.5.4.6. Step 4: Uninstall the Cross-site Replicators
4.5.4.7. Step 5: Reload the tracking schema
4.5.4.8. Step 6: Update Configuration
4.5.4.9. Step 7: Enter Maintenance Mode
4.5.4.10. Step 8: Stop Managers
4.5.4.11. Step 9: Install/Update the Software
4.5.4.12. Step 10: Start Managers
4.5.4.13. Step 11: Return to Automatic Mode
4.5.4.14. Step 12: Validate
4.5.5. Installing an Upgraded JAR Patch
4.5.6. Installing Patches
4.5.7. Upgrading to v7.0.0+
4.5.7.1. Background
4.5.7.2. Upgrade Decisions
4.5.7.3. Setup internal encryption and authentication
4.5.7.4. Enable Tungsten to Database Encryption
4.5.7.5. Enable Connector to Database Encryption
4.5.7.6. Enable MySQL SSL
4.5.7.7. Steps to upgrade using tpm
4.5.7.8. Optional Post-Upgrade steps to configure API

To upgrade an existing installation of Tungsten Cluster, the new distribution must be downloaded and unpacked, and the included tpm command used to update the installation. The upgrade process implies a small period of downtime for the cluster as the updated versions of the tools are restarted. However, the process should not present as an outage to your applications, provided the steps for upgrading the connectors are followed carefully. Any downtime is deliberately kept to a minimum, and the cluster should be in the same operational state once the upgrade has finished as it was when the upgrade was started.

Warning

During the update process, the cluster will be in MAINTENANCE mode. This is intentional, to prevent unwanted failovers during the process. However, it is important to understand that should the Primary fail for genuine reasons NOT associated with the upgrade, failover will also not happen at that time.

It is important to ensure clusters are returned to the AUTOMATIC state as soon as all Maintenance operations are complete and the cluster is stable.

Note

Rolling upgrades of the Tungsten software are NOT advised, in order to avoid miscommunication between components running older and newer versions of the software that may prevent switches or failovers from occurring; it is therefore recommended to upgrade all nodes in place. The upgrade process places the cluster into MAINTENANCE mode, which in itself avoids outages whilst components are restarted, and allows for a successful upgrade.

Warning

From version 7.1.0 onwards, the JGroups libraries were upgraded. This means that when upgrading to any release from 7.1.0 onwards FROM any release OLDER than 7.1.0, all nodes must be upgraded before full cluster communication will be restored. For that reason, upgrades to 7.1.0 or later from an OLDER release MUST be performed together, ensuring the cluster is only running with a mix of manager versions for as little time as possible. When upgrading nodes, do NOT SHUN the node, otherwise you will not be able to recover the node into the cluster until all nodes are upgraded, which could result in an outage to your applications. Additionally, do NOT perform a switch until all nodes are upgraded. This means you should upgrade the master node in situ. Providing the cluster is in MAINTENANCE, this will not cause an outage and the cluster can still be upgraded with no visible outage to your applications.

4.5.1. Upgrading using the Staging Method (with ssh Access)

Warning

Before performing an upgrade, please ensure that you have checked Appendix B, Prerequisites, as software and system requirements may have changed between versions and releases.

To perform an upgrade of an entire cluster from a staging directory installation, where you have ssh access to the other hosts in the cluster:

  1. On your staging server, download the release package.

  2. Unpack the release package:

    shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
  3. Change to the extracted directory:

    shell> cd tungsten-clustering-7.1.4-10
  4. The next step depends on your existing deployment:

    • If you are upgrading a Multi-Site/Active-Active deployment:

      If you installed the original service by making use of the $CONTINUENT_PROFILES and $REPLICATOR_PROFILES environment variables, no further action needs to be taken to update the configuration information. Confirm that these variables are set before performing the validation and update.

      If you did not use these environment variables when deploying the solution, you must load the existing configuration from the current hosts in the cluster before continuing by using tpm fetch:

      shell> ./tools/tpm fetch --hosts=east1,east2,east3,west1,west2,west3 \
          --user=tungsten --directory=/opt/continuent

      Important

      You must specify ALL the hosts within both clusters within the current deployment when fetching the configuration; use of the autodetect keyword will not collect the correct information.

    • If you are upgrading any other deployment:

      If you are using the $CONTINUENT_PROFILES variable to specify a location for your configuration, make sure that the variable has been set correctly.

      If you are not using $CONTINUENT_PROFILES, a copy of the existing configuration must be fetched from the installed Tungsten Cluster installation:

      shell> ./tools/tpm fetch --hosts=host1,host2,host3,autodetect \
          --user=tungsten --directory=/opt/continuent

      Important

      You must use the version of tpm from within the staging directory (./tools/tpm) of the new release, not the tpm installed with the current release.

      The current configuration information will be retrieved to be used for the upgrade:

      shell> ./tools/tpm fetch --hosts=host1,host2,host3 --user=tungsten --directory=/opt/continuent
      .......
      NOTE  >> Configuration loaded from host1,host2,host3
  5. Check that the update configuration matches what you expect by using tpm reverse:

    shell> ./tools/tpm reverse
    # Options for the dsone data service
    tools/tpm configure dsone \
    --application-password=password \
    --application-port=3306 \
    --application-user=app_user \
    --connectors=host1,host2,host3 \
    --datasource-log-directory=/var/log/mysql \
    --install-directory=/opt/continuent \
    --master=host1 \
    --members=host1,host2,host3 \
    '--profile-script=~/.bashrc' \
    --replication-password=password \
    --replication-port=13306 \
    --replication-user=tungsten \
    --start-and-report=true \
    --user=tungsten \
    --witnesses=192.168.0.1
  6. Run the upgrade process:

    shell> ./tools/tpm update

    Note

    During the update process, tpm may report errors or warnings that were not previously reported as problems. This is due to new features or functionality in different MySQL releases and Tungsten Cluster updates. These issues should be addressed and the tpm update command re-executed.

    The following additional options are available when updating:

    • --no-connectors (optional)

      By default, an update process will restart all services, including the connector. Adding this option prevents the connectors from being restarted. If this option is used, the connectors must be manually updated to the new version during a quieter period. This can be achieved by running on each host the command:

      shell> tpm promote-connector

      This will result in a short period of downtime (a couple of seconds) on the host concerned only, while the other connectors in your configuration keep running. During the upgrade, the Connector is restarted using the updated software and/or configuration.

    A successful update will report the cluster status as determined from each host in the cluster:

    ...........................................................................................................
    Getting cluster status on host1
    Tungsten Clustering (for MySQL) 7.1.4 build 10
    connect to 'dsone@host1'
    dsone: session established
    [LOGICAL] /dsone > ls
    
    COORDINATOR[host3:AUTOMATIC:ONLINE]
    
    ROUTERS:
    +----------------------------------------------------------------------------+
    |connector@host1[31613](ONLINE, created=0, active=0)                      |
    |connector@host2[27649](ONLINE, created=0, active=0)                      |
    |connector@host3[21475](ONLINE, created=0, active=0)                      |
    +----------------------------------------------------------------------------+
    
    ...
    
    #####################################################################
    # Next Steps
    #####################################################################
    We have added Tungsten environment variables to ~/.bashrc.
    Run `source ~/.bashrc` to rebuild your environment.
    
    Once your services start successfully you may begin to use the cluster.
    To look at services and perform administration, run the following command
    from any database server.
    
      $CONTINUENT_ROOT/tungsten/tungsten-manager/bin/cctrl
    
    Configuration is now complete.  For further information, please consult
    Tungsten documentation, which is available at docs.continuent.com.
    
    NOTE  >> Command successfully completed

The update process should now be complete. The current version can be confirmed by starting cctrl.

4.5.2. Upgrading when using INI-based configuration, or without ssh Access

To perform an upgrade of an individual node, tpm can be used on the individual host. The same method can be used to upgrade an entire cluster without requiring tpm to have ssh access to the other hosts in the dataservice.

Warning

Before performing an upgrade, please ensure that you have checked the Appendix B, Prerequisites, as software and system requirements may have changed between versions and releases.

Important

Application traffic to the nodes will be disconnected when the connector restarts. Use the --no-connectors tpm option when you upgrade to prevent the connectors from restarting until later when you want them to.

4.5.2.1. Upgrading

To upgrade:

  1. Place the cluster into maintenance mode

  2. Upgrade the Replicas in the dataservice. Be sure to shun and welcome each Replica (see the sketch following this list).

  3. Upgrade the Primary node

    Important

    Replication traffic to the Replicas will be delayed while the replicator restarts. The delays will increase if there are a large number of stored events in the THL. Old THL may be removed to decrease the delay. Do NOT delete THL that has not been received on all Replica nodes or events will be lost.

  4. Upgrade the connectors in the dataservice one-by-one

    Important

    Application traffic to the nodes will be disconnected when the connector restarts.

  5. Place the cluster into automatic mode
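As an illustrative sketch only (host db2 and the staging path are assumptions; see Section 6.15.3, “Performing Maintenance on an Entire Dataservice” for the full procedure), upgrading one Replica during steps 2 and 3 might look like:

Place the cluster into maintenance mode (step 1)
cctrl> set policy maintenance

Shun the Replica being upgraded
cctrl> datasource db2 shun

On db2, run the update from the extracted staging directory
shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
shell> ./tools/tpm update --replace-release

Welcome the upgraded Replica back into the cluster
cctrl> datasource db2 welcome

Once every node and connector has been upgraded (step 5)
cctrl> set policy automatic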

4.5.2.2. Upgrading a Single Host using tpm

Note

For more information on performing maintenance across a cluster, see Section 6.15.3, “Performing Maintenance on an Entire Dataservice”.

To upgrade a single host using the tpm command:

  1. Download the release package.

  2. Unpack the release package:

    shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
  3. Change to the extracted directory:

    shell> cd tungsten-clustering-7.1.4-10
  4. Execute tpm update, specifying the installation directory. This will update only this host:

    shell> ./tools/tpm update --replace-release

To update all of the nodes within a cluster, the steps above will need to be performed individually on each host.

4.5.3. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Passive (CAP)

These steps are designed to guide you in the safe conversion of an existing Multi-Site/Active-Active (MSAA) topology to a Composite Active/Passive (CAP) topology, based on an ini installation.

For details of the difference between these two topologies, please review the following pages:

Warning

It is very important to follow all the below steps and ensure full backups are taken when instructed. These steps can be destructive and without proper care and attention, data loss, data corruption or a split-brain scenario can happen.

Warning

Parallel apply MUST be disabled before starting your upgrade. You may re-enable it once the upgrade has been fully completed. See Section 4.1.5.3, “How to Disable Parallel Replication Safely” and Section 4.1.2, “Enabling Parallel Apply During Install” for more information.

Note

The examples in this section are based on three clusters named 'nyc', 'london' and 'tokyo'

Each cluster has two dedicated connectors on separate hosts.

The converted cluster will consist of a Composite Service named 'global' and the 'nyc' cluster will be the Active cluster, with 'london' and 'tokyo' as Passive clusters.

If you do not have exactly three clusters, please adjust this procedure to match your environment.

Examples of before and after tungsten.ini files can be downloaded here:

If you are currently installed using a staging-based installation, you must convert to an INI-based installation for this process to be completed with minimal risk and minimal interruption. For notes on how to perform the staging to INI file conversion using the translatetoini.pl script, please visit Section 10.4.6, “Using the translatetoini.pl Script”.

4.5.3.1. Conversion Prerequisites

Warning

Parallel apply MUST be disabled before starting your upgrade. You may re-enable it once the upgrade has been fully completed. See Section 4.1.5.3, “How to Disable Parallel Replication Safely” and Section 4.1.2, “Enabling Parallel Apply During Install” for more information.

  • Obtain the latest Tungsten Cluster software build and place it within /opt/continuent/software

    If you are not upgrading, just converting, then this step is not required since you will already have the extracted software bundle available.

  • Extract the package

  • The examples below refer to the tungsten_prep_upgrade script, this can be located in the extracted software package within the tools directory.
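For example, assuming the package was placed in /opt/continuent/software as described above, it can be extracted and the script located with:

shell> cd /opt/continuent/software
shell> tar zxf tungsten-clustering-7.1.4-10.tar.gz
shell> ls tungsten-clustering-7.1.4-10/tools/tungsten_prep_upgrade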

4.5.3.2. Step 1: Backups

Take a full and complete backup of one node - this can be a Replica - preferably using one of the following methods:

  • Percona xtrabackup whilst database is open

  • Manual backup of all datafiles after stopping the database instance

4.5.3.3. Step 2: Redirect Client Connections

A big difference between Multi-Site/Active-Active (MSAA) and Composite Active/Passive (CAP) is that with MSAA, clients can write into all clusters. With CAP, clients only write into a single cluster.

To be able to complete this conversion process with minimal interruption and risk, it is essential that clients are redirected and only able to write into a single cluster. This cluster will become the ACTIVE cluster after the conversion. For the purpose of this procedure, we will use the 'nyc' cluster for this role.

After redirecting your client applications to connect through the connectors associated with the 'nyc' cluster, stop the connectors associated with the remaining clusters as an extra safeguard against writes happening.

On every connector node associated with london and tokyo:

shell> connector stop

4.5.3.4. Step 3: Enter Maintenance Mode

Enable Maintenance mode on all clusters using the cctrl command:

shell> cctrl
cctrl> set policy maintenance

4.5.3.5. Step 4: Stop the Cross-site Replicators

Important

Typically the cross-site replicators will be installed within /opt/replicator; if you have installed them in a different location, you will need to pass this to the script in the examples using the --path option.

  1. The following commands tell the replicators to go offline at a specific point, in this case when they receive an explicit heartbeat. This is to ensure that all the replicators stop at the same sequence number and binary log position. The replicators will NOT be offline until the explicit heartbeat has been issued a bit later in this step.

    • On every nyc node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service london --offline
      shell> ./tungsten_prep_upgrade --service tokyo --offline
    • On every london node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service nyc --offline
      shell> ./tungsten_prep_upgrade --service tokyo --offline
    • On every tokyo node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service london --offline
      shell> ./tungsten_prep_upgrade --service nyc --offline
  2. Next, on the Primary hosts within each cluster, issue the heartbeat by executing the following using the cluster-specific trepctl, typically in /opt/continuent:

    shell> trepctl heartbeat -name offline_for_upg

    Ensure that every cross-site replicator on every node is now in the OFFLINE:NORMAL state:

    shell> mmtrepctl status
    ~or~
    shell> mmtrepctl --service {servicename} status
  3. Capture the position of the cross-site replicators on all nodes in all clusters.

    The service name provided should be the name of the remote service(s) for this cluster, so for example in the london cluster you get the positions for nyc and tokyo, and in nyc you get the position for london and tokyo, etc.

    • On every london node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service nyc --get
      (NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service tokyo --get
      (NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
    • On every nyc node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service london --get
      (NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service tokyo --get
      (NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
    • On every tokyo node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service london --get
      (NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service nyc --get
      (NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
  4. Finally, to complete this step, stop the cross-site replicators on all nodes:

    shell> ./tungsten_prep_upgrade --stop 

4.5.3.6. Step 5: Export the tracking schema databases

On every node in each intended Passive cluster (london and tokyo), export the tracking schema associated with the intended Active cluster (nyc).

Note the generated dump file is called tungsten_global.dmp. global refers to the name of the intended Composite Cluster service; if you choose a different service name, change this accordingly.

  • On every london node:

    shell> mysqldump --opt --single-transaction tungsten_nyc > ~/tungsten_global.dmp
  • On every tokyo node:

    shell> mysqldump --opt --single-transaction tungsten_nyc > ~/tungsten_global.dmp

4.5.3.7. Step 6: Uninstall the Cross-site Replicators

To uninstall the cross-site replicators, execute the following on every node:

shell> cd {replicator software path}
shell> tools/tpm uninstall --i-am-sure

4.5.3.8. Step 7: Create Composite Tracking Schema

In this step, we pre-create the database for the composite service tracking schema. We are using global as the service name in this example; if you choose a different Composite service name, adjust this accordingly.

On every node in all clusters:

shell> mysql -e 'set session sql_log_bin=0; create database tungsten_global'

4.5.3.9. Step 8: Reload the tracking schema for Passive clusters

This step reloads the tracking schema associated with the intended Active cluster (nyc) into the tracking schema we created in the previous step. This should ONLY be carried out within the intended Passive clusters at this stage.

We DO NOT want the reloading of this schema to appear in the binary logs on the Primary, therefore the reload needs to be performed on each node individually:

  • On every london node:

    shell> mysql -e 'set session sql_log_bin=0; use tungsten_global; source ~/tungsten_global.dmp;'
  • On every tokyo node:

    shell> mysql -e 'set session sql_log_bin=0; use tungsten_global; source ~/tungsten_global.dmp;'

4.5.3.10. Step 9: Stop local cluster Replicators

On every node in every cluster:

shell> replicator stop

Warning

The effect of this step will now mean that only the Primary node in the Active cluster will be up to date with ongoing data changes. You must ensure that your applications handle this accordingly until the replicators are restarted at Step 15.

4.5.3.11. Step 10: Remove THL

Warning

This step, if not followed correctly, could be destructive to the entire conversion. It is CRITICAL that this step is NOT performed on the intended Active cluster (nyc)

By default, THL files will be located within /opt/continuent/thl; if you have configured this in a different location, you will need to adjust the path below accordingly.

  • On every london node:

    shell> cd /opt/continuent/thl
    shell> rm */thl*
  • On every tokyo node:

    shell> cd /opt/continuent/thl
    shell> rm */thl*

4.5.3.12. Step 11: Export the tracking schema database on Active cluster

On every node within the intended Active cluster (nyc), export the tracking schema associated with the local service.

Note the generated dump file is called tungsten_global.dmp. global refers to the name of the intended Composite Cluster service; if you choose a different service name, change this accordingly.

  • On every nyc node:

    shell> mysqldump --opt --single-transaction tungsten_nyc > ~/tungsten_global.dmp

4.5.3.13. Step 12: Reload the tracking schema for Active cluster

This step reloads the tracking schema associated with the intended Active cluster (nyc) into the tracking schema we created in the earlier step.

We DO NOT want the reloading of this schema to appear in the binary logs on the Primary, therefore the reload needs to be performed on each node individually:

  • On every nyc node:

    shell> mysql -e 'set session sql_log_bin=0; use tungsten_global; source ~/tungsten_global.dmp;'

4.5.3.14. Step 13: Update Configuration

Update /etc/tungsten/tungsten.ini to a valid Composite Active/Passive config. An example of a valid config is as follows; a sample can also be downloaded from Section 4.5.3.1, “Conversion Prerequisites” above:

Important

Within a Composite Active/Passive topology, the ini file must be identical on EVERY node, including Connector Nodes

[defaults]
user=tungsten
home-directory=/opt/continuent
application-user=app_user
application-password=secret
application-port=3306
profile-script=~/.bash_profile
replication-user=tungsten
replication-password=secret
mysql-allow-intensive-checks=true
skip-validation-check=THLSchemaChangeCheck

[nyc]
topology=clustered
master=db1
slaves=db2,db3
connectors=nyc-conn1,nyc-conn2

[london]
topology=clustered
master=db4
slaves=db5,db6
connectors=ldn-conn1,ldn-conn2
relay-source=nyc

[tokyo]
topology=clustered
master=db7
slaves=db8,db9
connectors=tky-conn1,tky-conn2
relay-source=nyc

[global]
composite-datasources=nyc,london,tokyo

4.5.3.15. Step 14: Install the Software on Active Cluster

Validate and install the new release on all nodes in the Active (nyc) cluster only:

shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
shell> tools/tpm validate-update

If validation shows no errors, run the install:

shell> tools/tpm update --replace-release

4.5.3.16. Step 15: Start Local Replicators on Active cluster

After the installation is complete on all nodes in the Active cluster, restart the replicator services:

shell> replicator start

After restarting, check the status of the replicators using trepctl and confirm that all replicators are ONLINE:

shell> trepctl status

4.5.3.17. Step 16: Install the Software on remaining Clusters

Validate and install the new release on all nodes in the remaining Passive clusters (london and tokyo):

Important

The update should be performed on the Primary nodes within each cluster first. Validation will report an error that the roles conflict (Primary vs Relay); this is expected, and to override this warning the -f option should be used on the Primary nodes only.

shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
shell> tools/tpm validate-update

If validation shows no errors, run the install:

On Primary Nodes:
shell> tools/tpm update --replace-release -f

On Replica Nodes:
shell> tools/tpm update --replace-release

4.5.3.18. Step 17: Start Local Replicators on remaining clusters

After the installation is complete on all nodes in the remaining (Passive) clusters, restart the replicator services:

shell> replicator start

After restarting, check the status of the replicators using trepctl and confirm that all replicators are ONLINE:

shell> trepctl status

4.5.3.19. Step 18: Convert Datasource roles for Passive clusters

Following the upgrades, there are a number of "clean-up" steps that we need to perform within cctrl to ensure the datasource roles have been converted from the previous "master" roles to "relay" roles.

The following steps can be performed in a single cctrl session initiated from any node within any cluster

shell> cctrl

Connect to Active cluster
cctrl> use nyc

Check Status and verify all nodes online
cctrl> ls

Connect to COMPOSITE service
cctrl> use global

Place Active service online
cctrl> datasource nyc online

Connect to london Passive service
cctrl> use london

Convert old Primary to relay
cctrl> set force true
cctrl> datasource oldPrimaryhost offline
cctrl> datasource oldPrimaryhost relay

Repeat on tokyo Passive service
cctrl> use tokyo
cctrl> set force true
cctrl> datasource oldPrimaryhost offline
cctrl> datasource oldPrimaryhost relay

Connect to COMPOSITE service
cctrl> use global

Place Passive services online
cctrl> datasource london online
cctrl> datasource tokyo online

Place all clusters into AUTOMATIC
cctrl> set policy automatic

4.5.3.20. Step 19: Upgrade the Software on Connectors

Validate and install the new release on all connector nodes:

shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
shell> tools/tpm validate-update

If validation shows no errors, run the install:

shell> tools/tpm update --replace-release

After upgrading previously stopped connectors, you will need to restart the process:

shell> connector restart

Warning

Upgrading a running connector will initiate a restart of the connector services; this will result in any active connections being terminated. Therefore, care should be taken with this process, and client redirection should be handled accordingly prior to any connector upgrade or restart.

4.5.4. Upgrade/Convert: From Multi-Site/Active-Active (MSAA) to Composite Active/Active (CAA)

These steps are specifically for the safe and successful upgrade (or conversion) of an existing Multi-Site/Active-Active (MSAA) topology, to a Composite Active/Active (CAA) topology.

Warning

It is very important to follow all the below steps and ensure full backups are taken when instructed. These steps can be destructive and without proper care and attention, data loss, data corruption or a split-brain scenario can happen.

Warning

Parallel apply MUST be disabled before starting your upgrade/conversion. You may re-enable it once the process has been fully completed. See Section 4.1.5.3, “How to Disable Parallel Replication Safely” and Section 4.1.2, “Enabling Parallel Apply During Install” for more information.

Note

The examples in this section are based on three clusters named 'nyc', 'london' and 'tokyo'

If you do not have exactly three clusters, please adjust this procedure to match your environment.

Click here for a video of the upgrade procedure, showing the full process from start to finish...

4.5.4.1. Supported Upgrade Paths

If you are currently installed using a staging-based installation, you must convert to an INI-based installation, since INI-based installation is the only option supported for Composite Active/Active deployments. For notes on how to perform the staging to INI file conversion using the translatetoini.pl script, please visit Section 10.4.6, “Using the translatetoini.pl Script”.

Click here for a video of the INI conversion procedure, showing the full process from start to finish...

Path                             Supported
ini, in place                    Yes
ini, with Primary switch         No
Staging                          No
Staging, with --no-connectors    No

4.5.4.2. Upgrade Prerequisites

Warning

Parallel apply MUST be disabled before starting your upgrade. You may re-enable it once the upgrade has been fully completed. See Section 4.1.5.3, “How to Disable Parallel Replication Safely” and Section 4.1.2, “Enabling Parallel Apply During Install” for more information.

  • Obtain the latest v6 (or greater) Tungsten Cluster software build and place it within /opt/continuent/software

    If you are not upgrading, just converting, then this step is not required since you will already have the extracted software bundle available. However you must be running v6 or greater of Tungsten Cluster to deploy a CAA topology.

  • Extract the package

  • The examples below refer to the tungsten_prep_upgrade script, this can be located in the extracted software package within the tools directory.

4.5.4.3. Step 1: Backups

Take a full and complete backup of one node - this can be a Replica - preferably using one of the following methods:

  • Percona xtrabackup whilst database is open

  • Manual backup of all datafiles after stopping the database instance

4.5.4.4. Step 2: Stop the Cross-site Replicators

Important

Typically the cross-site replicators will be installed within /opt/replicator; if you have installed them in a different location, you will need to pass this to the script in the examples using the --path option.

  1. The following commands tell the replicators to go offline at a specific point, in this case when they receive an explicit heartbeat. This is to ensure that all the replicators stop at the same sequence number and binary log position. The replicators will NOT be offline until the explicit heartbeat has been issued a bit later in this step.

    • On every nyc node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service london --offline
      shell> ./tungsten_prep_upgrade --service tokyo --offline
    • On every london node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service nyc --offline
      shell> ./tungsten_prep_upgrade --service tokyo --offline
    • On every tokyo node:

      shell> ./tungsten_prep_upgrade -o 
      ~or~
      shell> ./tungsten_prep_upgrade --service london --offline
      shell> ./tungsten_prep_upgrade --service nyc --offline
  2. Next, on the Primary hosts within each cluster, issue the heartbeat by executing the following using the cluster-specific trepctl, typically in /opt/continuent:

    shell> trepctl heartbeat -name offline_for_upg

    Ensure that every cross-site replicator on every node is now in the OFFLINE:NORMAL state:

    shell> mmtrepctl status
    ~or~
    shell> mmtrepctl --service {servicename} status
  3. Capture the position of the cross-site replicators on all nodes in all clusters.

    The service name provided should be the name of the remote service(s) for this cluster, so for example in the london cluster you get the positions for nyc and tokyo, and in nyc you get the position for london and tokyo, etc.

    • On every london node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service nyc --get
      (NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service tokyo --get
      (NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
    • On every nyc node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service london --get
      (NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service tokyo --get
      (NOTE: saves to ~/position-tokyo-YYYYMMDDHHMMSS.txt)
    • On every tokyo node:

      shell> ./tungsten_prep_upgrade -g
      ~or~
      shell> ./tungsten_prep_upgrade --service london --get
      (NOTE: saves to ~/position-london-YYYYMMDDHHMMSS.txt)
      shell> ./tungsten_prep_upgrade --service nyc --get
      (NOTE: saves to ~/position-nyc-YYYYMMDDHHMMSS.txt)
  4. Finally, to complete this step, stop the replicators on all nodes:

    shell> ./tungsten_prep_upgrade --stop 

4.5.4.5. Step 3: Export the tungsten_* Databases

On every node in each cluster, export the tracking schema for the cross-site replicator

Similar to step 2 above, where you captured the cross-site position, the same applies here: in london you export/backup nyc and tokyo, in nyc you export/backup london and tokyo, and finally in tokyo you export/backup nyc and london.

  • On every london node:

    shell> ./tungsten_prep_upgrade -d --alldb 
    ~or~
    shell> ./tungsten_prep_upgrade --service nyc --dump
    shell> ./tungsten_prep_upgrade --service tokyo --dump
  • On every nyc node:

    shell> ./tungsten_prep_upgrade -d --alldb 
    ~or~
    shell> ./tungsten_prep_upgrade --service london --dump
    shell> ./tungsten_prep_upgrade --service tokyo --dump
  • On every tokyo node:

    shell> ./tungsten_prep_upgrade -d --alldb 
    ~or~
    shell> ./tungsten_prep_upgrade --service london --dump
    shell> ./tungsten_prep_upgrade --service nyc --dump

4.5.4.6. Step 4: Uninstall the Cross-site Replicators

To uninstall the cross-site replicators, execute the following on every node:

shell> cd {replicator software path}
shell> tools/tpm uninstall --i-am-sure

4.5.4.7. Step 5: Reload the tracking schema

We DO NOT want the reloading of this schema to appear in the binary logs on the Primary, therefore the reload needs to be performed on each node individually:

  • On every london node:

    shell> ./tungsten_prep_upgrade -s nyc -u tungsten -w secret -r
    shell> ./tungsten_prep_upgrade -s tokyo -u tungsten -w secret -r
    ~or~
    shell> ./tungsten_prep_upgrade --service nyc --user tungsten --password secret --restore
    shell> ./tungsten_prep_upgrade --service tokyo --user tungsten --password secret --restore
  • On every tokyo node:

    shell> ./tungsten_prep_upgrade -s london -u tungsten -w secret -r
    shell> ./tungsten_prep_upgrade -s nyc -u tungsten -w secret -r
    ~or~
    shell> ./tungsten_prep_upgrade --service london --user tungsten --password secret --restore
    shell> ./tungsten_prep_upgrade --service nyc --user tungsten --password secret --restore
  • On every nyc node:

    shell> ./tungsten_prep_upgrade -s london -u tungsten -w secret -r
    shell> ./tungsten_prep_upgrade -s tokyo -u tungsten -w secret -r
    ~or~
    shell> ./tungsten_prep_upgrade --service london --user tungsten --password secret --restore
    shell> ./tungsten_prep_upgrade --service tokyo --user tungsten --password secret --restore

4.5.4.8. Step 6: Update Configuration

Update /etc/tungsten/tungsten.ini to a valid v6 CAA configuration. An example of a valid configuration is as follows:

[defaults]
user=tungsten
home-directory=/opt/continuent
application-user=app_user
application-password=secret
application-port=3306
profile-script=~/.bash_profile
replication-user=tungsten
replication-password=secret
mysql-allow-intensive-checks=true
skip-validation-check=THLSchemaChangeCheck
start-and-report=true

[nyc]
topology=clustered
master=db1
members=db1,db2,db3
connectors=db1,db2,db3

[london]
topology=clustered
master=db4
members=db4,db5,db6
connectors=db4,db5,db6

[tokyo]
topology=clustered
master=db7
members=db7,db8,db9
connectors=db7,db8,db9

[global]
topology=composite-multi-master
composite-datasources=nyc,london,tokyo

Warning

It is critical that you ensure the master= entry in the configuration matches the current, live Primary host in your cluster for the purpose of this process.
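
If you are unsure which host is currently the live Primary in a cluster, one simple way to confirm it before editing the configuration is to check the output of cctrl from any node in that cluster (an illustrative check; the output layout may vary slightly):

shell> cctrl
cctrl> ls

The datasource reported as (master:ONLINE) in the ls output is the current Primary for that cluster, and is the host that should be used in the corresponding master= entry.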

4.5.4.9. Step 7: Enter Maintenance Mode

Enable Maintenance mode on all clusters using the cctrl command:

shell> cctrl
cctrl> set policy maintenance

4.5.4.10. Step 8: Stop Managers

Stop the manager process on all nodes:

shell> manager stop

4.5.4.11. Step 9: Install/Update the Software

Run the update as follows:

shell> tools/tpm update --replace-release

Important

If you had start-and-report=false, you may need to restart the manager services.

Warning

Until all nodes have been updated, the output from cctrl may show services in an OFFLINE, STOPPED, or UNKNOWN state. This is to be expected until all of the new v6 managers are online.

4.5.4.12. Step 10: Start Managers

After the installation is complete on all nodes, start the manager services:

shell> manager start

4.5.4.13. Step 11: Return to Automatic Mode

Return all clusters to Automatic mode using the cctrl command:

shell> cctrl
cctrl> set policy automatic

4.5.4.14. Step 12: Validate

  1. Identify the cross-site service name(s):

    shell> trepctl services

    In our example, the local cluster service will be one of london, nyc or tokyo, depending on the node you are on. The cross-site replication services would be:

    (within the london cluster)
    london_from_nyc
    london_from_tokyo
    
    (within the nyc cluster)
    nyc_from_london
    nyc_from_tokyo
    
    (within the tokyo cluster)
    tokyo_from_london
    tokyo_from_nyc
  2. Upon installation, the new cross-site replicators will come online; it is possible that they may be in an OFFLINE:ERROR state due to a change in epoch numbers. Check this on the Primary in each cluster by looking at the output from the trepctl command.

    Check each service as needed based on the status seen above:

    shell> trepctl -service london_from_nyc status
    shell> trepctl -service london_from_tokyo status
    ~or~
    shell> trepctl -service nyc_from_london status
    shell> trepctl -service nyc_from_tokyo status
    ~or~
    shell> trepctl -service tokyo_from_london status
    shell> trepctl -service tokyo_from_nyc status
  3. If the replicator is in an error state due to an epoch difference, you will see an error similar to the following:

    pendingErrorSeqno      : -1
    pendingExceptionMessage: Client handshake failure: Client response
    validation failed: Log epoch numbers do not match: master source
    ID=db1 client source ID=db4 seqno=4 server epoch number=0 client
    epoch number=4
    pipelineSource         : UNKNOWN

    The above error is due to the epoch numbers changing as a result of the replicators being restarted, and the new replicators being installed.

    To resolve, simply force the replicator online as follows:

    shell> trepctl -service london_from_nyc online -force
    shell> trepctl -service london_from_tokyo online -force
    ~or~
    shell> trepctl -service nyc_from_london online -force
    shell> trepctl -service nyc_from_tokyo online -force
    ~or~
    shell> trepctl -service tokyo_from_london online -force
    shell> trepctl -service tokyo_from_nyc online -force
  4. If the replicator shows an error state similar to the following:

    pendingErrorSeqno      : -1
    pendingExceptionMessage: Client handshake failure: Client response
    validation failed: Master log does not contain requested
    transaction: master source ID=db1 client source ID=db2 requested
    seqno=1237 client epoch number=0 master min seqno=5 master max
    seqno=7
    pipelineSource         : UNKNOWN

    The above error is possible if during install the Replica replicators came online before the Primary.

    Provided the steps above have been followed, simply bringing the replicator online should be enough for it to retry and carry on successfully:

    shell> trepctl -service london_from_nyc online 
    shell> trepctl -service london_from_tokyo online 
    ~or~
    shell> trepctl -service nyc_from_london online 
    shell> trepctl -service nyc_from_tokyo online 
    ~or~
    shell> trepctl -service tokyo_from_london online 
    shell> trepctl -service tokyo_from_nyc online 

Important

Known Issue (CT-569)

During an upgrade, the tpm process will incorrectly create additional, empty, tracking schemas based on the service names of the auto-generated cross-site services.

For example, if your cluster has service names east and west, you should only have tracking schemas for tungsten_east and tungsten_west.

In some cases, you will also see tungsten_east_from_west and/or tungsten_west_from_east.

These tungsten_x_from_y tracking schemas will be empty and unused. They can be safely removed by issuing DROP DATABASE tungsten_x_from_y on a Primary node, or they can be safely ignored.
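
As an illustration, using the example east and west service names above, you could review the schemas from a Primary node and drop only the unused tungsten_x_from_y entries (the schema names below are examples; substitute the names reported in your own environment, and never drop the tungsten_east or tungsten_west schemas themselves):

shell> mysql -u tungsten -p
mysql> SHOW DATABASES LIKE 'tungsten%';
mysql> DROP DATABASE tungsten_east_from_west;
mysql> DROP DATABASE tungsten_west_from_east;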

4.5.5. Installing an Upgraded JAR Patch

Warning

The following instructions should only be used if Continuent Support have explicitly provided you with a custom JAR file designed to address a problem with your deployment.

If a custom JAR has been provided by Continuent Support, the following instructions can be used to install the JAR into your installation.

  1. Determine your staging directory or untarred installation directory:

    shell> tpm query staging

    Go to the appropriate host (if necessary) and change to the staging directory.

    shell>  cd tungsten-clustering-7.1.4-10
  2. Change to the correct directory. For example, to update Tungsten Replicator change to tungsten-replicator/lib; for Tungsten Manager use tungsten-manager/lib; for Tungsten Connector use tungsten-connector/lib:

    shell> cd tungsten-replicator/lib
  3. Copy the existing JAR to a backup file:

    shell> cp tungsten-replicator.jar tungsten-replicator.jar.orig
  4. Copy the replacement JAR into the directory:

    shell> cp /tmp/tungsten-replicator.jar . 
  5. Change back to the root directory of the staging directory:

    shell> cd ../..
  6. Update the release:

    shell> ./tools/tpm update --replace-release
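
    Optionally, once the update completes, you can verify that the patched JAR is the copy now deployed in the running installation. The following is a simple checksum comparison, assuming the default installation directory of /opt/continuent and the replicator JAR used in this example; both checksums should match:

    shell> md5sum /tmp/tungsten-replicator.jar \
        /opt/continuent/tungsten/tungsten-replicator/lib/tungsten-replicator.jar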

4.5.6. Installing Patches

Warning

This procedure should only be followed with the advice and guidance of a Continuent Support Engineer.

There are two ways we can patch the running environment, and the method chosen will depend on the severity of the patch and whether or not your use case would allow for a maintenance window.

  • Upgrade using a full software update following the standard upgrade procedures

  • Use the patch command to patch just the files necessary

From time to time, Continuent may provide you with a patch to apply as a quicker way to fix small issues. Patched software will always be provided in a subsequent release, so the manual patch method described here should only be used as a temporary measure to patch a live installation when a full software update is not immediately possible.

You will have been supplied with a file containing the patch. For the purpose of this example, we will assume the file you have been given is called undeployallnostop.patch.

  1. Place the cluster into maintenance mode (see the example below)

  2. On each node of your installation:

    1. Copy the supplied patch file to the host

    2. From the installed directory (typically /opt/continuent), issue the following:

      shell> cd /opt/continuent/tungsten
      shell> patch -p1 -i undeployallnostop.patch

  3. Return cluster to automatic mode
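
The policy changes in steps 1 and 3 can be made using cctrl from any node, for example:

shell> cctrl
cctrl> set policy maintenance

and then, once the patch has been applied on every node:

shell> cctrl
cctrl> set policy automatic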

Warning

If a tpm update --replace-release is issued from the original software staging directory, the manual patch applied above will be over-written and removed.

The manual patch method is a temporary approach to patching a running environment, but is not a total replacement for a proper upgrade.

Following a manual patch, you MUST plan to upgrade the staged software to avoid reverting to an unpatched system.

If in doubt, always check with a Continuent Support Engineer.

4.5.7. Upgrading to v7.0.0+

Warning

v7 is a major release with many changes, specifically to security. At this time, upgrading directly to v7 is only supported from v5 onwards. If security is NOT enabled in your installation, then upgrading from an older release may work; however, any issues encountered will not be addressed, and upgrading to v6 first is the advised route.

Warning

Whilst every care has been taken to ensure upgrades are as smooth and easy as possible, ALWAYS ensure full backups are taken before proceeding, and if possible, test the upgrade on a non-Production environment first.

4.5.7.1. Background

4.5.7.1.1. v6 (and earlier) behavior

Prior to v7, Tungsten came with security turned OFF, through the tpm flag disable-security-controls defaulting to true. This flag, when set to false, would translate to the following settings being applied:

file-protection-level=0027
rmi-ssl=true
thl-ssl=true
rmi-authentication=true
jgroups-ssl=true

This would enable SSL communication between Tungsten components. However, connection to the database remained unencrypted, which would translate to the following settings being applied:

datasource-enable-ssl=false
connector-ssl=false

Setting these to true is possible, however there are many more manual steps that would have been required.

4.5.7.1.2. New behavior in v7

v7 enables full security by default, so the disable-security-controls flag will default to false when not specified.

In addition to the default value changing, disable-security-controls now also enables encrypted communication to the database. Setting this value to false now translates to the following settings being applied:

file-protection-level=0027
rmi-ssl=true
thl-ssl=true
rmi-authentication=true
jgroups-ssl=true
datasource-enable-ssl=true 
connector-ssl=true

4.5.7.1.3. Summary

In summary, this change in behavior means that upgrades need to be handled with care, with appropriate decisions made both by the tpm process and by the "human" as to the desired end result. The various options and examples are outlined in the following sections of this document.

4.5.7.2. Upgrade Decisions

4.5.7.2.1. Keep existing level of security

This is the easiest and smoothest approach. tpm will process your configuration and do its best to maintain the same level of security. In order to achieve that, tpm will dynamically update your configuration (either the tungsten.ini file for INI installs, or the deploy.cfg for staging installs) with additional properties to adjust the level of security to match.

The properties that tpm will add to your configuration will be some or all of the following depending on the initial starting point of your configuration:

disable-security-controls
connector-rest-api-ssl
manager-rest-api-ssl
replicator-rest-api-ssl
datasource-enable-ssl
enable-connector-ssl

You can now proceed with the upgrade; refer to Section 4.5.7.7, “Steps to upgrade using tpm” for the required steps.

4.5.7.2.2. Apply new recommendations and setup security

The following security setting levels can be enabled, and will require user action prior to upgrading. These are:

  1. Internal Encryption and Authentication

  2. Tungsten to Database Encryption

  3. Application (Connector) to Database Encryption

  4. API SSL

Applying all of the above steps will bring full security, equivalent to the default v7 configuration.

The steps to enable security will depend on what (if any) security is enabled in your existing installation. The following sections outline the steps required to enable security for each of the various layers. The following summary will help you understand which of these layers is already configured in your installation:

No Security

If no security has been configured, the installation that you are starting from will have disable-security-controls=true (or it will not be supplied at all) and no additional security properties will be supplied.

Partial Security

The installation that you are starting from will have partial security in place. This could be a combination of any of the security layers listed above.

To upgrade and enable security, you should follow one or more of the following steps based on your requirements. At a minimum, the first step should always be included; the remaining steps are optional.

4.5.7.3. Setup internal encryption and authentication

Prior to running the upgrade, you need to manually create the keystore. To do this, follow these steps on one host, and then copy the files to all other hosts in your topology:

db1> mkdir /etc/tungsten/secure
db1> keytool -genseckey -alias jgroups -validity 3650 -keyalg Blowfish -keysize 56 \
-keystore /etc/tungsten/secure/jgroups.jceks -storepass tungsten -keypass tungsten -storetype JCEKS
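
Optionally, you can verify that the keystore was created correctly by listing its contents with keytool (a quick sanity check only; not a required part of the procedure):

db1> keytool -list -keystore /etc/tungsten/secure/jgroups.jceks -storetype JCEKS -storepass tungsten

The output should show a single SecretKeyEntry with the alias jgroups.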

If you have an INI based install, and this is the only level of security you plan on configuring, you should now copy these new keystores to all other hosts in your topology. If you plan to enable SSL at the other remaining layers, or you use a Staging based install, then skip this copy step.

db1> for host in db2 db3 db4 db5 db6; do 
ssh ${host} mkdir /etc/tungsten/secure
scp /etc/tungsten/secure/*.jceks ${host}:/etc/tungsten/secure
done

Enabling internal encryption and authentication will also enable API SSL by default.

If you need to enable encryption to the underlying database, now proceed to the next step Section 4.5.7.4, “Enable Tungsten to Database Encryption” before running the upgrade, otherwise you can then start the upgrade by following the steps in Section 4.5.7.7, “Steps to upgrade using tpm”.

The following additional configuration properties will need adding to your existing configuration. The suggested process based on an INI or Staging based install are outlined in the final upgrade steps referenced above.

disable-security-controls=false
connector-rest-api-ssl=true
manager-rest-api-ssl=true
replicator-rest-api-ssl=true
java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks

4.5.7.4. Enable Tungsten to Database Encryption

The following prerequisite steps must be performed before continuing with this step.

In this step, you pre-create the various keystores required and register the MySQL certificates for Tungsten. Execute all of the following steps on a single host, for example, db1. In the example below it is assumed that the mysql certificates reside in /etc/mysql/certs. If you use the example syntax below, you will also need to ensure the following directory exists: /etc/tungsten/secure

These commands will import the MySQL certificates into the required Tungsten truststores.

db1> keytool -importkeystore -srckeystore /etc/mysql/certs/client-cert.p12 -srcstoretype PKCS12 \
-destkeystore /etc/tungsten/secure/keystore.jks -deststorepass tungsten -srcstorepass tungsten

db1> keytool -import -alias mysql -file /etc/mysql/certs/ca.pem -keystore /etc/tungsten/secure/truststore.ts \
-storepass tungsten -noprompt

If you have an INI based install, and you do not intend to configure SSL for your applications (via Connectors), or if your connectors reside on remote, dedicated hosts, you should now copy all of the generated keystores and truststores to all of the other hosts. If you use a Staging based install, then skip this copy step.

db1> for host in db2 db3 db4 db5 db6; do 
ssh ${host} mkdir /etc/tungsten/secure
scp /etc/tungsten/secure/*.jceks ${host}:/etc/tungsten/secure 
scp /etc/tungsten/secure/*.jks ${host}:/etc/tungsten/secure
scp /etc/tungsten/secure/*.ts ${host}:/etc/tungsten/secure
done

If you need to enable encryption to the underlying database from the connectors, now proceed to Section 4.5.7.5, “Enable Connector to Database Encryption” before running the upgrade, alternatively you can now follow the steps outlined in Section 4.5.7.7, “Steps to upgrade using tpm”

The following additional configuration properties will need adding to your existing configuration. The suggested process based on an INI or Staging based install are outlined in the final upgrade steps referenced above.

datasource-enable-ssl=true
java-truststore-path=/etc/tungsten/secure/truststore.ts
java-truststore-password=tungsten
java-keystore-path=/etc/tungsten/secure/keystore.jks
java-keystore-password=tungsten
datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem
datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem
datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem

4.5.7.5. Enable Connector to Database Encryption

The steps outlined in this section will need to be performed on all nodes where the Connector has been installed

If you are also enabling Internal Encryption, you would have followed the steps in Section 4.5.7.3, “Setup internal encryption and authentication” and you would have a number of files already in /etc/tungsten/secure. This next step will pre-create the keystore and truststore, and register the MySQL certificates for the Connectors. Execute all of the following steps on a single host, in this example, db1. In the example below it is assumed the mysql certificates reside in /etc/mysql/certs. If you use the example syntax below, you will need to ensure the following directory also exists: /etc/tungsten/secure

db1> keytool -importkeystore -srckeystore /etc/mysql/certs/client-cert.p12 \
-srcstoretype PKCS12 -destkeystore /etc/tungsten/secure/tungsten_connector_keystore.jks \
-deststorepass tungsten -srcstorepass tungsten

db1> keytool -import -alias mysql -file /etc/mysql/certs/ca.pem \
-keystore /etc/tungsten/secure/tungsten_connector_truststore.ts -storepass tungsten -noprompt

Now that all of the necessary steps have been taken to create the various keystores, and if you use an INI based install, you now need to copy all of these files to all other hosts in your topology. If you are using a Staging based installation, then skip this copy step.

db1> for host in db2 db3 db4 db5 db6; do 
ssh ${host} mkdir /etc/tungsten/secure
scp /etc/tungsten/secure/*.jceks ${host}:/etc/tungsten/secure
scp /etc/tungsten/secure/*.jks ${host}:/etc/tungsten/secure 
scp /etc/tungsten/secure/*.ts ${host}:/etc/tungsten/secure
done

Once the steps above have been performed, you can then continue with the upgrade, following the steps outlined in Section 4.5.7.7, “Steps to upgrade using tpm”

The following additional configuration properties will need adding to your existing configuration. The suggested process based on an INI or Staging based install are outlined in the final upgrade steps referenced above.

enable-connector-ssl=true
java-connector-keystore-path=/etc/tungsten/secure/tungsten_connector_keystore.jks
java-connector-keystore-password=tungsten
java-connector-truststore-path=/etc/tungsten/secure/tungsten_connector_truststore.ts
java-connector-truststore-password=tungsten

4.5.7.6. Enable MySQL SSL

A prerequisite to enabling full security is to enable SSL within your database, if this isn't already configured. To do this, we can use the mysql_ssl_rsa_setup tool supplied with most distributions of MySQL. If you do not have this tool, or require more detail, you can refer to Section 5.13.1, “Enabling Database SSL”. The steps below summarise the process using mysql_ssl_rsa_setup.

  1. The first step is to set up the directories for the certs; perform this on ALL hosts in your topology:

    shell> sudo mkdir -p /etc/mysql/certs
    shell> sudo chown -R tungsten: /etc/mysql/certs/

    NB: The ownership is temporarily set to tungsten so that the subsequent scp will work between hosts.

  2. This next step should be performed on just one single host, for the purpose of this example we will use db1 as the host:

    db1> mysql_ssl_rsa_setup -d /etc/mysql/certs/
    db1> openssl pkcs12 -export -inkey /etc/mysql/certs/client-key.pem \
    -name mysql -in /etc/mysql/certs/client-cert.pem -out /etc/mysql/certs/client-cert.p12 \
    -passout pass:tungsten

    Important

    When using OpenSSL 3.0 with Java 1.8, you MUST add the -legacy option to the openssl command.

    db1> for host in db2 db3 db4 db5 db6; do 
    scp /etc/mysql/certs/* ${host}:/etc/mysql/certs 
    done
  3. Next, on every host we need to reset the directory ownership

    shell> sudo chown -R mysql: /etc/mysql/certs/
    shell> sudo chmod g+r /etc/mysql/certs/client-*
  4. Now on every host, we need to reconfigure MySQL. Add the following properties into your my.cnf

    [mysqld]
    ssl-ca=/etc/mysql/certs/ca.pem
    ssl-cert=/etc/mysql/certs/server-cert.pem
    ssl-key=/etc/mysql/certs/server-key.pem
    
    [client]
    ssl-cert=/etc/mysql/certs/client-cert.pem
    ssl-key=/etc/mysql/certs/client-key.pem
    ssl-ca=/etc/mysql/certs/ca.pem
  5. Next, place your cluster(s) into MAINTENANCE mode

    shell> cctrl
    cctrl> set policy maintenance
  6. Restart MySQL for the new settings to take effect

    shell> sudo service mysqld restart
  7. Finally, return your cluster(s) into AUTOMATIC mode

    shell> cctrl
    cctrl> set policy automatic
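
Once MySQL has been restarted, you may also wish to confirm that SSL is active before proceeding. The following is an optional check; the exact variable names and output can vary between MySQL versions:

shell> mysql -u tungsten -p -e "SHOW GLOBAL VARIABLES LIKE '%ssl%'"

The ssl_ca, ssl_cert and ssl_key values should reflect the paths added to my.cnf above, and have_ssl (where present) should report YES.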

4.5.7.7. Steps to upgrade using tpm

When you are ready to perform the upgrade, the following steps should be followed:

4.5.7.7.1. Steps for INI Based Installations
  1. Ensure you place your cluster(s) into MAINTENANCE mode

  2. If no additional steps were taken, and you wish to maintain the same level of security, skip Step 3 and proceed directly to Step 4.

  3. Update your tungsten.ini and include some, or all, of the options below depending on which steps you took earlier. All entries should be placed within the [defaults] stanza.

    disable-security-controls=false
    connector-rest-api-ssl=true
    manager-rest-api-ssl=true
    replicator-rest-api-ssl=true
    java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks

    If "Tungsten to Database Encryption" IS configured, also add:

    datasource-enable-ssl=true
    java-truststore-path=/etc/tungsten/secure/truststore.ts
    java-truststore-password=tungsten
    java-keystore-path=/etc/tungsten/secure/keystore.jks
    java-keystore-password=tungsten
    datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem
    datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem
    datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem

    If "Tungsten to Database Encryption" IS NOT configured, also add:

    datasource-enable-ssl=false

    If "Application (Connector) to Database Encryption" IS configured, also add:

    enable-connector-ssl=true
    java-connector-keystore-path=/etc/tungsten/secure/tungsten_connector_keystore.jks
    java-connector-keystore-password=tungsten
    java-connector-truststore-path=/etc/tungsten/secure/tungsten_connector_truststore.ts
    java-connector-truststore-password=tungsten

    If "Application (Connector) to Database Encryption" IS NOT configured, also add:

    enable-connector-ssl=false

    Important

    If start-and-report=true, remove this value or set to false

  4. Obtain the TAR or RPM package for your installation. If using a TAR file, unpack this into your software staging tree, typically /opt/continuent/software. If you use the INI install method, this needs to be performed on every host. For a staging install, this applies to the staging host only.

  5. Change into the directory for the software

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
  6. Issue the following command on all hosts.

    shell> tools/tpm update --replace-release

    When upgrading the connectors, you could include the optional --no-connectors option if you wish to control the restart of the connectors manually

  7. For Multi-Site/Active-Active topologies, you will also need to repeat the steps for the cross-site replicators

  8. Finally, before returning the cluster(s) to AUTOMATIC, you will need to sync the new certificates, created by the upgrade, to all hosts. This step will be required even if you have disabled security as these files will be used by the API and also, if you choose to enable it, THL Encryption.

    From one host, copy the certificate and keystore files to ALL other hosts in your topology. The following scp command is an example assuming you are issuing from db1, and the install directory is /opt/continuent:

    db1> for host in db2 db3 db4 db5 db6; do
    scp /opt/continuent/share/[jpt]* ${host}:/opt/continuent/share
    scp /opt/continuent/share/.[jpt]* ${host}:/opt/continuent/share
    done

    Note

    The examples assume you have the ability to scp between hosts as the tungsten OS user. If your security restrictions do not permit this, you will need to use alternative procedures appropriate to your environment to ensure these files are in sync across all hosts before continuing.

    If the files are not in sync between hosts, the software will fail to start!

  9. You will also need to repeat this if you have a Multi-Site/Active-Active topology for the cross-site replicators:

    db1> for host in db2 db3 db4 db5 db6; do
    scp /opt/replicator/share/[jpt]* ${host}:/opt/replicator/share
    scp /opt/replicator/share/.[jpt]* ${host}:/opt/replicator/share
    done
  10. Restart all tungsten components, one host at a time

    shell> manager restart
    shell> replicator restart
    shell> connector restart
  11. Return the cluster(s) to AUTOMATIC mode

4.5.7.7.2. Steps for Staging Based Installations
  1. Ensure you place your cluster(s) into MAINTENANCE mode

  2. Obtain the TAR or RPM package for your installation. If using a TAR file, unpack this into your software staging tree, typically /opt/continuent/software. If you use the INI install method, this needs to be performed on every host. For a staging install, this applies to the staging host only.

  3. Change into the directory for the software and fetch the configuration, e.g.:

    shell> cd /opt/continuent/software/tungsten-clustering-7.1.4-10
    shell> tpm reverse > deploy.sh
  4. If no additional steps were taken, and you wish to maintain the same level of security, skip Step 5 and proceed directly to Step 6.

  5. Edit the deploy.sh file just created, and include some, or all, of the options below depending on which steps you took earlier (they should be placed within the tools/tpm configure defaults block).

    --disable-security-controls=false
    --connector-rest-api-ssl=true
    --manager-rest-api-ssl=true
    --replicator-rest-api-ssl=true
    --java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks

    If "Tungsten to Database Encryption" IS configured, also add:

    --datasource-enable-ssl=true
    --java-truststore-path=/etc/tungsten/secure/truststore.ts
    --java-truststore-password=tungsten
    --java-keystore-path=/etc/tungsten/secure/keystore.jks
    --java-keystore-password=tungsten
    --datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem
    --datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem
    --datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem

    If "Tungsten to Database Encryption" IS NOT configured, also add:

    --datasource-enable-ssl=false

    If "Application (Connector) to Database Encryption" IS configured, also add:

    --enable-connector-ssl=true
    --java-connector-keystore-path=/etc/tungsten/secure/tungsten_connector_keystore.jks
    --java-connector-keystore-password=tungsten
    --java-connector-truststore-path=/etc/tungsten/secure/tungsten_connector_truststore.ts
    --java-connector-truststore-password=tungsten

    If "Application (Connector) to Database Encryption" IS NOT configured, also add:

    --enable-connector-ssl=false

    Important

    If start-and-report=true, remove this value or set to false

    An example of a BEFORE and AFTER edit including all options:

    shell> cat deploy.sh
    # BEFORE
    tools/tpm configure defaults \
    --reset \
    --application-password=secret \
    --application-port=3306 \
    --application-user=app_user \
    --disable-security-controls=true \
    --install-directory=/opt/continuent \
    --mysql-allow-intensive-checks=true \
    --profile-script=/home/tungsten/.bash_profile \
    --replication-password=secret \
    --replication-user=tungsten \
    --start-and-report=true \
    --user=tungsten
    # Options for the nyc data service
    tools/tpm configure nyc \
    --connectors=db1,db2,db3 \
    --master=db1 \
    --slaves=db2,db3 \
    --topology=clustered
    shell> cat deploy.sh
    # AFTER 
    tools/tpm configure defaults \
    --reset \
    --application-password=secret \
    --application-port=3306 \
    --application-user=app_user \
    --install-directory=/opt/continuent \
    --mysql-allow-intensive-checks=true \
    --profile-script=/home/tungsten/.bash_profile \
    --replication-password=secret \
    --replication-user=tungsten \
    --user=tungsten \
    --start-and-report=false \
    --disable-security-controls=false \
    --connector-rest-api-ssl=true \
    --manager-rest-api-ssl=true \
    --replicator-rest-api-ssl=true \
    --datasource-enable-ssl=true \
    --java-jgroups-keystore-path=/etc/tungsten/secure/jgroups.jceks \
    --java-truststore-path=/etc/tungsten/secure/truststore.ts \
    --java-truststore-password=tungsten \
    --java-keystore-path=/etc/tungsten/secure/keystore.jks \
    --java-keystore-password=tungsten \
    --enable-connector-ssl=true \
    --java-connector-keystore-path=/etc/tungsten/secure/tungsten_connector_keystore.jks \
    --java-connector-keystore-password=tungsten \
    --java-connector-truststore-path=/etc/tungsten/secure/tungsten_connector_truststore.ts \
    --java-connector-truststore-password=tungsten \
    --datasource-mysql-ssl-cert=/etc/mysql/certs/client-cert.pem \
    --datasource-mysql-ssl-key=/etc/mysql/certs/client-key.pem \
    --datasource-mysql-ssl-ca=/etc/mysql/certs/ca.pem
    
    # Options for the nyc data service
    tools/tpm configure nyc \
    --connectors=db1,db2,db3 \
    --master=db1 \
    --slaves=db2,db3 \
    --topology=clustered
  6. Next, source the file to load the configuration and then execute the update:

    shell> source deploy.sh
    shell> tools/tpm update --replace-release

    You may wish to include the optional --no-connectors option if you wish to control the restart of the connectors manually

  7. For Multi-Site/Active-Active topologies, you will also need to repeat the steps for the cross-site replicators

  8. Finally, before returning the cluster(s) to AUTOMATIC, you will need to sync the new certificates, created by the upgrade, to all hosts. This step will be required even if you have disabled security as these files will be used by the API and also, if you choose to enable it, THL Encryption.

    From one host, copy the certificate and keystore files to ALL other hosts in your topology. The following scp command is an example assuming you are issuing from db1, and the install directory is /opt/continuent:

    db1> for host in db2 db3 db4 db5 db6; do
    scp /opt/continuent/share/[jpt]* ${host}:/opt/continuent/share
    scp /opt/continuent/share/.[jpt]* ${host}:/opt/continuent/share
    done

    Note

    The examples assume you have the ability to scp between hosts as the tungsten OS user. If your security restrictions do not permit this, you will need to use alternative procedures appropriate to your environment to ensure these files are in sync across all hosts before continuing.

    If the files are not in sync between hosts, the software will fail to start!

  9. You will also need to repeat this if you have a Multi-Site/Active-Active topology for the cross-site replicators:

    db1> for host in db2 db3 db4 db5 db6; do
    scp /opt/replicator/share/[jpt]* ${host}:/opt/replicator/share
    scp /opt/replicator/share/.[jpt]* ${host}:/opt/replicator/share
    done
  10. Restart all tungsten components, one host at a time

    shell> manager restart
    shell> replicator restart
    shell> connector restart
  11. Return the cluster(s) to AUTOMATIC mode

4.5.7.8. Optional Post-Upgrade steps to configure API

Once the upgrade has been completed, if you plan on using the API you will need to complete a few extra steps before you can use it. By default, after installation the API will only allow the ping method and the createAdminUser method.

To open up the API and access all of its features, you will need to configure the API User. To do this, execute the following on all hosts (Setting the value of pass to your preferred password):

shell> curl -k -H 'Content-type: application/json' --request POST 'https://127.0.0.1:8096/api/v2/createAdminUser?i-am-sure=true' \
> --data-raw '{
>   "payloadType": "credentials",
>   "user":"tungsten",
>   "pass":"security"
> }'

For more information on using the new API, please refer to Chapter 11, Tungsten REST API (APIv2)
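
As a quick sanity check that the API is responding and that the new credentials work, you can then call the ping method, for example (the exact endpoint path shown here is illustrative; see Chapter 11 for the definitive API reference):

shell> curl -k --user tungsten:security --request GET 'https://127.0.0.1:8096/api/v2/ping'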

4.6. Removing Datasources, Managers or Connectors

Removing components from a dataservice is quite straightforward, usually involving both modifying the running service and changing the configuration. Changing the configuration is necessary to ensure that the host is not re-configured and installed when the installation is next updated.

In this section:

  • Section 4.6.1, “Removing a Datasource from an Existing Deployment”

  • Section 4.6.2, “Removing a Composite Datasource/Cluster from an Existing Deployment”

  • Section 4.6.3, “Removing a Connector from an Existing Deployment”

4.6.1. Removing a Datasource from an Existing Deployment

To remove a datasource from an existing deployment there are two primary stages: removing it from the active service, and then removing it from the active configuration.

For example, to remove host6 from a service:

  1. Check the current service state:

    [LOGICAL] /alpha > ls
    
    COORDINATOR[host1:AUTOMATIC:ONLINE]
    
    ROUTERS:
    +----------------------------------------------------------------------------+
    |connector@host1[11401](ONLINE, created=17, active=0)                        |
    |connector@host2[7998](ONLINE, created=0, active=0)                          |
    |connector@host3[31540](ONLINE, created=0, active=0)                         |
    |connector@host4[26829](ONLINE, created=27, active=1)                        |
    +----------------------------------------------------------------------------+
    
    DATASOURCES:
    +----------------------------------------------------------------------------+
    |host1(slave:ONLINE, progress=373, latency=0.000)                            |
    |STATUS [OK] [2014/02/12 12:48:14 PM GMT]                                    |
    +----------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                     |
    |  REPLICATOR(role=slave, master=host6, state=ONLINE)                        |
    |  DATASERVER(state=ONLINE)                                                  |
    |  CONNECTIONS(created=30, active=0)                                         |
    +----------------------------------------------------------------------------+
    
    +----------------------------------------------------------------------------+
    |host2(slave:ONLINE, progress=373, latency=1.000)                            |
    |STATUS [OK] [2014/01/24 05:02:34 PM GMT]                                    |
    +----------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                     |
    |  REPLICATOR(role=slave, master=host6, state=ONLINE)                        |
    |  DATASERVER(state=ONLINE)                                                  |
    |  CONNECTIONS(created=0, active=0)                                          |
    +----------------------------------------------------------------------------+
    
    +----------------------------------------------------------------------------+
    |host3(slave:ONLINE, progress=373, latency=1.000)                            |
    |STATUS [OK] [2014/02/11 03:17:08 PM GMT]                                    |
    +----------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                     |
    |  REPLICATOR(role=slave, master=host6, state=ONLINE)                        |
    |  DATASERVER(state=ONLINE)                                                  |
    |  CONNECTIONS(created=0, active=0)                                          |
    +----------------------------------------------------------------------------+
    
    +----------------------------------------------------------------------------+
    |host6(master:ONLINE, progress=373, THL latency=0.936)                       |
    |STATUS [OK] [2014/02/12 12:39:52 PM GMT]                                    |
    +----------------------------------------------------------------------------+
    |  MANAGER(state=ONLINE)                                                     |
    |  REPLICATOR(role=master, state=ONLINE)                                     |
    |  DATASERVER(state=ONLINE)                                                  |
    |  CONNECTIONS(created=14, active=1)                                         |
    +----------------------------------------------------------------------------+
  2. Switch to MAINTENANCE policy mode:

    [LOGICAL] /alpha > set policy maintenance
    policy mode is now MAINTENANCE
  3. Switch to administration mode:

    [LOGICAL] /alpha > admin
  4. Remove the node from the active service using the rm command. You will be warned that this is an expert command and to confirm the operation:

    [ADMIN] /alpha > rm host6
    
    WARNING: This is an expert-level command:
    Incorrect use may cause data corruption
    or make the cluster unavailable.
    
    Do you want to continue? (y/n)> y
  5. Switch back to logical mode:

    [ADMIN] /alpha > logical
  6. Switch to AUTOMATIC policy mode:

    [LOGICAL] /alpha > set policy automatic
    policy mode is now AUTOMATIC

Now that the node has been removed from the active dataservice, the services must be stopped and then removed from the configuration.

  1. Stop the running services:

    shell> stopall
  2. Now you must remove the node from the configuration, although the exact method depends on which installation method was used with tpm:

    • If you are using staging directory method with tpm:

      shell> tpm query staging
      tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10
      
      shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
      The staging USER is tungsten
      
      shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
      The staging HOST is db1
      
      shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
      The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10
      
      shell> ssh {STAGING_USER}@{STAGING_HOST}
      shell> cd {STAGING_DIRECTORY}
      shell> ./tools/tpm configure alpha \
          --connectors=host1,host2,host3,host4 \
          --members=host1,host2,host3
      

      Run the tpm command to update the software with the Staging-based configuration:

      shell> ./tools/tpm update

      For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

    • If you are using the INI file method with tpm:

      • Remove the INI configuration file:

        shell> rm /etc/tungsten/tungsten.ini
  3. Stop the replicator/manager from being started again.

    • If all the services on this node (replicator, manager, and connector) are being removed, remove the Tungsten Cluster installation entirely:

      • Remove the startup scripts from your server:

        shell> sudo /opt/continuent/tungsten/cluster-home/bin/undeployall
      • Remove the installation directory:

        shell> rm -rf /opt/continuent
    • If the replicator/manager has been installed on a host but the connector is not being removed, remove the start scripts to prevent the services from being automatically started:

      shell> rm /etc/init.d/tmanager
      shell> rm /etc/init.d/treplicator

4.6.2. Removing a Composite Datasource/Cluster from an Existing Deployment

To remove an entire composite datasource (cluster) from an existing deployment there are two primary stages: removing it from the active service, and then removing it from the active configuration.

For example, to remove cluster west from a composite dataservice:

  1. Check the current service state:

    shell> cctrl -multi
    [LOGICAL] / > ls
    
    +----------------------------------------------------------------------------+
    |DATA SERVICES:                                                              |
    +----------------------------------------------------------------------------+
    east
    global
    west
    
    [LOGICAL] / > use global
    [LOGICAL] /global > ls
    
    COORDINATOR[db1:AUTOMATIC:ONLINE]
    
    DATASOURCES:
    +----------------------------------------------------------------------------+
    |east(composite master:ONLINE)                                               |
    |STATUS [OK] [2017/05/16 01:25:31 PM UTC]                                    |
    +----------------------------------------------------------------------------+
    
    +----------------------------------------------------------------------------+
    |west(composite slave:ONLINE)                                                |
    |STATUS [OK] [2017/05/16 01:25:30 PM UTC]                                    |
    +----------------------------------------------------------------------------+
  2. Switch to MAINTENANCE policy mode:

    [LOGICAL] /global > set policy maintenance
    policy mode is now MAINTENANCE
  3. Remove the composite member cluster from the composite service using the drop command.

    [LOGICAL] /global > drop composite datasource west
    COMPOSITE DATA SOURCE 'west@global' WAS DROPPED
    
    [LOGICAL] /global > ls
    
    COORDINATOR[db1:AUTOMATIC:ONLINE]
    
    DATASOURCES:
    +----------------------------------------------------------------------------+
    |east(composite master:ONLINE)                                               |
    |STATUS [OK] [2017/05/16 01:25:31 PM UTC]                                    |
    +----------------------------------------------------------------------------+
    
    [LOGICAL] /global > cd /
    [LOGICAL] / > ls
    
    +----------------------------------------------------------------------------+
    |DATA SERVICES:                                                              |
    +----------------------------------------------------------------------------+
    east
    global
  4. If the removed composite datasource still appears in the top-level listing, then you will need to clean up by hand. For example:

    [LOGICAL] /global > cd /
    [LOGICAL] / > ls
    
    +----------------------------------------------------------------------------+
    |DATA SERVICES:                                                              |
    +----------------------------------------------------------------------------+
    east
    global
    west

    Stop all managers on all nodes at the same time

    [LOGICAL] /global > use west
    [LOGICAL] /west > manager * stop
    shell> vim $CONTINUENT_HOME/cluster-home/conf/dataservices.properties
    
    Before:
    east=db1,db2,db3
    west=db4,db5,db6
    
    After:
    east=db1,db2,db3

    Start all managers one-by-one, starting with the current Primary

    shell> manager start

    Once all managers are running, check the list again:

    shell> cctrl -multi
    [LOGICAL] / > ls
    
    +----------------------------------------------------------------------------+
    |DATA SERVICES:                                                              |
    +----------------------------------------------------------------------------+
    east
    global
  5. Switch to AUTOMATIC policy mode:

    [LOGICAL] / > set policy automatic
    policy mode is now AUTOMATIC

Now that the cluster has been removed from the composite dataservice, the services on the old nodes must be stopped and then removed from the configuration.

  1. Stop the running services on all nodes in the removed cluster:

    shell> stopall
  2. Now you must remove the node from the configuration, although the exact method depends on which installation method was used with tpm:

    • If you are using staging directory method with tpm:

      1. Change to the staging directory. The current staging directory can be located using tpm query staging:

        shell> tpm query staging
        tungsten@host1:/home/tungsten/tungsten-clustering-7.1.4-10
        shell> cd /home/tungsten/tungsten-clustering-7.1.4-10
      2. Update the configuration, omitting the cluster datasource name from the list of members of the dataservice:

        shell> tpm update global --composite-datasources=east
    • If you are using the INI file method with tpm:

      • Remove the INI configuration file:

        shell> rm /etc/tungsten/tungsten.ini
  3. Stop the replicator/manager from being started again.

    • If all the services on this node (replicator, manager, and connector) are being removed, remove the Tungsten Cluster installation entirely:

      • Remove the startup scripts from your server:

        shell> sudo /opt/continuent/tungsten/cluster-home/bin/undeployall
      • Remove the installation directory:

        shell> rm -rf /opt/continuent
    • If the replicator/manager has been installed on a host but the connector is not being removed, remove the start scripts to prevent the services from being automatically started:

      shell> rm /etc/init.d/tmanager
      shell> rm /etc/init.d/treplicator

4.6.3. Removing a Connector from an Existing Deployment

Removing a connector involves only stopping the connector and removing the configuration. When the connector is stopped, the manager will automatically remove it from the dataservice. Note that applications that have been configured to talk to the connector must be updated to point to another connector.

For example, to remove host4 from the current dataservice:

  1. Login to the host running the connector.

  2. Stop the connector service:

    shell> connector stop
  3. Remove the connector from the configuration; the exact method depends on which installation method was used with tpm:

    • If you are using staging directory method with tpm:

      1. Change to the staging directory. The current staging directory can be located using tpm query staging:

        shell> tpm query staging
        tungsten@host1:/home/tungsten/tungsten-clustering-7.1.4-10
        shell> cd /home/tungsten/tungsten-clustering-7.1.4-10
      2. Update the configuration, omitting the host from the list of members of the dataservice:

        shell> tpm update alpha \
            --connectors=host1,host2,host3 \
            --members=host1,host2,host3
    • If you are using the INI file method with tpm:

      • Remove the INI configuration file:

        shell> rm /etc/tungsten/tungsten.ini
  4. Stop the connector from being started again. If the connector is restarted, it will connect to the previously configured Primary and begin operating again.

    • If this is a standalone Connector installation, remove the Tungsten Cluster installation entirely:

      • Remove the startup scripts from your server:

        shell> sudo /opt/continuent/tungsten/cluster-home/bin/undeployall
      • Remove the installation directory:

        shell> rm -rf /opt/continuent
    • If the connector has been installed on a host with replicator and/or managers, remove the start script to prevent the connector from being automatically started:

      shell> rm /etc/init.d/tconnector

Chapter 5. Deployment: Security

Tungsten Cluster supports SSL, TLS and certificates for both communication and authentication for all components within the system, and to the underlying databases. This security is enabled by default and includes:

  • Authentication between command-line tools (cctrl), and between background services.

  • SSL/TLS between command-line tools and background services.

  • SSL/TLS between Tungsten Replicator and datasources.