Tungsten Clustering (for MySQL) 5.4 Manual

Question

G.1.1.

On a Tungsten Replicator Replica, how do I set both the local Replica THL listener port and the upstream Primary's THL listener port?

Answer 1

You need to specify two options: thl-port to set the Replica THL listener port, and master-thl-port to define the upstream Primary THL listener port. If master-thl-port is omitted, thl-port alone sets both.
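
As a rough illustration of the fallback described above, the following shell sketch resolves the two ports the way the answer states them. The function name `effective_thl_ports` and the port numbers are hypothetical and chosen only for this example; the function is not part of tpm.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of tpm: shows which port ends up in each role.
#   $1 = value given for thl-port
#   $2 = value given for master-thl-port (optional)
effective_thl_ports() {
  local thl_port=$1
  local master_thl_port=${2:-$thl_port}   # falls back to thl-port when omitted
  echo "listener=$thl_port upstream=$master_thl_port"
}

effective_thl_ports 2113         # only thl-port given: both roles use 2113
effective_thl_ports 2113 2112    # both given: listen on 2113, read from 2112
```

The second call corresponds to the case in the question, where the local listener and the upstream Primary use different ports.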

Answer 2

To update the IP address used by one or more hosts in your cluster, you must perform the following steps:

If possible, switch the node into SHUNNED mode.
Reconfigure the IP address on the machine.
Update the hostname lookup, for example, by editing the IP configuration in /etc/hosts.
Restart the networking to reconfigure the service.
On the node that has changed IP address, run:
```
shell> tpm update
```
The above updates the configuration, but does not restart the individual services, which may still have the old, incorrect, IP address information for the host cached.
Restart the node services:
```
shell> tpm restart
```
On each other node within the cluster:
1. Update the hostname lookup for the new node, for example, by updating the IP configuration in /etc/hosts.
2. Update the configuration, using tpm:
```
shell> tpm update
```
3. Restart the services:
```
shell> tpm restart
```

Answer 3

When upgrading from version 2 to v4+, or simply just moving away from the mysql-connectorj driver to the Drizzle driver, the update process doesn't correctly remove all the connectorJ properties, causing a mismatch when connectors that did get the update try to make a connection to the cluster.

This is a known issue logged as CT-7.

As yet, a fix has not been found, but the following workaround will correct the issue by hand:

To properly identify this issue, check the extended output of cctrl for the active driver. There will be one line of output for each node in the local cluster. Repeat once per cluster; it does not matter which node you use.

shell> echo ls -l | cctrl -expert| grep driver: | awk '{print $3}'

For example, for a three-node cluster, you may see something like this:

com.mysql.jdbc.Driver
com.mysql.jdbc.Driver
com.mysql.jdbc.Driver

If any line on any node in any cluster shows the com.mysql.jdbc.Driver, please use the workaround below:

Warning

If you have multiple clusters, either MSMM, CMM or Composite HA/DR, always ensure you check ALL clusters. Especially in Composite clusters, the Primary cluster, and especially the Primary node, must be checked and corrected if necessary.

Ensure the tpm update was done with the --replace-release option.

Review the tpm reverse output and analyze based on the following:

--mysql-driver=drizzle should exist in the defaults section
You may (or may not) see the old --mysql-connectorj-path entry within each service definition or in the defaults
If neither of the above appears in the output, then the drizzle driver is active by default as of v4.0.0.

Repeat the following steps for all clusters, one by one:

Place the cluster into Maintenance Mode using the cctrl command:
```
cctrl> set policy maintenance
```

Stop all managers on all nodes within the single cluster:

shell> manager stop
Stopping Tungsten Manager Service...
Waiting for Tungsten Manager Service to exit...
Stopped Tungsten Manager Service.

On all nodes within the single cluster, remove all files from the /opt/continuent/tungsten/cluster-home/conf/cluster/{local_servicename}/datasource/ directory.
Only delete the files from the local cluster service name directory, do not touch the composite service directory if there is one.

Start all managers on all nodes within the single cluster, starting with the Primary:

shell> manager start
Starting Tungsten Manager Service...
Waiting for Tungsten Manager Service..........
running: PID:24819

Place the cluster back into Automatic Mode:

shell> echo set policy automatic | cctrl -expert
Tungsten Clustering 6.0.3 build 608
alpha: session established, encryption=false, authentication=false
[LOGICAL:EXPERT] /alpha > set policy automatic
policy mode is now AUTOMATIC
[LOGICAL:EXPERT] /alpha >
Exiting...

Once the above has been completed, confirm that the procedure has worked as follows:

shell> echo ls -l | cctrl -expert| grep driver: | awk '{print $3}'
org.drizzle.jdbc.DrizzleDriver
org.drizzle.jdbc.DrizzleDriver
org.drizzle.jdbc.DrizzleDriver
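
The final check can be scripted rather than read by eye. The sketch below is a hypothetical helper (`all_drizzle` is not a Tungsten command) that reads driver class names, one per line, and succeeds only when every line is the Drizzle driver. It is fed sample lines copied from the output above so that it runs without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads driver class names on stdin, one per line.
# Succeeds only if at least one name was read and every name is the Drizzle driver.
all_drizzle() {
  local line seen=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue          # skip blank lines
    seen=1
    [ "$line" = "org.drizzle.jdbc.DrizzleDriver" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Sample input copied from the expected output above.
if printf '%s\n' org.drizzle.jdbc.DrizzleDriver org.drizzle.jdbc.DrizzleDriver \
     org.drizzle.jdbc.DrizzleDriver | all_drizzle; then
  echo "all nodes report the Drizzle driver"
else
  echo "at least one node still reports another driver"
fi
```

On a live cluster the input would come from the pipeline already shown, for example `echo ls -l | cctrl -expert | grep driver: | awk '{print $3}' | all_drizzle`.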

Answer 4

If you need to change the password used by Tungsten Cluster to connect to a dataserver and apply changes, first change the password within your dataserver, and then update the configuration using tpm update. The new password is not checked until the Tungsten Replicator process starts, so changing the password and then updating the configuration will keep replication from failing.

Within cctrl set the maintenance policy mode:
```
cctrl> set policy maintenance
```
Within MySQL, update the password for the user, allowing the change to be replicated to the other datasources:
```
mysql> SET PASSWORD FOR tungsten@'%' = PASSWORD('new_pass');
```
Follow the directions for tpm update to apply the --datasource-password=new_pass setting.
Set the policy mode in cctrl back to AUTOMATIC:
```
cctrl> set policy automatic
```

Answer 5

The most likely culprit for this issue is that the time is different on the machine in question. If you have ntp or a similar network time tool installed on your machine, use it to update the current time across all the hosts within your deployment:

shell> ntpdate pool.ntp.org

Once the command has been executed across all the hosts, try sending a heartbeat from the Primary to the Replicas and checking the latency:

shell> trepctl heartbeat
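
To see why clock differences matter, note that a latency figure derived from timestamps taken on two hosts also includes any offset between their clocks. The sketch below is only an arithmetic illustration with made-up epoch values. `apparent_latency` is a hypothetical helper, not a Tungsten command, and it assumes latency is measured as the apply time on the Replica minus the commit time on the Primary.

```shell
#!/usr/bin/env bash
# apparent_latency COMMIT_EPOCH APPLY_EPOCH SKEW   (hypothetical helper)
#   COMMIT_EPOCH  commit time as read from the Primary's clock
#   APPLY_EPOCH   true apply time, expressed on the Primary's clock
#   SKEW          seconds by which the Replica's clock runs ahead of the Primary's
apparent_latency() {
  local commit=$1 apply=$2 skew=$3
  echo $(( apply + skew - commit ))
}

apparent_latency 1700000000 1700000002 0     # clocks agree: prints 2
apparent_latency 1700000000 1700000002 30    # Replica 30s ahead: prints 32
```

Under that assumption, a host whose clock is off by a fixed amount appears to lag by the same amount, even when it applies events promptly.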

Answer 6

Both the replicate.do and replicate.ignore filters apply to DML and DDL alike.

DDL is currently ONLY replicated for MySQL to MySQL or Oracle to Oracle topologies, or within MySQL Clusters, although it would be advisable not to use ignore/do filters in a clustered environment where data/structural integrity is key.

With replicate.do, all DML and DDL will be replicated ONLY for any database or table listed as part of the do filter.

With replicate.ignore, all DML and DDL will be replicated except for any database or table listed as part of the ignore filter.
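
The difference between the two filters can be sketched as a pair of small predicates. This is a simplified model, not the replicator's own matching code: it assumes the filter list is a comma-separated set of exact `schema` or `schema.table` entries and does not handle pattern matching. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# is_listed LIST SCHEMA TABLE   (hypothetical helper)
# LIST is a comma-separated set of entries, each either "schema" or "schema.table".
is_listed() {
  local list=$1 schema=$2 table=$3 entry
  local -a entries
  IFS=, read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    if [ "$entry" = "$schema" ] || [ "$entry" = "$schema.$table" ]; then
      return 0
    fi
  done
  return 1
}

# replicate.do: replicate ONLY what is listed.
replicated_with_do() { is_listed "$@"; }

# replicate.ignore: replicate everything EXCEPT what is listed.
replicated_with_ignore() { ! is_listed "$@"; }

replicated_with_do "sales,hr.staff" hr staff \
  && echo "hr.staff is replicated by the do filter"
replicated_with_ignore "sales,hr.staff" hr staff \
  || echo "hr.staff is skipped by the ignore filter"
```

The same list therefore gives opposite results depending on which filter it is attached to, for DML and DDL alike.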

Answer 7

You can change the configuration by running the following command from the staging directory:

shell> ./tools/tpm --host=host1 --java-mem-size=2048

Answer 8

This is a normal deployment pattern for working in AWS to reduce risk. A single cluster works quite well in this topology.

Answer 9

Standard settings work out of the box. Fine tuning can be done by working with the specific customer application during a Proof-Of-Concept or Production roll-out.

Answer 10

This is currently a manual process. New Puppet modules to aid this process are being developed, and will be included in the documentation when completed. For the relevant procedure, see Section 3.5.1, “Adding Datasources to an Existing Deployment”.

Answer 11

This is not something currently supported.

Continuent Term	Traditional Term	Description
composite dataservice	Multi-Site Cluster	A configured Tungsten Cluster service consisting of multiple dataservices, typically at different physical locations.
dataservice	Cluster	The collection of machines that make up a single Tungsten Dataservice. Individual hosts within the dataservice are called datasources. Each dataservice is identified by a unique name, and multiple dataservices can be managed from one server.
dataserver	Database	The database on a host.
datasource	Host or Node	One member of a dataservice and the associated Tungsten components.
staging host	-	The machine (and directory) from which Tungsten Cluster™ is installed and configured. The machine does not need to be the same as any of the existing hosts in the dataservice.
active witness	-	A machine in the dataservice that runs the manager process but is not running a database server. This server will be used to establish quorum in the event that a datasource becomes unavailable.
passive witness	-	A witness host is a host that can be contacted using the ping protocol to act as a network check for the other nodes of the cluster. Witness hosts should be on the same network and segment as the other nodes in the dataservice.
coordinator	-	The datasource or active witness in a dataservice that is responsible for making decisions on the state of the dataservice. The coordinator is usually the member that has been running the longest. It will not always be the Primary. When the manager process on the coordinator is stopped, or no longer available, a new coordinator will be chosen from the remaining members.

Tungsten Term	Traditional Term	Description
composite dataservice	Multi-Site Cluster	A configured Tungsten Cluster service consisting of multiple dataservices, typically at different physical locations.
dataservice	Cluster	A configured Tungsten Cluster service consisting of dataservers, datasources and connectors.
dataserver	Database	The database on a host. Datasources include MySQL, PostgreSQL or Oracle.
datasource	Host or Node	One member of a dataservice and the associated Tungsten components.
staging host	-	The machine from which Tungsten Cluster is installed and configured. The machine does not need to be the same as any of the existing hosts in the cluster.
staging directory	-	The directory where the installation files are located and the installer is executed. Further configuration and updates must be performed from this directory.
connector	-	A connector is a routing service that provides management for connectivity between application services and the underlying dataserver.
Witness host	-	A witness host is a host that can be contacted using the ping protocol to act as a network check for the other nodes of the cluster. Witness hosts should be on the same network and segment as the other nodes in the dataservice.

Distribution	Published EOL	Notes
Amazon Linux 2	30th June 2024
CentOS 7	30th June 2024
Debian GNU/Linux 10 (Buster)	June 2024
Debian GNU/Linux 11 (Bullseye)	June 2026
Oracle Linux 8.4	July 2029
RHEL 7	30th June 2024
RHEL 8.4.0	31st May 2029
Rocky Linux 8	31st May 2029
Rocky Linux 9	31st May 2032
SUSE Linux Enterprise Server 15	21st June 2028
Ubuntu 20.04 LTS (Focal Fossa)	April 2025
Ubuntu 22.04 LTS (Jammy Jellyfish)	April 2027

Database	MySQL Version	Tungsten Version	Notes
MySQL	5.7	All non-EOL Versions	Full Support
MySQL	8.0.0-8.0.34	6.1.0-6.1.3	Supported, but does not support Partitioned Tables or the use of `binlog-transaction-compression=ON` introduced in 8.0.20
MySQL	8.0.0-8.0.34	6.1.4 onwards	Fully Supported.
MariaDB	10.0, 10.1	All non-EOL Versions	Full Support
MariaDB	10.2, 10.3	6.1.13-6.1.20	Partial Support. See note below.
MariaDB	Up to, and including, 10.11	7.x	Full Support

Attribute	Guidance	Amazon Example
Instance Type	Instance sizes and types are dependent on the workload, but larger instances are recommended for transactional databases.	`m4.xlarge` or better
Instance Boot Volume	Use block, not ephemeral storage.	EBS
Instance Deployment	Use standard Linux distributions and bases. For ease of deployment and configuration, consider Ansible, Puppet or other script-based solutions.	Amazon Linux AMIs

Parameter	tpm Option	tpm Value	MySQL `my.cnf` Option	MySQL Value
`/` (root)
MySQL Data	`datasource-mysql-data-directory`	`/volumes/mysql/data`	`datadir`	`/volumes/mysql/data`
MySQL Binary Logs	`datasource-log-directory`	`/volumes/mysql/binlogs`	`log-bin`	`/volumes/mysql/binlogs/mysql-bin`
Transaction History Logs (THL)	`thl-directory`	`/volumes/mysql/thl`

Deployment Type/Package	TAR/GZip	RPM
Staging Installation	Yes	No
INI File Configuration	Yes	Yes
Deploy Entire Cluster	Yes	No
Deploy Per Machine	Yes	Yes

	Tungsten Cluster Service Seqno		Tungsten Replicator Service Seqno
Operation	`east`	`west`	`east`	`west`
Insert/update data on `east`	Seqno Increment		Seqno Increment
Insert/update data on `west`		Seqno Increment		Seqno Increment

Command	Description
trepctl status	Shows basic variables including overall latency of Replica and number of apply channels
trepctl status -name shards	Shows the number of transactions for each shard
trepctl status -name stores	Shows the configuration and internal counters for stores between tasks
trepctl status -name tasks	Shows the number of transactions (events) and latency for each independent task in the replicator pipeline

Role	Supplies Replication Data	Receives Replication Data	Load Balancing	Failover
`Master`	Yes	No	Yes	Yes
`Slave`	No	Yes	Yes	Yes
`Standby`	No	Yes	No	Yes
`Archive`	No	Yes	Yes	No

Datasource State	Alert STATUS
ONLINE	OK
OFFLINE	WARN (for non-composite datasources)
OFFLINE	DIMINISHED (for composite passive replica)
OFFLINE	CRITICAL (for composite active primary)
FAILED	CRITICAL
SHUNNED	SHUNNED
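
The mapping in the table above can be written as a lookup. The sketch below is illustrative only: `alert_status` is a hypothetical function, the second argument uses made-up labels (`non-composite`, `composite-passive`, `composite-active`) for the three OFFLINE cases, and `UNKNOWN` is an addition for inputs the table does not cover.

```shell
#!/usr/bin/env bash
# alert_status STATE [KIND]   (hypothetical helper)
#   STATE  datasource state: ONLINE, OFFLINE, FAILED or SHUNNED
#   KIND   only consulted for OFFLINE; defaults to non-composite
alert_status() {
  local state=$1 kind=${2:-non-composite}
  case "$state" in
    ONLINE)  echo OK ;;
    FAILED)  echo CRITICAL ;;
    SHUNNED) echo SHUNNED ;;
    OFFLINE)
      case "$kind" in
        non-composite)     echo WARN ;;
        composite-passive) echo DIMINISHED ;;   # composite passive replica
        composite-active)  echo CRITICAL ;;     # composite active primary
        *)                 echo UNKNOWN ;;      # not covered by the table
      esac ;;
    *) echo UNKNOWN ;;                          # not covered by the table
  esac
}

alert_status ONLINE
alert_status OFFLINE composite-passive
```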

	Policy Mode
Ruleset	Automatic	Manual	Maintenance
Monitoring	Yes	Yes	Yes
Fault Detection	Yes	Yes	No
Failure Fencing	Yes	Yes	No
Failure Recovery	Yes	No	No
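
Read row by row, the table above says which rulesets remain active in each policy mode. A minimal sketch of that lookup follows. `policy_performs` and the lower-case, hyphenated names are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# policy_performs MODE RULESET   (hypothetical helper)
# Succeeds when the table marks RULESET as "Yes" for MODE.
policy_performs() {
  local mode=$1 ruleset=$2
  case "$ruleset:$mode" in
    monitoring:automatic|monitoring:manual|monitoring:maintenance) return 0 ;;
    fault-detection:automatic|fault-detection:manual)              return 0 ;;
    failure-fencing:automatic|failure-fencing:manual)              return 0 ;;
    failure-recovery:automatic)                                    return 0 ;;
    *) return 1 ;;
  esac
}

policy_performs maintenance monitoring \
  && echo "monitoring continues in maintenance mode"
policy_performs maintenance failure-recovery \
  || echo "no automatic recovery in maintenance mode"
```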

Policy Mode	Scenario	Datasource State	Resolution
`AUTOMATIC`
	Primary Failure		Automatic
	Primary Recovery	Master:SHUNNED(FAILED-OVER-TO-host2)	Section 6.6.2, “Recover a failed Primary”
	Replica Failure		Automatic
	Replica Recovery		Automatic
`MANUAL`
	Primary Failure	Master:FAILED(NODE 'host1' IS UNREACHABLE)	Section 6.6.2.4, “Failing over a Primary”
	Primary Recovery	Master:SHUNNED(FAILED-OVER-TO-host2)	Section 6.6.2.2, “Recover a shunned Primary”
	Replica Failure	slave:FAILED(NODE 'host1' IS UNREACHABLE)	Automatically removed from service
	Replica Recovery	slave:FAILED(NODE 'host1' IS UNREACHABLE)	Section 6.6.1, “Recover a failed Replica”
`MAINTENANCE`
	Primary Failure		Use Section 6.6.2.4, “Failing over a Primary” to promote a different Replica
	Primary Recovery		Section 6.6.2.3, “Manually Failing over a Primary in `MAINTENANCE` policy mode”
	Replica Failure		N/A
	Replica Recovery		N/A
`Any`
	Replica Shunned	slave:SHUNNED(MANUALLY-SHUNNED)	Section 6.6.1, “Recover a failed Replica”
	No Primary	slave:SHUNNED(SHUNNED)	Section 6.6.2.1, “Recover when there are no Primaries”

Step	Description	Command	host1	host2	host3
1	Initial state		Primary	Replica	Replica
2	Set the maintenance policy	set policy maintenance	Primary	Replica	Replica
3	Switch Primary	switch to host2	Replica	Primary	Replica
4	Shun `host1`	datasource host1 shun	Shunned	Primary	Replica
5	Perform maintenance		Shunned	Primary	Replica
6	Validate the `host1` server configuration	tpm validate	Shunned	Primary	Replica
7	Recover the Replica ( `host1` ) back	datasource host1 recover	Replica	Primary	Replica
8	Ensure the Replica has caught up		Replica	Primary	Replica
9	Switch Primary back to `host1`	switch to host1	Primary	Replica	Replica
10	Set automatic policy	set policy automatic	Primary	Replica	Replica
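
The sequence in the table can be turned into a printed checklist for any pair of hosts. The sketch below only prints the commands taken from the Command column and does not execute anything. `maintenance_plan` is a hypothetical helper, and the prompts are added to show where each command would be entered.

```shell
#!/usr/bin/env bash
# maintenance_plan HOST TEMP_PRIMARY   (hypothetical helper, prints only)
#   HOST          the current Primary that needs maintenance
#   TEMP_PRIMARY  the Replica promoted while HOST is being worked on
maintenance_plan() {
  local host=$1 temp_primary=$2
  cat <<EOF
cctrl> set policy maintenance
cctrl> switch to $temp_primary
cctrl> datasource $host shun
# perform maintenance on $host
shell> tpm validate
cctrl> datasource $host recover
# ensure $host has caught up
cctrl> switch to $host
cctrl> set policy automatic
EOF
}

maintenance_plan host1 host2
```

Printing the plan first makes it easier to review the host names before any command is typed into cctrl.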

	Routing Method	QoS	Latency	Affinity
Global Configuration	Yes	Implied	Yes	Yes
Connection String	Yes	Yes	Yes	Yes
`user.map`	Yes	Yes	Yes	Yes
SQL statement	Yes (with SQL routing enabled)	Yes (with SQL routing enabled)	No	No

Routing Method	Host Selection	Auto R/W Splitting	Replica Latency	Maximum Applied Latency
Smartscale	By Session	Yes (by SQL statement)	Lazy	Yes
Direct Reads	By Content	Yes (by SQL statement)	Lazy	Yes
Host-based	By Hostname	No	Yes	Yes
Port-based	By Network Port	No	No	Yes
SQL-based	By SQL comment	No	No	Yes

QoS	Primary Selected	Replica Selected
`RW_STRICT`	Yes, always	No.
`RO_RELAXED`	Only if no Replica available	Yes, if below max applied latency.
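
The two rows above amount to a short decision rule, sketched below. This is not the connector's implementation: `select_by_qos` is a hypothetical function, latencies are given as whole seconds to keep the shell arithmetic simple, and a Replica at or above the maximum applied latency is assumed to count as unavailable.

```shell
#!/usr/bin/env bash
# select_by_qos QOS MAX_LATENCY [REPLICA_LATENCY...]   (hypothetical helper)
# Prints "primary" or "replica" according to the QoS table.
select_by_qos() {
  local qos=$1 max=$2 latency
  shift 2
  case "$qos" in
    RW_STRICT)
      echo primary ;;                      # always the Primary
    RO_RELAXED)
      for latency in "$@"; do
        if [ "$latency" -lt "$max" ]; then
          echo replica                     # a Replica below the maximum exists
          return 0
        fi
      done
      echo primary ;;                      # no eligible Replica
    *)
      echo "unsupported QoS: $qos" >&2
      return 1 ;;
  esac
}

select_by_qos RW_STRICT 5 1 2     # primary
select_by_qos RO_RELAXED 5 9 3    # replica (one Replica is 3s behind)
select_by_qos RO_RELAXED 5 9 7    # primary (all Replicas too far behind)
```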

Load Balancer	Default QoS	Description
DefaultLoadBalancer	`RW_STRICT`	Always selects the Primary data source
MostAdvancedSlaveLoadBalancer	`RO_RELAXED`	Selects the Replica that has replicated the most events, by comparing data sources "high water" marks. If no Replica is available, the Primary will be returned.
LowestLatencySlaveLoadBalancer		Selects the Replica data source that has the lowest replication lag, shown as `appliedLatency` in the ls -l output within cctrl. If no Replica data source is eligible, the Primary data source will be selected.
RoundRobinSlaveLoadBalancer		Selects a Replica in a round robin manner, by iterating through them using an internal index. Returns the Primary if no Replica is found online.
HighWaterSlaveLoadBalancer	`RW_SESSION`	Given a session high water (usually the high water mark of the update event), selects the first Replica that has higher or equal high water, or the Primary if no Replica is online or has replicated the given session event. This is the default used when SmartScale is enabled.
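
As an illustration of the lowest-latency strategy described in the table, the sketch below picks a host from a small list. It is a simplified model rather than the connector's code: `pick_lowest_latency` is a hypothetical function, and the `name role latency` input format is invented for the example and is not the cctrl output format.

```shell
#!/usr/bin/env bash
# pick_lowest_latency   (hypothetical helper)
# stdin: one line per data source, "name role latency", role = primary|replica.
# Prints the Replica with the lowest latency, or the Primary if there is no Replica.
pick_lowest_latency() {
  awk '
    $2 == "primary" { primary = $1 }
    $2 == "replica" && (best == "" || $3 + 0 < min) { best = $1; min = $3 + 0 }
    END {
      if (best != "")         print best
      else if (primary != "") print primary
      else                    exit 1      # nothing to select
    }
  '
}

printf '%s\n' "host1 primary 0" "host2 replica 1.5" "host3 replica 0.2" \
  | pick_lowest_latency                   # prints host3
```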

Auto Read/Write Splitting	Yes
Primary Selection	Automatically, by SQL examination
Replica Selection	Automatically, by SQL examination
QoS Compatibility	None
SmartScale Compatibility	None

Auto Read/Write Splitting	No
Primary Selection	Manually, by SQL comments
Replica Selection	Manually, by SQL comments
QoS Compatibility	Supported
SmartScale Compatibility	Yes
Direct Compatibility	Yes

Auto Read/Write Splitting	No
Primary Selection	Manually, by hostname/IP address
Replica Selection	Manually, by hostname/IP address
QoS Compatibility	None
SmartScale Compatibility	None

Auto Read/Write Splitting	No
Primary Selection	Manually, by network port
Replica Selection	Manually, by network port
QoS Compatibility	None
SmartScale Compatibility	None

Feature	Proxy Mode	Bridge Mode
Primary/Replica Selection	Yes	Yes
Switch/Failover	Yes	Yes
Automatic Read/Write Splitting	Yes	No
Application-based Read/Write Splitting	Yes	Yes
Seamless Reconnects	Yes	No
Data Source Selection	Current data source is checked to confirm latency and affinity	Pass-through
Session KeepAlive	Yes	No

Option	Description
`client-list`	Return a list of the current client connections through this connector.
`cluster-status`	Return the cluster status, as the connector currently understands it. This is the command-line alternative to the inline cluster status command.
`condrestart`	Restart only if already running
`console`	Launch in the current console (instead of a daemon)
`dump`	Request a Java thread dump (if connector is running)
`graceful-stop [seconds]`	Stops the connector gracefully, allowing outstanding open connections to finish and close before the connector process is stopped. All new connection requests are denied. The Connector will shut down as soon as there are no active connections. [seconds] is an integer specifying the optional time to wait before terminating the connector. Specifying no value for seconds will cause the Connector to wait indefinitely for all connections to finish. Specifying zero (0) seconds will cause the Connector to shut down immediately without waiting for existing connections to complete gracefully. As of v7.0.0, connector drain is available as an alias for connector graceful-stop.
`install`	Install the service to automatically start when the system boots
`reconfigure`	Reconfigure the connector by forcing the connector to reread the configuration, including the configuration files and `user.map`.
`remove`	Remove the service from starting during boot
`restart`	Stop connector if already running and then start
`start`	Start in the background as a daemon process
`status`	Query the current status
`stop`	Stop if running (whether as a daemon or in another console). An optional timeout in seconds can be provided (from release 6.1.19 only).

Option	Description
tungsten cluster status	Displays a detailed view of the information the connector has about the cluster
tungsten connection count	Display the current number of active connections to each datasource
tungsten connection status	Displays information about the connection status for the last statement executed
tungsten flush privileges	Reload the user.map file and update the user credentials
tungsten gc	Executes the connector garbage collector to free memory
tungsten help	Shows the help description for each statement
tungsten mem info	Display the memory usage information for the connector
tungsten show processlist	List all active queries on this connector instance
tungsten show variables	Display the connector configuration options currently in use

Option	`-admin`
Description	Enter admin mode when connecting
Value Type	string

Option	`-expert`
Description	Enter expert mode when connecting
Value Type	string

Option	`-host`
Description	Host name of the service manager to use
Value Type	string
Default	localhost

Option	`-logical`
Description	Enter logical mode when connecting
Value Type	string

Continuent Ltd

Preface

1. Legal Notice

2. Conventions

3. Quickstart Guide

Chapter 1. Introduction

1.1. Tungsten Replicator

1.1.1. Transaction History Log (THL)

1.2. Tungsten Manager

1.3. Tungsten Connector

Chapter 2. Deployment

2.1. Host Types

2.1.1. Manager Hosts

2.1.2. Connector (Router) Hosts

2.1.3. Replicator Hosts

2.1.4. Witness Hosts

2.2. Requirements

2.2.1. Operating Systems Support

2.2.2. Database Support

Version Support Matrix

MySQL "Innovation" Releases

2.2.3. RAM Requirements

2.2.4. Disk Requirements

2.2.5. Java Requirements

2.2.6. Cloud Deployment Requirements

2.2.7. Docker Support Policy

2.2.7.1. Overview

2.2.7.2. Background

2.2.7.3. Current State

2.2.7.4. Summary

2.3. Deployment Sources

2.3.1. Using the TAR/GZipped files

2.3.2. Using the RPM package files

2.4. Common tpm Options During Deployment

2.5. Best Practices

2.5.1. Best Practices: Deployment

2.5.2. Best Practices: Upgrade

2.5.3. Best Practices: Operations

2.5.4. Best Practices: Maintenance

Chapter 3. Deployment: MySQL Topologies

3.1. Deploying Standalone HA Clusters

3.1.1. Prepare: Standalone HA Cluster

3.1.2. Install: Standalone HA Cluster

3.1.3. Best Practices: Standalone HA Cluster

3.2. Deploying Composite Active/Passive Clustering

3.2.1. Prepare: Composite Active/Passive Cluster

3.2.2. Install: Composite Active/Passive Cluster

3.2.3. Best Practices: Composite Active/Passive Cluster

3.2.4. Adding a remote Composite Cluster

3.3. Deploying Multi-Site/Active-Active Clustering

3.3.1. Prepare: Multi-Site/Active-Active Clusters

3.3.2. Install: Multi-Site/Active-Active Clusters

3.3.3. Best Practices: Multi-Site/Active-Active Clusters

3.3.4. Configuring Startup on Boot

3.3.5. Resetting a single dataservice

3.3.6. Resetting all dataservices

3.3.7. Provisioning during live operations

3.3.8. Adding a new Cluster/Dataservice

3.3.9. Enabling SSL for Replicators Only

3.3.10. Dataserver maintenance

Option	`-multi`
Description	Allow support for connecting to multiple services
Value Type	string

Option	`-no-history`
Description	Disable command history
Value Type	string

Option	`-physical`
Description	Enter physical mode when connecting
Value Type	string

Option	`-port`
Description	Specify the TCP/IP port of the service manager
Value Type	string
Default	9997

Option	`-proxy`
Description	Operate as a proxy service
Value Type	string

Option	`-service`
Description	Connect to a specific service
Value Type	string

Option	`-timeout`
Description	Specify how long to wait (in seconds) before timing out when unable to connect to the manager. Default: 30 seconds.
Value Type	string

Option	Description
admin	Change to admin mode
cd	Change to a specific site within a multisite service
cluster	Issue a command across the entire cluster
cluster validate	Validate the cluster quorum configuration
create composite	Create a composite dataservice
datasource	Issue a command on a single datasource
expert	Change to expert mode
failover	Perform a failover operation from a primary to a replica
help	Display the help information
ls	Show cluster status
members	List the managers of the dataservice
physical	Enter physical mode
ping	Test host availability
quit, exit	Exit cctrl
recover master using	Recover the Primary within a datasource using the specified Primary
replicator	Issue a command on a specific replicator
router	Issue a command on a specific router (connector)
service	Run a service script
set	Set management options
set master	Set the Primary within a datasource
show topology	Shows the currently configured cluster topology
switch	Promote a Replica to a Primary

Option	Description
backup	Backup a datasource
connections	Displays the current number of connections running to the given node through connectors.
drain	Prevents new connections from being made to the given data source, while ongoing connections remain untouched. If a timeout (in seconds) is given, ongoing connections will be severed only after the timeout expires.
fail	Fail a datasource
host	Hostname of the datasource
offline	Put a datasource into the offline state
online	Put a datasource into the online state
recover	Recover a datasource into operation state as Replica
restore	Restore a datasource from a previous backup
shun	Shun a datasource
welcome	Welcome a shunned datasource back to the cluster

Option	Description
`-c`	Report a critical status if the latency is above this level
`--perslave-perfdata`	Show the latency performance information on a per-Replica basis
`--perfdata`	Show the latency performance information
`-w`	Report a warning status if the latency is above this level
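
The two thresholds can be read as a simple comparison, sketched below. This is not the check script itself: `latency_status` is a hypothetical function, the `OK`, `WARNING` and `CRITICAL` labels follow the common monitoring convention and are an assumption here, and the critical level is assumed to be tested before the warning level.

```shell
#!/usr/bin/env bash
# latency_status LATENCY WARN CRIT   (hypothetical helper)
#   LATENCY  measured latency in seconds (decimals allowed)
#   WARN     value that would be passed with -w
#   CRIT     value that would be passed with -c
latency_status() {
  awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (l + 0 > c + 0)      print "CRITICAL"   # above the -c level
    else if (l + 0 > w + 0) print "WARNING"    # above the -w level only
    else                    print "OK"
  }'
}

latency_status 0.4 5 10    # OK
latency_status 7 5 10      # WARNING
latency_status 12 5 10     # CRITICAL
```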