Tungsten Clustering

Tungsten Clustering 7.0.2

Build: 161

Release Date: 9 Jan 2023

End of Life Date: 28 Jun 2026

Release 7.0.2 contains a number of key bug fixes and improvements.

v7.0.2 was originally released on 9th Dec 2022 as build 145, and re-released on 9th Jan 2023 as build 161.

Behavior Changes (7)

The following changes may affect existing scripts and integration tools. Any scripts or environments that make use of these tools should be checked and updated for the new configuration:

Command-line Tools (3)

Both the tungsten_get_ports and tpm report commands have been updated to use the ss OS command when the netstat OS command is unavailable or deprecated.
Issue: CT-2007
The check_tungsten.sh script was deprecated in release 6.1.18 and has now been removed in this release.
Issue: CT-1939
The various user-xxxx.log files are no longer generated.
Issue: CT-1914

Core Replicator (1)

repl-svc-extractor-multi-frag-service-detection is now turned **ON** by default. Event shards are determined at extraction time. With fragmented events, the shard cannot be determined by reading only the first fragment; the last fragment must be checked as well. Leaving this setting OFF causes no issues for pipelines that don't need it, i.e. those with no downstream replicas using parallel apply. However, because shard detection happens at extraction time and the result is stored in the THL, later adding a replica that uses parallel apply, or switching an existing replica to it, could introduce issues.
Note
It can be disabled if you see a performance overhead but this should be done with caution. For Aurora<>Aurora Active/Active deployments it is essential that this property be left ON.
Issue: CT-1959

Connector (1)

Improved tungsten show processlist by running underlying commands in parallel.
Issue: CT-1569

Manager (1)

A failsafe shunned cluster (caused by a network split) will now be automatically recovered after the network connection is re-established.
Issue: CT-241

Core Clustering (1)

Changed the behavior of the connector and manager upon local site failover:
Write traffic is no longer sent to the remote site unless the local site is fully offline. In case of a local failover, the connector will now pause connections until a new primary is elected. This avoids the risk of out-of-order apply after a local failover.
Issue: CT-2009

Improvements, new features and functionality (25)

Command-line Tools (16)

The tungsten_reset_manager command can now simply print the path or paths to be cleared, one per line, via the -l or --list arguments.
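Because the list output is one path per line, it can be piped into a small review step before anything is cleared. The following is a minimal sketch only: review_paths is a hypothetical helper name, and it assumes nothing about the paths beyond the one-per-line format described above.

```shell
# Read paths (one per line) on stdin and report whether each one exists.
# review_paths is a hypothetical helper, not part of the Tungsten tools.
review_paths() {
  while IFS= read -r p; do
    # Skip blank lines
    [ -n "$p" ] || continue
    if [ -e "$p" ]; then
      printf 'exists   %s\n' "$p"
    else
      printf 'missing  %s\n' "$p"
    fi
  done
}

# Usage (on a node where the tool is installed):
#   tungsten_reset_manager -l | review_paths
```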
Issue: CT-1917
The tungsten_generate_haproxy_for_api and tpm generate-haproxy-for-api commands no longer call the Perl Data::Dumper module.
Issue: CT-1915
The tungsten_generate_haproxy_for_api and tpm generate-haproxy-for-api commands now support using connector hosts in the backend definitions via -c, and extra backend flags to the backend hosts lines using -f.
Issue: CT-1909
The tungsten_merge_logs command now supports the --before TIMESTAMP and --after TIMESTAMP filters.
Issue: CT-1869
A new thl dsctl option has been added to the thl command, and a new -event option has been added to thl list.
Issue: CT-2012
The tpm report command now prints the hostname and listener ports where available when using the --extra|-x option or the new --ports option.
Issue: CT-1969
A new standalone status script has been added called tungsten_get_status that shows the datasources and replicators for all nodes in all services along with seqno and latency.
Issue: CT-1962
The tpm ask command has five new variables available: dsrole and dsstate for the current datasource, trrole and trstate for the current replicator, and nodeinfo, which displays all 4 of the other new variables.
Issue: CT-1944
The tpm ask stages and tpm ask allstages commands have been added to display the Replicator stages for the current node (stages) and the stages for each role (allstages).
Issue: CT-1943
The tpm command calls to glob have been improved to be more strict and compliant.
Issue: CT-1940
The tmonitor command now accepts command-line arguments to specify the ports, and will auto-configure the ports if they have been changed via the Tungsten configuration.
Issue: CT-1919
The tpm ask summary command now provides the coordinator host and the isCoordinator boolean if the Manager is running on that node.
Also, tpm ask now supports direct calls to coordinator, [isCoordinator|iscoordinator] and [isBridgeMode|isBridge|bridge|isbridge|isbridgemode].
Issue: CT-1874
Added a new log file (tungsten-replicator/log/data-drift.log) for data drift messages, i.e.:
- an update statement was logged on the primary, but did not update any row on the replica
- a delete statement was logged on the primary, but did not delete any row on the replica
Issue: CT-1873
The trepctl status command will now show the last known applied seqno and latency.
This information is stored on disk at regular intervals (10s minimum) so as not to overload the replicator, so the displayed value may be slightly stale, depending on when the status command was issued.
By default, this feature is disabled. It can be enabled by setting the following parameter in the configuration:
```
svc-applier-last-applied-write-interval=20
```
This will write the current position to disk every 20 seconds. This information is also exported by the Prometheus exporter.
If the service is online, it will display the current value (the same as appliedLastSeqno and appliedLatency):
```
shell> trepctl status 
Processing status command... 
NAME VALUE 
---- ----- 
appliedLastEventId : mysql-bin.000017:0000000151329854;70 
appliedLastSeqno : 999 
appliedLatency : 347707.0 
... 
lastKnownAppliedLatency: 347707.0 
lastKnownAppliedSeqno : 999 
...  
```
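For scripted checks, the new fields can be pulled out of the status output with standard text tools. This is a rough sketch rather than a supported interface: trepctl_field is a hypothetical helper name, and it assumes the "name : value" layout shown in the sample output above.

```shell
# Print the value of one field from `trepctl status` output read on stdin.
# trepctl_field is a hypothetical helper; the "name : value" layout is
# assumed from the sample output and may differ between versions.
trepctl_field() {
  awk -v key="$1" '
    {
      # Split each line at the first colon only, since values such as
      # appliedLastEventId contain colons themselves.
      pos = index($0, ":")
      if (pos == 0) next
      name = substr($0, 1, pos - 1)
      value = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key && !found) { print value; found = 1 }
    }'
}

# Usage (requires a running replicator):
#   trepctl status | trepctl_field lastKnownAppliedSeqno
```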
Issue: CT-1823
A new -c option is now available with some trepctl commands that can be used in conjunction with the -r option to indicate the number of times to refresh before automatically terminating. For example, the following command:
```
shell> trepctl perf -r 3 -c 10
```
Will refresh the output every 3 seconds, 10 times.
Issue: CT-679
rsync is now an option in tprovision, in addition to xtrabackup and mysqldump. To use rsync, specify -m rsync.
When using rsync, a replica is provisioned in 2 passes by default:
- The first pass will live copy (seed) the replica from the source.
- The second pass will quiesce the source and run the rsync again, resulting in shorter downtime than a single-pass rsync.
Issue: CT-338

Core Replicator (4)

Added a new feature that enables pausing a replicator stage for some amount of time.
This will pause the given stage for 100 seconds:
```
trepctl pause -stage thl-to-q -time 100
```
This will pause the stage indefinitely (or until a restart, etc.). Add -y to skip the confirmation prompt:
```
trepctl pause -stage thl-to-q  
```
For the previous 2 commands, running a pause command again will override the previous command.
This will resume the suspended stage (Note that if the stage is not paused, this will have no effect):
```
trepctl resume -stage thl-to-q
```
Note
Please note this pause does not survive a replicator restart or a service offline/online.
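The pause and resume commands can be combined into a wrapper that holds a stage only for the duration of another task. The sketch below is illustrative: with_stage_paused is a hypothetical function name, it assumes trepctl is on the PATH, and placing -y after the -stage argument is an assumption about argument order.

```shell
# Pause one replicator stage, run a command, then resume the stage.
# with_stage_paused is a hypothetical helper, not a Tungsten tool.
# Assumption: -y may follow the -stage argument.
with_stage_paused() {
  stage=$1
  shift
  # Pause indefinitely; -y skips the confirmation prompt.
  trepctl pause -stage "$stage" -y || return 1
  rc=0
  "$@" || rc=$?
  # Always resume, even if the wrapped command failed.
  trepctl resume -stage "$stage"
  return "$rc"
}

# Usage: with_stage_paused thl-to-q sleep 30
```

Since the pause does not survive a replicator restart or a service offline/online, a wrapper like this is only suitable for tasks that leave the replicator running.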
Issue: CT-1912
Added a way to configure the maximum number of rows that can be grouped together when applying row based events for multiple insert or delete statements.
For these properties to take effect, you must ensure that optimize-row-events=true is either explicitly set in your configuration, or not present (since it is enabled by default).
For example, the following settings will limit the number of inserted or deleted rows applied at once to 10:
```
optimize-row-events-limit-insert-rows=10 
optimize-row-events-limit-delete-rows=10
```
The default values if not specified will be 50 for inserts and 100 for deletes. Note that for deletes to be optimized, the affected table MUST have a single column PK.
Issue: CT-1980
A new replicator role (thl-applier) has been added to allow a replicator service to apply its locally available THL, without pulling from a remote host.
Issue: CT-1936
Per-service tuning of the replicator thl directory is now possible for multi-service replicator-only installs as well as for clustering. The given value should be the base directory, to which tungsten will add the service name. For example, the following entry in the tungsten.ini:
```
[alpha]
...
...
thl-directory=/drv1/thl
...
```
Would result in the THL being placed in /drv1/thl/alpha.
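The path rule above (base directory plus service name) can be expressed as a small helper, which may be useful when scripting disk checks across services. thl_service_dir is a hypothetical name used only for illustration, and the sketch assumes no other path components are added.

```shell
# Derive the effective THL directory for a service:
# the configured base directory followed by the service name.
# thl_service_dir is a hypothetical helper, not a Tungsten tool.
thl_service_dir() {
  base=${1%/}      # strip one trailing slash, if present
  service=$2
  printf '%s/%s\n' "$base" "$service"
}

thl_service_dir /drv1/thl alpha   # prints /drv1/thl/alpha
```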
Note
Update of thl directory is only available when tpm is called from the staging installation directory, **NOT** from the running directory.
Issue: CT-1927

Connector (3)

Added a logging configuration example to print load balancer activity.
Issue: CT-1966
Two new commands have been introduced to cctrl to provide better insight and control of the connectors. These are:
- datasource <dsname> connections [-l] This command displays the current number of connections to the given node through the connectors.
- datasource <dsname> drain [optional timeout] This command prevents new connections from being made to the given data source, while ongoing connections remain untouched.
Issue: CT-1949
The connector graceful-stop command now properly supports the systemd service manager.
The connector stop command now takes an optional timeout argument that makes it a graceful stop. If connector stop is run without the parameter, it stops the connector immediately. If a positive number of seconds is passed, it refuses new connections and waits, at most, that long for existing connections to disconnect, after which it force-closes all remaining connections and shuts down the connector.
The behavior of connector graceful-stop is unchanged: without the parameter, the connector waits "forever" for connections to disconnect. A positive timeout in seconds can be passed to sever connections after the given delay.
Issue: CT-1921

Manager (1)

Added a new tpm option, manager-replicator-offline-timeout, that configures how long the manager waits for the replicator to go offline. When the parallel applier is in use, the previous default timeout was too low, so it is now user-configurable and can be adjusted to suit different topologies. If not supplied, the default is 180 (3 minutes), which should be sufficient in most use cases.
Issue: CT-1892

API (1)

New log files have been added for the REST API. These are as follows:
- service_logs/connector-api.log
- service_logs/manager-api.log
- service_logs/replicator-api.log
Issue: CT-1983

Bug Fixes (30)

Installation and Deployment (4)

Fixes issues where fixed properties and filters passed to tpm in service stanzas were not being configured correctly.
Issue: CT-1463
The tpm install and tpm update commands now properly support the thl-port option for cross-site subservices.
Issue: CT-1953
The JVM version is no longer printed using Tanuki wrapper functionality, which was creating defunct java processes at startup; internal code is now used instead.
Issue: CT-1876
ddlscan, dsctl and tungsten_send_diag are now added to the aliases.sh script.
Issue: CT-813

Command-line Tools (10)

The tpm diag command now passes when the nodename defined in the tungsten.ini is the shortname, and DNS returns the FQDN.
Issue: CT-1908
Fixes an issue that prevented ddlscan from connecting to MySQL if SSL was enabled.
Issue: CT-1808
Note
This fix was released in Tungsten Clustering and Tungsten Replicator 7.0.2 Build 161.
The tpm command checks for the existence of the mysql command-line client when installing/upgrading. The process will no longer abort with an error on non-MySQL targets such as heterogeneous replicator appliers, or Active-Witness hosts.
Issues: CT-1924, CT-2018
Both TungstenAPI and tpasswd now properly update the .passwords.store.orig backup file, so that legitimate password changes no longer cause tpm update to fail.
Issue: CT-1981
The tpm mysql command no longer aborts with an access denied error on CentOS 6.
Issue: CT-1977
The tpm mysql command will now gracefully handle being run on a non-database node.
Issue: CT-1946
Fixes an issue that prevented dsctl from connecting to MySQL if SSL was enabled.
Issue: CT-1928
The tpm diag command now gathers the mysql.log file when SSL is enabled in the server.
Issue: CT-1920
The tpm command now allows any case for section entries (i.e. [alpha_FROM_beta]) in the INI files.
Issue: CT-1879
The tungsten_skip_seqno command no longer fails when -i is specified, and now properly filters using --filter when there is a long error message.
Issue: CT-1877

Backup and Restore (1)

The cluster_backup script will no longer back up a replica if the replicator is in an ERROR state.
Issue: CT-1036

Core Replicator (6)

Fixed an issue where filtered events would trigger a needless update to the service trep_commit_seqno table; the row is overwritten anyway once the last statement of the applied event is done, just prior to committing the whole block.
Issue: CT-1931
Fixes an issue that prevented geometry datatypes with SRID from being replicated.
Issue: CT-1904
Fixed a possible issue when recovering an old primary as a replica after failover with parallel apply enabled, which could leave the replica unable to come online and require it to be reprovisioned.
Issue: CT-1890
Fixes an issue that would prevent a service from going offline at a specified time (trepctl online -until-time) when parallel apply is enabled. This is a rework of CT-1243.
Issue: CT-1684
Fixed a parsing issue that would prevent the replicator from correctly detecting a CREATE TABLE statement with START TRANSACTION.
Issue: CT-1987
Fixed an issue where the replicator would hang after applying a DROP TABLE event that originally failed on the primary but was still logged to the binlog.
Issue: CT-1973

Filters (1)

Fixed an issue where the dropsqlmodes filter would fail to remove invalid SQL modes from a multi-statement event.
Issue: CT-1993

Connector (3)

Fixed the connector logging configuration to show the hostname and the class printing the logs.
Issue: CT-1965
The warning "SequenceException: Parents differ" is no longer printed when canceling connections of a composite data service.
Issue: CT-1964
The connector now auto-detects the default authentication plugin by retrieving the MySQL data source variable default_authentication_plugin rather than relying only on the MySQL server version.
Issue: CT-1926

Manager (2)

A bug has been fixed that, in a few very rare cases, would allow replicas to continue to pull and apply THL from a failed primary whilst a failover was in the process of electing a new primary. This resulted in failovers being unable to complete fully. Whilst the new primary would be online and functioning, existing replicas in the cluster could experience errors due to THL discrepancies between the old and new primary nodes.
Issue: CT-1986
The cctrl command datasource slave now sets the replicator role correctly. Previously, only the datasource role would change.
Issue: CT-1882

API (3)

The REST API call /api/v2/manager/control/service/<service>/datasource/<datasource>/slave now sets the role of the replicator correctly.
Issue: CT-1975
Calls to /api/v2/manager/cluster/status now return properly when a peer cluster is fully offline or unreachable.
Issue: CT-1945
Fixed the REST API /api/v2/manager/control/service/<service>/switch call so that it no longer switches to a shunned node.
Issue: CT-796