Common Reference

Error/Cause/Solution

This page collects known errors observed in the field, along with their cause and recommended solution. Use the search box to narrow the list by keyword, tag, error text, version, or id. The summary table is sorted by the most recently updated entry first; click any row to jump to its detail block below.

Updated	Error	Tags
2026-04-30	Replication latency very high	replicator, latency, lag
2025-03-20	Services requires a reset	replicator, reset
2025-03-20	MySQLExtractException: unknown data type 0	extractor, mysql
2021-11-29	MySQL 8.0+, User Roles and Smartscale	cluster, tpm, connector, smartscale, mysql8
2021-10-14	Cluster remains in MAINTENANCE mode after `tpm update`	cluster, tpm, staging, maintenance, policy
2020-10-13	Unexpected failure while extracting event	replicator, extractor, mysql, binlog
2020-01-21	WARNING: An illegal reflective access operation has occurred	java, java9, manager
2019-03-04	ERROR >> host1 >> can't alloc thread	host, prerequisites
2019-01-11	ERROR backup.BackupTask Backup operation failed: null	backup, xtrabackup
2017-02-15	WARN [KeepAliveTimerTask] - Error while sending a KEEP_ALIVE query to connection	connector, keepalive, error, mysql
2016-05-18	Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768	replicator, memory, fragment size
2016-05-18	Replicator reports an Out of Memory error	replicator, out-of-memory, oom
2015-06-01	The Primary replicator stopped with a JDBC error.	jdbc, primary
2015-06-01	Latency is high: master:ONLINE, progress=41331580333, THL latency=78849.733	replicator, latency, primary
2015-06-01	Lots of entries added to replicator log
2015-06-01	element 'mysql_readonly' not found in path
2015-06-01	pendingExceptionMessage": "Unable to update last commit seqno: Incorrect datetime value: '2025-03-13 02:02:26' for column 'update_timestamp' at row 1	timestamp, replicator
2015-06-01	Error: could not settle on encryption_client algorithm
2015-06-01	[S1000][unixODBC][MySQL][ODBC 5.3(w) Driver]SSL connection error: unknown error number [ISQL]ERROR: Could not SQLConnect	zabbix, odbc
2015-06-01	Backup agent name not found: xtrabackup-full	xtrabackup
2014-07-28	OptimizeUpdatesFilter cannot filter, because column and key count is different.	filters, optimize
2013-11-01	MySQL is incorrectly configured	mysql, configuration
2013-11-01	Replicator fails to connect after updating password	replicator, user, password
2013-11-01	cctrl reports MANAGER(state=STOPPED)	manager, stopped, clustering
2013-11-01	Triggers not firing correctly on Replica	triggers
2013-11-01	trepctl status hangs	trepctl, status, hang
2013-11-01	Connector shows errors with "java.net.SocketException: Broken pipe"	connector, java, directread
2013-11-01	ERROR 2013 (HY000) at line 583: Lost connection to MySQL server during query	connection, mysql
2013-11-01	Starting replication after performing a restore because of an invalid restart sequence number
2013-11-01	ERROR 1010 (HY000) at line 5094506: Error dropping database (can't rmdir './mysql-bin/', errno: 17)	drop database, mysql, mysqldump, restore
2013-11-01	ERROR 1580 (HY000) at line 5093787: You cannot 'DROP' a log table if logging is enabled	drop table, mysql, logging, mysqldump, restore
2013-11-01	Backup/Restore is not bringing my host back to normal	backup, restore
2013-10-09	Too many open processes or files	limits, ubuntu, debian, ulimit
2013-08-07	Unable to update the configuration of an installed directory	tpm, staging, deployment
2013-07-17	Number of connections exceeded for MySQL	connector, memory, mysql, operations

Replication latency very high

Error

The latency of updates on the Replicas is very high.

Cause

First the reason and location of the delay should be identified. It is possible for replication data to have been replicated quickly, but applying the data changes is taking a long time. Using row-based replication may increase the latency due to the increased quantity of data that must be transferred.

Solution

Replication lag can have a number of causes, and more in-depth investigation is usually required. Some common causes are:

Slow network causing slow transfer of THL
Long-running transaction giving the appearance of lag, or of replication hanging
Underperforming MySQL requiring deeper performance tuning

trepctl perf and trepctl perflog can also be useful in checking replication statistics.

Services requires a reset

Error

The replicator service needs to be reset, for example if your MySQL service has been reconfigured, or when resetting a data warehouse or batch loading service after a significant change to the configuration.

Cause

A reset is needed if the replicator stops replicating effectively, or if the configuration and/or schema of a source or target in a data warehouse loading solution has changed significantly. Resetting the service restarts extraction from the current point, and the target/Replica from the new Primary position. It also resets all the positions for reading and writing.

Solution

To reset a service entirely within a Tungsten Cluster, without having to perform a re-installation, refer to the steps in "Resetting a Dataservice".
To reset a service entirely within a standalone Tungsten Replicator installation, without having to perform a re-installation, refer to the steps in "The trepctl reset Command".

MySQLExtractException: unknown data type 0

Error

Replication fails to extract the data from the MySQL binary log, and the replicator will not go online again.

Cause

This error points to possible use of a data type that has been introduced or changed between MySQL versions.

Solution

It would be advisable to contact Continuent Support for further analysis to establish the exact cause and for a solution.

MySQL 8.0+, User Roles and Smartscale

Error

When configuring a connector to use SmartScale, you receive an error indicating that users do not have the required repl_client privilege, despite the privilege being granted via a Role (new in MySQL 8.0).

Cause

Known issue occurring in all versions that support MySQL 8.0. tpm is not aware of Roles, and is therefore unable to determine that privileges have been granted in this way.

Solution

Provided you are certain the correct privileges have been granted, you can instruct tpm to ignore this particular validation check by adding the following option to your configuration:

skip-validation-check=MySQLConnectorPermissionsCheck

Cluster remains in MAINTENANCE mode after `tpm update`

Error

After issuing ./tools/tpm update --replace-release from a remote staging host, the cluster policy remains in MAINTENANCE mode.

Cause

Known issue occurring from v6.1.5 onwards.

Solution

Following the update, log into cctrl and check the cluster policy mode. If the cluster is still in MAINTENANCE and this is NOT expected, issue:

cctrl> set policy automatic

Unexpected failure while extracting event

Error

Replicator (extractor) is unable to stay online and extract an event. Error logs consistently show a stack trace similar to the following:

2020/07/24 15:06:14.637 | Event extraction failed 
2020/07/24 15:06:14.637 | com.continuent.tungsten.replicator.extractor.ExtractorException: Unexpected failure 
                          while extracting event myhost-db-04.qa.mydomain.local (1334) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.extractEvent(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.extract(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.ExtractorWrapper.extract(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.ExtractorWrapper.extract(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(Unknown Source) 
2020/07/24 15:06:14.637 | at java.lang.Thread.run(Thread.java:748) 
2020/07/24 15:06:14.637 | Caused by: java.lang.IndexOutOfBoundsException 
2020/07/24 15:06:14.637 | at java.io.DataInputStream.readFully(DataInputStream.java:192) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.common.io.BufferedFileDataInput.readFully(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.BinlogReader.read(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.LogEvent.readDataFromBinlog(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.LogEvent.readLogEvent(Unknown Source) 
2020/07/24 15:06:14.637 | at com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.processFile(Unknown Source) 
2020/07/24 15:06:14.637 | ... 7 more

Cause

You could be hitting a MySQL bug where the binlog is over-writing itself due to periods in the log-bin my.cnf entry. See [https://bugs.mysql.com/bug.php?id=75507](https://bugs.mysql.com/bug.php?id=75507) for more details. Example of my.cnf entry that may trigger this bug:

log-bin = /data/mysql/myhost-db-04.qa.mydomain.local.com-bin

Solution

Replace the dots with hyphens and restart MySQL. Example of a fixed my.cnf entry:

log-bin = /data/mysql/myhost-db-04-qa-mydomain-local-com-bin

If the binlog pattern is changed within MySQL after installation, the replicator configuration may also need to change. To do this, add the datasource-log-pattern option to your configuration and issue tpm update.
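The renaming of the log-bin value can be sketched in shell. This is only an illustration, not a product tool: the function name dots_to_hyphens is hypothetical, and it rewrites only the final path component on the assumption that the directory names should stay as they are.

```shell
# Hypothetical helper: replace dots with hyphens in the file-name part
# of a log-bin value, leaving the directory part unchanged.
dots_to_hyphens() {
    dir=$(dirname "$1")
    base=$(basename "$1" | tr '.' '-')
    if [ "$dir" = "." ]; then
        printf '%s\n' "$base"
    else
        printf '%s/%s\n' "$dir" "$base"
    fi
}

# Uses the example value from the Cause section.
dots_to_hyphens /data/mysql/myhost-db-04.qa.mydomain.local.com-bin
```

The result still has to be written to my.cnf by hand, followed by a MySQL restart.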

WARNING: An illegal reflective access operation has occurred

Error

The following warning may be seen in the Manager logs when running v6.1.2 and above in conjunction with Java 9 and above:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.drools.core.rule.builder.dialect.asm.ClassGenerator 
(file:/opt/continuent/releases/tungsten-clustering-6.1.2-75_pid16553/tungsten-manager/lib/drools-core-6.3.0.Final.jar) 
to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int)
WARNING: Please consider reporting this to the maintainers of org.drools.core.rule.builder.dialect.asm.ClassGenerator
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Cause

This is a known issue due to new encapsulation controls in Java 9+

Solution

This warning does not affect operations of the Cluster and can be safely ignored

ERROR >> host1 >> can't alloc thread

Error

tpm installation fails with this error.

Cause

This error is most commonly caused by incorrect OS permissions.

Solution

Review "Directory Locations and Configuration"
Ensure all installation paths are owned by the correct OS user
Ensure the OS user is configured with the correct sudo rights
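The ownership check can be sketched with find. The helper name list_wrong_owner is hypothetical, and the directory and user in the usage comment are assumptions that should be replaced with the values from your own installation.

```shell
# Hypothetical helper: print every path under directory $1 that is NOT
# owned by user $2 (a user name or numeric user ID).
# No output means ownership is consistent.
list_wrong_owner() {
    find "$1" ! -user "$2" -print
}

# Usage (directory and user name are assumptions):
#   list_wrong_owner /opt/continuent tungsten
```

Any path that is printed is a candidate for a chown to the installation user.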

ERROR backup.BackupTask Backup operation failed: null

Error

A full Xtrabackup backup has failed, and left the datasource's replicator offline.

Cause

A common cause of this failure is the existence of zero-length store-*.properties files in the backups/[serviceName]/ directory.

Solution

Simply remove any zero-byte store-*.properties files from the backups/[serviceName]/ directory and retry the backup.
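One way to find the affected files before removing them is sketched below. The helper name list_empty_store_files is hypothetical, and the backup directory must be supplied by you. The sketch assumes a find implementation that supports -maxdepth (GNU and BSD find do).

```shell
# Hypothetical helper: list zero-length store-*.properties files located
# directly inside the backup directory $1 (subdirectories are not searched).
list_empty_store_files() {
    find "$1" -maxdepth 1 -type f -name 'store-*.properties' -size 0 -print
}

# Usage: pass the backups/[serviceName]/ directory of the affected service.
#   list_empty_store_files /path/to/backups/serviceName
```

Listing first and deleting second keeps the step reviewable. Once the output looks right, the listed files can be removed with rm.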

WARN [KeepAliveTimerTask] - Error while sending a KEEP_ALIVE query to connection

Error

Connections to MySQL through the connector report KeepAliveTimerTask errors in the connector.log file

Cause

Possible causes include local scripts that kill stale connections after some fixed period of time, or the wait_timeout was changed without restarting the connector.

Solution

Check for local scripts that are killing connections
Restart the Connector

shell> connector restart

Attempt to write new log record with equal or lower fragno: seqno=3 previous stored fragno=32767 attempted new fragno=-32768

Error

The number of fragments in a single transaction has been exceeded.

Cause

The maximum number of fragments within a single transaction within the network protocol is limited to 32768. If there is a very large transaction that exceeds this number of fragments, the replicator can stop and be unable to continue. The total transaction size is a combination of the fragment size (default is 1,000,000 bytes, or 1MB), and this maximum number (approximately 32GB).

Solution

It is not possible to change the number of fragments in a single transaction, but the size of each fragment can be increased to handle much larger single transactions. To change the fragment size, configure the property=replicator.extractor.dbms.transaction_frag_size parameter. For example, by doubling the size, a transaction of 64GB could be handled:

property=replicator.extractor.dbms.transaction_frag_size=2000000

If you change the fragment size in this way, the service on the extractor must be reset so that the transaction can be reprocessed and the binary log is parsed again. You can reset the service by using the trepctl reset command.
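The size limits quoted in this entry follow from a single multiplication: the maximum number of fragments multiplied by the fragment size. The sketch below reproduces that arithmetic. The function name max_transaction_bytes is hypothetical; the constants are the ones stated in this entry.

```shell
# Upper bound on a single transaction: fragments per transaction
# multiplied by the fragment size in bytes.
MAX_FRAGMENTS=32768

max_transaction_bytes() {
    echo $(( MAX_FRAGMENTS * $1 ))
}

max_transaction_bytes 1000000   # default fragment size
max_transaction_bytes 2000000   # doubled fragment size
```

The two calls print 32768000000 and 65536000000, which correspond to the approximate 32GB and 64GB figures above.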

Replicator reports an Out of Memory error

Error

The replicator runs out of memory, triggers a stack trace indicating a memory condition, or fails to extract the transaction information from the MySQL binary log.

Cause

The replicator operates by extracting (or applying) an entire transaction. This means that when extracting data from the binary log, and writing that to THL, or extracting from the THL in preparation for applying to the target, the entire transaction, or an entire statement within a multi-statement transaction, must be held in memory. In the event of a very large transaction having to be extracted, this can cause a problem with the memory configuration. The actual configuration of how much memory is used is determined through a combination of the number of fragments, the size of the internal buffer used to store those fragments, and the overall fragment size.

Solution

Although you can increase the overall memory allocated to the replicator, changing the internal sizes used can also improve the performance and ability to extract data. First, try reducing the size of the buffer (buffer-size) used to hold the transaction fragments. The default for this value is 10, but reducing this to 5 or less will ease the required memory:

buffer-size=5

Altering the size of each fragment can also help, as it reduces the memory required to hold the data before it is written to disk and sent out over the network to Replica replicators. Reducing the fragment size will reduce the memory footprint. The size is controlled by the property=replicator.extractor.dbms.transaction_frag_size parameter:

property=replicator.extractor.dbms.transaction_frag_size=1000000

Note that if you change the fragment size, you may need to reset the service on the extractor so that the binary log is parsed again. You can reset the service by using the trepctl reset command. Finally, if you do need to increase the memory allocated to the replicator, you will need to adjust the java-mem-size property.

The Primary replicator stopped with a JDBC error.

Error

The Primary replicator stopped with a JDBC error.

Cause

The error log may show a more detailed failure with the JDBC error message:

INFO | jvm 1 | 2016/02/08 17:16:24 | 2016-02-08 17:16:24,627 [qktdb - pool-2-thread-1] ERROR management.tungsten.TungstenPlugin Unable to start replication service due to underlying error
INFO | jvm 1 | 2016/02/08 17:16:24 | java.lang.NumberFormatException: For input string: "0000002417562130"
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.lang.Integer.parseInt(Integer.java:495)
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.lang.Integer.valueOf(Integer.java:582)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.setLastEventId(MySQLExtractor.java:1139)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.extractor.ExtractorWrapper.setLastEventId(ExtractorWrapper.java:243)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.extractor.ExtractorWrapper.setLastEvent(ExtractorWrapper.java:219)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.pipeline.StageTaskGroup.prepare(StageTaskGroup.java:210)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.pipeline.Stage.prepare(Stage.java:272)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.pipeline.Pipeline.prepare(Pipeline.java:274)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.conf.ReplicatorRuntime.prepare(ReplicatorRuntime.java:642)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.management.tungsten.TungstenPlugin.online(TungstenPlugin.java:391)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.replicator.management.OpenReplicatorManager$OfflineToSynchronizingAction.doAction(OpenReplicatorManager.java:1376)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.fsm.core.StateMachine.applyEvent(StateMachine.java:220)
INFO | jvm 1 | 2016/02/08 17:16:24 | at com.continuent.tungsten.fsm.event.EventProcessor.run(EventProcessor.java:78)
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.util.concurrent.FutureTask.run(FutureTask.java:262)
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
INFO | jvm 1 | 2016/02/08 17:16:24 | at java.lang.Thread.run(Thread.java:745)

The underlying reason for the error is that MySQL has created a binlog over 2GB and the replicator could not process the event due to the limit of a Java integer.
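The overflow can be confirmed by comparing the value from the NumberFormatException with the largest value a Java int can hold (2147483647). The sketch below does that comparison in shell. The helper name exceeds_java_int is hypothetical.

```shell
# Largest value a Java int can hold (2^31 - 1).
JAVA_INT_MAX=2147483647

# Hypothetical helper: succeed (exit 0) when a zero-padded decimal
# offset is too large for a Java int.
exceeds_java_int() {
    n=$(printf '%s' "$1" | sed 's/^0*//')   # strip leading zeros
    [ "${n:-0}" -gt "$JAVA_INT_MAX" ]
}

# Value taken from the stack trace above.
if exceeds_java_int 0000002417562130; then
    echo "offset does not fit in a Java int"
fi
```

The offset 2417562130 from the stack trace is above the limit, so the message is printed.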

Solution

If the event at that position is a rotate event (check with mysqlbinlog), the solution is to reposition the replicator using dsctl.

Latency is high: master:ONLINE, progress=41331580333, THL latency=78849.733

Error

Latency is high: master:ONLINE, progress=41331580333, THL latency=78849.733

Cause

There are many possible causes for this error, however, if you see the following within the log on the Primary it may indicate a specific issue:

INFO | jvm 1 | 2016/02/09 15:01:54 | at com.continuent.tungsten.replicator.thl.CommitSeqnoTable.updateLastCommitSeqno(CommitSeqnoTable.java:548)
INFO | jvm 1 | 2016/02/09 15:01:54 | at com.continuent.tungsten.replicator.thl.CatalogManager.updateCommitSeqnoTable(CatalogManager.java:223)
INFO | jvm 1 | 2016/02/09 15:01:54 | at com.continuent.tungsten.replicator.thl.THL.updateCommitSeqno(THL.java:593)
INFO | jvm 1 | 2016/02/09 15:01:54 | at com.continuent.tungsten.replicator.thl.THLStoreApplier.apply(THLStoreApplier.java:163)
INFO | jvm 1 | 2016/02/09 15:01:54 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleThreadStageTask.java:768)
INFO | jvm 1 | 2016/02/09 15:01:54 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:501)
INFO | jvm 1 | 2016/02/09 15:01:54 | at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:176)
INFO | jvm 1 | 2016/02/09 15:01:54 | at java.lang.Thread.run(Thread.java:745)
INFO | jvm 1 | 2016/02/09 15:01:54 |

The stack trace shows that the replicator is updating the trep_commit_seqno table which is normally a very fast operation. The underlying reason may either be:

It is possible that updates to MySQL are somehow getting delayed, which would slow down the operation of the replicator as it updates each status update.
Check the block commit size, as low values will increase the number of updates to the table, and if the MySQL server updates are slow, this in turn slows down the operation of the replicator.

Solution

Focus on making sure the IO system and MySQL commits are not being blocked. You can try increasing svc-applier-block-commit-size so the replicator makes fewer commits to MySQL on the Primary. You can continue to track progress through trepctl status -name tasks, although the appliedLastSeqno value may update less often if you increase this by a lot. Beware that increasing this value too much increases the potential for data loss, since it creates fewer sync points with the Replicas.

Lots of entries added to replicator log

Error

The logging level used by Tungsten Cluster creates a lot of entries, including WARN entries, which makes it difficult to find the real errors and problems. How can the logging level be changed?

Cause

By default, Tungsten Cluster reports a lot of information and detail, including INFO and other levels of detail that may generate a lot of information. For example:

INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter
Ignoring query : No schema found for this query from event 4020717251 (SET @current_db_user := NULL...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717257 (SET @disabled_trigger := NULL...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717257 (SET @cols := ''...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717257 (SET @current_user_fk := NULL...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717257 (SET @current_db_user := NULL...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717259 (SET @disabled_trigger := NULL...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717259 (SET @cols := ''...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717259 (SET @current_user_fk := NULL...)
INFO | jvm 1 | 2016/02/18 10:16:46 | 2016-02-18 15:16:46,789 [brm - remote-to-thl-0] WARN filter.ReplicateFilter 
Ignoring query : No schema found for this query from event 4020717259 (SET @current_db_user := NULL...)

Solution

The logging level used to report status and other information, and that is written into the log, can be changed to reduce or lower the reporting level. To do this:

Edit the ~tungsten_home/tungsten/tungsten-replicator/conf/log4j.properties

Find the following line:

log4j.logger.com.continuent.tungsten.replicator.filter.ReplicateFilter=DEBUG, stdout

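The level on this line sets the threshold for this logger: with DEBUG, only entries at DEBUG and higher will be output. Because WARN ranks above DEBUG in log4j, suppressing WARN entries such as those shown above requires a higher level, such as ERROR.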

element 'mysql_readonly' not found in path

Error

The following INFO message appears in tmsvc.log every few seconds (host1 is the Primary):

INFO | jvm 1 | 2016/03/04 15:09:38 | |host1 |
INFO | jvm 1 | 2016/03/04 15:09:38 | +----------------------------------------------------------------------------+
INFO | jvm 1 | 2016/03/04 15:09:38 | |element 'mysql_readonly' not found in path |
INFO | jvm 1 | 2016/03/04 15:09:38 | |'/podsjqe/sjqedb1/conf/service/' while searching for entry |
INFO | jvm 1 | 2016/03/04 15:09:38 | |'podsjqe/sjqedb1/conf/service/mysql_readonly' |
INFO | jvm 1 | 2016/03/04 15:09:38 | +----------------------------------------------------------------------------+

Cause

This is caused by the manager

Solution

To prevent INFO messages being reported in the /opt/continuent/tungsten/tungsten-manager/log/tmsvc.log file:

Put the cluster into MAINTENANCE mode
Stop all managers
Start all managers starting with the Primary
Put the cluster into AUTOMATIC mode

pendingExceptionMessage": "Unable to update last commit seqno: Incorrect datetime value: '2025-03-13 02:02:26' for column 'update_timestamp' at row 1

Error

The following error is reported when applying an event:

pendingExceptionMessage": "Unable to update last commit seqno: Incorrect datetime value: '2025-03-13 02:02:26' for column 'update_timestamp' at row 1

Cause

The underlying reason for this error is the format and value of the datetime value that is being represented are either incompatible with the current SQL mode within MySQL, or the datetime combination is one that occurs during a DST switch, which may be incompatible with the SQL mode.

Solution

The solution is to update the SQL mode so that explicit changes are ignored when applying the data, rather than using the information defined during the session. Because the problem will be short lived and specific to the data being applied, the setting can be changed temporarily:

Edit file /opt/continuent/tungsten/tungsten-replicator/conf/static-endtest.properties

Find this line:

replicator.applier.dbms.ignoreSessionVars=autocommit

Change the line to:

replicator.applier.dbms.ignoreSessionVars=autocommit|sql_mode

Restart the replicator using:
```
shell> replicator restart
```
Wait for the replicator to come online, and process the change that originally caused the problem. Once the data has been replicated, revert the settings in the file back to the old value and restart the replicator again.

Error: could not settle on encryption_client algorithm

Error

The following error is reported when trying to connect:

Error: could not settle on encryption_client algorithm

Cause

This can be due to an acceptable cipher missing on any one of the hosts.

Solution

This is a list of acceptable ciphers:

aes128-cbc
3des-cbc
blowfish-cbc
cast128-cbc
aes192-cbc
aes256-cbc
rijndael-cbc@lysator.liu.se
idea-cbc
none
arcfour128
arcfour256

These can be configured in /etc/ssh/sshd_config under Ciphers. Try adding a supported cipher (aes256-cbc) to the end of the ciphers in your ssh server config file. Note that SSH and OpenSSL ciphers are mapped, for example like the following:

# Maps the SSH name of a cipher to its corresponding OpenSSL name
SSH_TO_OSSL = {
  "3des-cbc"                    => "des-ede3-cbc",
  "blowfish-cbc"                => "bf-cbc",
  "aes256-cbc"                  => "aes-256-cbc",
  "aes192-cbc"                  => "aes-192-cbc",
  "aes128-cbc"                  => "aes-128-cbc",
  "idea-cbc"                    => "idea-cbc",
  "cast128-cbc"                 => "cast-cbc",
  "arcfour128"                  => "rc4",
  "arcfour256"                  => "rc4",
  "arcfour512"                  => "rc4",
  "none"                        => "none"
}
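To see which ciphers a host can offer, recent OpenSSH clients can print their supported list with ssh -Q cipher. The sketch below checks whether a given cipher name appears in such a list. The helper name cipher_in_list is hypothetical, and the client-side list is only an approximation of what the server accepts, because the server's Ciphers setting in sshd_config can restrict it further.

```shell
# Hypothetical helper: succeed when the cipher named in $1 appears as a
# whole line in the newline-separated list supplied in $2.
cipher_in_list() {
    printf '%s\n' "$2" | grep -qxF -- "$1"
}

# Assumes an OpenSSH client that supports the -Q query option.
if cipher_in_list aes256-cbc "$(ssh -Q cipher 2>/dev/null)"; then
    echo "aes256-cbc is available locally"
fi
```

Running the same check on every host involved shows where an acceptable cipher is missing.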

[S1000][unixODBC][MySQL][ODBC 5.3(w) Driver]SSL connection error: unknown error number [ISQL]ERROR: Could not SQLConnect

Error

We have a new server dedicated to Zabbix monitoring. Zabbix uses an ODBC connection for MySQL. When we try to connect to a Tungsten connector from the new server using ODBC we receive an error:

[S1000][unixODBC][MySQL][ODBC 5.3(w) Driver]SSL connection error: unknown error number
[ISQL]ERROR: Could not SQLConnect

Cause

The underlying cause is related to an SSL or encryption error: either the certificate is wrong, or the ciphers being used are not supported. Examining connector.log on the Tungsten server being connected to shows an error with each attempt:

INFO | jvm 1 | 2016/05/20 13:07:17 | WARN [MySQLProtocolHandler] - [172.16.0.120:43571] Error during transfer of authentication packet: no cipher suites in common

Connecting from the new server using the mysql client may work:

[root@zabbix etc]# mysql -uzabbix -pZ@bbix487sql -hnas-db-ct01-a.safemls.net
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 40019
Server version: 5.6.20-68.0-log-tungsten Percona Server (GPL), Release 68.0, Revision 656

Connecting directly to the MySQL database on port 13306 using the ODBC connection may also work:

[root@zabbix etc]# isql -v ct01
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+

Solution

Zabbix is trying to connect to the connector with SSL encryption, but SSL is not operating. The easiest way to bypass this is to disable SSL connections for ODBC. Add the following entry in odbc.ini (under the section for the host you're testing):

useSSL = No

Backup agent name not found: xtrabackup-full

Error

A backup was taken with xtrabackup-full from the Primary. Replica appears to not be configured for xtrabackup-full, which results in there being issues with the restore. How can we configure the Replica to use xtrabackup-full for restore?

Cause

The underlying cause and indication is that the xtrabackup has not been installed properly on the Replica, or not installed at all, at the point when Tungsten Cluster was being installed. The following will be seen in the status output after a failed restore:

minimumStoredSeqNo : -1
offlineRequests : NONE
pendingError : Unable to spawn restore request
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: Backup agent name not found: xtrabackup-full

Probably the other hosts didn't require this setting specifically because xtrabackup was installed and detected when Tungsten Cluster was installed on them.

Solution

The steps you need are:

Install xtrabackup if not already installed on the Replica in question
Add the line below to /etc/tungsten/tungsten.ini:
```
backup-method=xtrabackup-full
```
And then run tpm update on that Replica host to update the configuration.

OptimizeUpdatesFilter cannot filter, because column and key count is different.

Error

When using the optimizeupdates filter, replication stops with the error message in the output from trepctl status or when examining the log file.

Cause

The optimizeupdates filter works by removing indexed columns from updates that are unnecessary when a primary key exists to locate the record. If the key information has already been removed (for example, by the pkey filter), then the columns cannot be effectively compared and optimized.

Solution

If the pkey filter is required, change the order of the filters within the specified stage within the replicator so that the optimizeupdates filter is called **before** the pkey filter.
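The ordering rule can be checked mechanically. The sketch below takes a comma-separated filter list and succeeds only when optimizeupdates comes before pkey. The helper name optimize_before_pkey and the example lists are hypothetical; where the filter list is configured depends on your installation.

```shell
# Hypothetical helper: succeed when both filters are present in the
# comma-separated list $1 and "optimizeupdates" comes before "pkey".
optimize_before_pkey() {
    order=$(printf '%s\n' "$1" | tr ',' '\n' \
        | grep -xE 'optimizeupdates|pkey' | tr '\n' ' ')
    [ "$order" = "optimizeupdates pkey " ]
}

# Usage (filter lists are illustrative):
#   optimize_before_pkey 'colnames,optimizeupdates,pkey'   # succeeds
#   optimize_before_pkey 'colnames,pkey,optimizeupdates'   # fails
```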

MySQL is incorrectly configured

Error

The configuration of MySQL was wrong; it included autocommit=0 and the wrong server-id.

Cause

Prerequisites were not followed correctly.

Solution

Edit my.cnf and clean up. Restart MySQL if possible. Alternatively, set manually:

mysql> set GLOBAL autocommit=1;
Query OK, 0 rows affected (0.00 sec)
 
mysql> set GLOBAL server_id=2;
Query OK, 0 rows affected (0.01 sec)

Replicator fails to connect after updating password

Error

Tungsten Replicator fails to connect after changing the tungsten user password.

Cause

The most likely cause is that the configuration within ~/.my.cnf was forcing a connection to the cluster as the tungsten user, and the password change may have only been made on one host and not replicated to the other MySQL servers.

Solution

First, update the credentials in ~/.my.cnf and ensure you can connect to all the Replicas with the updated credentials. Also check that tpm has been configured with the right password and that all servers have the right information. Errors such as:

ERROR>>host1>>Unable to connect to the MySQL server using 
    tungsten@host1:3306 (WITH PASSWORD) (MySQLLoginCheck)

indicate that the password may not have been replicated properly. Check the following:

Check the user configuration information within each MySQL server and compare the values:
```
mysql> select * from mysql.user where user='tungsten';
```

For any node that is not up to date, update the password manually:

shell> mysql -u root -ppassword -P 3306 -h host1 
mysql> UPDATE `mysql`.`user` SET Password=PASSWORD('secret') WHERE User='tungsten';
mysql> flush privileges;

Update the tpm and Tungsten Cluster configuration:
```
datasource-password=secret
```

Restart the replicators:

shell> replicator restart

Then put the replicators offline/online to refresh the configuration:

[LOGICAL] /alpha > datasource host1 offline
DataSource 'host1@alpha' is now OFFLINE 
[LOGICAL] /alpha > datasource host1 online 
Setting server for data source 'host1' to READ-ONLY
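
Once the steps above are complete, each host can be tested in turn to confirm that the updated credentials work everywhere. The sketch below assumes the mysql client is on the PATH and that credentials are read from ~/.my.cnf, as described in the Cause; the function name and the host names in the usage example are hypothetical.

```shell
# Hypothetical helper: run a trivial query on each host and report the result.
# No password is passed here; the client is expected to read it from ~/.my.cnf.
check_logins() {
  for host in "$@"; do
    if mysql -h "$host" -P 3306 -e 'SELECT 1' >/dev/null 2>&1; then
      echo "$host OK"
    else
      echo "$host FAILED"
    fi
  done
}
```

Running `check_logins host1 host2 host3` prints one line per host; any host reported as FAILED still needs its password updated.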

cctrl reports MANAGER(state=STOPPED)

Error

cctrl reports the status for the manager as MANAGER(state=STOPPED)

Cause

The manager has stopped running, possibly due to a fault or error state.

Solution

The manager process on this server is not running. You can start it by running:

shell> manager start

Or:

shell> /opt/continuent/tungsten/tungsten-manager/bin/manager start

Triggers not firing correctly on Replica

Error

Newly created triggers are not firing when executed

Cause

If a new user (definer) was used to create the triggers, they may fail to be executed, raising the following warning in the logs:

INFO | jvm 1 | 2013/10/16 04:21:33 | WARNING: Could not execute query 
    org.drizzle.jdbc.internal.common.query.DrizzleQuery@60dc4c81: The 
    MySQL server is running with the --read-only option so it cannot 
    execute this statement 
INFO | jvm 1 | 2013/10/16 04:21:33 | 2013-10-16 04:21:33,208 ERROR 
    replicator.pipeline.SingleThreadStageTask [q-to-dbms] Event 
    application failed: seqno=524545571 fragno=0 message=java.sql.SQLException: 
    Statement failed on slave but succeeded on master 
INFO | jvm 1 | 2013/10/16 04:21:33 | com.continuent.tungsten.replicator.applier.ApplierException: 
    java.sql.SQLException: Statement failed on slave but succeeded on master

This is an indication that the new definer does not have the required SUPER privilege and that a trigger is failing to run.

Solution

In order to fix this issue, the new definer should be given the SUPER privilege on each server and then replication should be restarted. The SUPER privilege allows the user to run a statement on a Replica server where the read_only flag has been turned on. If necessary, the scope of the privilege can be restricted to an individual schema. The GRANT statement should be done on every database server, while the shun and recover should only be done on the Replicas.

mysql> grant SUPER on *.* to user;
mysql> flush privileges;

Within cctrl if using Tungsten Clustering:

cctrl> datasource hostname shun;
cctrl> datasource hostname recover;

After restarting replication, continue to review the tungsten-replicator/log/trepsvc.log file to see what log messages are being written there. If replication is still failing, it is probably related to the same issue, and the log file can be supplied to support for review.

Important

It is strongly advised to review "Triggers" to understand the complexities of triggers in combination with Tungsten Replication and Tungsten Clustering.

trepctl status hangs

Error

The trepctl status command hangs at the end of the output after a "cannot fork" error.

Cause

This can be caused by THL corruption on the Replica. It can also be caused by an Out-Of-Memory condition in either the replicator or the OS itself.

Solution

You can recreate the THL files on the Replica(s) ONLY. This can be achieved by deleting the existing THL files, which will cause the Replica replicator to download all of the THL data from the Primary again:

shell> replicator stop
shell> cd /opt/continuent
shell> mv thl thl.old;
shell> mkdir thl
shell> replicator start
shell> trepctl status

To increase the Replicator memory, add or edit the following tpm configuration option, then run tpm update:

java-mem-size=2048

Connector shows errors with "java.net.SocketException: Broken pipe"

Error

When using DirectReads, the connector reports errors with a broken pipe.

Cause

The most likely culprit for this error is that wait_timeout and/or interactive_timeout is set too low. Pooled connections that sit idle for longer than the timeout are closed by the MySQL server, and the next attempt to use such a connection fails with a broken pipe.

Solution

Change the configuration for your MySQL server (in my.cnf) to increase these timeouts.

ERROR 2013 (HY000) at line 583: Lost connection to MySQL server during query

Error

The client was disconnected during a query and reported error number 2013.

Cause

Usually this means that the MySQL server has closed the connection or the server has restarted. The exact cause will be more difficult to determine.

Solution

To narrow down the cause, gather the following information:

Were you connected through the Tungsten Connector?
Did anything else happen on the servers at the time of the error?
If you were connected through the Tungsten Connector, collect the tungsten-connector/log/connector.log file from the server you were connected to for review.

Starting replication after performing a restore because of an invalid restart sequence number

Error

Starting replication fails because of an invalid restart sequence number. Checking the trep_commit_seqno table shows empty or invalid contents:

mysql> select * from tungsten.trep_commit_seqno;
+-------+--------+-----------+-----------+--------------+---------+-----------------+---------------------+
| seqno | fragno | last_frag | source_id | epoch_number | eventid | applied_latency | update_timestamp    |
+-------+--------+-----------+-----------+--------------+---------+-----------------+---------------------+
| -1    | NULL   | NULL      | NULL      | NULL         | NULL    | -1              | 2013-10-27 23:44:05 |
+-------+--------+-----------+-----------+--------------+---------+-----------------+---------------------+
1 row in set (0.00 sec)

Cause

The restore may have failed to correctly restore the tungsten tables.

Solution

Retry the restore process, making sure the replicator is stopped and there are no updates to the table taking place:

Ensure no replicator processes are running:
```
shell> replicator stop
```
Ensure the /opt/continuent/thl directory is empty:
```
shell> tpm reset-thl
```
Restore the backup using whatever backup/restore tool you used.
Check that trep_commit_seqno has a valid restart position in it, then start the replicator:
```
shell> replicator start
```
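
The check on trep_commit_seqno can be scripted before the replicator is started. The sketch below is a hypothetical helper that treats a restart position as valid only when the seqno value is a non-negative integer, which rules out the -1 and NULL values shown in the Error above. How the value is fetched is left to the caller; the commented query is one possibility and assumes the schema is named tungsten, as in the example above.

```shell
# Hypothetical helper: succeed only if the seqno value is a non-negative integer.
valid_restart_seqno() {
  case "$1" in
    ''|NULL|*[!0-9]*) return 1 ;;   # empty, NULL, negative, or non-numeric
    *) return 0 ;;
  esac
}

# Possible usage (assumes the mysql client can log in without prompting):
# seqno=$(mysql -N -B -e 'SELECT seqno FROM tungsten.trep_commit_seqno')
# valid_restart_seqno "$seqno" || echo "restore did not produce a valid restart position"
```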

ERROR 1010 (HY000) at line 5094506: Error dropping database (can't rmdir './mysql-bin/', errno: 17)

Error

Loading a mysqldump into a MySQL server from a backup/restore fails.

Cause

The problem may be that your MySQL binary logs are in a subdirectory of your MySQL data directory, causing MySQL to view them as a schema.

Solution

Possible steps to resolution:

Modify the dump file so that it does not try to drop a schema named after the binary log directory.
Update the MySQL configuration so that the binary logs are not in a directory inside the data directory. MySQL treats every directory in the data directory as a schema.
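
Whether a given configuration is affected can be checked by comparing the binary log location with the data directory. The sketch below is a hypothetical helper that takes the datadir and log-bin values as plain strings and succeeds when the binary logs would land in a subdirectory of the data directory. It assumes that a relative log-bin value is resolved against the data directory, and it does not resolve symbolic links.

```shell
# Hypothetical helper: succeed if the binary log base name points into
# a subdirectory of the data directory.
binlog_inside_datadir() {
  datadir="${1%/}"    # strip one trailing slash, if any
  logbin="$2"
  case "$logbin" in
    /*) ;;                           # absolute path: use as is
    *) logbin="$datadir/$logbin" ;;  # assumption: relative paths resolve against datadir
  esac
  case "$logbin" in
    "$datadir"/*/*) return 0 ;;      # at least one directory level below datadir
    *) return 1 ;;
  esac
}
```

For example, `binlog_inside_datadir /var/lib/mysql /var/lib/mysql/mysql-bin/mysql-bin` succeeds, which matches the situation described in the Cause; the paths are illustrative only.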

ERROR 1580 (HY000) at line 5093787: You cannot 'DROP' a log table if logging is enabled

Error

Loading a mysqldump into a MySQL server from a backup/restore fails.

Cause

This appears to be a bug in MySQL that causes mysqldump loads to fail.

Solution

You should be able to import the dump by switching off the slow query log globally before running the import:

mysql> SET GLOBAL slow_query_log=0;

Backup/Restore is not bringing my host back to normal

Error

A backup/restore was performed as requested, but the host is still not coming up.

Cause

When you back up a node, the backup is stored on that physical server. The host being restored must use the correct backup file from an active server.

Solution

A backup taken on one server can be used to restore another server in one of two ways:

If the backup directory is shared between servers using NFS or a clustered file system, the restore commands will work without copying any files.
Otherwise, you must copy the backup files between nodes. See "Restoring from Another Replica" for instructions.

Too many open processes or files

Error

The operating system or environment reports that the tungsten or designated OS user has too many open files, processes, or both.

Cause

User limits for processes or files have either been exhausted, or recommended limits for user configuration have not been set.

Solution

Check the output of ulimit and review the configured file and process limits:

shell> ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

If the figures reported are less than the recommended settings, see "Creating the User Environment" for guidance on how these values should be changed.
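
The comparison against the recommended settings can be scripted. The sketch below is a hypothetical helper that compares one reported limit with a required minimum and treats unlimited as always sufficient. The thresholds in the commented usage lines are assumptions for illustration; take the real figures from "Creating the User Environment".

```shell
# Hypothetical helper: succeed if a reported limit meets the required minimum.
limit_ok() {
  current="$1"; required="$2"
  # "unlimited" always satisfies the requirement.
  [ "$current" = "unlimited" ] && return 0
  [ "$current" -ge "$required" ]
}

# Possible usage; the numbers are placeholder thresholds, not confirmed values:
# limit_ok "$(ulimit -n)" 65535 || echo "open files limit is too low"
# limit_ok "$(ulimit -u)" 8096  || echo "max user processes limit is too low"
```

With the sample output above, both the open files value of 256 and the max user processes value of 709 would be reported as too low against these placeholder thresholds.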

Unable to update the configuration of an installed directory

Error

Running an update or configuration command with tpm returns the error 'Unable to update the configuration of an installed directory'.

Cause

Updates to the configuration of a running cluster must be performed from the staging directory where Tungsten was originally installed.

Solution

Change to the staging directory and perform the necessary commands with tpm. To determine the staging directory, use:

shell> tpm query staging

Then change to the staging directory and perform the updates:

shell> ./tools/tpm update --replace-release
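
The two steps can be combined in a script. The sketch below assumes that tpm query staging prints the location in the form user@host:/path; that format is an assumption here, so check the actual output on your system first. The function name is hypothetical, and it simply strips everything up to the first colon.

```shell
# Hypothetical helper: extract the directory from a "user@host:/path" string.
# A value without a colon is returned unchanged.
staging_dir() {
  printf '%s\n' "${1#*:}"
}

# Possible usage, if the output format matches the assumption above:
# cd "$(staging_dir "$(tpm query staging)")" && ./tools/tpm update --replace-release
```

If the staging directory is on a different host from the one where the query was run, log in to that host before changing directory.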

Number of connections exceeded for MySQL

Error

Connections to MySQL through the connector report that there are too many connections open.

Cause

The maximum number of connections supported by MySQL is dependent on the available memory. If the available memory is exceeded, then the maximum number of connections may be reached, which in turn will lead to errors connecting to MySQL, either directly or through the connector.

Solution

The maximum memory that MySQL may use with the currently configured number of connections can be estimated using the following query:

mysql> SELECT ( @@key_buffer_size + @@query_cache_size + @@tmp_table_size + \
    @@innodb_buffer_pool_size + @@innodb_additional_mem_pool_size + \
    @@innodb_log_buffer_size + @@max_connections * \
    ( @@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size + \
    @@join_buffer_size + @@binlog_cache_size + @@thread_stack ) ) / 1073741824 AS MAX_MEMORY_GB;

If this value is greater than the available memory on the host running MySQL, the number of connections configured through the max_connections parameter should be reduced.
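
The arithmetic behind the query is the sum of the global buffers plus max_connections multiplied by the sum of the per-connection buffers. The sketch below reproduces that calculation outside MySQL and also estimates how many connections fit into a given amount of memory. Both function names are hypothetical, all inputs are in bytes, and the result is an estimate only, since it covers just the buffers named in the query.

```shell
# Hypothetical helper: worst-case memory in GB.
# $1 = sum of global buffers, $2 = max_connections, $3 = sum of per-connection buffers
max_memory_gb() {
  awk -v g="$1" -v c="$2" -v p="$3" \
    'BEGIN { printf "%.2f\n", (g + c * p) / 1073741824 }'
}

# Hypothetical helper: largest connection count that fits in the available memory.
# $1 = available memory, $2 = sum of global buffers, $3 = sum of per-connection buffers
max_connections_for() {
  awk -v a="$1" -v g="$2" -v p="$3" \
    'BEGIN { n = int((a - g) / p); if (n < 0) n = 0; print n }'
}
```

For example, with 1 GB of global buffers, 10 MB per connection, and 2 GB available, `max_connections_for 2147483648 1073741824 10485760` gives 102.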