7.5.1. Identifying a Transaction Mismatch

When a mismatch occurs, the replicator service will indicate that there was a problem applying a transaction on the Replica. The replication process stops applying changes to the Replica when the first transaction fails to be applied to the Replica. This prevents multiple-statements from failing

By checking the replication status with trepctl, the pendingError and pendingExceptionMessage will show the error indicating the failure to insert the statement. For example:

shell> trepctl status
...
pendingError           : Event application failed: seqno=18 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000003:0000000000001389;2010
pendingErrorSeqno      : 18
pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master
                         insert into regions values (1, "EMEA")
...

The trepsvc.log log file will also contain the error information about the failed statement. For example:

...
2025/01/31 10:54:17.194 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master
2025/01/31 10:54:17.194 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:162) ~[tungsten-replicator.jar:?]
2025/01/31 10:54:17.194 |       at com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:1789) ~[tungsten-replicator.jar:?]
2025/01/31 10:54:17.194 |       at com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.java:101) ~[tungsten-replicator.jar:?]
2025/01/31 10:54:17.194 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleThreadStageTask.java:1032) [tungsten-replicator.jar:?]
2025/01/31 10:54:17.194 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:634) [tungsten-replicator.jar:?]
2025/01/31 10:54:17.195 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:207) [tungsten-replicator.jar:?]
2025/01/31 10:54:17.195 |       at java.lang.Thread.run(Thread.java:750) [?:1.8.0_432]
2025/01/31 10:54:17.195 | Caused by: java.sql.SQLException: Statement failed on slave but succeeded on master
2025/01/31 10:54:17.195 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:124) ~[tungsten-replicator.jar:?]
2025/01/31 10:54:17.195 |       ... 6 more
2025/01/31 10:54:17.195 | Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1' for key 'regions.PRIMARY'
2025/01/31 10:54:17.195 |       at org.drizzle.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:92) ~[drizzle-jdbc-1.5.20-SNAPSHOT.jar:?]
2025/01/31 10:54:17.195 |       at org.drizzle.jdbc.DrizzleStatement.executeBatch(DrizzleStatement.java:951) ~[drizzle-jdbc-1.5.20-SNAPSHOT.jar:?]
2025/01/31 10:54:17.195 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:110) ~[tungsten-replicator.jar:?]
2025/01/31 10:54:17.195 |       ... 6 more

Once the error or problem has been found, the exact nature of the error should be determined so that a resolution can be identified:

  1. Identify the reason for the failure by examining the full error message. Common causes are:

    • Duplicate primary key

      A row or statement is being inserted or updated that already has the same insert ID or would generate the same insert ID for tables that have auto increment enabled. The insert ID can be identified from the output of the transaction using thl. Check the Replica to identify the faulty row. To correct this problem you will either need to skip the transaction or delete the offending row from the Replica dataserver.

      In the above example, the error can be identified through the methods described. The log file output will also show the error direct from the database, and in this case that error is Duplicate entry '1' for key 'regions.PRIMARY'

      The error can be generated when an insert or update has taken place on the Replica rather than on the Primary.

      To resolve this issue, check the full THL for the statement that failed. The information is provided in the error message, but full examination of the THL can help with identification of the full issue. The sequence number of the statement causing error can be found in the pendingErrorSeqno field. In our example, this is 18. For example, to view the THL for the sequence number:

      shell> thl list -seqno 18
      SEQ# = 18 / FRAG# = 0 (last frag)
      - FILE = thl.data.0000000001
      - TIME = 2025-01-31 10:53:33.0
      - EPOCH# = 15
      - EVENTID = mysql-bin.000003:0000000000001389;2010
      - SOURCEID = db3
      - METADATA = [mysql_server_id=12;mysql_thread_id=2010;dbms_type=mysql;tz_aware=true;service=nyc;shard=hr]
      - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
      - OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0, foreign_key_checks = 1, unique_checks = 1, »
      sql_mode = 'NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,IGNORE_SPACE,»
      NO_ZERO_DATE,NO_ZERO_IN_DATE', character_set_client = 33, collation_connection = 33, collation_server = 255]
      - SCHEMA = hr
      - SQL(0) = insert into regions values (1, "EMEA")

      In this example, an INSERT operation is inserting a new row and the ID has been explicity defined (1) Check the destination database and determine the what the current value of the corresponding row:

      mysql> select * from regions where region_id = 1;
      +-----------+-------------+
      | region_id | region_name |
      +-----------+-------------+
      |         1 | EMEA        |
      +-----------+-------------+
      1 row in set (0.00 sec)

      The actual row values are identicial, which means that it is likely this row was inserted into the replica database directly by mistake.

    • Missing table or schema

      If a table or database is missing, this should be reported in the detailed error message. For example:

      Caused by: java.sql.SQLSyntaxErrorException: Unable to switch to database »
          'contacts'Error was: Unknown database 'contacts'
    • Incompatible table or schema

      A modified table structure on the Replica can cause application of the transaction to fail if there are missing or different column specifications for the table data.

      This particular error can be generated when changes to the table definition have been made, perhaps during a maintenance window.

      Check the table definition on the Primary and Replica and ensure they match.

  2. Choose a resolution method:

    Depending on the data structure and environment, resolution can take one of the following forms:

    • Skip the transaction on the Replica

      If the data on the Replica is considered correct, or the data in both tables is the same or similar, the transaction from the Primary to the Replica can be skipped. This process involves placing the replicator online and specifying one or more transactions to be skipped or ignored. At the end of this process, the replicator should be in the ONLINE state.

      For more information on skipping single or multiple transactions, see Section 7.5.2, “Skipping Transactions”.

    • Delete the offending row or rows on the Replica

      If the data on the Primary is considered canonical, then the data on the Replica can be removed, and the replicator placed online.

      Warning

      Deleting data on the Replica may cause additional problems if the data is used by other areas of your application, or has relations to foreign tables.

      For example:

      mysql> delete from regions where region_id = 1;
      Query OK, 1 row affected (0.01 sec)

      Now place the replicator online and check the status:

      shell> trepctl online
    • Restore or reprovision the Replica

      If the transaction cannot be skipped, or the data safely deleted or modified, and only a single Replica is affected, a backup of an existing, working, Replica can be taken and restored to the broken Replica.

      The tprovision command automates this process. See Section 7.6, “Provision or Reprovision a Replica” for more information on reprovisioning.

      To perform a backup and restore, see Section 7.7, “Creating a Backup”, or Section 7.8, “Restoring a Backup”. To reprovision a Replica from the Primary or another Replica, see tprovision.