6.18. Rebuilding THL on the Primary

If THL is lost on a Primary before the events contained within it have been applied to the Replica(s), the THL will need to be rebuilt from the existing MySQL binary logs.

Important

If the MySQL binary logs no longer exist, then recovery of the lost transactions in THL will NOT be possible.

The basic sequence of operation for recovering the THL on both Primary and Replicas is:

  1. Gather the failing requested sequence numbers from all Replicas:

    shell> trepctl status
    
    pendingError           : Event extraction failed
    pendingErrorCode       : NONE
    pendingErrorEventId    : NONE
    pendingErrorSeqno      : -1
    pendingExceptionMessage: Client handshake failure: Client response validation failed:
    Master log does not contain requested transaction:
    master source ID=db1 client source ID=db2 requested seqno=4 client epoch number=0 master min seqno=8 master max seqno=8

    In the above example, when Replica db2 comes back online, it requests a copy of the last seqno in local thl (4) from the Primary db1 to compare for data integrity purposes, which the Primary no longer has.

    Keep a note of the lowest sequence number and the host that it is on across all Replicas for use in the next step.

  2. On the Replica with the lowest failing requested seqno, get the epoch, source-id and event-id (binlog position) from the THL using the command thl list -seqno specifying the sequence number above. This information will be needed on the extractor (Primary) in a later step. For example:

    tungsten@db2:/opt/replicator> thl list -seqno 4
    
    SEQ# = 4 / FRAG# = 0 (last frag)
    - TIME = 2017-07-14 14:49:00.0
    - EPOCH# = 0
    - EVENTID = mysql-bin.000009:0000000000001844;56
    - SOURCEID = db1
    - METADATA = [mysql_server_id=33155307;dbms_type=mysql;tz_aware=true;is_metadata=true; »
                  service=east;shard=#UNKNOWN;heartbeat=NONE]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0,
                 foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00',
                 sql_mode = 'NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES,IGNORE_SPACE',
                 character_set_client = 33, collation_connection = 33, collation_server = 8]
    - SCHEMA = tungsten_east
    - SQL(0) = UPDATE tungsten_east.heartbeat SET source_tstamp= '2017-07-14 14:49:00', $raquo; 
               salt= 5, name= 'NONE'  WHERE id= 1

    There are two more ways of getting the same information using the dsctl command, so use the one you are most comfortable with:

    tungsten@db2:/opt/replicator> dsctl get
    [{"extract_timestamp":"2017-07-14 14:49:00.0","eventid":"mysql-bin.000009:0000000000001844;56",»
      "fragno":0,"last_frag":true,"seqno":4,"update_timestamp":"2017-07-14 14:49:00.0",»
      "shard_id":"#UNKNOWN","applied_latency":0,"epoch_number":0,"task_id":0,"source_id":"db1"}]
    tungsten@db2:/opt/replicator> dsctl get -ascmd
    dsctl set -seqno 4 -epoch 0 -event-id "mysql-bin.000009:0000000000001844;56;566" -source-id "db1"
  3. Clear all THL on the Primary since it is no longer needed by any Replicas:

    shell> thl purge
  4. Use the dsctl command on the Primary with the values we got from the Replica with the lowest seqno to tell the Primary replicator to begin generating THL starting from that event in the MySQL binary logs:

    Note: If you used the dsctl get -ascmd earlier, you may use that provided command now, just add the -reset argument at the end.

    shell> dsctl set -seqno 4 -epoch 0 -event-id "mysql-bin.000009:0000000000001844;56;566" -source-id "db1" -reset
  5. Switch the Primary to online state:

    shell> trepctl online
  6. Switch the Replicas to online state once the Primary is fully online:

    shell> trepctl online