5.17. Rebuilding THL on the Master

If THL is lost on a master before the events contained within it have been applied to the slave(s), the THL will need to be rebuilt from the existing MySQL binary logs.

Important

If the MySQL binary logs no longer exist, then recovery of the lost transactions in THL will NOT be possible.

The basic sequence of operation for recovering the THL on both master and slaves is:

  1. Gather the failing requested sequence numbers from all slaves:

    shell> trepctl status
    
    pendingError           : Event extraction failed
    pendingErrorCode       : NONE
    pendingErrorEventId    : NONE
    pendingErrorSeqno      : -1
    pendingExceptionMessage: Client handshake failure: Client response validation failed:
    Master log does not contain requested transaction:
    master source ID=db1 client source ID=db2 requested seqno=4 client epoch number=0 master min seqno=8 master max seqno=8

    In the above example, when slave db2 comes back online, it requests a copy of the last seqno in local thl (4) from the master db1 to compare for data integrity purposes, which the master no longer has.

    Keep a note of the lowest sequence number and the host that it is on across all slaves for use in the next step.

  2. On the slave with the lowest failing requested seqno, get the epoch, source-id and event-id (binlog position) from the THL using the command thl list -seqno specifying the sequence number above. This information will be needed on the extractor (master) in a later step. For example:

    tungsten@db2:/opt/replicator> thl list -seqno 4
    
    SEQ# = 4 / FRAG# = 0 (last frag)
    - TIME = 2017-07-14 14:49:00.0
    - EPOCH# = 0
    - EVENTID = mysql-bin.000009:0000000000001844;56
    - SOURCEID = db1
    - METADATA = [mysql_server_id=33155307;dbms_type=mysql;tz_aware=true;is_metadata=true; »
                  service=east;shard=#UNKNOWN;heartbeat=NONE]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0,
                 foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00',
                 sql_mode = 'NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES,IGNORE_SPACE',
                 character_set_client = 33, collation_connection = 33, collation_server = 8]
    - SCHEMA = tungsten_east
    - SQL(0) = UPDATE tungsten_east.heartbeat SET source_tstamp= '2017-07-14 14:49:00', $raquo; 
               salt= 5, name= 'NONE'  WHERE id= 1

    There are two more ways of getting the same information, use the one you are most comfortable with:

    tungsten@db2:/opt/replicator> dsctl get
    [{"extract_timestamp":"2017-07-14 14:49:00.0","eventid":"mysql-bin.000009:0000000000001844;56",»
      "fragno":0,"last_frag":true,"seqno":4,"update_timestamp":"2017-07-14 14:49:00.0",»
      "shard_id":"#UNKNOWN","applied_latency":0,"epoch_number":0,"task_id":0,"source_id":"db1"}]
    
    
    tungsten@db2:/opt/replicator> tungsten_get_position
    {
      "applied_latency": 0,
      "epoch_number": 0,
      "eventid": "mysql-bin.000009:0000000000001844;56",
      "extract_timestamp": "2017-07-14 14:49:00.0",
      "fragno": 0,
      "last_frag": "1",
      "seqno": 4,
      "shard_id": "#UNKNOWN",
      "source_id": "db1",
      "task_id": 0,
      "update_timestamp": "2017-07-14 14:49:00.0"
    }
  3. Clear all THL on the master since it is no longer needed by any slaves:

    shell> thl purge
  4. Use the tungsten_set_position command on the master with the values we got from the slave with the lowest seqno to tell the master replicator to begin generating THL starting from that event in the MySQL binary logs:

    shell> tungsten_set_position --seqno=4 --epoch=0 --source-id=db1 --event-id=mysql-bin.00009:0000000000001844

    You may also use dsctl, but that requires executing the dsctl reset command first.

  5. Switch the master to online state:

    shell> trepctl online
  6. Switch the slaves to online state once the master is fully online:

    shell> trepctl online