Rebuilding THL on the Primary
If THL is lost on a Primary before the events contained within it have been applied to the Replica(s), the THL will need to be rebuilt from the existing MySQL binary logs.
If the MySQL binary logs no longer exist, then recovery of the lost transactions in THL will NOT be possible.
The basic sequence of operations for recovering the THL on both Primary and Replicas is:
Gather the failing requested sequence numbers from all Replicas:
shell> trepctl status
...
pendingError           : Event extraction failed
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: Client handshake failure: Client response validation failed: Master log »
does not contain requested transaction: master source ID=db1 client source ID=db2 requested »
seqno=22 client epoch number=22 master min seqno=27 master max seqno=27
...

In the above example, when Replica db2 comes back online, it requests a copy of the last seqno in local thl (22) from the Primary db1 to compare for data integrity purposes, which the Primary no longer has.
Keep a note of the lowest sequence number and the host that it is on across all Replicas for use in the next step.
On the Replica with the lowest failing requested seqno, get the epoch, source-id and event-id (binlog position) from the THL using the command thl list -seqno, specifying the sequence number above. This information will be needed on the extractor (Primary) in a later step. You can add the option -headers to reduce the amount of output if required. For example:

shell> thl list -seqno 22 -headers
SEQ# = 22 / FRAG# = 0 (last frag)
- FILE = thl.data.0000000001
- TIME = 2025-02-10 11:04:42.0
- EPOCH# = 22
- EVENTID = mysql-bin.000011:0000000000004723;181
- SOURCEID = db1

There are two more ways of getting the same information using the dsctl command, so use the one you are most comfortable with:

shell> dsctl get
[{"extract_timestamp":"2025-02-10 11:04:42.0","eventid":"mysql-bin.000011:0000000000004723;181","fragno":0,"last_frag":true,"seqno":22,"update_timestamp":"2025-02-10 11:04:47.0","shard_id":"tungsten_alpha","applied_latency":5,"epoch_number":22,"task_id":0,"source_id":"db1"}]

shell> dsctl get -ascmd
dsctl set -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"

Additionally, if you are using v7.0.2 or later, you can also use thl dsctl:

shell> thl dsctl -seqno 22
dsctl -service alpha set -reset -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"

Place the cluster into MAINTENANCE and take the primary replicator OFFLINE:

shell> cctrl
[LOGICAL] /alpha > set policy maintenance
[LOGICAL] /alpha > replicator db1 offline

Clear all THL on the Primary since it is no longer needed by any Replicas:

shell> thl purge

Use the dsctl command on the Primary with the values we got from the Replica with the lowest seqno to tell the Primary replicator to begin generating THL starting from that event in the MySQL binary logs:

Note
If you used the dsctl get -ascmd or the thl dsctl earlier, you may use that provided command now. Ensure the -reset option is supplied.

shell> dsctl set -reset -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"

Place the cluster back to AUTOMATIC. This should also return the replicator to the ONLINE state.

shell> cctrl
[LOGICAL] /alpha > set policy automatic

Switch the Replicas to online state once the Primary is fully online:

shell> trepctl online
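
When several Replicas are involved, picking out the lowest requested seqno by eye can be error-prone. The following is a minimal sketch of shell helpers that could assist with the first steps of the procedure above. The function names (requested_seqno, lowest_seqno, binlog_file_of) are hypothetical and not part of Tungsten. The sketch assumes the error text contains "requested seqno=<number>" on a single line, as in the example message shown earlier once the wrapped lines are joined.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not Tungsten commands.

# Print the requested seqno found in `trepctl status` text read from stdin.
# Assumes the message contains "requested seqno=<number>" on one line.
requested_seqno() {
  grep -o 'requested seqno=[0-9][0-9]*' | head -n 1 | cut -d= -f2
}

# Read "host seqno" pairs (one per line) from stdin and print the pair
# with the numerically lowest seqno.
lowest_seqno() {
  sort -k2,2n | head -n 1
}

# Print the binary log file name portion of an event ID such as
# "mysql-bin.000011:0000000000004723;181" (everything before the first colon).
binlog_file_of() {
  printf '%s\n' "${1%%:*}"
}
```

On each Replica, something like `trepctl status | requested_seqno` would print the failing seqno. Collecting those values as "host seqno" lines and piping them through lowest_seqno identifies the Replica to query with thl list. Before running thl purge, the file name printed by binlog_file_of can be compared against the files in your MySQL binary log directory (its location depends on your installation) to confirm the binary log still exists.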