Tungsten Clustering

Rebuilding THL on the Primary

If THL is lost on a Primary before the events contained within it have been applied to the Replica(s), the THL will need to be rebuilt from the existing MySQL binary logs.

important

If the MySQL binary logs no longer exist, then recovery of the lost transactions in THL will NOT be possible.

The basic sequence of operation for recovering the THL on both Primary and Replicas is:

Gather the failing requested sequence numbers from all Replicas:

shell> trepctl status

...
pendingError           : Event extraction failed
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: Client handshake failure: Client response validation failed: Master log &raquo;
does not contain requested transaction: master source ID=db1 client source ID=db2 requested &raquo;
seqno=22 client epoch number=22 master min seqno=27 master max seqno=27
...

In the above example, when Replica db2 comes back online, it requests a copy of the last seqno in local thl (22) from the Primary db1 to compare for data integrity purposes, which the Primary no longer has.

Keep a note of the lowest sequence number and the host that it is on across all Replicas for use in the next step.

On the Replica with the lowest failing requested seqno, get the epoch, source-id and event-id (binlog position) from the THL using the command thl list -seqno specifying the sequence number above. This information will be needed on the extractor (Primary) in a later step. You can add the option -headers to reduce the amount of output if required. For example:

shell> thl list -seqno 22 -headers

SEQ# = 22 / FRAG# = 0 (last frag)
- FILE = thl.data.0000000001
- TIME = 2025-02-10 11:04:42.0
- EPOCH# = 22
- EVENTID = mysql-bin.000011:0000000000004723;181
- SOURCEID = db1

There are two more ways of getting the same information using the dsctl command, so use the one you are most comfortable with:

shell> dsctl get
[{"extract_timestamp":"2025-02-10 11:04:42.0","eventid":"mysql-bin.000011:0000000000004723;181",
"fragno":0,"last_frag":true,"seqno":22,"update_timestamp":"2025-02-10 11:04:47.0",
"shard_id":"tungsten_alpha","applied_latency":5,"epoch_number":22,"task_id":0,"source_id":"db1"}]

shell> dsctl get -ascmd
dsctl set -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"

Additionally, if you are using v7.0.2 or later, you can also use thl dsctl:

shell> thl dsctl -seqno 22

dsctl -service alpha set -reset -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"

Place the cluster into MAINTENANCE and take the primary replicator OFFLINE

shell> cctrl
[LOGICAL] /alpha > set policy maintenance
[LOGICAL] /alpha > replicator db1 offline

Clear all THL on the Primary since it is no longer needed by any Replicas:
```
shell> thl purge
```
Use the dsctl command on the Primary with the values we got from the Replica with the lowest seqno to tell the Primary replicator to begin generating THL starting from that event in the MySQL binary logs:
Note
If you used the dsctl get -ascmd or the thl dsctl earlier, you may use that provided command now. Ensure the -reset option is supplied.
```
shell> dsctl set -reset -seqno 22 -epoch 22 -event-id "mysql-bin.000011:0000000000004723;181" -source-id "db1"
```
Place the cluster back to AUTOMATIC. This should also return the replicator to the ONLINE state.
```
shell> cctrl
[LOGICAL] /alpha > set policy automatic
```
Switch the Replicas to online state once the Primary is fully online:
```
shell> trepctl online
```