Skip to main content
Common Reference

Transaction History Log (THL)

The Transaction History Log (THL) stores transactional data from different data servers in a universal format that is then used to exchange and transfer the information between replicator instances. Because the THL is stored and independently managed from the data servers that it reads and writes, the data can be moved, exchanged, and transmuted during processing.

The THL is created by any replicator service acting as a Primary, where the information is read from the database using the native format, such as the MySQL binary log, and writing the information to the THL. Once in the THL, the THL data can be exchanged with other processes, including transmission over the network, and then applied to a destination database. Within Tungsten Replicator, this process is handled through the pipeline stages that read and write information between the THL and internal queues.

Information stored in THL is recorded in a series of event records in sequential format. The THL therefore acts as a queue of the transactions. On a replicator reading data from a database, the THL represents the queue of transactions applied on the source database. On a replicator applying that information to a database, the THL represents the list of the transactions to be written. The THL has the following properties:

  • THL is a sequential list of events
  • THL events are written to a THL file through a single thread (to enforce the sequential nature)
  • THL events can be read from individually or sequentially, and multiple threads can read the same THL at the same time
  • THL events are immutable; once stored, the contents of the THL are never modified or individually deleted (although entire files may be deleted)
  • THL is written to disk without any buffering to prevent software failure causing a problem; the operating system buffers are used.

THL data is stored on disk within the thl directory of your installation. The exact location can be configured using thl-directory. A sample directory is shown below:

total 710504
-rw-r--r-- 1 tungsten tungsten 0 May 2 10:48 disklog.lck
-rw-r--r-- 1 tungsten tungsten 100042900 Jun 4 10:10 thl.data.0000000013
-rw-rw-r-- 1 tungsten tungsten 101025311 Jun 4 11:41 thl.data.0000000014
-rw-rw-r-- 1 tungsten tungsten 100441159 Jun 4 11:43 thl.data.0000000015
-rw-rw-r-- 1 tungsten tungsten 100898492 Jun 4 11:44 thl.data.0000000016
-rw-rw-r-- 1 tungsten tungsten 100305613 Jun 4 11:44 thl.data.0000000017
-rw-rw-r-- 1 tungsten tungsten 100035516 Jun 4 11:44 thl.data.0000000018
-rw-rw-r-- 1 tungsten tungsten 101690969 Jun 4 11:45 thl.data.0000000019
-rw-rw-r-- 1 tungsten tungsten 23086641 Jun 5 21:55 thl.data.0000000020

The THL files have the format thl.data.#########, and the sequence number increases for each new log file. The size of each log file is controlled by the thl-log-file-size configuration parameter. The log files are automatically managed by Tungsten Replicator, with old files automatically removed according to the retention policy set by the thl-log-retention configuration parameter. The files can be viewed and managed by using the thl command. For more information, see "The THL Command".

THL Format

The THL is stored on disk in a specific format that combines the information about the SQL and row data, metadata about the environment in which the row changes and SQL changes were made (metadata), and the log specific information, including the source, database, and timestamp of the information.

A sample of the output is shown below, the information is taken from the output of the thl command:

SEQ# = 5 / FRAG# = 0 (last frag)
- FILE = thl.data.0000000001
- TIME = 2025-03-19 11:07:32.0
- EPOCH# = 0
- EVENTID = mysql-bin.000002:0000000000002081;1748
- SOURCEID = db1
- METADATA = [mysql_server_id=10;mysql_thread_id=1748;dbms_type=mysql;tz_aware=true;service=alpha;shard=hr]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0, foreign_key_checks = 1, unique_checks = 1,
sql_mode = 'NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,IGNORE_SPACE,NO_ZERO_DATE,NO_ZERO_IN_DATE',
character_set_client = 33, collation_connection = 33, collation_server = 255]
- SCHEMA = hr
- SQL(0) = insert into regions values (7, 'Europe')

The sample above shows the information for the SQL executed on a MySQL server. The EVENTID shows the MySQL binary log from which the statement has been read. The MySQL server has stored the information in the binary log using ROW mode; MIXED or STATEMENT mode stores the actual SQL statement that was executed. thl command output is provided in the table below:

THL Event Format
Displayed FieldInternal NameData typeSizeDescription
-record_lengthInteger4 bytesLength of the full record information, including this field
-record_typeByte1 byteEvent record type identifier
-header_lengthUnsigned int4 bytesLength of the header information
SEQ#seqnoUnsigned long8 bytesLog sequence number, a sequential value given to each log entry
FRAG#fragnoUnsigned short2 bytesEvent fragment number. An event can consist of multiple fragments of SQL or row log data
-last_fragByte1 byteIndicates whether the fragment is the last fragment in the sequence
EPOCH#epoch_numberUnsigned long8 bytesEvent epoch number. Used to identify log sections within the Primary THL
SOURCEIDsource_idUTF-8 StringVariable (null terminated)Event source ID, the hostname or identity of the dataserver that generated the event
EVENTIDevent_idUTF-8 StringVariable (null terminated)Event ID; in MySQL, for example, the binlog filename and position that contained the original event
SHARDIDshard_idUTF-8 StringVariable (null terminated)Shard ID to which the event belongs
TIMEtstampUnsigned long8 bytesTime of the commit that triggered the event
FILE-String-Filename of the THL file containing the event
-data_lengthUnsigned int4 bytesLength of the included event data
-eventBinaryVariableSerialized Java object containing the SQL or ROW data
METADATAPart of event--Metadata about the event
TYPEPart of event--Internal storage type of the event
OPTIONSPart of event--Options about the event operation
SCHEMAPart of event--Schema used in the event
SQLPart of event--SQL statement or row data
-crc_methodByte1 byteMethod used to compute the CRC for the event.
-crcUnsigned int4 bytesCRC of the event record (not including the CRC value)
  • SEQ# and FRAG#

    Individual events within the log are identified by a sequential SEQUENCE number. Events are further divided into individual fragments. Fragments are numbered from 0 within a given sequence number. Events are applied to the database wholesale, fragments are used to divide up the size of the statement or row information within the log file. The fragments are stored internally in memory before being applied to the database and therefore memory usage is directly affected by the size and number of fragments held in memory.

    The sequence number as generated during this process is unique and therefore acts as a global transaction ID across a deployment. It can be used to determine whether the Replicas and Primary are in sync, and can be used to identify individual transactions within the replication stream.

  • EPOCH#

    The EPOCH value is used as a check to ensure that the logs on the Replica and the Primary match. The EPOCH is stored in the THL, and a new EPOCH is generated each time a Primary goes online. The EPOCH value is then written and stored in the THL alongside each individual event. The EPOCH acts as an additional check, beyond the sequence number, to validate the information between the Replica and the Primary. The EPOCH value is used to prevent the following situations:

    • In the event of a failover within a Tungsten Cluster, where there are events stored in the Primary log, but which did not make it to a Replica, the EPOCH acts as a check so that when the Primary rejoins as the Replica, the EPOCH numbers will not match the Replica and the new Primary. The trapped transactions can be identified by examining the THL output.

    • When a Replica joins a Primary, the existence of the EPOCH prevents the Replica from accepting events that happen to match only the sequence number, but not the corresponding EPOCH.

    Each time a Tungsten Replicator Primary goes online, the EPOCH number is incremented. When the Replica connects, it requests the SEQUENCE and EPOCH, and the Primary confirms that the requested SEQUENCE has the requested EPOCH. If not, the request is rejected and the Replica gets a validation error:

    pendingExceptionMessage: Client handshake failure: Client response validation failed:
    Log epoch numbers do not match: client source ID=west-db2 seqno=408129
    server epoch number=408128 client epoch number=189069

    When this error occurs, the THL should be examined and compared between the Primary and Replica to determine if there really is a mismatch between the two databases. For more information, see .

  • SOURCEID

    The SOURCEID is a string identifying the source of the event stored in the THL. Typically it is the hostname or host identifier.

  • EVENTID

    The EVENTID is a string identifying the source of the event information in the log. Within a MySQL installation
    the EVENTID contains the binary log name and position which provided the original statement or row data.

    Note

    The event ID shown is the end of the corresponding event stored in the THL, not the beginning. When examining the mysqlbinlog for an sequence ID in the THL, you should check the EVENTID of the previous THL sequence number to determine where to start looking within the binary log.

  • TIME

    When the source information is committed to the database, that information is stored into the corresponding binary log. That information is stored in the THL. The time recorded in the THL is the time the data was committed, not the time the data was recorded into the log file.

    The TIME value as stored in the THL is used to compute latency information when reading and applying data on a Replica.

  • METADATA

    Part of the binary EVENT payload stored within the event fragment, the metadata is collected and stored in the fragment based on information generated by the replicator. The information is stored as a series of key/value pairs. Examples of the information stored include:

    • MySQL server ID

    • Source database type

    • Name of the Replicator service that generated the THL

    • Any 'heartbeat' operations sent through the replicator service, including those automatically generated by the service, such as when the Primary goes online

    • The name of the shard to which the event belongs

    • Whether the contained data is safe to be applied through a block commit operation

  • TYPE

    The stored event type. Replicator has the potential to use a number of different stored formats for the THL data. The default type is based on the com.continuent.tungsten.replicator.event.ReplDBMSEvent.

  • OPTIONS

    Part of the EVENT binary payload, the OPTIONS include information about the individual event that have been extracted from the database. These include settings such as the autocommit status, character set and other information, which is used when the information is applied to the database.

    There will be one OPTIONS block for each SQL statement stored in the event.

  • SCHEMA

    Part of the EVENT structure, the SCHEMA provides the database or schema name in which the statement or row data was applied.

  • SHARDID

    When using parallel apply, provides the generated shard ID for the event when it is applied by the parallel applier thread.

  • SQL

    For statement based events, the SQL of the statement that was recorded. Multiple individual SQL statements as part of a transaction can be contained within a single event fragment.

    For example, the MySQL statement:

    mysql> INSERT INTO user VALUES (null, 'Charles', now());
    Query OK, 1 row affected (0.01 sec)

    Stores the following into the THL:

    SEQ# = 3583 / FRAG# = 0 (last frag)
    - FILE = thl.data.0000000001
    - TIME = 2025-01-27 11:49:45.0
    - EPOCH# = 2500
    - EVENTID = mysql-bin.000007:0000000625753960;0
    - SOURCEID = host1
    - METADATA = [mysql_server_id=1687011;dbms_type=mysql;service=firstrep;shard=test]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - SQL(0) = SET INSERT_ID = 3
    - OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0,
    foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8,
    collation_connection = 8, collation_server = 8]
    - SCHEMA = test
    - SQL(1) = INSERT INTO user VALUES (null, 'Charles', now()) /* ___SERVICE___ = [firstrep] */

    For row based events, the information is further defined by the individual row data, including the action type (UPDATE, INSERT or DELETE), SCHEMA, TABLE and individual ROW data. For each ROW, there may be one or more COL (column) and identifying KEY event to identify the row on which the action is to be performed.

    The same statement when recorded in ROW format:

    SEQ# = 3582 / FRAG# = 0 (last frag)
    - FILE = thl.data.0000000001
    - TIME = 2025-03-19 11:07:32.0
    - EPOCH# = 2500
    - EVENTID = mysql-bin.000007:0000000625753710;0
    - SOURCEID = host1
    - METADATA = [mysql_server_id=1687011;dbms_type=mysql;service=firstrep;shard=test]
    - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
    - SQL(0) =
    - ACTION = INSERT
    - SCHEMA = test
    - TABLE = user
    - ROW# = 0
    - COL(1: ) = 2
    - COL(2: ) = Charles
    - COL(3: ) = 2025-03-09 11:07:19.0