D.1.5. The thl Directory

The transaction history log (THL) retains a copy of the SQL statements from each Primary host, and it is the information within the THL that is transferred between hosts and applied to the database. The THL information is written to disk and stored in the thl directory:

shell> ls -al /opt/continuent/thl/alpha/
total 2291984
drwxrwxr-x 2 tungsten tungsten      4096 Apr 16 13:44 .
drwxrwxr-x 3 tungsten tungsten      4096 Apr 15 15:53 ..
-rw-r--r-- 1 tungsten tungsten         0 Apr 15 15:53 disklog.lck
-rw-r--r-- 1 tungsten tungsten 100137585 Apr 15 18:13 thl.data.0000000001
-rw-r--r-- 1 tungsten tungsten 100134069 Apr 15 18:18 thl.data.0000000002
-rw-r--r-- 1 tungsten tungsten 100859685 Apr 15 18:26 thl.data.0000000003
-rw-r--r-- 1 tungsten tungsten 100515215 Apr 15 18:28 thl.data.0000000004
-rw-r--r-- 1 tungsten tungsten 100180770 Apr 15 18:31 thl.data.0000000005
-rw-r--r-- 1 tungsten tungsten 100453094 Apr 15 18:34 thl.data.0000000006
-rw-r--r-- 1 tungsten tungsten 100379260 Apr 15 18:35 thl.data.0000000007
-rw-r--r-- 1 tungsten tungsten 100294561 Apr 16 12:21 thl.data.0000000008
-rw-r--r-- 1 tungsten tungsten 100133258 Apr 16 12:24 thl.data.0000000009
-rw-r--r-- 1 tungsten tungsten 100293278 Apr 16 12:32 thl.data.0000000010
-rw-r--r-- 1 tungsten tungsten 100819317 Apr 16 12:34 thl.data.0000000011
-rw-r--r-- 1 tungsten tungsten 100250972 Apr 16 12:35 thl.data.0000000012
-rw-r--r-- 1 tungsten tungsten 100337285 Apr 16 12:37 thl.data.0000000013
-rw-r--r-- 1 tungsten tungsten 100535387 Apr 16 12:38 thl.data.0000000014
-rw-r--r-- 1 tungsten tungsten 100378358 Apr 16 12:40 thl.data.0000000015
-rw-r--r-- 1 tungsten tungsten 100198421 Apr 16 13:32 thl.data.0000000016
-rw-r--r-- 1 tungsten tungsten 100136955 Apr 16 13:34 thl.data.0000000017
-rw-r--r-- 1 tungsten tungsten 100490927 Apr 16 13:41 thl.data.0000000018
-rw-r--r-- 1 tungsten tungsten 100684346 Apr 16 13:41 thl.data.0000000019
-rw-r--r-- 1 tungsten tungsten 100225119 Apr 16 13:42 thl.data.0000000020
-rw-r--r-- 1 tungsten tungsten 100390819 Apr 16 13:43 thl.data.0000000021
-rw-r--r-- 1 tungsten tungsten 100418115 Apr 16 13:43 thl.data.0000000022
-rw-r--r-- 1 tungsten tungsten 100388812 Apr 16 13:44 thl.data.0000000023
-rw-r--r-- 1 tungsten tungsten  38275509 Apr 16 13:47 thl.data.0000000024

THL files are created on both the Primary and Replicas within the cluster. THL data can be examined using the thl command.

The THL is written into individual files, which are, by default, no more than 1 GByte in size each. From the listing above, you can see that each file has a unique file index number. A new file is created when the file size limit is reached, and is given the next THL log file number. To determine the range of sequence numbers stored within each log file, use the thl command:

shell> thl index
LogIndexEntry thl.data.0000000001(0:106)
LogIndexEntry thl.data.0000000002(107:203)
LogIndexEntry thl.data.0000000003(204:367)
LogIndexEntry thl.data.0000000004(368:464)
LogIndexEntry thl.data.0000000005(465:561)
LogIndexEntry thl.data.0000000006(562:658)
LogIndexEntry thl.data.0000000007(659:755)
LogIndexEntry thl.data.0000000008(756:1251)
LogIndexEntry thl.data.0000000009(1252:1348)
LogIndexEntry thl.data.0000000010(1349:1511)
LogIndexEntry thl.data.0000000011(1512:1609)
LogIndexEntry thl.data.0000000012(1610:1706)
LogIndexEntry thl.data.0000000013(1707:1803)
LogIndexEntry thl.data.0000000014(1804:1900)
LogIndexEntry thl.data.0000000015(1901:1997)
LogIndexEntry thl.data.0000000016(1998:2493)
LogIndexEntry thl.data.0000000017(2494:2590)
LogIndexEntry thl.data.0000000018(2591:2754)
LogIndexEntry thl.data.0000000019(2755:2851)
LogIndexEntry thl.data.0000000020(2852:2948)
LogIndexEntry thl.data.0000000021(2949:3045)
LogIndexEntry thl.data.0000000022(3046:3142)
LogIndexEntry thl.data.0000000023(3143:3239)
LogIndexEntry thl.data.0000000024(3240:3672)
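
When you need to know which file holds a particular transaction, the index output can be searched mechanically. The following is a minimal sketch rather than a product feature: the function name thl_file_for_seqno is hypothetical, and it assumes the "LogIndexEntry name(first:last)" line format shown above.

```shell
# Hypothetical helper: print the THL file whose range contains a sequence number.
# Reads `thl index` output on standard input; assumes the line format shown above.
thl_file_for_seqno() {
  awk -v want="$1" '
    $1 == "LogIndexEntry" {
      # Split "thl.data.0000000009(1252:1348)" into name, first and last seqno.
      split($2, part, /[():]/)
      if (want + 0 >= part[2] + 0 && want + 0 <= part[3] + 0) {
        print part[1]
        exit
      }
    }'
}
```

For example, piping the listing above through `thl_file_for_seqno 1300` prints thl.data.0000000009, because 1300 falls within the range 1252:1348.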

The THL files are retained for seven days by default, although this parameter is configurable. Because the THL can require a large amount of storage, you should monitor disk space and usage.
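
A simple way to monitor usage is to report the size of the THL directory alongside how full its filesystem is. The sketch below uses only the standard du and df utilities; the function name thl_disk_report and its output format are hypothetical, and the directory path is whatever your installation uses.

```shell
# Hypothetical helper: report THL directory size and filesystem usage.
# $1 is the THL directory, for example /opt/continuent/thl/alpha.
thl_disk_report() {
  dir="$1"
  # Total size of the directory tree, in kilobytes.
  thl_kb=$(du -sk "$dir" | awk '{print $1}')
  # Capacity column of the POSIX df format, with the percent sign removed.
  fs_pct=$(df -P "$dir" | awk 'NR == 2 {sub(/%/, "", $5); print $5}')
  echo "thl_kb=$thl_kb fs_used_pct=$fs_pct"
}
```

The single-line output is intended to be easy to feed into whatever alerting you already run.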

The purge is continuous and is based on the date the log file was written. Each time the replicator finishes the current THL log file, it checks for files that have exceeded the defined retention configuration and spawns a job within the replicator to delete files older than the retention policy. Old files are only removed when the current THL log file rotates.
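
To preview which files are old enough to be candidates for removal, you can list them by modification time. This is only an approximation under stated assumptions: the replicator performs its own check, which may not match find's day rounding exactly, and the function name thl_files_past_retention is hypothetical. The sketch lists files and deletes nothing.

```shell
# Hypothetical helper: list THL data files older than a retention window.
# $1 is the THL service directory, $2 the window in days (7 if omitted).
thl_files_past_retention() {
  dir="$1"
  days="${2:-7}"
  # -mtime +N counts age in whole 24-hour periods and matches values greater
  # than N, so a file is listed once it is at least N+1 days old.
  find "$dir" -type f -name 'thl.data.*' -mtime +"$days" | sort
}
```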

D.1.5.1. Purging THL Log Information on a Replica

Warning

Purging the THL on a Replica node can potentially remove information that has not yet been applied to the database. Please check and ensure that the THL data that you are purging has been applied to the database before continuing.

The THL files can be explicitly purged to recover disk space, but you should ensure that the sequence number currently applied to the database is not purged, and that no other hosts are reading the THL information.

To purge the logs on a Replica node:

  1. Determine the highest sequence number from the THL that you want to delete. To purge the logs up until the latest sequence number, you can use trepctl to determine the highest applied sequence number:

    shell> trepctl services
    Processing services command...
    NAME              VALUE
    ----              -----
    appliedLastSeqno: 3672
    appliedLatency  : 331.0
    role            : slave
    serviceName     : alpha
    serviceType     : local
    started         : true
    state           : ONLINE
    Finished services command...
  2. Shun the Replica datasource and put the replicator into the offline state using cctrl:

    shell> cctrl
    [LOGICAL] /alpha > datasource host1 shun
    [LOGICAL] /alpha > replicator host1 offline

    Important

    NEVER Shun the Primary datasource!

  3. Use the thl command to purge the logs up to the specified transaction sequence number. You will be prompted to confirm the operation:

    shell> thl purge -high 3670
    WARNING: The purge command will break replication if you delete all events or »
       delete events that have not reached all slaves.
    Are you sure you wish to delete these events [y/N]?
    y
    Deleting events where SEQ# <=3670
    2013-04-16 14:09:42,384 [ - main] INFO  thl.THLManagerCtrl Transactions deleted
  4. Recover the host back into the cluster:

    shell> cctrl
    [LOGICAL] /alpha > datasource host1 recover

You can now check the current THL file information:

shell> thl index
LogIndexEntry thl.data.0000000024(3240:3672)

For more information on purging events using thl, see Section 9.22.4, “thl purge Command”.
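
The check in step 1 can be scripted so that a purge target is never higher than the applied sequence number. The following is a minimal sketch: both function names are hypothetical, and it assumes the "appliedLastSeqno: N" line format shown in the trepctl services output above, taking the first such line if several services are listed.

```shell
# Hypothetical helpers for checking a purge target before running thl purge.

# Print the first appliedLastSeqno value found in `trepctl services` output
# supplied on standard input.
applied_last_seqno() {
  awk -F':' '$1 ~ /appliedLastSeqno/ {gsub(/[ \t]/, "", $2); print $2; exit}'
}

# Succeed only if the purge target ($1) does not exceed the applied seqno ($2).
purge_is_safe() {
  [ "$1" -le "$2" ]
}

# Possible use on a Replica whose replicator is already offline:
#   applied=$(trepctl services | applied_last_seqno)
#   purge_is_safe 3670 "$applied" && thl purge -high 3670
```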

D.1.5.2. Purging THL Log Information on a Primary

Warning

Purging the THL on a Primary node can potentially remove information that has not yet been applied to the Replica databases. Please check and ensure that the THL data that you are purging has been applied to the database on all Replicas before continuing.

Important

If the situation allows, it may be better to switch the Primary role to a current, up-to-date Replica, then perform the steps to purge THL from a Replica on the old Primary host using Section D.1.5.1, “Purging THL Log Information on a Replica”.

Warning

Follow the steps below with great caution. Failure to follow best practices can leave Replicas unable to apply transactions, forcing a full re-provisioning. For those steps, please see Section 6.6.1.1, “Provision or Reprovision a Replica”.

The THL files can be explicitly purged to recover disk space, but you should ensure that the sequence number currently applied to the database is not purged, and that no other hosts are reading the THL information.

To purge the logs on a Primary node:

  1. Determine the highest sequence number from the THL that you want to delete. To purge the logs up until the latest sequence number, you can use trepctl to determine the highest applied sequence number:

    shell> trepctl services
    Processing services command...
    NAME              VALUE
    ----              -----
    appliedLastSeqno: 3675
    appliedLatency  : 0.835
    role            : master
    serviceName     : alpha
    serviceType     : local
    started         : true
    state           : ONLINE
    Finished services command...
  2. Set the cluster to Maintenance mode and put the replicator into the offline state using cctrl:

    shell> cctrl
    [LOGICAL] /alpha > set policy maintenance
    [LOGICAL] /alpha > replicator host1 offline
  3. Use the thl command to purge the logs up to the specified transaction sequence number. You will be prompted to confirm the operation:

    shell> thl purge -high 3670
    WARNING: The purge command will break replication if you delete all events or »
       delete events that have not reached all slaves.
    Are you sure you wish to delete these events [y/N]?
    y
    Deleting events where SEQ# <=3670
    2013-04-16 14:09:42,384 [ - main] INFO  thl.THLManagerCtrl Transactions deleted
  4. Set the cluster to Automatic mode and put the replicator online using cctrl:

    shell> cctrl
    [LOGICAL] /alpha > replicator host1 online
    [LOGICAL] /alpha > set policy automatic

You can now check the current THL file information:

shell> thl index
LogIndexEntry thl.data.0000000024(3240:3672)

For more information on purging events using thl, see Section 9.22.4, “thl purge Command”.
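
Because a Primary must keep every event that has not yet reached all Replicas, a safe upper bound for the purge is the lowest appliedLastSeqno reported by any Replica. The sketch below assumes you have already collected those values, one per line, by whatever means suits your environment; the function name lowest_seqno is hypothetical.

```shell
# Hypothetical helper: print the smallest of the sequence numbers read from
# standard input, one per line.
lowest_seqno() {
  sort -n | head -n 1
}
```

For example, `printf '%s\n' 3675 3672 3674 | lowest_seqno` prints 3672, so a purge on the Primary should not use a -high value above 3672.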

D.1.5.3. Moving the THL File Location

The location of the THL directory where THL files are stored can be changed, either by using a symbolic link or by changing the configuration to point to the new directory:

D.1.5.3.1. Relocating THL Storage using Symbolic Links

In an emergency, the directory currently holding the THL information can be moved to a location with more space, with a symbolic link left in its place.

To move the THL location for a Replica, temporarily set the Replica offline, update the THL location, and then re-enable the Replica in the cluster:

  1. Shun the datasource and switch your node into the offline state using cctrl:

    shell> cctrl -expert
      [LOGICAL:EXPERT] /alpha > datasource host1 shun
      [LOGICAL:EXPERT] /alpha > replicator host1 offline
  2. Create a new directory, or attach a new filesystem and location on which the THL content will be located. You can use a directory on another filesystem or connect to a SAN, NFS or other filesystem where the new directory will be located. For example:

    shell> mkdir /mnt/data/thl
  3. Copy the existing THL directory to the new directory location. For example:

    shell> rsync -r /opt/continuent/thl/* /mnt/data/thl/
  4. Move the existing directory to a temporary location:

    shell> mv /opt/continuent/thl /opt/continuent/old-thl
  5. Create a symbolic link at the original directory location that points to the new directory:

    shell> ln -s /mnt/data/thl /opt/continuent/thl
  6. Recover the host back into the cluster:

    shell> cctrl -expert
      [LOGICAL:EXPERT] /alpha > datasource host1 recover
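
Steps 2 to 5 above can be sketched as a single function. This is an illustration under stated assumptions, not a supported tool: the function name relocate_thl_with_symlink and the ".old" suffix for the fallback copy are hypothetical, cp -Rp is used in place of the rsync command from step 3, and the replicator must already be offline as described in step 1.

```shell
# Hypothetical helper mirroring steps 2 to 5 above. Run it only while the
# replicator is offline. $1 is the current THL directory, $2 the new location.
relocate_thl_with_symlink() {
  old="$1"
  new="$2"
  mkdir -p "$new" || return 1
  # Copy the contents, preserving timestamps and permissions (cp -Rp is used
  # here in place of the rsync command shown in step 3).
  cp -Rp "$old"/. "$new"/ || return 1
  # Keep the original as a fallback until the new location is confirmed.
  mv "$old" "${old}.old" || return 1
  # The link lives at the original path and points at the new directory.
  ln -s "$new" "$old"
}
```

Each step stops the function on failure, so the original directory is moved aside only after the copy has completed.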

To change the THL location on a Primary:

  1. Manually promote an existing Replica to be the new Primary:

    [LOGICAL] /alpha > switch to host2
    SELECTED SLAVE: host2@alpha
    PURGE REMAINING ACTIVE SESSIONS ON CURRENT MASTER 'host1@alpha'
    PURGED A TOTAL OF 0 ACTIVE SESSIONS ON MASTER 'host1@alpha'
    FLUSH TRANSACTIONS ON CURRENT MASTER 'host1@alpha'
    PUT THE NEW MASTER 'host2@alpha' ONLINE
    PUT THE PRIOR MASTER 'host1@alpha' ONLINE AS A SLAVE
    RECONFIGURING SLAVE 'host3@alpha' TO POINT TO NEW MASTER 'host2@alpha'
    SWITCH TO 'host2@alpha' WAS SUCCESSFUL
  2. Update the THL location as provided in the previous sequence.

  3. Switch the updated Replica back to be the Primary:

    [LOGICAL] /alpha > switch to host1
    SELECTED SLAVE: host1@alpha
    PURGE REMAINING ACTIVE SESSIONS ON CURRENT MASTER 'host2@alpha'
    PURGED A TOTAL OF 0 ACTIVE SESSIONS ON MASTER 'host2@alpha'
    FLUSH TRANSACTIONS ON CURRENT MASTER 'host2@alpha'
    PUT THE NEW MASTER 'host1@alpha' ONLINE
    PUT THE PRIOR MASTER 'host2@alpha' ONLINE AS A SLAVE
    RECONFIGURING SLAVE 'host3@alpha' TO POINT TO NEW MASTER 'host1@alpha'
    SWITCH TO 'host1@alpha' WAS SUCCESSFUL

D.1.5.3.2. Relocating THL Storage using Configuration Changes

To change the location permanently, the replicator can be reconfigured to use a new directory for the THL information.

To update the location for a Replica, temporarily set the Replica offline, update the THL location, and then re-enable the Replica in the cluster:

  1. Shun the datasource and switch your node into the offline state using cctrl:

    shell> cctrl -expert
      [LOGICAL:EXPERT] /alpha > datasource host1 shun
      [LOGICAL:EXPERT] /alpha > replicator host1 offline
  2. Create a new directory, or attach a new filesystem and location on which the THL content will be located. You can use a directory on another filesystem or connect to a SAN, NFS or other filesystem where the new directory will be located. For example:

    shell> mkdir /mnt/data/thl
  3. Copy the existing THL directory to the new directory location. For example:

    shell> rsync -r /opt/continuent/thl/* /mnt/data/thl/
  4. Change the directory location using tpm to update the configuration for a specific host:

    shell> tpm update --thl-directory=/mnt/data/thl --host=host1
  5. Recover the host back into the cluster:

    shell> cctrl -expert
      [LOGICAL:EXPERT] /alpha > datasource host1 recover
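
Before pointing the configuration at the new directory, it can be worth confirming that the copy made in step 3 is complete. The following sketch compares the two trees with the standard diff utility; the function name thl_copy_matches is hypothetical, and comparing the contents of a large THL may take some time.

```shell
# Hypothetical helper: succeed only if two directory trees have identical
# file names and contents. $1 is the original THL directory, $2 the copy.
thl_copy_matches() {
  diff -r "$1" "$2" > /dev/null
}

# Possible use between steps 3 and 4:
#   thl_copy_matches /opt/continuent/thl /mnt/data/thl || echo "copy incomplete"
```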

To change the THL location on a Primary:

  1. Manually promote an existing Replica to be the new Primary:

    [LOGICAL] /alpha > switch to host2
    SELECTED SLAVE: host2@alpha
    PURGE REMAINING ACTIVE SESSIONS ON CURRENT MASTER 'host1@alpha'
    PURGED A TOTAL OF 0 ACTIVE SESSIONS ON MASTER 'host1@alpha'
    FLUSH TRANSACTIONS ON CURRENT MASTER 'host1@alpha'
    PUT THE NEW MASTER 'host2@alpha' ONLINE
    PUT THE PRIOR MASTER 'host1@alpha' ONLINE AS A SLAVE
    RECONFIGURING SLAVE 'host3@alpha' TO POINT TO NEW MASTER 'host2@alpha'
    SWITCH TO 'host2@alpha' WAS SUCCESSFUL
  2. Update the THL location as provided in the previous sequence.

  3. Switch the updated Replica back to be the Primary:

    [LOGICAL] /alpha > switch to host1
    SELECTED SLAVE: host1@alpha
    PURGE REMAINING ACTIVE SESSIONS ON CURRENT MASTER 'host2@alpha'
    PURGED A TOTAL OF 0 ACTIVE SESSIONS ON MASTER 'host2@alpha'
    FLUSH TRANSACTIONS ON CURRENT MASTER 'host2@alpha'
    PUT THE NEW MASTER 'host1@alpha' ONLINE
    PUT THE PRIOR MASTER 'host2@alpha' ONLINE AS A SLAVE
    RECONFIGURING SLAVE 'host3@alpha' TO POINT TO NEW MASTER 'host1@alpha'
    SWITCH TO 'host1@alpha' WAS SUCCESSFUL

D.1.5.4. Changing the THL Retention Times

THL files are by default retained for seven days, but the retention period can be adjusted according to the requirements of the service. Longer retention times keep the logs for longer, increasing disk space usage while allowing access to the THL information for longer. Shorter retention times reduce disk space usage while reducing the amount of log data available.

Note

The files are automatically managed by Tungsten Cluster. Old THL files are deleted only when new data is written to the current files. If there has been no THL activity, the log files remain until new THL information is written.

Use the tpm update command to apply the --repl-thl-log-retention setting. The replication service will be restarted on each host with the updated retention configuration.
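
As an illustration, the sketch below builds such a command for review rather than running it. It rests on assumptions you should verify against the documentation for your version: that the option accepts a value of the form "3d" for three days, and that no service name or other argument is needed in your installation. The function name retention_update_command is hypothetical.

```shell
# Hypothetical helper: print (not run) a tpm command for a retention in days.
# Assumes the option accepts a value such as "3d"; check your version's
# documentation for the accepted formats.
retention_update_command() {
  case "$1" in
    ''|*[!0-9]*) echo "retention must be a whole number of days" >&2; return 1 ;;
  esac
  echo "tpm update --repl-thl-log-retention=${1}d"
}
```

Printing the command first gives you a chance to check it before the replication service is restarted.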