13.1. Block Commit

The replicator commits changes read from the THL and commits these changes in Replicas during the applier stage according to the block commit size or interval. These replace the single replicator.global.buffer.size parameter that controls the size of the buffers used within each stage of the replicator.

When applying transactions to the database, the decision to commit a block of transactions is controlled by two parameters:

The default operation is for block commits to take place based on the transaction count. Commits by the timer are disabled. The default block commit size is 10 transactions from the incoming stream of THL data; the block commit interval is zero (0), which indicates that the interval is disabled.

When both parameters are configured, block commit occurs when either value limit is reached. For example,if the event count is set to 10 and the commit interval to 50s, events will be committed by the applier either when the event count hits 10 or every 50 seconds, whichever is reached first. This means, for example, that even if only one transaction exists, when the 50 seconds is up, that single transaction will be applied.

In addition, the execution of implied commits during specific events within the replicator can also be controlled to prevent fragmented block commits by using the replicator.stage.q-to-dbms.blockCommitPolicy property. This property can have either of the following values:

  • strict — Commit block on service name changes, multiple fragments in a transaction, or unsafe_for_block_commit. This is the default setting.

  • lax — Don't commit in any of these cases.

The block commit size can be controlled using the --repl-svc-applier-block-commit-size option to tpm, or through the blockCommitRowCount.

The block commit interval can be controlled using the --repl-svc-applier-block-commit-interval option to tpm, or through the blockCommitInterval. If only a number is supplied, it is used as the interval in milliseconds. Suffix of s, m, h, and d for seconds, minutes, hours and days are also supported.

shell> ./tools/tpm update alpha \
    --repl-svc-applier-block-commit-size=20 \
    --repl-svc-applier-block-commit-interval=100s

Note

The block commit parameters are supported only in applier stages; they have no effect in other stages.

Modification of the block commit interval should be made only when the commit window needs to be altered. The setting can be particularly useful in heterogeneous deployments where the nature and behaviour of the target database is different to that of the source extractor.

For example, when replicating to Oracle, reducing the number of transactions within commits reduces the locks and overheads:

shell> ./tools/tpm update alpha \
    --repl-svc-applier-block-commit-interval=500

This would apply two commits every second, regardless of the block commit size.

When replicating to a data warehouse engine, particularly when using batch loading, such as Redshift, Vertica and Hadoop, larger block commit sizes and intervals may improve performance during the batch loading process:

shell> ./tools/tpm update alpha \
    --repl-svc-applier-block-commit-size=100000 \
    --repl-svc-applier-block-commit-interval=60s

This sets a large block commit size and interval enabling large batch loading.