The replicator commits changes read from the THL and commits these changes
in Replicas during the applier stage according to the block commit size or
interval. These replace the single
replicator.global.buffer.size
parameter that controls the size of the buffers used within each stage of
the replicator.
When applying transactions to the database, the decision to commit a block of transactions is controlled by two parameters:
When the event count reaches the specified event limit (set by
--svc-applier-block-commit-size
)
When the commit timer reaches the specified commit interval (set by
--svc-applier-block-commit-interval
)
The default operation is for block commits to take place based on the transaction count. Commits by the timer are disabled. The default block commit size is 10 transactions from the incoming stream of THL data; the block commit interval is zero (0), which indicates that the interval is disabled.
When both parameters are configured, block commit occurs when either value limit is reached. For example,if the event count is set to 10 and the commit interval to 50s, events will be committed by the applier either when the event count hits 10 or every 50 seconds, whichever is reached first. This means, for example, that even if only one transaction exists, when the 50 seconds is up, that single transaction will be applied.
In addition, the execution of implied commits during specific events within
the replicator can also be controlled to prevent fragmented block commits by
using the
replicator.stage.q-to-dbms.blockCommitPolicy
property. This property can have either of the following values:
strict
— Commit block on
service name changes, multiple fragments in a transaction, or
unsafe_for_block_commit. This is the default setting.
lax
— Don't commit in any of
these cases.
The block commit size can be controlled using the
--repl-svc-applier-block-commit-size
option
to tpm, or through the
blockCommitRowCount
.
The block commit interval can be controlled using the
--repl-svc-applier-block-commit-interval
option to tpm, or through the
blockCommitInterval
. If only a
number is supplied, it is used as the interval in milliseconds. Suffix of s,
m, h, and d for seconds, minutes, hours and days are also supported.
shell> ./tools/tpm update alpha \
--repl-svc-applier-block-commit-size=20 \
--repl-svc-applier-block-commit-interval=100s
The block commit parameters are supported only in applier stages; they have no effect in other stages.
Modification of the block commit interval should be made only when the commit window needs to be altered. The setting can be particularly useful in heterogeneous deployments where the nature and behaviour of the target database is different to that of the source extractor.
For example, when replicating to Oracle, reducing the number of transactions within commits reduces the locks and overheads:
shell> ./tools/tpm update alpha \
--repl-svc-applier-block-commit-interval=500
This would apply two commits every second, regardless of the block commit size.
When replicating to a data warehouse engine, particularly when using batch loading, such as Redshift, Vertica and Hadoop, larger block commit sizes and intervals may improve performance during the batch loading process:
shell> ./tools/tpm update alpha \
--repl-svc-applier-block-commit-size=100000 \
--repl-svc-applier-block-commit-interval=60s
This sets a large block commit size and interval enabling large batch loading.