Channels and Parallel Apply
Parallel apply works by using multiple threads for the final stage of the replication pipeline. These threads are known as channels. Restart points for each channel are stored as individual rows in table trep_commit_seqno if you are applying to a relational DBMS server, including MySQL, Oracle, and data warehouse products like Vertica.
When you set the channels argument, the tpm program configures the replication service to enable the requested number of channels. A value of 1 results in single-threaded operation.
Do not change the number of channels without setting the replicator offline cleanly. See the procedure later in this page for more information.
How Many Channels Are Enough?
Pick the smallest number of channels that loads the Replica fully. For evenly distributed workloads, this means increasing the number of channels so that more threads apply updates simultaneously and soak up the available I/O capacity. As long as each shard receives roughly the same number of updates, this is a good approach.
For unevenly distributed workloads, you may want to decrease channels to spread the workload more evenly across them. This ensures that each channel has productive work and minimizes the overhead of updating the channel position in the DBMS.
Once you have maximized I/O on the DBMS server, leave the number of channels alone. Adding more channels than you have shards does not help performance: it leads to idle channels that must still update their positions in the DBMS even though they are not doing useful work, which slightly reduces performance.
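The effect of having more channels than shards can be illustrated with a small model. The following is a minimal sketch, not the replicator's actual partitioning logic: it assumes shards are assigned to channels by a stable hash of the shard name, and the function names (assign_channel, channel_load, idle_channels) and the sample shard names are hypothetical.

```python
import zlib


def assign_channel(shard_id: str, channels: int) -> int:
    """Map a shard to a channel using a stable hash (an assumption for illustration)."""
    return zlib.crc32(shard_id.encode("utf-8")) % channels


def channel_load(updates_per_shard: dict, channels: int) -> list:
    """Return the number of updates each channel would apply."""
    load = [0] * channels
    for shard_id, updates in updates_per_shard.items():
        load[assign_channel(shard_id, channels)] += updates
    return load


def idle_channels(updates_per_shard: dict, channels: int) -> int:
    """Count channels that receive no updates at all."""
    return sum(1 for n in channel_load(updates_per_shard, channels) if n == 0)


if __name__ == "__main__":
    # Hypothetical workload: three shards with similar update counts.
    workload = {"sales": 1000, "billing": 950, "inventory": 1020}
    for n in (1, 2, 3, 8):
        print(n, channel_load(workload, n), "idle:", idle_channels(workload, n))
```

With three shards, any channel count above three always leaves at least the surplus channels idle in this model. Those are the channels that would still need to record their positions without doing useful work.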
Effect of Channels on Backups
If you back up a Replica that operates with more than one channel, say 30, you can only restore that backup on another Replica that operates with the same number of channels. Otherwise, reloading the backup is the same as changing the number of channels without a clean offline.
When operating Tungsten Replicator in a Tungsten cluster, you should always set the number of channels to be the same for all replicators. Otherwise you may run into problems if you try to restore a backup on a MySQL instance whose replicator is configured with a different number of channels.
If the replicator has only a single channel enabled, you can restore the backup anywhere. The same applies if you run the backup after the replicator has been taken offline cleanly.
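The restore rules in this section can be summarized as a simple check. This is a minimal sketch under stated assumptions: BackupInfo and can_restore are hypothetical names used only for illustration and are not part of Tungsten Replicator or tpm, and the caller is assumed to know how many channels were active and whether the replicator was cleanly offline when the backup was taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupInfo:
    """Hypothetical record describing the replicator state at backup time."""
    channels: int        # number of channels the source Replica was running
    clean_offline: bool  # True if the replicator was taken offline cleanly first


def can_restore(backup: BackupInfo, target_channels: int) -> bool:
    """Return True if the backup is safe to restore on a Replica with target_channels."""
    if backup.channels < 1 or target_channels < 1:
        raise ValueError("channel counts must be at least 1")
    if backup.clean_offline:
        # Backup taken after a clean offline: restorable anywhere.
        return True
    if backup.channels == 1:
        # Single-channel backup: restorable anywhere.
        return True
    # Otherwise the target must run the same number of channels.
    return backup.channels == target_channels


if __name__ == "__main__":
    print(can_restore(BackupInfo(channels=30, clean_offline=False), 30))
    print(can_restore(BackupInfo(channels=30, clean_offline=False), 10))
```

A check like this could be run before reloading a backup so that a channel mismatch is caught before the replicator restarts.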