8.10. Using the Parallel Extractor
Version Support: 2.2.1 and later
The parallel extractor functionality was added in Tungsten Replicator 2.2.1,
and initially supported only extraction from Oracle masters.
The parallel extractor reads information from the source database schema in
chunks and then feeds this information into the THL data stream as row-based
INSERT operations. When the slave
connects, these are applied to the slave database as normal
INSERT operations. The parallel
extractor is particularly useful in heterogeneous environments such as
Oracle to MySQL where the data does not already exist on the slave.
The basic provisioning process operates in two stages:
1. Provisioning data is extracted and inserted into the THL. One event is
   used to contain all of the data from a single table. If the table is too
   large to be contained in a single event, the data will be distributed
   over multiple events.
2. Once provisioning has finished, data is extracted from the CDC as normal
   and added to the THL using the normal THL extraction thread.
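The first stage can be illustrated with a minimal Python sketch. It is not the replicator's implementation: the function name rows_to_events, the dictionary layout of an event, and the use of a row count as the size limit are all assumptions made for illustration. It only shows the idea that a small table fits in one event while a large table is spread over several.

```python
from typing import Iterator, Sequence, Tuple

Row = Tuple  # one table row, as a tuple of column values


def rows_to_events(table: str, rows: Sequence[Row],
                   max_rows_per_event: int) -> Iterator[dict]:
    """Yield one event per table, or several if the table is too large.

    The row-count limit is an assumed stand-in for whatever size limit
    the replicator really applies to an event.
    """
    if max_rows_per_event < 1:
        raise ValueError("max_rows_per_event must be at least 1")
    for start in range(0, len(rows), max_rows_per_event):
        yield {
            "table": table,
            "operation": "INSERT",  # provisioning data is row-based INSERTs
            "rows": list(rows[start:start + max_rows_per_event]),
        }


# A small table fits in a single event; a large one is split.
small = list(rows_to_events("dept", [(1, "a"), (2, "b")], 1000))
large = list(rows_to_events("emp", [(i,) for i in range(2500)], 1000))
print(len(small), len(large))
```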
This allows existing data to be extracted and processed through the
replicator path, including filters within the applier. Once the initial data
has been extracted, the change data can be applied. A diagram of the
replication scheme at different stages is provided below:
Figure 8.2. Parallel Extractor: Extraction Sequence
The parallel extractor runs as a multi-threaded process that extracts
multiple tables, and multiple ranges from a single table, in parallel. A
chunking thread identifies all the tables, and also identifies the keys and
chunks that can be extracted from each table. It then coordinates the
For example, when reading from two different tables in a single schema, the
process might look like the figure below:
Figure 8.3. Parallel Extractor: Extraction Operation
Because multiple threads are used to read information from the tables, the
process is very quick, although it places additional load on the source
database server, since the queries must load all of the data into memory.
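The chunking and parallel read described above can be sketched in Python as follows. This is an illustrative model only, not the replicator's code: the in-memory SOURCE dictionary stands in for the source schema, and the names plan_chunks and extract_chunk, the chunk size, and the thread count are all hypothetical. One planning step lists the tables and splits each table's key range into chunks, and a pool of worker threads then reads the chunks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical in-memory "source schema": table name -> {primary key: row}.
SOURCE: Dict[str, Dict[int, str]] = {
    "emp": {k: f"emp-{k}" for k in range(1, 11)},
    "dept": {k: f"dept-{k}" for k in range(1, 5)},
}

Chunk = Tuple[str, int, int]  # (table, first key, last key), both inclusive


def plan_chunks(source: Dict[str, Dict[int, str]],
                chunk_size: int) -> List[Chunk]:
    """Chunking step: list every table and split its keys into ranges."""
    chunks: List[Chunk] = []
    for table, rows in source.items():
        keys = sorted(rows)
        for i in range(0, len(keys), chunk_size):
            block = keys[i:i + chunk_size]
            chunks.append((table, block[0], block[-1]))
    return chunks


def extract_chunk(chunk: Chunk) -> List[Tuple[str, int, str]]:
    """Worker step: read one key range from one table."""
    table, low, high = chunk
    rows = SOURCE[table]
    return [(table, k, rows[k]) for k in sorted(rows) if low <= k <= high]


chunks = plan_chunks(SOURCE, chunk_size=4)
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in chunk order even though chunks run concurrently.
    extracted = [row for part in pool.map(extract_chunk, chunks) for row in part]

print(len(chunks), len(extracted))
```

In this model, two tables of 10 and 4 rows yield four chunks, so ranges from the same table and from different tables are read at the same time.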
To use the parallel extractor to provision data into the slave, the
configuration must be performed as part of the installation process when
configuring the master replicator for the first time, or when
re-initializing the replicator on a master after a trepctl
To set up provisioning with the parallel extractor:
Install the master Tungsten Replicator using tpm, but do
not enable automatic starting (i.e. do not use the
Install the slave replicator as normal.
On the master:
Start the replicator in OFFLINE mode using replicator start offline:
replicator start offline
Put the replicator into the
ONLINE state, using the
trepctl online -provision
If you have an identifiable reference number, such as the system
change number or MySQL event, then this can be specified on the
command-line to the trepctl online -provision
trepctl online -provision 40748375
During the provisioning process, the replicator will show the status
until all of the data has been read from the existing database.
The master will now start to read the information currently stored and
feed this information through a separate pipeline into the THL.
On the slave, start the replicator, or put the replicator online.
Statements from the master containing the provisioning information
should be replicated into the slave.
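The master-side commands from the steps above can be collected in a small Python helper, for example when scripting the procedure. This is a sketch only: the function name provision_commands is hypothetical, and it merely assembles the two commands shown in this section (replicator start offline and trepctl online -provision, with an optional reference number) without running them.

```python
from typing import List, Optional


def provision_commands(reference: Optional[str] = None) -> List[List[str]]:
    """Return the master-side commands, in order, as argument lists."""
    online = ["trepctl", "online", "-provision"]
    if reference is not None:
        # Optional reference number, e.g. a system change number.
        online.append(str(reference))
    return [["replicator", "start", "offline"], online]


# Print the commands rather than executing them.
for argv in provision_commands("40748375"):
    print(" ".join(argv))
```

Each argument list could be passed to subprocess.run on a host where the replicator is installed; that step is left out here because it depends on the local installation.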
If the replicator is placed offline while the parallel extractor is still
extracting data, the extraction process will continue to run and insert
data until the extraction process has been completed.
Once the provisioned data has been inserted, replication will continue from
the position where changes started to occur after the replicator was