3.1. How Heterogeneous Replication Works
Heterogeneous replication works slightly differently compared to the
native MySQL to MySQL replication. This is because SQL statements,
including both Data Manipulation Language (DML) and Data Definition
Language (DDL) cannot be executed on a target system as they were
extracted from the MySQL database. The SQL dialects are different, so that
an SQL statement on MySQL is not the same as an SQL statement on Oracle,
and differences in the dialects mean that either the statement would fail,
or would perform an incorrect operation.
On targets that do not support SQL of any kind, such as MongoDB,
replicating SQL statements would achieve nothing since they cannot be
executed at all.
All heterogeneous replication deployments therefore use row-based
replication. This extracts only the raw row data, not the statement
information. Because it is only row-data, it can be easily re-assembled or
constructed into another format, including statements in other SQL
dialects, native appliers for alternative formats, such as JSON or BSON,
or external CSV formats that enable the data to be loaded in bulk batches
into a variety of different targets.
MySQL to Oracle, Oracle to MySQL, and Oracle to Oracle Replication
Replication between Oracle or MySQL, in either direction, or
Oracle-to-Oracle replication, work as shown in
Figure 3.1, “Topologies: Heterogeneous Operation”.
Figure 3.1. Topologies: Heterogeneous Operation
The process works as follows:
Data is extracted from the source database. The exact method depends
on whether data is being extracted from MySQL or Oracle.
The MySQL server is configured to write transactions into the
MySQL binary log using row-based logging. This generates
information in the log in the form of the individual updated rows,
rather than the statement that was used to perform the update. For
example, instead of recording the statement:
INSERT INTO MSG VALUES (1,'Hello World');
The information is stored as a row entry against the updated
The information is written into the THL as row-based events, with
the event type (insert, update or delete) is appended to the
metadata of the THL event.
For Oracle CDC:
The Oracle Change Data Capture (CDC) system records the row-level
changes made to a table into a change table. Tungsten Replicator
reads the change information from the change tables and generates
row-based transactions within the THL.
In both cases, it is the raw row data that is stored in the THL.
Because the row data, not the SQL statement, has been recorded, the
differences in SQL dialects between the two databases does not need to
be taken into account. In fact, Data Definition Language (DDL) and
other SQL statements are deliberately ignored so that replication does
The row-based transactions stored in the THL are transferred from the
master to the slave.
On the slave (or applier) side, the row-based event data is wrapped
into a suitable SQL statement for the target database environment.
Because the raw row data is available, it can be constructed into any
suitable statement appropriate for the target database.
Native Applier Replication (e.g. MongoDB)
For heterogeneous replication where data is written into a target database
using a native applier, such as MongoDB, the row-based information is
written into the database using the native API. With MongoDB, for example,
data is reformatted into BSON and then applied into MongoDB using the
native insert/update/delete API calls.
For batch appliers, such as Vertica, the row-data is converted into CSV
files in batches. The format of the CSV file includes both the original
row data for all the columns of each table, and metadata on each line that
contain the unique
SEQNO and the operation type
(insert, delete or update). A modified form of the CSV is used in some
cases where the operation type is only an insert or delete, with updates
being translated into a delete followed by an insert of the updated
These temporary CSV files are then loaded into the native environment as
part of the replicator using a custom script that employs the specific
tools of that database that support CSV imports. The raw CSV data is
loaded into a staging table that contains the per-row metadata and the row
Depending on the batch environment, the loading of the data into the final
destination tables is performed either within the same script, or by using
a separate script. Both methods work in the same basic fashion; the base
table is updated using the data from the staging table, with each row
marked to be deleted, deleted, and the latest row (calculated from the
SEQNO) for each primary key) are then
Schema Creation and Replication
Because heterogeneous replication does not replicated SQL statements,
including DDL statements that would normally define and generate the table
structures, a different method must be used.
Tungsten Replicator includes a tool called ddlscan which
can read the schema definition from MySQL or Oracle and translate that
into the schema definition required on the target database. During the
process, differences in supported sizes and datatypes are identified and
either modified to a suitable value, or highlighted as a definition that
must be changed in the generated DDL.
Once this modified form of the DDL has been completed, it can then be
executed against the target database to generate the DDL required for
Tungsten Replicator to apply data. The same basic is used in batch loading
environments where a staging table is required, with the additional
staging columns added to the DDL automatically.
For MongoDB, where no explicitly DDL needs to be generated, the use of
ddlscan is not required.