7.2. Batch Loading for Data Warehouses
Tungsten Replicator normally applies changes to slaves by
constructing SQL statements and executing them in the exact order that
transactions appear in the Tungsten History Log (THL). This works well for
OLTP databases like MySQL, Oracle, and MongoDB. However, it is a poor
approach for data warehouses.
Data warehouse products like Vertica or Greenplum load very slowly through
JDBC interfaces (50 times slower than MySQL, or even more). Instead,
such databases supply batch loading commands that upload data in parallel.
For instance, Vertica uses the COPY command to load large data sets quickly
from files.
Tungsten Replicator has a batch applier named
SimpleBatchApplier that groups transactions and then loads data. This is
known as "batch apply." You can configure Tungsten to load tens of thousands
of transactions at once using templates that apply the correct commands for
your chosen data warehouse.
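To make the idea concrete, here is a minimal sketch of the batch-apply pattern in Python. It is not Tungsten's actual SimpleBatchApplier (which is Java and template-driven); the class name, the `execute` callback, and the staging table name are illustrative assumptions. The point it shows is the core trade: buffer row changes as they arrive, then hand the warehouse one bulk load (a Vertica-style COPY here) instead of one JDBC statement per row.

```python
import csv
import io

class SimpleBatchLoader:
    """Illustrative sketch of batch apply (not Tungsten's real applier):
    buffer incoming row changes, then flush them as one CSV bulk load."""

    def __init__(self, table, batch_size, execute):
        self.table = table
        self.batch_size = batch_size  # flush once this many rows accumulate
        self.execute = execute        # callback that runs the load command
        self.rows = []

    def apply(self, row):
        """Queue one row change; flush automatically when the batch fills."""
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self.flush()

    def flush(self):
        """Serialize the buffered rows to CSV and issue a single bulk load."""
        if not self.rows:
            return
        buf = io.StringIO()
        csv.writer(buf).writerows(self.rows)
        # Vertica-style command; other warehouses use analogous bulk loaders.
        self.execute(f"COPY {self.table} FROM STDIN DELIMITER ','",
                     buf.getvalue())
        self.rows.clear()

# Demo: record each bulk load instead of talking to a real warehouse.
loads = []
loader = SimpleBatchLoader("staging.orders", batch_size=3,
                           execute=lambda sql, data: loads.append((sql, data)))
for i in range(7):
    loader.apply((i, f"item-{i}"))
loader.flush()  # push the final partial batch
print(len(loads))  # → 3 (two full batches of 3 rows, one final batch of 1)
```

Seven row changes become three warehouse round-trips rather than seven; with a real batch size in the tens of thousands, the ratio is what makes loading fast.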
While we use the term "batch apply," Tungsten is not
batch-oriented in the sense of traditional Extract/Transform/Load (ETL) tools,
which may run only a small number of batches per day. Tungsten builds batches
automatically as transactions arrive in the log. The mechanism is designed
to be self-adjusting: if small transaction batches cause loading to lag,
Tungsten automatically adjusts the batch size upwards
until it no longer falls behind during loading.
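The self-adjusting behavior can be sketched as a simple feedback loop. The policy below is a hypothetical illustration, not Tungsten's actual algorithm: it doubles the batch size whenever the backlog of pending transactions exceeds what one batch can drain, up to an assumed ceiling.

```python
def adjust_batch_size(current, backlog, max_size=100_000):
    """Hypothetical self-adjusting policy: grow the batch while the
    applier is behind the log, capped at max_size rows."""
    if backlog > current:          # one batch cannot drain the backlog
        return min(current * 2, max_size)
    return current                 # keeping up: leave the size alone

# Demo: a burst of traffic arrives, then the applier catches up.
size = 1_000
for backlog in [5_000, 5_000, 2_000, 500]:
    size = adjust_batch_size(size, backlog)
print(size)  # → 4000: grew through the burst, then stabilized
```

The key property is the one the text describes: batch size trends upward only while loading lags, so quiet periods keep latency low and bursts are absorbed with fewer, larger loads.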