5.5. Batch Loading for Data Warehouses


5.5.1. How It Works
5.5.2. Important Limitations
5.5.3. Batch Applier Setup
5.5.4. JavaScript Batchloader Scripts
5.5.5. Staging Tables
5.5.6. Character Sets
5.5.7. Supported CSV Formats
5.5.8. Columns in Generated CSV Files
5.5.9. Batchloading Opcodes
5.5.10. Time Zones
5.5.11. Batch Loading into MySQL
5.5.12. Data File Partitioning

Tungsten Replicator normally applies SQL changes to Targets by constructing SQL statements and executing them in the exact order that transactions appear in the Tungsten History Log (THL). This works well for OLTP databases like MySQL, Oracle, and MongoDB. However, it is a poor approach for data warehouses.

Data warehouse products like Vertica or Redshift load very slowly through JDBC interfaces (50 or more times slower than MySQL). Instead, such databases supply batch loading commands that upload data in parallel. For instance, Vertica uses the COPY command.

Tungsten Replicator has a batch applier named SimpleBatchApplier that groups transactions and then loads the data in bulk. This is known as "batch apply." You can configure Tungsten to load tens of thousands of transactions at once using templates that apply the correct commands for your chosen data warehouse.

Although we use the term batch apply, Tungsten is not batch-oriented in the sense of traditional Extract/Transform/Load (ETL) tools, which may run only a small number of batches a day. Tungsten builds batches automatically as transactions arrive in the log. The mechanism is designed to be self-adjusting: if small transaction batches cause loading to fall behind, Tungsten tends to increase the batch size until loading no longer lags.
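The self-adjusting behaviour can be illustrated with a toy model. The sketch below is not Tungsten's implementation; the function name, parameters, and all numbers are hypothetical and chosen only for illustration. It assumes transactions arrive at a constant rate, that each load costs a fixed overhead plus a small per-transaction cost, and that the next batch consists of whatever arrived while the previous load was running.

```python
def simulate_batches(arrival_rate, fixed_cost, per_txn_cost, rounds, first_batch=1.0):
    """Toy model of self-adjusting batch sizes (hypothetical, not Tungsten code).

    arrival_rate  -- transactions arriving per second (assumed constant)
    fixed_cost    -- seconds of overhead per load, regardless of batch size
    per_txn_cost  -- additional seconds per transaction in a batch
    rounds        -- number of batches to simulate
    first_batch   -- size of the very first batch
    """
    sizes = [first_batch]
    for _ in range(rounds - 1):
        # Time spent loading the previous batch.
        load_time = fixed_cost + per_txn_cost * sizes[-1]
        # Everything that arrived during that load forms the next batch.
        sizes.append(arrival_rate * load_time)
    return sizes


if __name__ == "__main__":
    # Illustrative values only: 100 txn/s, 2 s overhead, 1 ms per transaction.
    for i, size in enumerate(simulate_batches(100.0, 2.0, 0.001, 6)):
        print(f"batch {i}: {size:.1f} transactions")
```

In this model a small first batch is followed by much larger ones, and the size settles where the time needed to load a batch matches the time needed for that many transactions to arrive. This only happens when the per-transaction cost is low enough for the loader to keep up (here, `arrival_rate * per_txn_cost < 1`).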


Continuent Documentation
