5.6.11. Batch Loading into MySQL

Note

All the features discussed in this section are only available from version 6.1.15 of Tungsten Replicator

There are occasions where Batch loading into MySQL may benefit your use case, such as loading large data warehouse environments, or where real-time replication isn't as critical.

A number of specific properties are available for MySQL targets, these are discussed below.

5.6.11.1. Configuring as an Offboard Batch Applier

By Default, when loading into MySQL using the Batch Applier, the process executes LOAD DATA INFILE statements to load the CSV files into the database.

If you wish to install the applier on a remote host, this action would typically fail, therefore you need to enable the following property in the configuration:

property=replicator.applier.dbms.useLoadDataLocalInfile=true

5.6.11.2. Drop Delete Statements

Tungsten Replicator includes a number of useful filters, such as the ability to drop certain DML statements on a schema or table level.

If you wish to drop such statements on a per object basis, then you should continue to use the skipbyevent filter, however if you want to drop ALL DELETE DML, then you can enable the following property:

property=replicator.applier.dbms.skipDeletes=true

Warning

By dropping deletes, you will then subsequently expose yourself to errors should rows be reinserted later with the same Primary or Unique Key values. Typically, this feature would be only enabled when you plan to capture and log key violations. See Section 5.6.11.6, “Log rows violating Primary/Unique Keys” for more information.

5.6.11.3. Configure CHARSET to use on Load

If you wish to specify a different CHARSET to be used when the data is being loaded into the target database, this can be set using the following property, for example:

property=replicator.applier.dbms.loadCharset=utf8mb4

5.6.11.4. Allow DDL Statements to execute

Typically, the batch loader is used for heterogeneous targets, and therefore by default DDL statements will be dropped. However, when applying into MySQL the DDL statements would be valid and can therefore be executed.

To enable this, you should set the following property:

property=replicator.applier.dbms.applyStatements=true

Warning

Any changes to existing tables, or creation of new tables, will only apply to the main base table. You will still need to manually make changes to the relevant staging and error tables (if used)

5.6.11.5. Disable Foreign Keys during load

If you use a lot of foreign keys in your target database, due to the nature of batch loading, this could cause errors when tables may not be loaded in sequence meaning child/parent keys may only be validated after a complete transaction load.

To prevent this from happening, you can enable the property below which will force the batch loader to temporarily disable foreign key checks until after the full transaction has been loaded.

property=replicator.applier.dbms.disableForeignKeys=true

5.6.11.6. Log rows violating Primary/Unique Keys

To prevent the replicator erroring on primary or unique key violations, you can instruct the replicator to log the offending rows in an error table, which will allow you to manually process the rows afterwards.

This is especially useful when you are dropping DELETE statements from the apply process

The following properties can be set to enable this:

property=replicator.applier.dbms.useUpdateOpcode=true
property=replicator.applier.dbms.batchLogDuplicateRows=true

By default, this feature will only check against PRIMARY KEYS, if you wish to also check against UNIQUE keys, you will need the additional property:

property=replicator.applier.dbms.fetchKeysFromDatabase=true

By default, the error rows will be logged into tables called error_xxx_origTableName.

These table will need precreating in the same way that you create the Staging tablesusing ddlscan, but supplying the table prefix, for example:

copy
shell> ddlscan -db hr -template ddl-mysql-staging.vm -opt tablePrefix error_xxx_

You can choose a different prefix if you wish, by replacing the error_xxx with you choice in the above ddlscan statement. If you choose to do this, you will also need to supply the new prefix in your configuration using the following property:

property=replicator.applier.dbms.errorTablePrefix=your-prefix-here_

Warning

If you are loading 10's of thousands of rows per transaction, and your target tables are very large, this process could slow down the apply process as the applier will first need to ensure the row being inserted does not violate any keys. The use of this feature should be fully tested in a load test environment and the risks fully understood before using in production.