All the features discussed in this section are only available from version 6.1.15 of Tungsten Replicator
There are occasions where Batch loading into MySQL may benefit your use case, such as loading large data warehouse environments, or where real-time replication isn't as critical.
A number of specific properties are available for MySQL targets, these are discussed below.
By Default, when loading into MySQL using the Batch Applier, the process executes
LOAD DATA INFILE
statements to load the CSV files into the database.
If you wish to install the applier on a remote host, this action would typically fail, therefore you need to enable the following property in the configuration:
property=replicator.applier.dbms.useLoadDataLocalInfile=true
Tungsten Replicator includes a number of useful filters, such as the ability to drop certain DML statements on a schema or table level.
If you wish to drop such statements on a per object basis, then you should continue to
use the skipbyevent filter, however if you want to drop ALL DELETE
DML, then you can enable the following property:
property=replicator.applier.dbms.skipDeletes=true
By dropping deletes, you will then subsequently expose yourself to errors should rows be reinserted later with the same Primary or Unique Key values. Typically, this feature would be only enabled when you plan to capture and log key violations. See Section 5.6.11.6, “Log rows violating Primary/Unique Keys” for more information.
If you wish to specify a different CHARSET to be used when the data is being loaded into the target database, this can be set using the following property, for example:
property=replicator.applier.dbms.loadCharset=utf8mb4
Typically, the batch loader is used for heterogeneous targets, and therefore by default DDL statements will be dropped. However, when applying into MySQL the DDL statements would be valid and can therefore be executed.
To enable this, you should set the following property:
property=replicator.applier.dbms.applyStatements=true
Any changes to existing tables, or creation of new tables, will only apply to the main base table. You will still need to manually make changes to the relevant staging and error tables (if used)
If you use a lot of foreign keys in your target database, due to the nature of batch loading, this could cause errors when tables may not be loaded in sequence meaning child/parent keys may only be validated after a complete transaction load.
To prevent this from happening, you can enable the property below which will force the batch loader to temporarily disable foreign key checks until after the full transaction has been loaded.
property=replicator.applier.dbms.disableForeignKeys=true
To prevent the replicator erroring on primary or unique key violations, you can instruct the replicator to log the offending rows in an error table, which will allow you to manually process the rows afterwards.
This is especially useful when you are dropping DELETE statements from the apply process
The following properties can be set to enable this:
property=replicator.applier.dbms.useUpdateOpcode=true
property=replicator.applier.dbms.batchLogDuplicateRows=true
By default, this feature will only check against PRIMARY KEYS
, if you
wish to also check against UNIQUE
keys, you will need the additional
property:
property=replicator.applier.dbms.fetchKeysFromDatabase=true
By default, the error rows will be logged into tables called error_xxx_
.
origTableName
These table will need precreating in the same way that you create the Staging tablesusing ddlscan, but supplying the table prefix, for example:
shell> ddlscan -db hr -template ddl-mysql-staging.vm -opt tablePrefix error_xxx_
You can choose a different prefix if you wish, by replacing the error_xxx with you choice in the above ddlscan statement. If you choose to do this, you will also need to supply the new prefix in your configuration using the following property:
property=replicator.applier.dbms.errorTablePrefix=your-prefix-here_
If you are loading 10's of thousands of rows per transaction, and your target tables are very large, this process could slow down the apply process as the applier will first need to ensure the row being inserted does not violate any keys. The use of this feature should be fully tested in a load test environment and the risks fully understood before using in production.