There are the steps to configure a cluster to act as the source for a Cluster-Extractor replicator writing into a datawarehouse:
Enable MySQL ROW-based Binary Logging
All MySQL databases running in clusters replicating to non-MySQL targets must operate in ROW-based replication mode to prevent data drift.
This is required because replication to the datawarehouse environment must send the raw-data, rather than the statements which cannot be applied directly to a target datawarehouse.
You must configure the my.cnf
file to enable
ROW-based binary logging:
binlog-format = ROW
ROW-based binary logging can also be enabled without restarting the MySQL server:
mysql>select @@global.binlog_format\G
*************************** 1. row *************************** @@global.binlog_format: MIXED 1 row in set (0.00 sec) mysql>SET GLOBAL binlog_format = 'ROW';
Query OK, 0 rows affected (0.00 sec) mysql>select @@global.binlog_format\G
*************************** 1. row *************************** @@global.binlog_format: ROW 1 row in set (0.00 sec)
Enable and Configure the Extractor Filters
Heterogeneous mode should be enabled within the cluster.
The extractor filters and two associated properties add the column names and primary key details to the THL. This is required so that the information can be replicated into the datawarehouse correctly.
For example, on every cluster node the lines below would be added to
the /etc/tungsten/tungsten.ini
file, then
tpm update would be executed:
[alpha]
...
repl-svc-extractor-filters=colnames,pkey
property=replicator.filter.pkey.addColumnsToDeletes=true
property=replicator.filter.pkey.addPkeyToInserts=true
For staging deployments, prepend two hyphens to each line and include on the command line.