3.9.2. Replicating from a Cluster to a Datawarehouse - Configuring the Cluster Nodes

There are the steps to configure a cluster to act as the source for a Cluster-Extractor replicator writing into a datawarehouse:

  • Enable MySQL ROW-based Binary Logging

    All MySQL databases running in clusters replicating to non-MySQL targets must operate in ROW-based replication mode to prevent data drift.

    This is required because replication to the datawarehouse environment must send the raw-data, rather than the statements which cannot be applied directly to a target datawarehouse.

    You must configure the my.cnf file to enable ROW-based binary logging:

    binlog-format = ROW

    ROW-based binary logging can also be enabled without restarting the MySQL server:

    mysql> select @@global.binlog_format\G
    *************************** 1. row ***************************
    @@global.binlog_format: MIXED
    1 row in set (0.00 sec)
    
    mysql> SET GLOBAL binlog_format = 'ROW';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select @@global.binlog_format\G
    *************************** 1. row ***************************
    @@global.binlog_format: ROW
    1 row in set (0.00 sec)
  • Enable and Configure the Extractor Filters

    Heterogeneous mode should be enabled within the cluster.

    The extractor filters and two associated properties add the column names and primary key details to the THL. This is required so that the information can be replicated into the datawarehouse correctly.

    For example, on every cluster node the lines below would be added to the /etc/tungsten/tungsten.ini file, then tpm update would be executed:

    [alpha]
    ...
    repl-svc-extractor-filters=colnames,pkey
    property=replicator.filter.pkey.addColumnsToDeletes=true
    property=replicator.filter.pkey.addPkeyToInserts=true

    For staging deployments, prepend two hyphens to each line and include on the command line.