11.4.32. PrimaryKey Filter

The PrimaryKey filter adds primary key information to row-based replication data. This is required by heterogeneous environments to ensure that the primary key is identified when updating or deleting tables. Without this information, the primary key to use, for example as the document ID in a document store such as MongoDB, is generated dynamically. In addition, without this filter in place, when performing update or delete operations a full table scan is performed on the target dataserver to determine the record that must be updated.

Pre-configured filter name pkey
Classname com.continuent.tungsten.replicator.filter.PrimaryKeyFilter
Property prefix replicator.filter.pkey
Stage compatibility binlog-to-q
tpm Option compatibility --repl-svc-extractor-filters
Data compatibility Row events
Keeps Cached Data Yes
Cached Refreshed When? Emptied when going OFFLINE; Updated when ALTER statement seen
Metadata Updated Yes; tungsten_filter_primarykey=true
Parameters
Parameter Type Default Description
user string ${replicator.global.extract.db.user} The username for the connection to the database for looking up column definitions
password string ${replicator.global.extract.db.password} The password for the connection to the database for looking up column definitions
url string jdbc:mysql:thin://${replicator.global.extract.db.host}:» ${replicator.global.extract.db.port}/${replicator.schema}?createDB=true JDBC URL of the database connection to use for looking up column definitions
addPkeyToInsert boolean false If set to true, primary keys are added to INSERT operations. This setting is required for batch loading
addColumnsToDeletes boolean false If set to true, full column metadata is added to DELETE operations. This setting is required for batch loading
custompkey filename (empty) If set to the value of a JSON file, the file is used to determine custom primary key specifications in place of database derived versions. See Section 11.4.32.1, “Setting Custom Primary Key Definitions”.
reconnectTimeout integer 3600 Timeout for refreshing connection to database

Note

This filter is designed to be used for testing and with heterogeneous replication where the field name information can be used to construct and build target data structures.

For example, in the following THL fragment, the key information includes data for all columns, which the is the default behavior for UPDATE and DELETE operations.

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2
  - KEY(2: country) = 1
  - KEY(3: city) = 8374
  - KEY(4: salesman) = 1
  - KEY(5: value) = 89000.00

When the PrimaryKey filter is enabled, the key information has been optimized to only contain the actual primary keys in the row-based THL record:

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2

The final line shows the addition of the primary key id added to THL event.

Important

The filter determines primary key information by examining the DDL for the table, and keeping that information in an internal cache. If the DDL for a table is not known, or an ALTER TABLE statement is identified, the cache information is updated before the THL is then modified with the primary key information.

In the situation where you enable the filter, but have not create primary key information on the tables, it is possible that creating or adding other index types (such as UNIQUE) on a table, could lead to the incorrect primary key information being updated in the THL, particularly if there are active transactions taking place during and/or immediately after the ALTER statement.

The safest way to perform an index update in case remains the same as for any safe DDL update:

  • Put the replicator offline

  • Change the DDL for the table or tables

  • Put the replicator online

The two options, addPkeyToInsert and addColumnsToDeletes add the primary key information to INSERT and DELETE operations respectively. In a heterogeneous environment, these options should be enabled to prevent full-table scans during update and deletes.

11.4.32.1. Setting Custom Primary Key Definitions

Version Support: 6.0.0

Custom primary key configuration support available in 6.0.0 or later.

Not all tables and databases set or provide explicit primary key information, and in some cases, it is not possible to change the index definition on the source table to include primary key information. Without primary key information in the THL, replicating data into heterogeneous targets can fail, because there is no way for the target environment to correctly identify the primary key information, and therefore the specific record or records to be updated.

The pkey filter supports defining custom columns to make up a primary key in this situation, avoiding the need to explicitly set index information within the database.

The custom primary key setting works as follows:

  • When processing a THL entry, if the custom primary key configuration file has been set and exists, and the incoming schema/table name has been defined in the configuraiton file, the custom pkey configuration is used.

  • Otherwise, the primary key information is taken from the database if it exists.

If neither of these conditions is met, then no primary key data is added to the THL during process.

To configure a custom primary key for one or more tables:

  1. Copy the sample primary key configuration file, located within the distribution as tungsten/tungsten-replicator/support/filters-config/custompkey.json into the share directory within the installation. For example, /opt/continuent/share.

  2. Update the configuration to include the specific primary key settings for the incoming table. The format of the file is JSON, using a structured layout based on the common JSON filter configuration format (see Section 11.5, “Standard JSON Filter Configuration”). The sample file contains an example for the schema test and the table msg:

    {
        "__default": {
            "IGNORE" : "pkey"
        },
        "test" : {
            "msg" : {
                "msg" : "pkey"
            }
        }
    }

    In the above example, msg is the name of the column to be specified as the primary key. The pkey indicates that this column must be a primary key field. You can also specify multiple columns:

    {
        "__default": {
            "IGNORE" : "pkey"
        },
        "test" : {
            "msg" : {
                "id"  : "pkey",
                "msg" : "pkey"
            }
        }
    }

    To include another table within the same schema:

    {
        "__default": {
            "IGNORE" : "pkey"
        },
        "test" : {
            "msg" : {
                "msg" : "pkey"
            },
            "orders" : {
                "orderid" : "pkey"
            }
        }
    }

    Note

    The __default section must remain in place, although it has no impact on processing, it is used to ensure the file is valid for the filter configuration.

  3. Update the configuration of your installtion through tpm to specify the custom primary key file in the properties:

    shell> tpm update alpha --property=replicator.filter.pkey.custompkey=/opt/continuent/share/custompkey.json

Once this process has been completed, the replicator will add the custom primary key fields to the THL during processing. For example:

- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = msg
 - ROW# = 0
  - COL(1: id) = 6
  - COL(2: msg) = new test
  - KEY(1: id) = NULL