Continuent Documentation

Search:

11.4.30. PrimaryKey Filter

The PrimaryKey adds primary key information to row-based replication data. This is required by heterogeneous environments to ensure that the primary key is identified when updating or deleting tables. Without this information, the primary to use, for example as the document ID in a document store such as MongoDB, is generated dynamically. In addition, without this filter in place, when performing update or delete operations a full table scan is performed on the target dataserver to determine the record that must be updated.

Pre-configured filter name pkey
Classname com.continuent.tungsten.replicator.filter.PrimaryKeyFilter
Property prefix replicator.filter.pkey
Stage compatibility binlog-to-q
tpm Option compatibility --repl-svc-extractor-filters
Data compatibility Row events
Keeps Cached Data Yes
Cached Refreshed When? Emptied when going OFFLINE; Updated when ALTER statement seen
Metadata Updated Yes; tungsten_filter_primarykey=true
Parameters
Parameter Type Default Description
user string ${replicator.global.extract.db.user} The username for the connection to the database for looking up column definitions
password string ${replicator.global.extract.db.password} The password for the connection to the database for looking up column definitions
url string jdbc:mysql:thin://${replicator.global.extract.db.host}:${replicator.global.extract.db.port}/${replicator.schema}?createDB=true JDBC URL of the database connection to use for looking up column definitions
addPkeyToInsert boolean false If set to true, primary keys are added to INSERT operations. This setting is required for batch loading
addColumnsToDeletes boolean false If set to true, full column metadata is added to DELETE operations. This setting is required for batch loading

Note

This filter is designed to be used for testing and with heterogeneous replication where the field name information can be used to construct and build target data structures.

For example, in the following THL fragment, the key information includes data for all columns, which the is the default behavior for UPDATE and DELETE operations.

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2
  - KEY(2: country) = 1
  - KEY(3: city) = 8374
  - KEY(4: salesman) = 1
  - KEY(5: value) = 89000.00

When the PrimaryKey is enabled, the key information has been optimized to only contain the actual primary keys are added to the row-based THL record:

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2

The final line shows the addition of the primary key id added to THL event.

Important

The filter determines primary key information by examining the DDL for the table, and keeping that information in an internal cache. If the DDL for a table is not known, or an ALTER TABLE statement is identified, the cache information is updated before the THL is then modified with the primary key information.

In the situation where you enable the filter, but have not create primary key information on the tables, it is possible that creating or adding other index types (such as UNIQUE) on a table, could lead to the incorrect primary key information being updated in the THL, particularly if there are active transactions taking place during and/or immediately after the ALTER statement.

The safest way to perform an index update in case remains the same as for any safe DDL update:

  • Put the replicator offline

  • Change the DDL for the table or tables

  • Put the replicator online

The two options, addPkeyToInsert and addColumnsToDeletes add the primary key information to INSERT and DELETE operations respectively. In a heterogeneous environment, these options should be enabled to prevent full-table scans during update and deletes.