12.3.33. PrimaryKey Filter

12.3.33. PrimaryKey Filter
Prev	^Up	12.3. Filter Reference	Next

12.3.33. PrimaryKey Filter

12.3.33.1. Setting Custom Primary Key Definitions

The PrimaryKey filter adds primary key information to row-based replication data. This is required by heterogeneous environments to ensure that the primary key is identified when updating or deleting tables. Without this information, the primary key to use, for example as the document ID in a document store such as MongoDB, is generated dynamically. In addition, without this filter in place, when performing update or delete operations a full table scan is performed on the target dataserver to determine the record that must be updated.

Pre-configured filter name		`pkey`
Classname		`com.continuent.tungsten.replicator.filter.PrimaryKeyFilter`
Property prefix		`replicator.filter.pkey`
Stage compatibility		`binlog-to-q`
tpm Option compatibility		`--repl-svc-extractor-filters`
Data compatibility		Row events
Keeps Cached Data		Yes
Cached Refreshed When?		Emptied when going `OFFLINE`; Updated when `ALTER` statement seen
Metadata Updated		Yes; `tungsten_filter_primarykey=true`
Parameters
Parameter	Type	Default	Description
`user`	string	`${replicator.global.extract.db.user}`	The username for the connection to the database for looking up column definitions
`password`	string	`${replicator.global.extract.db.password}`	The password for the connection to the database for looking up column definitions
`url`	string	`jdbc:mysql:thin://${replicator.global.extract.db.host}:» ${replicator.global.extract.db.port}/${replicator.schema}?createDB=true`	JDBC URL of the database connection to use for looking up column definitions
`addPkeyToInsert`	`boolean`	`false`	If set to true, primary keys are added to `INSERT` operations. This setting is required for batch loading
`addColumnsToDeletes`	`boolean`	`false`	If set to true, full column metadata is added to `DELETE` operations. This setting is required for batch loading
`custompkey`	`filename`	`(empty)`	If set to the value of a JSON file, the file is used to determine custom primary key specifications in place of database derived versions. See Section 12.3.33.1, “Setting Custom Primary Key Definitions”.
`reconnectTimeout`	integer	3600	Timeout for refreshing connection to database

Note

This filter is designed to be used for testing and with heterogeneous replication where the field name information can be used to construct and build target data structures.

For example, in the following THL fragment, the key information includes data for all columns, which the is the default behavior for UPDATE and DELETE operations.

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2
  - KEY(2: country) = 1
  - KEY(3: city) = 8374
  - KEY(4: salesman) = 1
  - KEY(5: value) = 89000.00

When the PrimaryKey filter is enabled, the key information has been optimized to only contain the actual primary keys in the row-based THL record:

SEQ# = 142 / FRAG# = 0 (last frag)
- TIME = 2013-08-01 19:31:04.0
- EPOCH# = 122
- EVENTID = mysql-bin.000012:0000000000022187;0
- SOURCEID = host31
- METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1]
- SQL(0) =
 - ACTION = UPDATE
 - SCHEMA = test
 - TABLE = salesadv
 - ROW# = 0
  - COL(1: id) = 2
  - COL(2: country) = 1
  - COL(3: city) = 8374
  - COL(4: salesman) = 1
  - COL(5: value) = 89000.00
  - KEY(1: id) = 2

The final line shows the addition of the primary key id added to THL event.

Important

The filter determines primary key information by examining the DDL for the table, and keeping that information in an internal cache. If the DDL for a table is not known, or an ALTER TABLE statement is identified, the cache information is updated before the THL is then modified with the primary key information.

In the situation where you enable the filter, but have not create primary key information on the tables, it is possible that creating or adding other index types (such as UNIQUE) on a table, could lead to the incorrect primary key information being updated in the THL, particularly if there are active transactions taking place during and/or immediately after the ALTER statement.

The safest way to perform an index update in case remains the same as for any safe DDL update:

Put the replicator offline
Change the DDL for the table or tables
Put the replicator online

The two options, addPkeyToInsert and addColumnsToDeletes add the primary key information to INSERT and DELETE operations respectively. In a heterogeneous environment, these options should be enabled to prevent full-table scans during update and deletes.

12.3.33.1. Setting Custom Primary Key Definitions

Version Support: 6.0.0

Custom primary key configuration support available in 6.0.0 or later.

Not all tables and databases set or provide explicit primary key information, and in some cases, it is not possible to change the index definition on the source table to include primary key information. Without primary key information in the THL, replicating data into heterogeneous targets can fail, because there is no way for the target environment to correctly identify the primary key information, and therefore the specific record or records to be updated.

The pkey filter supports defining custom columns to make up a primary key in this situation, avoiding the need to explicitly set index information within the database.

The custom primary key setting works as follows:

When processing a THL entry, if the custom primary key configuration file has been set and exists, and the incoming schema/table name has been defined in the configuraiton file, the custom pkey configuration is used.
Otherwise, the primary key information is taken from the database if it exists.

If neither of these conditions is met, then no primary key data is added to the THL during process.

To configure a custom primary key for one or more tables:

Copy the sample primary key configuration file, located within the distribution as tungsten/tungsten-replicator/support/filters-config/custompkey.json into the share directory within the installation. For example, /opt/continuent/share.
Update the configuration to include the specific primary key settings for the incoming table. The format of the file is JSON, using a structured layout based on the common JSON filter configuration format (see Section 12.4, “Standard JSON Filter Configuration”). The sample file contains an example for the schema test and the table msg:
```
{
    "__default": {
        "IGNORE" : "pkey"
    },
    "test" : {
        "msg" : {
            "msg" : "pkey"
        }
    }
}
```
In the above example, msg is the name of the column to be specified as the primary key. The pkey indicates that this column must be a primary key field. You can also specify multiple columns:
```
{
    "__default": {
        "IGNORE" : "pkey"
    },
    "test" : {
        "msg" : {
            "id"  : "pkey",
            "msg" : "pkey"
        }
    }
}
```
To include another table within the same schema:
```
{
    "__default": {
        "IGNORE" : "pkey"
    },
    "test" : {
        "msg" : {
            "msg" : "pkey"
        },
        "orders" : {
            "orderid" : "pkey"
        }
    }
}
```
Note
The __default section must remain in place, although it has no impact on processing, it is used to ensure the file is valid for the filter configuration.
Update the configuration of your installtion through tpm to specify the custom primary key file in the properties:
```
shell> tpm update alpha --property=replicator.filter.pkey.custompkey=/opt/continuent/share/custompkey.json
```

Once this process has been completed, the replicator will add the custom primary key fields to the THL during processing. For example:

- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = test
 - TABLE = msg
 - ROW# = 0
  - COL(1: id) = 6
  - COL(2: msg) = new test
  - KEY(1: id) = NULL

Prev	Up	Next
12.3.32. OptimizeUpdates Filter	^Level	12.3.34. PrintEvent Filter

Continuent Documentation