The PrimaryKey
filter adds primary key information to
row-based replication data. This is required by heterogeneous environments
to ensure that the primary key is identified when updating or deleting
tables. Without this information, the primary key to use, for example as the
document ID in a document store such as MongoDB, is generated dynamically.
In addition, without this filter in place, when performing update or
delete operations a full table scan is performed on the target dataserver
to determine the record that must be updated.
Pre-configured filter name |
pkey
| ||
Classname |
com.continuent.tungsten.replicator.filter.PrimaryKeyFilter
| ||
Property prefix |
replicator.filter.pkey
| ||
Stage compatibility |
binlog-to-q
| ||
tpm Option compatibility |
--repl-svc-extractor-filters
| ||
Data compatibility | Row events | ||
Keeps Cached Data | Yes | ||
Cached Refreshed When? |
Emptied when going OFFLINE ;
Updated when ALTER statement
seen
| ||
Metadata Updated |
Yes;
tungsten_filter_primarykey=true
| ||
Parameters | |||
Parameter | Type | Default | Description |
user
| string |
${replicator.global.extract.db.user}
| The username for the connection to the database for looking up column definitions |
password
| string |
${replicator.global.extract.db.password}
| The password for the connection to the database for looking up column definitions |
url
| string |
jdbc:mysql:thin://${replicator.global.extract.db.host}:»
${replicator.global.extract.db.port}/${replicator.schema}?createDB=true
| JDBC URL of the database connection to use for looking up column definitions |
addPkeyToInsert
|
boolean
|
false
|
If set to true, primary keys are added to
INSERT operations. This
setting is required for batch loading
|
addColumnsToDeletes
|
boolean
|
false
|
If set to true, full column metadata is added to
DELETE operations. This
setting is required for batch loading
|
custompkey
|
filename
|
(empty)
| If set to the value of a JSON file, the file is used to determine custom primary key specifications in place of database derived versions. See Section 12.4.32.1, “Setting Custom Primary Key Definitions”. |
reconnectTimeout
| integer | 3600 | Timeout for refreshing connection to database |
This filter is designed to be used for testing and with heterogeneous replication where the field name information can be used to construct and build target data structures.
For example, in the following THL fragment, the key information includes
data for all columns, which the is the default behavior for
UPDATE
and
DELETE
operations.
SEQ# = 142 / FRAG# = 0 (last frag) - TIME = 2013-08-01 19:31:04.0 - EPOCH# = 122 - EVENTID = mysql-bin.000012:0000000000022187;0 - SOURCEID = host31 - METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test] - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent - OPTIONS = [foreign_key_checks = 1, unique_checks = 1] - SQL(0) = - ACTION = UPDATE - SCHEMA = test - TABLE = salesadv - ROW# = 0 - COL(1: id) = 2 - COL(2: country) = 1 - COL(3: city) = 8374 - COL(4: salesman) = 1 - COL(5: value) = 89000.00 - KEY(1: id) = 2 - KEY(2: country) = 1 - KEY(3: city) = 8374 - KEY(4: salesman) = 1 - KEY(5: value) = 89000.00
When the PrimaryKey
filter is enabled, the key information
has been optimized to only contain the actual primary keys in
the row-based THL record:
SEQ# = 142 / FRAG# = 0 (last frag) - TIME = 2013-08-01 19:31:04.0 - EPOCH# = 122 - EVENTID = mysql-bin.000012:0000000000022187;0 - SOURCEID = host31 - METADATA = [mysql_server_id=1;dbms_type=mysql;service=alpha;shard=test] - TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent - OPTIONS = [foreign_key_checks = 1, unique_checks = 1] - SQL(0) = - ACTION = UPDATE - SCHEMA = test - TABLE = salesadv - ROW# = 0 - COL(1: id) = 2 - COL(2: country) = 1 - COL(3: city) = 8374 - COL(4: salesman) = 1 - COL(5: value) = 89000.00 - KEY(1: id) = 2
The final line shows the addition of the primary key
id
added to THL event.
The filter determines primary key information by examining the DDL for
the table, and keeping that information in an internal cache. If the DDL
for a table is not known, or an ALTER
TABLE
statement is identified, the cache information is
updated before the THL is then modified with the primary key
information.
In the situation where you enable the filter, but have not create
primary key information on the tables, it is possible that creating or
adding other index types (such as
UNIQUE
) on a table, could lead to
the incorrect primary key information being updated in the THL,
particularly if there are active transactions taking place during and/or
immediately after the ALTER
statement.
The safest way to perform an index update in case remains the same as for any safe DDL update:
Put the replicator offline
Change the DDL for the table or tables
Put the replicator online
The two options, addPkeyToInsert
and addColumnsToDeletes
add the
primary key information to INSERT
and DELETE
operations
respectively. In a heterogeneous environment, these options should be
enabled to prevent full-table scans during update and deletes.
Custom primary key configuration support available in 6.0.0 or later.
Not all tables and databases set or provide explicit primary key information, and in some cases, it is not possible to change the index definition on the source table to include primary key information. Without primary key information in the THL, replicating data into heterogeneous targets can fail, because there is no way for the target environment to correctly identify the primary key information, and therefore the specific record or records to be updated.
The pkey
filter supports defining
custom columns to make up a primary key in this situation, avoiding the
need to explicitly set index information within the database.
The custom primary key setting works as follows:
When processing a THL entry, if the custom primary key configuration file has been set and exists, and the incoming schema/table name has been defined in the configuraiton file, the custom pkey configuration is used.
Otherwise, the primary key information is taken from the database if it exists.
If neither of these conditions is met, then no primary key data is added to the THL during process.
To configure a custom primary key for one or more tables:
Copy the sample primary key configuration file, located within the
distribution as
tungsten/tungsten-replicator/support/filters-config/custompkey.json
into the share
directory within the
installation. For example,
/opt/continuent/share
.
Update the configuration to include the specific primary key
settings for the incoming table. The format of the file is JSON,
using a structured layout based on the common JSON filter
configuration format (see
Section 12.5, “Standard JSON Filter Configuration”). The sample file
contains an example for the schema
test
and the table
msg
:
{ "__default": { "IGNORE" : "pkey" }, "test" : { "msg" : { "msg" : "pkey" } } }
In the above example, msg
is
the name of the column to be specified as the primary key. The
pkey
indicates that this
column must be a primary key field. You can also specify multiple
columns:
{ "__default": { "IGNORE" : "pkey" }, "test" : { "msg" : { "id" : "pkey", "msg" : "pkey" } } }
To include another table within the same schema:
{ "__default": { "IGNORE" : "pkey" }, "test" : { "msg" : { "msg" : "pkey" }, "orders" : { "orderid" : "pkey" } } }
The __default
section must
remain in place, although it has no impact on processing, it is
used to ensure the file is valid for the filter configuration.
Update the configuration of your installtion through tpm to specify the custom primary key file in the properties:
shell> tpm update alpha --property=replicator.filter.pkey.custompkey=/opt/continuent/share/custompkey.json
Once this process has been completed, the replicator will add the custom primary key fields to the THL during processing. For example:
- SQL(0) = - ACTION = INSERT - SCHEMA = test - TABLE = msg - ROW# = 0 - COL(1: id) = 6 - COL(2: msg) = new test - KEY(1: id) = NULL