The dropcolumn
filter enables
columns in the THL to be dropped. This can be useful when replicating
Personal Identification Information, such as email addresses, phone
number, personal identification numbers and others are within the THL but
need to be filtered out on the Target.
Pre-configured filter name |
dropcolumn
| ||
JavaScript Filter File |
tungsten-replicator/support/filters-javascript/dropcolumn.js
| ||
Property prefix |
replicator.filter.dropcolumn
| ||
Stage compatibility |
binlog-to-q ,
q-to-dbms
| ||
tpm Option compatibility |
--svc-extractor-filters ,
--svc-applier-filters
| ||
Data compatibility | Any event | ||
Parameters | |||
Parameter | Type | Default | Description |
definitionsFile
| Filename |
~/dropcolumn.json
| Location of the definitions file for dropping columns |
fillGaps
| Boolean |
false
| When true, re-order THL Column Index IDs to sequential numbers |
The filter is available by default as
dropcolumn
, and the filter is
configured through a JSON file that defines the list of columns to be
dropped. The filter relies on the
colnames filter being
enabled.
To enable the filter:
shell> tpm update --svc-extractor-filters=colnames,dropcolumn \
--property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/dropcolumn.json
A sample configuration file is provided in
tungsten-replicator/support/filters-config/dropcolumn.json
.
The format of the file is a JSON array of schema/table/column
specifications:
[ { "schema": "vip", "table": "clients", "columns": [ "personal_code", "birth_date", "email" ] }, ... ]
Where:
schema
specifies the name of the
schema on which to apply the filtering. If
*
is given, all schemas are
matched.
table
specifies the name of the
table on which to apply the filtering. If
*
is given, all tables are
matched.
columns
is an array of column
names to be matched.
For example:
[ { "schema": "vip", "table": "clients", "columns": [ "personal_code", "birth_date", "email" ] }, ... ]
Filters the columns email
,
birth_date
, and
personal_code
within the
clients
table in the
vip
schema.
To filter the telephone
column in
any table and any schema:
[ { "schema": "*", "table": "*", "columns": [ "telephone" ] } ]
Care should be taken when dropping columns on the Target and Source when the column order is different or when the names of the column differ:
If the column order is same, even if dropcolumn.js is used, leave the
default setting for the property
replicator.applier.dbms.getColumnMetadataFromDB=true
.
If the column order is different on the Source and Target, set
replicator.applier.dbms.getColumnMetadataFromDB=false
If slave's column names are different, regardless of differences in
the order, use the default property setting
replicator.applier.dbms.getColumnMetadataFromDB=true
If the filter is enabled on the extractor, the columns will be removed from the THL and the resulting THL Index will appear to have gaps in the THL column index ID, for example:
... - SQL(0) = - ACTION = INSERT - SCHEMA = sample - TABLE = demotable - ROW# = 0 - COL(1: id) = 11 - COL(2: col_a) = UK - COL(4: col_c) = 5678 - COL(5: col_d) = ABC - COL(7: col_g) = 2019-09-05 07:21:48.0 - KEY(1: id) = NULL ...
For JDBC targets, this is expected and required to ensure accurate column mapping, however for Batch Apliers the gap in the Index ID's will cause the applier to fail with a CSV Column Mismatch error, therefore, if your target is a Batch target (eg Hadoop, Redshift, Vertica) you need to add the following property to your configuration:
property=replicator.filter.dropcolumn.fillGaps=true
With this property in place, the resulting THL from the above example would now look like the following:
... - SQL(0) = - ACTION = INSERT - SCHEMA = sample - TABLE = demotable - ROW# = 0 - COL(1: id) = 11 - COL(2: col_a) = UK - COL(3: col_c) = 5678 - COL(4: col_d) = ABC - COL(5: col_g) = 2019-09-05 07:21:48.0 - KEY(1: id) = NULL ...