dropcolumn.js Filter
The dropcolumn filter enables columns in the THL to be dropped. This can be useful when replicating Personal Identification Information,
such as email addresses, phone number, personal identification numbers and others are within the THL but need to be filtered out on the Target.
| Pre-configured filter name | dropcolumn |
| JavaScript Filter File | tungsten-replicator/support/filters-javascript/dropcolumn.js |
| Property prefix | replicator.filter.dropcolumn |
| Stage compatibility | binlog-to-q, q-to-dbms |
tpm Option compatibility | svc-extractor-filters, svc-applier-filters |
| Data compatibility | Any event |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
definitionsFile | Filename | ~/dropcolumn.json | Location of the definitions file for dropping columns |
fillGaps | Boolean | false | When true, re-order THL Column Index IDs to sequential numbers |
The filter is available by default as dropcolumn, and the filter is configured through a JSON file that defines the list of columns to
be dropped. The filter relies on the colnames filter being enabled.
To enable the filter, add the following to your configuration and execute tpm update:
svc-extractor-filters=colnames,dropcolumn
property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/dropcolumn.json
A sample configuration file is provided in tungsten-replicator/support/filters-config/dropcolumn.json.
The format of the file is a JSON array of schema/table/column specifications:
[
{
"schema": "vip",
"table": "clients",
"columns": [
"personal_code",
"birth_date",
"email"
]
},
...
]
Where:
schemaspecifies the name of the schema on which to apply the filtering. If*is given, all schemas are matched.tablespecifies the name of the table on which to apply the filtering. If*is given, all tables are matched.columnsis an array of column names to be matched.
For example:
[
{
"schema": "vip",
"table": "clients",
"columns": [
"personal_code",
"birth_date",
"email"
]
},
...
]
Filters the columns email, birth_date, and personal_code within the clients table in the
vip schema.
To filter the telephone column in any table and any schema:
[
{
"schema": "*",
"table": "*",
"columns": [
"telephone"
]
}
]
Care should be taken when dropping columns on the Target and Source when the column order is different or when the names of the column differ:
- If the column order is same, even if dropcolumn.js is used, leave the default setting for the property
replicator.applier.dbms.getColumnMetadataFromDB=true. - If the column order is different on the Source and Target, set
replicator.applier.dbms.getColumnMetadataFromDB=false - If the replica's column names are different, regardless of differences in the order, use the default property setting
replicator.applier.dbms.getColumnMetadataFromDB=true
If the filter is enabled on the extractor, the columns will be removed from the THL and the resulting THL Index will appear to have gaps in the THL column index ID, for example:
...
- SQL(0) =
- ACTION = INSERT
- SCHEMA = sample
- TABLE = demotable
- ROW# = 0
- COL(1: id) = 11
- COL(2: col_a) = UK
- COL(4: col_c) = 5678
- COL(5: col_d) = ABC
- COL(7: col_g) = 2019-09-05 07:21:48.0
- KEY(1: id) = NULL
...
For JDBC targets, this is expected and required to ensure accurate column mapping, however for Batch Appliers the gap in the Index ID's will cause the applier to fail with a CSV Column Mismatch error, therefore, if your target is a Batch target (eg Hadoop, Redshift, Vertica) you need to add the following property to your configuration:
property=replicator.filter.dropcolumn.fillGaps=true
With this property in place, the resulting THL from the above example would now look like the following:
...
- SQL(0) =
- ACTION = INSERT
- SCHEMA = sample
- TABLE = demotable
- ROW# = 0
- COL(1: id) = 11
- COL(2: col_a) = UK
- COL(3: col_c) = 5678
- COL(4: col_d) = ABC
- COL(5: col_g) = 2019-09-05 07:21:48.0
- KEY(1: id) = NULL
...