Common Reference

dropcolumn.js Filter

The dropcolumn filter enables columns in the THL to be dropped. This can be useful when replicating Personal Identification Information, such as email addresses, phone number, personal identification numbers and others are within the THL but need to be filtered out on the Target.


Pre-configured filter name	dropcolumn
JavaScript Filter File	`tungsten-replicator/support/filters-javascript/dropcolumn.js`
Property prefix	`replicator.filter.dropcolumn`
Stage compatibility	`binlog-to-q`, `q-to-dbms`
`tpm` Option compatibility	`svc-extractor-filters`, `svc-applier-filters`
Data compatibility	Any event

Parameters

Parameter	Type	Default	Description
`definitionsFile`	Filename	`~/dropcolumn.json`	Location of the definitions file for dropping columns
`fillGaps`	Boolean	false	When true, re-order THL Column Index IDs to sequential numbers

The filter is available by default as dropcolumn, and the filter is configured through a JSON file that defines the list of columns to be dropped. The filter relies on the colnames filter being enabled.

To enable the filter, add the following to your configuration and execute tpm update:

svc-extractor-filters=colnames,dropcolumn
property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/dropcolumn.json

A sample configuration file is provided in tungsten-replicator/support/filters-config/dropcolumn.json.

The format of the file is a JSON array of schema/table/column specifications:

[
  {
      "schema": "vip",
      "table": "clients",
      "columns": [
          "personal_code",
          "birth_date",
          "email"
      ]
  },
  ...
]

Where:

schema specifies the name of the schema on which to apply the filtering. If * is given, all schemas are matched.
table specifies the name of the table on which to apply the filtering. If * is given, all tables are matched.
columns is an array of column names to be matched.

For example:

[
  {
      "schema": "vip",
      "table": "clients",
      "columns": [
          "personal_code",
          "birth_date",
          "email"
      ]
  },
  ...
]

Filters the columns email, birth_date, and personal_code within the clients table in the vip schema.

To filter the telephone column in any table and any schema:

[
  {
      "schema": "*",
      "table": "*",
      "columns": [
          "telephone"
      ]
  }
]

Care should be taken when dropping columns on the Target and Source when the column order is different or when the names of the column differ:

If the column order is same, even if dropcolumn.js is used, leave the default setting for the property replicator.applier.dbms.getColumnMetadataFromDB=true.
If the column order is different on the Source and Target, set replicator.applier.dbms.getColumnMetadataFromDB=false
If the replica's column names are different, regardless of differences in the order, use the default property setting replicator.applier.dbms.getColumnMetadataFromDB=true

If the filter is enabled on the extractor, the columns will be removed from the THL and the resulting THL Index will appear to have gaps in the THL column index ID, for example:

...
- SQL(0) =
- ACTION = INSERT
- SCHEMA = sample
- TABLE = demotable
- ROW# = 0
- COL(1: id) = 11
- COL(2: col_a) = UK
- COL(4: col_c) = 5678
- COL(5: col_d) = ABC
- COL(7: col_g) = 2019-09-05 07:21:48.0
- KEY(1: id) = NULL
...

For JDBC targets, this is expected and required to ensure accurate column mapping, however for Batch Appliers the gap in the Index ID's will cause the applier to fail with a CSV Column Mismatch error, therefore, if your target is a Batch target (eg Hadoop, Redshift, Vertica) you need to add the following property to your configuration:

property=replicator.filter.dropcolumn.fillGaps=true

With this property in place, the resulting THL from the above example would now look like the following:

...
- SQL(0) =
- ACTION = INSERT
- SCHEMA = sample
- TABLE = demotable
- ROW# = 0
- COL(1: id) = 11
- COL(2: col_a) = UK
- COL(3: col_c) = 5678
- COL(4: col_d) = ABC
- COL(5: col_g) = 2019-09-05 07:21:48.0
- KEY(1: id) = NULL
...