11.4.11. dropcolumn.js Filter

The dropcolumn filter enables columns in the THL to be dropped. This can be useful when replicating Personal Identification Information, such as email addresses, phone number, personal identification numbers and others are within the THL but need to be filtered out on the Target.

Pre-configured filter name dropcolumn
JavaScript Filter File tungsten-replicator/support/filters-javascript/dropcolumn.js
Property prefix replicator.filter.dropcolumn
Stage compatibility binlog-to-q, q-to-dbms
tpm Option compatibility --svc-extractor-filters, --svc-applier-filters
Data compatibility Any event
Parameters
Parameter Type Default Description
definitionsFile Filename ~/dropcolumn.json Location of the definitions file for dropping columns
fillGaps Boolean false When true, re-order THL Column Index IDs to sequential numbers

The filter is available by default as dropcolumn, and the filter is configured through a JSON file that defines the list of columns to be dropped. The filter relies on the colnames filter being enabled.

To enable the filter:

shell> tpm update --svc-extractor-filters=colnames,dropcolumn \
    --property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/dropcolumn.json

A sample configuration file is provided in tungsten-replicator/support/filters-config/dropcolumn.json. The format of the file is a JSON array of schema/table/column specifications:

[
    {
        "schema": "vip",
        "table": "clients",
        "columns": [
            "personal_code",
            "birth_date",
            "email"
        ]
    },
    ...
]

Where:

  • schema specifies the name of the schema on which to apply the filtering. If * is given, all schemas are matched.

  • table specifies the name of the table on which to apply the filtering. If * is given, all tables are matched.

  • columns is an array of column names to be matched.

For example:

[
    {
        "schema": "vip",
        "table": "clients",
        "columns": [
            "personal_code",
            "birth_date",
            "email"
        ]
    },
    ...
]

Filters the columns email, birth_date, and personal_code within the clients table in the vip schema.

To filter the telephone column in any table and any schema:

[
    {
        "schema": "*",
        "table": "*",
        "columns": [
            "telephone"
        ]
    }
]

Care should be taken when dropping columns on the Target and Source when the column order is different or when the names of the column differ:

If the filter is enabled on the extractor, the columns will be removed from the THL and the resulting THL Index will appear to have gaps in the THL column index ID, for example:

...
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = sample
 - TABLE = demotable
 - ROW# = 0
  - COL(1: id) = 11
  - COL(2: col_a) = UK
  - COL(4: col_c) = 5678
  - COL(5: col_d) = ABC
  - COL(7: col_g) = 2019-09-05 07:21:48.0
  - KEY(1: id) = NULL
...

For JDBC targets, this is expected and required to ensure accurate column mapping, however for Batch Apliers the gap in the Index ID's will cause the applier to fail with a CSV Column Mismatch error, therefore, if your target is a Batch target (eg Hadoop, Redshift, Vertica) you need to add the following property to your configuration:

property=replicator.filter.dropcolumn.fillGaps=true

With this property in place, the resulting THL from the above example would now look like the following:

...
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = sample
 - TABLE = demotable
 - ROW# = 0
  - COL(1: id) = 11
  - COL(2: col_a) = UK
  - COL(3: col_c) = 5678
  - COL(4: col_d) = ABC
  - COL(5: col_g) = 2019-09-05 07:21:48.0
  - KEY(1: id) = NULL
...