5.6.8. Columns in Generated CSV Files

The CSV files generated by the batch loading process include a number of special columns that hold the information required to load the staging data into the target system.

The following fields are supported. The first four are enabled by default:

  • opcode — The operation code, a one- or two-letter code indicating the operation type. For more information on the supported codes, see Section 5.6.9, “Batchloading Opcodes”.

  • seqno — Contains the current THL event (sequence) number for the row data being loaded. The value is the THL event number itself, not a per-row counter.

  • row_id — Contains a monotonically incrementing row ID that is unique within this CSV file for the table data being loaded. This is useful for systems where the sequence number alone, even combined with the incoming primary key information, is not enough to identify an incoming row.

  • commit_timestamp — The timestamp at which the data was originally committed by the source database, taken from the TIME within the THL event.

  • service — The name of the replicator service that performed the loading and generated the CSV. This field is not enabled by default, but is provided so that data can be concentrated into a BigData target while still identifying the source service and/or database that generated the data.

These fields are placed before the actual data for the corresponding table. For example, with the default setting, the following CSV is generated. The first four columns are the special fields, and the last three columns are the table data:

"I","74","1","2017-05-26 13:00:11.000","655337","Dr No","kat"
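
As a minimal sketch of how a downstream consumer might separate the special columns from the table data, the following Python reads the example row using the standard csv module. The function name split_staging_row is hypothetical, and the code assumes the four default columns are present in their default order; it is not part of the replicator.

```python
import csv
import io

# Default special columns, in the default stageColumnNames order.
STAGE_COLUMNS = ["opcode", "seqno", "row_id", "commit_timestamp"]


def split_staging_row(line, stage_columns=STAGE_COLUMNS):
    """Split one CSV line into (special-column dict, table data list).

    Hypothetical helper: assumes the special columns come first.
    """
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) < len(stage_columns):
        raise ValueError("row has fewer fields than special columns")
    count = len(stage_columns)
    meta = dict(zip(stage_columns, fields[:count]))
    data = fields[count:]
    return meta, data


meta, data = split_staging_row(
    '"I","74","1","2017-05-26 13:00:11.000","655337","Dr No","kat"'
)
print(meta["opcode"], meta["seqno"], data)
```

If the configured column list differs from the default, the stage_columns argument would need to match it.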

The list of fields, and the order in which they appear, is controlled by the replicator.applier.dbms.stageColumnNames property. By default, four fields (opcode, seqno, row_id and commit_timestamp) are used, in the order shown above:

replicator.applier.dbms.stageColumnNames=opcode,seqno,row_id,commit_timestamp

The names actually used (and passed to the JavaScript environment) are further controlled by the replicator.applier.dbms.stageColumnPrefix property. Its value is prepended to each column name within the JS environment, and the prefixed names are what the various tools expect. For example, with the default prefix, tungsten_, the opcode column is named tungsten_opcode.
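
To illustrate how the two properties combine, the sketch below derives the prefixed column names from a stageColumnNames line and a prefix. This is an illustration in Python only: the function staged_column_names is hypothetical, and the replicator itself performs this step internally in a way not described here.

```python
def staged_column_names(property_line, prefix="tungsten_"):
    """Return prefixed column names for a stageColumnNames property line.

    Hypothetical helper for illustration; not part of the replicator.
    """
    key, _, value = property_line.partition("=")
    if key.strip() != "replicator.applier.dbms.stageColumnNames":
        raise ValueError("unexpected property: " + key.strip())
    # Keep the configured order and skip empty entries.
    return [prefix + name.strip() for name in value.split(",") if name.strip()]


default_line = (
    "replicator.applier.dbms.stageColumnNames="
    "opcode,seqno,row_id,commit_timestamp"
)
names = staged_column_names(default_line)
print(names)
```

With the default values this yields tungsten_opcode, tungsten_seqno, tungsten_row_id and tungsten_commit_timestamp, in that order.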

Warning

Modifying the list of fields generated by the CSV writer may stop batchloading from working. Unless otherwise noted, the default batchloading scripts all expect to see the default four columns (opcode, seqno, row_id and commit_timestamp).
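
One way to guard against this before changing the configuration is to check a proposed column list against the four defaults. The Python below is a sketch under that assumption: the helper missing_default_columns is hypothetical, and it checks only that the default columns are present, not their order or whether a given script tolerates extra columns.

```python
# The four columns the default batchloading scripts expect.
REQUIRED_COLUMNS = ("opcode", "seqno", "row_id", "commit_timestamp")


def missing_default_columns(stage_column_names):
    """Return the default columns absent from a comma-separated list.

    Hypothetical pre-flight check; presence only, order is not verified.
    """
    configured = {name.strip() for name in stage_column_names.split(",")}
    return [name for name in REQUIRED_COLUMNS if name not in configured]


print(missing_default_columns("opcode,seqno,row_id,commit_timestamp"))
print(missing_default_columns("opcode,seqno,service"))
```

An empty result means all four default columns are present in the proposed list.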