The CSV files generated by the batch loading process include a number of special columns that hold the information needed to load the staging data into the target system.
The following fields are supported; the first four are enabled by default:
opcode: The operation code, a one- or two-letter code indicating the operation type. For more information on the supported codes, see Section 5.6.9, “Batchloading Opcodes”.
seqno: Contains the current THL event (sequence) number for the row data being loaded. The value generated is specific to that THL event.
row_id: Contains a row ID (a monotonically incrementing number) that is unique within this CSV file for the table data being loaded. This can be useful for systems where the sequence number alone is not enough to identify an incoming row, even with the incoming primary key information.
commit_timestamp: The timestamp of when the data was originally committed by the source database, taken from the TIME within the THL event.
service: The service name of the replicator service that performed the loading and generated the CSV. This field is not enabled by default, but is provided to allow data to be concentrated into a BigData target while still identifying the source service and/or database that generated the data.
These fields are placed before the actual data for the corresponding table. For example, with the default setting, the following CSV is generated; the last three columns are specific to the table data:
"I","74","1","2017-05-26 13:00:11.000","655337","Dr No","kat"
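As a rough illustration of this layout, the following Python sketch splits the example row into its staging fields and its table data. It assumes the default four staging columns in their default order; the function name split_staging_row is hypothetical and is not part of the replicator.

```python
import csv
import io

# Default staging fields, in the order they appear before the table data.
STAGE_COLUMNS = ["opcode", "seqno", "row_id", "commit_timestamp"]


def split_staging_row(line, stage_columns=STAGE_COLUMNS):
    """Split one CSV line into (staging metadata dict, table data list).

    Hypothetical helper for illustration only.
    """
    fields = next(csv.reader(io.StringIO(line)))
    count = len(stage_columns)
    if len(fields) < count:
        raise ValueError("row has fewer fields than staging columns")
    # The leading fields are staging metadata; the rest is table data.
    metadata = dict(zip(stage_columns, fields[:count]))
    return metadata, fields[count:]


line = '"I","74","1","2017-05-26 13:00:11.000","655337","Dr No","kat"'
metadata, table_data = split_staging_row(line)
```

Here metadata maps each staging field name to its value, and table_data holds the remaining three columns that belong to the table itself.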
The list of fields, and the order in which they appear, is controlled by the replicator.applier.dbms.stageColumnNames property. By default, the first four fields, in the order shown above, are used:
replicator.applier.dbms.stageColumnNames=opcode,seqno,row_id,commit_timestamp
The actual names used (and passed to the JavaScript environment) are also controlled by another property, replicator.applier.dbms.stageColumnPrefix. This value is prepended to each column name within the JS environment, and is expected by the various tools. For example, with the default prefix tungsten_, the true name for the opcode column is tungsten_opcode.
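The way the two properties combine can be sketched in Python as follows. This is a simplified illustration, not the replicator's own logic: it assumes plain key=value lines, falls back to the defaults described above when a property is absent, and the function name staged_column_names is hypothetical.

```python
# Property names and default values as described in the text above.
NAMES_KEY = "replicator.applier.dbms.stageColumnNames"
PREFIX_KEY = "replicator.applier.dbms.stageColumnPrefix"
DEFAULT_NAMES = "opcode,seqno,row_id,commit_timestamp"
DEFAULT_PREFIX = "tungsten_"


def staged_column_names(properties_text):
    """Return the prefixed staging column names for a properties snippet.

    Hypothetical helper: handles simple key=value lines only, not the
    full Java properties syntax.
    """
    props = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    names = props.get(NAMES_KEY, DEFAULT_NAMES)
    prefix = props.get(PREFIX_KEY, DEFAULT_PREFIX)
    # Prepend the prefix to each configured field, preserving order.
    return [prefix + name.strip() for name in names.split(",") if name.strip()]


default_names = staged_column_names("")
```

With no overrides, the first entry of default_names is tungsten_opcode, matching the example in the text.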
Modifying the list of fields generated by the CSV writer may stop batchloading from working. Unless otherwise noted, the default batchloading scripts all expect to see the default four columns (opcode, seqno, row_id and commit_timestamp).
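Before changing the field list, a simple sanity check along the following lines may help catch a configuration that drops one of the default columns. This is a hypothetical Python sketch that only checks for presence; the batchloading scripts may also depend on column order, which it does not verify.

```python
# The four columns that the default batchloading scripts expect.
REQUIRED_COLUMNS = ("opcode", "seqno", "row_id", "commit_timestamp")


def missing_default_columns(stage_column_names):
    """Return the default columns absent from a stageColumnNames value.

    Hypothetical helper: takes the comma-separated property value and
    checks presence only, not ordering.
    """
    configured = {name.strip() for name in stage_column_names.split(",")}
    return [name for name in REQUIRED_COLUMNS if name not in configured]


missing = missing_default_columns("opcode,seqno,service")
```

An empty result means all four default columns are still configured; here missing lists row_id and commit_timestamp.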