S3 Replication Operation
The S3 applier makes use of the JavaScript based batch loading system (see "JavaScript Batchloader Scripts"). This constructs change data from the source-database. The change data is then written into csv files, optionally compressed, and loaded into an S3 bucket.
The full replication of information operates as follows:
Data is extracted from the source database using the standard extractor, for example by reading the row change data from the binlog in MySQL.
The
ColumnNameFilterfilter is used to extract column name information from the database. This enables the row-change information to be tagged with the corresponding column information. The data changes, and corresponding row names, are stored in the THL.The
PrimaryKeyFilterfilter is used to extract primary key data from the source tables.On the Applier replicator, the THL data is read and written into batch-files in the character-separated value format.
The information in these files is change data, and contains not only the original row values from the source tables, but also metadata about the operation performed (i.e.
INSERT,DELETEorUPDATE, and the primary key of for each table. AllUPDATEstatements are recorded as aDELETEof the existing data, and anINSERTof the new data.In addition to these core operation types, the batch applier can also be configured to record
UPDATEoperations that result inINSERTorDELETErows.
The staging files created by the replicator are in a specific format that incorporates change and operation information in addition to the original row data.
- The format of the files is a character separated values file, with each row separated by a newline, and individual fields separated by the
character
0x01. - The content of the file consists of the full row data extracted from the Source, plus metadata describing the operation for each row, the sequence number, and then the full row information.
| Operation | Sequence No | Table-specific primary key | DateTime | Table-columns... |
|---|---|---|---|---|
| OPTYPE | SEQNO that generated this row | PRIMARYKEY | DATATIME of source table commit |
The operation field will match one of the following values
| Operation | Description | Notes |
|---|---|---|
| I | Row is an INSERT of new data | |
| D | Row is DELETE of existing data | |
| UI | Row is an UPDATE which caused INSERT of data | |
| UD | Row is an UPDATE which caused DELETE of data |
For example, the MySQL row from an INSERT of:
| 3 | #1 Single | 2006 | Cats and Dogs (#1.4) |
Is represented within the CSV files generated as:
"I","5","3","2014-07-31 14:29:17.000","3","#1 Single","2006","Cats and Dogs (#1.4)"
The character separator, and whether to use quoting, are configurable within the replicator when it is deployed. For S3, the default behavior is to generate quoted and comma separated fields.
As the target for the Amazon S3 Applier is not a relational database in the sense of traditional Tungsten replication, the replicator stores its apply position as a JSON structure on the local filesystem.
This allows the replicator to know its starting position in the case of a restart.
The file is located in the following directory: /opt/continuent/metadata/applier/serviceName and is called commitseqno.0
The contents of the file will look something like the following, and should NOT be edited unless advised to do so by Continuent Support
{
"sourceId" : "ext01",
"epochNumber" : "0",
"fragno" : "0",
"eventId" : "mysql-bin.000002:0000000000134613;27",
"seqno" : "427",
"lastFrag" : "true",
"extractedTstamp" : "1687439308000",
"appliedLatency" : "0",
"shardId" : "demo"
}