Amazon Redshift DDL Generation for Amazon Redshift Deployments
In order for the data to be written into the Redshift tables, the tables must be generated. Tungsten Replicator does not replicate the
DDL statements between the source and applier between heterogeneous deployments due to differences in the format of the DDL statements. The
supplied ddlscan tool can translate the DDL from the source database into suitable DDL for the target database.
For each database being replicated, DDL must be generated twice, once for the staging tables where the change data is loaded, and again for the live tables. To generate the necessary DDL:
To generate the staging table DDL,
ddlscanmust be executed on the Extractor host. After the replicator has been installed, theddlscantool can automatically pick up the configuration to connect to the host, or it can be specified on the command line.On the source host for each database that is being replicated, run
ddlscanusing theddl-mysql-redshift-staging.vmshell> ddlscan -db test -template ddl-mysql-redshift-staging.vmDROP TABLE stage_xxx_test.stage_xxx_msg;CREATE TABLE stage_xxx_test.stage_xxx_msg(tungsten_opcode CHAR(2),tungsten_seqno INT,tungsten_row_id INT,tungsten_commit_timestamp TIMESTAMP,id INT,msg CHAR(80),PRIMARY KEY (tungsten_opcode, tungsten_seqno, tungsten_row_id));Check the output to ensure that no errors have been generated during the process. These may indicate datatype limitations that should be identified before continuing. The generated output should be captured and then executed on the Redshift host to create the table.
Once the staging tables have been created, execute
ddlscanagain using the base table template,ddl-mysql-redshift.vmshell> ddlscan -db test -template ddl-mysql-redshift.vmDROP TABLE test.msg;CREATE TABLE test.msg(id INT,msg CHAR(80),PRIMARY KEY (id));Once again, check the output for errors, then capture the output and execute the generated DDL against the Redshift instance.
The DDL templates translate datatypes as directly as possible, with the following caveats:
The length of MySQL
VARCHARlength is quadrupled, because MySQL counts characters, while Redshift counts bytes.There is no
TIMEdatatype in Redshift, instead,TIMEcolumns are converted toVARCHAR(17).Primary keys from MySQL are applied into Redshift where possible.
Once the DDL has been generated within the Redshift instance, the replicator will be ready to be installed.