8.6.2. Supported Templates and Usage

Table 8.5. ddlscan Supported Templates

File                                Description
ddl-check-pkeys.vm                  Reports which tables are without primary key definitions
ddl-mysql-hive-0.10.vm              Generates DDL from a MySQL host suitable for the base tables in a Hadoop/Hive environment
ddl-mysql-hive-0.10-staging.vm      Generates DDL from a MySQL host suitable for the staging tables in a Hadoop/Hive environment
ddl-mysql-hive-metadata.vm          Generates metadata as JSON to be used within a Hadoop/Hive environment
ddl-mysql-oracle.vm                 Generates an Oracle schema from a MySQL schema
ddl-mysql-oracle-cdc.vm             Generates Oracle tables with CDC capture information from a MySQL schema
ddl-mysql-redshift.vm               Generates DDL from a MySQL host suitable for the base tables in Amazon Redshift
ddl-mysql-redshift-staging.vm       Generates DDL from a MySQL host suitable for the staging tables in Amazon Redshift
ddl-mysql-vertica.vm                Generates DDL suitable for the base tables in HP Vertica
ddl-mysql-vertica-staging.vm        Generates DDL suitable for the staging tables in HP Vertica
ddl-oracle-mysql.vm                 Generates DDL for MySQL tables from an Oracle schema
ddl-oracle-mysql-pk-only.vm         Generates primary key DDL statements from an Oracle database for MySQL

8.6.2.1. ddl-check-pkeys.vm

The ddl-check-pkeys.vm template can be used to check whether specific tables within a schema do not have a primary key:

shell> ddlscan -template ddl-check-pkeys.vm \
 -user tungsten -pass password -db sales \
 -url jdbc:mysql://localhost:13306/sales
/*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/

/* ERROR: sales.dummy1 has no primary key! *//*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/

/* ERROR: sales.dummy2 has no primary key! *//*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/

In certain environments, particularly heterogeneous replication, the lack of primary keys can lead to inefficient replication, or prevent data from being replicated at all.
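The same check can be performed directly against MySQL's information_schema. The sketch below is an illustrative equivalent only; the query and the helper function are assumptions for demonstration, not part of ddlscan itself.

```python
# Query returning tables in a schema that lack a PRIMARY KEY constraint.
# Uses standard information_schema views; %s is the schema name parameter.
PKEY_CHECK_SQL = """
SELECT t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = %s
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
"""

def report_missing_pkeys(schema, tables_without_pk):
    """Format warnings in the style of the template output.

    tables_without_pk would normally come from executing
    PKEY_CHECK_SQL against the source database."""
    return ["/* ERROR: %s.%s has no primary key! */" % (schema, t)
            for t in tables_without_pk]
```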

8.6.2.2. ddl-mysql-hive-0.10.vm

The ddl-mysql-hive-0.10.vm template generates DDL for a carbon-copy of each table on the MySQL host:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-hive-0.10.vm -db test
--
-- SQL generated on Thu Sep 11 12:57:11 BST 2014 by Tungsten ddlscan utility
--
-- url = jdbc:mysql://tr-hadoop1:13306/test
-- user = tungsten
-- dbName = test
--


DROP TABLE IF EXISTS test.sales;

CREATE TABLE test.sales
(
 id INT,
 salesman STRING,
 planet STRING,
 value DOUBLE )
;

Wherever possible, the closest Hive equivalent datatype is used for each source datatype; for example, MySQL CHAR and VARCHAR columns become Hive STRING, and FLOAT becomes DOUBLE, as shown in the output above.

The template supports the following optional parameters to change behavior:

  • -opt schemaPrefix

    A prefix to be placed in front of all schemas. For example, if called with schemaPrefix set to mysql_:

    shell> ddlscan ... -opt schemaPrefix mysql_

    The schema name will be prefixed, translating the schema name from sales into mysql_sales.

  • -opt tablePrefix

    A prefix to be placed in front of all tables. For example, if called with tablePrefix set to mysql_:

    shell> ddlscan ... -opt tablePrefix mysql_

    The table name will be prefixed, translating the tablename from sales into mysql_sales.
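Taken together, the two options simply prepend fixed strings to the generated schema and table names. The helper below is a hypothetical illustration of that effect, not part of ddlscan:

```python
def apply_prefixes(schema, table, schema_prefix="", table_prefix=""):
    """Illustrates the effect of -opt schemaPrefix and -opt tablePrefix.

    Hypothetical helper for demonstration only: each prefix is simply
    prepended to the corresponding generated name."""
    return schema_prefix + schema, table_prefix + table
```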

8.6.2.3. ddl-mysql-hive-0.10-staging.vm

Staging tables within Hive define the original table columns with additional columns to track the operation type, sequence number, timestamp and unique key for each row. For example, the table sales in MySQL:

mysql> describe sales;
+----------+----------+------+-----+---------+----------------+
| Field    | Type     | Null | Key | Default | Extra          |
+----------+----------+------+-----+---------+----------------+
| id       | int(11)  | NO   | PRI | NULL    | auto_increment |
| salesman | char(20) | YES  |     | NULL    |                |
| planet   | char(20) | YES  |     | NULL    |                |
| value    | float    | YES  |     | NULL    |                |
+----------+----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

When processed with this template, the same table generates the following Hive-compatible staging DDL:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-hive-0.10-staging.vm -db test
--
-- SQL generated on Thu Sep 11 12:31:45 BST 2014 by Tungsten ddlscan utility
--
-- url = jdbc:mysql://tr-hadoop1:13306/test
-- user = tungsten
-- dbName = test
--


DROP TABLE IF EXISTS test.stage_xxx_sales;

CREATE EXTERNAL TABLE test.stage_xxx_sales
(
 tungsten_opcode STRING ,
 tungsten_seqno INT ,
 tungsten_row_id INT ,
 tungsten_commit_timestamp TIMESTAMP ,
 id INT,
 salesman STRING,
 planet STRING,
 value DOUBLE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/tungsten/staging/test/sales' ;

Wherever possible, the closest Hive equivalent datatype is used for each source datatype, see ddl-mysql-hive-0.10.vm for more information.

8.6.2.4. ddl-mysql-hive-metadata.vm

The Hadoop tools require information about the schema in JSON format so that the table names and primary key information can be used when materializing data from the staging tables into the base tables. This template generates that information in JSON format:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-hive-metadata.vm -db test


{
 "tables": [
 {
 "schema": "test",
 "name": "sales",
 "keys": ["id"],
 "columns": [
 {"name": "id", "type": "INT"},
 {"name": "salesman", "type": "STRING"},
 {"name": "planet", "type": "STRING"},
 {"name": "value", "type": "DOUBLE"}
 ]
 }
 ]
}
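A downstream materialization job can consume this JSON directly, for example to look up the primary key columns when merging staging rows into base tables. The sketch below parses the metadata shown above; the variable names are illustrative:

```python
import json

# JSON as produced by the ddl-mysql-hive-metadata.vm template (see above).
metadata_json = """
{ "tables": [ { "schema": "test", "name": "sales",
                "keys": ["id"],
                "columns": [ {"name": "id", "type": "INT"},
                             {"name": "salesman", "type": "STRING"},
                             {"name": "planet", "type": "STRING"},
                             {"name": "value", "type": "DOUBLE"} ] } ] }
"""

metadata = json.loads(metadata_json)

# Map each fully-qualified table name to its primary key columns, as a
# materialization step would when merging staging data into base tables.
keys = {"%s.%s" % (t["schema"], t["name"]): t["keys"]
        for t in metadata["tables"]}
```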

8.6.2.5. ddl-mysql-oracle.vm

When translating MySQL tables to an Oracle-compatible schema, each datatype is migrated to its closest Oracle equivalent; for example, MySQL INT becomes Oracle NUMBER(10, 0) and FLOAT remains FLOAT.

The following additional transformations happen automatically:

  • Table names are translated to uppercase.

  • Column names are translated to uppercase.

  • If a column name is a reserved word in Oracle, then the column name has an underscore character appended (for example, TABLE becomes TABLE_).

In addition to the above translations, errors will be raised for the following conditions:

  • If the table name starts with a number.

  • If the table name exceeds 30 characters in length.

  • If the table name is a reserved word in Oracle.

Warnings will be raised for the following conditions:

  • If the column name starts with a number.

  • If the column name exceeds 30 characters in length, the column name will be truncated.

  • If the column name is a reserved word in Oracle.
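The identifier rules above can be condensed into a small sketch. This is a hypothetical helper with a deliberately partial reserved-word list, shown only to make the translation and truncation behavior concrete:

```python
# Partial list of Oracle reserved words, for illustration only.
ORACLE_RESERVED = {"TABLE", "SELECT", "DATE", "NUMBER"}

def oracle_identifier(name, max_len=30):
    """Sketch of the identifier translations described above.

    Uppercases the name, appends an underscore to reserved words,
    and truncates names longer than 30 characters (cases for which
    ddlscan raises a warning or error)."""
    name = name.upper()
    if name in ORACLE_RESERVED:
        name += "_"          # e.g. TABLE becomes TABLE_
    if len(name) > max_len:
        name = name[:max_len]  # column names are truncated with a warning
    return name
```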

8.6.2.6. ddl-mysql-oracle-cdc.vm

The ddl-mysql-oracle-cdc.vm template generates Oracle tables identical to their MySQL equivalents, but with additional columns for CDC capture. For example:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-oracle-cdc.vm -db test
/*
SQL generated on Thu Sep 11 13:17:05 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/

DROP TABLE test.sales;
CREATE TABLE test.sales
(
 id NUMBER(10, 0) NOT NULL,
 salesman CHAR,
 planet CHAR,
 value FLOAT,
 CDC_OP_TYPE VARCHAR(1), /* CDC column */
 CDC_TIMESTAMP TIMESTAMP, /* CDC column */
 CDC_SEQUENCE_NUMBER NUMBER PRIMARY KEY /* CDC column */
);

For information on the datatypes translated, see ddl-mysql-oracle.vm.

8.6.2.7. ddl-mysql-redshift.vm

The ddl-mysql-redshift.vm template generates DDL for Amazon Redshift tables from MySQL schemas. For example:

CREATE TABLE test.all_mysql_types
(
 my_id INT,
 my_bit BOOLEAN /* BIT(1) */,
 my_tinyint SMALLINT /* TINYINT(4) */,
 my_boolean SMALLINT /* TINYINT(1) */,
 my_smallint SMALLINT,
 my_mediumint INT /* MEDIUMINT(9) */,
 my_int INT,
 my_bigint BIGINT,
 my_decimal_10_5 DECIMAL(10,5),
 my_float FLOAT,
 my_double DOUBLE PRECISION /* DOUBLE */,
 my_date DATE,
 my_datetime DATETIME,
 my_timestamp TIMESTAMP,
 my_time VARCHAR(17) /* WARN: no pure TIME type in Redshift */,
 my_year YEAR(4) /* ERROR: unrecognized (type=0, length=0) */,
 my_char_10 CHAR(10),
 my_varchar_10 VARCHAR(40) /* VARCHAR(10) */,
 my_tinytext VARCHAR(65535) /* WARN: MySQL TINYTEXT translated to max VARCHAR */,
 my_text VARCHAR(65535) /* WARN: MySQL TEXT translated to max VARCHAR */,
 my_mediumtext VARCHAR(65535) /* WARN: MySQL MEDIUMTEXT translated to max VARCHAR */,
 my_longtext VARCHAR(65535) /* WARN: MySQL LONGTEXT translated to max VARCHAR */,
 my_enum_abc VARCHAR(1) /* ENUM('A','B','C') */,
 my_set_def VARCHAR(65535) /* SET('D','E','F') */,
 PRIMARY KEY (my_id)
);

Columns are translated as follows:

MySQL Datatype    Redshift Datatype
BIGINT            BIGINT
BINARY            BINARY (CHAR in 5.2.1 and later)
BIT(1)            BOOLEAN
BIT               CHAR
BLOB              VARBINARY (VARCHAR in 5.2.1 and later)
CHAR              CHAR
DATE              DATE
DATETIME          DATETIME
DECIMAL           DECIMAL
DOUBLE            DOUBLE PRECISION
ENUM              VARCHAR
FLOAT             FLOAT
INT               INT
LONGBLOB          VARBINARY (CHAR in 5.2.1 and later)
LONGTEXT          VARCHAR
MEDIUMBLOB        VARBINARY (CHAR in 5.2.1 and later)
MEDIUMINT         INT
MEDIUMTEXT        VARCHAR
SET               VARCHAR
SMALLINT          SMALLINT
TEXT              VARCHAR
TIME              VARCHAR
TIMESTAMP         TIMESTAMP
TINYBLOB          VARBINARY (CHAR in 5.2.1 and later)
TINYINT           SMALLINT
TINYTEXT          VARCHAR
VARBINARY         VARBINARY (CHAR in 5.2.1 and later)
VARCHAR           VARCHAR

In addition to these explicit changes, the following other considerations are taken into account:

  • When translating the DDL for CHAR and VARCHAR columns, the actual column size is increased by a factor of four. This is because Redshift tables always store data using 32-bit UTF characters, and column sizes are specified in bytes, not characters. A CHAR(20) column in MySQL is therefore created as a CHAR(80) within Redshift.

  • TEXT columns are converted to a Redshift VARCHAR of 65535 in length (the maximum allowed).

  • BLOB columns are converted to a Redshift VARBINARY of 65000 in length (the maximum allowed).

  • BIT columns with a size of 1 are converted to Redshift BOOLEAN columns; larger sizes are converted to CHAR columns of 64 bytes in length.

  • TIME columns are converted to a Redshift VARCHAR of 17 bytes in length since no explicit TIME type exists.
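The sizing rules above can be condensed into a short sketch. This is a simplified, assumption-laden illustration covering only a few of the datatypes in the table; the real template handles many more cases:

```python
def redshift_type(mysql_type, length=None):
    """Sketch of a few of the MySQL-to-Redshift translations above.

    Illustrative only; not the actual ddlscan template logic."""
    t = mysql_type.upper()
    if t in ("CHAR", "VARCHAR"):
        # Column sizes are quadrupled to allow for 32-bit UTF characters.
        return "%s(%d)" % (t, length * 4)
    if t in ("TINYTEXT", "TEXT", "MEDIUMTEXT", "LONGTEXT"):
        return "VARCHAR(65535)"      # maximum Redshift VARCHAR length
    if t == "BLOB":
        return "VARBINARY(65000)"    # maximum allowed length
    if t == "BIT":
        return "BOOLEAN" if length == 1 else "CHAR(64)"
    if t == "TIME":
        return "VARCHAR(17)"         # no pure TIME type in Redshift
    return t                         # passthrough for identical types
```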

8.6.2.8. ddl-mysql-redshift-staging.vm

The ddl-mysql-redshift-staging.vm template generates DDL for Amazon Redshift tables from MySQL schemas. For example:

CREATE TABLE test.stage_xxx_all_mysql_types
(
 tungsten_opcode CHAR(2),
 tungsten_seqno INT,
 tungsten_row_id INT,
 tungsten_commit_timestamp TIMESTAMP,
 my_id INT,
 my_bit BOOLEAN /* BIT(1) */,
 my_tinyint SMALLINT /* TINYINT(4) */,
 my_boolean SMALLINT /* TINYINT(1) */,
 my_smallint SMALLINT,
 my_mediumint INT /* MEDIUMINT(9) */,
 my_int INT,
 my_bigint BIGINT,
 my_decimal_10_5 DECIMAL(10,5),
 my_float FLOAT,
 my_double DOUBLE PRECISION /* DOUBLE */,
 my_date DATE,
 my_datetime DATETIME,
 my_timestamp TIMESTAMP,
 my_time VARCHAR(17) /* WARN: no pure TIME type in Redshift */,
 my_year YEAR(4) /* ERROR: unrecognized (type=0, length=0) */,
 my_char_10 CHAR(10),
 my_varchar_10 VARCHAR(40) /* VARCHAR(10) */,
 my_tinytext VARCHAR(65535) /* WARN: MySQL TINYTEXT translated to max VARCHAR */,
 my_text VARCHAR(65535) /* WARN: MySQL TEXT translated to max VARCHAR */,
 my_mediumtext VARCHAR(65535) /* WARN: MySQL MEDIUMTEXT translated to max VARCHAR */,
 my_longtext VARCHAR(65535) /* WARN: MySQL LONGTEXT translated to max VARCHAR */,
 my_enum_abc VARCHAR(1) /* ENUM('A','B','C') */,
 my_set_def VARCHAR(65535) /* SET('D','E','F') */,
 PRIMARY KEY (tungsten_opcode, tungsten_seqno, tungsten_row_id)
);

The actual translation of datatypes is identical to that found in ddl-mysql-redshift.vm.

8.6.2.9. ddl-mysql-vertica.vm

The ddl-mysql-vertica.vm template generates DDL for generating tables within an HP Vertica database from an existing MySQL database schema. For example:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-vertica.vm -db test
/*
SQL generated on Thu Sep 11 14:20:14 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/
CREATE SCHEMA test;

DROP TABLE test.sales;

CREATE TABLE test.sales
(
 id INT ,
 salesman CHAR(20) ,
 planet CHAR(20) ,
 value FLOAT ) ORDER BY id;

Because Vertica does not explicitly support primary keys, a default projection for the key order is created based on the primary key of the source table.

The template translates each MySQL datatype to its closest Vertica equivalent; for example, in the output above, MySQL CHAR(20) remains CHAR(20) and FLOAT remains FLOAT.

In addition, the following considerations should be taken into account:

  • DECIMAL MySQL type is not supported.

  • TEXT types in MySQL are converted to a VARCHAR in Vertica of the maximum supported size.

  • BLOB types in MySQL are converted to a VARBINARY in Vertica of the maximum supported size.

  • SET types in MySQL are converted to a VARCHAR in Vertica of 4000 characters, designed to work in tandem with the settostring filter.

  • ENUM types in MySQL are converted to a VARCHAR in Vertica of the size of the longest ENUM value, designed to work in tandem with the enumtostring filter.
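The SET and ENUM sizing rules above can be sketched as follows. These helpers are hypothetical illustrations of the described behavior, intended to pair with the settostring and enumtostring filters:

```python
def vertica_enum_type(enum_values):
    """ENUM columns become a VARCHAR sized to the longest ENUM value,
    designed to work with the enumtostring filter (illustrative only)."""
    return "VARCHAR(%d)" % max(len(v) for v in enum_values)

def vertica_set_type():
    """SET columns become a fixed VARCHAR(4000), designed to work
    with the settostring filter (illustrative only)."""
    return "VARCHAR(4000)"
```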

8.6.2.10. ddl-mysql-vertica-staging.vm

The ddl-mysql-vertica-staging.vm template generates DDL for HP Vertica staging tables. These include the full table definition, plus three columns that describe the staging data: the operation code, sequence number, and unique row ID. For example:

shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
 -template ddl-mysql-vertica-staging.vm -db test
/*
SQL generated on Thu Sep 11 14:22:06 BST 2014 by ./ddlscan utility of Tungsten

url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/
CREATE SCHEMA test;

DROP TABLE test.stage_xxx_sales;

CREATE TABLE test.stage_xxx_sales
(
 tungsten_opcode CHAR(1) ,
 tungsten_seqno INT ,
 tungsten_row_id INT ,
 id INT ,
 salesman CHAR(20) ,
 planet CHAR(20) ,
 value FLOAT ) ORDER BY tungsten_seqno, tungsten_row_id;

8.6.2.11. ddl-oracle-mysql.vm

The ddl-oracle-mysql.vm template generates the DDL required to create a schema within MySQL based on the existing Oracle schema. For example:

shell> ddlscan -service sales -template ddl-oracle-mysql.vm -db sales
/*
SQL generated on Thu Sep 11 04:29:08 PDT 2014 by ./ddlscan utility of Tungsten

url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL
user = SALES_PUB
dbName = sales
*/
/* ERROR: no tables found! Is database and tables option specified correctly? */

Because Oracle stores schema names in uppercase, the schema must be specified in the correct case; repeating the command with -db SALES returns the table DDL:

[tungsten@tr-fromoracle1 ~]$ ddlscan -service sales -template ddl-oracle-mysql.vm -db SALES
/*
SQL generated on Thu Sep 11 04:29:16 PDT 2014 by ./ddlscan utility of Tungsten

url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL
user = SALES_PUB
dbName = SALES
*/

DROP TABLE IF EXISTS sales.sample;
CREATE TABLE sales.sample
(
 id DECIMAL(38) /* NUMBER(38, ?) */ NOT NULL,
 msg CHAR(80),
 PRIMARY KEY (id)
) ENG

Each Oracle datatype is translated to its closest MySQL equivalent; for example, Oracle NUMBER(38) becomes MySQL DECIMAL(38), as shown in the output above.

The following additional transformations happen automatically:

  • If a column name is a reserved word in MySQL, then the column name has an underscore character appended (for example, TABLE becomes TABLE_).

An error is raised under the following conditions:

  • If the size of a FLOAT is larger than 53 points of precision.

8.6.2.12. ddl-oracle-mysql-pk-only.vm

The ddl-oracle-mysql-pk-only.vm template generates ALTER TABLE statements that add the primary key, as determined from the Oracle primary key or index information. For example:

shell> ddlscan -service hadoop -template ddl-oracle-mysql-pk-only.vm -db HADOOP
/*
SQL generated on Thu Sep 11 06:17:28 PDT 2014 by ./ddlscan utility of Tungsten

url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL
user = HADOOP_PUB
dbName = HADOOP
*/

ALTER TABLE hadoop.sample ADD PRIMARY KEY (ID);

Note that it does not generate table DDL, only statements to alter existing tables with primary key information.
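The shape of the generated statement can be sketched as below. The helper is hypothetical; ddlscan derives the key columns from Oracle's primary key or index metadata, and here the schema and table names are lowercased to match the output shown above:

```python
def pk_alter_statement(schema, table, key_columns):
    """Build an ALTER TABLE ... ADD PRIMARY KEY statement of the form
    produced by the template (illustrative sketch only)."""
    return "ALTER TABLE %s.%s ADD PRIMARY KEY (%s);" % (
        schema.lower(), table.lower(), ", ".join(key_columns))
```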