ddl-check-pkeys.vm
ddl-mysql-hive-0.10.vm
ddl-mysql-hive-0.10-staging.vm
ddl-mysql-hive-metadata.vm
ddl-mysql-oracle.vm
ddl-mysql-oracle-cdc.vm
ddl-mysql-redshift.vm
ddl-mysql-redshift-staging.vm
ddl-mysql-vertica.vm
ddl-mysql-vertica-staging.vm
ddl-oracle-mysql.vm
ddl-oracle-mysql-pk-only.vm
Table 9.22. ddlscan Supported Templates
file | Description |
---|---|
ddl-check-pkeys.vm | Reports which tables are without primary key definitions |
ddl-mysql-hive-0.10.vm | Generates DDL from a MySQL host suitable for the base tables in a Hadoop/Hive Environment |
ddl-mysql-hive-0.10-staging.vm | Generates DDL from a MySQL host suitable for the staging tables in a Hadoop/Hive Environment |
ddl-mysql-hive-metadata.vm | Generates metadata as JSON to be used within a Hadoop/Hive Environment |
ddl-mysql-oracle.vm | Generates Oracle schema from a MySQL schema |
ddl-mysql-oracle-cdc.vm | Generates Oracle tables with CDC capture information from a MySQL schema |
ddl-mysql-redshift.vm | Generates DDL from a MySQL host suitable for the base tables in Amazon Redshift. |
ddl-mysql-redshift-staging.vm | Generates DDL from a MySQL host suitable for the staging tables in Amazon Redshift. |
ddl-mysql-vertica.vm | Generates DDL suitable for the base tables in HP Vertica |
ddl-mysql-vertica-staging.vm | Generates DDL suitable for the staging tables in HP Vertica |
ddl-oracle-mysql.vm | Generates DDL for MySQL tables from an Oracle schema |
ddl-oracle-mysql-pk-only.vm | Generates Primary Key DDL statements from an Oracle database for MySQL |
The ddl-check-pkeys.vm
template can be used to check whether specific tables within a
schema do not have a primary key:
shell> ddlscan -template ddl-check-pkeys.vm \
-user tungsten -pass password -db sales \
-url jdbc:mysql://localhost:13306/sales
/*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten
url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/
/* ERROR: sales.dummy1 has no primary key! *//*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten
url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/
/* ERROR: sales.dummy2 has no primary key! *//*
SQL generated on Thu Sep 04 10:23:52 BST 2014 by ./ddlscan utility of Tungsten
url = jdbc:mysql://localhost:13306/sales
user = tungsten
dbName = sales
*/
For certain environments, particularly heterogeneous replication, the lack of primary keys can lead to inefficient replication, or even fail to replicate data at all.
Generates DDL suitable for a carbon-copy form of the table from the MySQL host:
shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
-template ddl-mysql-hive-0.10.vm -db test
--
-- SQL generated on Thu Sep 11 12:57:11 BST 2014 by Tungsten ddlscan utility
--
-- url = jdbc:mysql://tr-hadoop1:13306/test
-- user = tungsten
-- dbName = test
--
DROP TABLE IF EXISTS test.sales;
CREATE TABLE test.sales
(
id INT,
salesman STRING,
planet STRING,
value DOUBLE )
;
Wherever possible, the closest Hive equivalent datatype is used for each source datatype, as follows:
The template supports the following optional parameters to change behavior:
A prefix to be placed in front of all schemas. For example,
if called with
schemaPrefix
set
to mysql_
:
shell> ddlscan ... -opt schemaPrefix mysql_
The schema name will be prefixed, translating the schema
name from sales
into
mysql_sales
.
A prefix to be placed in front of all schemas. For example,
if called with
tablePrefix
set to
mysql_
:
shell> ddlscan ... -opt tablePrefix mysql_
The table name will be prefixed, translating the tablename
from sales
into
mysql_sales
.
Staging tables within Hive define the original table columns
with additional columns to track the operation type, sequence
number, timestamp and unique key for each row. For example, the
table sales
in MySQL:
mysql> describe sales;
+----------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| salesman | char(20) | YES | | NULL | |
| planet | char(20) | YES | | NULL | |
| value | float | YES | | NULL | |
+----------+----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
Generates the following Hive-compatible DDL when using this template:
shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
-template ddl-mysql-hive-0.10-staging.vm -db test
--
-- SQL generated on Thu Sep 11 12:31:45 BST 2014 by Tungsten ddlscan utility
--
-- url = jdbc:mysql://tr-hadoop1:13306/test
-- user = tungsten
-- dbName = test
--
DROP TABLE IF EXISTS test.stage_xxx_sales;
CREATE EXTERNAL TABLE test.stage_xxx_sales
(
tungsten_opcode STRING ,
tungsten_seqno INT ,
tungsten_row_id INT ,
tungsten_commit_timestamp TIMESTAMP ,
id INT,
salesman STRING,
planet STRING,
value DOUBLE )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/tungsten/staging/test/sales' ;
Wherever possible, the closest Hive equivalent datatype is used
for each source datatype, see
ddl-mysql-hive-0.10.vm
for more information.
The Hadoop tools require information about the schema in JSON format so that the table names and primary key information can be used when materializing data from the staging tables into the base tables. This template generates that information in JSON format:
shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
-template ddl-mysql-hive-metadata.vm -db test
{
"tables": [
{
"schema": "test",
"name": "sales",
"keys": ["id"],
"columns": [
{"name": "id", "type": "INT"},
{"name": "salesman", "type": "STRING"},
{"name": "planet", "type": "STRING"},
{"name": "value", "type": "DOUBLE"}
]
}
]
}
When translating MySQL tables to Oracle compatible schema, the following datatypes are migrated to their closest Oracle equivalent:
The following additional transformations happen automatically:
Table names are translated to uppercase.
Column names are translated to uppercase.
If a column name is a reserved word in Oracle, then the
column name has an underscore character appended (for
example, TABLE
becomes
TABLE_
).
In addition to the above translations, errors will be raised for the following conditions:
If the table name starts with a number.
If the table name exceeds 30 characters in length.
If the table name is a reserved word in Oracle.
Warnings will be raised for the following conditions:
If the column or column name started with a number.
If the column name exceeds 30 characters in length, the column name will be truncated.
If the column name is a reserved word in Oracle.
The
ddl-mysql-oracle-cdc.vm
template generates identical tables in Oracle, from their MySQL
equivalent, but with additional columns for CDC capture. For
example:
shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
-template ddl-mysql-oracle-cdc.vm -db test
/*
SQL generated on Thu Sep 11 13:17:05 BST 2014 by ./ddlscan utility of Tungsten
url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/
DROP TABLE test.sales;
CREATE TABLE test.sales
(
id NUMBER(10, 0) NOT NULL,
salesman CHAR,
planet CHAR,
value FLOAT,
CDC_OP_TYPE VARCHAR(1), /* CDC column */
CDC_TIMESTAMP TIMESTAMP, /* CDC column */
CDC_SEQUENCE_NUMBER NUMBER PRIMARY KEY /* CDC column */
);
For information on the datatypes translated, see
ddl-mysql-oracle.vm
.
The
ddl-mysql-redshift.vm
template generates DDL for Amazon Redshift tables from MySQL
schemas. For example:
CREATE TABLE test.all_mysql_types ( my_id INT, my_bit BOOLEAN /* BIT(1) */, my_tinyint SMALLINT /* TINYINT(4) */, my_boolean SMALLINT /* TINYINT(1) */, my_smallint SMALLINT, my_mediumint INT /* MEDIUMINT(9) */, my_int INT, my_bigint BIGINT, my_decimal_10_5 DECIMAL(10,5), my_float FLOAT, my_double DOUBLE PRECISION /* DOUBLE */, my_date DATE, my_datetime DATETIME, my_timestamp TIMESTAMP, my_time VARCHAR(17) /* WARN: no pure TIME type in Redshift */, my_year YEAR(4) /* ERROR: unrecognized (type=0, length=0) */, my_char_10 CHAR(10), my_varchar_10 VARCHAR(40) /* VARCHAR(10) */, my_tinytext VARCHAR(65535) /* WARN: MySQL TINYTEXT translated to max VARCHAR */, my_text VARCHAR(65535) /* WARN: MySQL TEXT translated to max VARCHAR */, my_mediumtext VARCHAR(65535) /* WARN: MySQL MEDIUMTEXT translated to max VARCHAR */, my_longtext VARCHAR(65535) /* WARN: MySQL LONGTEXT translated to max VARCHAR */, my_enum_abc VARCHAR(1) /* ENUM('A','B','C') */, my_set_def VARCHAR(65535) /* SET('D','E','F') */, PRIMARY KEY (my_id) );
Columns are translated as follows:
Oracle Datatype | Redshift Datatype |
---|---|
BIGINT
|
BIGINT
|
BINARY
|
BINARY , CHAR in 5.2.1 and later
|
BIT(1)
|
BOOLEAN
|
BIT
|
CHAR
|
BLOB
|
VARBINARY VARCHAR in 5.2.1 and later
|
CHAR
|
CHAR
|
DATE
|
DATE
|
DATETIME
|
DATETIME
|
DECIMAL
|
DECIMAL
|
DOUBLE
|
DOUBLE PRECISION
|
ENUM
|
VARCHAR
|
FLOAT
|
FLOAT
|
INT
|
INT
|
LONGBLOB
|
VARBINARY CHAR in 5.2.1 and later
|
LONGTEXT
|
VARCHAR
|
MEDIUMBLOB
|
VARBINARY CHAR in 5.2.1 and later
|
MEDIUMINT
|
INT
|
MEDIUMTEXT
|
VARCHAR
|
SET
|
VARCHAR
|
SMALLINT
|
SMALLINT
|
TEXT
|
VARCHAR
|
TIME
|
VARCHAR
|
TIMESTAMP
|
TIMESTAMP
|
TINYBLOB
|
VARBINARY CHAR in 5.2.1 and later
|
TINYINT
|
SMALLINT
|
TINYTEXT
|
VARCHAR
|
VARBINARY
|
VARBINARY CHAR in 5.2.1 and later
|
VARCHAR
|
VARCHAR
|
In addition to these explicit changes, the following other considerations are taken into account:
When translating the DDL for
CHAR
and
VARCHAR
columns, the
actual column size is increased by a factor of four. This is
because Redshift tables always stored data using 32-bit UTF
characters and column sizes are in bytes, not characters.
Therefore a CHAR(20)
column is created as
CHAR(80)
within
Redshift.
TEXT
columns are
converted to a Redshift
VARCHAR
of
65535 in length (the maximum allowed).
BLOB
columns are
converted to a Redshift
VARBINARY
of 65000 in length (the maximum allowed).
BIT
columns with a
size of 1 are converted to Redshift
BOOLEAN
columns, larger sizes are converted to
CHAR
columns of 64 bytes in length.
TIME
columns are
converted to a Redshift
VARCHAR
of
17 bytes in length since no explicit
TIME
type exists.
The
ddl-mysql-redshift-staging.vm
template generates DDL for Amazon Redshift tables from MySQL
schemas. For example:
CREATE TABLE test.stage_xxx_all_mysql_types ( tungsten_opcode CHAR(2), tungsten_seqno INT, tungsten_row_id INT, tungsten_commit_timestamp TIMESTAMP, my_id INT, my_bit BOOLEAN /* BIT(1) */, my_tinyint SMALLINT /* TINYINT(4) */, my_boolean SMALLINT /* TINYINT(1) */, my_smallint SMALLINT, my_mediumint INT /* MEDIUMINT(9) */, my_int INT, my_bigint BIGINT, my_decimal_10_5 DECIMAL(10,5), my_float FLOAT, my_double DOUBLE PRECISION /* DOUBLE */, my_date DATE, my_datetime DATETIME, my_timestamp TIMESTAMP, my_time VARCHAR(17) /* WARN: no pure TIME type in Redshift */, my_year YEAR(4) /* ERROR: unrecognized (type=0, length=0) */, my_char_10 CHAR(10), my_varchar_10 VARCHAR(40) /* VARCHAR(10) */, my_tinytext VARCHAR(65535) /* WARN: MySQL TINYTEXT translated to max VARCHAR */, my_text VARCHAR(65535) /* WARN: MySQL TEXT translated to max VARCHAR */, my_mediumtext VARCHAR(65535) /* WARN: MySQL MEDIUMTEXT translated to max VARCHAR */, my_longtext VARCHAR(65535) /* WARN: MySQL LONGTEXT translated to max VARCHAR */, my_enum_abc VARCHAR(1) /* ENUM('A','B','C') */, my_set_def VARCHAR(65535) /* SET('D','E','F') */, PRIMARY KEY (tungsten_opcode, tungsten_seqno, tungsten_row_id) );
The actual translation of datatypes is identical to that found
in
ddl-mysql-redshift.vm
.
The
ddl-mysql-vertica.vm
template generates DDL for generating tables within an HP
Vertica database from an existing MySQL database schema. For
example:
shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
-template ddl-mysql-vertica.vm -db test
/*
SQL generated on Thu Sep 11 14:20:14 BST 2014 by ./ddlscan utility of Tungsten
url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/
CREATE SCHEMA test;
DROP TABLE test.sales;
CREATE TABLE test.sales
(
id INT ,
salesman CHAR(20) ,
planet CHAR(20) ,
value FLOAT ) ORDER BY id;
Because Vertica does not explicitly support primary keys, a default projection for the key order is created based on the primary key of the source table.
The templates translates different datatypes as follows:
MySQL Datatype | Vertica Datatype |
---|---|
DATETIME
|
DATETIME
|
TIMESTAMP
|
TIMESTAMP
|
DATE
|
DATE
|
TIME
|
TIME
|
TINYINT
|
TINYINT
|
SMALLINT
|
SMALLINT
|
MEDIUMINT
|
INT
|
INT
|
INT
|
BIGINT
|
INT
|
VARCHAR
|
VARCHAR
|
CHAR
|
CHAR
|
BINARY
|
BINARY
|
VARBINARY
|
VARBINARY
|
TEXT , TINYTEXT , MEDIUMTEXT , LONGTEXT
|
VARCHAR(65000)
|
BLOB , TINYBLOB , MEDIUMBLOB , LONGBLOB
|
VARBINARY(65000)
|
FLOAT
|
FLOAT
|
DOUBLE
|
DOUBLE PRECISION
|
ENUM
|
VARCHAR
|
SET
|
VARCHAR(4000)
|
BIT(1)
|
BOOLEAN
|
BIT
|
CHAR(64)
|
In addition, the following considerations should be taken into account:
DECIMAL
MySQL type
is not supported.
TEXT
types in MySQL
are converted to a
VARCHAR
in
Vertica of the maximum supported size.
BLOB
types in MySQL
are converted to a
VARBINARY
in
Vertica of the maximum supported size.
SET
types in MySQL
are converted to a
VARCHAR
in
Vertica of 4000 characters, designed to work in tandem with
the settostring
filter.
ENUM
types in MySQL
are converted to a
VARCHAR
in
Vertica of the size of the longest
ENUM
value, designed
to work in tandem with the
enumtostring
filter.
The
ddl-mysql-vertica-staging.vm
template generates DDL for HP Vertica staging tables. These
include the full table definition, in addition to three columns
used to define the staging data, including the operation code,
sequence number and unique row ID. For example:
shell> ddlscan -user tungsten -url 'jdbc:mysql://tr-hadoop1:13306/test' -pass password \
-template ddl-mysql-vertica-staging.vm -db test
/*
SQL generated on Thu Sep 11 14:22:06 BST 2014 by ./ddlscan utility of Tungsten
url = jdbc:mysql://tr-hadoop1:13306/test
user = tungsten
dbName = test
*/
CREATE SCHEMA test;
DROP TABLE test.stage_xxx_sales;
CREATE TABLE test.stage_xxx_sales
(
tungsten_opcode CHAR(1) ,
tungsten_seqno INT ,
tungsten_row_id INT ,
id INT ,
salesman CHAR(20) ,
planet CHAR(20) ,
value FLOAT ) ORDER BY tungsten_seqno, tungsten_row_id;
The ddl-oracle-mysql.vm
template generates the DDL required to create a schema within
MySQL based on the existing Oracle schema. For example:
shell> ddlscan -service sales -template ddl-oracle-mysql.vm -db sales /* SQL generated on Thu Sep 11 04:29:08 PDT 2014 by ./ddlscan utility of Tungsten url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL user = SALES_PUB dbName = sales */ /* ERROR: no tables found! Is database and tables option specified correctly? */ [tungsten@tr-fromoracle1 ~]$ ddlscan -service sales -template ddl-oracle-mysql.vm -db SALES /* SQL generated on Thu Sep 11 04:29:16 PDT 2014 by ./ddlscan utility of Tungsten url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL user = SALES_PUB dbName = SALES */ DROP TABLE IF EXISTS sales.sample; CREATE TABLE sales.sample ( id DECIMAL(38) /* NUMBER(38, ?) */ NOT NULL, msg CHAR(80), PRIMARY KEY (id) ) ENG
Columns are translated as follows:
Oracle Datatype | MySQL Datatype |
---|---|
DATE
|
DATETIME
|
NUMBER(0)
|
NUMERIC
|
NUMBER(n) where n < 19
|
INT
|
NUMBER(n) where n > 19
|
BIGINT
|
NUMBER(n) where n < 3
|
TINYINT
|
NUMBER(n) where n < 5
|
SMALLINT
|
NUMBER(n) where n < 7
|
MEDIUMINT
|
NUMBER(n) where n < 10
|
INT
|
NUMBER(n) where n < 19
|
BIGINT
|
NUMBER
|
DECIMAL
|
FLOAT
|
FLOAT
|
VARCHAR
|
VARCHAR
|
LONG
|
LONGTEXT
|
BFILE
|
VARCHAR(1024)
|
CHAR
|
CHAR
|
CLOB
|
LONGTEXT
|
BLOB
|
LONGBLOB
|
LONG RAW
|
LONGBLOB
|
TIMESTAMP
|
TIMESTAMP
|
RAW
|
VARBINARY
|
The following additional transformations happen automatically:
If a column name is a reserved word in MySQL, then the
column name has an underscore character appended (for
example, TABLE
becomes
TABLE_
).
An error is raised in the following conditions:
If the size of a
FLOAT
is larger than
53 points of precision.
The
ddl-oracle-mysql-pk-only.vm
template generates alter table statements to add the primary
key, as determined from the Oracle primary key or index
information. For example:
shell> ddlscan -service hadoop -template ddl-oracle-mysql-pk-only.vm -db HADOOP /* SQL generated on Thu Sep 11 06:17:28 PDT 2014 by ./ddlscan utility of Tungsten url = jdbc:oracle:thin:@//tr-fromoracle1:1521/ORCL user = HADOOP_PUB dbName = HADOOP */ ALTER TABLE hadoop.sample ADD PRIMARY KEY (ID);
Note that it does not generate table DDL, only statements to alter existing tables with primary key information.