Preparing the hosts for the replication process requires setting some key configuration parameters within the MySQL server to ensure that data is stored and written correctly. On the Vertica side, the database and schema must be created using the existing schema definition so that the databases and tables exist within Vertica.
Source Host
Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.
Vertica Host
On the Vertica host, you need to perform some preparation of the destination database, first creating the database, and then creating the tables that are to be replicated.
Create a database (if you want to use a different one than those already configured), and a schema that will contain the Tungsten data about the current replication position:
shell>vsql -Udbadmin -wsecret bigdata
Welcome to vsql, the Vertica Analytic Database v5.1.1-0 interactive terminal. Type: \h for help with SQL commands \? for help with vsql commands \g or terminate with semicolon to execute query \q to quit bigdata=>create schema tungsten_alpha;
The schema will be used only by Tungsten Replicator to store metadata about the replication process.
Locate the Vertica JDBC driver. This can be downloaded separately from
the Vertica website. The driver will need to be copied into the
Tungsten Replicator lib
directory.
shell> cp vertica-jdbc-7.1.2-0.jar tungsten-replicator-7.1.4-10/tungsten-replicator/lib/
You need to create tables within Vertica according to the databases and tables that need to be replicated; the tables are not automatically created for you. From a Tungsten Replicator deployment directory, the ddlscan command can be used to identify the existing tables, and create table definitions for use within Vertica.
To use ddlscan, the template for Vertica must be specified, along with the user/password information to connect to the source database to collect the schema definitions. The tool should be run from the templates directory.
The tool will need to be executed twice, the first time generates the live table definitions:
shell>cd tungsten-replicator-7.1.4-10
shell>cd tungsten-replicator/samples/extensions/velocity/
shell>ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \ -template ddl-mysql-vertica.vm -db access_log
/* SQL generated on Fri Sep 06 14:37:40 BST 2013 by ./ddlscan utility of Tungsten url = jdbc:mysql:thin://host1:13306/access_log user = tungsten dbName = access_log */ CREATE SCHEMA access_log; DROP TABLE access_log.access_log; CREATE TABLE access_log.access_log ( id INT , userid INT , datetime INT , session CHAR(30) , operation CHAR(80) , opdata CHAR(80) ) ORDER BY id; ...
The output should be redirected to a file and then used to create tables within Vertica:
shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
-template ddl-mysql-vertica.vm -db access_log >access_log.ddl
The output of the command should be checked to ensure that the table definitions are correct.
The file can then be applied to Vertica:
shell> cat access_log.ddl | vsql -Udbadmin -wsecret bigdata
This generates the table definitions for live data. The process should be repeated to create the table definitions for the staging data by using te staging template:
shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
-template ddl-mysql-vertica-staging.vm -db access_log >access_log.ddl-staging
Then applied to Vertica:
shell> cat access_log.ddl-staging | vsql -Udbadmin -wsecret bigdata
The process should be repeated for each database that will be replicated.
Once the preparation of the MySQL and Vertica databases are ready, you can proceed to installing Tungsten Replicator