6.4.1. Preparing Hosts for Vertica Deployments

Preparing the hosts for the replication process requires setting some key configuration parameters within the MySQL server to ensure that data is stored and written correctly. On the Vertica side, the database and schema must be created using the existing schema definition so that the databases and tables exist within Vertica.

MySQL Host

The data replicated from MySQL can be any data, although there are some known limitations and assumptions made on the way the information is transferred.

The following are required for replication to Vertica:

  • MySQL must be using Row-based replication for information to be replicated to Vertica. For the best results, you should change the global binary log format, ideally in the configuration file (my.cnf):

    binlog-format = row

    Alternatively, the global binlog format can be changed by executing the following statement:

    mysql> SET GLOBAL binlog-format = ROW;

    For MySQL 5.6.2 and later, you must enable full row log images:

    binlog-row-image = full

    This information will be forgotten when the MySQL server is restarted; placing the configuration in the my.cnf file will ensure this option is permanently enabled.

  • Table format should be updated to UTF8 by updating the MySQL configuration (my.cnf):

    character-set-server=utf8
    collation-server=utf8_general_ci

    Tables must also be configured as UTF8 tables, and existing tables should be updated to UTF8 support before they are replicated to prevent character set corruption issues.

  • To prevent timezone configuration storing zone adjusted values and exporting this information to the binary log and Vertica, fix the timezone configuration to use UTC within the configuration file (my.cnf):

    default-time-zone='+00:00'

Vertica Host

On the Vertica host, you need to perform some preparation of the destination database, first creating the database, and then creating the tables that are to be replicated.

  • Create a database (if you want to use a different one than those already configured), and a schema that will contain the Tungsten data about the current replication position:

    shell> vsql -Udbadmin -wsecret bigdata
    Welcome to vsql, the Vertica Analytic Database v5.1.1-0 interactive terminal.
    
    Type:  \h for help with SQL commands
           \? for help with vsql commands
           \g or terminate with semicolon to execute query
           \q to quit
    
    bigdata=> create schema tungsten_alpha;

    The schema will be used only by Tungsten Replication to store metadata about the replication process.

  • Locate the Vertica JDBC driver. This can be downloaded separately from the Vertica website. The driver will need to be copied into the Tungsten Replication lib directory.

    shell> cp vertica-jdbc-7.1.2-0.jar tungsten-replicator-5.0.1-136/tungsten-replicator/lib/
  • You need to create tables within Vertica according to the databases and tables that need to be replicated; the tables are not automatically created for you. From a Tungsten Replication deployment directory, the ddlscan command can be used to identify the existing tables, and create table definitions for use within Vertica.

    To use ddlscan, the template for Vertica must be specified, along with the user/password information to connect to the source database to collect the schema definitions. The tool should be run from the templates directory.

    The tool will need to be executed twice, the first time generates the live table definitions:

    shell> cd tungsten-replicator-5.0.1-136
    shell> cd tungsten-replicator/samples/extensions/velocity/
    shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
        -template ddl-mysql-vertica.vm -db access_log
    /*
    SQL generated on Fri Sep 06 14:37:40 BST 2013 by ./ddlscan utility of Tungsten
    
    url = jdbc:mysql:thin://host1:13306/access_log
    user = tungsten
    dbName = access_log
    */
    CREATE SCHEMA access_log;
    
    DROP TABLE access_log.access_log;
    
    CREATE TABLE access_log.access_log
    (
      id INT ,
      userid INT ,
      datetime INT ,
      session CHAR(30) ,
      operation CHAR(80) ,
      opdata CHAR(80)  ) ORDER BY id;
    ...

    The output should be redirected to a file and then used to create tables within Vertica:

    shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
        -template ddl-mysql-vertica.vm -db access_log >access_log.ddl

    The output of the command should be checked to ensure that the table definitions are correct.

    The file can then be applied to Vertica:

    shell> cat access_log.ddl | vsql -Udbadmin -wsecret bigdata

    This generates the table definitions for live data. The process should be repeated to create the table definitions for the staging data by using te staging template:

    shell> ddlscan -user tungsten -url 'jdbc:mysql:thin://host1:13306/access_log' -pass password \
        -template ddl-mysql-vertica-staging.vm -db access_log >access_log.ddl-staging

    Then applied to Vertica:

    shell> cat access_log.ddl-staging | vsql -Udbadmin -wsecret bigdata

    The process should be repeated for each database that will be replicated.

Once the preparation of the MySQL and Vertica databases are ready, you can proceed to installing Tungsten Replication