9.8. The load-reduce-check Tool

Important

The load-reduce-check tool is not part of the standard replicator distribution. It is part of the continuent-tools-hadoop repository, available from GitHub.

The load-reduce-check tool provides a single command that performs the final steps needed to convert data loaded through the Hadoop applier into a final, Hive-compatible table. The result is a carbon copy, within Hive, of the data as extracted from the source database.

The four steps, each of which can be enabled or disabled individually, are:

  1. Section 9.8.1, “Generating Staging DDL”

    Accesses the source database, reads the schema definition, and generates the necessary DDL for the staging tables within Hive. By default, staging tables are prefixed with stage_xxx_ and created in a Hive schema that matches the source schema.

  2. Section 9.8.2, “Generating Live DDL”

    Accesses the source database, reads the schema definition, and generates the necessary DDL for the tables within Hive. Tables are created with an identical table and schema name to the source schema.

  3. Section 9.8.3, “Materializing a View”

    Executes a view materialization, in which the data in any existing table and the data in the staging table are merged into the final table. This step is identical to the process executed when running the materialize tool.

  4. Section 9.8.4, “Compare Loaded Data”

    Compares the data within the source and materialized tables and reports any differences.
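
The logic behind the staging, materialization, and comparison steps above can be illustrated with a minimal Python sketch. This is not the tool's implementation, and it does not touch Hive or the source database: tables are modeled as in-memory lists of rows. The function names, the "seqno" and "opcode" fields, and the opcode values "I" and "D" are assumptions made for illustration only; only the stage_xxx_ prefix comes from the text.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def staging_table_name(table: str, prefix: str = "stage_xxx_") -> str:
    """Name of the staging table for a source table (default prefix from the text)."""
    return prefix + table


def materialize(existing: List[Row], staged: List[dict], key: str) -> List[Row]:
    """Merge staged changes into the existing rows, keyed by primary key.

    Each staged change is a dict with hypothetical fields:
      "seqno"  - ordering of the change
      "opcode" - "D" for a delete, anything else for an insert/update
      "row"    - the row image
    """
    merged = {row[key]: row for row in existing}
    # Apply changes in sequence order so that later changes win.
    for change in sorted(staged, key=lambda c: c["seqno"]):
        pk = change["row"][key]
        if change["opcode"] == "D":
            merged.pop(pk, None)
        else:
            merged[pk] = change["row"]
    return sorted(merged.values(), key=lambda r: r[key])


def compare(
    source: List[Row], materialized: List[Row], key: str
) -> List[Tuple[object, Optional[Row], Optional[Row]]]:
    """Return (key, source_row, materialized_row) for every row that differs.

    A row missing on one side is reported with None on that side.
    """
    src = {r[key]: r for r in source}
    dst = {r[key]: r for r in materialized}
    diffs = []
    for pk in sorted(src.keys() | dst.keys()):
        if src.get(pk) != dst.get(pk):
            diffs.append((pk, src.get(pk), dst.get(pk)))
    return diffs
```

In this sketch, an empty list returned by compare() corresponds to a successful check: the materialized table matches the source row for row.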

The load-reduce-check tool