Common Reference

The load-reduce-check Command

Applies to: Tungsten Replicator

The load-reduce-check tool provides a single command that performs the final steps needed to convert data loaded through the Hadoop applier into a final, Hive-compatible table. The result is a carbon copy within Hive of the data as extracted from the source database.

See "deployment-applier-hadoop" for more details on configuring the Hadoop Applier.

The four steps, each of which can be enabled or disabled individually, are:

  1. Generate staging DDL using the ddlscan tool.

    Accesses the source database, reads the schema definition, and generates the necessary DDL for the staging tables within Hive. By default, tables are prefixed with stage_xxx_ and created in a Hive schema matching the source schema.

  2. Generate live DDL using the ddlscan tool.

    Accesses the source database, reads the schema definition, and generates the necessary DDL for the tables within Hive. Tables are created with the same schema and table names as in the source database.

  3. Materialize the view

    Executes a view materialization, in which the data in any existing table and in the staging table are merged into the final table. This step is identical to the process executed when running the materialize tool.

  4. Compare loaded data

    Compares the data within the source and materialized tables and reports any differences.
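The kind of DDL produced by steps 1 and 2 can be illustrated with a minimal Python sketch. This is not the ddlscan tool: the function names (staging_ddl, live_ddl), the (name, Hive type) column format, and the four tungsten_* bookkeeping columns prepended to the staging table are assumptions for illustration only. The row-format and storage clauses that real Hive DDL would require are omitted.

```python
from typing import List, Sequence, Tuple

Column = Tuple[str, str]  # (column name, Hive type)

# Assumed bookkeeping columns prepended to every staging table (illustrative only).
STAGING_META_COLUMNS: List[Column] = [
    ("tungsten_opcode", "STRING"),
    ("tungsten_seqno", "INT"),
    ("tungsten_row_id", "INT"),
    ("tungsten_commit_timestamp", "TIMESTAMP"),
]


def _column_list(columns: Sequence[Column]) -> str:
    """Render one indented 'name TYPE' line per column, comma-separated."""
    return ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)


def staging_ddl(schema: str, table: str, columns: Sequence[Column],
                prefix: str = "stage_xxx_") -> str:
    """Step 1 (hypothetical): staging table in a Hive schema named after the source schema."""
    all_columns = STAGING_META_COLUMNS + list(columns)
    return f"CREATE TABLE {schema}.{prefix}{table} (\n{_column_list(all_columns)}\n);"


def live_ddl(schema: str, table: str, columns: Sequence[Column]) -> str:
    """Step 2 (hypothetical): live table with the same schema and table name as the source."""
    return f"CREATE TABLE {schema}.{table} (\n{_column_list(columns)}\n);"


if __name__ == "__main__":
    cols = [("id", "INT"), ("name", "STRING")]
    print(staging_ddl("sales", "customer", cols))
    print(live_ddl("sales", "customer", cols))
```

Keeping the staging and live definitions in separate functions mirrors the fact that each step can be enabled or disabled on its own.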
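The merge performed during view materialization can be sketched in memory as follows. This assumes that each staged change row carries hypothetical tungsten_opcode ("I" for insert, "D" for delete), tungsten_seqno and tungsten_row_id columns, and that an update is staged as a delete followed by an insert. The actual tool operates on Hive tables. This Python version only illustrates the rule that the latest change for each primary key wins.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]

# Assumed staging bookkeeping columns; the real staging layout may differ.
OPCODE, SEQNO, ROW_ID = "tungsten_opcode", "tungsten_seqno", "tungsten_row_id"


def materialize(existing: Iterable[Row], staging: Iterable[Row], key: str) -> List[Row]:
    """Merge existing final-table rows with staged changes; the latest change per key wins."""
    # Index the current contents of the final table by primary key.
    merged: Dict[object, Row] = {row[key]: dict(row) for row in existing}

    # Replay staged changes in commit order: transaction seqno, then row within it.
    for change in sorted(staging, key=lambda r: (r[SEQNO], r[ROW_ID])):
        pk = change[key]
        if change[OPCODE] == "D":
            merged.pop(pk, None)  # deleting an absent key is harmless
        elif change[OPCODE] == "I":
            # Keep only the data columns, dropping the tungsten_* bookkeeping fields.
            merged[pk] = {c: v for c, v in change.items() if not c.startswith("tungsten_")}
        else:
            raise ValueError(f"unexpected opcode: {change[OPCODE]!r}")

    # Return rows ordered by primary key for a stable result.
    return [merged[pk] for pk in sorted(merged)]


if __name__ == "__main__":
    existing = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    staging = [
        # An update to id 2, staged as a delete followed by an insert.
        {OPCODE: "D", SEQNO: 10, ROW_ID: 1, "id": 2},
        {OPCODE: "I", SEQNO: 10, ROW_ID: 2, "id": 2, "name": "b2"},
        {OPCODE: "D", SEQNO: 11, ROW_ID: 1, "id": 1},
        {OPCODE: "I", SEQNO: 12, ROW_ID: 1, "id": 3, "name": "c"},
    ]
    print(materialize(existing, staging, "id"))
    # prints [{'id': 2, 'name': 'b2'}, {'id': 3, 'name': 'c'}]
```

Sorting by (seqno, row_id) before replaying is what makes the result deterministic. Without it, a delete and re-insert of the same key could be applied in the wrong order.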
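The comparison of loaded data can be pictured with a minimal sketch. It assumes that both the source rows and the materialized rows have already been fetched into memory as dictionaries sharing a primary key column. The compare function and its report format (missing, extra and changed keys) are hypothetical. A real comparison would also need to normalize type differences between the source database and Hive before comparing values.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def compare(source: Iterable[Row], materialized: Iterable[Row], key: str) -> Dict[str, List[object]]:
    """Hypothetical report of primary keys that are missing, extra, or different."""
    src = {row[key]: row for row in source}
    dst = {row[key]: row for row in materialized}
    return {
        # Rows present in the source database but absent from Hive.
        "missing": sorted(src.keys() - dst.keys()),
        # Rows present in Hive that do not exist in the source.
        "extra": sorted(dst.keys() - src.keys()),
        # Rows present in both whose column values disagree.
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


if __name__ == "__main__":
    source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    hive = [{"id": 2, "v": "b"}, {"id": 3, "v": "X"}, {"id": 4, "v": "d"}]
    print(compare(source, hive, "id"))
    # prints {'missing': [1], 'extra': [4], 'changed': [3]}
```

An empty report for all three categories indicates that the materialized table matches the source.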