The load-reduce-check Command
The load-reduce-check tool provides a single command that performs the final steps needed to convert data loaded through the Hadoop applier into Hive-compatible tables, so that Hive holds a carbon copy of the data as extracted from the source database.
See "deployment-applier-hadoop" for more details on configuring the Hadoop applier.
The four steps, each of which can be enabled or disabled individually, are:
1. Generate staging DDL using the ddlscan tool. Accesses the source database, reads the schema definition, and generates the necessary DDL for the staging tables within Hive. By default, staging table names are prefixed with stage_xxx_, and the tables are created in a Hive schema matching the source schema.

2. Generate live DDL using the ddlscan tool. Accesses the source database, reads the schema definition, and generates the necessary DDL for the final tables within Hive. Tables are created with the same table and schema names as in the source.

3. Materialize the view. Executes a view materialization, in which the data in any existing table is merged with the data in the staging table to produce the final table data. This step is identical to the process executed when running the materialize tool.

4. Compare loaded data. Compares the data within the source and materialized tables and reports any differences.
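To make the sequence concrete, the following Python sketch models the four steps in memory, with each step switchable on its own. It is not the tool's implementation: the names Steps, StagedRow, ddl_names, materialize, compare and run are hypothetical, and the sketch assumes that each staged row carries an operation code ("I" or "D"), a sequence number and a primary key, and that the latest change per key wins during the merge.

```python
from dataclasses import dataclass

# Hypothetical in-memory model of the four load-reduce-check steps.
# The row format and merge rule are illustrative assumptions only.

STAGING_PREFIX = "stage_xxx_"  # default staging prefix described above


@dataclass
class Steps:
    # Each step can be enabled or disabled independently.
    staging_ddl: bool = True
    live_ddl: bool = True
    materialize: bool = True
    compare: bool = True


@dataclass
class StagedRow:
    op: str          # assumed: "I" = insert/overwrite, "D" = delete
    seqno: int       # assumed: order in which the change was applied
    key: int         # primary key of the source row
    data: tuple = ()


def ddl_names(table):
    # Staging tables get the prefix; live tables keep the source name.
    return STAGING_PREFIX + table, table


def materialize(existing, staged):
    # Merge the existing final-table rows with the staged changes, applying
    # changes in seqno order so the latest change for each key wins.
    result = dict(existing)
    for row in sorted(staged, key=lambda r: r.seqno):
        if row.op == "D":
            result.pop(row.key, None)
        else:
            result[row.key] = row.data
    return result


def compare(source, materialized):
    # Return the keys that are missing on either side or whose rows differ.
    keys = source.keys() | materialized.keys()
    return sorted(k for k in keys if source.get(k) != materialized.get(k))


def run(table, source, existing, staged, steps=None):
    steps = steps or Steps()
    report = {}
    stage_name, live_name = ddl_names(table)
    if steps.staging_ddl:
        report["staging_ddl"] = stage_name
    if steps.live_ddl:
        report["live_ddl"] = live_name
    # Without materialization, the final table is just the existing data.
    final = materialize(existing, staged) if steps.materialize else dict(existing)
    if steps.compare:
        report["differences"] = compare(source, final)
    return report


if __name__ == "__main__":
    source = {1: ("alice",), 3: ("carol",)}
    existing = {1: ("al",), 2: ("bob",)}
    staged = [
        StagedRow("I", 10, 1, ("alice",)),
        StagedRow("D", 11, 2),
        StagedRow("I", 12, 3, ("carol",)),
    ]
    print(run("users", source, existing, staged))
    # → {'staging_ddl': 'stage_xxx_users', 'live_ddl': 'users', 'differences': []}
```

Disabling the materialize step in this model, for example with Steps(materialize=False), leaves the existing table unchanged, so the compare step then reports every key that the staged changes would have fixed.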