4.6. Deploying the Hadoop Applier

Replicating data into Hadoop is achieved by generating character-separated values from ROW-based information and applying them directly to the Hadoop HDFS using a batch loading process. Files are written directly to HDFS using the Hadoop client libraries. A separate process is then used to merge the existing data with the changed information extracted from the Source database.
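
For illustration, the effect of the batch load can be reproduced manually with the standard hadoop fs commands. The staging path and file name below are placeholders for this sketch, not the locations used by the replicator:

  # Placeholder staging directory and CSV file; the replicator manages its own paths.
  shell> hadoop fs -mkdir -p /user/tungsten/staging/alpha
  shell> hadoop fs -put sales-000001.csv /user/tungsten/staging/alpha/
  shell> hadoop fs -ls /user/tungsten/staging/alpha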

Deployment of Hadoop replication is similar to other heterogeneous installations; two separate installations are created:

  • Service Alpha on the extractor extracts the information from the MySQL binary log into THL.

  • Service Alpha on the applier reads the information from the remote replicator as THL and applies it to Hadoop. The applier works in two stages: the row-based changes are first staged as character-separated value files, and those files are then loaded into HDFS and merged with the existing data. A configuration sketch follows this list.
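
As a sketch only: a batch applier service of this kind is typically configured through tpm. The batch-related option names below are assumptions drawn from comparable heterogeneous deployments and should be verified against the tpm reference for your release; host names and the install directory are placeholders:

  # Configure the applier service (placeholder hosts/paths; batch options are
  # assumptions, check them against the tpm reference for your release).
  shell> ./tools/tpm configure alpha \
      --batch-enabled=true \
      --batch-load-template=hadoop \
      --datasource-type=file \
      --install-directory=/opt/continuent \
      --master=sourcehost \
      --members=applierhost \
      --start-and-report
  shell> ./tools/tpm install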

Figure 4.7. Topologies: Replicating to Hadoop

Basic requirements for replication into Hadoop:

  • Hadoop Replication is supported on the following Hadoop distributions and releases:

    • Cloudera Enterprise 4.4, and Cloudera Enterprise 5.0 (Certified) up to Cloudera Enterprise 5.8

    • Hortonworks Data Platform 2.0

    • Amazon Elastic MapReduce

    • IBM InfoSphere BigInsights 2.1 and 3.0

    • MapR 3.0, 3.1, and 5.x

    • Pivotal HD 2.0

    • Apache Hadoop 2.1.0, 2.2.0

  • Source tables must have primary keys. Without a primary key, Tungsten Replicator is unable to determine which row to update when the data reaches Hadoop. Tables can be checked for missing primary keys before deployment, as shown in the query below.
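
As one way to verify this requirement before deployment (the user name is a placeholder), the following query against information_schema lists base tables that lack a PRIMARY KEY constraint:

  # List base tables without a PRIMARY KEY, excluding system schemas.
  shell> mysql -u tungsten -p -e "
      SELECT t.table_schema, t.table_name
        FROM information_schema.tables t
        LEFT JOIN information_schema.table_constraints c
          ON c.table_schema = t.table_schema
         AND c.table_name = t.table_name
         AND c.constraint_type = 'PRIMARY KEY'
       WHERE t.table_type = 'BASE TABLE'
         AND t.table_schema NOT IN
             ('mysql','information_schema','performance_schema','sys')
         AND c.constraint_name IS NULL;"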