Deploying the Hadoop Applier
Replicating data into Hadoop is achieved by generating character-separated values from ROW-based information that is applied directly to the Hadoop HDFS using a batch loading process. Files are written directly to the HDFS using the Hadoop client libraries. A separate process is then used to merge existing data, and the changed information extracted from the Source database.
Deployment of the Hadoop replication is similar to other heterogeneous installations; two separate installations are created:
- Service Alpha on the extractor, extracts the information from the MySQL binary log into THL.
- Service Alpha on the applier, reads the information from the remote replicator as THL, applying it to Hadoop. The applier works in two stages

Basic requirements for replication into Hadoop:
- Hadoop Replication is supported on the following Hadoop distributions and releases:
- Cloudera Enterprise 4.4, Cloudera Enterprise 5.0 (Certified) up to Cloudera Enterprise 5.8
- HortonWorks DataPlatform 2.0
- Amazon Elastic MapReduce
- IBM InfoSphere BigInsights 2.1 and 3.0
- MapR 3.0, 3.1, and 5.x
- Pivotal HD 2.0
- Apache Hadoop 2.1.0, 2.2.0
- Source tables must have primary keys. Without a primary key, Tungsten Replicator is unable to determine the row to be updated when the data reaches Hadoop.