Replicating data into Hadoop is achieved by generating character-separated
values from row-based information and applying them directly to HDFS using
a batch loading process. Files are written directly to HDFS using the Hadoop
client libraries. A separate process is then used to merge the existing data
with the changed information extracted from the master database.
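The two stages described above (writing changes as character-separated
values, then merging them with existing data) can be illustrated with a
minimal Python sketch. This is not the replicator's own implementation. The
record layout (an operation code, a sequence number, a primary key, then the
column values), the function names to_csv and merge, and the convention that
an update appears as a delete followed by an insert are all assumptions made
for illustration only. The sketch also keeps everything in memory rather than
writing to HDFS.

```python
import csv
import io


def to_csv(changes):
    """Serialise row changes as comma-separated text, one line per change.

    Hypothetical layout: op code, sequence number, primary key, column values.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for change in changes:
        writer.writerow(
            [change["op"], change["seqno"], change["key"], *change["values"]]
        )
    return buf.getvalue()


def merge(base, change_csv):
    """Apply staged changes to existing rows (a dict keyed by primary key)."""
    merged = dict(base)
    records = list(csv.reader(io.StringIO(change_csv)))
    # Apply changes in sequence order. The sort is stable, so a delete and an
    # insert sharing one sequence number keep their original relative order.
    for op, _seqno, key, *values in sorted(records, key=lambda r: int(r[1])):
        if op == "D":
            merged.pop(key, None)   # assumed delete marker
        else:
            merged[key] = values    # assumed insert marker ("I")
    return merged


# Existing data, standing in for rows already materialised in Hadoop.
base = {"2": ["kiwi"], "3": ["plum"]}

# Changes extracted from the master: an insert, an update expressed as
# delete + insert, and a delete of an existing row.
changes = [
    {"op": "I", "seqno": 1, "key": "1", "values": ["apple"]},
    {"op": "D", "seqno": 2, "key": "1", "values": ["apple"]},
    {"op": "I", "seqno": 2, "key": "1", "values": ["pear"]},
    {"op": "D", "seqno": 3, "key": "2", "values": ["kiwi"]},
]

staged = to_csv(changes)
merged = merge(base, staged)
```

Keeping the staged changes separate from the merged result mirrors the
description above: the batch load only appends change records, and the
current state of each row is derived in a later step.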
Deployment of Hadoop replication is similar to other heterogeneous
installations; two separate installations are created:
Service Alpha on the master extracts the information from the MySQL
binary log into THL.
Service Alpha on the slave reads the information from the remote
replicator as THL, applying it to Hadoop. The applier works in two
Figure 3.8. Topologies: MySQL to Hadoop
Basic requirements for replication into Hadoop:
Hadoop Replication is supported on the following Hadoop distributions