Apache Cassandra provides high-performance, fault-tolerant storage for large, distributed data sets.
Replication to Cassandra is performed using the batch applier, and operates as follows:
1. Data is extracted from the source database into THL.
2. When extracting the data from the THL, the replicator writes it into CSV files named according to the source tables. The files contain all of the row-based data, including the global transaction ID generated by Tungsten Clustering during replication and the operation type (INSERT, DELETE, etc.) as part of the CSV data.
3. The CSV data is then loaded into staging tables within Cassandra.
4. SQL statements are then executed to update the live version of the tables using the batch-loaded CSV information, deleting old rows and inserting the new data. Performing an update as a delete followed by an insert works effectively within the confines of Cassandra operation.
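The CSV staging format described above can be sketched with a few lines of Python. The column layout used here (operation code, then transaction ID, then the row data) and the single-letter opcodes are illustrative assumptions, not the exact Tungsten file format:

```python
import csv
import io

def stage_rows(events):
    """Write row-change events for one source table into a CSV buffer.

    Each event carries the global transaction ID (seqno) assigned during
    replication and an operation code: "I" for insert, "D" for delete.
    (Layout and opcodes are assumptions for illustration.)
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    for seqno, opcode, row in events:
        writer.writerow([opcode, seqno] + list(row))
    return buf.getvalue()

# One staged batch for a hypothetical "employees" table.
csv_text = stage_rows([
    (101, "I", (1, "alice")),
    (102, "D", (2, "bob")),
])
print(csv_text)
```

Writing one file per source table keeps the batch load step simple: each file maps directly onto one staging table.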
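The final merge step, deleting old rows and re-inserting new data, can be illustrated with a small in-memory model: a dict stands in for the live table and a list of tuples for the staging table. The delete-then-insert rule follows the description above; everything else (key layout, opcodes) is an assumption for illustration:

```python
def apply_staged(base, staged):
    """Apply staged row changes to the live table, batch-applier style.

    Every change first deletes any existing row with the same key, then
    re-inserts the new row image for inserts -- so an update is applied
    as a delete followed by an insert, which suits Cassandra's model.
    """
    # Apply in global transaction ID (seqno) order.
    for seqno, opcode, key, row in sorted(staged):
        base.pop(key, None)       # remove any old version of the row
        if opcode == "I":         # insert the new row image
            base[key] = row
    return base

base = {1: ("alice", "engineering")}
staged = [
    (101, "I", 1, ("alice", "sales")),   # update = delete + insert
    (102, "D", 2, None),                 # delete of an absent row: no-op
    (103, "I", 3, ("carol", "support")),
]
apply_staged(base, staged)
print(base)
```

In the real applier this logic is expressed as statements run against Cassandra after the batch load, but the ordering and delete-before-insert behavior are the essential points.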
Setting up replication requires configuring both the master and slave components as two separate configurations, one for MySQL and the other for Cassandra. Replication also requires some additional steps to ensure that the Cassandra host is ready to accept the extracted, replicated data. Tungsten Clustering includes all the tools required to perform these operations during installation and setup.