6.5. Deploying MySQL to Amazon Redshift Replication
Amazon Redshift is a cloud-based data warehouse service that integrates with
other Amazon services, such as S3, to provide an SQL-like interface to the
loaded data. Replication for Amazon Redshift moves data from MySQL or Oracle
datastores, through S3, and into the Redshift environment in real-time,
avoiding the need to manually export and import the data.
Replication to Amazon Redshift operates as follows:
Data is extracted from the source database into THL.
When extracting the data from the THL, the Amazon Redshift replicator
writes the data into CSV files according to the name of the source
tables. The files contain all of the row-based data, including the
global transaction ID generated by Tungsten Replication during replication,
and the operation type (insert, delete, etc) as part of the CSV data.
The generated CSV files are loaded into Amazon S3 using the
s3cmd command. This enables easy access to your
Amazon S3 installation and simplifies the loading.
The CSV data is loaded from S3 into Redshift staging tables using the
which imports raw CSV into Redshift tables.
SQL statements are then executed within Redshift to perform updates on
the live version of the tables, using the CSV, batch loaded,
information, deleting old rows, and inserting the new data when
performing updates to work effectively within the confines of Amazon
Figure 6.6. Topologies: MySQL to Amazon Redshift
Setting up replication requires setting up both the master and slave
components as two different configurations, one for MySQL and the other for
Amazon Redshift. Replication also requires some additional steps to ensure
that the Amazon Redshift host is ready to accept the replicated data that
has been extracted. Tungsten Replication provides all the tools required to
perform these operations during the installation and setup.