Tungsten Replicator

Redshift Preparation for Amazon Redshift Deployments

On the Amazon Redshift host, the destination database must be prepared: first create the database, then create the tables that are to be replicated. Loading data into Redshift also requires a number of components outside of Tungsten Replicator to be configured:

An existing Amazon Web Services (AWS) account is required, together with either the AWS Access Key and Secret Key or configured IAM Roles, to interact with the account through the API. For information on creating IAM Roles, see "Configuring Identity Access Management within AWS".
The Amazon S3 service must be configured. If it has not already been configured, visit the AWS console and sign up for the Amazon S3 service.
Either s3cmd or the aws tools must be installed and configured. s3cmd can be downloaded from "s3cmd on s3tools.org".
If using s3cmd, configure it to connect to the Amazon S3 service automatically, without requiring further authentication, by configuring the .s3cfg file in the tungsten user's home directory as follows:
- Using Access Keys:
```
[default]
access_key = ACCESS_KEY
secret_key = SECRET_KEY
```
- Using IAM Roles (leave the values blank and copy the example as is):
```
[default]
access_key = 
secret_key = 
security_token =
```
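As a minimal sketch, the two .s3cfg variants above can be generated with Python's standard configparser module. The render_s3cfg helper is hypothetical and is not part of Tungsten Replicator or s3cmd; it only reproduces the [default] sections shown above, switching to the blank IAM Role form when no keys are supplied.

```python
import configparser
import io


def render_s3cfg(access_key=None, secret_key=None):
    """Hypothetical helper: return .s3cfg text for s3cmd.

    With both keys supplied, the Access Keys form is produced; with neither,
    the IAM Roles form (all values left blank) is produced.
    """
    if (access_key is None) != (secret_key is None):
        raise ValueError("supply both access_key and secret_key, or neither")
    # interpolation=None so that '%' characters in a secret are kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    if access_key is not None:
        parser["default"] = {"access_key": access_key, "secret_key": secret_key}
    else:
        # IAM Roles: the values are deliberately left blank.
        parser["default"] = {"access_key": "", "secret_key": "", "security_token": ""}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

Run as the tungsten user, the returned text could then be written to the .s3cfg file in that user's home directory.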
Create an S3 bucket to hold the CSV files generated by the replicator. This can be done either through the web interface or from the command line, for example:
```
shell> s3cmd mb s3://tungsten-csv
```
A running Redshift instance must be available, and the port and IP address of the Tungsten Cluster that will replicate into Redshift must have been added to the security settings of the Redshift instance.
Make a note of the user name and password that have been granted access to the Redshift instance, as these are needed when installing the applier. Also make a note of the Redshift instance address, which must be provided in the applier configuration.
Create an s3-config-servicename.json file based on the sample provided in cluster-home/samples/conf/s3-config-servicename.json within the Tungsten Replicator staging directory, or on the examples below.
Once created, the file will be copied into the /opt/continuent/share directory to be used by the batch applier script.
If multiple services are being created, one file must be created for each service.
The following example shows the use of Access and Secret Keys:
```
{
"awsS3Path" : "s3://your-bucket-for-redshift/redshift-test",
"awsAccessKey" : "access-key-id",
"awsSecretKey" : "secret-access-key",
"cleanUpS3Files" : "true"
}
```
The following example shows the use of IAM Roles:
```
{
"awsS3Path" : "s3://your-bucket-for-redshift/redshift-test",
"awsIAMRole" : "arn:iam-role",
"cleanUpS3Files" : "true"
}
```
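Because one file is needed for each service, generating the files from a script can help keep them consistent. The following is a minimal Python sketch using only the standard library; the write_s3_config helper and the service name in the usage note are hypothetical, and the sketch does not copy anything into /opt/continuent/share.

```python
import json
from pathlib import Path


def write_s3_config(service, settings, out_dir):
    """Hypothetical helper: write settings to s3-config-<service>.json in out_dir.

    Returns the path of the file that was written.
    """
    path = Path(out_dir) / f"s3-config-{service}.json"
    # Keys and values are written exactly as given, e.g. "cleanUpS3Files": "true".
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return path
```

For example, calling write_s3_config("alpha", {"awsS3Path": "s3://your-bucket-for-redshift/redshift-test", "awsIAMRole": "arn:iam-role", "cleanUpS3Files": "true"}, ".") would create ./s3-config-alpha.json containing the IAM Role settings shown above.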
The allowed options for this file are as follows:
- awsS3Path - the location within your S3 storage where files should be loaded.
- awsAccessKey - the S3 access key used to connect to your S3 storage. Not required if awsIAMRole is used.
- awsSecretKey - the S3 secret key associated with the Access Key. Not required if awsIAMRole is used.
- awsIAMRole - the IAM role configured to allow Redshift to interact with S3. Not required if awsAccessKey and awsSecretKey are in use.
- multiServiceTarget (true/false) - indicates whether multiple appliers write into a single Redshift target, for example when the source is a Tungsten Cluster Composite Active/Active or a Tungsten Replicator Fan-In topology (Default: false).
- singleLockTable (true/false) - controls the table lock behaviour when multiServiceTarget is true. Ignored if multiServiceTarget is false (Default: true).
- lockTablePrefix - the prefix for the lock tables when singleLockTable is false (Default: lock_xxx_).
- s3Binary - the binary used to upload CSV files to S3 (Valid values: s3cmd, s4cmd, aws) (Default: s3cmd).
- redshiftCopyOptions - allows additional valid syntax to be appended to the Redshift COPY command used to load CSV files from S3 into the Redshift staging tables.
  A list of valid parameters can be found in the "Redshift documentation".
- cleanUpS3Files - a boolean value that identifies whether the CSV files loaded into S3 should be deleted after they have been imported and merged. If set to true, the files are deleted automatically once they have been successfully imported into the Redshift staging tables. If set to false, the files are not removed automatically.
- gzipS3Files - if set to true, the CSV files are gzipped before being loaded into S3 (Default: false).
- storeCDCIn - the definition of a table that stores the change data from the load, in addition to importing it into the staging and base tables. The {schema} and {table} variables are automatically replaced with the corresponding schema and table name. For more information on keeping CDC information, see "Keeping CDC Information".
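The rules in the option list can be checked before a file is deployed. The sketch below is a hypothetical helper, not part of Tungsten Replicator. It fills in the documented defaults, accepts booleans either as JSON true/false or as the quoted "true"/"false" strings used in the examples, requires awsIAMRole or both awsAccessKey and awsSecretKey, and restricts s3Binary to the listed values. Treating awsS3Path as mandatory is an assumption, and because cleanUpS3Files has no documented default, none is applied.

```python
import json

# Defaults taken from the option list; cleanUpS3Files has no documented
# default, so it is left unset.
DEFAULTS = {
    "multiServiceTarget": False,
    "singleLockTable": True,
    "lockTablePrefix": "lock_xxx_",
    "s3Binary": "s3cmd",
    "gzipS3Files": False,
}
VALID_S3_BINARIES = {"s3cmd", "s4cmd", "aws"}
BOOLEAN_OPTIONS = ("multiServiceTarget", "singleLockTable", "cleanUpS3Files", "gzipS3Files")


def to_bool(name, value):
    """Accept JSON booleans or the strings "true"/"false" used in the examples."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError(f"{name} must be true or false, got {value!r}")


def check_s3_config(text):
    """Hypothetical checker: parse s3-config JSON, apply defaults, check rules."""
    config = {**DEFAULTS, **json.loads(text)}
    # Assumption: a target S3 location is always needed.
    if "awsS3Path" not in config:
        raise ValueError("awsS3Path is required")
    for name in BOOLEAN_OPTIONS:
        if name in config:
            config[name] = to_bool(name, config[name])
    has_role = "awsIAMRole" in config
    has_keys = "awsAccessKey" in config and "awsSecretKey" in config
    if not (has_role or has_keys):
        raise ValueError("set awsIAMRole, or both awsAccessKey and awsSecretKey")
    if config["s3Binary"] not in VALID_S3_BINARIES:
        raise ValueError(f"unsupported s3Binary: {config['s3Binary']}")
    if not config["multiServiceTarget"]:
        # singleLockTable is ignored unless multiServiceTarget is true,
        # so it is dropped from the effective settings.
        config.pop("singleLockTable", None)
    return config
```

Dropping singleLockTable when multiServiceTarget is false only mirrors the statement that the option is ignored in that case; the replicator itself may treat the raw file differently.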