4.6.4. Install Hadoop Replication


4.6.4.1. Applier Replicator Service
4.6.4.2. Generating Materialized Views
4.6.4.3. Accessing Generated Tables in Hive
4.6.4.4. Management and Monitoring of Hadoop Deployments
4.6.4.5. Troubleshooting Hadoop Replication

Installation of the Hadoop replication consists of multiple stages:

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites, then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.
Install the Applier replicator, which applies information to the target Hadoop environment.
Once installation of the Extractor and Applier components has been completed, materialization of tables and views can be performed.

4.6.4.1. Applier Replicator Service

The applier replicator service reads information from the THL of the source and applies this to a local instance of Hadoop.

Important

Installation must take place on a node within the Hadoop cluster. Writing to a remote HDFS filesystem is not currently supported.

Before installing the applier, add the following parameters to the extractor configuration in /etc/tungsten/tungsten.ini, then update the extractor:
```
[alpha]
...
enable-batch-service=true
enable-heterogeneous-service=true

shell> tpm update
```
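Before running the update, it can be useful to confirm that both settings are present in the file. The following is a minimal sketch only: check_extractor_ini is a hypothetical helper and not part of tpm, and it does not verify which service section the settings appear in.

```shell
# Hypothetical helper (not part of tpm): report whether an INI file contains
# both extractor settings required for the Hadoop applier.
check_extractor_ini() {
    ini_file="$1"
    rc=0
    # Strip leading/trailing whitespace and spaces around the first "=" so
    # that "key = value" and "key=value" compare equal.
    normalized=$(sed -e 's/^[[:space:]]*//' \
                     -e 's/[[:space:]]*$//' \
                     -e 's/[[:space:]]*=[[:space:]]*/=/' "$ini_file")
    for setting in enable-batch-service=true enable-heterogeneous-service=true; do
        # -x matches the whole line, -F treats the setting as a literal string
        if printf '%s\n' "$normalized" | grep -qxF -- "$setting"; then
            echo "found:   $setting"
        else
            echo "missing: $setting"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_extractor_ini /etc/tungsten/tungsten.ini returns a non-zero status if either setting is missing.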
The applier can now be configured.
Unpack the Tungsten Replicator distribution in a staging directory:
```
shell> tar zxf tungsten-replicator-6.1.25-6.tar.gz
```
Change into the staging directory:
```
shell> cd tungsten-replicator-6.1.25-6
```
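When scripting these two steps, the staging directory name can be derived from the archive name. This is a small sketch that assumes, as in the commands above, that the archive unpacks into a directory named after the file without its .tar.gz suffix; staging_dir_for is a hypothetical helper name.

```shell
# Hypothetical helper: print the staging directory name for a given archive,
# assuming the archive unpacks into a directory of the same base name.
staging_dir_for() {
    basename "$1" .tar.gz
}
```

It could be used as cd "$(staging_dir_for tungsten-replicator-6.1.25-6.tar.gz)" after unpacking.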
Configure the installation using tpm:
```
shell> vi /etc/tungsten/tungsten.ini
```
```
[defaults]
user=tungsten
install-directory=/opt/continuent
profile-script=~/.bash_profile
skip-validation-check=HostsFileCheck
skip-validation-check=InstallerMasterSlaveCheck
skip-validation-check=DatasourceDBPort
skip-validation-check=DirectDatasourceDBPort
skip-validation-check=ReplicationServicePipelines
rest-api-admin-user=apiuser
rest-api-admin-password=secret
replicator-rest-api-address=0.0.0.0

[alpha]
master=host1
members=host2
property=replicator.datasource.global.csvType=hive
property=replicator.stage.q-to-dbms.blockCommitInterval=1s
property=replicator.stage.q-to-dbms.blockCommitRowCount=1000
replication-password=secret
replication-user=tungsten
batch-enabled=true
batch-load-language=js
batch-load-template=hadoop
datasource-type=file
```
Configuration group defaults
The description of each of the options is shown below:
- user=tungsten
  Operating system user, for example tungsten. Do not use root.
- install-directory=/opt/continuent
  Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.
- profile-script=~/.bash_profile
  Append commands to include env.sh in this profile script
- skip-validation-check=HostsFileCheck
  The skip-validation-check disables a given validation check. If any validation check fails, the installation, validation or configuration will automatically stop.
  Warning
  Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.
  You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:
```
ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » 
 uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck)
```
  The check in this case is MySQLDefaultTableTypeCheck, and could be ignored using skip-validation-check=MySQLDefaultTableTypeCheck.
  Values can be passed as a comma-separated list, or single skip-validation-check entries for each check to be skipped.
  Setting both skip-validation-check and enable-validation-check is equivalent to explicitly disabling the specified check.
  This property must be specified within the [defaults] stanza.
- skip-validation-check=InstallerMasterSlaveCheck
  Disables the InstallerMasterSlaveCheck validation check. See the description of skip-validation-check=HostsFileCheck above for details.
- skip-validation-check=DatasourceDBPort
  Disables the DatasourceDBPort validation check. See the description of skip-validation-check=HostsFileCheck above for details.
- skip-validation-check=DirectDatasourceDBPort
  Disables the DirectDatasourceDBPort validation check. See the description of skip-validation-check=HostsFileCheck above for details.
- skip-validation-check=ReplicationServicePipelines
  Disables the ReplicationServicePipelines validation check. See the description of skip-validation-check=HostsFileCheck above for details.
- rest-api-admin-user=apiuser
  Specify the initial Admin Username for API access.
- rest-api-admin-password=secret
  Specify the initial Admin User Password for API access. Use rest-api-admin-pass in versions prior to 7.1.2.
- replicator-rest-api-address=0.0.0.0
  Address for the API to bind to.
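
As described for skip-validation-check, the name of a failing check appears in parentheses at the end of the error message, as in the MySQLDefaultTableTypeCheck example. The following sketch pulls those names out of saved error output and prints a matching skip-validation-check line for each. suggest_skip_options is a hypothetical helper, and it assumes the check name is the last parenthesized word on a line. Skipping a check may still lead to an invalid or non-working configuration, so treat the output as a list of candidates only.

```shell
# Hypothetical helper: read tpm error output on stdin and print one
# skip-validation-check line per check name found.
# Assumption: the check name is a single word in parentheses at end of line.
suggest_skip_options() {
    sed -n 's/.*(\([A-Za-z0-9_][A-Za-z0-9_]*\))[[:space:]]*$/skip-validation-check=\1/p' |
        sort -u
}
```

Lines such as "(WITH PASSWORD)" are ignored because they contain a space and do not end the line.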
Configuration group alpha
The description of each of the options is shown below:
- master=host1
  The hostname of the primary (extractor) within the current service.
- members=host2
  Hostnames of the dataservice members.
- replication-password=secret
  The password to be used when connecting to the database using the corresponding --replication-user.
- replication-user=tungsten
  For databases that require authentication, the username to use when connecting to the database.
- batch-enabled=true
  Whether the replicator service should use a batch applier.
- batch-load-language=js
  The script language to use for batch loading.
- batch-load-template=hadoop
  Value for the loadBatchTemplate property.
- datasource-type=file
  For Replicator Extractors and Cluster nodes, this value can only be set to mysql. Other options are applicable to standalone heterogeneous replicator appliers (including cluster-slave appliers) only.
Once the prerequisites and the configuration of the installation have been completed, the software can be installed:
```
shell> ./tools/tpm install
```

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.
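
A quick way to narrow down the cause is to list only the error lines from that log. The sketch below assumes that failures are recorded on lines containing the word ERROR, as in the validation error example earlier in this section; show_install_errors is a hypothetical helper name.

```shell
# Hypothetical helper: print the last 20 lines containing ERROR, with line
# numbers, from the tpm configure log (or from a log file given as $1).
show_install_errors() {
    log_file="${1:-/tmp/tungsten-configure.log}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Assumption: failures are logged on lines containing the word ERROR.
    grep -n 'ERROR' "$log_file" | tail -n 20
}
```

Calling show_install_errors with no argument reads /tmp/tungsten-configure.log.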

Once the service has been installed it can be monitored using the trepctl command. See Section 4.6.4.4, “Management and Monitoring of Hadoop Deployments” for more information. If there are problems during installation, see Section 4.6.4.5, “Troubleshooting Hadoop Replication”.

4.6.4.3. Accessing Generated Tables in Hive

If you have not already done so, follow the schema generation process described in Section 4.6.2.2, “Schema Generation”. This creates the necessary Hive schema and staging schema definitions.

Once the tables have been created through ddlscan, you can query the stage tables:

```
hive> select * from stage_xxx_movies_large limit 10;
OK
I	10	1	57475	All in the Family	1971	Archie Feels Left Out (#4.17)
I	10	2	57476	All in the Family	1971	Archie Finds a Friend (#6.18)
I	10	3	57477	All in the Family	1971	Archie Gets the Business: Part 1 (#8.1)
I	10	4	57478	All in the Family	1971	Archie Gets the Business: Part 2 (#8.2)
I	10	5	57479	All in the Family	1971	Archie Gives Blood (#1.4)
I	10	6	57480	All in the Family	1971	Archie Goes Too Far (#3.17)
I	10	7	57481	All in the Family	1971	Archie in the Cellar (#4.10)
I	10	8	57482	All in the Family	1971	Archie in the Hospital (#3.15)
I	10	9	57483	All in the Family	1971	Archie in the Lock-Up (#2.3)
I	10	10	57484	All in the Family	1971	Archie Is Branded (#3.20)
```
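
To get a rough summary of what a stage table contains, the tab-separated output of such a query can be counted by its first column. This is a sketch under assumptions: it treats the first column as the operation code (shown as I in the sample rows above), and count_by_opcode is a hypothetical helper name.

```shell
# Hypothetical helper: read tab-separated rows on stdin and print a count per
# value of the first column. Lines with a single field, such as the "OK"
# status line, are skipped.
count_by_opcode() {
    awk -F '\t' 'NF > 1 { count[$1]++ }
                 END { for (op in count) print op "\t" count[op] }' | sort
}
```

For example: hive -e 'select * from stage_xxx_movies_large' | count_by_opcode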
