4.6.4. Install Hadoop Replication

4.6.4.1. Applier Replicator Service
4.6.4.2. Generating Materialized Views
4.6.4.3. Accessing Generated Tables in Hive
4.6.4.4. Management and Monitoring of Hadoop Deployments
4.6.4.5. Troubleshooting Hadoop Replication

Installation of Hadoop replication consists of multiple stages:

- Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites, then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.
- Install the Applier replicator, which will apply information to the target Hadoop environment.
- Once the installation of the Extractor and Applier components has been completed, materialization of tables and views can be performed.

4.6.4.1. Applier Replicator Service

The applier replicator service reads information from the THL of the source and applies this to a local instance of Hadoop.

Important

Installation must take place on a node within the Hadoop cluster. Writing to a remote HDFS filesystem is not currently supported.
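Because the applier has to run on a node inside the Hadoop cluster, a quick pre-flight check of the node can save a failed install. The following is a minimal sketch, not part of the Tungsten tooling: `require_commands` is a hypothetical helper, and it assumes the Hadoop and Hive client commands are expected on the `PATH` of the installing user.

```shell
# Hypothetical helper (not part of tpm): report any command that cannot be
# found on the PATH and return non-zero if at least one is missing.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf 'missing command: %s\n' "$cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_commands hadoop hive` could be run as the installing user before continuing; the command names are assumptions about the local Hadoop client setup.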

Before installing the applier, add the following parameter to the extractor configuration, update the extractor, and then install the applier.
- For Staging installs:
```
shell> cd tungsten-replicator-7.0.3-141
shell> ./tools/tpm configure alpha \
  --enable-batch-service=true
shell> ./tools/tpm update
```
- For INI installs, add the following to /etc/tungsten/tungsten.ini:
```
[alpha]
...Existing Replicator Config...
enable-batch-service=true


shell> tpm update
```
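Before running the update, it can be useful to confirm that the setting sits in the intended service section. The sketch below assumes the INI layout shown above; `ini_has` is a hypothetical helper written for illustration and is not provided by tpm.

```shell
# Hypothetical helper (not part of tpm): succeed if the line KEY=VALUE
# appears inside [SECTION] of the given INI file.
ini_has() {
    # $1 = file, $2 = section, $3 = key, $4 = expected value
    awk -v section="$2" -v key="$3" -v value="$4" '
        /^[ \t]*\[/ {                      # a new [section] header
            current = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", current)
            next
        }
        current == section {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)   # trim surrounding whitespace
            if (line == key "=" value) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

A call such as `ini_has /etc/tungsten/tungsten.ini alpha enable-batch-service true` returns success only when the option is present in the `[alpha]` section.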
The applier can now be configured.
Unpack the Tungsten Replicator distribution in the staging directory:
```
shell> tar zxf tungsten-replicator-7.0.3-141.tar.gz
```
Change into the staging directory:
```
shell> cd tungsten-replicator-7.0.3-141
```
Configure the installation using tpm:
For Staging installs:
```
shell> ./tools/tpm configure defaults \
    --reset \
    --user=tungsten \
    --install-directory=/opt/continuent \
    --profile-script=~/.bash_profile \
    --skip-validation-check=HostsFileCheck \
    --skip-validation-check=InstallerMasterSlaveCheck \
    --skip-validation-check=DatasourceDBPort \
    --skip-validation-check=DirectDatasourceDBPort \
    --skip-validation-check=ReplicationServicePipelines \
    --rest-api-admin-user=apiuser \
    --rest-api-admin-pass=secret

shell> ./tools/tpm configure alpha \
    --master=host1 \
    --members=host2 \
    --property=replicator.datasource.global.csvType=hive \
    --property=replicator.stage.q-to-dbms.blockCommitInterval=1s \
    --property=replicator.stage.q-to-dbms.blockCommitRowCount=1000 \
    --replication-password=secret \
    --replication-user=tungsten \
    --batch-enabled=true \
    --batch-load-language=js  \
    --batch-load-template=hadoop \
    --datasource-type=file
```
For INI installs:
```
shell> vi /etc/tungsten/tungsten.ini
```
Add the following configuration:
```
[defaults]
user=tungsten
install-directory=/opt/continuent
profile-script=~/.bash_profile
skip-validation-check=HostsFileCheck
skip-validation-check=InstallerMasterSlaveCheck
skip-validation-check=DatasourceDBPort
skip-validation-check=DirectDatasourceDBPort
skip-validation-check=ReplicationServicePipelines
rest-api-admin-user=apiuser
rest-api-admin-pass=secret

[alpha]
master=host1
members=host2
property=replicator.datasource.global.csvType=hive
property=replicator.stage.q-to-dbms.blockCommitInterval=1s
property=replicator.stage.q-to-dbms.blockCommitRowCount=1000
replication-password=secret
replication-user=tungsten
batch-enabled=true
batch-load-language=js 
batch-load-template=hadoop
datasource-type=file
```
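The two listings above carry the same options: each INI line is the staging flag without its leading `--`. As an illustration of that correspondence only, here is a minimal sketch; `flags_to_ini` is a hypothetical helper, not a tpm feature, and it skips `--reset` because that flag does not appear in the INI listing.

```shell
# Hypothetical helper: turn tpm staging flags into INI-style lines by
# stripping the leading "--".
flags_to_ini() {
    for flag in "$@"; do
        case "$flag" in
            --reset) ;;                               # not present in the INI listing
            --*)     printf '%s\n' "${flag#--}" ;;    # --user=tungsten -> user=tungsten
            *)       printf 'skipping unrecognised argument: %s\n' "$flag" >&2 ;;
        esac
    done
}

flags_to_ini --reset --user=tungsten --install-directory=/opt/continuent
```

The output still has to be placed under the correct section header (`[defaults]` or the service name) by hand.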
Configuration group defaults
The description of each of the options is shown below:
- --reset
  reset
  For staging configurations, deletes all pre-existing configuration information before updating with the new configuration values.
- --user=tungsten
  user=tungsten
  System User
- --install-directory=/opt/continuent
  install-directory=/opt/continuent
  Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.
- --profile-script=~/.bash_profile
  profile-script=~/.bash_profile
  Append commands to include env.sh in this profile script
- --skip-validation-check=HostsFileCheck
  skip-validation-check=HostsFileCheck
  The --skip-validation-check option disables a given validation check. If any validation check fails, the installation, validation or configuration will automatically stop.
  Warning
  Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.
  You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:
```
...
ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » 
 uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck)
...
```
  The check in this case is MySQLDefaultTableTypeCheck, and could be ignored using --skip-validation-check=MySQLDefaultTableTypeCheck.
  Setting both --skip-validation-check and --enable-validation-check is equivalent to explicitly disabling the specified check.
- --skip-validation-check=InstallerMasterSlaveCheck
  skip-validation-check=InstallerMasterSlaveCheck
  Disables the InstallerMasterSlaveCheck validation check. See the description of --skip-validation-check=HostsFileCheck above for usage and warnings.
- --skip-validation-check=DatasourceDBPort
  skip-validation-check=DatasourceDBPort
  Disables the DatasourceDBPort validation check. See the description of --skip-validation-check=HostsFileCheck above for usage and warnings.
- --skip-validation-check=DirectDatasourceDBPort
  skip-validation-check=DirectDatasourceDBPort
  Disables the DirectDatasourceDBPort validation check. See the description of --skip-validation-check=HostsFileCheck above for usage and warnings.
- --skip-validation-check=ReplicationServicePipelines
  skip-validation-check=ReplicationServicePipelines
  Disables the ReplicationServicePipelines validation check. See the description of --skip-validation-check=HostsFileCheck above for usage and warnings.
- --rest-api-admin-user=apiuser
  rest-api-admin-user=apiuser
  Optional: Must be specified along with rest-api-admin-pass if you wish to access the full API features and use the Dashboard GUI for cluster installations.
- --rest-api-admin-pass=secret
  rest-api-admin-pass=secret
  Optional: Must be specified along with rest-api-admin-user if you wish to access the full API features.
Configuration group alpha
The description of each of the options is shown below:
- --master=host1
  master=host1
  The hostname of the primary (extractor) within the current service.
- --members=host2
  members=host2
  Hostnames for the dataservice members
- --replication-password=secret
  replication-password=secret
  The password to be used when connecting to the database using the corresponding --replication-user.
- --replication-user=tungsten
  replication-user=tungsten
  For databases that require authentication, the username to use when connecting to the database using the corresponding connection method (native, JDBC, etc.).
- --batch-enabled=true
  batch-enabled=true
  Whether the replicator service should use a batch applier.
- --batch-load-language=js
  batch-load-language=js
  Which script language to use for batch loading
- --batch-load-template=hadoop
  batch-load-template=hadoop
  Value for the loadBatchTemplate property
- --datasource-type=file
  datasource-type=file
  Database type
Note
If you plan to make full use of the REST API (which is enabled by default) you will need to also configure a username and password for API access. This must be done by specifying the following options in your configuration:
```
rest-api-admin-user=tungsten
rest-api-admin-pass=secret
```
Once the prerequisites and the configuration of the installation have been completed, the software can be installed:
```
shell> ./tools/tpm install
```

If the installation process fails, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.
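When the log is long, it can help to list only the validation checks that failed. The sketch below assumes that the log contains lines in the same shape as the console error shown earlier in this section, that is, a single line containing `ERROR` and ending with the check name in parentheses. That assumption about the log layout is not confirmed here, and `failed_checks` is a hypothetical helper.

```shell
# Hypothetical helper: print the distinct check names found on ERROR lines,
# assuming each such line ends with "(CheckName)".
failed_checks() {
    # $1 = path to the log file
    grep 'ERROR' "$1" \
        | sed -n 's/.*(\([A-Za-z][A-Za-z0-9]*\))[[:space:]]*$/\1/p' \
        | sort -u
}
```

Running `failed_checks /tmp/tungsten-configure.log` prints one check name per line; each name can then be investigated or, with care, passed to --skip-validation-check.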

Once the service has been installed it can be monitored using the trepctl command. See Section 4.6.4.4, “Management and Monitoring of Hadoop Deployments” for more information. If there are problems during installation, see Section 4.6.4.5, “Troubleshooting Hadoop Replication”.
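For scripted monitoring, the replicator state can be pulled out of the status output. This is a minimal sketch that assumes the output consists of `name : value` lines including a `state` field; `replicator_state` is a hypothetical helper, and the exact output layout should be checked against your installed version.

```shell
# Hypothetical helper: read "name : value" lines on stdin and print the
# value of the "state" field. Only the first colon separates name and value,
# so values that themselves contain colons are preserved.
replicator_state() {
    awk '{
        pos = index($0, ":")
        if (pos == 0) next
        name  = substr($0, 1, pos - 1)
        value = substr($0, pos + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", value)
        if (name == "state") { print value; exit }
    }'
}
```

Typical use would be `trepctl status | replicator_state`, for example inside a cron job that raises an alert when the result is not `ONLINE`.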

4.6.4.3. Accessing Generated Tables in Hive

If you have not already done so, follow the schema generation process described in Section 4.6.2.2, “Schema Generation”. This creates the necessary Hive schema and staging schema definitions.

Once the tables have been created through ddlscan you can query the stage tables:

```
hive> select * from stage_xxx_movies_large limit 10;
OK
I	10	1	57475	All in the Family	1971	Archie Feels Left Out (#4.17)
I	10	2	57476	All in the Family	1971	Archie Finds a Friend (#6.18)
I	10	3	57477	All in the Family	1971	Archie Gets the Business: Part 1 (#8.1)
I	10	4	57478	All in the Family	1971	Archie Gets the Business: Part 2 (#8.2)
I	10	5	57479	All in the Family	1971	Archie Gives Blood (#1.4)
I	10	6	57480	All in the Family	1971	Archie Goes Too Far (#3.17)
I	10	7	57481	All in the Family	1971	Archie in the Cellar (#4.10)
I	10	8	57482	All in the Family	1971	Archie in the Hospital (#3.15)
I	10	9	57483	All in the Family	1971	Archie in the Lock-Up (#2.3)
I	10	10	57484	All in the Family	1971	Archie Is Branded (#3.20)
```
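A quick way to sanity-check a stage table is to summarise the rows by their first column. The sketch below assumes that the query output is tab-separated, as in the listing above, and that the first column is the operation code (`I` in the sample rows); `count_opcodes` is a hypothetical helper and the meaning of the column is an assumption, not something this section states.

```shell
# Hypothetical helper: count rows per value of the first tab-separated
# column in query output read from stdin. Lines without a tab, such as
# "OK", are ignored.
count_opcodes() {
    awk -F'\t' '
        NF > 1 { count[$1]++ }
        END    { for (op in count) print op, count[op] }
    ' | sort
}
```

It could be fed from the Hive command line, for example `hive -e 'select * from stage_xxx_movies_large limit 10;' | count_opcodes`, using the table name from the query above.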


Continuent Documentation