Installation of the Hadoop replication consists of multiple stages:
Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors.
Install the Applier replicator which will apply information to the target Hadoop environment.
Once the installation of the Extractor and Applier components have been completed, materialization of tables and views can be performed.
The applier replicator service reads information from the THL of the source and applies this to a local instance of Hadoop.
Installation must take place on a node within the Hadoop cluster. Writing to a remote HDFS filesystem is not currently supported.
Before installing the applier, the following additions need adding to the extractor configuration. Apply the following parameters, update the extractor and then install the applier
For Staging Install:
shell>cd tungsten-replicator-7.0.3-141
shell>./tools/tpm configure alpha \ --enable-batch-service=true
shell>./tools/tpm update
For INI Installs:
Add the following the /etc/tungsten/tungsten.ini
[alpha] ...Existing Replicator Config... enable-batch-service=true
shell>tpm update
The applier can now be configured.
Unpack the Tungsten Replicator distribution in staging directory:
shell> tar zxf tungsten-replicator-7.0.3-141.tar.gz
Change into the staging directory:
shell> cd tungsten-replicator-7.0.3-141
Configure the installation using tpm:
shell>./tools/tpm configure defaults \ --reset \ --user=tungsten \ --install-directory=/opt/continuent \ --profile-script=~/.bash_profile \ --skip-validation-check=HostsFileCheck \ --skip-validation-check=InstallerMasterSlaveCheck \ --skip-validation-check=DatasourceDBPort \ --skip-validation-check=DirectDatasourceDBPort \ --skip-validation-check=ReplicationServicePipelines \ --rest-api-admin-user=apiuser \ --rest-api-admin-pass=secret
shell>./tools/tpm configure alpha \ --master=host1 \ --members=host2 \ --property=replicator.datasource.global.csvType=hive \ --property=replicator.stage.q-to-dbms.blockCommitInterval=1s \ --property=replicator.stage.q-to-dbms.blockCommitRowCount=1000 \ --replication-password=secret \ --replication-user=tungsten \ --batch-enabled=true \ --batch-load-language=js \ --batch-load-template=hadoop \ --datasource-type=file
shell> vi /etc/tungsten/tungsten.ini
[defaults] user=tungsten install-directory=/opt/continuent profile-script=~/.bash_profile skip-validation-check=HostsFileCheck skip-validation-check=InstallerMasterSlaveCheck skip-validation-check=DatasourceDBPort skip-validation-check=DirectDatasourceDBPort skip-validation-check=ReplicationServicePipelines rest-api-admin-user=apiuser rest-api-admin-pass=secret
[alpha] master=host1 members=host2 property=replicator.datasource.global.csvType=hive property=replicator.stage.q-to-dbms.blockCommitInterval=1s property=replicator.stage.q-to-dbms.blockCommitRowCount=1000 replication-password=secret replication-user=tungsten batch-enabled=true batch-load-language=js batch-load-template=hadoop datasource-type=file
Configuration group defaults
The description of each of the options is shown below; click the icon to hide this detail:
For staging configurations, deletes all pre-existing configuration information between updating with the new configuration values.
System User
--install-directory=/opt/continuent
install-directory=/opt/continuent
Path to the directory where the active deployment will be installed. The configured directory will contain the software, THL and relay log information unless configured otherwise.
--profile-script=~/.bash_profile
profile-script=~/.bash_profile
Append commands to include env.sh in this profile script
--skip-validation-check=HostsFileCheck
skip-validation-check=HostsFileCheck
The --skip-validation-check
disables a given validation check. If any validation check
fails, the installation, validation or configuration will
automatically stop.
Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.
You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:
... ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck) ...
The check in this case is
MySQLDefaultTableTypeCheck
,
and could be ignored using
--skip-validation-check=MySQLDefaultTableTypeCheck
.
Setting both
--skip-validation-check
and
--enable-validation-check
is
equivalent to explicitly disabling the specified check.
--skip-validation-check=InstallerMasterSlaveCheck
skip-validation-check=InstallerMasterSlaveCheck
The --skip-validation-check
disables a given validation check. If any validation check
fails, the installation, validation or configuration will
automatically stop.
Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.
You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:
... ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck) ...
The check in this case is
MySQLDefaultTableTypeCheck
,
and could be ignored using
--skip-validation-check=MySQLDefaultTableTypeCheck
.
Setting both
--skip-validation-check
and
--enable-validation-check
is
equivalent to explicitly disabling the specified check.
--skip-validation-check=DatasourceDBPort
skip-validation-check=DatasourceDBPort
The --skip-validation-check
disables a given validation check. If any validation check
fails, the installation, validation or configuration will
automatically stop.
Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.
You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:
... ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck) ...
The check in this case is
MySQLDefaultTableTypeCheck
,
and could be ignored using
--skip-validation-check=MySQLDefaultTableTypeCheck
.
Setting both
--skip-validation-check
and
--enable-validation-check
is
equivalent to explicitly disabling the specified check.
--skip-validation-check=DirectDatasourceDBPort
skip-validation-check=DirectDatasourceDBPort
The --skip-validation-check
disables a given validation check. If any validation check
fails, the installation, validation or configuration will
automatically stop.
Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.
You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:
... ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck) ...
The check in this case is
MySQLDefaultTableTypeCheck
,
and could be ignored using
--skip-validation-check=MySQLDefaultTableTypeCheck
.
Setting both
--skip-validation-check
and
--enable-validation-check
is
equivalent to explicitly disabling the specified check.
--skip-validation-check=ReplicationServicePipelines
skip-validation-check=ReplicationServicePipelines
The --skip-validation-check
disables a given validation check. If any validation check
fails, the installation, validation or configuration will
automatically stop.
Using this option enables you to bypass the specified check, although skipping a check may lead to an invalid or non-working configuration.
You can identify a given check if an error or warning has been raised during configuration. For example, the default table type check:
... ERROR >> centos >> The datasource root@centos:3306 (WITH PASSWORD) » uses MyISAM as the default storage engine (MySQLDefaultTableTypeCheck) ...
The check in this case is
MySQLDefaultTableTypeCheck
,
and could be ignored using
--skip-validation-check=MySQLDefaultTableTypeCheck
.
Setting both
--skip-validation-check
and
--enable-validation-check
is
equivalent to explicitly disabling the specified check.
Configuration group alpha
The description of each of the options is shown below; click the icon to hide this detail:
The hostname of the primary (extractor) within the current service.
Hostnames for the dataservice members
The password to be used when connecting to the database using
the corresponding
--replication-user
.
For databases that required authentication, the username to use when connecting to the database using the corresponding connection method (native, JDBC, etc.).
Should the replicator service use a batch applier
Which script language to use for batch loading
Value for the loadBatchTemplate property
Database type
If you plan to make full use of the REST API (which is enabled by default) you will need to also configure a username and password for API access. This must be done by specifying the following options in your configuration:
rest-api-admin-user=tungsten rest-api-admin-pass=secret
Once the prerequisites and configuring of the installation has been completed, the software can be installed:
shell> ./tools/tpm install
If the installation process fails, check the output of the
/tmp/tungsten-configure.log
file
for more information about the root cause.
Once the service has been installed it can be monitored using the trepctl command. See Section 4.6.4.4, “Management and Monitoring of Hadoop Deployments” for more information. If there are problems during installation, see Section 4.6.4.5, “Troubleshooting Hadoop Replication”.
If not already completed, the schema generation process described in Section 4.6.2.2, “Schema Generation” should have been followed. This creates the necessary Hive schema and staging schema definitions.
Once the tables have been created through ddlscan you can query the stage tables:
hive> select * from stage_xxx_movies_large limit 10;
OK
I 10 1 57475 All in the Family 1971 Archie Feels Left Out (#4.17)
I 10 2 57476 All in the Family 1971 Archie Finds a Friend (#6.18)
I 10 3 57477 All in the Family 1971 Archie Gets the Business: Part 1 (#8.1)
I 10 4 57478 All in the Family 1971 Archie Gets the Business: Part 2 (#8.2)
I 10 5 57479 All in the Family 1971 Archie Gives Blood (#1.4)
I 10 6 57480 All in the Family 1971 Archie Goes Too Far (#3.17)
I 10 7 57481 All in the Family 1971 Archie in the Cellar (#4.10)
I 10 8 57482 All in the Family 1971 Archie in the Hospital (#3.15)
I 10 9 57483 All in the Family 1971 Archie in the Lock-Up (#2.3)
I 10 10 57484 All in the Family 1971 Archie Is Branded (#3.20)