6.9.2. Installing Elasticsearch Replication

Installation of the Elasticsearch replication requires special configuration of the master and slave hosts so that each is configured for the correct datasource type.

There are two methods for installation:

  • Staging method, using command-line options to the set the configuration.

  • INI method, using a fixed file of options to the set the configuration.

Either method can be used for the installation. For more information, see Section 10.1, “Comparing Staging and INI tpm Methods”.

6.9.2.1. Installing Elasticsearch Replication (Staging Method)

Configuration of the replication deployment to Elasticsearch can be made using a single tpm staging-based deployment. However, because the configuration must be different for the master and slave hosts, the configuration must be performed in multiple steps.

  1. Unpack the Tungsten Replicator distribution in staging directory:

    shell> tar zxf tungsten-replicator-5.2.1-263.tar.gz
  2. Change into the staging directory:

    shell> cd tungsten-replicator-5.2.1-263

    If you are replicating out of an existing MySQL cluster, you will need to ensure that MySQL strings are converted into a native format instead of the internal format used by the replicator. A filter, convertstringfrommysql handles this conversion. Prepare the required filter configuration file as follows on the Elasticsearch applier slave host(s) only:

    shell> mkdir -p /opt/replicator/share/
    shell> cp tungsten-replicator/support/filters-config/convertstringfrommysql.json /opt/replicator/share/
  3. Configure the default parameters for the replicator service:

    shell> ./tools/tpm configure defaults \
    	--install-directory=/opt/replicator \
    	--disable-relay-logs=true \
    	--mysql-allow-intensive-checks \
    	"--profile-script=~/.bash_profile" \
    	--skip-validation-check=HostsFileCheck \
    	--skip-validation-check=MySQLUnsupportedDataTypesCheck \
    	--start-and-report \
    	--user=tungsten
    

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

  4. Configure the parameters for the master replicator service that will be installed on host mysqlhost:

    shell> ./tools/tpm configure alpha \
    	--hosts=mysqlhost \
    	--topology=master-slave \
    	--master=mysqlhost \
    	--members=mysqlhost \
    	--replication-user=tungsten \
    	--replication-password=secret \
    	--replication-port=3306 \
    	--enable-heterogeneous-service=true
    

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

    If your MySQL source is a Cluster, add the following to your current cluster configuration instead:

    shell> ./tools/tpm configure alpha \
    	--repl-svc-extractor-filters=colnames,pkey \
    	--property=replicator.filter.pkey.addColumnsToDeletes=true \
    	--property=replicator.filter.pkey.addPkeyToInserts=true
    
    

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

  5. Configure the parameters for the slave replicator service that will be installed on host elastichost:

    shell> ./tools/tpm configure alpha \
    	--hosts=elastichost \
    	--topology=master-slave \
    	--master=mysqlhost \
    	--members=mysqlhost,elastichost \
    	--datasource-type=elasticsearch \
    	--property=replicator.stage.q-to-dbms.filters=casetransform \
    	--property=replicator.filter.casetransform.to_upper_case=false \ 
    	--replication-host=elastichost \
    	--replication-password=null \
    	--replication-port=9200 \
    	--replication-user=root
    

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

    If your MySQL source is a Cluster, use this configuration instead:

    shell> ./tools/tpm configure beta \
    	--hosts=elastichost \
    	--topology=cluster-slave \
    	--relay-source=alpha \
    	--relay=elastichost \
    	--datasource-type=elasticsearch \
    	--property=replicator.stage.remote-to-thl.filters=convertstringfrommysql \
    	--property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/replicator/share/convertstringfrommysql.json \
    	--property=replicator.stage.q-to-dbms.filters=casetransform \
    	--property=replicator.filter.casetransform.to_upper_case=false \ 
    	--replication-host=elastichost \
    	--replication-password=null \
    	--replication-port=9200 \
    	--replication-user=root
    
    

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

  6. Install the services:

    shell> ./tools/tpm install

    If your MySQL source is an existing Cluster, update the configuration instead:

    shell> ./tools/tpm update

If you encounter problems during the installation, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the service is configured and running, the service can be monitored as normal using the trepctl command. See Section 6.9.3, “Management and Monitoring of Elasticsearch Deployments” for more information.

6.9.2.2. Installing Elasticsearch Replication (INI Method)

INI configuration of the replication deployment to Elasticsearch is done by specifying a different INI configuration for the master and slave hosts.

  1. Configure the ini parameters on the host that will run the master replicator service:

    
    [defaults]
    install-directory=/opt/replicator
    disable-relay-logs=true
    mysql-allow-intensive-checks=true
    profile-script=~/.bash_profile
    skip-validation-check=HostsFileCheck
    skip-validation-check=MySQLUnsupportedDataTypesCheck
    start-and-report=true
    user=tungsten
    
    [alpha]
    topology=master-slave
    master=mysqlhost
    members=mysqlhost,elastichost
    replication-user=tungsten
    replication-password=secret
    replication-port=3306
    enable-heterogeneous-master=true
    
    

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

    If your MySQL source is a Cluster, add the following to your current cluster configuration instead:

    [alpha]
    ...existing cluster configs here...
    repl-svc-extractor-filters=colnames,pkey
    property=replicator.filter.pkey.addColumnsToDeletes=true
    property=replicator.filter.pkey.addPkeyToInserts=true
    
    

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

  2. Configure the ini parameters on the slave host that will apply the events to Elasticsearch:

    
    [defaults]
    install-directory=/opt/replicator
    disable-relay-logs=true
    mysql-allow-intensive-checks=true
    profile-script=~/.bash_profile
    skip-validation-check=HostsFileCheck
    skip-validation-check=MySQLUnsupportedDataTypesCheck
    start-and-report=true
    user=tungsten
    
    [alpha]
    topology=master-slave
    master=mysqlhost
    members=mysqlhost,elastichost
    datasource-type=elasticsearch
    property=replicator.stage.remote-to-thl.filters=convertstringfrommysql
    property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/replicator/share/convertstringfrommysql.json
    property=replicator.stage.q-to-dbms.filters=casetransform
    property=replicator.filter.casetransform.to_upper_case=false
    replication-host=elastichost
    replication-password=null
    replication-port=9200
    replication-user=root
    
    

    The above configures the Elasticsearch slave to accept replication data from the master.

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

    If your MySQL source is a Cluster, use this configuration instead:

    
    [defaults]
    disable-relay-logs=true
    disable-security-controls=true
    install-directory=/opt/replicator
    mysql-allow-intensive-checks=true
    profile-script=~/.bash_profile
    skip-validation-check=HostsFileCheck
    skip-validation-check=MySQLUnsupportedDataTypesCheck
    start-and-report=true
    user=tungsten
    
    [alpha]
    ...existing cluster configs here...
    topology=clustered
    
    [beta]
    topology=cluster-slave
    relay=elastichost
    relay-source=alpha
    datasource-type=elasticsearch
    property=replicator.stage.remote-to-thl.filters=convertstringfrommysql
    property=replicator.filter.convertstringfrommysql.definitionsFile=/opt/replicator/share/convertstringfrommysql.json
    property=replicator.stage.q-to-dbms.filters=casetransform
    property=replicator.filter.casetransform.to_upper_case=false
    replication-host=elastichost
    replication-password=null
    replication-port=9200
    replication-user=root
    
    

    The above configures the Elasticsearch slave to accept replication data from the cluster nodes.

    The description of each of the options is shown below; click the icon to hide this detail:

    Click the icon to show a detailed description of each argument.

  3. Using the INI method requires that the following steps be performed on EVERY node:

    1. Unpack the Tungsten Replicator distribution in the staging directory:

      shell> tar zxf tungsten-replicator-5.2.1-263.tar.gz
    2. Change into the staging directory:

      shell> cd tungsten-replicator-5.2.1-263
    3. Install the services:

      shell> ./tools/tpm install

If you encounter problems during the installation, check the output of the /tmp/tungsten-configure.log file for more information about the root cause.

Once the service is configured and running, the service can be monitored as normal using the trepctl command. See Section 6.9.3, “Management and Monitoring of Elasticsearch Deployments” for more information.

6.9.2.3. Optional Configuration Parameters for Elasticsearch

A number of optional, configurable, properties are available that control how Tungsten Replicator applies and populates information when the data is written into Elasticsearch. The following properties can by set during configuration using --property=PROPERTYNAME=value:

Table 6.3. Optional Elasticsearch Applier Properties

OptionDescription
replicator.applier.dbms.docIdFormatDetermines the format of the document ID used to store the information with Elasticsearch
replicator.applier.dbms.embedCommitTimeSets whether the commit time for the source row is embedded into the document
replicator.applier.dbms.embedSchemaTableEmbed the source schema name and table name in the stored document
replicator.applier.dbms.ignoreDeleteErrorsWhether a failure to delete an entry should raise an error
replicator.applier.dbms.ignoreUpdateErrorsWhether a failure to update an existing entry should raise an error
replicator.applier.dbms.indexNameThe string to be used as the index name when the incoming schema name is not used
replicator.applier.dbms.selfGeneratedIdDetermines whether to use a self generated document id or one derived from the primary key information
replicator.applier.dbms.typeNameThe string to use as the index type name when the incoming table is not used
replicator.applier.dbms.useSchemaAsIndexSpecifies whether the incoming schema name for the data should be used as the Index Name
replicator.applier.dbms.useTableAsTypeSpecifies whether the incoming table name for the data should be used as the Index Type

replicator.applier.dbms.docIdFormat

Optionreplicator.applier.dbms.docIdFormat
DescriptionDetermines the format of the document ID used to store the information with Elasticsearch
Value Typestring
Defaultpkey
Valid ValuespkeyCombine the primary key column values into a single string
 pkeyusCombine the primary key column values into a single string joined by an underscore character
 tspkeyCombine the schema name, table name, and primary key column values into a single string joined by an underscore character
 tspkeyusCombine the schema name, table name, and primary key column values into a single string

Determines the format of the document ID used to store the information into Elasticsearch. For example, when configured to use tspkeyus, then the format of the document ID will consist of the schemaname, table name and primary key column information separated by underscores, SCHEMANAME_TABLENAME_234.

replicator.applier.dbms.embedCommitTime

Optionreplicator.applier.dbms.embedCommitTime
DescriptionSets whether the commit time for the source row is embedded into the document
Value Typeboolean
Defaulttrue
Valid ValuesfalseDo not embed the source database commit time
 trueEmbed the source database commit time into the stored document

Embeds the commit time of the source database row into the document information:

{
	"_index" : "test",
	"_type" : "messages",
	"_id" : "99999",
	"_version" : 1,
	"found" : true,
	"_source" : {
		"msg" : "Hello Elasticsearch",
		"committime" : "2017-06-06 19:09:20.0",
		"id" : "99999",
		"source_table" : "messages",
		"source_schema" : "test"
	}
}

replicator.applier.dbms.embedSchemaTable

Optionreplicator.applier.dbms.embedSchemaTable
DescriptionEmbed the source schema name and table name in the stored document
Value Typeboolean
Defaulttrue
Valid ValuesfalseDo not embed the schema or database name in the document
 trueEmbed the source schema name and database name into the stored document

If enabled, the documented stored into Elasticsearch will include the source schema and database name. This can be used to identify the source of the information if the schema and table name is not being used for the index and type names (see replicator.applier.dbms.useSchemaAsIndex and replicator.applier.dbms.useTableAsType).

{
	"_index" : "test",
	"_type" : "messages",
	"_id" : "99999",
	"_version" : 1,
	"found" : true,
	"_source" : {
		"msg" : "Hello Elasticsearch",
		"committime" : "2017-06-06 19:09:20.0",
		"id" : "99999",
		"source_table" : "messages",
		"source_schema" : "test"
	}
}

replicator.applier.dbms.ignoreDeleteErrors

Optionreplicator.applier.dbms.ignoreDeleteErrors
DescriptionWhether a failure to delete an entry should raise an error
Value Typeboolean
Defaultfalse
Valid ValuesfalseRaise a failure when deleting records from Elasticsearch fails
 trueIgnore errors when deleting records from Elasticsearch

The replicator.applier.dbms.ignoreDeleteErrors determines whether the replicator will fail when trying to delete an entry in the Elasticsearch index entry. This option can be useful when deleting data in an Elasticsearch target and data does not need to guarantee to be deleted.

replicator.applier.dbms.ignoreUpdateErrors

Optionreplicator.applier.dbms.ignoreUpdateErrors
DescriptionWhether a failure to update an existing entry should raise an error
Value Typeboolean
Defaultfalse
Valid ValuesfalseRaise a failure when updating records from Elasticsearch fails
 trueIgnore errors when updating records from Elasticsearch

The replicator.applier.dbms.ignoreUpdateErrors determines whether the replicator will fail when trying to update an entry in the Elasticsearch index entry. This option can be useful when updating data in an Elasticsearch target and data does not need to guarantee to be deleted.

replicator.applier.dbms.indexName

Optionreplicator.applier.dbms.indexName
DescriptionThe string to be used as the index name when the incoming schema name is not used
Value Typestring

replicator.applier.dbms.selfGeneratedId

Optionreplicator.applier.dbms.selfGeneratedId
DescriptionDetermines whether to use a self generated document id or one derived from the primary key information
Value Typeboolean
Defaultfalse
Valid ValuesfalseUse the primary key column information as the document ID
 trueGet Elasticsearch to self generate an ID

replicator.applier.dbms.typeName

Optionreplicator.applier.dbms.typeName
DescriptionThe string to use as the index type name when the incoming table is not used
Value Typestring

Specifies the type name to be used when replicator.applier.dbms.useTableAsType has been set to false. The value will be used as the index type HTTP://ELASTIC/INDEXNAME/INDEXTYPE.

replicator.applier.dbms.useSchemaAsIndex

Optionreplicator.applier.dbms.useSchemaAsIndex
DescriptionSpecifies whether the incoming schema name for the data should be used as the Index Name
Value Typeboolean
Defaulttrue
Valid ValuesfalseDo not use the incoming schema name as the Elasticsearch index name
 trueUse the incoming schema name as the Elasticsearch index name

Changes whether the string used for the index name within the URL (HTTP://ELASTIC/INDEXNAME/INDEXTYPE). If set to false, the replicator.applier.dbms.typeName must be used to specify the index type.

replicator.applier.dbms.useTableAsType

Optionreplicator.applier.dbms.useTableAsType
DescriptionSpecifies whether the incoming table name for the data should be used as the Index Type
Value Typeboolean
Defaulttrue
Valid ValuesfalseDo not use the incoming table name as the Elasticsearch index type
 trueUse the incoming table name as the Elasticsearch index type

Changes whether the string used for the index type within the URL (HTTP://ELASTIC/INDEXNAME/INDEXTYPE). If set to false, the replicator.applier.dbms.typeName must be used to specify the index type.