6.9. Deploying MySQL to Elasticsearch Replication

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements.

The deployment of Tungsten Replicator for replication to an Elasticsearch service is slightly different. There are two parts to the process:

  • Service Alpha on the master extracts the information from the MySQL binary log into THL.

  • Service Alpha on the slave reads the information from the remote replicator as THL, and applies that to Elasticsearch.
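
The THL is the hand-off point between these two services: the extractor on the master serializes row changes into THL events, and the applier on the slave consumes those events and turns them into Elasticsearch operations. The sketch below is only a simplified illustration of the kind of information such an event carries; the field names are assumptions made for this example, not the replicator's actual THL format.

    # Illustrative only: a simplified stand-in for the row-change information
    # written into the THL on the master and read back by the slave-side applier.
    from dataclasses import dataclass, field
    from typing import Any, Dict

    @dataclass
    class RowChangeEvent:
        seqno: int      # THL sequence number assigned on extraction
        schema: str     # source schema (database) name
        table: str      # source table name
        action: str     # "INSERT", "UPDATE" or "DELETE"
        values: Dict[str, Any] = field(default_factory=dict)  # column name -> new value
        keys: Dict[str, Any] = field(default_factory=dict)    # primary key column(s) -> value

    # Master side: the extractor reads the binlog and appends events like this to the THL.
    # Slave side: the applier reads the same events from the remote replicator and
    # converts each one into an Elasticsearch operation.
    event = RowChangeEvent(seqno=42, schema="test", table="msg", action="INSERT",
                           values={"id": 1, "msg": "hello"}, keys={"id": 1})
    print(event)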

Figure 6.10. Topologies: MySQL to Elasticsearch

Basic reformatting and restructuring of the data is performed by taking the structure extracted from the source database in row format and restructuring it for application in a different format on the target. A filter, the ColumnNameFilter, is used to add the column names to the extracted row-based information.

With the Elasticsearch applier, information is extracted from the source database in row format, the column names and primary keys are identified, and the data is translated into the document format supported by Elasticsearch.
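
As an illustration of this restructuring (not the applier's actual code), the fragment below pairs the row values extracted in row format with the column names added by the ColumnNameFilter to produce the kind of JSON document that Elasticsearch stores. The table and column names are hypothetical.

    # Illustrative sketch: combining row values with the column names supplied
    # by the ColumnNameFilter to build an Elasticsearch-style JSON document.
    import json

    # Row-format change as extracted from the binlog: values only, in column order.
    row_values = [101, "Chateau Lafite", 1982]

    # Column metadata added by the ColumnNameFilter (hypothetical table).
    column_names = ["id", "name", "vintage"]

    # Restructure: column names become JSON field names, values become field values.
    document = dict(zip(column_names, row_values))

    print(json.dumps(document, indent=2))
    # {
    #   "id": 101,
    #   "name": "Chateau Lafite",
    #   "vintage": 1982
    # }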

The transfer operates as follows:

  1. Data is extracted from MySQL using the standard extractor, reading the row change data from the binlog.

  2. The Section 11.4.8, “ColumnName Filter” filter is used to extract column name information from the database. This enables the row-change information to be tagged with the corresponding column information. The data changes, and the corresponding column names, are stored in the THL.

    The Section 11.4.30, “PrimaryKey Filter” filter is used to add primary key information to the row-based replication data. This is required by heterogeneous environments to ensure that the primary key is identified when rows are updated or deleted. Without this filter in place, performing update or delete operations requires a full table scan on the target dataserver to determine the record that must be updated or deleted.

  3. The THL information is then applied to Elasticsearch using the Elasticsearch applier; a simplified sketch of how row changes map to Elasticsearch operations follows this list.
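
The role of the primary key can be shown with a short sketch. This is an assumption about the general approach, not the applier's internal code: if the primary key value becomes the Elasticsearch document id, an update or delete addresses exactly one document instead of scanning the index. The index-naming scheme and field names below are hypothetical.

    # Illustrative sketch: mapping tagged row changes to Elasticsearch bulk-API
    # actions, using the primary key (added by the PrimaryKeyFilter) as the
    # document id.
    def to_bulk_actions(change):
        """Map one row change to Elasticsearch bulk action lines (as dicts)."""
        index = f"{change['schema']}_{change['table']}".lower()  # hypothetical naming scheme
        doc_id = str(change["keys"]["id"])                       # primary key -> document id

        if change["action"] in ("INSERT", "UPDATE"):
            # Indexing replaces any document with the same id, so an update needs no scan.
            return [{"index": {"_index": index, "_id": doc_id}}, change["values"]]
        if change["action"] == "DELETE":
            # The primary key alone identifies the document to remove.
            return [{"delete": {"_index": index, "_id": doc_id}}]
        raise ValueError(f"unsupported action: {change['action']}")

    # Example: an UPDATE and a DELETE against a hypothetical test.msg table.
    update = {"schema": "test", "table": "msg", "action": "UPDATE",
              "keys": {"id": 3}, "values": {"id": 3, "msg": "updated text"}}
    delete = {"schema": "test", "table": "msg", "action": "DELETE",
              "keys": {"id": 4}, "values": {}}

    for change in (update, delete):
        for line in to_bulk_actions(change):
            print(line)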

The two replication services can operate on the same machine, or they can be installed on two different machines.