5.2.1. Preparing for Elasticsearch Replication

Configure the source and target hosts following the prerequisites outlined in Appendix B, Prerequisites then follow the appropriate steps for the required extractor topology outlined in Chapter 3, Deploying MySQL Extractors for MySQL extraction, or Chapter 4, Deploying Oracle Extractors for Oracle CDC extraction.

During the replication process, data is exchanged from the MySQL database/table/row structure into corresponding Elasticsearch structures, as follows

MySQL Elasticsearch
Database Index
Table Type
Row Document

In general, it is easier to understand that a row within the MySQL table is converted into a single document on the Elasticsearch side, and automatically added to index "schema name" with a type of "table name".

For example, the following row within MySQL:

mysql> select * from messages where id = 99999 \G
*************************** 1. row ***************************
 id: 99999
 msg: Hello Elasticsearch
 1 row in set (0.00 sec)

Is replicated into the Elasticsearch document:

{
  "_index" : "test",
  "_type" : "messages",
  "_id" : "99999",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "msg" : "Hello Elasticsearch",
    "committime" : "2017-06-23 19:02:22.0",
    "id" : "99999",
    "source_table" : "messages",
    "source_schema" : "test"
  }
}

When preparing the hosts you must be aware of this translation of the different structures, as it will have an effect on the way the information is replicated from MySQL to Elasticsearch.

MySQL Host

The data replicated from MySQL can be any data, although there are some known limitations and assumptions made on the way the information is transferred.

When configuring the extractor database and host, ensure heterogenous specific prerequisities have been included, see Section B.4.4, “MySQL Configuration for Heterogeneous Deployments”

For the best results when replicating, be aware of the following issues and limitations:

  • Use primary keys on all tables. The use of primary keys will improve the lookup of information within Elasticsearch when rows are updated. Without a primary key on a table a full table scan is performed, which can affect performance.

  • MySQL TEXT columns are correctly replicated, but cannot be used as keys.

  • MySQL BLOB columns are converted to text using the configured character type. Depending on the data that is being stored within the BLOB, the data may need to be custom converted. A filter can be written to convert and reformat the content as required.

Elasticsearch Host

  • Enable networking; by default Elasticsearch is configured to listen only on the localhost (127.0.0.1) IP address. The configuration should be changed to the IP address of your host.

    For example, edit the config/elasticsearch.yml file and add the following two lines:

    network.host: _site_
    discovery.type: single-node 
  • You may then start Elasticsearch as a daemon:

    bin/elasticsearch -d -p elasticsearch.pid
  • Check that Elasticsearch is up and running:

    curl http://$HOSTNAME:9200/
    
    {
      "name" : "i7DyHKt",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "NKL8ZX0RSjyC1HQNAPxJ9w",
      "version" : {
        "number" : "5.4.1",
        "build_hash" : "2cfe0df",
        "build_date" : "2017-05-29T16:05:51.443Z",
        "build_snapshot" : false,
        "lucene_version" : "6.5.1"
      },
      "tagline" : "You Know, for Search"
    }