B.4. MySQL Database Setup

For replication between MySQL hosts, you must configure each MySQL database server to support the required user names and core MySQL configuration.

Important

For MySQL extraction, Tungsten Cluster must have write access to the database so that status and progress information can be recorded correctly.

Note

Native MySQL replication should not be running when you install Tungsten Cluster™. The replication service will be completely handled by Tungsten Cluster™, and the normal replication, management and monitoring techniques will not provide you with the information you need.

B.4.1. MySQL Version Support

For a full list of MySQL Versions supported, see Table B.2, “MySQL/Tungsten Version Support”

B.4.2. MySQL Configuration

Each MySQL Server should be configured identically within the system. Although binary logging must be enabled on each host, replication should not be configured, since Tungsten Replicator will be handling that process.

The configured tungsten user must be able to read the MySQL configuration file (for installation) and the binary logs. Either the tungsten user should be a member of the appropriate group (i.e. mysql), or the permissions altered accordingly.

Important

Parsing of mysqld_multi configuration files is not currently supported. To use a mysqld_multi installation, copy the relevant portion of the configuration file to a separate file to be used during installation.

To setup your MySQL servers, you need to do the following:

  • Configure your my.cnf settings. The following changes should be made to the [mysqld] section of your my.cnf file:

    • By default, MySQL is configured only to listen on the localhost address (127.0.0.1). The bind-address parameter should be checked to ensure that it is either set to a valid value, or commented to allow listening on all available network interfaces:

      #bind-address = 127.0.0.1
    • Specify the server id

      Each server must have a unique server id:

      server-id = 1

      The best practice is for all servers to have a unique ID across all clusters. For example, use a numbering scheme like 0101, 0102, 0201, 0201, where the leading two digits are the cluster number and the last two digits are the node number, which allows for 99 participating clusters with 99 nodes each.

    • Ensure that the maximum number of open files matches the configuration of the database user. This was configured earlier at 65535 files.

      open_files_limit = 65535
    • Enable binary logs

      Tungsten Replicator operates by reading the binary logs on each machine, so logging must be enabled:

      log-bin = mysql-bin
    • Set the sync_binlog parameter to 1 (one).

      Note

      In MySQL 5.7, the default value is 1.

      The MySQL sync_binlog parameter sets the frequency at which the binary log is flushed to disk. A value of zero indicates that the binary log should not be synchronized to disk, which implies that only standard operating system flushing of writes will occur. A value greater than one configures the binary log to be flushed only after sync_binlog events have been written. This can introduce a delay into writing information to the binary log, and therefore replication, but also opens the system to potential data loss if the binary log has not been flushed when a fatal system error occurs.

      Setting a value of value 1 (one) will synchronize the binary log on disk after each event has been written.

      sync_binlog = 1
    • Increase MySQL protocol packet sizes

      The replicator can apply statements up to the maximum size of a single transaction, so the maximum allowed protocol packet size must be increase to support this:

      max_allowed_packet = 52m
    • Configure InnoDB as the default storage engine

      Tungsten Cluster needs to use a transaction safe storage engine to ensure the validity of the database. The InnoDB storage engine also provides automatic recovery in the event of a failure. Using MyISAM can lead to table corruption, and in the event of a switchover or failure, and inconsistent state of the database, making it difficult to recover or restart replication effectively.

      InnoDB should therefore be the default storage engine for all tables, and any existing tables should be converted to InnoDB before deploying Tungsten Cluster.

      default-storage-engine = InnoDB
    • Configure InnoDB Settings

      Tungsten Replicator creates tables and must use InnoDB tables to store the status information for replication configuration and application:

      The MySQL option innodb_flush_log_at_trx_commit configures how InnoDB writes and confirms writes to disk during a transaction. The available values are:

      • A value of 0 (zero) provides the best performance, but it does so at the potential risk of losing information in the event of a system or hardware failure. For use with Tungsten Cluster™ the value should never be set to 0, otherwise the cluster health may be affected during a failure or failover scenario.

      • A value of 1 (one) provides the best transaction stability by ensuring that all writes to disk are flushed and committed before the transaction is returned as complete. Using this setting implies an increased disk load and so may impact the overall performance.

        When using Tungsten Cluster in an Composite Active/Active, fan-in or data critical cluster, the value of innodb_flush_log_at_trx_commit should be set to 1. This not only ensures that the transactional data being stored in the cluster are safely written to disk, this setting also ensures that the metadata written by Tungsten Cluster™ describing the cluster and replication status is also written to disk and therefore available in the event of a failover or recovery situation.

      • A value of 2 (two) ensures that transactions are committed to disk, but data loss may occur if the disk data is not flushed from any OS or hardware-based buffering before a hardware failure, but the disk overhead is much lower and provides higher performance.

        This setting must be used as a minimum for all Tungsten Cluster™ installations, and should be the setting for all configurations that do not require innodb_flush_log_at_trx_commit set to 1.

      At a minimum innodb_flush_log_at_trx_commit should be set to 2; a warning will be generated if this value is set to zero:

      innodb_flush_log_at_trx_commit = 2

      MySQL configuration settings can be modified on a running cluster, providing you switch your host to maintenance mode before reconfiguring and restarting MySQL Server. See Section 7.13, “Performing Database or OS Maintenance”.

    Optional configuration changes that can be made to your MySQL configuration:

    • InnoDB Flush Method

      innodb_flush_method=O_DIRECT

      The InnoDB flush method can effect the performance of writes within MySQL and the system as a whole.

      O_DIRECT is generally recommended as it eliminates double-buffering of InnoDB writes through the OS page cache. Otherwise, MySQL will be contending with Tungsten and other processes for pages there — MySQL is quite active and has a lot of hot pages for indexes and the like this can result lower i/o throughput for other processes.

      Tungsten particularly depends on the page cache being stable when using parallel apply. There is one thread that scans forward over the THL pages to coordinate the channels and keep them from getting too far ahead. We then depend on those pages staying in cache for a while so that all the channels can read them — as you are aware parallel apply works like a bunch of parallel table scans that are traveling like a school of sardines over the same part of the THL. If pages get kicked out again before all the channels see them, parallel replication will start to serialize as it has to wait for the OS to read them back in again. If they stay in memory on the other hand, the reads on the THL are in-memory, and fast. For more information on parallel replication, see Section 5.5, “Deploying Parallel Replication”.

    • Increase InnoDB log file size

      The default InnoDB Redo Log file size is 48MB. This should be increased to a larger file size for performance and other reasons. Values of 512MB are common.

      To change the file size, read the corresponding information in the MySQL manual for configuring the file size information. Please see both "MySQL Redo Log" and "Optimizing MySQL InnoDB Redo Logging".

    • Binary Logging Format

      Tungsten Replicator works with both statement and row-based logging, and therefore also mixed-based logging. The chosen format is entirely up to the systems and preferences, and there are no differences or changes required for Tungsten Replicator to operate. For native MySQL to MySQL Primary/Replica replication, either format will work fine.

      Depending on the exact use case and deployment, different binary log formats imply different requirements and settings. Certain deployment types and environments require different settings:

      • For Composite Active/Active deployments, use row-based logging. This will help to avoid data drift where statements make fractional changes to the data in place of explicit updates.

      • Use row-based logging for heterogeneous deployments. All deployments to Oracle, MongoDB, Vertica and others rely on row-based logging.

      • Use mixed replication if warnings are raised within the MySQL log indicating that statement only is transferring possibly dangerous statements.

      • Use statement or mixed replication for transactions that update many rows; this reduces the size of the binary log and improves the performance when the transaction are applied on the Replica.

      • Use row replication for transactions that have temporary tables. Temporary tables are replicated if statement or mixed based logging is in effect, and use of temporary tables can stop replication as the table is unavailable between transactions. Using row-based logging also prevents these tables entering the binary log, which means they do not clog and delay replication.

      The configuration of the MySQL server can be permanently changed to use an explicit replication by modifying the configuration in the configuration file:

      binlog-format = row

      Note

      In MySQL 5.7, the default format is ROW.

      For temporary changes during execution of explicit statements, the binlog format can be changed by executing the following statement:

      mysql> SET binlog-format = ROW;
    • innodb_stats_on_metadata=0

      Although optional, we would highly recommend setting this property as it has been shown to improve performance by preventing statistics updates every time the information_schema is queried.

    You must restart MySQL after any changes have been made.

  • Ensure the tungsten user can access the MySQL binary logs by either opening up the directory permissions, or adding the tungsten user to the group owner for the directory.

B.4.3. MySQL Configuration for Active/Active Deployments

If you are inserting to the same table at the same time at two or more different sites, and using bi-directional or active/active replication, then special care must be taken to avoid primary key conflicts. Either the auto-increment keys on each need to be offset so they do not conflict, or the application needs to be be able to generate unique keys taking multiple sites into account.

Important

The following configuration is required if your application is relying upon the MySQL-native auto-increment primary key feature:

Use the auto-increment-increment and auto-increment-offset variables to affect the way that MySQL generates the next value in an auto-increment field.

For example, edit my.cnf on all servers:

# for all servers at site 1

auto-increment-increment = 10
auto-increment-offset = 1
# for all servers at site 2

auto-increment-increment = 10
auto-increment-offset = 2
# for all servers at site 3

auto-increment-increment = 10
auto-increment-offset = 3

Important

Restart MySQL on all servers.

B.4.4. MySQL Configuration for Heterogeneous Deployments

The following are required for replication to heterogeneous targets to ensure that MySQL has been configured and generating row change information correctly:

  • MySQL must be using Row-based replication for information to be replicated to heterogenous targets. For the best results, you should change the global binary log format, ideally in the configuration file (my.cnf):

    binlog-format = row

    Alternatively, the global binlog format can be changed by executing the following statement:

    mysql> SET GLOBAL binlog-format = ROW;

    For MySQL 5.6.2 and later, you must enable full row log images:

    binlog-row-image = full

    This information will be forgotten when the MySQL server is restarted; placing the configuration in the my.cnf file will ensure this option is permanently enabled.

  • Table format should be updated to UTF8 by updating the MySQL configuration (my.cnf):

    character-set-server=utf8
    collation-server=utf8_general_ci

    Tables must also be configured as UTF8 tables, and existing tables should be updated to UTF8 support before they are replicated to prevent character set corruption issues.

  • To prevent timezone configuration storing zone adjusted values and exporting this information to the binary log and PostgreSQL, fix the timezone configuration to use UTC within the configuration file (my.cnf):

    default-time-zone='+00:00'

B.4.5. MySQL User Configuration

  • Tungsten User Login

    It is possible to use users with a lower-privilege level and without as many rights. For more information, see Section B.4.6, “MySQL Unprivileged Users”.

    The tungsten user connects to the MySQL database and applies the data from the replication stream from other datasources in the dataservice. The user must therefore be able execute any SQL statement on the server, including grants for other users. The user must have the following privileges in addition to privileges for creating, updating and deleting DDL and data within the database:

    • SUPER privilege (MySQL 5.x) is required so that the user can perform all administrative operations including setting global variables.

    • If using MySQL 8+ then the CONNECTION_ADMIN privilege should be used as well as the SUPER privilege. In future MySQL releases it is expected that SUPER will eventually be removed.

    • GRANT OPTION privilege is required so that users and grants can be updated.

    To create a user with suitable privileges:

    mysql> CREATE USER tungsten@'%' IDENTIFIED BY 'password';
    mysql> GRANT ALL ON *.* TO tungsten@'%'  WITH GRANT OPTION;

    The connection will be made from the host to the local MySQL server. You may also need to create an explicit entry for this connection. For example, on the host host1, create the user with an explicit host reference:

    mysql> CREATE USER tungsten@'host1' IDENTIFIED BY 'password';
    mysql> GRANT ALL ON *.* TO tungsten@'host1'  WITH GRANT OPTION;

    The above commands enable logins from any host using the user name/password combination. If you want to limit the configuration to only include the hosts within your cluster you must create and grant individual user/host combinations:

    mysql> CREATE USER tungsten@'client1' IDENTIFIED BY 'password';
    mysql> GRANT ALL ON *.* TO tungsten@'client1'  WITH GRANT OPTION;

    Note

    If you later change the cluster configuration and add more hosts, you will need to update this configuration with each new host in the cluster.

  • If you configure the connector to run in Proxy mode, and you issue the SHOW SLAVE STATUS command, then any user executing this statement will require the select privilege on the tracking schema table trep_commit_seqno. The following DDL can be used as an example:

    GRANT SELECT ON tungsten_<servicename>.trep_commit_seqno TO '<user>'@'<host>';

    This will need to be executed after installation, following the initial creation of the tracking schema and tables.

B.4.6. MySQL Unprivileged Users

By default, the tungsten user needs to be given SUPER privileges within MySQL so that the user can apply, create and access all the tables and data within the MySQL database. In some situations, this level of access is not available within the MySQL environment, for example, when using a server that is heavily secured, or Amazon's RDS service.

For this situation, the Tungsten Cluster can be configured to use an 'unprivileged' user configuration. This configuration does not require the SUPER privilege, but instead needs explicit privileges on the schema created by Tungsten Cluster, and on the schemas that it will update when applying events.

The capability can be enabled by using the following two options and behaviors:

  • --privileged-master=false

    When privileged_master is disabled:

    • A Primary replicator will not attempt to suppress binlog writes during operations.

    • A Primary replicator Will not issue a FLUSH LOGS command when the replicator starts.

    • The current replicator position is not updated within the trep_commit_seqno table.

    The tungsten user that connects to the database must be configured to work with the MySQL service using the following grants:

    mysql> GRANT ALL ON tungsten_alpha.* to tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT SELECT ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT REPLICATION SLAVE ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> REVOKE SUPER ON *.* FROM tungsten@'%';
  • --privileged-slave=false

    When privileged_slave is disabled:

    mysql> GRANT ALL ON tungsten_batch.* to tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT SELECT,INSERT,UPDATE ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> GRANT REPLICATION SLAVE ON *.* TO tungsten@'%' IDENTIFIED BY 'secret';
    mysql> REVOKE SUPER ON *.* FROM tungsten@'%';

    Optionally, INSERT and UPDATE privileges can be explicitly added to the user permissions for the tables/databases that will be updated during replication.