For replication between MySQL hosts, you must configure each MySQL database server to support the required user names and core MySQL configuration.
For MySQL extraction, Tungsten Cluster must have write access to the database so that status and progress information can be recorded correctly.
Native MySQL replication should not be running when you install Tungsten Cluster™. The replication service will be completely handled by Tungsten Cluster™, and the normal replication, management and monitoring techniques will not provide you with the information you need.
Each MySQL Server should be configured identically within the system. Although binary logging must be enabled on each host, replication should not be configured, since Tungsten Replicator will be handling that process.
The configured tungsten
user must be
able to read the MySQL configuration file (for installation) and the
binary logs. Either the tungsten
user should be a member of the appropriate group (i.e.
mysql
), or the permissions altered
accordingly.
Parsing of mysqld_multi configuration files is not currently supported. To use a mysqld_multi installation, copy the relevant portion of the configuration file to a separate file to be used during installation.
Primary Keys
Whilst it is not mandatory, it is strongly advised that all tables have Primary Keys. This will ensure replication consistency and avoid any potential risks of data drift. Without Primary Keys, when using ROW or MIXED based binlogging, the replicator will create a "sudo" primary key based on all columns of the table. This could cause degredation in performance and in some edge cases could cause replication to not apply changes to the rows that were expected.
To setup your MySQL servers, you need to do the following:
Configure your my.cnf
settings. The following changes should be made to the
[mysqld]
section of your
my.cnf
file:
By default, MySQL is configured only to listen on the localhost
address (127.0.0.1). The
bind-address
parameter
should be checked to ensure that it is either set to a valid
value, or commented to allow listening on all available network
interfaces:
#bind-address = 127.0.0.1
Specify the server id
Each server must have a unique server id:
server-id = 1
The best practice is for all servers to have a unique ID across all clusters. For example, use a numbering scheme like 0101, 0102, 0201, 0201, where the leading two digits are the cluster number and the last two digits are the node number, which allows for 99 participating clusters with 99 nodes each.
Ensure that the maximum number of open files matches the configuration of the database user. This was configured earlier at 65535 files.
open_files_limit = 65535
Enable binary logs
Tungsten Replicator operates by reading the binary logs on each machine, so logging must be enabled:
log-bin = mysql-bin
Set the sync_binlog
parameter to 1 (one).
In MySQL 5.7, the default value is 1.
The MySQL sync_binlog
parameter sets the frequency at which the binary log is flushed to
disk. A value of zero indicates that the binary log should not be
synchronized to disk, which implies that only standard operating
system flushing of writes will occur. A value greater than one
configures the binary log to be flushed only after
sync_binlog
events
have been written. This can introduce a delay into writing
information to the binary log, and therefore replication, but also
opens the system to potential data loss if the binary log has not
been flushed when a fatal system error occurs.
Setting a value of value 1 (one) will synchronize the binary log on disk after each event has been written.
sync_binlog = 1
Increase MySQL protocol packet sizes
The replicator can apply statements up to the maximum size of a single transaction, so the maximum allowed protocol packet size must be increase to support this:
max_allowed_packet = 52m
Configure InnoDB as the default storage engine
Tungsten Cluster needs to use a transaction safe storage engine to ensure the validity of the database. The InnoDB storage engine also provides automatic recovery in the event of a failure. Using MyISAM can lead to table corruption, and in the event of a switchover or failure, and inconsistent state of the database, making it difficult to recover or restart replication effectively.
InnoDB should therefore be the default storage engine for all tables, and any existing tables should be converted to InnoDB before deploying Tungsten Cluster.
default-storage-engine = InnoDB
Configure InnoDB Settings
Tungsten Replicator creates tables and must use InnoDB tables to store the status information for replication configuration and application:
The MySQL option
innodb_flush_log_at_trx_commit
configures how InnoDB writes and confirms writes to disk during a
transaction. The available values are:
A value of 0 (zero) provides the best performance, but it does so at the potential risk of losing information in the event of a system or hardware failure. For use with Tungsten Cluster™ the value should never be set to 0, otherwise the cluster health may be affected during a failure or failover scenario.
A value of 1 (one) provides the best transaction stability by ensuring that all writes to disk are flushed and committed before the transaction is returned as complete. Using this setting implies an increased disk load and so may impact the overall performance.
When using Tungsten Cluster in an Composite Active/Active, fan-in or data
critical cluster, the value of
innodb_flush_log_at_trx_commit
should be set to 1. This not only ensures that the
transactional data being stored in the cluster are safely
written to disk, this setting also ensures that the metadata
written by Tungsten Cluster™ describing the cluster and
replication status is also written to disk and therefore
available in the event of a failover or recovery situation.
A value of 2 (two) ensures that transactions are committed to disk, but data loss may occur if the disk data is not flushed from any OS or hardware-based buffering before a hardware failure, but the disk overhead is much lower and provides higher performance.
This setting must be used as a minimum for
all Tungsten Cluster™ installations,
and should be the setting for all configurations that do not
require
innodb_flush_log_at_trx_commit
set to 1
.
At a minimum
innodb_flush_log_at_trx_commit
should be set to 2
; a
warning will be generated if this value is set to zero:
innodb_flush_log_at_trx_commit = 2
MySQL configuration settings can be modified on a running cluster, providing you switch your host to maintenance mode before reconfiguring and restarting MySQL Server. See Section 7.13, “Performing Database or OS Maintenance”.
Optional configuration changes that can be made to your MySQL configuration:
InnoDB Flush Method
innodb_flush_method=O_DIRECT
The InnoDB flush method can effect the performance of writes within MySQL and the system as a whole.
O_DIRECT is generally recommended as it eliminates double-buffering of InnoDB writes through the OS page cache. Otherwise, MySQL will be contending with Tungsten and other processes for pages there — MySQL is quite active and has a lot of hot pages for indexes and the like this can result lower i/o throughput for other processes.
Tungsten particularly depends on the page cache being stable when using parallel apply. There is one thread that scans forward over the THL pages to coordinate the channels and keep them from getting too far ahead. We then depend on those pages staying in cache for a while so that all the channels can read them — as you are aware parallel apply works like a bunch of parallel table scans that are traveling like a school of sardines over the same part of the THL. If pages get kicked out again before all the channels see them, parallel replication will start to serialize as it has to wait for the OS to read them back in again. If they stay in memory on the other hand, the reads on the THL are in-memory, and fast. For more information on parallel replication, see Section 5.5, “Deploying Parallel Replication”.
Increase InnoDB log file size
The default InnoDB Redo Log file size is 48MB. This should be increased to a larger file size for performance and other reasons. Values of 512MB are common.
To change the file size, read the corresponding information in the MySQL manual for configuring the file size information. Please see both "MySQL Redo Log" and "Optimizing MySQL InnoDB Redo Logging".
Binary Logging Format
Tungsten Replicator works with both statement and row-based logging, and therefore also mixed-based logging. The chosen format is entirely up to the systems and preferences, and there are no differences or changes required for Tungsten Replicator to operate. For native MySQL to MySQL Primary/Replica replication, either format will work fine.
Depending on the exact use case and deployment, different binary log formats imply different requirements and settings. Certain deployment types and environments require different settings:
For Composite Active/Active deployments, use row-based logging. This will help to avoid data drift where statements make fractional changes to the data in place of explicit updates.
Use row-based logging for heterogeneous deployments. All deployments to Oracle, MongoDB, Vertica and others rely on row-based logging.
Use mixed replication if warnings are raised within the MySQL log indicating that statement only is transferring possibly dangerous statements.
Use statement or mixed replication for transactions that update many rows; this reduces the size of the binary log and improves the performance when the transaction are applied on the Replica.
Use row replication for transactions that have temporary tables. Temporary tables are replicated if statement or mixed based logging is in effect, and use of temporary tables can stop replication as the table is unavailable between transactions. Using row-based logging also prevents these tables entering the binary log, which means they do not clog and delay replication.
The configuration of the MySQL server can be permanently changed to use an explicit replication by modifying the configuration in the configuration file:
binlog-format = row
In MySQL 5.7, the default format is
ROW
.
For temporary changes during execution of explicit statements, the binlog format can be changed by executing the following statement:
mysql> SET binlog-format = ROW;
innodb_stats_on_metadata=0
Although optional, we would highly recommend setting this property as it has been shown to improve performance by preventing statistics updates every time the information_schema is queried.
You must restart MySQL after any changes have been made.
Ensure the tungsten
user can
access the MySQL binary logs by either opening up the directory
permissions, or adding the
tungsten
user to the group owner
for the directory.
If you are inserting to the same table at the same time at two or more different sites, and using bi-directional or active/active replication, then special care must be taken to avoid primary key conflicts. Either the auto-increment keys on each need to be offset so they do not conflict, or the application needs to be be able to generate unique keys taking multiple sites into account.
The following configuration is required if your application is relying upon the MySQL-native auto-increment primary key feature:
Use the auto-increment-increment
and
auto-increment-offset
variables to affect the way that
MySQL generates the next value in an auto-increment field.
For example, edit my.cnf
on all
servers:
# for all servers at site 1
auto-increment-increment = 10
auto-increment-offset = 1
# for all servers at site 2
auto-increment-increment = 10
auto-increment-offset = 2
# for all servers at site 3
auto-increment-increment = 10
auto-increment-offset = 3
Restart MySQL on all servers.
The following are required for replication to heterogeneous targets to ensure that MySQL has been configured and generating row change information correctly:
MySQL must be using Row-based replication for information to be
replicated to heterogenous targets. For the best results, you should
change the global binary log format, ideally in the configuration file
(my.cnf
):
binlog-format = row
Alternatively, the global binlog format can be changed by executing the following statement:
mysql> SET GLOBAL binlog-format = ROW;
For MySQL 5.6.2 and later, you must enable full row log images:
binlog-row-image = full
This information will be forgotten when the MySQL server is restarted;
placing the configuration in the
my.cnf
file will ensure this
option is permanently enabled.
Table format should be updated to UTF8 by updating the MySQL
configuration (my.cnf
):
character-set-server=utf8
collation-server=utf8_general_ci
Tables must also be configured as UTF8 tables, and existing tables should be updated to UTF8 support before they are replicated to prevent character set corruption issues.
To prevent timezone configuration storing zone adjusted values and
exporting this information to the binary log and PostgreSQL, fix the
timezone configuration to use UTC within the configuration file
(my.cnf
):
default-time-zone='+00:00'
Tungsten User Login
It is possible to use users with a lower-privilege level and without as many rights. For more information, see Section B.4.6, “MySQL Unprivileged Users”.
The tungsten
user connects to
the MySQL database and applies the data from the replication stream
from other datasources in the dataservice. The user must therefore be
able execute any SQL statement on the server, including grants for
other users. The user must have the following
privileges in addition to privileges for creating, updating and
deleting DDL and data within the database:
SUPER
privilege (MySQL 5.x) is
required so that the user can perform all administrative
operations including setting global variables.
If using MySQL 8+ then the CONNECTION_ADMIN
privilege
should be used as well as the SUPER
privilege. In future
MySQL releases it is expected that SUPER
will eventually be removed.
GRANT OPTION
privilege is
required so that users and grants can be updated.
To create a user with suitable privileges:
mysql>CREATE USER tungsten@'%' IDENTIFIED BY 'password';
mysql>GRANT ALL ON *.* TO tungsten@'%' WITH GRANT OPTION;
The connection will be made from the host to the local MySQL server.
You may also need to create an explicit entry for this connection. For
example, on the host host1
,
create the user with an explicit host reference:
mysql>CREATE USER tungsten@'host1' IDENTIFIED BY 'password';
mysql>GRANT ALL ON *.* TO tungsten@'host1' WITH GRANT OPTION;
The above commands enable logins from any host using the user name/password combination. If you want to limit the configuration to only include the hosts within your cluster you must create and grant individual user/host combinations:
mysql>CREATE USER tungsten@'client1' IDENTIFIED BY 'password';
mysql>GRANT ALL ON *.* TO tungsten@'client1' WITH GRANT OPTION;
If you later change the cluster configuration and add more hosts, you will need to update this configuration with each new host in the cluster.
If you configure the connector to run in Proxy mode, and you issue the SHOW SLAVE STATUS
command, then any
user executing this statement will require the select privilege on
the tracking schema table trep_commit_seqno
. The following DDL can be used as an example:
GRANT SELECT ON tungsten_<servicename>
.trep_commit_seqno TO '<user>
'@'<host>
';
This will need to be executed after installation, following the initial creation of the tracking schema and tables.
By default, the tungsten
user needs
to be given SUPER
privileges
within MySQL so that the user can apply, create and access all the tables
and data within the MySQL database. In some situations, this level of
access is not available within the MySQL environment, for example, when
using a server that is heavily secured, or Amazon's RDS service.
For this situation, the Tungsten Cluster can be configured to use an
'unprivileged' user configuration. This configuration does not require the
SUPER
privilege, but instead needs
explicit privileges on the schema created by Tungsten Cluster, and on the
schemas that it will update when applying events.
The capability can be enabled by using the following two options and behaviors:
When privileged_master is disabled:
A Primary replicator will not attempt to suppress binlog writes during operations.
A Primary replicator Will not issue a
FLUSH LOGS
command when
the replicator starts.
The current replicator position is not updated within the
trep_commit_seqno
table.
The tungsten
user that connects
to the database must be configured to work with the MySQL service
using the following grants:
mysql>GRANT ALL ON
mysql>tungsten_alpha
.* totungsten
@'%' IDENTIFIED BY 'secret';GRANT SELECT ON *.* TO
mysql>tungsten
@'%' IDENTIFIED BY 'secret';GRANT REPLICATION SLAVE ON *.* TO
mysql>tungsten
@'%' IDENTIFIED BY 'secret';REVOKE SUPER ON *.* FROM
tungsten
@'%';
When privileged_slave is disabled:
The current replicator position is not updated within the
trep_commit_seqno
table.
mysql>GRANT ALL ON
mysql>tungsten_batch
.* totungsten
@'%' IDENTIFIED BY 'secret';GRANT SELECT,INSERT,UPDATE ON *.* TO
mysql>tungsten
@'%' IDENTIFIED BY 'secret';GRANT REPLICATION SLAVE ON *.* TO
mysql>tungsten
@'%' IDENTIFIED BY 'secret';REVOKE SUPER ON *.* FROM
tungsten
@'%';
Optionally, INSERT
and
UPDATE
privileges can be
explicitly added to the user permissions for the tables/databases that
will be updated during replication.