It can sometimes become necessary to identify table and data differences due to unexpected behaviour or failures. There are a number of third party tools that can help identify and fix however a lot of them assume native replication is in place, the following explains the recommended methods for troubleshooting a Tungsten Environment based on MySQL as the source and target technologies.
If you suspect that there are differences to a table structure, a simple method to resolve this will be to compare schema DDL.
Extract DDL on the Primary node, specifying the schema in place of {DB}:
shell> mysqldump -u root -p --no-data -h localhost --databases {DB} >Primary.sql
Repeat the same on the Replica node:
shell> mysqldump -u root -p --no-data -h localhost --databases {DB} >Replica.sql
Now, using diff, you can compare the results
shell> diff Primary.sql Replica.sql
Using the output of diff, you can then craft the necessary SQL statements to re-align your structure
It is possible to use pt-table-checksum from the Percona Toolkit to identify data differences, providing you use the syntax described below for bypassing the native replication checks. First of all, it is advisable to familiarise yourself with the product by reading through the providers own documentation here:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
Once you are ready, ensure you install the latest version to the persona toolkit on all nodes, next execute the following on the Primary node:
shell> pt-table-checksum --set-vars innodb_lock_wait_timeout=500 \
--recursion-method=none \
--ignore-databases=mysql \
--ignore-databases-regex=tungsten* \
h=localhost,u=tungsten,p=secret
On first run, this will create a database called percona, and within that database a table called checksums. The process will gather checksum information on every table in every database excluding the mysql and tungsten related schemas. You can now execute the following SQL Statement on the Replica to identify tables with data differences:
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
This SELECT
will return any
tables that it detects are different, it won't show you the differences,
or indeed how many, this is just a basic check. To identify and fix the
changes, you could use pt-table-sync, however this
product would by default assume native replication and also try and fix
the problems for you. In a tungsten environment this would not be
recommended, however by using the
--print
switch you can gather the
SQL needed to be executed to fix the mistakes. You should run this, and
review the output to determine whether you want to manually patch the
data together or consider using
tungsten_provision_slave to retrovision a node in the
case of large quantities of differences.
To use pt-table-sync, first
identify the tables with differences on each Replica, in this example, the
SELECT
statement above
identified that there was a data difference on the departments table
within the employees database on db2. Execute the
pt-table-sync script on the
Primary, passing in the database name, table name and the Replica host that
the difference exists on:
shell> pt-table-sync --databases employees --tables departments --print h=db1,u=tungsten,p=secret,P=13306 h=db2
The first h=
option should be the
Primary, also the node you run the script from, the second
h=
option relates to the Replica that
the difference exists on. Executing the script will output SQL
statements that can be used to patch the data, for example the above
statement produces the following output:
UPDATE `employees`.`departments` SET `dept_name`='Sales' WHERE `dept_no`='d007' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:departments src_dsn:P=13306,h=db1,p=...,u=tungsten dst_db:employees dst_tbl:departments dst_dsn:P=13306,h=db2,p=...,u=tungsten lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:24524 user:tungsten host:db1*/;
The UPDATE
statements could now
be issued directly on the Replica to correct the problem.
Generally, changing data directly on a Replica is not recommended, but every environment is different. before making any changes like this always ensure you have a FULL backup, and it would be recommended to shun the Replica node (if in a clustered environment) before making any changes so as not to cause any potential interruption to connected clients