Manually Recovering from Another Replica
If a restore operation fails, or the dataserver has suffered a significant failure, an alternative is to seed the failed dataserver directly from an existing, running Replica.
For example, on host host2 the MySQL data directory has been corrupted and mysqld will no longer start. This can be seen by examining the MySQL error log, /var/log/mysql/error.log:
130520 14:37:08 [Note] Recovering after a crash using /var/log/mysql/mysql-bin
130520 14:37:08 [Note] Starting crash recovery...
130520 14:37:08 [Note] Crash recovery finished.
130520 14:37:08 [Note] Server hostname (bind-address): '0.0.0.0'; port: 13306
130520 14:37:08 [Note] - '0.0.0.0' resolves to '0.0.0.0';
130520 14:37:08 [Note] Server socket created on IP: '0.0.0.0'.
130520 14:37:08 [ERROR] Fatal error: Can't open and lock privilege tables: Table 'mysql.host' doesn't exist
130520 14:37:08 [ERROR] /usr/sbin/mysqld: File '/var/run/mysqld/mysqld.pid' not found (Errcode: 13)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error reading file 'UNKNOWN' (Errcode: 9)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error on close of 'UNKNOWN' (Errcode: 9)
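In a long error log, it can help to pull out only the error entries before deciding how to recover. The following is a minimal sketch in POSIX shell, not a Tungsten tool: the function names mysql_log_errors and mysql_privilege_tables_missing are hypothetical, and the code assumes the [ERROR] tag format and the default log path shown in the excerpt above.

```shell
# Hypothetical helpers for reading a MySQL error log. They assume the
# "[ERROR]" tag format and the default path shown in the excerpt above.

# Print only the [ERROR] entries from the log.
mysql_log_errors() {
    grep -F '[ERROR]' "${1:-/var/log/mysql/error.log}"
}

# Exit 0 if the log reports that the privilege tables could not be
# opened, as in the excerpt above, which indicates that the mysql
# system tables are missing or damaged.
mysql_privilege_tables_missing() {
    grep -qF "Can't open and lock privilege tables" "${1:-/var/log/mysql/error.log}"
}
```

For example, running mysql_log_errors /var/log/mysql/error.log on host2 would list the [ERROR] lines shown above while skipping the [Note] entries.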
Performing a restore operation on this Replica may not work. Instead, the MySQL data files can be copied directly to host2 from another running Replica, host3, using the following steps:
1. Shun the host2 datasource to be restored, and put its replicator service offline using cctrl:

    [LOGICAL] /alpha > datasource host2 shun
    [LOGICAL] /alpha > replicator host2 offline

2. Shun the host3 datasource, which will supply the data files, and put its replicator service offline using cctrl:

    [LOGICAL] /alpha > datasource host3 shun
    [LOGICAL] /alpha > replicator host3 offline

3. Stop the mysqld service on host2:

    shell> sudo /etc/init.d/mysql stop

4. Stop the mysqld service on host3:

    shell> sudo /etc/init.d/mysql stop

5. Delete the contents of the mysqld data directory on host2:

    shell> sudo rm -rf /var/lib/mysql/*

6. If necessary, ensure the tungsten user can write to the MySQL directory:

    shell> sudo chmod 777 /var/lib/mysql

7. Use rsync on host3 to send the MySQL data files to host2:

    shell> rsync -aze ssh /var/lib/mysql/* host2:/var/lib/mysql/

    You should synchronize all locations that contain data, including additional directories such as innodb_data_home_dir or innodb_log_group_home_dir. Check the my.cnf file to ensure you have the correct paths.

    Once the files have been copied, they must be given the correct ownership and permissions so that the Tungsten service can read them (see step 9).

8. Recover host3 back to the dataservice:

    [LOGICAL:EXPERT] /alpha > datasource host3 recover

9. Update the ownership and permissions on the data files on host2:

    host2 shell> sudo chown -R mysql:mysql /var/lib/mysql
    host2 shell> sudo chmod 770 /var/lib/mysql

10. Clear out the THL files on the target node, host2, so that the Replica replicator service can start cleanly:

    host2 shell> thl purge

11. Recover host2 back to the dataservice:

    [LOGICAL:EXPERT] /alpha > datasource host2 recover

    The datasource recover command will start MySQL and ensure that the server is accessible before restarting replication. If the MySQL instance does not start, correct any issues and run the datasource recover command again.
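The host-level copy, ownership, and THL-purge commands from the procedure above can be wrapped in a few shell functions, which also makes it easier to copy every data location listed in my.cnf rather than only /var/lib/mysql. This is a minimal sketch, not part of Tungsten: the function names (run, mycnf_data_dirs, push_data_dirs, prepare_target), the DRY_RUN variable, and the default /etc/mysql/my.cnf path are all assumptions. The cctrl steps and the destructive rm -rf are deliberately left as manual steps.

```shell
# Minimal sketch with hypothetical helpers (not part of Tungsten).
# Set DRY_RUN=1 to print the commands instead of running them.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the data locations named in a my.cnf file: datadir,
# innodb_data_home_dir and innodb_log_group_home_dir (dash or underscore
# spelling). Only simple unquoted "key = value" lines are recognized;
# option groups and !include directives are not followed.
# The default path /etc/mysql/my.cnf is an assumption.
mycnf_data_dirs() {
    sed -n -E 's/^[[:space:]]*(datadir|innodb[-_]data[-_]home[-_]dir|innodb[-_]log[-_]group[-_]home[-_]dir)[[:space:]]*=[[:space:]]*([^[:space:]#]+).*/\2/p' \
        "${1:-/etc/mysql/my.cnf}" | sort -u
}

# Run on the source Replica (host3): push every data location to the
# same path on the target host.
push_data_dirs() {
    target="$1"
    cnf="${2:-/etc/mysql/my.cnf}"
    mycnf_data_dirs "$cnf" | while IFS= read -r dir; do
        # The trailing slash copies the directory's contents. stdin is
        # redirected so ssh cannot consume the remaining directory list.
        run rsync -aze ssh "$dir/" "$target:$dir/" < /dev/null
    done
}

# Run on the target (host2) once the copy has finished: restore ownership
# and permissions, then clear the THL so the replicator starts cleanly.
# thl purge may ask for confirmation before deleting anything.
prepare_target() {
    datadir="${1:-/var/lib/mysql}"
    run sudo chown -R mysql:mysql "$datadir"
    run sudo chmod 770 "$datadir"
    run thl purge
}
```

For example, running DRY_RUN=1 push_data_dirs host2 on host3 prints one rsync command per configured data location without copying anything; unset DRY_RUN to perform the copy. Copying each directory with a trailing slash transfers its contents, including any hidden files that the /var/lib/mysql/* glob would skip. Because mycnf_data_dirs handles only simple lines, check its output against the actual configuration before relying on it.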