8.8.4. Manually Recovering from Another Slave

If a restore operation fails, or the dataserver has suffered a significant failure, an alternative is to seed the failed dataserver directly from an existing running slave.

For example, on the host host2, the MySQL data directory has been corrupted and mysqld will no longer start. The failure can be seen by examining the MySQL error log in /var/log/mysql/error.log:

130520 14:37:08 [Note] Recovering after a crash using /var/log/mysql/mysql-bin
130520 14:37:08 [Note] Starting crash recovery...
130520 14:37:08 [Note] Crash recovery finished.
130520 14:37:08 [Note] Server hostname (bind-address): '0.0.0.0'; port: 13306
130520 14:37:08 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
130520 14:37:08 [Note] Server socket created on IP: '0.0.0.0'.
130520 14:37:08 [ERROR] Fatal error: Can't open and lock privilege tables: Table 'mysql.host' doesn't exist
130520 14:37:08 [ERROR] /usr/sbin/mysqld: File '/var/run/mysqld/mysqld.pid' not found (Errcode: 13)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error reading file 'UNKNOWN' (Errcode: 9)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error on close of 'UNKNOWN' (Errcode: 9)
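When the error log is long, filtering on the [ERROR] tag isolates entries like the ones above. This is a minimal sketch that assumes the log location used in this example; the function name show_mysql_errors is hypothetical, and reading the log may require sudo:

```shell
# Print only the [ERROR] entries from a MySQL error log.
# The default path is the one used in this example (an assumption);
# adjust it to match the log-error setting in your my.cnf.
show_mysql_errors() {
    log="${1:-/var/log/mysql/error.log}"
    # -F matches the literal string "[ERROR]" rather than treating the
    # brackets as a regular-expression character class.
    grep -F -- '[ERROR]' "$log"
}
```

Running `sudo grep -F '[ERROR]' /var/log/mysql/error.log` directly gives the same result without defining a function.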

Performing a restore operation on this slave may not work. To recover from another running slave, host3, copy the MySQL data files directly to host2 using the following steps:

  1. Put the host2 replication service offline using trepctl:

    shell> trepctl offline
  2. Put the host3 replication service offline using trepctl:

    shell> trepctl offline
  3. Stop the mysqld service on host2:

    shell> sudo /etc/init.d/mysql stop
  4. Stop the mysqld service on host3:

    shell> sudo /etc/init.d/mysql stop
  5. Delete the contents of the MySQL data directory on host2:

    shell> sudo rm -rf /var/lib/mysql/*
  6. If necessary, temporarily make the MySQL data directory on host2 writable by the tungsten user (the permissions are tightened again in step 10):

    shell> sudo chmod 777 /var/lib/mysql
  7. On host3, use rsync to send the MySQL data files to host2:

    shell> rsync -aze ssh /var/lib/mysql/ host2:/var/lib/mysql/

    Synchronize every location that contains data, including any separate directories configured with innodb_data_home_dir or innodb_log_group_home_dir. Check the my.cnf file to confirm the correct paths.

    Once the files have been copied, they must be given the correct ownership and permissions so that the Tungsten service can read them; this is done on host2 in step 10.

  8. Start the mysqld service on host3:

    shell> sudo /etc/init.d/mysql start
  9. Put the host3 replication service online using trepctl:

    shell> trepctl online
  10. Update the ownership and permissions on the data files on host2:

    host2 shell> sudo chown -R mysql:mysql /var/lib/mysql
    host2 shell> sudo chmod 770 /var/lib/mysql
  11. Clear out the THL files on the target node, host2, so that the slave replicator service can start cleanly:

    host2 shell> thl purge
  12. Start the mysqld service on host2:

    shell> sudo /etc/init.d/mysql start
  13. Put the host2 replication service online using trepctl:

    shell> trepctl online
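The thirteen steps above can be replayed from a single control host. The script below is a sketch, not part of the Tungsten tooling, and it rests on several assumptions: passwordless ssh to both hosts as the tungsten user, passwordless sudo on each host, trepctl and thl on the remote PATH, and ssh access from the source slave to the target for the rsync step. The names TARGET, SOURCE, DATADIR, DRY_RUN, on and reseed_slave are hypothetical. Only the main data directory is copied; extend step 7 if my.cnf sets innodb_data_home_dir or innodb_log_group_home_dir.

```shell
#!/bin/sh
# Hypothetical driver for the manual reseed procedure. ASSUMPTIONS:
# passwordless ssh and sudo on both hosts, trepctl/thl on the remote PATH,
# SOURCE can ssh to TARGET, and paths contain no spaces.
TARGET="${TARGET:-host2}"          # failed slave being reseeded
SOURCE="${SOURCE:-host3}"          # healthy slave supplying the data files
DATADIR="${DATADIR:-/var/lib/mysql}"

# Run one command on a remote host, or only print it when DRY_RUN=1.
# On failure the script exits, so run it in a subshell or as its own process.
on() {
    host="$1"; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s: %s\n' "$host" "$*"
    else
        ssh "$host" "$*" || { echo "failed on $host: $*" >&2; exit 1; }
    fi
}

reseed_slave() {
    on "$TARGET" trepctl offline                                  # step 1
    on "$SOURCE" trepctl offline                                  # step 2
    on "$TARGET" sudo /etc/init.d/mysql stop                      # step 3
    on "$SOURCE" sudo /etc/init.d/mysql stop                      # step 4
    on "$TARGET" "sudo rm -rf $DATADIR/*"                         # step 5 (glob expands remotely)
    on "$TARGET" sudo chmod 777 "$DATADIR"                        # step 6
    on "$SOURCE" rsync -aze ssh "$DATADIR/" "$TARGET:$DATADIR/"   # step 7
    on "$SOURCE" sudo /etc/init.d/mysql start                     # step 8
    on "$SOURCE" trepctl online                                   # step 9
    on "$TARGET" sudo chown -R mysql:mysql "$DATADIR"             # step 10
    on "$TARGET" sudo chmod 770 "$DATADIR"
    on "$TARGET" thl purge                                        # step 11
    on "$TARGET" sudo /etc/init.d/mysql start                     # step 12
    on "$TARGET" trepctl online                                   # step 13
}
```

Saved as, say, reseed.sh, a dry run such as `( . ./reseed.sh; DRY_RUN=1; reseed_slave )` prints the command sequence for review before anything is executed. The subshell keeps the `exit` in `on` from closing an interactive shell. Note that thl purge may ask for confirmation when run interactively, so check its behavior in your installation before relying on a non-interactive run.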