5.10.4. Manually Recovering from Another Slave

If a restore operation fails, or the dataserver has suffered a significant failure, an alternative option is to seed the failed dataserver directly from an existing running slave.

For example, on the host host2, the data directory for MySQL has been corrupted, and mysqld will no longer start. This can be seen by examining the MySQL error log in /var/log/mysql/error.log:

130520 14:37:08 [Note] Recovering after a crash using /var/log/mysql/mysql-bin
130520 14:37:08 [Note] Starting crash recovery...
130520 14:37:08 [Note] Crash recovery finished.
130520 14:37:08 [Note] Server hostname (bind-address): '0.0.0.0'; port: 13306
130520 14:37:08 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
130520 14:37:08 [Note] Server socket created on IP: '0.0.0.0'.
130520 14:37:08 [ERROR] Fatal error: Can't open and lock privilege tables: Table 'mysql.host' doesn't exist
130520 14:37:08 [ERROR] /usr/sbin/mysqld: File '/var/run/mysqld/mysqld.pid' not found (Errcode: 13)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error reading file 'UNKNOWN' (Errcode: 9)
130520 14:37:08 [ERROR] /usr/sbin/mysqld: Error on close of 'UNKNOWN' (Errcode: 9)

Performing a restore operation on this slave may not work. To recover using another running slave, host3, the MySQL data files can be copied over to host2 directly using the following steps:

  1. Shun the host2 datasource to be restored, and put the replicator service offline using cctrl:

    [LOGICAL] /alpha > datasource host2 shun
    [LOGICAL] /alpha > replicator host2 offline
  2. Shun the host3 datasource that will be used as the source for the copy, and put its replicator service offline using cctrl:

    [LOGICAL] /alpha > datasource host3 shun
    [LOGICAL] /alpha > replicator host3 offline
  3. Stop the mysqld service on host2:

    shell> sudo /etc/init.d/mysql stop
  4. Stop the mysqld service on host3:

    shell> sudo /etc/init.d/mysql stop
  5. Delete the mysqld data directory on host2:

    shell> sudo rm -rf /var/lib/mysql/*
  6. If necessary, ensure the tungsten user can write to the MySQL directory:

    shell> sudo chmod 777 /var/lib/mysql
  7. Use rsync on host3 to send the data files for MySQL to host2:

    shell> rsync -aze ssh /var/lib/mysql/* host2:/var/lib/mysql/

    You should synchronize all locations that contain data. This includes additional directories configured by options such as innodb_data_home_dir or innodb_log_group_home_dir. Check the my.cnf file to ensure you have the correct paths.

    Once the files have been copied, update their ownership and permissions so that the Tungsten service can read them.

  8. Recover host3 back to the dataservice:

    [LOGICAL:EXPERT] /alpha > datasource host3 recover
  9. Update the ownership and permissions on the data files on host2:

    host2 shell> sudo chown -R mysql:mysql /var/lib/mysql
    host2 shell> sudo chmod 770 /var/lib/mysql
  10. Clear out the THL files on the target node host2 so the slave replicator service may start cleanly:

    host2 shell> thl purge
  11. Recover host2 back to the dataservice:

    [LOGICAL:EXPERT] /alpha > datasource host2 recover

    The recover command will start MySQL and ensure that the server is accessible before restarting replication. If the MySQL instance does not start, correct any issues and attempt the recover command again.
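
The file-copy portion of the procedure (steps 3 through 9) can be sketched as a single script run from host3. The hostnames, paths, and init script location here are assumptions taken from the example above, and should be adjusted to match your my.cnf and deployment; the cctrl shun/recover steps and the thl purge on host2 remain manual. With DRY_RUN=1 (the default in this sketch) the script only prints each command so the sequence can be reviewed before touching either dataserver:

```shell
#!/bin/bash
# Hypothetical sketch of the file-copy portion of the recovery, run on host3.
# TARGET and DATADIR are assumptions from the example above; adjust to your
# environment. The cctrl shun/recover steps and 'thl purge' on host2 are
# still performed manually.
set -e

TARGET=host2
DATADIR=/var/lib/mysql
DRY_RUN="${DRY_RUN:-1}"   # default: print commands only; set DRY_RUN=0 to execute

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"          # dry run: show the command instead of executing it
    else
        "$@"
    fi
}

run sudo /etc/init.d/mysql stop                        # stop mysqld on host3
run ssh "$TARGET" sudo /etc/init.d/mysql stop          # stop mysqld on host2
run ssh "$TARGET" sudo rm -rf "$DATADIR/*"             # clear the host2 data directory
run rsync -aze ssh "$DATADIR/" "$TARGET:$DATADIR/"     # copy the data files across
run ssh "$TARGET" sudo chown -R mysql:mysql "$DATADIR" # restore ownership on host2
run ssh "$TARGET" sudo chmod 770 "$DATADIR"            # restore permissions on host2
```

Note that this sketch copies only the main data directory; any additional locations referenced in my.cnf (for example, innodb_data_home_dir or innodb_log_group_home_dir) must be synchronized separately.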