8.27. The tungsten_monitor Script

The tungsten_monitor script provides a mechanism for monitoring the cluster state when monitoring tools like Nagios aren't available. It implements the Tungsten Script Interface as well as these additional options.

tungsten_monitor [ --check-log ] [ --connector-timeout  ] [ --dataservices ] [ --diagnostic-package  ] [ --directory ] [ --disk  ] [ --elb-script ] [ --email ] [ --force  ] [ --help, -h ] [ --ignore ] [ --info, -i ] [ --json ] [ --latency  ] [ --lock-dir ] [ --lock-timeout  ] [ --mail  ] [ --max-backup-age ] [ --net-ssh-option ] [ --notice, -n ] [ --reset  ] [ --subject ] [ --validate  ] [ --verbose, -v ]

Where:

Table 8.52. tungsten_monitor Command-line Options

Option                  Description
--check-log             Email any lines in the log file that match the egrep expression. Example: --check-log=tungsten-manager/log/tmsvc.log:OFFLINE
--connector-timeout     Number of seconds to wait for a connector response
--dataservices          The list of dataservices to monitor
--diagnostic-package    Create a diagnostic package if any issues are found
--directory             The $CONTINUENT_ROOT directory to use for running this command. Defaults to the directory from which the script is run.
--disk                  Display a warning if any disk usage is above this percentage
--elb-script            The xinetd script name that is responding to ELB liveness checks
--email                 Email address to send to when mailing any notifications
--force                 Continue operation even if script validation fails
--help, -h              Show help text
--ignore                Ignore notices that use this key
--info, -i              Display info, notice, warning, and error messages
--json                  Output all messages and the return code as a JSON object
--latency               The maximum allowed latency for replicators
--lock-dir              Directory to store log and lock files in
--lock-timeout          The number of minutes to suppress a notice after sending it
--mail                  Path to the mail program to use for sending messages
--max-backup-age        Maximum age in seconds of valid backups
--net-ssh-option        Provide custom SSH options to use for communication to other hosts
--notice, -n            Display notice, warning, and error messages
--reset                 Remove all entries from the lock directory
--subject               Email subject line
--validate              Only run script validation
--verbose, -v           Verbose

General Operation

Each time tungsten_monitor runs, it performs a standard set of checks. The set is determined automatically from the current node configuration (for example, the connector-timeout check only runs if the node has a connector installed). Additional checks may be enabled with command-line options.

  • Check that all Tungsten services for this host are running

  • Check that all replication services and datasources are ONLINE

  • Check that replication latency does not exceed a specified amount

  • Check that the local connector is responsive

  • Check disk usage

An example of adding it to crontab:

shell> crontab -l
10 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor >/dev/null 2>/dev/null

All messages will be sent to /opt/continuent/share/tungsten_monitor/lastrun.log.

Note that when all tungsten_monitor checks pass, the script will not print anything to the standard output.
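
Because a clean run prints nothing, a wrapper can treat empty standard output as success. The following is a minimal sketch of that idea, not part of tungsten_monitor itself; the summarize_run helper and its messages are hypothetical names chosen for illustration, and the installation path in the usage comment is the one from the crontab example above.

```shell
# Hypothetical helper: classify a monitor run by whether it printed anything.
summarize_run() {
    # $1: text captured from the monitor's standard output
    if [ -z "$1" ]; then
        echo "all checks passed"
    else
        echo "issues reported:"
        echo "$1"
    fi
}

# Usage (not run here; adjust the path to your installation):
#   summarize_run "$(/opt/continuent/tungsten/cluster-home/bin/tungsten_monitor 2>/dev/null)"
```

The sketch relies only on the documented behaviour that passing checks produce no output; it does not inspect the script's return code.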

Sending results via email

The tungsten_monitor is able to send you an email when problems are found. It is suggested that you run the script as root so it is able to use the mail program without warnings.

Alerts are cached to prevent them from being sent multiple times and flooding your inbox. You may pass --reset to clear out the cache or --lock-timeout to adjust the amount of time this cache is kept. The default is 3 hours.

shell> crontab -l
10 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor --email=group@yourcompany.com >/dev/null 2>/dev/null
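
The options table gives --lock-timeout in minutes, while the default is stated in hours. The sketch below converts between the two; hours_to_minutes is a hypothetical helper, and the --lock-timeout=VALUE form is assumed from the = syntax used by the other examples in this section.

```shell
# Convert hours to the minutes that --lock-timeout is documented to take.
hours_to_minutes() {
    echo $(( $1 * 60 ))
}

hours_to_minutes 3    # prints 180, the stated 3-hour default expressed in minutes

# Usage (not run here; option syntax is an assumption):
#   tungsten_monitor --lock-timeout="$(hours_to_minutes 6)"   # suppress repeats for 6 hours
#   tungsten_monitor --reset                                   # clear the cached alerts
```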

Monitoring log files

The tungsten_monitor can optionally monitor log files for certain keywords. This example will alert you to any lines in trepsvc.log that include OFFLINE.

shell> tungsten_monitor --check-log=tungsten-replicator/log/trepsvc.log:OFFLINE
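
Before enabling a log check, it can help to preview which lines an expression would match. The sketch below approximates this with grep -E; preview_check_log is a hypothetical helper, the sample lines are invented and do not reflect the real trepsvc.log format, and the script's own matching may differ in detail.

```shell
# Hypothetical helper: show the lines in a log that match an extended regex.
preview_check_log() {
    # $1: log file path, $2: extended regular expression
    grep -E -- "$2" "$1"
}

# Self-contained demonstration using a throwaway file of made-up lines.
sample_log="$(mktemp)"
printf '%s\n' \
    'sample line one: state ONLINE' \
    'sample line two: state OFFLINE' \
    'sample line three: heartbeat' > "$sample_log"

preview_check_log "$sample_log" 'OFFLINE'
rm -f "$sample_log"
```

Only the second sample line contains OFFLINE, so it is the only one printed.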

Monitoring backup status

Knowing you have a recent backup is an important part of any Tungsten deployment. The tungsten_monitor script will look for the latest backup across all datasources and compare its age to the value of --max-backup-age, which is given in seconds. This example will let you know if a valid backup has not been taken in 3 days (3 x 86400 = 259200 seconds).

shell> tungsten_monitor --max-backup-age=259200
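
Since --max-backup-age takes seconds, a small conversion avoids arithmetic mistakes when thinking in days. This is a minimal sketch; days_to_seconds is a hypothetical helper, not something the script provides.

```shell
# Convert a backup age in days to the seconds expected by --max-backup-age.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 3    # prints 259200

# Usage (not run here):
#   tungsten_monitor --max-backup-age="$(days_to_seconds 3)"
```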

Compatibility

The script only works with MySQL at this time.