The tungsten_monitor script provides a mechanism for monitoring the cluster state when monitoring tools like Nagios aren't available. It implements the Tungsten Script Interface as well as these additional options.
tungsten_monitor [ --check-log ] [ --connector-timeout ] [ --dataservices ] [ --diagnostic-package ] [ --directory ] [ --disk ] [ --elb-script ] [ --email ] [ --force ] [ --help, -h ] [ --ignore ] [ --info, -i ] [ --json ] [ --latency ] [ --lock-dir ] [ --lock-timeout ] [ --mail ] [ --max-backup-age ] [ --net-ssh-option ] [ --notice, -n ] [ --reset ] [ --subject ] [ --validate ] [ --verbose, -v ]
Where:
Table 9.63. tungsten_monitor Command-line Options
Option | Description |
---|---|
--check-log | Email any lines in the log file that match the egrep expression. Example: --check-log=tungsten-manager/log/tmsvc.log:OFFLINE |
--connector-timeout | Number of seconds to wait for a connector response |
--dataservices | The list of dataservices to monitor |
--diagnostic-package | Create a diagnostic package if any issues are found |
--directory | The $CONTINUENT_ROOT directory to use for running this command. It will default to the directory you use to run the script. |
--disk | Display a warning if any disk usage is above this percentage |
--elb-script | The xinetd script name that is responding to ELB liveness checks |
--email | Email address to send to when mailing any notifications |
--force | Continue operation even if script validation fails |
--help, -h | Show help text |
--ignore | Ignore notices that use this key |
--info, -i | Display info, notice, warning, and error messages |
--json | Output all messages and the return code as a JSON object |
--latency | The maximum allowed latency for replicators |
--lock-dir | Directory to store log and lock files in |
--lock-timeout | The number of minutes to suppress a notice after sending it |
--mail | Path to the mail program to use for sending messages |
--max-backup-age | Maximum age in seconds of valid backups |
--net-ssh-option | Provide custom SSH options to use for communication to other hosts. |
--notice, -n | Display notice, warning, and error messages |
--reset | Remove all entries from the lock directory |
--subject | Email subject line |
--validate | Only run script validation |
--verbose, -v | Verbose |
General Operation
Each time tungsten_monitor runs, it performs a standard set of checks. The set of checks is determined automatically from the current node configuration (for example, the connector-timeout check only runs if the node has a connector installed). Additional checks may be turned on using command-line options. The standard checks are:
Check that all Tungsten services for this host are running
Check that all replication services and datasources are ONLINE
Check that replication latency does not exceed a specified amount
Check that the local connector is responsive
Check disk usage
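As a sketch of how thresholds for the disk and latency checks might be supplied, the following builds a command line from the --disk and --latency options in the table above. The threshold values are illustrative only, the latency unit is assumed to be seconds, and the install path is the one used in the crontab example on this page; adjust all three for your environment.

```shell
# Path taken from the crontab example; adjust for your installation.
MONITOR=/opt/continuent/tungsten/cluster-home/bin/tungsten_monitor

# Illustrative thresholds, not recommended values.
DISK_PCT=80      # warn when any disk is more than 80% full
MAX_LATENCY=60   # assumed to be seconds; confirm with --help

MONITOR_ARGS="--disk=$DISK_PCT --latency=$MAX_LATENCY"

if [ -x "$MONITOR" ]; then
    # Word splitting of $MONITOR_ARGS is intentional here.
    "$MONITOR" $MONITOR_ARGS
else
    # Print the command instead when the script is not installed.
    echo "would run: $MONITOR $MONITOR_ARGS"
fi
```

Keeping the thresholds in variables makes it easier to reuse the same values in a crontab entry and in manual runs.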
An example of adding it to crontab:
shell> crontab -l
10 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor >/dev/null 2>/dev/null
All messages will be sent to /opt/continuent/share/tungsten_monitor/lastrun.log.
Note that when all tungsten_monitor checks pass, the script will not print anything to the standard output.
Sending results via email
The tungsten_monitor is able to send you an email when problems are found. It is suggested that you run the script as root so it is able to use the mail program without warnings.
Alerts are cached to prevent them from being sent multiple times and flooding your inbox. You may pass --reset to clear out the cache or --lock-timeout to adjust the amount of time this cache is kept. The default is 3 hours.
shell> crontab -l
10 * * * * /opt/continuent/tungsten/cluster-home/bin/tungsten_monitor --from=you@yourcompany.com \
--to=group@yourcompany.com >/dev/null 2>/dev/null
Monitoring log files
The tungsten_monitor can optionally monitor log files for certain keywords. This example will alert you to any lines in trepsvc.log that include OFFLINE.
shell> tungsten_monitor --check-log=tungsten-replicator/log/trepsvc.log:OFFLINE
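To preview which lines an expression would select before adding it to the monitor, the match can be approximated by hand. The sketch below assumes the --check-log value is split into path and expression at the first colon, and it runs against made-up sample lines in a temporary file; neither the splitting rule nor the log line format is confirmed by this page.

```shell
# Value in the PATH:EXPRESSION form used by --check-log.
spec='tungsten-replicator/log/trepsvc.log:OFFLINE'
log_path=${spec%%:*}   # text before the first colon (assumed split rule)
pattern=${spec#*:}     # text after the first colon

# Made-up log lines for illustration; real trepsvc.log entries differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
INFO  sample line: state changed to ONLINE
WARN  sample line: state changed to OFFLINE:ERROR
INFO  sample line: heartbeat received
EOF

# Select the lines that match the extended regular expression.
matches=$(grep -E "$pattern" "$sample")
echo "path=$log_path pattern=$pattern"
echo "$matches"
rm -f "$sample"
```

Pointing grep at the real log file instead of the sample shows how noisy an expression would be in practice.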
Monitoring backup status
Knowing you have a recent backup is an important part of any Tungsten deployment. The tungsten_monitor will look for the latest backup across all datasources and compare its age to the value of --max-backup-age. This example will let you know if a valid backup has not been taken in 3 days.
shell> tungsten_monitor --max-backup-age=259200
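Because --max-backup-age is given in seconds, it can help to derive the value from a number of days instead of hard-coding it. A minimal sketch, where the variable names are arbitrary:

```shell
# Convert a backup age limit in days to the seconds --max-backup-age expects.
days=3
max_backup_age=$((days * 24 * 60 * 60))

# Print the option as it would be passed to tungsten_monitor.
echo "--max-backup-age=$max_backup_age"
```

For 3 days this yields 259200, the value used in the example above.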
Compatibility
The script only works with MySQL at this time.