Continuent has introduced basic support for using Prometheus and Grafana to monitor Tungsten nodes. As of Tungsten software v6.1.4, five key Prometheus exporters have been added to the distribution.
The exporters will allow a Prometheus server to gather metrics for:
The underlying node "hardware" using an external node_exporter binary added to the distribution.
The running MySQL server using an external mysqld_exporter binary added to the distribution.
The Tungsten Replicator using new built-in functionality in the replicator binary.
The Tungsten Manager using new built-in functionality in the manager binary.
The Tungsten Connector using new built-in functionality in the connector binary.
A new command, tmonitor, has been included to assist with the management and testing of the exporters.
To manually test any exporter, get the URL http://{theHost}:{thePort}/metrics either via the curl command or a browser.
shell> curl -s 'http://localhost:9100/metrics' | wc -l
916
shell> curl -s 'http://localhost:9100/metrics' | head -10
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.0129e-05
go_gc_duration_seconds{quantile="0.25"} 1.3347e-05
go_gc_duration_seconds{quantile="0.5"} 1.8895e-05
go_gc_duration_seconds{quantile="0.75"} 3.095e-05
go_gc_duration_seconds{quantile="1"} 0.00104028
go_gc_duration_seconds_sum 10.43582891
go_gc_duration_seconds_count 173258
# HELP go_goroutines Number of goroutines that currently exist.
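To make the manual check repeatable, the following minimal sketch wraps it in two small shell functions. The names `metrics_url` and `count_samples` are illustrative helpers and are not part of the Tungsten distribution.

```shell
# Build the metrics URL for a host and port (helper name is illustrative).
metrics_url() {
    printf 'http://%s:%s/metrics\n' "$1" "$2"
}

# Count sample lines in Prometheus text output read from stdin,
# skipping "# HELP" / "# TYPE" comment lines and blank lines.
count_samples() {
    grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage against a live exporter (requires curl and a running exporter):
#   curl -s "$(metrics_url localhost 9100)" | count_samples
```

Unlike a plain `wc -l`, this counts only the metric samples, so the number will be lower than the raw line count shown above.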
The table below describes the exporters and the default ports they listen on.
Exporter | Port | Description | Scope |
---|---|---|---|
node | 9100 | Metrics for the underlying node "hardware" | External |
mysql | 9104 | Metrics for the MySQL server | External |
replicator | 8091 | Metrics for the Tungsten Replicator | Internal |
manager | 8092 | Metrics for the Tungsten Manager | Internal |
connector | 8093 | Metrics for the Tungsten Connector | Internal |
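The default ports in the table can be captured in a small lookup function for use in scripts. This is a sketch only: `exporter_port` and `list_metrics_urls` are hypothetical helper names, and the ports are the defaults listed above, so they will be wrong on any node where the ports have been changed.

```shell
# Map an exporter name to its default port, per the table above.
exporter_port() {
    case "$1" in
        node)       echo 9100 ;;
        mysql)      echo 9104 ;;
        replicator) echo 8091 ;;
        manager)    echo 8092 ;;
        connector)  echo 8093 ;;
        *)          echo "unknown exporter: $1" >&2; return 1 ;;
    esac
}

# Print "name url" for every exporter on a host (default: localhost).
list_metrics_urls() {
    host=${1:-localhost}
    for name in node mysql replicator manager connector; do
        printf '%s http://%s:%s/metrics\n' "$name" "$host" "$(exporter_port "$name")"
    done
}
```

For example, `list_metrics_urls db1` prints five lines, one per exporter, which can be fed to curl in a loop.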
If the default ports (8091, 8092 and 8093) conflict with other services on the node, they can be changed.
Add one line per component to /etc/tungsten/tungsten.ini under the [defaults] section:
shell> vi /etc/tungsten/tungsten.ini
[defaults]
property=replicator.prometheus.exporter.port=28091
property=manager.prometheus.exporter.port=28092
property=connector.prometheus.exporter.port=28093
…
Either SHUN the individual nodes one at a time and WELCOME each one after running the update, or set policy to MAINTENANCE via cctrl and update all nodes, then set the policy back to AUTOMATIC when all nodes have been completed.
Inform the running processes of the changed configuration:
shell> tpm update
You may need to restart the Manager, Replicator or Connector, depending on what was changed.
Verify that the port is listening:
shell> sudo netstat -pan | grep 28091
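When scripting the verification step, it can help to read the configured port back from the INI file rather than hard-coding it. The sketch below assumes the override is written exactly in the `property={component}.prometheus.exporter.port={port}` form shown above; `configured_exporter_port` is a hypothetical helper name.

```shell
# Print the exporter port configured for a component in a tungsten.ini-style
# file, or nothing if the file contains no override for that component.
# $1 = component (replicator|manager|connector), $2 = path to the INI file
configured_exporter_port() {
    sed -n "s/^[[:space:]]*property=$1\.prometheus\.exporter\.port=\([0-9][0-9]*\).*/\1/p" "$2" | tail -n 1
}

# Usage on a configured node (falls back to the default replicator port):
#   port=$(configured_exporter_port replicator /etc/tungsten/tungsten.ini)
#   curl -s "http://localhost:${port:-8091}/metrics" | head -3
```

If the same property appears more than once, the sketch reports the last occurrence; whether tpm resolves duplicates the same way is an assumption worth checking.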
The Prometheus exporters are all enabled by default. It is simple to disable them using the following procedure.
Add one line per component to /etc/tungsten/tungsten.ini under the [defaults] section:
shell> vi /etc/tungsten/tungsten.ini
[defaults]
property=manager.prometheus.exporter.enabled=false
property=replicator.prometheus.exporter.enabled=false
property=connector.prometheus.exporter.enabled=false
…
Either SHUN the individual nodes one at a time and WELCOME each one after running the update, or set policy to MAINTENANCE via cctrl and update all nodes, then set the policy back to AUTOMATIC when all nodes have been completed.
Inform the running processes of the changed configuration:
shell> tpm update
You may need to restart the Manager, Replicator or Connector, depending on what was changed.
Verify that the port is no longer listening:
shell> sudo netstat -pan | grep 8091
tmonitor is a simple tool for the management and testing of Prometheus exporters.
Exporters that require an external binary to function (i.e. node_exporter and mysqld_exporter) are considered to have an External Scope.
Exporters that do not require an external binary to function (i.e. manager, replicator and connector) are considered to have an Internal Scope.
By default, tmonitor {action} will act upon all available exporters.
If any exporter is specified on the CLI, then only those listed on the CLI will be acted upon.
The curl command must be available in the PATH for the status and test actions to function.
tmonitor status will use curl to test the exporter on localhost.
tmonitor test will use curl to fetch and print the metrics from one or more exporters on localhost.
tmonitor install will configure the specified exporter (or all external exporters if none is specified) to start at boot.
tmonitor remove will stop the specified exporter (or all external exporters if none is specified) from starting at boot.
Both the install and remove actions will attempt to auto-detect the boot sub-system. Currently, init.d and systemd are supported.
shell> tmonitor help
...
>>> Usage:
tmonitor [args] {action}
= Actions available for all exporter scopes:
status - validate the service via curl (short output)
test - validate the service via curl (full output)
= Actions available for External-scope exporters only:
start - launch the exporter process
stop - kill the exporter process
install - configure the exporter to start at boot
remove - stop the exporter from starting at boot
>>> Arguments:
[-h|--help]
[-v|--verbose]
[--force] Required for certain MySQL-specific operations
[-f|--filter {string}] Limit the `tmonitor test` output based on this string match
[-t|--tungsten] Set the `tmonitor test` filter to 'tungsten_'
[-i|--internal] Only act upon exporters with an internal scope
[-e|-x|--external] Only act upon exporters with an external scope
--internal and --external may not be specified together.
= Internal Scope Exporters:
[-C|--connector] Specify the Tungsten Connector exporter
[-M|--manager] Specify the Tungsten Manager exporter
[-R|--replicator] Specify the Tungsten Replicator exporter
= External Scope Exporters:
[-m|--mysql|--mysqld] Specify the MySQL exporter
[-n|--node] Specify the Node exporter
Example: View the status of all exporters:
shell> tmonitor status
Tungsten Connector exporter running ok on port 8093
Tungsten Manager exporter running ok on port 8092
MySQL exporter running ok on port 9104
Node exporter running ok on port 9100
Tungsten Replicator exporter running ok on port 8091
All 5 exporters are running ok (Up: Tungsten Connector, Tungsten Manager, MySQL, Node, Tungsten Replicator)
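The status output is easy to post-process in scripts. The sketch below extracts the port and name of every exporter reported as running; it relies only on the "running ok on port N" line format shown above, and `running_exporters` is a hypothetical helper name. The format of failure lines is not shown in this document, so the sketch deliberately ignores anything else.

```shell
# Read `tmonitor status` output on stdin and print "port name" for each
# exporter reported as "running ok on port N". Other lines are ignored.
running_exporters() {
    sed -n 's/^\(.*\) exporter running ok on port \([0-9][0-9]*\)$/\2 \1/p'
}

# Usage on a node with tmonitor in the PATH:
#   tmonitor status | running_exporters | sort -n
```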
Example: Start all exporters:
shell> tmonitor start
Node exporter started successfully on port 9100.
MySQL exporter started successfully on port 9104.
shell> tmonitor start
The Node exporter is already running
The MySQL exporter is already running
Example: Test all exporters:
shell> tmonitor test | wc -l
3097
shell> tmonitor test | grep '== '
== Metrics for the connector exporter:
== Metrics for the manager exporter:
== Metrics for the mysql exporter:
== Metrics for the node exporter:
== Metrics for the replicator exporter:
shell> tmonitor test | less
Example: Stop all exporters:
shell> tmonitor stop
All exporters stopped
Example: View the status of all exporters filtered by scope:
shell> tmonitor status -i
Tungsten Connector exporter running ok on port 8093
Tungsten Manager exporter running ok on port 8092
Tungsten Replicator exporter running ok on port 8091
All 3 internal exporters are running ok (Up: Tungsten Connector, Tungsten Manager, Tungsten Replicator)
shell> tmonitor status -x
MySQL exporter running ok on port 9104
Node exporter running ok on port 9100
All 2 external exporters are running ok (Up: MySQL, Node)
Example: Install init.d boot scripts for all external exporters:
shell> tmonitor install
ERROR: You must be root to install or remove boot services.
Please be sure to run tmonitor as root via sudo, for example: sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install
shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install
node_exporter init.d boot script installed and activated
Use either `sudo service node_exporter start` or `tmonitor --node start` now to start the Node exporter.
mysqld_exporter init.d boot script installed and activated
Use either `sudo service mysqld_exporter start` or `tmonitor --mysql start` now to start the MySQL exporter.
Example: Install systemd boot scripts for all external exporters:
shell> tmonitor install
ERROR: You must be root to install or remove boot services.
Please be sure to run tmonitor as root via sudo, for example: sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install
shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor install
Created symlink from /etc/systemd/system/multi-user.target.wants/node_exporter.service to /etc/systemd/system/node_exporter.service.
node_exporter systemd boot script installed and enabled
Use either `sudo systemctl start node_exporter` or `tmonitor --node start` now to start the Node exporter.
Created symlink from /etc/systemd/system/multi-user.target.wants/mysqld_exporter.service to /etc/systemd/system/mysqld_exporter.service.
mysqld_exporter systemd boot script installed and enabled
Use either `sudo systemctl start mysqld_exporter` or `tmonitor --mysql start` now to start the MySQL exporter.
Example: Remove init.d boot scripts for all external exporters:
shell> tmonitor remove
ERROR: You must be root to install or remove boot services.
Please be sure to run tmonitor as root via sudo, for example: sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
node_exporter is still running, unable to remove.
Please run either `tmonitor --node stop` or `sudo service node_exporter stop`, then retry this operation...
shell> tmonitor stop
Node exporter stopped
MySQL exporter stopped
shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
node_exporter init.d boot script de-activated and removed
mysqld_exporter init.d boot script de-activated and removed
Example: Remove systemd boot scripts for all external exporters:
shell> tmonitor stop
Node exporter stopped
MySQL exporter stopped
shell> tmonitor remove
ERROR: You must be root to install or remove boot services.
Please be sure to run tmonitor as root via sudo, for example: sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
shell> sudo /opt/continuent/tungsten/cluster-home/bin/tmonitor remove
node_exporter systemd unit boot script disabled and removed
mysqld_exporter systemd unit boot script disabled and removed
The tmonitor command is located in the $CONTINUENT_ROOT/tungsten/cluster-home/bin directory.
The tmonitor command will only be available in the PATH if the Tungsten software has been installed with the configuration option profile-script included.
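Scripts that cannot rely on profile-script having been configured can locate the command themselves. The following is a minimal sketch: `find_tmonitor` is a hypothetical helper name, and the fallback to /opt/continuent when CONTINUENT_ROOT is unset is an assumption based on the installation path used in the examples in this document.

```shell
# Print the path to tmonitor, preferring the PATH and falling back to the
# documented location under $CONTINUENT_ROOT (assumed default: /opt/continuent).
find_tmonitor() {
    candidate="${CONTINUENT_ROOT:-/opt/continuent}/tungsten/cluster-home/bin/tmonitor"
    if command -v tmonitor >/dev/null 2>&1; then
        command -v tmonitor
    elif [ -x "$candidate" ]; then
        echo "$candidate"
    else
        echo "tmonitor not found in PATH or at $candidate" >&2
        return 1
    fi
}

# Usage:
#   "$(find_tmonitor)" status
```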
The node_exporter will respond to requests on port 9100, path /metrics.
The tmonitor command is the best way to manage the node_exporter binary.
For example, to test a running node_exporter service:
shell> tmonitor --node test | wc -l
869
shell> tmonitor --node test | head -10
====================================================================================================
== Metrics for the node exporter:
====================================================================================================
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.0214e-05
go_gc_duration_seconds{quantile="0.25"} 1.3524e-05
go_gc_duration_seconds{quantile="0.5"} 2.2065e-05
go_gc_duration_seconds{quantile="0.75"} 4.1943e-05
go_gc_duration_seconds{quantile="1"} 0.003692845
To start the node_exporter binary (only), and then get the status:
shell> tmonitor start --node
shell> tmonitor status --node
Node exporter running ok
The node_exporter command is not included in the PATH unless you add it manually.
The node_exporter binary is located in the $CONTINUENT_ROOT/tungsten/cluster-home/prometheus directory.
The mysqld_exporter will respond to requests on port 9104, path /metrics.
The mysqld_exporter command will read MySQL server connection information from the ~/.my.cnf file. Since the default listener port for the MySQL server is 13306, the file must contain at least:
shell> cat ~/.my.cnf
[client]
port=13306
user={tungsten_database_user_here}
password={tungsten_database_password_here}
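Because this file holds a database password, it is prudent to create it readable only by its owner. The sketch below writes the minimum content described above; `write_exporter_my_cnf` is a hypothetical helper name, and the restrictive file mode is a general precaution rather than a documented requirement of mysqld_exporter.

```shell
# Write a minimal MySQL client config for mysqld_exporter.
# $1 = target file, $2 = database user, $3 = password, $4 = port (default 13306)
write_exporter_my_cnf() {
    target=$1 user=$2 password=$3 port=${4:-13306}
    (
        umask 077   # files created here get no group/other access
        printf '[client]\nport=%s\nuser=%s\npassword=%s\n' \
            "$port" "$user" "$password" > "$target"
    )
    chmod 600 "$target"   # also tighten a file that already existed
}

# Usage (overwrites the target file; substitute real credentials):
#   write_exporter_my_cnf ~/.my.cnf {tungsten_database_user_here} {tungsten_database_password_here}
```

Note that the function replaces any existing file, so merge by hand if ~/.my.cnf already contains other settings.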
The tmonitor command is the best way to manage the mysqld_exporter binary.
For example, to test a running mysqld_exporter service:
shell> tmonitor --mysqld test | wc -l
1813
shell> tmonitor --mysqld test | head -10
====================================================================================================
== Metrics for the mysql exporter:
====================================================================================================
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.0051e-05
go_gc_duration_seconds{quantile="0.25"} 1.4436e-05
go_gc_duration_seconds{quantile="0.5"} 3.3925e-05
go_gc_duration_seconds{quantile="0.75"} 6.3136e-05
go_gc_duration_seconds{quantile="1"} 0.000551545
To start the mysqld_exporter binary (only), and then get the status:
shell> tmonitor start --mysqld
shell> tmonitor status --mysqld
mysqld exporter running ok
The mysqld_exporter command is not included in the PATH unless you add it manually.
The mysqld_exporter binary is located in the $CONTINUENT_ROOT/tungsten/cluster-home/prometheus directory.
The replicator will respond to requests on port 8091, path /metrics.
The tmonitor command is the best way to test the replicator exporter.
For example, to test a running replicator exporter service:
shell> tmonitor --replicator test | wc -l
148
shell> tmonitor -t -R test
====================================================================================================
== Metrics for the replicator exporter:
====================================================================================================
# HELP tungsten_replicator_version The Tungsten Clustering software version number
# TYPE tungsten_replicator_version gauge
tungsten_replicator_version{version="Tungsten Clustering 7.0.0 build 478",vendor="Continuent",name="Tungsten Replicator",} 1.0
# HELP tungsten_replicator_service Replicator service value
# TYPE tungsten_replicator_service gauge
tungsten_replicator_service{service="east",role="master",state="online",} 1.0
tungsten_replicator_service{service="east_from_west",role="relay",state="online",} 1.0
# HELP tungsten_replicator_seqno Replicator min/max/current sequence number value
# TYPE tungsten_replicator_seqno gauge
tungsten_replicator_seqno{service="east",seqno="minimum",} 0.0
tungsten_replicator_seqno{service="east",seqno="maximum",} 741693.0
tungsten_replicator_seqno{service="east",seqno="current",} 741693.0
tungsten_replicator_seqno{service="east_from_west",seqno="minimum",} 0.0
tungsten_replicator_seqno{service="east_from_west",seqno="maximum",} 638682.0
tungsten_replicator_seqno{service="east_from_west",seqno="current",} 638682.0
# HELP tungsten_replicator_latency Replicator applied/relative latency value
# TYPE tungsten_replicator_latency gauge
tungsten_replicator_latency{service="east",latency="applied",} 0.754
tungsten_replicator_latency{service="east",latency="relative",} 0.762
tungsten_replicator_latency{service="east_from_west",latency="applied",} 0.738
tungsten_replicator_latency{service="east_from_west",latency="relative",} 0.763
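The replicator metrics can also be checked from a script without a Prometheus server. The sketch below pulls the relative latency per service out of the exporter output and flags services above a threshold in seconds. It assumes the exact label order shown in the sample output above; `relative_latency` and `stale_services` are hypothetical helper names.

```shell
# Read replicator metrics on stdin and print "service latency_seconds"
# for each relative-latency sample.
relative_latency() {
    sed -n 's/^tungsten_replicator_latency{service="\([^"]*\)",latency="relative",} \(.*\)$/\1 \2/p'
}

# Print the services whose relative latency exceeds $1 seconds (default 3600).
stale_services() {
    relative_latency | awk -v max="${1:-3600}" '$2 + 0 > max + 0 { print $1 }'
}

# Usage on a node with tmonitor in the PATH:
#   tmonitor -R test | stale_services 300
```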
The manager will respond to requests on port 8092, path /metrics.
The tmonitor command is the best way to test the manager exporter.
For example, to test a running manager exporter service:
shell> tmonitor --manager test | wc -l
146
shell> tmonitor -t --manager test
====================================================================================================
== Metrics for the manager exporter:
====================================================================================================
# HELP tungsten_manager_version The Tungsten Clustering software version number
# TYPE tungsten_manager_version gauge
tungsten_manager_version{version="Tungsten Clustering 7.0.0 build 478",vendor="Continuent",name="Tungsten Manager",} 1.0
# HELP tungsten_manager_service The service and datasource served by this Tungsten Manager
# TYPE tungsten_manager_service gauge
tungsten_manager_service{service="east",datasource="db1-demo.continuent.com",role="master",state="online",} 1.0
# HELP tungsten_manager_policy The current Tungsten Manager policy mode
# TYPE tungsten_manager_policy gauge
tungsten_manager_policy{policy="MAINTENANCE",} 0.0
# HELP tungsten_manager_connections Connector active/total connections value
# TYPE tungsten_manager_connections gauge
tungsten_manager_connections{connections="active",} 1.0
tungsten_manager_connections{connections="total",} 149.0
The connector will respond to requests on port 8093, path /metrics.
The tmonitor command is the best way to test the connector exporter.
For example, to test a running connector exporter service:
shell> tmonitor --connector test | wc -l
136
shell> tmonitor -t -C test
====================================================================================================
== Metrics for the connector exporter:
====================================================================================================
# HELP tungsten_connector_version The Tungsten Clustering software version number
# TYPE tungsten_connector_version gauge
tungsten_connector_version{version="Tungsten Clustering 7.0.0 build 478",vendor="Continuent",name="Tungsten Connector",} 1.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="east",datasource="db1-demo.continuent.com",connections="active",} 0.0
tungsten_connector_connections{dataservice="east",datasource="db1-demo.continuent.com",connections="total",} 0.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="east",datasource="db2-demo.continuent.com",connections="active",} 0.0
tungsten_connector_connections{dataservice="east",datasource="db2-demo.continuent.com",connections="total",} 0.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="east",datasource="db3-demo.continuent.com",connections="active",} 0.0
tungsten_connector_connections{dataservice="east",datasource="db3-demo.continuent.com",connections="total",} 2.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="usa_active_active",datasource="east",connections="active",} 0.0
tungsten_connector_connections{dataservice="usa_active_active",datasource="east",connections="total",} 0.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="usa_active_active",datasource="west",connections="active",} 0.0
tungsten_connector_connections{dataservice="usa_active_active",datasource="west",connections="total",} 0.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="west",datasource="db4-demo.continuent.com",connections="active",} 0.0
tungsten_connector_connections{dataservice="west",datasource="db4-demo.continuent.com",connections="total",} 0.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="west",datasource="db5-demo.continuent.com",connections="active",} 0.0
tungsten_connector_connections{dataservice="west",datasource="db5-demo.continuent.com",connections="total",} 0.0
# HELP tungsten_connector_connections Connector active/total connections value
# TYPE tungsten_connector_connections gauge
tungsten_connector_connections{dataservice="west",datasource="db6-demo.continuent.com",connections="active",} 0.0
tungsten_connector_connections{dataservice="west",datasource="db6-demo.continuent.com",connections="total",} 0.0
Below are example alerting rules for Prometheus.
These rules cover basic cluster health and operation.
- alert: 'MasterReadOnly'
  expr: mysql_global_variables_read_only == 1 and on(hostname)(tungsten_manager_service{role="master"})
  for: 2m
  annotations:
    description: 'Database is read only on {{ $labels.hostname }}, but the role is master.'
- alert: 'TungstenReplicatorDown'
  expr: up == 0 and {job=~"tungsten-exporters.*",instance=~".*8091"}
  for: 10m
  annotations:
    description: 'Tungsten replicator down or unreachable on {{ $labels.hostname }}, please verify that the replicator is running and exporter is returning metrics'
- alert: 'TungstenManagerDown'
  expr: up == 0 and {job=~"tungsten-exporters.*",instance=~".*8092"}
  for: 10m
  annotations:
    description: 'Tungsten manager down or unreachable on {{ $labels.hostname }}, please verify that the manager is running and exporter is returning metrics'
- alert: 'TungstenConnectorDown'
  expr: up == 0 and {job=~"tungsten-exporters.*",instance=~".*8093"}
  for: 10m
  annotations:
    description: 'Tungsten connector down or unreachable on {{ $labels.hostname }}, please verify that the connector is running and exporter is returning metrics'
- alert: 'TungstenReplicatorOffline'
  expr: tungsten_replicator_service{state!="online"}
  for: 10m
  annotations:
    description: 'Tungsten replicator not online on {{ $labels.hostname }}, please investigate. (shell> trepctl status)'
- alert: 'TungstenManagerOffline'
  expr: tungsten_manager_service{state!="online"}
  for: 10m
  annotations:
    description: 'Tungsten manager not online on {{ $labels.hostname }}, please investigate. (shell> trepctl status) and (shell> echo ls | cctrl)'
- alert: 'TungstenManagerMaintenance'
  expr: tungsten_manager_policy{policy!="AUTOMATIC"}
  for: 6h
  annotations:
    description: 'Tungsten manager policy not AUTOMATIC on {{ $labels.hostname }}, please check if cluster is still under maintenance.'
- alert: 'TungstenTwoReplicatorMasters'
  expr: sum by (vip)(tungsten_replicator_service{role="master",state="online"}) != 1
  for: 10m
  annotations:
    description: 'Tungsten - multiple replicator masters (shell> trepctl services) for {{ $labels.vip }}, cannot serve two masters. Please investigate immediately.'
- alert: 'TungstenHeapSpaceUsage'
  expr: jvm_memory_bytes_used{area="heap"}/jvm_memory_bytes_max{area="heap"}*100 > 90
  for: 20m
  annotations:
    description: 'Tungsten - heap space more than 90% full for more than 20 minutes on {{ $labels.instance }}. (look at tmsvc.log)'
- alert: 'TungstenReplicaStale'
  expr: tungsten_replicator_latency{latency="relative"} > 3600
  for: 10m
  annotations:
    description: 'Tungsten - no updates on replica {{ $labels.hostname }} for 60 minutes. Check if replicas are behind or if there is no DB activity to replicate'
- alert: 'TungstenReplicaNoProgress'
  expr: rate(tungsten_replicator_seqno{seqno="current"}[10m]) == 0
  for: 10m
  annotations:
    description: 'Tungsten - no updates on replica {{ $labels.hostname }} for over 10 minutes. Check if replicas are behind or if there is no DB activity to replicate'
- alert: 'TungstenZeroDatasourceMasters'
  expr: (sum by (vip)(tungsten_manager_service{role="master"}) < 1) and (sum by (vip)(tungsten_manager_service{role="relay"}) < 1)
  for: 10m
  annotations:
    description: 'Tungsten - zero datasource masters/relays (shell> echo ls | cctrl) for {{ $labels.vip }}, cannot function with no datasource masters. Please investigate immediately.'
- alert: 'TungstenMasterRoleNotConsistent'
  expr: (sum by (hostname)(tungsten_manager_service{role="master"}) == sum by (hostname)(tungsten_replicator_service{role="master"})) != 1
  for: 10m
  annotations:
    description: 'Tungsten role=master not consistent between manager and replicator on {{ $labels.hostname }}.'
- alert: 'TungstenSlaveRoleNotConsistent'
  expr: (sum by (hostname)(tungsten_manager_service{role="slave"}) == sum by (hostname)(tungsten_replicator_service{role="slave"})) != 1
  for: 10m
  annotations:
    description: 'Tungsten role=slave not consistent between manager and replicator on {{ $labels.hostname }}.'
- alert: 'TungstenRelayRoleNotConsistent'
  expr: (sum by (hostname)(tungsten_manager_service{role="relay"}) == sum by (hostname)(tungsten_replicator_service{role="relay"})) != 1
  for: 10m
  annotations:
    description: 'Tungsten role=relay not consistent between manager and replicator on {{ $labels.hostname }}.'
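Prometheus loads alerting rules from rule files whose top level is a `groups` list, each group having a `name` and a `rules` list, so a bare list of rules such as the one above needs wrapping before it can be loaded. The sketch below does that for a rules fragment saved one field per line; `wrap_rules`, the group name and the file names in the usage note are illustrative assumptions.

```shell
# Wrap a list of alert rules (read from stdin) in the "groups" structure
# used by Prometheus rule files. $1 = group name (illustrative default).
wrap_rules() {
    printf 'groups:\n  - name: %s\n    rules:\n' "${1:-tungsten}"
    # Indent every non-empty input line by six spaces so that each
    # "- alert:" entry becomes an item of the "rules:" list.
    sed 's/^\(.\)/      \1/'
}

# Usage (file names are hypothetical; promtool ships with Prometheus):
#   wrap_rules tungsten < rules-fragment.yml > tungsten-rules.yml
#   promtool check rules tungsten-rules.yml
```

Running `promtool check rules` before reloading Prometheus catches indentation and expression errors early. The rules above also assume that the scrape configuration attaches `hostname` and `vip` labels and uses job names beginning with `tungsten-exporters`.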