Tungsten Replicator

Alerting Using Prometheus Rules

Below are example alerting rules for Prometheus.

These rules cover basic cluster health and operation.
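
As a rough sketch of where such rules live: Prometheus loads alerting rules from rule files referenced under rule_files in its main configuration. Each file holds one or more groups, each group holds a rules list, and descriptive text for a rule is nested under annotations. The file name tungsten_rules.yml and the group name tungsten-cluster are assumptions made for illustration, not names taken from the Tungsten documentation.

```yaml
# tungsten_rules.yml (hypothetical file name)
# Referenced from prometheus.yml with:
#   rule_files:
#     - tungsten_rules.yml
groups:
  - name: tungsten-cluster        # hypothetical group name
    rules:
      # Each "- alert:" entry from this section goes here, indented under "rules:".
      - alert: 'TungstenReplicatorOffline'
        expr: tungsten_replicator_service{state!="online"}
        for: 10m
        annotations:
          description: 'Tungsten replicator not online on {{$labels.hostname}}, please investigate. (shell> trepctl status)'
```

A file laid out this way can be syntax-checked with "promtool check rules tungsten_rules.yml" before Prometheus is reloaded.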

- alert: 'MasterReadOnly'
  expr: mysql_global_variables_read_only == 1 and on(hostname) (tungsten_manager_service{role="master"})
  for: 2m
  annotations:
    description: 'Database is read-only on {{$labels.hostname}}, but the role is master.'

- alert: 'TungstenReplicatorDown'
  expr: up{job=~"tungsten-exporters.*",instance=~".*8091"} == 0
  for: 10m
  annotations:
    description: 'Tungsten replicator down or unreachable on {{$labels.hostname}}. Verify that the replicator is running and that the exporter is returning metrics.'

- alert: 'TungstenReplicatorOffline'
  expr: tungsten_replicator_service{state!="online"}
  for: 10m
  annotations:
    description: 'Tungsten replicator not online on {{$labels.hostname}}, please investigate. (shell> trepctl status)'

- alert: 'TungstenHeapSpaceUsage'
  expr: jvm_memory_bytes_used{area="heap"} / jvm_memory_bytes_max{area="heap"} * 100 > 90
  for: 20m
  annotations:
    description: 'Tungsten - heap space more than 90% full for more than 20 minutes on {{$labels.instance}}. (look at tmsvc.log)'

- alert: 'TungstenReplicaStale'
  expr: tungsten_replicator_latency{latency="relative"} > 3600
  for: 10m
  annotations:
    description: 'Tungsten - no updates on replica {{$labels.hostname}} for 60 minutes. Check whether the replica is behind or whether there is no DB activity to replicate.'

- alert: 'TungstenReplicaNoProgress'
  expr: rate(tungsten_replicator_seqno{seqno="current"}[10m]) == 0
  for: 10m
  annotations:
    description: 'Tungsten - no updates on replica {{$labels.hostname}} for over 10 minutes. Check whether the replica is behind or whether there is no DB activity to replicate.'
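
Rules such as these can be exercised offline with promtool's rule unit tests before they are deployed. The following is a minimal sketch, not part of the Tungsten documentation. It assumes a hypothetical rule file named tungsten_rules.yml that contains the TungstenReplicatorOffline rule with its description nested under annotations. The host name db1, the state value "offline", and the sample value 1 are invented test data. A real exporter may attach further labels, which would then also have to appear under exp_labels.

```yaml
# tungsten_rules_test.yml (hypothetical file name)
rule_files:
  - tungsten_rules.yml            # hypothetical rule file under test

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Synthetic series: one replicator reporting a non-online state for 15 minutes.
      - series: 'tungsten_replicator_service{hostname="db1", state="offline"}'
        values: '1+0x15'
    alert_rule_test:
      # After 5 minutes the alert is still pending ("for: 10m"), so nothing fires.
      - eval_time: 5m
        alertname: TungstenReplicatorOffline
        exp_alerts: []
      # After 11 minutes the 10-minute hold period has elapsed and the alert fires.
      - eval_time: 11m
        alertname: TungstenReplicatorOffline
        exp_alerts:
          - exp_labels:
              hostname: db1
              state: offline
            exp_annotations:
              description: 'Tungsten replicator not online on db1, please investigate. (shell> trepctl status)'
```

The test would be run with "promtool test rules tungsten_rules_test.yml". Checking both a pending and a firing evaluation time confirms that the "for" duration behaves as intended.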