Configuring default monitors
Lightbend Console provides a set of default monitors that are configured to detect common issues with Lightbend open source and commercial products. It is possible to tune the default monitors by providing a custom default monitors configuration file.
Getting configuration files
You can customize the default monitors by modifying the configuration shipped with Lightbend Console. It consists of two files: default-monitors.json and static-rules.yml. The first is the actual default monitor configuration; the second defines Prometheus recording rules for metrics that are used by some of the monitors. In most cases, modifying default-monitors.json is sufficient. The only reason to modify static-rules.yml is to add recording rules that use raw PromQL to produce custom metrics for a default monitor. This document mainly covers modifying the default-monitors.json file.
Download the default monitor configuration files from a running Lightbend Console installation:
mkdir default-monitors-config
cd default-monitors-config
kubectl get configmap -n lightbend console-api-static-rules -o jsonpath='{.data.static-rules\.yml}' > static-rules.yml
kubectl get configmap -n lightbend console-api-default-monitors -o jsonpath='{.data.default-monitors\.json}' > default-monitors.json
cd ..
Modifying the static-rules.yml file
A metric available in Prometheus is not always a good input to a monitor. For example, you might want to monitor the rate of increase when the metric is a counter, or you might want to filter out samples with specific labels. By modifying the static-rules.yml file, you can define new metrics based on existing ones. All Prometheus recording rules defined in static-rules.yml are accessible to monitors. A custom recording rule looks like this:
- record: prometheus_rule_evaluation_failures_rate
  expr: irate(prometheus_rule_evaluation_failures_total[5m])
For more information on writing custom recording rules, see the Prometheus documentation.
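To make the recording rule above more concrete, here is a simplified Python sketch of what irate computes: a per-second rate based on the last two samples in the range. It is an illustration only, not the Prometheus implementation; the function name and the sample data are hypothetical.

```python
def irate(samples):
    """Per-second instant rate from the last two (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        return None  # not enough data in the range
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None  # timestamps must increase
    # A counter that went down is treated as a reset: count the increase from zero.
    increase = v_last if v_last < v_prev else v_last - v_prev
    return increase / (t_last - t_prev)


# Hypothetical samples of prometheus_rule_evaluation_failures_total.
failures_total = [(0, 10.0), (15, 10.0), (30, 13.0)]
print(irate(failures_total))  # 3 failures over 15 seconds
```

Because only the last two samples matter, the recorded metric reacts quickly to bursts, which suits a monitor that should fire on recent failures.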
Modifying the default-monitors.json file
The default-monitors.json file consists of a list of monitors, each being one of three types: threshold, growth, or simple moving average (SMA). More detailed descriptions of monitors and their types can be found in Console and Monitor Overview. The rest of this page shows example JSON syntax for each monitor type, with field descriptions.
Common monitor fields
All the monitor types share these fields:
monitorVersion: must be “1”
model: one of “threshold”, “growth” or “sma”
parameters.metric: underlying Prometheus metric or a static rule defined in static-rules.yml
parameters.summary: short summary of the monitor, used when alerting
parameters.description: templated description of the condition when the monitor is unhealthy, used when alerting
parameters.confidence: ratio of unhealthy to total samples inside the window that is needed to declare the monitor unhealthy; must be one of “5e-324” (means at least one sample), “0.25”, “0.5”, “0.75”, “0.95”, “1”
parameters.filters: the monitor will only use samples that match this list of Prometheus metric labels and their values
parameters.severity: one or both of “warning”, “critical”; inside each are monitor-type-specific parameters described below
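The constraints on the common fields can be checked before uploading a modified file. The following is a minimal sketch of such a check; check_common_fields is a hypothetical helper, not part of Lightbend Console, and treating parameters.metric and parameters.confidence as required is an assumption based on the examples on this page.

```python
ALLOWED_MODELS = {"threshold", "growth", "sma"}
ALLOWED_CONFIDENCE = {"5e-324", "0.25", "0.5", "0.75", "0.95", "1"}
ALLOWED_SEVERITIES = {"warning", "critical"}


def check_common_fields(name, monitor):
    """Return a list of problems found in the fields shared by all monitor types."""
    problems = []
    params = monitor.get("parameters", {})
    if monitor.get("monitorVersion") != "1":
        problems.append(f"{name}: monitorVersion must be '1'")
    if monitor.get("model") not in ALLOWED_MODELS:
        problems.append(f"{name}: model must be one of {sorted(ALLOWED_MODELS)}")
    if not params.get("metric"):
        problems.append(f"{name}: parameters.metric is missing")
    if params.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append(f"{name}: parameters.confidence has an unsupported value")
    severities = set(params.get("severity", {}))
    if not severities or not severities <= ALLOWED_SEVERITIES:
        problems.append(f"{name}: parameters.severity needs 'warning' and/or 'critical'")
    return problems


# A monitor definition shaped like the examples on this page.
server_5xx = {
    "monitorVersion": "1",
    "model": "threshold",
    "parameters": {
        "metric": "http_server_responses_5xx_rate",
        "confidence": "1",
        "severity": {"warning": {"comparator": ">", "threshold": "0"}},
    },
}
print(check_common_fields("server_5xx", server_5xx))  # empty list when nothing is wrong
```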
Threshold monitor
"server_5xx": {
  "monitorVersion": "1",
  "model": "threshold",
  "parameters": {
    "metric": "http_server_responses_5xx_rate",
    "window": "5m",
    "confidence": "1",
    "severity": {
      "warning": {
        "comparator": ">",
        "threshold": "0"
      }
    },
    "summary": "HTTP 5xx errors",
    "description": "HTTP server at {{$labels.instance}} has 5xx errors"
  }
}
parameters.window: time window for calculating health, used together with the common parameters.confidence
parameters.severity: one or both of “warning”, “critical”
parameters.severity.warning.comparator: operator for comparing the metric value to the threshold, one of “<”, “>”, “<=”, “>=”, “==”, “!=”
parameters.severity.warning.threshold: value to compare the metric against
Note that comparator and threshold follow the same syntax inside the critical severity block.
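The interaction between window, comparator, threshold and confidence can be sketched as follows. This is an illustration under stated assumptions, not Console's implementation: threshold_unhealthy is a hypothetical function, the samples are assumed to be the metric values that fall inside the window, and the ratio is assumed to be compared with confidence using "greater than or equal".

```python
import operator

COMPARATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def threshold_unhealthy(samples, comparator, threshold, confidence):
    """True if the share of samples breaching the threshold reaches the confidence ratio."""
    if not samples:
        return False  # no data in the window, nothing to judge
    compare = COMPARATORS[comparator]
    breaches = sum(1 for value in samples if compare(value, float(threshold)))
    return breaches / len(samples) >= float(confidence)


# Hypothetical 5xx rates observed inside a 5m window: one bad sample out of four.
window_samples = [0.0, 0.0, 2.0, 0.0]
print(threshold_unhealthy(window_samples, ">", "0", "1"))       # every sample must breach
print(threshold_unhealthy(window_samples, ">", "0", "5e-324"))  # one breach is enough
```

With confidence “1” a single error-free sample keeps the monitor healthy, while “5e-324” makes it fire on any breaching sample in the window.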
Growth monitor
"task_queue_growth": {
  "monitorVersion": "1",
  "model": "growth",
  "parameters": {
    "metric": "task_queue_length",
    "filters": {
      "quantile": "0.5"
    },
    "period": "15m",
    "minslope": "0.1",
    "confidence": "1",
    "severity": {
      "critical": {
        "window": "5m"
      }
    },
    "summary": "task queue growing",
    "description": "node {{$labels.instance}} has a growing task queue"
  }
}
parameters.period: period used for calculating the linear regression of the underlying metric
parameters.minslope: if the slope of the linear regression line exceeds this value, the monitor is considered unhealthy
parameters.severity.critical.window: time window for calculating health, used together with the common parameters.confidence; same as parameters.window in threshold and sma monitors, however growth monitors have separate windows for warning and critical severities
The underlying Prometheus metric task_queue_length is assumed to be a histogram of queue sizes aggregated by quantiles, so a filter is used to select the median (quantile 0.5) length.
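The slope check at the heart of a growth monitor can be sketched with an ordinary least-squares fit. The functions below are hypothetical and cover a single evaluation only; the window and confidence handling is left out, and the unit of the slope (metric units per second here) is an assumption, since this page does not state the unit Console uses for minslope.

```python
def regression_slope(points):
    """Least-squares slope of (time_seconds, value) points, or None if undetermined."""
    n = len(points)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return None  # all samples share one timestamp
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return cov / var_t


def growth_unhealthy(points, minslope):
    """True if the fitted slope over the period exceeds minslope."""
    slope = regression_slope(points)
    return slope is not None and slope > float(minslope)


# Hypothetical median queue lengths sampled once a minute.
queue_length = [(0, 10.0), (60, 40.0), (120, 70.0)]
print(regression_slope(queue_length))         # growth per second
print(growth_unhealthy(queue_length, "0.1"))  # compared against minslope
```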
SMA monitor
"task_throughput": {
  "monitorVersion": "1",
  "model": "sma",
  "parameters": {
    "metric": "task_consume_rate",
    "period": "15m",
    "minval": "1000",
    "window": "15m",
    "confidence": "1",
    "severity": {
      "warning": {
        "numsigma": "3"
      }
    },
    "summary": "task throughput is anomalous",
    "description": "{{$labels.es_workload}} has unusual task throughput"
  }
}
parameters.window: time window for calculating health, used together with the common parameters.confidence
parameters.period: simple moving average window
parameters.minval: minimum deviation from the SMA required before considering the monitor unhealthy
parameters.severity.warning.numsigma: the monitor is considered unhealthy if the metric value deviates from the simple moving average over the period by more than numsigma standard deviations
Note that numsigma follows the same syntax inside the critical severity block.
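A single SMA check can be sketched as below. It is an illustration, not Console's implementation: sma_unhealthy is a hypothetical function, and three details are assumptions, namely that the numsigma and minval conditions must both hold, that the population standard deviation is used, and that the current sample is not part of the averaged history.

```python
import statistics


def sma_unhealthy(history, value, numsigma, minval):
    """True if value is far from the moving average of history, in sigmas and absolutely."""
    if len(history) < 2:
        return False  # too little data to estimate a spread
    sma = statistics.fmean(history)      # simple moving average over the period
    sigma = statistics.pstdev(history)   # spread of the same samples
    deviation = abs(value - sma)
    return deviation > float(numsigma) * sigma and deviation >= float(minval)


# Hypothetical task_consume_rate samples from the last period.
history = [5000.0, 5200.0, 4800.0, 5100.0, 4900.0]
print(sma_unhealthy(history, 7000.0, "3", "1000"))  # far outside both limits
print(sma_unhealthy(history, 5500.0, "3", "1000"))  # beyond 3 sigma, but below minval
```

The minval floor keeps a very steady metric, whose standard deviation is tiny, from alerting on changes that are statistically unusual but too small to matter.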
Creating ConfigMap & Configuring Lightbend Console
Once the default-monitors.json and static-rules.yml files are modified, create a Kubernetes ConfigMap:
kubectl -n lightbend create configmap my-default-monitors-config --from-file=default-monitors-config/ --dry-run -o yaml | kubectl apply -f -
Now tell Lightbend Console to use the newly created ConfigMap for default monitors and static rules by setting the defaultMonitorsConfigMap and staticRulesConfigMap values in your values.yaml:
consoleAPI.defaultMonitorsConfigMap: my-default-monitors-config
consoleAPI.staticRulesConfigMap: my-default-monitors-config
Then install Console using lbc.py, as described in the installation guide.