Lightbend Console provides the following information:
- Cluster page—displays an overview of the workloads and pods in the cluster in the Cluster Pod Map and the Workloads table.
- Workloads page—lists the monitors for a given workload and shows their health.
- Monitors page—details a monitor’s attributes, metrics, and current health, and allows you to create new monitors and edit existing ones.
- Grafana Dashboards—graphs of all the metrics backing the monitors of a workload, as well as KPIs for the workload (based on its service type).
- Lightbend Telemetry Dashboards—pre-configured Grafana dashboards that display Lightbend Telemetry metrics.
Navigate the Console by drilling down from the cluster to workloads to monitors, or open the Grafana dashboards:
- From the Cluster page, click a workload (either in the table or the map) to open the Workload page, which shows all monitors and some workload details.
- From the Workload page, click a monitor to open the Monitor page.
- From the left panel of any page, click the Grafana icon to open the Grafana dashboard.
- To return to the Cluster page, click the Lightbend Console logo in the top left or in the breadcrumbs.
In general terms, the Lightbend Console monitors Prometheus metrics so that you can:
- Track the health of the workload (i.e. the application) being monitored.
- Get alerts when the metric becomes unhealthy.
A monitor in Lightbend Console can be thought of as a specification for generating health and alert rules for a particular Prometheus metric.
A monitor definition includes:
- a name
- a workload name
- a Prometheus metric
- a monitor type (e.g. threshold, simple moving average, growth rate)
- a function and parameters appropriate for the monitor type
The predefined monitor types are threshold, simple moving average (sma), and growth. These are described in detail below.
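To make the pieces of a definition concrete, here is a minimal sketch of a monitor as a plain data structure. The field names and parameter names are illustrative assumptions, not the Console's actual schema:

```python
# Hypothetical sketch of a monitor definition; actual Console field names
# and parameter names may differ.
monitor = {
    "name": "akka_processing_time",
    "workload": "my-consumer",                  # workload the monitor is tied to
    "metric": "akka_actor_processing_time_ns",  # backing Prometheus metric
    "type": "sma",                              # threshold | sma | growth
    "parameters": {
        "window": "15m",           # averaging window for the moving average
        "warning_deviations": 1,   # std deviations from the mean -> warning
        "critical_deviations": 2,  # std deviations from the mean -> critical
    },
}

# Every definition carries these five pieces, per the list above.
required = {"name", "workload", "metric", "type", "parameters"}
assert required <= monitor.keys()
```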
A monitor is used to generate a set of health and alert recording rules in Prometheus, and there are often multiple sets per monitor; for example, there might be a separate set for each pod generating that Prometheus metric. The Console comes with several predefined monitors (e.g. Kubernetes-related and Akka-related monitors), all with reasonable default configurations. As required, the Console allows a user to create or delete monitors and modify associated parameters, resulting in health and alert rules tuned for their particular application.
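As a rough illustration, the generated rules take the shape of ordinary Prometheus recording rules. The group name, recorded series name, and expression below are hypothetical; the rules the Console actually generates will differ:

```yaml
groups:
  - name: akka-processing-time-health   # hypothetical group name
    rules:
      # One recorded series per set of labels (e.g. per pod) that the
      # expression matches; the threshold comes from the monitor parameters.
      - record: health:akka_processing_time:warning
        expr: avg_over_time(akka_actor_processing_time_ns[15m]) > 1e6
```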
With every Prometheus scrape, we get new metric data, and that data is tagged. The tags qualify where the data applies; they may indicate, for example, which pod or application the data comes from. A monitor is tied to a workload (typically the same as the application, but not necessarily) and is used to generate health (and alert) time series for that workload. A single monitor definition may thus generate multiple health (and alert) time series, depending on how the data labels are partitioned.
For example, for a given monitor there may be separate health/alert rules for each pod in the workload. All those time series would share a workload and Prometheus metric (as associated with the monitor) but differ on the pod tag.
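This partitioning can be sketched as grouping tagged samples by label. The sample values and pod names below are invented for illustration; the point is that one monitor definition fans out into one series per distinct label combination:

```python
from collections import defaultdict

# Toy samples as (labels, value) pairs, mimicking tagged Prometheus data.
samples = [
    ({"workload": "consumer", "pod": "consumer-0"}, 120.0),
    ({"workload": "consumer", "pod": "consumer-1"}, 95.0),
    ({"workload": "consumer", "pod": "consumer-0"}, 130.0),
    ({"workload": "consumer", "pod": "consumer-2"}, 88.0),
]

# One health series per distinct pod label: the monitor definition is
# shared, but each label combination yields its own series.
series = defaultdict(list)
for labels, value in samples:
    series[labels["pod"]].append(value)

print(sorted(series))  # → ['consumer-0', 'consumer-1', 'consumer-2']
```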
The monitor rule compares some function of the metric values against thresholds to determine instantaneous health values. We calculate both a warning and a critical instantaneous health value for each health series.
The health status for a health series is based on how the instantaneous values trend over time. For a given time window and severity (e.g. 15 minutes and warning), some percentage of the instantaneous health values may be of the given severity. If that percentage is over another threshold (e.g. 50%), then the health will be said to have a state of severity (e.g. warning) in that period. This calculation also happens with every scrape.
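The windowed calculation above can be sketched as a simple fraction test. The function name and the 50% default are illustrative, not the Console's internals:

```python
def health_state(instant_flags, fraction_threshold=0.5):
    """Return True when the share of unhealthy instantaneous samples in
    the window exceeds the threshold (e.g. more than 50% of the samples
    in the last 15 minutes were at warning severity)."""
    unhealthy = sum(instant_flags)
    return unhealthy / len(instant_flags) > fraction_threshold

# 6 of 10 samples in the window were at warning severity -> warning state.
window = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(health_state(window))  # → True
```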
Finally, one can specify whether an alert should be triggered based on the health. Separate alerts can be created for the warning and critical health states. What it means to “alert” is managed by your Alertmanager configuration.
Assume you have an application with two actors, producer and consumer, both implemented in Akka. Also assume you have two producer pods and three consumer pods. One aspect of this scenario that you might want to monitor is the processing time for requests to the consumer actor.
There are default monitors defined for Akka apps, and one of them is `akka_processing_time`. It’s based on the `akka_actor_processing_time_ns` Prometheus metric and is of the simple moving average type. Corresponding health and alert time series will therefore be created automatically for the actors.
By default, there will be health and alert time series for each combination of actor and pod. The Console also allows you to group by pod only, if desired.
The monitor parameters could be set to consider the health to be in a warning state if processing time went beyond 1 standard deviation of the average. The UI would let you see if and when this is happening for each pod. (This would suggest that processing is starting to slow down significantly.)
An alert could be configured to trigger if the health is in the warning state for 75% of the samples in a 5 minute period.
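The warning rule in this example can be sketched as a one-standard-deviation check against the moving average. The sample values are invented, and the real monitor's window and deviation parameters are configurable:

```python
from statistics import mean, pstdev

def sma_warning(recent_values, latest):
    """Warn when the latest processing time is more than one standard
    deviation above the average of recent values — a sketch of the 'sma'
    monitor's warning rule, not the Console's actual implementation."""
    avg, sd = mean(recent_values), pstdev(recent_values)
    return latest > avg + sd

recent = [100, 105, 98, 102, 95]   # hypothetical processing times (ms)
print(sma_warning(recent, 101))    # → False: within one std deviation
print(sma_warning(recent, 120))    # → True: well beyond one std deviation
```

In practice the instantaneous results of a check like this are what feed the windowed health calculation described earlier (e.g. warning state when 75% of samples in a 5 minute window are unhealthy).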
The Console comes with a predefined set of basic monitor types. The sections linked below describe the workings of each monitor type.
See Editing Monitors to understand how to tune monitor parameters to obtain the desired level of monitoring and alerting.