The Monitor page shows details of an individual monitor. You can edit an existing monitor or create a new one from scratch. In edit mode, as you change parameters, the visualization graph and health bar update to show the proposed effect, based on the accumulated metrics.
Navigation elements on the Workload page enable you to:
- Open a Grafana dashboard for this workload in a separate tab by clicking the Grafana icon in the Controls panel.
- Open the Monitor page to view or edit a particular monitor by clicking the monitor row.
- Return to the Cluster page by clicking the Cluster link in the breadcrumbs in the upper left.
In edit mode (click the ‘EDIT’ button in the upper right), the Monitor Details pane displays parameters describing the type and behavior of the monitor. Select a monitor to view its history. After clicking ‘EDIT’, you can change its type and configure alerts.
Monitor details fall into four categories:
Data Source: The metric and the labels on which to filter (see red outline in above image). You can apply multiple filters to select only the samples of this metric that are of interest. In addition, the metric can be broken apart into groups based on a label key. In that case the health of the monitor is calculated at the group level, and any abnormality (again at the group level) can trigger an alert. This grouping also gives you visibility into the metric’s value at the group level.
Type Attributes: Lightbend Console supports three monitor types: Threshold Monitor, Simple Moving Average, and Growth Rate. Select the type and the basic attributes intrinsic to that type (orange box above).
Trigger Severity: Monitors can be configured for warning-level alerts, critical alerts, or both. Each severity has attributes that govern its triggering (green box above).
Trigger Occurrence: This controls the sensitivity (or confidence) of the monitor and can vary from ‘at least once’ to 100%. With the ‘at least once’ setting, any single violation of the conditions specified in the trigger severity section triggers an alert; this high sensitivity can produce many nuisance alerts. At 100%, the least sensitive setting, every sample within the time window specified in the type attributes must be in violation before an alert is raised. See the blue box above.
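The effect of the trigger occurrence setting can be illustrated with a minimal Python sketch. This is not Lightbend Console's implementation: the function name `should_trigger` and the `min_fraction` parameter are hypothetical, and treating intermediate percentages as "at least this fraction of samples in violation" is an assumption.

```python
from typing import Optional, Sequence


def should_trigger(violations: Sequence[bool],
                   min_fraction: Optional[float] = None) -> bool:
    """Decide whether a window of samples should raise an alert.

    violations   -- one flag per sample in the time window, True if that
                    sample violates the trigger severity condition.
    min_fraction -- hypothetical stand-in for the occurrence setting:
                    None means 'at least once'; 1.0 means 100%.
    """
    if not violations:
        # No samples in the window: nothing to alert on (assumption).
        return False
    violating = sum(violations)
    if min_fraction is None:
        # 'At least once': a single violating sample is enough.
        return violating >= 1
    # Percentage setting: enough of the window must be in violation.
    return violating / len(violations) >= min_fraction
```

With `[False, True, False]` as the window, the ‘at least once’ setting triggers, while `min_fraction=1.0` does not, because only one of the three samples is in violation.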
Monitors can be defined as monolithic entities or as a group of sub-monitors, depending on the group-by monitor parameter. This grouping construct allows very fine control of the monitor, and thus of alerting, when problems occur. Group-by can be ‘all’ (a single monolithic monitor), a single metric label, or ‘none’ (each unique collection of labels forms a group). Clicking a group health bar selects that group for monitor visualization. The title on each group is the set of metric labels unique to that group.
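The three group-by options can be sketched in Python as follows. This is only an illustration of the grouping idea; the function name `group_samples`, the sample representation (a label dictionary paired with a value), and the key format are all assumptions rather than anything taken from Lightbend Console.

```python
from collections import defaultdict


def group_samples(samples, group_by):
    """Partition (labels, value) samples according to a group-by setting.

    group_by == "all"  -> one monolithic group containing every sample
    group_by == "none" -> one group per unique collection of labels
    anything else      -> treated as a label key; one group per value
    """
    groups = defaultdict(list)
    for labels, value in samples:
        if group_by == "all":
            key = ()  # single shared key: everything lands in one group
        elif group_by == "none":
            key = tuple(sorted(labels.items()))  # full label set is the key
        else:
            key = ((group_by, labels.get(group_by)),)  # one label only
        groups[key].append(value)
    return dict(groups)


samples = [
    ({"pod": "a", "node": "n1"}, 1.0),
    ({"pod": "b", "node": "n1"}, 2.0),
    ({"pod": "a", "node": "n2"}, 3.0),
]
by_pod = group_samples(samples, "pod")
```

Here `by_pod` holds two groups, one for pod `a` and one for pod `b`, whereas grouping the same samples by ‘none’ yields three groups because every sample has a distinct label set.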
A visualization of the selected monitor, or sub-group of a monitor, displays at the bottom of the Monitor page. The graph shows the variation of the metric along with the derived time series that are fundamental to the monitor definition. These include threshold levels (if any) or an intermediate time series such as:
- the moving average
- standard deviations about that moving average (for a simple moving average monitor) or
- the slope of the metric (for a growth rate monitor).
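The derived series listed above can be approximated with a short Python sketch. The console's exact formulas are not given here, so the choices below are assumptions: a trailing window, a population standard deviation, and a least-squares slope. The names `moving_average_band` and `slope` are hypothetical.

```python
from statistics import mean, pstdev


def moving_average_band(values, window, num_std=2.0):
    """Return (average, lower, upper) for each full trailing window.

    The band is the moving average plus or minus num_std standard
    deviations, computed over the same window (assumed convention).
    """
    band = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        avg = mean(chunk)
        spread = num_std * pstdev(chunk)
        band.append((avg, avg - spread, avg + spread))
    return band


def slope(times, values):
    """Least-squares slope of values against times (assumed definition)."""
    t_mean = mean(times)
    v_mean = mean(values)
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator
```

A constant metric gives a band of zero width around the average, and a metric that rises by two units per time step gives a slope of two.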
This page supports two modes of operation: inspection (the default) and editing. Click EDIT in the upper right to switch between the two.
In inspection mode you can freeze time to investigate a particular period of interest, or enable ‘live’ mode, which continuously updates the monitor visualization with up-to-date metrics and is the mode used by all other Lightbend Console pages.
Investigating an anomaly and/or editing a monitor definition generally requires a stationary time period. For this reason only ‘non-live’ mode is supported during monitor editing.
There are two relevant time periods for the Monitor page: the context period and the focus period. The context period is set by the VIEW control in the upper right (and is shared by all pages). This is the larger period that contains the focus period. The focus period is a one-hour subset within the context period.
Change the focus period by:
- clicking the left/right arrows above the blue ‘lens’ to move it by one hour, or
- dragging the lens to position it precisely in time.
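The relationship between the two periods can be sketched in Python. This is a hypothetical model of the arrow controls, not the console's code: the name `shift_focus` is invented, and clamping the lens so that it stays inside the context period is an assumption.

```python
from datetime import datetime, timedelta

# The focus period is described as a one-hour subset of the context period.
FOCUS_LENGTH = timedelta(hours=1)


def shift_focus(focus_start, context_start, context_end, steps):
    """Move the focus period by whole hours, staying inside the context.

    steps is positive for the right arrow and negative for the left arrow.
    Returns the new start time of the one-hour focus period.
    """
    proposed = focus_start + steps * FOCUS_LENGTH
    latest_start = context_end - FOCUS_LENGTH
    # Clamp so the whole focus hour remains within the context period.
    return max(context_start, min(proposed, latest_start))


context_start = datetime(2020, 1, 1, 0, 0)
context_end = datetime(2020, 1, 1, 4, 0)
new_start = shift_focus(datetime(2020, 1, 1, 1, 0), context_start, context_end, 1)
```

In this example the focus period moves from 01:00 to 02:00; asking for more steps than the context period allows leaves the lens at the nearest edge.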
Hover over the LEGEND text in the upper left of the graph to display a transient legend. In addition to identifying the graph elements, the legend draws some elements at the same length scale used in the graph itself. In this example the ‘AVERAGE WINDOW’ length scale is an accurate representation of the time window (period) used to calculate the monitor.
Figures: SMA Legend, Growth Rate Legend
The Monitor Health bar rolls up the health of the monitor and any sub-group according to the monitor’s current parameters.
Any changes you make to the monitor definition trigger re-evaluation based on the new criteria. The results display in the group health section, the selected group visualization, and the monitor health panels. These panels show how the monitor would have performed over the focus period had this definition been in effect for the entire period. This ‘what-if’ evaluation is computationally intensive. It completes quickly for ‘group-by: all’ monitors but can be slower for monitors with numerous groupings. The group health bars are shown in a crawling gray pattern until their computation is complete.
Once you’re satisfied with the new monitor definition, click ‘SAVE CHANGES’ (upper right) and you’ll be taken back to the Workload page. You can always back out of the changes by canceling them or by navigating to another page in the application.
The Monitor Change Log in the left panel gives you a quick view of the monitor’s history. Clicking on the blue ‘revert’ arrow will return the monitor’s definition to that prior state.