Monitor definitions should provide the level of health monitoring and alerting suitable for your applications and SLAs. The three types of built-in monitors provide parameters to tune the level of sensitivity and frequency of alerts.
Using a simple moving average monitor in an Akka application as an example, the steps below show how to edit and try what-if scenarios. In edit mode, as you change parameters, the visualization graph and health bar update to show the proposed effect, based on the accumulated metrics.
You can either reproduce the steps using your own application or simply read along. The screenshots reflect the results of the changes in our example.
If you are reproducing the steps with your own Akka app, follow the steps below. If not, skip ahead to Change tolerance.
Find a simple moving average monitor.
Now all monitor parameters are editable, and any change we make instantly updates the proposed monitor definition, allowing us to try what-if scenarios.
Now let’s work on the moving average calculation to see how the parameters affect the model and health calculation.
The MINIMUM TOLERANCE changes the sensitivity of the monitor.
Increase the MINIMUM TOLERANCE value to make the monitor less sensitive to small fluctuations in the metric. In this example, we increased the value to 8000. The graph at the bottom of the image shows the change as a semi-opaque line whose thickness equals the minimum tolerance:
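The role of MINIMUM TOLERANCE can be sketched as an absolute floor on what counts as a deviation. This is an illustrative model only, not the monitor's actual implementation, and the `deviates` helper is hypothetical:

```python
# Hypothetical model of MINIMUM TOLERANCE (not the product's code):
# a sample only counts as a deviation when it differs from the moving
# average by more than the tolerance, so raising the tolerance makes
# the monitor ignore small fluctuations.
def deviates(value, moving_average, min_tolerance):
    return abs(value - moving_average) > min_tolerance

# With a tolerance of 8000, a fluctuation of 500 around the average is
# ignored, while a fluctuation of 10000 still registers.
small = deviates(10_500, 10_000, min_tolerance=8_000)
large = deviates(20_000, 10_000, min_tolerance=8_000)
```

In this model, a larger tolerance simply widens the band of values treated as healthy.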
In the graph, hover over LEGEND. The AVERAGE WINDOW scale equals the time duration over which the moving average is computed; you can see it in the dog leg at the start of the moving average curve. It is also used in the confidence calculation, which we'll discuss in a bit.
Changing the TIME WINDOW has an impact on monitor health.
Reduce the TIME WINDOW value. In our example, we changed it to 10 minutes. The white moving average line and the standard deviation warning bands change, since they are based on that moving average.
Decrease the TIME WINDOW further and notice the change in health.
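The effect of the TIME WINDOW can be sketched as a moving average over the most recent samples. This is a minimal sketch using a sample count as a stand-in for a duration; `sliding_average` is a hypothetical helper, not the monitor's code:

```python
from collections import deque

# Illustrative sketch of the TIME WINDOW: average over the most recent
# samples. A shorter window tracks the metric more closely; a longer
# window smooths spikes out.
def sliding_average(samples, window):
    buffer = deque(maxlen=window)
    averages = []
    for sample in samples:
        buffer.append(sample)
        averages.append(sum(buffer) / len(buffer))
    return averages

samples = [10, 10, 10, 100, 10, 10]
smooth = sliding_average(samples, window=6)      # long window: spike damped
responsive = sliding_average(samples, window=2)  # short window: spike tracked
```

With the short window the average jumps toward the spike, which is why shrinking the TIME WINDOW changes both the white average line and the monitor's health.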
Trigger Severity parameters also change the sensitivity of the monitor.
Decrease the STANDARD DEVIATION multiple. We changed it to 1, which is probably a useless value but illustrates the point. Note the warning in the health bar below the graph. This also changes the health of the monitor (since this monitor, so far, has only one component or grouping).
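The STANDARD DEVIATION multiple can be sketched as the width of the warning bands around the moving average. A minimal sketch, assuming the bands sit at mean ± multiple × standard deviation over the averaging window (`warning_bands` is a hypothetical helper):

```python
import statistics

# Illustrative warning bands at mean +/- multiple * stddev.
# A smaller multiple narrows the bands, making the monitor more
# sensitive: the same sample is more likely to fall outside them.
def warning_bands(window_samples, multiple):
    mean = statistics.fmean(window_samples)
    dev = statistics.pstdev(window_samples)
    return mean - multiple * dev, mean + multiple * dev

window = [10, 12, 8, 10, 11, 9]
narrow_lo, narrow_hi = warning_bands(window, 1)  # sensitive: multiple of 1
wide_lo, wide_hi = warning_bands(window, 3)      # tolerant: multiple of 3
# A sample of 13 escapes the narrow bands but stays inside the wide ones.
```

This is why a multiple of 1 is nearly useless: about a third of normally distributed samples fall outside one standard deviation, so triggers fire constantly.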
Increase Trigger Occurrence to make the monitor less sensitive. We changed it to 75%. The health bar now shows less orange:
Decrease Trigger Occurrence to make the monitor more sensitive. We changed it to 25%, so fewer triggers (samples where the metric falls outside the standard deviation bands) are needed before the health bar changes.
Click LEGEND again. Note the AVERAGE WINDOW to see that the health change persists for the period defined by the TIME WINDOW.
Set the Trigger Occurrence to AT LEAST once. This means that any deviation outside the bands results in poor health. The monitor becomes very sensitive.
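Trigger Occurrence can be sketched as the fraction of samples in the window that must trigger before health turns bad. This is an assumed model for illustration; both helpers are hypothetical:

```python
# Hypothetical model of Trigger Occurrence: health turns bad when the
# fraction of triggering samples in the window reaches the setting.
def unhealthy(triggers, occurrence):
    return sum(triggers) / len(triggers) >= occurrence

# AT LEAST once is the most sensitive setting: one trigger is enough.
def unhealthy_at_least_once(triggers):
    return any(triggers)

window = [False] * 5 + [True] * 5         # half the samples trigger
less_sensitive = unhealthy(window, 0.75)  # 50% < 75%: still healthy
more_sensitive = unhealthy(window, 0.25)  # 50% >= 25%: unhealthy
```

Under this model, raising the occurrence from 25% to 75% flips the same window from unhealthy back to healthy, which matches the health bar showing less orange.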
In this example, we’ve been using only the WARNING level of Trigger Severity. Let’s observe the results from enabling the CRITICAL level.
Click the CRITICAL control to enable it. The graph now displays red bands for this severity:
Raise and lower the STANDARD DEVIATION. Again, reducing this value makes the monitor more sensitive. Now the graph and timeline contain critical (red) and warning (orange) indicators.
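The relationship between the two severity levels can be sketched as nested bands, assuming the critical (red) bands use a larger standard-deviation multiple than the warning (orange) bands; the `severity` helper is hypothetical:

```python
# Illustrative severity classification: warning and critical use
# separate standard-deviation multiples, with the critical bands
# outside the warning bands.
def severity(value, mean, stddev, warn_mult, crit_mult):
    deviation = abs(value - mean)
    if deviation > crit_mult * stddev:
        return "critical"
    if deviation > warn_mult * stddev:
        return "warning"
    return "healthy"

# With warning at 1 stddev and critical at 2, a sample 1.5 deviations
# out is a warning, and 3 deviations out is critical.
mild = severity(11.5, mean=10, stddev=1, warn_mult=1, crit_mult=2)
severe = severity(13.0, mean=10, stddev=1, warn_mult=1, crit_mult=2)
```

Lowering either multiple pulls its bands inward, which is why the graph picks up more orange or red indicators as you reduce the values.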
Up to now, we've used a single grouping.
Change the AGGREGATION by selecting a metric label. We selected actor. Two health bars appear above the graph, one for each actor associated with this metric. The first one, com.lightbend.prior.Asset, is highlighted and shown in the graph:
Select another bar. In this case, we selected AssetAggregator, and the graph changes to reflect its time series (as filtered by actor = com.lightbend.prior.AssetAggregator).
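Aggregating by a label can be sketched as splitting one metric stream into a series per label value, so each series gets its own health bar. A minimal sketch; `split_by_label` and the sample points are hypothetical:

```python
from collections import defaultdict

# Illustrative sketch of AGGREGATION by a metric label: group samples
# by the label's value (here, per actor) so each group can be graphed
# and health-checked separately.
def split_by_label(points, label):
    series = defaultdict(list)
    for point in points:
        series[point[label]].append(point["value"])
    return dict(series)

points = [
    {"actor": "com.lightbend.prior.Asset", "value": 3},
    {"actor": "com.lightbend.prior.AssetAggregator", "value": 7},
    {"actor": "com.lightbend.prior.Asset", "value": 4},
]
per_actor = split_by_label(points, "actor")
```

Selecting a health bar in the UI corresponds to picking one of these per-actor series and filtering the graph to it.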
At this point, you can either save the monitor or cancel to return to the original definition. If you do save it and change your mind later, you can easily revert to the previous version from the Monitor Change Log.