Akka

Akka metrics come from Lightbend Telemetry (Cinnamon) and describe the performance of actors and HTTP endpoints.

akka_inbox_growth

Akka applications have actors, and each actor has a mailbox. An actor mailbox is a kind of queue for work to be done. The akka_inbox_growth monitor alerts if median queue size across instances of some actor class trends upward over time, meaning the actors are not keeping up with incoming requests.

Failure Examples

TBD

Suggested Actions

TBD

Implementation

This is a growth monitor based on the mailbox size measure provided by the Lightbend Telemetry metric akka_actor_mailbox_size.

Tuning

Basic

Two parameters can be tuned in the growth model, and one in the health model built on top of it.

1- Rate of growth is calculated via the Prometheus “deriv” query function, which uses linear regression to estimate the per-second derivative of the time series over a given window. The window length should match the expected trend rate of the data and, if cyclical effects are suspected, should be at least twice as long as their period. Shorter windows will lead to spurious detections, while longer windows will dampen sensitivity. The default window length is 15 minutes.

2- The threshold to flag growth as detected is set to 0.1 (10 percent) by default. The ideal value may depend on the application.

3- The health model requires the growth to be flagged continuously over a specified window before it leads to an alert. The default window length is 5 minutes. Shorter health windows make the health model more sensitive, but when coupled with a shorter growth window they may increase the number of repeated false alarms. Longer windows may cause missed detections.
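The three parameters above can be sketched in Python. This is only an illustration under stated assumptions, not the monitor's actual implementation: the function names deriv and growth_alerts are hypothetical, the slope is an ordinary least-squares fit of the kind the Prometheus deriv function performs, and comparing the raw slope directly against the threshold is an assumption about how the 0.1 value is applied.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, median mailbox size)


def deriv(samples: Sequence[Sample]) -> float:
    """Least-squares slope of the samples, in value units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var


def growth_alerts(
    samples: Sequence[Sample],
    growth_window_s: float = 900.0,  # parameter 1: 15-minute regression window
    threshold: float = 0.1,          # parameter 2: slope that counts as growth
    health_window_s: float = 300.0,  # parameter 3: 5 minutes of continuous flags
) -> List[bool]:
    """For each sample, report whether growth has been flagged continuously
    for at least health_window_s."""
    alerts = []
    flagged_since = None  # timestamp at which the current run of flags began
    for i, (now, _) in enumerate(samples):
        window = [(t, v) for t, v in samples[: i + 1] if now - t <= growth_window_s]
        flagged = len(window) >= 2 and deriv(window) > threshold
        if not flagged:
            flagged_since = None
        elif flagged_since is None:
            flagged_since = now
        alerts.append(flagged_since is not None and now - flagged_since >= health_window_s)
    return alerts
```

In this sketch a single unflagged sample resets the health window, which is why a short growth window (noisy slope estimates) combined with a short health window tends to produce repeated alarms.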

Advanced

The underlying Lightbend Telemetry metric is akka_actor_mailbox_size, which is a histogram. The growth model uses the median of the metric. For worst-case scenarios it may be appropriate to use a higher quantile, such as the 95th percentile.
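To see why a higher quantile can matter, consider a small Python sketch with made-up mailbox sizes. The percentile helper is hypothetical and uses the nearest-rank definition; it does not reproduce how Lightbend Telemetry or Prometheus computes quantiles from histogram data.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile: the smallest value with at least q percent
    of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


# Illustrative data: 18 healthy actors and 2 that are falling behind.
mailbox_sizes = [2] * 18 + [400, 500]
median_size = percentile(mailbox_sizes, 50)  # dominated by the healthy majority
p95_size = percentile(mailbox_sizes, 95)     # exposes the struggling actors
```

With this data the median stays at 2 while the 95th percentile is 400, so a growth model on the median would not react to the two backed-up actors.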

akka_processing_time

Akka actors take one message at a time from their mailbox and “process” that message before taking the next one. The akka_processing_time monitor warns if that processing time is unusual compared to previous performance. The warning occurs if processing time rises or drops more than two standard deviations from the average processing time. By the empirical rule, about 95% of values from a normally distributed metric fall within two standard deviations of the mean, so a value outside that band is unlikely under normal operation.

Failure Examples

TBD

Suggested Actions

TBD

Implementation

This is an SMA (simple moving average) monitor based on the measure of how much time actors take to process messages, as provided by the Lightbend Telemetry metric akka_actor_processing_time_ns.

This underlying metric is typically aggregated by actor class.

Tuning

Basic

There are two parameters that can be tuned in the SMA model. The first is the window size for estimating the running mean and standard deviation. The default value is 15 minutes. Shorter windows make the estimate of the mean more sensitive to transient effects. The second parameter is the multiplier of the standard deviation. The default value is 2. Larger values cause the model to accept larger deviations from the running mean as normal.
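The check these two parameters control can be sketched as follows. This is an illustrative Python sketch, not the monitor's implementation: the function name sma_is_anomalous is hypothetical, the history argument stands for the samples that fall inside the configured window, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev
from typing import Sequence


def sma_is_anomalous(
    history: Sequence[float],  # samples inside the window (15 minutes by default)
    current: float,            # latest observed value, e.g. median processing time in ns
    multiplier: float = 2.0,   # number of standard deviations treated as normal
) -> bool:
    """True if current lies more than multiplier standard deviations
    above or below the mean of history."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = pstdev(history)
    return abs(current - mu) > multiplier * sigma
```

For example, with history [100, 102, 98, 101, 99] the mean is 100 and the standard deviation is about 1.41, so a value of 103 is flagged with the default multiplier of 2 but accepted with a multiplier of 3.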

Advanced

The underlying Lightbend Telemetry metric is akka_actor_processing_time_ns, which is a histogram. The SMA model uses the median of the metric. For worst-case scenarios it may be appropriate to use a higher quantile, such as the 95th percentile.

akka_http_server_response_time

Akka HTTP servers take remote requests, initiate some work, and then respond with an HTTP code to the remote requestor. The akka_http_server_response_time monitor warns if that time is anomalous compared to the average response time, in a similar manner to the akka_processing_time monitor.

Failure Examples

TBD

Suggested Actions

TBD

Implementation

This is an SMA monitor based on the measure of the time it takes for the server to respond, as provided by the Lightbend Telemetry metric akka_http_http_server_response_time_ns.

Tuning

Basic

There are two parameters that can be tuned in the SMA model. The first is the window size for estimating the running mean and standard deviation. The default value is 15 minutes. Shorter windows make the estimate of the mean more sensitive to transient effects. The second parameter is the multiplier of the standard deviation. The default value is 2. Larger values cause the model to accept larger deviations from the running mean as normal.

Advanced

The underlying Lightbend Telemetry metric is akka_http_http_server_response_time_ns, which is a histogram. The SMA model uses the median of the metric. For worst-case scenarios it may be appropriate to use a higher quantile, such as the 95th percentile.

akka_http_client_response_time

Akka applications may communicate with remote HTTP servers by making requests and waiting for a response. The akka_http_client_response_time monitor warns if that waiting time is anomalous compared to the average wait time, in a similar manner to the akka_processing_time monitor.

Failure Examples

TBD

Suggested Actions

TBD

Implementation

This is an SMA monitor based on the measure of the client wait time, as provided by the Lightbend Telemetry metric akka_http_http_client_http_client_service_response_time_ns.

Tuning

Basic

There are two parameters that can be tuned in the SMA model. The first is the window size for estimating the running mean and standard deviation. The default value is 15 minutes. Shorter windows make the estimate of the mean more sensitive to transient effects. The second parameter is the multiplier of the standard deviation. The default value is 2. Larger values cause the model to accept larger deviations from the running mean as normal.

Advanced

The underlying Lightbend Telemetry metric is akka_http_http_client_http_client_service_response_time_ns, which is a histogram. The SMA model uses the median of the metric. For worst-case scenarios it may be appropriate to use a higher quantile, such as the 95th percentile.

akka_http_server_5xx

Akka HTTP server endpoints may receive requests that cause crashes or otherwise result in 5xx-class HTTP responses. The akka_http_server_5xx monitor warns if such 5xx errors are observed continuously for 5 minutes.

Failure Examples

TBD

Suggested Actions

TBD

Implementation

This is a threshold model based on the rate of 5xx errors per second, as provided by the Lightbend Telemetry metric akka_http_http_server_responses_5xx_rate.

Tuning

Basic

The default value of the threshold is 0. Although this can be changed, 0 is probably the only meaningful value.
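A threshold model with a sustained-breach requirement can be sketched in Python as below. The function name sustained_above and the sample layout are hypothetical, and the 5-minute window is taken from the monitor description above; the actual evaluation logic of the monitor may differ.

```python
from typing import List, Sequence, Tuple


def sustained_above(
    samples: Sequence[Tuple[float, float]],  # (timestamp in seconds, 5xx rate per second)
    threshold: float = 0.0,                  # default: any 5xx response counts
    window_s: float = 300.0,                 # breach must persist for 5 minutes
) -> List[bool]:
    """For each sample, report whether the rate has stayed above the
    threshold continuously for at least window_s seconds."""
    alerts = []
    breach_start = None  # timestamp at which the current breach began
    for t, rate in samples:
        if rate > threshold:
            if breach_start is None:
                breach_start = t
        else:
            breach_start = None  # any sample at or below the threshold resets the window
        alerts.append(breach_start is not None and t - breach_start >= window_s)
    return alerts
```

With a threshold of 0, a single scrape interval without 5xx responses resets the window, so only a persistent error condition produces a warning.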