View metrics

Kalix projects include a built-in metrics dashboard as part of the Control Tower in the Kalix Console, available out of the box. It shows metrics such as "requests per second", the "replicas" of a service, or the "commands received" by a component. The data is grouped into seven categories: Services, Event Sourced Entities, Value Entities, Actions, Views, Workflows, and Subscriptions. The following is an example of a section within the dashboard.

dashboard control tower metrics screenshot

Two filters at the top of the dashboard allow filtering by service or by endpoint.

dashboard control tower metrics filters screenshot

Filtering by "Service" leaves only the selected service(s) in the Services and component sections. In addition, the second filter is reloaded with only the "Endpoints/Methods" belonging to the selected service(s).

"Endpoints/Methods" refers to one thing viewed from different perspectives. An endpoint is created by defining a method inside your service. With a Kalix protobuf service via an rcp description and with a Kalix Java service via @PostRequest, @PatchRequest or equivalent. The values in "Endpoints/Methods" filter come from the name of your protobuf definition or Java method. For example, the following class defines and endpoint with the method increaseBy:

public class ... {

  @PostMapping("/counter/{counter_id}/increase")
  public Effect<Number> increaseBy(@RequestBody Number increaseBy) {
    ...
  }
}

This creates an entry increaseBy in the "Endpoints/Methods" filter.

Filtering by "Endpoints/Methods" only affects the "Services" category.

Categories

Services

Successful Requests: Rate of successful requests per second (reqs/s) over time, by endpoint. This is calculated with irate.

Failed Requests: Rate of requests (reqs/s) that raised an error when processing the request, over time, by endpoint. This is calculated with irate.
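To make the irate reference more concrete, here is a minimal sketch, assuming irate refers to the PromQL function of that name, which derives a per-second rate from the last two samples of a counter. The class and method names are hypothetical and this is not Kalix or Prometheus code; counter resets are deliberately ignored to keep the sketch short.

```java
// Hypothetical illustration of an irate-style calculation; not Kalix or Prometheus code.
class InstantRate {

    // Per-second rate computed from the LAST TWO samples of an increasing counter.
    // timestampsSeconds[i] is the time (in seconds) at which counterValues[i] was read.
    // Counter resets (a value lower than the previous one) are not handled in this sketch.
    static double irate(double[] timestampsSeconds, double[] counterValues) {
        int n = counterValues.length;
        if (n < 2 || timestampsSeconds.length != n) {
            throw new IllegalArgumentException("need at least two aligned samples");
        }
        double elapsed = timestampsSeconds[n - 1] - timestampsSeconds[n - 2];
        if (elapsed <= 0) {
            throw new IllegalArgumentException("timestamps must be strictly increasing");
        }
        double increase = counterValues[n - 1] - counterValues[n - 2];
        return increase / elapsed;
    }

    public static void main(String[] args) {
        // Assumed sample data: a request counter read every 15 seconds.
        double[] times = {0, 15, 30};
        double[] requests = {100, 130, 190};
        System.out.println(irate(times, requests)); // prints 4.0
    }
}
```

Because only the two most recent samples are used, the resulting graph reacts quickly to short spikes, at the cost of being noisier than an average over a longer window.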

Processing time distribution (seconds): Number of requests grouped by processing duration, by endpoint. This duration only includes processing inside the Kalix service. It does not include the time between the client and the runtime for the request that generated the command, i.e., it does not include latency.

Example of how this gauge gets populated: a call with a duration of 0.05 seconds increases the counter of the bucket '0.05' and of every other bucket with a greater duration.
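The bucket behaviour described above can be sketched as follows. This is a minimal illustration of cumulative buckets, not the Kalix implementation; the class name and the bucket boundaries are assumptions chosen for the example.

```java
import java.util.Arrays;

// Hypothetical illustration of cumulative histogram buckets; not Kalix code.
class CumulativeHistogram {
    // Upper bound (in seconds) of each bucket, in ascending order. Assumed values.
    private final double[] upperBounds;
    // counts[i] is the number of calls whose duration is <= upperBounds[i].
    private final long[] counts;

    CumulativeHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length];
    }

    // Record one call: every bucket whose upper bound is at least
    // the call duration is incremented.
    void record(double durationSeconds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (durationSeconds <= upperBounds[i]) {
                counts[i]++;
            }
        }
    }

    long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        CumulativeHistogram histogram = new CumulativeHistogram(0.01, 0.05, 0.1, 0.5);
        histogram.record(0.05);
        // Buckets 0.05, 0.1 and 0.5 are incremented; bucket 0.01 is not.
        System.out.println(Arrays.toString(histogram.counts())); // prints [0, 1, 1, 1]
    }
}
```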

Processing time distribution: Number of calls that fall into each processing time bucket over time, by endpoint, i.e., a histogram of processing time over time.

Instances: Number of running instances of the service.

Version: The service incarnation number, a single value that always increases. E.g., for a service deployed three times, the value would be 3.

Data ops (reads/writes): Total number of reads from the DB by any Kalix component of the selected service(s) and endpoint(s)/method(s). Total number of writes to the DB by any Kalix component of the selected service(s) and endpoint(s)/method(s).

Event Sourced Entities

Commands received: Rate of commands received per second over time.

Stored events: Rate of events stored per second over time.

Data ops (reads/writes): Total number of reads when loading the latest snapshot and the subsequent events from the DB. Total number of writes when persisting the events or the snapshots generated by the entity.

Processing time quantiles: Quantiles (50, 95 and 99) for the processing time of the commands. This duration only includes processing by the entity. It does not include the time between the client and the runtime for the request that generated the command, i.e., it does not include latency.
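As an illustration of what the 50, 95 and 99 quantiles express, the following sketch computes them from a list of processing times using the nearest-rank method. The source does not say how the dashboard derives its quantiles, so the method, the class name and the sample values are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical illustration of quantiles over processing times; not Kalix code.
class ProcessingTimeQuantiles {

    // Nearest-rank quantile: the smallest sample such that at least
    // a fraction q of all samples are less than or equal to it.
    static double quantile(double[] samplesSeconds, double q) {
        if (samplesSeconds.length == 0 || q <= 0 || q > 1) {
            throw new IllegalArgumentException("need samples and 0 < q <= 1");
        }
        double[] sorted = samplesSeconds.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Assumed processing times (seconds) of ten commands.
        double[] durations = {0.02, 0.01, 0.03, 0.05, 0.04, 0.20, 0.06, 0.08, 0.07, 0.50};
        System.out.println("p50 = " + quantile(durations, 0.50)); // prints p50 = 0.05
        System.out.println("p95 = " + quantile(durations, 0.95)); // prints p95 = 0.5
        System.out.println("p99 = " + quantile(durations, 0.99)); // prints p99 = 0.5
    }
}
```

In this sample, half of the commands finish within 0.05 seconds, while the 95 and 99 quantiles are dominated by the single slow command, which is why the higher quantiles are useful for spotting outliers.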

Value Entities

Commands received: Number of commands per second over time.

Data ops (reads/writes): Total number of reads when loading the entity's state from the DB. Total number of writes when persisting its state in the DB.

Processing time distribution (seconds): Same as described under Services.

Processing time quantiles: Quantiles (50, 95 and 99) for the processing time of the commands. This duration only includes processing by the entity. It does not include the time between the client and the runtime for the request that generated the command, i.e., it does not include latency.

Actions

Message received: Number of messages per second over time.

Processing time quantiles: Quantiles (50, 95 and 99) for the processing time of the messages. This duration only includes processing by the action. It does not include the time between the client and the runtime for the request that generated the message, i.e., it does not include latency.

Views

Data ops (reads/writes): Total number of reads when loading the rows of the view from the DB. Total number of writes when upserting the rows of the view in the DB.

Workflows

Commands received: Number of commands per second over time.

Data ops (reads/writes): Total number of reads when loading the latest snapshot and the subsequent events from the DB. Total number of writes, by workflow, when persisting the events or the snapshots generated by the workflow.

Processing time quantiles: Quantiles (50, 95 and 99) for the processing duration. This duration only includes processing inside the workflow. It does not include the time between the client and the runtime for the request that generated the command, i.e., it does not include latency.

Subscriptions

Events consumption lag: Quantiles (50, 95, and 99) over time of the lag in consuming events through a subscription. A subscription can be to an entity or to a topic, so the data is grouped by entity or topic.

Instances: Number of running instances of the subscription.

Events consumed: The processing rate of events consumed by a subscription. This is calculated with irate.