The Importance of Observability

Over the past eight years, Lightbend has seen first-hand the importance of observability when developing and managing reactive and streaming applications. Distributed systems are, by nature, unpredictable. Despite our best efforts, failures and performance bottlenecks in such systems are inevitable and can be difficult to isolate. In this complex environment, deep visibility into the behavior of your applications is critical for software development teams.

The Costs of Deferred Observability

Without deep observability, it is natural to make assumptions about production system behavior, including what we think may be potential performance bottlenecks or failure scenarios. When failures do occur, we are often in the dark as to both the cause and the impact of potential fixes. This leads to wasted time and effort, jumping from one theory to another and from one change to another without fully understanding the impact on the system. If customers are impacted, the cost of this guesswork to the business can escalate quickly. Historically, this has been the point at which Lightbend often receives an urgent call for help.

In production, while Kubernetes can help to recover from some failures, there are many scenarios that can cause a system to run sub-optimally or to fail continuously. Even when service availability is maintained, performance bottlenecks can trigger premature auto-scaling and, with it, excessive use of costly cloud computing resources. Indeed, there are cases where the first sign of failure in a non-observable system is the cloud computing bill.

Day 1 Observability

Observability is about bringing visibility into a system: turning the lights on to see and understand the state of each component, with context to aid debugging and performance tuning. While traditional monitoring systems may have been the realm of operations engineers, today’s cloud-native applications must be developed with observability at the core of their design. Today, observability is a day 1 developer concern.

Building observable systems requires understanding the many ways in which they can fail. While it’s tempting to “just measure everything”, this creates an excess of information that drives up infrastructure costs without yielding any further insight into the system’s behavior. Knowing what to measure and how to measure it is critical.

Repeated iterations of testing and observing are critical to understanding the many ways in which failures can occur. Over the past eight years, Lightbend, working closely with our customers and partners, has been developing and managing reactive and streaming applications. This knowledge and experience directly contributed to the design of Lightbend Platform’s observability features, which allow you to take advantage of Lightbend’s established knowledge base.

Observability and Lightbend Platform

While logs, metrics and traces are critical, on their own they are not enough to provide visibility into a system. Observability requires combining this data with rich context to create an understanding that supports debugging and performance tuning. On day 1 with Lightbend Platform, developers are able to see the impact of their changes on system performance using Lightbend Telemetry and the Developer Sandbox or Lightbend Console:

  • Lightbend Telemetry provides deep visibility into Lightbend Platform in the form of events, metrics and distributed tracing for components such as Play, Akka HTTP, Lagom, Akka Actors, Akka Streams, Akka Cluster and Akka Persistence. This helps to answer questions such as “What is the message mailbox time for an actor?”, “What is the distribution of sharded entities in my Akka Cluster?” and “What part of my Akka Stream has the worst latency?”. A custom API is included for instrumenting domain-specific KPIs, such as the number of orders processed.

  • The Developer Sandbox provides a Docker Compose-based observability toolkit for developers to gain insights into their applications in real time as they develop them.

  • Lightbend Console, which is ready to run on Kubernetes, provides a combination of deep insights and rich context to enable you to more quickly debug and tune applications with dashboards that are focused on Lightbend Platform features, such as Lightbend Pipelines.
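
To make two of the measurements mentioned above more concrete, here is a minimal sketch in plain Java of a domain-specific KPI counter (orders processed) and a mailbox-time calculation (the time a message waits between being enqueued and being picked up). This is not the Lightbend Telemetry API: the names KpiSketch, KpiRegistry, Envelope and the metric name "orders_processed" are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class KpiSketch {

    // Hypothetical in-memory registry of named counters.
    // Illustration only; NOT the Lightbend Telemetry API.
    static final class KpiRegistry {
        private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

        // Increment the named counter, creating it on first use.
        void increment(String name) {
            counters.computeIfAbsent(name, k -> new LongAdder()).increment();
        }

        // Current value of the named counter, or 0 if it was never incremented.
        long value(String name) {
            LongAdder adder = counters.get(name);
            return adder == null ? 0L : adder.sum();
        }
    }

    // Hypothetical message wrapper that remembers when it was enqueued,
    // so the time spent waiting in a mailbox can be derived on dequeue.
    static final class Envelope<T> {
        final T payload;
        final long enqueuedAtNanos;

        Envelope(T payload, long enqueuedAtNanos) {
            this.payload = payload;
            this.enqueuedAtNanos = enqueuedAtNanos;
        }

        // Mailbox time = dequeue timestamp minus enqueue timestamp.
        long mailboxTimeNanos(long dequeuedAtNanos) {
            return dequeuedAtNanos - enqueuedAtNanos;
        }
    }

    public static void main(String[] args) {
        KpiRegistry registry = new KpiRegistry();

        // Simulate a message entering a mailbox and later being processed.
        Envelope<String> message = new Envelope<>("order-1", System.nanoTime());
        long waitedNanos = message.mailboxTimeNanos(System.nanoTime());

        // Record the domain-specific KPI once the order has been handled.
        registry.increment("orders_processed");

        System.out.println("orders_processed = " + registry.value("orders_processed"));
        System.out.println("mailbox time (ns) = " + waitedNanos);
    }
}
```

In a real application these values would be emitted by the instrumentation and exported to a monitoring backend rather than printed. The sketch only shows what is being measured.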

Lightbend Platform’s observability features do not stand on their own. They work together with your in-house monitoring setup to provide deep visibility into the behavior of your Lightbend Platform applications. A wide range of integrations is offered so that Lightbend Platform can be monitored with your tooling of choice. These integrations include Prometheus, Elasticsearch, Grafana, Datadog, New Relic, Kibana, SLF4J, StatsD and JMX, as well as OpenTracing integrations for Zipkin and Jaeger.