Introduction

The Cinnamon sandbox provides a quick way to get monitoring running in a development environment with our complete feature set, but what about production? While monitoring in development is great for debugging and testing applications, production support is ultimately the primary goal.

The question most users ask is “how do I take the Lightbend monitoring experience to production?” In this document, we address that question by showing that our 10-minute install experience is a microcosm of a larger ecosystem designed in the spirit of our other tools, such as Scala (a scalable language) and Akka (distributed by default). Taken together with this document, the 10-minute sandbox experience becomes a roadmap to production monitoring.

What is monitoring at scale?

Most people think of monitoring as a set of tools and practices by which we track, measure, and maintain software applications. Developers with little monitoring experience often add the unrealistic expectation that it should simply plug into or observe an existing system, require minimal effort to maintain, and yet provide extensive feedback. The reality is that both of these understandings are naive and will ultimately lead to disappointing results.

Complexities of monitoring at scale

This document assumes the reader has a basic understanding of the inherent differences between monolithic and distributed application architectures and recognizes that a “one-size-fits-all” monitoring solution does not work in a distributed environment. As such, we will not discuss the rationale for our monitoring approach; rather, in this section, we will focus on some fundamental challenges and resolutions in monitoring distributed applications.

Distributed applications are inherently challenging to build and require tuning to their particular use cases. It follows that monitoring these applications presents its own set of challenges. As mentioned above, “one-size-fits-all” just doesn’t work, especially in monitoring. On the other hand, we don’t want to roll a custom solution for every system we build, so we need to find the middle ground that is “good enough.” The following two features embody this idea within our monitoring solution and provide the springboard from which our 10-minute install extends to a scalable production environment:

  • Driven by configuration
  • Distributed by default

Each of these features captures both the complexity of monitoring at scale and how we resolve it. Let’s look at them individually.

Driven by configuration

First, we have driven by configuration. Configuration-driven behavior is nothing new and is part of many toolkits in use today. While it can be challenging to find the right combination of settings, the result is well worth the effort, because it lets us tune the monitoring solution to our specific needs. We cannot overstate its importance: optimizing configuration is crucial to the scaling and efficiency of any monitoring deployment.
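
For example, in an Akka or Lagom application the monitoring behavior lives in the same application.conf (HOCON) file that drives the rest of the system. The excerpt below is an illustrative sketch only: the exact cinnamon.* keys and available reporters should be verified against the Cinnamon documentation for the version in use.

    # application.conf -- illustrative sketch; verify keys against the Cinnamon docs
    cinnamon.akka {
      actors {
        # Instrument the named top-level actors and report metrics per class
        "/user/*" {
          report-by = class
        }
      }
    }

    # Forward the collected metrics to a reporting backend
    # ("console-reporter" stands in for whichever backend you actually use)
    cinnamon.chmetrics {
      reporters += "console-reporter"
    }

Changing what is instrumented, how it is aggregated, and where it is reported then becomes a matter of editing configuration rather than code.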

Distributed by default

If we are monitoring applications that scale (i.e. distributed applications), then the monitoring tool itself must scale (be a distributable application). From our perspective, that means the monitoring tool is capable of being elastic and resilient. When monitoring an Akka or Lagom application, Lightbend Monitoring benefits by default from the implicit scaling of these toolkits.

It’s important to note, however, that much of this ability to scale depends on configuration settings, just as it does for the underlying application. If the configuration is poorly tuned for the particular use case, monitoring can run into the same challenges any distributed application might face. Lightbend Monitoring adds some overhead, and that overhead can increase significantly when the configuration is not optimized. For example, if we decide to monitor every actor in a system that spins up millions of temporary or anonymous ones, we could incur significant overhead that adversely affects not only the monitoring but the monitored application as well.
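
As a hedged sketch of how to avoid that scenario, the configuration below narrows instrumentation to long-lived, named actors and aggregates short-lived workers by class instead of per instance, so millions of temporary actors never become millions of individual metric streams. The actor paths ("/user/payment-supervisor", "/user/workers") are hypothetical, and the keys should again be checked against the Cinnamon documentation.

    # Illustrative sketch: narrow the instrumentation scope
    cinnamon.akka {
      actors {
        # A long-lived, named actor: report metrics per instance
        "/user/payment-supervisor" {
          report-by = instance
        }
        # Short-lived workers: aggregate per actor class so that
        # temporary or anonymous actors do not each become a metric stream
        "/user/workers/*" {
          report-by = class
        }
      }
    }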

Use case optimization

The two features above share a common theme: optimization. While monitoring provides powerful insight into the behavior of an application or system, how efficiently it does so depends on how well the configuration is optimized for the given use case.

When viewed and implemented from this use-case perspective, monitoring becomes the “glue” that ties the business value of an application to the metrics and events the application or system generates. In plain terms, monitoring turns those inputs into a measurable user experience, and in doing so helps us identify the right customers of the monitoring solution: both the business and IT.

From a business perspective

The measurable user experience that monitoring provides gives the company valuable feedback on whether the system or application is delivering what customers want. The reality is that the monitoring system and your application exist for one reason: to support the business and sustain its continued operation. It is therefore key that the monitoring solution tracks and provides application and system data that allows the company to make wise product and technology investments and to determine the value of IT deliveries.

From an IT perspective

From an IT perspective, the measurable user experience ultimately determines the quality of service (QoS). Monitoring drives the ability to determine QoS by letting IT know the state of the application or system environment and by providing the means to detect, diagnose, and respond to problems and faults as they arise. Without this information, IT would be running blind, unable to take the appropriate steps to maintain a reliable QoS at the required level.