PagerDuty is a commercial incident management platform that orchestrates incident response on behalf of customers, employees, and the business. It supports grouping, triage, alerting, and collaboration on incidents, with granular reporting on resolution. PagerDuty integrates with email, provides a REST API over HTTP, and has an agent for command-line style integration. It also has a sizable extensions library for working with external tools such as Ansible, Slack, Flowdock, and Google, to name a few.
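To make the REST integration concrete, here is a minimal Python sketch that builds and posts a "trigger" event. It assumes the Events API v2 endpoint and field names (routing_key, event_action, payload) as publicly documented by PagerDuty; the helper names build_trigger_event and send_event are hypothetical, and the routing key is a placeholder you would replace with your own integration key. Treat it as an illustration to check against the current PagerDuty documentation, not as a reference client.

```python
import json
import urllib.request

# Assumed Events API v2 endpoint; verify against PagerDuty's documentation.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="critical"):
    """Return the JSON-serializable body for a 'trigger' event (hypothetical helper)."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(ALLOWED_SEVERITIES)}")
    return {
        "routing_key": routing_key,   # integration key for the target service
        "event_action": "trigger",    # open a new alert
        "payload": {
            "summary": summary,       # human-readable description
            "source": source,         # host or component that raised it
            "severity": severity,
        },
    }


def send_event(event, url=EVENTS_URL, timeout=10):
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the payload separately from sending it keeps the event shape easy to inspect and test without touching the network; only send_event needs a real routing key.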

Riemann notifications

Riemann handles notification through its stream processing. Everything is a stream with a start (which could be an aggregation) and an end, so a notification is simply one more stream that events flow into. The best way to understand Riemann’s support for notification is an example from its docs.

Here’s a simple stream which sends critical events from any service beginning with “lagom” to an email address (left blank here):

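  ; Predicate: service name starts with "lagom" and state is "critical".
  (where (and (service #"^lagom")
              (state "critical"))
    ; Child stream: matching events are passed to email.
    (email ""))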

The first argument to where is the predicate expression, which where uses to test each event. Events that match the predicate are passed to each of where’s children; events that do not match are dropped and nothing happens. In this case there is one child: the email stream, which sends the notification.
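The predicate-and-children behavior of where can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not Riemann's implementation: events are plain dicts, the function names are hypothetical, and the email child is replaced by a stand-in that records events in a list.

```python
import re


def where(predicate, *children):
    """Build a stream that forwards matching events to every child stream."""
    def stream(event):
        if predicate(event):
            for child in children:
                child(event)
        # Non-matching events are ignored: nothing happens.
    return stream


def lagom_critical(event):
    """Predicate: service starts with 'lagom' and state is 'critical'."""
    service = event.get("service") or ""
    return re.search(r"^lagom", service) is not None and event.get("state") == "critical"


# Stand-in for the email stream: record events instead of sending mail.
notified = []
stream = where(lagom_critical, notified.append)

stream({"service": "lagom-api", "state": "critical"})  # matches, forwarded
stream({"service": "lagom-api", "state": "ok"})        # wrong state, dropped
stream({"service": "billing", "state": "critical"})    # wrong service, dropped
```

After these three calls, only the first event has reached the child stream.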

Prometheus alerts

While Prometheus is far more than an alerting system, alerting is one of its high points. Prometheus alerting allows the user to define alert conditions in an expression language and fires the resulting alerts to an external service. The ability to define alert expressions is significant because, depending on the expression, it can cut down on “alert fatigue.”
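One way an alert condition can reduce fatigue is by requiring the condition to hold for some time before the alert fires, which is the idea behind the "for" clause in Prometheus alerting rules and its inactive, pending, and firing states. The Python sketch below models that idea only; it is not Prometheus rule syntax, and the class name, threshold, and timings are assumptions chosen for illustration.

```python
class AlertRule:
    """Toy model of an alert that fires only after its condition persists."""

    def __init__(self, name, condition, hold_for):
        self.name = name
        self.condition = condition    # callable: sample value -> bool
        self.hold_for = hold_for      # seconds the condition must stay true
        self._active_since = None     # time the condition first became true

    def evaluate(self, sample, now):
        """Return 'inactive', 'pending', or 'firing' for this evaluation."""
        if not self.condition(sample):
            self._active_since = None  # condition cleared: reset the timer
            return "inactive"
        if self._active_since is None:
            self._active_since = now
        if now - self._active_since >= self.hold_for:
            return "firing"
        return "pending"


# Hypothetical rule: error ratio above 5% for at least 120 seconds.
rule = AlertRule("HighErrorRate", lambda ratio: ratio > 0.05, hold_for=120)

states = [
    rule.evaluate(0.10, now=0),    # condition just became true
    rule.evaluate(0.10, now=60),   # still true, not long enough
    rule.evaluate(0.10, now=120),  # held for 120 seconds
    rule.evaluate(0.01, now=180),  # recovered
]
```

A brief spike that clears before the hold time elapses never reaches the firing state, so nobody is paged for it.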