Cloudflow (formerly Pipelines) Kafka

These monitors are used by Lightbend Pipelines to track the health of Kafka clients. They are not meant for monitoring any Kafka workloads other than Lightbend Pipelines applications. The metrics come from the Kafka Java Consumer and Kafka Java Producer libraries, via jmx_exporter, and describe message-handling throughput and consumer lag. More details are available in the Lightbend Pipelines documentation.

pipelines_kafka_consumer_throughput

Kafka consumers take batches of messages from brokers, process them, then take the next batch in a continuous loop. Throughput for a consumer is the average number of messages read per second. The pipelines_kafka_consumer_throughput monitor warns if that throughput is unusual compared to previous throughput. The warning fires if throughput rises or drops by more than three standard deviations from the average; by the empirical rule, roughly 99.7% of values in a normal distribution fall within three standard deviations of the mean, so a value outside that range is very likely anomalous.
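
The check can be pictured as a simple three-sigma test against a window of recent rates. The Scala sketch below is illustrative only; the window handling and names are assumptions, not Console's actual implementation.

```scala
object ThroughputAnomaly {
  /** True if `latest` is more than three standard deviations away from
    * the mean of a recent window of per-second rates. */
  def isAnomalous(window: Seq[Double], latest: Double): Boolean = {
    val mean   = window.sum / window.size
    val stdDev = math.sqrt(window.map(r => math.pow(r - mean, 2)).sum / window.size)
    stdDev > 0 && math.abs(latest - mean) > 3 * stdDev
  }

  // Example: a steady ~100 msg/s suddenly jumping to 250 msg/s is flagged.
  val example: Boolean = isAnomalous(Seq(100.0, 102.0, 98.0, 100.0), 250.0) // true
}
```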

The Kafka Java Consumer provides per-partition throughput as kafka_consumer_consumer_fetch_manager_metrics_records_consumed_rate, which Console aggregates per topic into kafka_consumer_topic_consumed_rate. This total per-topic throughput is the input to this monitor.
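
Conceptually, the per-topic value is the sum of the per-partition rates. A minimal Scala sketch of that aggregation, using illustrative types rather than Console's internal ones:

```scala
object TopicThroughput {
  // Illustrative representation of one per-partition rate sample.
  final case class PartitionRate(topic: String, partition: Int, recordsPerSecond: Double)

  // Sum the per-partition consumed rates into a single rate per topic.
  def consumedRatePerTopic(rates: Seq[PartitionRate]): Map[String, Double] =
    rates
      .groupBy(_.topic)
      .map { case (topic, perPartition) => topic -> perPartition.map(_.recordsPerSecond).sum }
}
```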

pipelines_kafka_producer_throughput

Kafka producers send batches of messages to brokers. Throughput for a producer is the average number of messages sent per second. The pipelines_kafka_producer_throughput monitor warns if that throughput is anomalous compared to average throughput, in a similar manner to the pipelines_kafka_consumer_throughput monitor.

The Kafka Java Producer provides per-partition throughput as kafka_producer_producer_metrics_record_send_rate, which Console aggregates per topic into kafka_producer_topic_send_rate as the input to this monitor.

pipelines_kafka_consumer_lag

Kafka consumers read messages from a broker after some producer has written those messages. “Lag” is the number of messages between a producer’s latest message and what the consumer is currently reading. For example, if a producer has written 100 messages but a consumer has read only 10 of them so far, the lag is 90 messages.
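
In offset terms, lag for a partition is the broker's latest (log end) offset minus the consumer's current position. A tiny Scala sketch of that arithmetic, with illustrative names:

```scala
object ConsumerLag {
  // Lag for one partition, floored at zero.
  def lag(logEndOffset: Long, consumerPosition: Long): Long =
    math.max(0L, logEndOffset - consumerPosition)

  // The example from the text: 100 messages written, 10 read so far.
  val example: Long = lag(logEndOffset = 100L, consumerPosition = 10L) // 90
}
```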

The pipelines_kafka_consumer_lag monitor alerts if that lag trends upward over time, meaning the consumer is not keeping up with the producer.
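
One way to picture "trends upward" is to fit a least-squares line through recent lag samples and flag a sustained positive slope. The Scala sketch below is an illustration under that assumption, not Console's actual detection algorithm; the slope threshold is made up.

```scala
object LagTrend {
  /** Least-squares slope of recent lag samples over time; a sustained
    * positive slope means the consumer is falling further behind.
    * `minSlope` is an arbitrary illustrative threshold. */
  def trendingUp(lagSamples: Seq[Double], minSlope: Double = 1.0): Boolean = {
    val n = lagSamples.size
    if (n < 2) false
    else {
      val xs    = (0 until n).map(_.toDouble)
      val xMean = xs.sum / n
      val yMean = lagSamples.sum / n
      val num   = xs.zip(lagSamples).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
      val den   = xs.map(x => (x - xMean) * (x - xMean)).sum
      num / den > minSlope
    }
  }
}
```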

The Kafka Java Consumer provides per-partition lag as kafka_consumer_consumer_fetch_manager_metrics_records_lag_max, which Console aggregates per topic and client instance into kafka_consumer_topic_lag_max as the input to this monitor.