From batch to streaming - the power of Lightbend Pipelines

In Extracting business value from all available data, we discussed why streaming is essential for maximizing the value extracted from data. Many high-value batch applications are worth migrating to streaming so that useful information is extracted sooner rather than later, even though batch and other "offline" analytics, like data warehousing and machine learning model training, will remain essential parts of a complete environment.

This section provides an overview of how Lightbend Pipelines makes building, deploying, and managing streaming applications as painless as possible. See the Lightbend Pipelines documentation for more details.

When you’re building streaming applications, wouldn’t it be nice to design them at a conceptual level and then actually work at that level of abstraction? That’s the vision of Lightbend Pipelines. The following screen shot, taken from a running application, shows the idea:

Pipelines Runtime User Interface

This is the runtime monitoring view provided by Pipelines. When you create the application, you think in terms of a blueprint that defines how streamlets (the ovoid shapes) are wired together. A streamlet is an encapsulated unit of functionality that you implement to process streaming data.
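Conceptually, a streamlet is a named transformation over a stream of records, and a blueprint composes those transformations. The plain-Scala sketch below models that idea using only the standard library; it is not the Pipelines API, and the names `Streamlet` and `connect` are hypothetical:

```scala
// A plain-Scala model of the streamlet idea (NOT the Pipelines API):
// a streamlet encapsulates one transformation over a stream of records.
object StreamletModel {
  // Model a streamlet as a named function from an input stream to an output stream.
  final case class Streamlet[In, Out](name: String, logic: Iterator[In] => Iterator[Out])

  // Wiring two streamlets together corresponds to composing their logic.
  def connect[A, B, C](up: Streamlet[A, B], down: Streamlet[B, C]): Streamlet[A, C] =
    Streamlet(s"${up.name} -> ${down.name}", up.logic.andThen(down.logic))
}
```

For example, wiring a parsing streamlet into a doubling streamlet yields a single composed pipeline:

```scala
import StreamletModel._
val parse  = Streamlet[String, Int]("parse", _.map(_.trim.toInt))
val double = Streamlet[Int, Int]("double", _.map(_ * 2))
val wired  = connect(parse, double)
wired.logic(Iterator(" 1", "2")).toList  // List(2, 4)
```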

This image and the next reflect one of the Pipelines example applications, which simulates the processing of call detail records (CDRs) used in telecom systems. Here’s a wire diagram of this blueprint with a little more detail:

Pipelines Application Blueprint

Working from the left, each of the three streamlets ingests a source of CDRs (simulated in the example app). Their output is merged in the next streamlet, then sent to a streamlet that parses, validates, and transforms the records. Errors from the parsing step are logged, while good records are sent to an aggregation streamlet that calculates various statistics over the moving data and sends the results downstream.
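The parse-and-validate step splits the stream into errors (logged) and good records (sent downstream). The Scala sketch below illustrates that split with `Either`; the CDR field names here are illustrative, not the example app's actual schema:

```scala
// Hedged sketch of the parse/validate/transform step.
// The field names are illustrative, not the example app's actual CDR schema.
final case class Cdr(caller: String, callee: String, durationSeconds: Long)

def parseCdr(line: String): Either[String, Cdr] =
  line.split(',') match {
    case Array(caller, callee, dur) =>
      dur.trim.toLongOption match {
        case Some(d) if d >= 0 => Right(Cdr(caller.trim, callee.trim, d))
        case _                 => Left(s"invalid duration in: $line")
      }
    case _ => Left(s"malformed record: $line")
  }

// Partition a batch of raw lines: errors would be logged,
// good records would flow on to the aggregation streamlet.
def splitRecords(lines: Iterator[String]): (List[String], List[Cdr]) =
  lines.map(parseCdr).toList.partitionMap(identity)
```

In the real application the same logic runs continuously over the stream rather than over a finite batch, but the error/success split is the same shape.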

In the example code base, the aggregation streamlet is written with Spark Structured Streaming, while the others are written with Akka Streams. When you build the application, Pipelines verifies that the schemas on each end of every connection are compatible, and it instantiates the savepoints, the connections shown as lines between the streamlets. (These are actually automatically generated Kafka topics, but that’s an implementation detail that could change.)
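A blueprint itself is a small configuration file that names the streamlet instances and wires outlets to inlets, and it is these wired connections whose schemas Pipelines checks at build time. The fragment below is only a sketch of that shape: the instance names, class names, and inlet/outlet names are hypothetical, and the exact syntax is defined in the Pipelines documentation.

```hocon
blueprint {
  streamlets {
    // Each entry names a streamlet instance and the class that implements it
    // (class names here are hypothetical).
    cdr-ingress    = sample.CdrIngress
    cdr-merge      = sample.CdrMerge
    cdr-validator  = sample.CdrValidator
    cdr-aggregator = sample.CdrAggregator
  }
  connections {
    // Wire an outlet of one streamlet to the inlets of others;
    // each wire becomes a schema-checked savepoint at build time.
    cdr-ingress.out   = [cdr-merge.in]
    cdr-merge.out     = [cdr-validator.in]
    cdr-validator.out = [cdr-aggregator.in]
  }
}
```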

If you’ve read our ebook, Fast Data Architectures For Streaming Applications, you’ll recall that there are a lot of decisions to make and components to integrate when building streaming applications. Lightbend Pipelines reflects our opinionated view about the best way to solve these challenges and we do most of the hard work for you.

See the Lightbend Pipelines documentation for more details on using Pipelines, including expanded explanations of the concepts briefly mentioned here, like blueprints and streamlets.