Introducing Akka Data Pipelines
Akka Data Pipelines provides operational support and visibility into the health and performance of services and applications built using the Cloudflow open source project, and is available with a Lightbend subscription.
Technologies like mobile, the Internet of Things (IoT), Big Data analytics, machine learning, and others are driving enterprises to modernize how they process large volumes of data. A rapidly growing percentage of that data now arrives in the form of data streams, and a growing share of those streams requires near-real-time processing.
The streaming landscape has been rapidly evolving, with tools like Spark, Flink, and Kafka Streams emerging from the world of large scale batch processing while projects like Reactive Streams and Akka Streams have emerged from the world of application development and high-performance networking.
The demand for availability, scalability, and resilience is forcing streaming architectures to become more like microservice architectures. Conversely, successful organizations building microservices find their data needs grow with their organization while their data sources are becoming more stream-like and more real-time. Hence, there is a unification happening between streaming data and microservice architectures.
It can be quite hard to develop, deploy, and operate large-scale microservices-based systems that can take advantage of streaming data and seamlessly integrate with systems for analytics processing and machine learning. Individual technologies may be well-documented from the development side, but often have little information on deployment and production. This makes combining them into fully integrated unified systems no easy task. Cloudflow aims to make this easier by integrating the most popular streaming frameworks into a single platform for creating and running distributed applications.
Stream processing is a discipline and a set of techniques for extracting information from unbounded data. Streaming applications apply stream processing to provide actionable insights from data as it freshly arrives into the system. The growing popularity of streaming applications is driven by:
- the increasing availability of data from many sources.
- the need of enterprises to speed up their reaction time to that data.
We characterize streaming applications as a connected graph of stream-processing components where each component specializes in a particular task, following the 'right tool for the job' premise. The figure below, An Abstract Streaming Application, generically illustrates an application that processes data events:
The first circle on the left represents an initial stage for capturing or accepting data. This could be an HTTP endpoint to accept data from remote clients, a connection to a Kafka topic, or input from an internal system in an enterprise.
The next circle to the right represents a processing phase that applies some logic to the data, such as business rules, statistical data analysis, or a machine learning model that implements the business aspect of the application. This processing component may add additional information to the event and send it as valid data to an external system or flag the data as invalid and report it.
The final two circles on the right show the two different data output paths, valid and invalid.

Fig. 1 - An Abstract Streaming Application
Each component in the illustration presents different application requirements, scalability concerns, and often different Kubernetes deployment strategies. Such specialized needs make even a simple streaming application non-trivial to develop and deploy. For enterprises with complex use cases, creating streaming data applications that can quickly extract actionable business value is challenging.
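The ingest, process, and route-to-valid-or-invalid stages described above can be sketched in plain Java. This is only a minimal in-memory illustration of the graph's shape: the Event type, the range-check validation rule, and the class names are all hypothetical, and a real Cloudflow application would implement each stage as a separately deployable streamlet connected over Kafka-backed streams rather than as method calls.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical event type: a reading with an identifier and a measured value.
record Event(String id, double value) {}

public class PipelineSketch {

    // Ingest stage: in a real system this would be an HTTP endpoint,
    // a Kafka topic, or an internal enterprise source; here we simply
    // accept an in-memory batch.
    static List<Event> ingest(List<Event> raw) {
        return raw;
    }

    // Processing stage: apply a business rule (here, a hypothetical
    // non-negative range check) and route every event to either the
    // "valid" (true) or "invalid" (false) branch of the graph.
    static Map<Boolean, List<Event>> process(List<Event> events) {
        return events.stream()
                .collect(Collectors.partitioningBy(e -> e.value() >= 0));
    }

    public static void main(String[] args) {
        List<Event> input = List.of(
                new Event("a", 1.5),
                new Event("b", -2.0),
                new Event("c", 0.0));

        // Valid events would flow to a downstream system; invalid ones
        // would be flagged and reported.
        Map<Boolean, List<Event>> routed = process(ingest(input));
        System.out.println("valid:   " + routed.get(true));
        System.out.println("invalid: " + routed.get(false));
    }
}
```

Even in this toy form, the sketch shows why the components have different operational profiles: the ingest stage is network-bound, the processing stage is CPU-bound, and each output path may need its own scaling and delivery guarantees.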
Next, see how Akka Data Pipelines and Cloudflow address streaming data challenges.
This guide last published: 2021-08-11