Riemann is an event stream processor that provides low-latency, shared, transient state for distributed systems. You can think of a Riemann event as an immutable data structure that is sent over the wire and carries data about some aspect of an application, system or service. At the core of Riemann is a store called the index, which represents the current state of every service from which Riemann receives events. Events reach the index through Riemann streams, which are composable and act as a pipeline.
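The relationship between events, streams and the index can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions, not Riemann's API: Riemann itself is configured in Clojure, and the names Event, Index, where and with_state below are hypothetical stand-ins chosen for illustration. The event carries a subset of the fields a Riemann event has (host, service, state, metric).

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Tuple

# Hypothetical model of an event: frozen, so it cannot be mutated in place.
@dataclass(frozen=True)
class Event:
    host: str
    service: str
    state: str
    metric: float

# In this sketch a stream is just a function that accepts one event.
Stream = Callable[[Event], None]

class Index:
    """Keeps only the most recent event for each (host, service) pair."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Event] = {}

    def __call__(self, event: Event) -> None:
        # The index is itself usable as a stream: the terminal pipeline stage.
        self._latest[(event.host, event.service)] = event

    def lookup(self, host: str, service: str) -> Optional[Event]:
        return self._latest.get((host, service))

def where(predicate: Callable[[Event], bool], *children: Stream) -> Stream:
    """Pass an event to the child streams only if the predicate holds."""
    def stream(event: Event) -> None:
        if predicate(event):
            for child in children:
                child(event)
    return stream

def with_state(state: str, *children: Stream) -> Stream:
    """Forward a copy of the event with a new state; the original is untouched."""
    def stream(event: Event) -> None:
        updated = replace(event, state=state)
        for child in children:
            child(updated)
    return stream

# Compose streams into a pipeline that ends at the index.
index = Index()
pipeline = where(lambda e: e.metric > 0.9, with_state("critical", index))

pipeline(Event("web1", "cpu", "ok", 0.95))  # passes the filter, gets indexed
pipeline(Event("web1", "mem", "ok", 0.40))  # filtered out, never indexed
```

Because each stream is a plain function that takes child streams, pipelines are built by nesting calls, and the index only ever holds the latest event per host and service.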
Splunk is a commercial data analytics platform that derives operational intelligence from machine-generated data. It can be used to examine transactions, customer behavior, machine behavior, security threats, fraudulent activity and more. Splunk also provides a Search Processing Language (SPL) for fast querying of all types of machine data.
Apache Spark is an open source cluster computing framework that gives programmers an API built around a data structure known as the resilient distributed dataset (RDD). An RDD is an immutable data structure, spread over a cluster of machines, that is elastic and scalable.
Spark RDDs support interactive, exploratory data analysis as well as iterative algorithms that visit the same RDD repeatedly in a loop. This support for iterative algorithms, in particular the training algorithms used in machine learning systems, is a key feature of Spark.
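The idea of an iterative algorithm visiting an immutable, partitioned dataset in a loop can be illustrated with a toy model in plain Python. This sketch does not use Spark: ToyRDD is a hypothetical in-memory class whose method names (map, reduce, count) merely echo operations that RDDs offer, and the data, learning rate and iteration count are made up for the example. It fits the slope w of a line y = w * x by gradient descent, rereading the same dataset on every pass.

```python
from functools import reduce
from typing import Callable, Iterable

class ToyRDD:
    """Hypothetical stand-in for an RDD: immutable data split into partitions."""

    def __init__(self, partitions: Iterable[Iterable]) -> None:
        # Tuples make the partitions read-only after construction.
        self._partitions = tuple(tuple(p) for p in partitions)

    def map(self, fn: Callable) -> "ToyRDD":
        # Transformations return a new dataset; the original is never changed.
        return ToyRDD(tuple(fn(x) for x in p) for p in self._partitions)

    def reduce(self, fn: Callable):
        # Reduce each non-empty partition, then combine the partial results.
        partials = [reduce(fn, p) for p in self._partitions if p]
        return reduce(fn, partials)

    def count(self) -> int:
        return sum(len(p) for p in self._partitions)

# Made-up (x, y) points lying on y = 2x, split across two partitions.
points = ToyRDD([[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]])
n = points.count()

w = 0.0              # initial guess for the slope
learning_rate = 0.05
for _ in range(200):
    # Each iteration revisits the same immutable dataset with the current w.
    gradient = points.map(
        lambda p, w=w: (w * p[0] - p[1]) * p[0]
    ).reduce(lambda a, b: a + b) / n
    w -= learning_rate * gradient
```

In a real cluster the partitions would live on different machines and the per-partition work would run in parallel; the loop structure, in which every pass rereads the same dataset, is the part this toy is meant to show.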