Akka Cluster Bootstrap

Akka Cluster Bootstrap helps form (or join) a cluster by using Akka Discovery to discover peer nodes. It is an alternative to configuring static seed-nodes in dynamic deployment environments such as Kubernetes or AWS.

It builds on the flexibility of Akka Discovery, leveraging a range of discovery mechanisms depending on the environment you want to run your cluster in.

Prerequisites

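Bootstrap depends on Akka Discovery to find peer nodes and on Akka Management to host the HTTP routes used during the bootstrap process.

A discovery mechanism needs to be chosen. A good default choice is DNS.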

Dependency

Add akka-management-cluster-bootstrap and one or more discovery mechanisms to use for the discovery process.

For example, you might choose to use the DNS discovery and bootstrap extensions:

sbt
libraryDependencies ++= Seq(
  "com.lightbend.akka.management" %% "akka-management-cluster-bootstrap" % "0.20.0",
  "com.lightbend.akka.discovery" %% "akka-discovery-dns" % "0.20.0"
)
Gradle
dependencies {
  compile group: 'com.lightbend.akka.management', name: 'akka-management-cluster-bootstrap_2.12', version: '0.20.0'
  compile group: 'com.lightbend.akka.discovery', name: 'akka-discovery-dns_2.12', version: '0.20.0'
}
Maven
<dependency>
  <groupId>com.lightbend.akka.management</groupId>
  <artifactId>akka-management-cluster-bootstrap_2.12</artifactId>
  <version>0.20.0</version>
</dependency>
<dependency>
  <groupId>com.lightbend.akka.discovery</groupId>
  <artifactId>akka-discovery-dns_2.12</artifactId>
  <version>0.20.0</version>
</dependency>

Using

Akka management must be started as well as the bootstrap process:

Scala
// Akka Management hosts the HTTP routes used by bootstrap
AkkaManagement(system).start()

// Starting the bootstrap process needs to be done explicitly
ClusterBootstrap(system).start()
Java
// Akka Management hosts the HTTP routes used by bootstrap
AkkaManagement.get(system).start();

// Starting the bootstrap process needs to be done explicitly
ClusterBootstrap.get(system).start();

Ensure that seed-nodes is not present in configuration and that start() is called on every node.

The following configuration is required. More details on each setting, as well as additional configuration, can be found in reference.conf:

  • akka.management.cluster.bootstrap.contact-point-discovery.service-name: a name, unique within the deployment environment, for this cluster instance. It is used to look up peers in service discovery. If unset, it is derived from the ActorSystem name.
  • akka.management.cluster.bootstrap.contact-point-discovery.discovery-method: the service discovery mechanism to use (one of those provided by Akka Discovery). If unset, it falls back to the system-wide default from akka.discovery.method.
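As a rough illustration, the two settings above might appear in application.conf as follows. The service name "my-cluster" is a made-up placeholder, and the method identifier akka-dns is an assumption about how the DNS discovery module is named; check the reference.conf of the discovery module you actually use.

```
akka.management.cluster.bootstrap.contact-point-discovery {
  # Hypothetical name; use the name your peers are registered under in service discovery
  service-name = "my-cluster"
  # Assumed identifier for the DNS mechanism; omit to fall back to akka.discovery.method
  discovery-method = akka-dns
}
```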

How It Works

  • Each node exposes an HTTP endpoint /bootstrap/seed-nodes. This is provided by akka-management-cluster-bootstrap and exposed automatically by starting Akka management.
  • During bootstrap, each node queries service discovery repeatedly for the initial contact points until it has found at least as many as configured in contact-point-discovery.required-contact-point-nr.
  • Each node then probes the found contact points’ /bootstrap/seed-nodes endpoint to see if a cluster has already been formed.
    • If there is an existing cluster, it joins the cluster and bootstrapping is finished.
    • If no cluster exists, each node returns an empty list of seed-nodes. In that case, the node with the lowest address from the set of contact points forms a new cluster and starts advertising itself as a seed node.
  • Other nodes will start to see the /bootstrap/seed-nodes of the node that self-joined and will join its cluster.

See full bootstrap process and advanced configuration for more details on the process.
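The decision each node takes after probing can be sketched in plain Java. This is a simplified, hypothetical model, not the library's JoinDecider: the class, enum and method names are invented for illustration, addresses are compared as plain strings, and the node's own address is assumed to be part of the discovered contact points.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the bootstrap decision; not part of Akka Management.
class BootstrapDecisionSketch {

    // What a node does after probing the contact points.
    enum Action { JOIN_EXISTING, FORM_NEW_CLUSTER, KEEP_PROBING }

    // self: this node's address.
    // seedNodesByContactPoint: the seed nodes each contact point returned from
    // /bootstrap/seed-nodes (an empty list means "no cluster known yet").
    static Action decide(String self, Map<String, List<String>> seedNodesByContactPoint) {
        boolean clusterExists = seedNodesByContactPoint.values().stream()
                .anyMatch(seedNodes -> !seedNodes.isEmpty());
        if (clusterExists) {
            return Action.JOIN_EXISTING;
        }
        // No cluster yet: only the node with the lowest address forms a new one.
        // Plain string ordering is an illustrative assumption.
        Optional<String> lowest = seedNodesByContactPoint.keySet().stream()
                .min(Comparator.naturalOrder());
        return lowest.filter(self::equals).isPresent()
                ? Action.FORM_NEW_CLUSTER
                : Action.KEEP_PROBING;
    }

    public static void main(String[] args) {
        Map<String, List<String>> noCluster = Map.of(
                "10.0.0.1", List.of(), "10.0.0.2", List.of(), "10.0.0.3", List.of());
        System.out.println(decide("10.0.0.1", noCluster)); // FORM_NEW_CLUSTER
        System.out.println(decide("10.0.0.2", noCluster)); // KEEP_PROBING

        // Once the lowest node has self-joined, it advertises itself as a seed node.
        Map<String, List<String>> formed = Map.of(
                "10.0.0.1", List.of("10.0.0.1"), "10.0.0.2", List.of(), "10.0.0.3", List.of());
        System.out.println(decide("10.0.0.2", formed)); // JOIN_EXISTING
    }
}
```

Because every node applies the same ordering to the same set of contact points, at most one node decides to form a new cluster, which is why a stable view from service discovery matters.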

Joining Mechanism Precedence

As Akka Cluster allows nodes to join a cluster using several different methods, the precedence of each method is strictly defined and is as follows:

  • If akka.cluster.seed-nodes (in your application.conf) is non-empty, those nodes will be joined and bootstrap will NOT execute even if start() is called; a warning will be logged instead.
  • If an explicit cluster.join or cluster.joinSeedNodes is invoked before the bootstrap completes, that join takes precedence over the bootstrap (but doing so is not recommended, see below).
  • The Cluster Bootstrap mechanism takes some time to complete, but eventually issues a joinSeedNodes.
Warning

It is NOT recommended to mix various joining mechanisms. Pick one mechanism and stick to it in order to avoid any surprises during cluster formation. E.g. do NOT set akka.cluster.seed-nodes if you are going to be using the Bootstrap mechanism.

Deployment considerations

Initial deployment

Cluster Bootstrap will always attempt to join an existing cluster if possible. However, if no other contact point advertises any seed-nodes, a new cluster will be formed by the node chosen by the JoinDecider, whose default implementation sorts the addresses and picks the lowest.

The setting akka.management.cluster.bootstrap.new-cluster-enabled can be used to disable new cluster formation, so that the node is only allowed to join existing clusters.

  • On initial deployment use the default akka.management.cluster.bootstrap.new-cluster-enabled=on
  • Following the initial deployment it is recommended to set akka.management.cluster.bootstrap.new-cluster-enabled=off with an immediate re-deployment once the initial cluster has formed

This provides additional safety during restarts and redeploys while a network partition is present. If new cluster formation is not disabled, an isolated set of nodes could form a new cluster if all of them are restarted.

For complete safety of the initial bootstrap, it is recommended to set contact-point-discovery.required-contact-point-nr to the exact number of nodes with which the cluster will initially be started. For example, if you start a cluster with 4 nodes and later scale it out to many more, set this value to 4 for additional safety of the initial joining, even in the face of a flaky discovery mechanism.
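For the 4-node example above, the initial deployment might use a configuration along these lines. This is a sketch that combines only the settings named in this section; the value 4 is taken from the example and should match your own initial node count.

```
akka.management.cluster.bootstrap {
  # Initial deployment: allow a new cluster to be formed (the default)
  new-cluster-enabled = on

  contact-point-discovery {
    # Wait until all 4 initial nodes are visible before deciding
    required-contact-point-nr = 4
  }
}
```

For the follow-up redeployment, once the initial cluster has formed, new-cluster-enabled would be switched to off.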

Recommended Configuration

When using the bootstrap module, there are some underlying Akka Cluster settings that should be specified to ensure that your deployment is robust.

Since the target environments for this module are dynamic, that is, instances can come and go, failure needs to be considered. The following configuration will result in your application being shut down after 30 seconds if it is unable to join the discovered seed nodes. In this case, the orchestrator (e.g. Kubernetes or Marathon) will restart your node and the operation will (presumably) eventually succeed. You’ll want to specify the following in your application.conf file:

akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 30s
akka.coordinated-shutdown.exit-jvm = on

Rolling updates

Graceful shutdown

Akka Cluster can handle hard failures using a downing provider such as Lightbend’s split brain resolver discussed below. However, this should not be relied upon for regular rolling redeploys. Features such as ClusterSingletons and ClusterSharding can safely restart actors on new nodes far quicker when it is certain that a node has shut down rather than crashed.

Graceful leaving will happen with the default settings, as it is part of Coordinated Shutdown. Just ensure that a node is sent a SIGTERM and not a SIGKILL. Environments such as Kubernetes will do this. If the JVM is wrapped with a script, it is important to ensure that the script forwards the signal.

Upon receiving a SIGTERM, Coordinated Shutdown will:

  • Perform a Cluster(system).leave on itself.
  • Change the status of the member to Exiting while allowing any shards to be shut down gracefully and ClusterSingletons to be migrated if this was the oldest node. Finally, the node is removed from the Akka Cluster membership.

Number of nodes to redeploy at once

Akka bootstrap requires a stable-period where service discovery returns a stable set of contact points. When doing rolling updates it is best to wait for a node (or group of nodes) to finish joining the cluster before adding and removing other nodes.

Cluster Singletons

ClusterSingletons run on the oldest node in the cluster. To avoid singletons moving on every node deployment, it is advised to begin a rolling redeploy at the newest node, so that ClusterSingletons only move once. This is the default behaviour for Kubernetes deployments. Cluster Sharding uses a singleton internally, so this is important even if you are not using singletons directly.
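The effect of the redeploy order can be seen in a small stand-alone simulation. It is a toy model, not Akka code: the class and method names are hypothetical, the cluster is just a list ordered oldest first, and each redeploy step removes one original member and adds its replacement as the newest member.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a rolling redeploy; the singleton is assumed to live on the oldest member.
class SingletonMoveSketch {

    // Returns how often the oldest member (the singleton host) changes
    // when the given members are replaced one by one in redeployOrder.
    static int singletonMoves(List<String> oldestFirst, List<String> redeployOrder) {
        List<String> members = new ArrayList<>(oldestFirst);
        int moves = 0;
        for (String node : redeployOrder) {
            String hostBefore = members.get(0);
            members.remove(node);        // the old instance leaves the cluster
            members.add(node + "-new");  // its replacement joins as the newest member
            if (!members.get(0).equals(hostBefore)) {
                moves++;                 // the oldest member changed, so the singleton moved
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("a", "b", "c"); // "a" is the oldest, "c" the newest
        System.out.println(singletonMoves(cluster, List.of("c", "b", "a"))); // newest first: 1
        System.out.println(singletonMoves(cluster, List.of("a", "b", "c"))); // oldest first: 3
    }
}
```

In this model, starting at the newest node keeps the singleton in place until the very last step, whereas starting at the oldest node moves it on every step.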

Split brains and ungraceful shutdown

Nodes can crash causing cluster members to become unreachable. This is a tricky problem as it is not possible to distinguish between a network partition and a node failure. To rectify this in an automated manner, Lightbend provides Split Brain Resolver as a feature of the Lightbend Subscription. This module has a number of strategies that can ensure that the cluster continues to function during network partitions and node failures.

Bootstrap Recipes

To see how to configure and use bootstrap in various environments such as Kubernetes, see recipes.
