Tales From The Road: Tracking down low CPU utilization and blocked threads in a Fortune 500 oilfield services company

By Patrick Prémont (@ppremont) July 18, 2018

A recurring activity for Lightbend Professional Services is helping customers track down performance issues. In this “Tale From The Road”, we take a look at how we were able to solve poor performance issues at a Lightbend customer and Fortune 500 Oilfield Services Company.

Low CPU utilization under load

In general, many of our clients suffer from problems created by blocking threads accidentally. With unbounded thread pools this can degrade performance through heavy resource consumption and frequent context switches. With bounded pools it can lead to thread starvation and even worse performance degradation.

Low CPU utilization under load points to such problems. With our Enterprise Suite module Akka Thread Starvation Detector, it becomes easier to detect such occurrences and identify the starved execution context; however, there is still a need to investigate the precise location where blocking occurs.

Tracking down the blocking call

I wanted to share a recent experience tracking down some more elusive blocking. Both the particular sources of blocking uncovered and the methodologies used can be of interest. We first reviewed the code for obvious blocking calls such Await.result calls or file IO, but couldn’t see anything obvious.

Since problematic blocking calls will tend to execute for a long time, they will frequently be visible on top of the call stack. So we used VisualVM to look at some of the threads and were surprised to see that they were blocked trying to enqueue work onto an Executor service, within java.util.concurrent.ThreadPoolExecutor.execute().

Why would this block? This was an Akka application, where execution contexts are generally created as dispatchers, and those get unbounded queues where tasks can be added without blocking. Upon inspecting the creation of the problematic thread’s execution context, we noticed that it was not created as a dispatcher, but through ExecutionContext.fromExecutor(Executors.newCachedThreadPool()). This calls new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>()) and a synchronous queue has size 0, and will always block unless a thread is immediately available to pick up the work.

So our pool was growing very large because most threads were blocked, and they were blocked trying to enqueue work on this same pool, a bit like a dog chasing his tail!

Replacing this pool with one that does not have such a bounded queue, like Executors.newFixedThreadPool() or an Akka dispatcher, solves this problem.

The pitfall of parallel collections

However, we were still seeing low CPU utilization under load, and now VisualVM was pointing to another unexpected source of blocking. Near the top of the stacks of many threads we could see calls to scala.collection.parallel.ParIterableLike.map().

We had not noticed this during the source inspection, but parallel collection operations were present, and those do block our threads. This was tricky, because those operations do not return futures, yet schedule work on other threads, so they have to await that work’s completion by blocking. It’s easy to miss this subtlety, but inspecting ours stacks made that problem obvious. In our case we could simply revert to sequential collections.

Summary

Avoiding blocking operations often pays offs in terms of application performance, but unfortunately requires some discipline. It’s easy to make mistakes, but as we saw here, locating those errors is not very difficult and can be quite instructive.