Profiling JVM applications

By Eugene Yokota (@eed3si9n) April 9, 2018

Hi everyone. One of the Tooling team focus lately has been improvement of the contribution process to sbt. Another thing we’ve been thinking is the performance of sbt. Combining these themes together, I’ve been looking into how guys like Jason Zaugg and Johannes Rudolph profile JVM applications, and this post is my findings.

The techniques described here should be applicable to both Java and Scala, and mostly indepent of build tools you use.

Flame graphs (async-profiler)

There are several ways to profile JVM apps, but the new hotness in profiling is Flame graphs invented by Brendan Gregg, Senior Performance Architect at Netflix. You first collect stack trace samples, and then it is processed into an intereactive svg graph. For an introduction to Flame graphs, see:

The first tool I recommend is async-profiler by Andrei Pangin. This is available for macOS and Linux, and it’s easier to get started.

  1. Download the installer from async-profiler 1.2.
  2. Make symbolic link to build/ and profiler.sh to $HOME/bin, assuming you have PATH to $HOME/bin:
    ln -s ~/App/async-profiler/profiler.sh $HOME/bin/profiler.sh
    ln -s ~/App/async-profiler/build $HOME/bin/build
    

Next, close all Java appliations and anything that may affect the profiling, like Slack, and run your app in a terminal. In my case, I am trying to profile sbt’s initial loading:

$ sbt exit

In another terminal, run:

$ jps
92746 sbt-launch.jar
92780 Jps

This tells you the process ID of the app. In this case, it’s 92746. While it’s running, run

$ profiler.sh -d 60 <process id>
Started [cpu] profiling
--- Execution profile ---
Total samples:         31602
Non-Java:              3239 (10.25%)
GC active:             46 (0.15%)
Unknown (native):      14667 (46.41%)
Not walkable (native): 3 (0.01%)
Unknown (Java):        433 (1.37%)
Not walkable (Java):   8 (0.03%)
Thread exit:           1 (0.00%)
Deopt:                 9 (0.03%)

Frame buffer usage:    55.658%

Total: 1932000000 (6.11%)  samples: 1932
  [ 0] java.lang.ClassLoader$NativeLibrary.load
  [ 1] java.lang.ClassLoader.loadLibrary0
  [ 2] java.lang.ClassLoader.loadLibrary
  [ 3] java.lang.Runtime.loadLibrary0
  [ 4] java.lang.System.loadLibrary
....

This should show a bunch of stacktraces that are useful. To visualize this as a flamegraph, run:

$ profiler.sh -d 60 -f /tmp/flamegraph.svg <process id>

This should produce /tmp/flamegraph.svg at the end.

flamegraph

See flamegraph.svg to try the output yourself.

Flame graphs (perf-map-agent)

Even though async-profiler is easier to get started, the fun part of Flame graph is mixing the JVM stack trace with the native code’s stack trace, allowing you see what your program is actually spending its CPU on. It turns out that Lightbend’s Johannes Rudolph wrote a tool for this called perf-map-agent. This uses dtrace on macOS and perf on Linux. This is particularly useful if you are trying to find if the bottle neck is on the native code.

You first have to compile perf-map-agent. For macOS, here to how to export JAVA_HOME before running cmake .:

$ cd work
$ git clone https://github.com/brendangregg/FlameGraph.git

$ git clone https://github.com/jvm-profiling-tools/perf-map-agent.git
$ cd perf-map-agent
$ export JAVA_HOME=$(/usr/libexec/java_home)
$ cmake .
-- The C compiler identification is AppleClang 9.0.0.9000039
-- The CXX compiler identification is AppleClang 9.0.0.9000039
...
$ make

In a fresh termimal, run sbt with -XX:+PreserveFramePointer flag:

$ sbt -J-Dsbt.launcher.standby=20s -J-XX:+PreserveFramePointer exit

In the terminal that you will run the perf-map:

$ cd quicktest/
$ export JAVA_HOME=$(/usr/libexec/java_home)
$ export FLAMEGRAPH_DIR=$HOME/work/FlameGraph
$ jps
94592 Jps
94549 sbt-launch.jar
$ $HOME/work/perf-map-agent/bin/dtrace-java-flames 94549
dtrace: system integrity protection is on, some features will not be available

dtrace: description 'profile-99 ' matched 2 probes
Flame graph SVG written to DTRACE_FLAME_OUTPUT='/Users/xxx/work/quicktest/flamegraph-94549.svg'.

This would produce better flamegraph in theory, but the output looks too messy for sbt exit case.

flamegraph

This might work better if the operations are already JITed, or if the operation is more specific. To get a better Flame graph, one thing we can do now is to repeat the same operation multiple times.

$ sbt -J-Dsbt.launcher.standby=20s -J-XX:+PreserveFramePointer reload reload reload reload exit

This enhances the signal, and some plateaus become more pronouced that we can zoom in and find out the hot paths.

flamegraph

Flamescope

Netflix recently released a new visualization tool called Flamescope that can filter the Flame graph to a specific range of time.

Flamescope

This was developed by Martin Spier and Brendan Gregg to study perturbations and other time-based issues. This makes sense because the normal Flame graphs aggregates all stack trace samples, so if there’s a short-lived glitch, it could be burried in other traces.

JMH (sbt-jmh)

Due to JIT warmup etc, benchmarking is difficult. JMH runs the same tests multiple times to remove these effects and comes closer to measuring the performance of your code.

For sbt users, sbt-jmh that Lightbend’s Konrad Malawski wrote makes JMH testing easier. It apparently adds an integration with async-profiler too.

VisualVM

I’d also mention traditional JVM profiling tool. Since VisualVM is opensource, I’ll mention that one.

  1. First open VisualVM.
  2. Start sbt from a terminal.
  3. You should see xsbt.boot.Boot under Local.
  4. Open it, and select either sampler or profiler, and hit CPU button at the point when you want to start.

If you are familiar with YourKit, it also works similarly.

Summary

Flame graph avisualizes stack trace samples, which helps in identifying the hot paths in your application. It also helps to confirm if the changes you made has actually improved the performance or not.