Wednesday, December 2, 2015

Do Not Let Your Java Objects Escape

Background

I am working on the Open Source project Speedment and for us contributors, it is important to use code that people can understand and improve. It is also important that performance is good, otherwise people are likely to use some other solution. 

Escape Analysis allows us to write performant code at the same time as we can use good code style with appropriate abstractions.


This is Escape Analysis

Escape Analysis (also abbreviated as "EA") allows the Java compiler to optimize our code in many ways.  Please consider the following simple Point class:

public class Point {

    private final int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public String toString() {
        final StringBuilder sb = new StringBuilder()
                .append("(")
                .append(x)
                .append(", ")
                .append(y)
                .append(")");
        return sb.toString();
    }

}

Each time we call the Point::toString method, it looks like a new StringBuilder object is created. However, as we can see, the StringBuilder object is not visible from outside the method. It cannot be observed neither from outside the method nor by another thread running the same piece of code (because that other thread would se its own version of the StringBuilder).

So, after calling the method some million times, there might be millions of StringBuilder objects lying around? Not so! By employing EA, the compiler can allocate the StringBuilder on the stack instead. So when our method returns, the object is automatically deleted upon return, as the stack pointer is restored to the previous value it had before the method was called.

Escape analysis has been available for a relatively long time in Java. In the beginning we had to enable it using command line options, but nowadays it is used by default. Java 8 has an improved Escape Analysis compared to previous Java versions.


How It Works

Based on EA, an object's escape state will take on one of three distinct values:
  • GlobalEscape: An object may escape the method and/or the thread. Clearly, if an object is returned as the result of a method, its state is GlobalEscape. The same is true for objects that are stored in static fields or in fields of an object that itself is of state GlobalEscape. Also, if we override the finalize() method, the object will always be classified as GlobalEscape and thus, it will be allocated on the heap. This is logical, because eventually the object will be visible to the JVM's finalizer. There are also some other conditions that will render our object's status GlobalEscape.
  • ArgEscape: An object that is passed as an argument to a method but cannot otherwise be observed outside the method or by other threads.
  • NoEscape: An object that cannot escape the method or thread at all.


GlobalEscape and ArgEscape objects must be allocated on the heap, but for ArgEscape objects it is possible to remove some locking and memory synchronization overhead because these objects are only visible from the calling thread.

The NoEscape objects may be allocated freely, for example on the stack instead of on the heap. In fact, under some circumstances, it is not even necessary to construct an object at all, but instead only the object's scalar values, such as an int for the object Integer. Synchronization may be removed too, because we know that only this thread will use the objects. For example, if we were to use the somewhat ancient StringBuffer (which as opposed to StringBuilder has synchronized methods), then these synchronizations could safely be removed.

EA is currently only available under the C2 HotSpot Compiler so we have to make sure that we run in -server mode.


Why It Matters

In theory, NoEscape objects objects can be allocated on the stack or even in CPU registers using EA,  giving very fast execution.

When we allocate objects on the heap, we start to drain our CPU caches because objects are placed on different addresses on the heap possibly far away from each other. This way we will quickly deplete our L1 CPU cache and performance will decrease. With EA and stack allocation on the other hand, we are using memory that (most likely) is already in the L1 cache anyhow.  So, EA and stack allocation will improve our localization of data. This is good from a performance standpoint.

Obviously, the garbage collects needs to run much less frequently when we are using EA with stack allocation. This is perhaps the biggest performance advantage. Recall that each time the JVM runs a complete heap scan, we take performance out of our CPUs and the CPU caches will quickly deplete. Not to mention if we have virtual memory paged out on our server, whereby GC is devastating for performance.

The most important advantage of EA is not performance though. EA allows us to use local abstractions like Lambdas, Functions, Streams, Iterators etc. without any significant performance penalty so that we can write better and more readable code. Code that describes what we are doing rather than how it is done.

A Small Example

public class Main {

    public static void main(String[] args) throws IOException {
        Point p = new Point(100, 200);

        sum(p);
        System.gc();
        System.out.println("Press any key to continue");
        System.in.read();
        long sum = sum(p);

        System.out.println(sum);
        System.out.println("Press any key to continue2");
        System.in.read();
        
        sum = sum(p);

        System.out.println(sum);
        System.out.println("Press any key to exit");
        System.in.read();

    }

    private static long sum(Point p) {
        long sumLen = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sumLen += p.toString().length();
        }
        return sumLen;

    }

}


The code above will create a single instance of a Point and then it will call that Point's toString() method a large number of times. We will do it in three steps where the first step is just for warming up and then GC away all the objects that were created. The two following steps will not remove anything from the heap and we will be able to examine the heap between each step.

If we run the program with the following parameters, we will be able to see what is going on within the JVM:

-server
-XX:BCEATraceLevel=3
-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-verbose:gc
-XX:MaxInlineSize=256
-XX:FreqInlineSize=1024
-XX:MaxBCEAEstimateSize=1024
-XX:MaxInlineLevel=22
-XX:CompileThreshold=10
-Xmx4g
-Xms4g

And yes, that is a huge pile of parameters but we really want to be able to see what is going on.

After the first run, we get the following heap usage (after the System.gc() call cleaned up all our StringBuilders)

pemi$ jps | grep Main
50903 Main
pemi$ jmap -histo 50903 | head
 num     #instances         #bytes  class name

----------------------------------------------
   1:            95       42952184  [I
   2:          1079         101120  [C
   3:           485          55272  java.lang.Class
   4:           526          25936  [Ljava.lang.Object;
   5:            13          25664  [B
   6:          1057          25368  java.lang.String
   7:            74           5328  java.lang.reflect.Field

The two following steps gave:

pemi$ jmap -histo 50903 | head
 num     #instances         #bytes  class name
----------------------------------------------
   1:       2001080       88101152  [C
   2:           100       36777992  [I
   3:       1001058       24025392  java.lang.String
   4:         64513        1548312  java.lang.StringBuilder
   5:           485          55272  java.lang.Class
   6:           526          25936  [Ljava.lang.Object;
   7:            13          25664  [B


pemi$ jmap -histo 50903 | head
 num     #instances         #bytes  class name
----------------------------------------------
   1:       4001081      176101184  [C
   2:       2001059       48025416  java.lang.String
   3:           105       32152064  [I
   4:         64513        1548312  java.lang.StringBuilder
   5:           485          55272  java.lang.Class
   6:           526          25936  [Ljava.lang.Object;
   7:            13          25664  [B

As can be seen, EA was eventually able to eliminate the creation of the StringBuilder instances on the heap. There were only 64K created compared to the 2M Stings. A big improvement!

Conclusions

The advantages of Escape Analysis are nice in theory but they are somewhat difficult to understand and predict. We do not get a guarantee that we will get the optimizations we are expecting in all cases but it seems to work reasonably well under common conditions.

Check out open-source Speedment and see if you can spot the places where we rely on Escape Analysis.

Hopefully, this post contributed to shed some light on EA so that you opt to write good code over "performant" code.

I would like to thank Peter Lawery for the tips and suggestions I got from him in connection with writing this post.

Read more on Objects in general here


4 comments:

  1. Curious, if I write a method that directly returns (no tmp var), let's say an Optional (could be something else), and then that Optional is only ever used by the caller, does the optional get EA?

    ReplyDelete
  2. Presumably, if the compiler is using inlining, EA might be performed on the aggregate "flattened" code and then the Optional might be EA:ed. This is an interesting question and perhaps i can write a "revisit" post on you question.

    ReplyDelete
  3. Very informative.. Pardon my ignorance but reading "after the System.gc() call cleaned up all our StringBuilders" is the System.gc() really responsible for "cleaning out" StringBuilder instances considering (guessing) they are stack allocated objects? Does the jvm garbage collector clean the stack? Do the StringBuilders ever leave the stack? Are the actual instances really on the stack or just references? Thanks for your educative posts

    ReplyDelete
    Replies
    1. The GC cleans up the heap and not the stack. The stack is cleaned up automatically when methods return to their caller whereby the stack pointer is reset to its former value. So GC will clean up objects that ended up on the stack before EA/C2 compilation could be performed. The actual instances (or rather their corresponding representations) live on the stack, there are no referenced objects on the stack in the context of EA optimizations.

      Delete

Note: Only a member of this blog may post a comment.