Background
I am working on the Open Source project
Speedment, and for us contributors it is important to use code that people can understand and improve. It is also important that performance is good; otherwise, people are likely to use some other solution.
Escape Analysis allows us to write performant code while still using a good code style with appropriate abstractions.
This is Escape Analysis
Escape Analysis (also abbreviated as "EA") allows the Java compiler to optimize our code in many ways. Please consider the following simple
Point class:
public class Point {

    private final int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public String toString() {
        final StringBuilder sb = new StringBuilder()
            .append("(")
            .append(x)
            .append(", ")
            .append(y)
            .append(")");
        return sb.toString();
    }
}
Each time we call the
Point::toString method, it looks like a new
StringBuilder object is created. However, as we can see, the
StringBuilder object
is never visible outside the method: it can be observed neither by the caller nor by another thread running the same piece of code (because any other thread would see its own instance of the StringBuilder).
So, after calling the method some million times, there might be millions of
StringBuilder objects lying around? Not so! By employing EA, the compiler can allocate the
StringBuilder on the stack instead. When our method returns, the object is automatically reclaimed, because the stack pointer is simply restored to the value it had before the method was called.
Escape Analysis has been available in Java for a relatively long time. In the beginning we had to enable it with command-line options, but nowadays it is used by default. Java 8 has an improved Escape Analysis compared to previous Java versions.
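For those who want to experiment, the relevant HotSpot switches can also be toggled explicitly. These are standard C2 flags (shown here only for comparison runs; EA and its companion optimizations are already on by default in -server mode):
-XX:+DoEscapeAnalysis (or -XX:-DoEscapeAnalysis to turn EA off)
-XX:+EliminateAllocations (scalar replacement of non-escaping objects)
-XX:+EliminateLocks (lock removal for non-escaping objects)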
How It Works
Based on EA, an object's escape state will take on one of three distinct values:
- GlobalEscape: An object may escape the method and/or the thread. Clearly, if an object is returned as the result of a method, its state is GlobalEscape. The same is true for objects that are stored in static fields or in fields of an object that itself is of state GlobalEscape. Also, if we override the finalize() method, the object will always be classified as GlobalEscape and thus, it will be allocated on the heap. This is logical, because eventually the object will be visible to the JVM's finalizer. There are also some other conditions that will render our object's status GlobalEscape.
- ArgEscape: An object that is passed as an argument to a method but cannot otherwise be observed outside the method or by other threads.
- NoEscape: An object that cannot escape the method or thread at all.
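To make the three states concrete, here is a minimal sketch (the class name EscapeStates and its methods are made up for illustration; Point is the class from above):
public class EscapeStates {

    private static Point lastPoint;

    // GlobalEscape: the Point is stored in a static field and returned,
    // so it is visible outside both the method and the thread.
    public static Point globalEscape() {
        Point p = new Point(1, 2);
        lastPoint = p;
        return p;
    }

    // ArgEscape: the Point is passed as an argument to another method,
    // but is not otherwise visible outside this method or this thread.
    public static void argEscape() {
        Point p = new Point(1, 2);
        printLength(p);
    }

    private static void printLength(Point p) {
        System.out.println(p.toString().length());
    }

    // NoEscape: the Point never leaves this method; only a primitive
    // derived from it does. A candidate for stack allocation or
    // scalar replacement.
    public static int noEscape() {
        Point p = new Point(1, 2);
        return p.toString().length();
    }
}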
GlobalEscape and ArgEscape objects must be allocated on the heap, but for ArgEscape objects it is possible to remove some locking and memory synchronization overhead because these objects are only visible from the calling thread.
The NoEscape objects may be allocated freely, for example on the stack instead of on the heap. In fact, under some circumstances it is not even necessary to construct the object at all; it is enough to keep the object's scalar values, such as a plain int for an Integer object. Synchronization may be removed too, because we know that only the current thread will ever use the object. For example, if we were to use the somewhat ancient StringBuffer (which, as opposed to StringBuilder, has synchronized methods), then these synchronizations could safely be removed.
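As a small illustration of this lock-elision case (again a made-up example, assuming the method becomes hot enough to be compiled by C2): the StringBuffer below never escapes the method, so its locking may be dropped and the buffer may never be materialized on the heap at all.
public class LockElision {

    // The StringBuffer is NoEscape: only the resulting String leaves the
    // method. C2 may therefore elide the locking done by the synchronized
    // append() and toString() methods and may scalar-replace the buffer.
    public static String format(int x, int y) {
        StringBuffer sb = new StringBuffer();
        sb.append('(').append(x).append(", ").append(y).append(')');
        return sb.toString();
    }
}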
EA is currently only available in the C2 HotSpot compiler, so we have to make sure that we run in -server mode.
Why It Matters
In theory, NoEscape objects can be allocated on the stack or even kept in CPU registers, giving very fast execution.
When we allocate objects on the heap, we put pressure on our CPU caches, because the objects end up at different heap addresses, possibly far away from each other. The L1 cache is quickly depleted and performance drops. With EA and stack allocation, on the other hand, we use memory that (most likely) is already in the L1 cache anyhow. So EA and stack allocation improve data locality, which is good from a performance standpoint.
Obviously, the garbage collector needs to run much less frequently when we are using EA with stack allocation. This is perhaps the biggest performance advantage. Recall that each complete heap scan costs CPU time and quickly evicts useful data from the CPU caches. And if parts of the heap have been paged out to disk on our server, GC is devastating for performance.
The most important advantage of EA is not performance, though. EA allows us to use local abstractions such as lambdas, functions, streams and iterators without any significant performance penalty, so that we can write better and more readable code. Code that describes what we are doing rather than how it is done.
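As an illustration (the class and method names below are made up, not taken from Speedment), a stream pipeline like this one creates a Stream, method references and, internally, a Spliterator, yet none of those helper objects escape the method, so EA can often remove their allocation cost:
import java.util.List;

public class Distances {

    // The Stream, the method references and the internal Spliterator are
    // all local to this method; only the resulting long escapes.
    public static long totalLength(List<Point> points) {
        return points.stream()
                     .map(Point::toString)
                     .mapToLong(String::length)
                     .sum();
    }
}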
A Small Example
import java.io.IOException;

public class Main {

    public static void main(String[] args) throws IOException {
        Point p = new Point(100, 200);
        sum(p);          // warm-up round
        System.gc();     // clean up the objects created during warm-up
        System.out.println("Press any key to continue");
        System.in.read();
        long sum = sum(p);
        System.out.println(sum);
        System.out.println("Press any key to continue2");
        System.in.read();
        sum = sum(p);
        System.out.println(sum);
        System.out.println("Press any key to exit");
        System.in.read();
    }

    private static long sum(Point p) {
        long sumLen = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sumLen += p.toString().length();
        }
        return sumLen;
    }
}
The code above will create a single instance of a
Point and then call that Point's
toString() method a large number of times. We do this in three steps, where the first step is just for warm-up, after which we GC away all the objects that were created. The two following steps will not remove anything from the heap, so we can examine the heap between the steps.
If we run the program with the following parameters, we will be able to see what is going on within the JVM:
-server
-XX:BCEATraceLevel=3
-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-verbose:gc
-XX:MaxInlineSize=256
-XX:FreqInlineSize=1024
-XX:MaxBCEAEstimateSize=1024
-XX:MaxInlineLevel=22
-XX:CompileThreshold=10
-Xmx4g
-Xms4g
And yes, that is a huge pile of parameters but we really want to be able to see what is going on.
After the first run, we get the following heap usage (after the
System.gc() call has cleaned up all our StringBuilders):
pemi$ jps | grep Main
50903 Main
pemi$ jmap -histo 50903 | head
num #instances #bytes class name
----------------------------------------------
1: 95 42952184 [I
2: 1079 101120 [C
3: 485 55272 java.lang.Class
4: 526 25936 [Ljava.lang.Object;
5: 13 25664 [B
6: 1057 25368 java.lang.String
7: 74 5328 java.lang.reflect.Field
The two following steps gave:
pemi$ jmap -histo 50903 | head
num #instances #bytes class name
----------------------------------------------
1: 2001080 88101152 [C
2: 100 36777992 [I
3: 1001058 24025392 java.lang.String
4: 64513 1548312 java.lang.StringBuilder
5: 485 55272 java.lang.Class
6: 526 25936 [Ljava.lang.Object;
7: 13 25664 [B
pemi$ jmap -histo 50903 | head
num #instances #bytes class name
----------------------------------------------
1: 4001081 176101184 [C
2: 2001059 48025416 java.lang.String
3: 105 32152064 [I
4: 64513 1548312 java.lang.StringBuilder
5: 485 55272 java.lang.Class
6: 526 25936 [Ljava.lang.Object;
7: 13 25664 [B
As can be seen, EA was eventually able to eliminate the creation of
StringBuilder instances on the heap. There were only some 64,000 created, compared to the 2 million Strings. A big improvement!
Conclusions
The advantages of Escape Analysis are nice in theory, but they are somewhat difficult to understand and predict. We get no guarantee that the optimizations we are expecting will kick in in every case, but EA seems to work reasonably well under common conditions.
Check out the open-source
Speedment project and see if you can spot the places where we rely on Escape Analysis.
Hopefully, this post helped shed some light on EA so that you opt to write good code over "performant" code.
I would like to thank Peter Lawrey for the tips and suggestions he gave me in connection with writing this post.
Read more on Objects in general here.