Minborg

Minborg
Minborg

Friday, December 18, 2015

My Previous Post on Video

Escape Analysis Aftermath


My post on Java Escape Analysis has triggered a lot of interest. I made a follow up post on the effect of inlining on Escape Analysis.

A company that provides Java programming training filmed the post and had one person summarize the content. The company is called Webucator.

Have a look at the presentation here.



Keep on hackin'



Wednesday, December 16, 2015

Java 8: The JVM Can Re-capture Objects That Have Escaped

Background


In my previous post, I wrote about Escape Analysis and how the JVM can allocate non-escaping objects on the stack rather than on the heap. I immediately got a very interesting question from Caleb Cushing asking if Objects that actually can escape could be optimized anyhow, provided that that escaped object is reasonably contained by the caller.

Read this post and find out the answer!


A Simple Example

Let's assume that we have the following simple Person class:

public class Person {

    private final String firstName;
    private final String middleName;
    private final String lastName;

    public Person(String firstName, String middleName, String lastName) {
        this.firstName = requireNonNull(firstName);  // Cannot be null
        this.middleName = middleName;                // Can be null
        this.lastName = requireNonNull(lastName);    // Cannot be null
    }

    public String getFirstName() {
        return firstName;
    }

    public Optional<String> getMiddleName() {
        return Optional.ofNullable(middleName);
    }

    public String getLastName() {
        return lastName;
    }

}

Now, if we call the method Person::getMiddleName, it is obvious that the Optional object will escape the method because it is returned by the method and becomes visible to anyone calling the method. Thus, it will be classified as GlobalEscape and must be allocated on the heap. However, this is not necessarily the case. The JVM will sometimes be able to allocate it on the stack, despite the fact that it escapes the method. How is that possible?


What is Escape Analysis (EA)?

Before you read on, I encourage you to read my previous post because it will be more easy to understand what is going on. The post describes the fundamental aspects of EA.

How Can GlobalEscape Objects Still Live on the Stack?

It turns out that the C2 compiler is able to do EA not only over single methods, but over larger chunks of code that is inlined by the compiler. Inlining is an optimization scheme where the code is "flattened" to eliminate redundant calls. So, one or several layers of calls are flattened to a sequential list of instructions. The compiler then evaluates EA, not on the individual methods, but on the entire inlined code block. So, even though an object might escape a particular method, it might not be able to escape the larger inlined code block. 

A Demonstration of Inlined Escape Analysis

public class Main2 {

    public static void main(String[] args) throws IOException {

        Person p = new Person("Johan", "Sebastian", "Bach");

        count(p);
        System.gc();
        System.out.println("Press any key to continue");
        System.in.read();
        long sum = count(p);

        System.out.println(sum);
        System.out.println("Press any key to continue2");
        System.in.read();

        sum = count(p);

        System.out.println(sum);
        System.out.println("Press any key to exit");
        System.in.read();

    }

    private static long count(Person p) {
        long count = 0;
        for (int i = 0; i < 1_000_000; i++) {
            if (p.getMiddleName().isPresent()) {
                count++;
            }
        }
        return count;

    }

}

The code above will create a single instance of a Person and then it will call that Person's getMiddleName() method a large number of times. We will do it in three steps where the first step is just for warming up and then GC away all the objects that were created. The two following steps will not remove anything from the heap and we will be able to examine the heap between each step.We can use the following JVM parameters when we run the code:

-server
-XX:BCEATraceLevel=3
-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-verbose:gc
-XX:MaxInlineSize=256
-XX:FreqInlineSize=1024
-XX:MaxBCEAEstimateSize=1024
-XX:MaxInlineLevel=22
-XX:CompileThreshold=10
-Xmx4g
-Xms4g


After the first run, we get the following heap usage (after the System.gc() call cleaned up all our Optionals)

pemi$ jps | grep Main2
74886 Main2
 num     #instances         #bytes  class name
----------------------------------------------
   1:            95       42952184  [I
   2:          1062         101408  [C
   3:           486          55384  java.lang.Class
   4:           526          25944  [Ljava.lang.Object;
   5:            13          25664  [B
   6:          1040          24960  java.lang.String
   7:            74           5328  java.lang.reflect.Field

The two following steps gave:

pemi$ jmap -histo 74886 | head

 num     #instances         #bytes  class name
----------------------------------------------
   1:            95       39019792  [I
   2:        245760        3932160  java.util.Optional
   3:          1063         101440  [C
   4:           486          55384  java.lang.Class
   5:           526          25944  [Ljava.lang.Object;
   6:            13          25664  [B
   7:          1041          24984  java.lang.String
pemi$ jmap -histo 74886 | head

 num     #instances         #bytes  class name
----------------------------------------------
   1:            95       39019544  [I
   2:        245760        3932160  java.util.Optional
   3:          1064         101472  [C
   4:           486          55384  java.lang.Class
   5:           526          25944  [Ljava.lang.Object;
   6:            13          25664  [B
   7:          1042          25008  java.lang.String

No new Optionals were created between step two and step three and thus, EA was eventually able to eliminate the creation of the Optional instances on the heap even though they escaped the initial method where they were created and returned. This means that we can use an appropriate level of abstraction and still retain performant code.

Conclusions

Escape Analysis can work on several layers in our code. EA can optimize away heap allocation even though objects escapes one or several methods.  As with EA in general, we do not get a guarantee that we will get the optimizations we are expecting in all cases.

The open-source project Speedment that I am contributing to, often returns Streams containing entities or Optionals. The fact that EA works on several layers makes the application code run faster. The JVM is able to inline code from the Speedment library into the application code itself and then, using EA, temporary return objects are never allocated on the heap. So, Speedment developers can enjoy a nice API while still retaining high performance and low latency
.



Wednesday, December 2, 2015

Do Not Let Your Java Objects Escape

Background

I am working on the Open Source project Speedment and for us contributors, it is important to use code that people can understand and improve. It is also important that performance is good, otherwise people are likely to use some other solution. 

Escape Analysis allows us to write performant code at the same time as we can use good code style with appropriate abstractions.


This is Escape Analysis

Escape Analysis (also abbreviated as "EA") allows the Java compiler to optimize our code in many ways.  Please consider the following simple Point class:

public class Point {

    private final int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public String toString() {
        final StringBuilder sb = new StringBuilder()
                .append("(")
                .append(x)
                .append(", ")
                .append(y)
                .append(")");
        return sb.toString();
    }

}

Each time we call the Point::toString method, it looks like a new StringBuilder object is created. However, as we can see, the StringBuilder object is not visible from outside the method. It cannot be observed neither from outside the method nor by another thread running the same piece of code (because that other thread would se its own version of the StringBuilder).

So, after calling the method some million times, there might be millions of StringBuilder objects lying around? Not so! By employing EA, the compiler can allocate the StringBuilder on the stack instead. So when our method returns, the object is automatically deleted upon return, as the stack pointer is restored to the previous value it had before the method was called.

Escape analysis has been available for a relatively long time in Java. In the beginning we had to enable it using command line options, but nowadays it is used by default. Java 8 has an improved Escape Analysis compared to previous Java versions.


How It Works

Based on EA, an object's escape state will take on one of three distinct values:
  • GlobalEscape: An object may escape the method and/or the thread. Clearly, if an object is returned as the result of a method, its state is GlobalEscape. The same is true for objects that are stored in static fields or in fields of an object that itself is of state GlobalEscape. Also, if we override the finalize() method, the object will always be classified as GlobalEscape and thus, it will be allocated on the heap. This is logical, because eventually the object will be visible to the JVM's finalizer. There are also some other conditions that will render our object's status GlobalEscape.
  • ArgEscape: An object that is passed as an argument to a method but cannot otherwise be observed outside the method or by other threads.
  • NoEscape: An object that cannot escape the method or thread at all.


GlobalEscape and ArgEscape objects must be allocated on the heap, but for ArgEscape objects it is possible to remove some locking and memory synchronization overhead because these objects are only visible from the calling thread.

The NoEscape objects may be allocated freely, for example on the stack instead of on the heap. In fact, under some circumstances, it is not even necessary to construct an object at all, but instead only the object's scalar values, such as an int for the object Integer. Synchronization may be removed too, because we know that only this thread will use the objects. For example, if we were to use the somewhat ancient StringBuffer (which as opposed to StringBuilder has synchronized methods), then these synchronizations could safely be removed.

EA is currently only available under the C2 HotSpot Compiler so we have to make sure that we run in -server mode.


Why It Matters

In theory, NoEscape objects objects can be allocated on the stack or even in CPU registers using EA,  giving very fast execution.

When we allocate objects on the heap, we start to drain our CPU caches because objects are placed on different addresses on the heap possibly far away from each other. This way we will quickly deplete our L1 CPU cache and performance will decrease. With EA and stack allocation on the other hand, we are using memory that (most likely) is already in the L1 cache anyhow.  So, EA and stack allocation will improve our localization of data. This is good from a performance standpoint.

Obviously, the garbage collects needs to run much less frequently when we are using EA with stack allocation. This is perhaps the biggest performance advantage. Recall that each time the JVM runs a complete heap scan, we take performance out of our CPUs and the CPU caches will quickly deplete. Not to mention if we have virtual memory paged out on our server, whereby GC is devastating for performance.

The most important advantage of EA is not performance though. EA allows us to use local abstractions like Lambdas, Functions, Streams, Iterators etc. without any significant performance penalty so that we can write better and more readable code. Code that describes what we are doing rather than how it is done.

A Small Example

public class Main {

    public static void main(String[] args) throws IOException {
        Point p = new Point(100, 200);

        sum(p);
        System.gc();
        System.out.println("Press any key to continue");
        System.in.read();
        long sum = sum(p);

        System.out.println(sum);
        System.out.println("Press any key to continue2");
        System.in.read();
        
        sum = sum(p);

        System.out.println(sum);
        System.out.println("Press any key to exit");
        System.in.read();

    }

    private static long sum(Point p) {
        long sumLen = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sumLen += p.toString().length();
        }
        return sumLen;

    }

}


The code above will create a single instance of a Point and then it will call that Point's toString() method a large number of times. We will do it in three steps where the first step is just for warming up and then GC away all the objects that were created. The two following steps will not remove anything from the heap and we will be able to examine the heap between each step.

If we run the program with the following parameters, we will be able to see what is going on within the JVM:

-server
-XX:BCEATraceLevel=3
-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-verbose:gc
-XX:MaxInlineSize=256
-XX:FreqInlineSize=1024
-XX:MaxBCEAEstimateSize=1024
-XX:MaxInlineLevel=22
-XX:CompileThreshold=10
-Xmx4g
-Xms4g

And yes, that is a huge pile of parameters but we really want to be able to see what is going on.

After the first run, we get the following heap usage (after the System.gc() call cleaned up all our StringBuilders)

pemi$ jps | grep Main
50903 Main
pemi$ jmap -histo 50903 | head
 num     #instances         #bytes  class name

----------------------------------------------
   1:            95       42952184  [I
   2:          1079         101120  [C
   3:           485          55272  java.lang.Class
   4:           526          25936  [Ljava.lang.Object;
   5:            13          25664  [B
   6:          1057          25368  java.lang.String
   7:            74           5328  java.lang.reflect.Field

The two following steps gave:

pemi$ jmap -histo 50903 | head
 num     #instances         #bytes  class name
----------------------------------------------
   1:       2001080       88101152  [C
   2:           100       36777992  [I
   3:       1001058       24025392  java.lang.String
   4:         64513        1548312  java.lang.StringBuilder
   5:           485          55272  java.lang.Class
   6:           526          25936  [Ljava.lang.Object;
   7:            13          25664  [B


pemi$ jmap -histo 50903 | head
 num     #instances         #bytes  class name
----------------------------------------------
   1:       4001081      176101184  [C
   2:       2001059       48025416  java.lang.String
   3:           105       32152064  [I
   4:         64513        1548312  java.lang.StringBuilder
   5:           485          55272  java.lang.Class
   6:           526          25936  [Ljava.lang.Object;
   7:            13          25664  [B

As can be seen, EA was eventually able to eliminate the creation of the StringBuilder instances on the heap. There were only 64K created compared to the 2M Stings. A big improvement!

Conclusions

The advantages of Escape Analysis are nice in theory but they are somewhat difficult to understand and predict. We do not get a guarantee that we will get the optimizations we are expecting in all cases but it seams to work reasonably well under common conditions.

Check out open-source Speedment and see if you can spot the places where we rely on Escape Analysis.

Hopefully this post contributed to shed some light on EA so that you opt to write good code over "performant" code.

I would like to thank Peter Lawery for the tips and suggestions I got from him in connection with writing this post.