Wednesday, December 16, 2015

Java 8: The JVM Can Re-capture Objects That Have Escaped

Background


In my previous post, I wrote about Escape Analysis and how the JVM can allocate non-escaping objects on the stack rather than on the heap. I immediately got a very interesting question from Caleb Cushing, asking whether objects that actually can escape could be optimized anyway, provided that the escaped object is reasonably contained by the caller.

Read this post and find out the answer!


A Simple Example

Let's assume that we have the following simple Person class:

import java.util.Optional;

import static java.util.Objects.requireNonNull;

public class Person {

    private final String firstName;
    private final String middleName;
    private final String lastName;

    public Person(String firstName, String middleName, String lastName) {
        this.firstName = requireNonNull(firstName);  // Cannot be null
        this.middleName = middleName;                // Can be null
        this.lastName = requireNonNull(lastName);    // Cannot be null
    }

    public String getFirstName() {
        return firstName;
    }

    public Optional<String> getMiddleName() {
        return Optional.ofNullable(middleName);
    }

    public String getLastName() {
        return lastName;
    }

}

Now, if we call the method Person::getMiddleName, it seems obvious that the Optional object will escape the method, because it is returned by the method and becomes visible to anyone calling it. Thus, it would be classified as GlobalEscape and would have to be allocated on the heap. However, this is not necessarily the case. The JVM will sometimes be able to allocate it on the stack, despite the fact that it escapes the method. How is that possible?


What is Escape Analysis (EA)?

Before you read on, I encourage you to read my previous post, because it will make it easier to understand what is going on. That post describes the fundamental aspects of EA.

How Can GlobalEscape Objects Still Live on the Stack?

It turns out that the C2 compiler is able to do EA not only over single methods, but also over larger chunks of code that have been inlined by the compiler. Inlining is an optimization scheme where the code is "flattened" to eliminate redundant calls. So, one or several layers of calls are flattened into a sequential list of instructions. The compiler then performs EA, not on the individual methods, but on the entire inlined code block. So, even though an object might escape a particular method, it might not be able to escape the larger inlined code block.
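To make the flattening idea concrete, here is a hand-written sketch of what the hot loop looks like conceptually before and after inlining. It is only an illustration: the class InliningSketch and both method names are made up for this example, the real transformation happens inside the compiler rather than in Java source, and whether it is applied at all depends on the JVM's heuristics.

```java
import java.util.Optional;

// Hypothetical illustration; this is not actual compiler output.
final class InliningSketch {

    private final String middleName; // may be null

    InliningSketch(String middleName) {
        this.middleName = middleName;
    }

    Optional<String> getMiddleName() {
        return Optional.ofNullable(middleName);
    }

    // As written: an Optional escapes getMiddleName() on every iteration.
    static long countAsWritten(InliningSketch p, int iterations) {
        long count = 0;
        for (int i = 0; i < iterations; i++) {
            if (p.getMiddleName().isPresent()) {
                count++;
            }
        }
        return count;
    }

    // Roughly what is left once getMiddleName(), Optional.ofNullable() and
    // Optional.isPresent() are flattened into the loop: the Optional never
    // leaves the loop body, so only its single field needs to exist.
    static long countAfterInlining(InliningSketch p, int iterations) {
        long count = 0;
        for (int i = 0; i < iterations; i++) {
            String value = p.middleName; // the value the Optional would wrap
            if (value != null) {         // the check isPresent() performs
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        InliningSketch p = new InliningSketch("Sebastian");
        System.out.println(countAsWritten(p, 1_000) == countAfterInlining(p, 1_000));
    }
}
```

Both methods compute the same result. The point is that in the second form there is no object left that could escape, which is why EA over the inlined block can succeed where EA over getMiddleName() alone cannot.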

A Demonstration of Inlined Escape Analysis

import java.io.IOException;

public class Main2 {

    public static void main(String[] args) throws IOException {

        Person p = new Person("Johan", "Sebastian", "Bach");

        count(p);
        System.gc();
        System.out.println("Press any key to continue");
        System.in.read();
        long sum = count(p);

        System.out.println(sum);
        System.out.println("Press any key to continue2");
        System.in.read();

        sum = count(p);

        System.out.println(sum);
        System.out.println("Press any key to exit");
        System.in.read();

    }

    private static long count(Person p) {
        long count = 0;
        for (int i = 0; i < 1_000_000; i++) {
            if (p.getMiddleName().isPresent()) {
                count++;
            }
        }
        return count;

    }

}

The code above creates a single instance of a Person and then calls that Person's getMiddleName() method a large number of times. We do this in three steps. The first step is just a warm-up, after which System.gc() clears away all the objects that were created. The two following steps do not remove anything from the heap, so we can examine the heap between each step.

We can use the following JVM parameters when we run the code:

-server
-XX:BCEATraceLevel=3
-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-verbose:gc
-XX:MaxInlineSize=256
-XX:FreqInlineSize=1024
-XX:MaxBCEAEstimateSize=1024
-XX:MaxInlineLevel=22
-XX:CompileThreshold=10
-Xmx4g
-Xms4g
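Since the experiment depends heavily on these parameters, it can be worth confirming that they really reached the JVM (for example, when launching from an IDE). The following is a minimal sketch using the standard RuntimeMXBean; the class name FlagCheck is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: lists the arguments the running JVM was started with.
final class FlagCheck {

    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        for (String argument : jvmArguments()) {
            System.out.println(argument);
        }
    }
}
```

Calling FlagCheck.jvmArguments() at the start of Main2.main would show whether flags such as -XX:+PrintInlining are in effect for that run.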


After the first run, we get the following heap usage (after the System.gc() call cleaned up all our Optionals):

pemi$ jps | grep Main2
74886 Main2
pemi$ jmap -histo 74886 | head

 num     #instances         #bytes  class name
----------------------------------------------
   1:            95       42952184  [I
   2:          1062         101408  [C
   3:           486          55384  java.lang.Class
   4:           526          25944  [Ljava.lang.Object;
   5:            13          25664  [B
   6:          1040          24960  java.lang.String
   7:            74           5328  java.lang.reflect.Field

The two following steps gave:

pemi$ jmap -histo 74886 | head

 num     #instances         #bytes  class name
----------------------------------------------
   1:            95       39019792  [I
   2:        245760        3932160  java.util.Optional
   3:          1063         101440  [C
   4:           486          55384  java.lang.Class
   5:           526          25944  [Ljava.lang.Object;
   6:            13          25664  [B
   7:          1041          24984  java.lang.String
pemi$ jmap -histo 74886 | head

 num     #instances         #bytes  class name
----------------------------------------------
   1:            95       39019544  [I
   2:        245760        3932160  java.util.Optional
   3:          1064         101472  [C
   4:           486          55384  java.lang.Class
   5:           526          25944  [Ljava.lang.Object;
   6:            13          25664  [B
   7:          1042          25008  java.lang.String

The number of Optional instances stayed at 245760 between step two and step three, so no new Optionals were created in the third run. The instances that do show up were allocated during step two, before the compiled code took over. Thus, EA was eventually able to eliminate the creation of the Optional instances on the heap, even though they escaped the method where they were created and returned. This means that we can use an appropriate level of abstraction and still retain performant code.
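As a complement to jmap, allocation can also be observed from inside the program. The sketch below is an assumption-laden alternative to the measurement above, not the method used for the figures in this post: it relies on the HotSpot-specific com.sun.management.ThreadMXBean extension, which may be unsupported or disabled on some JVMs, and the names AllocationProbe and Named are made up. The reported number includes anything else the thread allocates, so treat it as a rough indicator only.

```java
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Hypothetical probe; relies on a HotSpot-specific management extension.
final class AllocationProbe {

    static volatile long sink; // keeps the loop result observable

    // Minimal stand-in for the Person class from this post.
    static final class Named {
        private final String middleName; // may be null

        Named(String middleName) {
            this.middleName = middleName;
        }

        Optional<String> getMiddleName() {
            return Optional.ofNullable(middleName);
        }
    }

    // Same loop shape as Main2.count(), with a configurable iteration count.
    static long count(Named p, int iterations) {
        long count = 0;
        for (int i = 0; i < iterations; i++) {
            if (p.getMiddleName().isPresent()) {
                count++;
            }
        }
        return count;
    }

    // Bytes the current thread allocated while count() ran, as reported by the JVM.
    static long allocatedBytesDuring(Named p, int iterations) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(threadId);
        sink = count(p, iterations);
        long after = threads.getThreadAllocatedBytes(threadId);
        return after - before;
    }

    public static void main(String[] args) {
        Named p = new Named("Sebastian");
        // Early rounds act as warm-up; later rounds may allocate far less
        // if the loop gets compiled and EA succeeds.
        for (int round = 1; round <= 10; round++) {
            System.out.println("round " + round + ": "
                    + allocatedBytesDuring(p, 1_000_000) + " bytes allocated");
        }
    }
}
```

If EA kicks in, the later rounds should report far fewer allocated bytes than the first ones, but as with the jmap experiment there is no guarantee that this happens on a given JVM or with given flags.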

Conclusions

Escape Analysis can work on several layers in our code. EA can optimize away heap allocation even though objects escape one or several methods. As with EA in general, we do not get a guarantee that we will get the optimizations we are expecting in all cases.

The open-source project Speedment, which I contribute to, often returns Streams containing entities or Optionals. The fact that EA works on several layers helps the application code run faster. The JVM is able to inline code from the Speedment library into the application code itself and then, using EA, it can often avoid allocating temporary return objects on the heap. So, Speedment developers can enjoy a nice API while still retaining high performance and low latency.



9 comments:

  1. Thanks for an interesting article.

    What's puzzling is why we have 245760 instances of the Optional. If this code was inlined and escape analysis had been effected, those should have been 'escaped away'. You can prove this by replacing the Optional with an inner class, say NameWrapper (a POJO wrapper for the middleName): no instances will be created, because of inlining and escape analysis.

    The strange thing is that if you change the value coming back from the Optional on every call (for example by adding on the index) no instances of the Object are created.

    I've tried running a few experiments to see if I can spot a pattern but there's nothing obvious.

    Replies
    1. Hi Daniel and thanks for your feedback.

      The reason that a bunch of Optionals are created is that it is the C2 compiler that does the escape analysis. It will take some time before C2 kicks in and it will also take some time for it to do the compilation and analysis. So, in that time, the Optionals are created. Once it is done, the Optionals can be 'escaped away'.

      Please feel free to drop more comments if you find out more interesting things!

    2. If you run the same program with -Xint (interpreter mode only) or -Xcomp (compile straight away), then there is no escape analysis performed. If you run without either of these flags you get the ~200k instances, so it looks like you are correct.

      I'm surprised by the -Xcomp result and why escape analysis doesn't seem to happen when the program is run in this mode. Does -Xcomp not trigger the C2 compiler? If anything, I would have expected fewer instances of the Optional.

    3. -Xcomp makes the JVM compile the code right away. IMO, you should not use that flag, because it is better to let the JVM run for a while and gather information on how the method should be compiled for optimum performance. EA is only performed on compiled code, which, as you say, explains why interpreted code does not benefit from EA.

    4. I agree about -Xcomp, although it's interesting that it should affect EA.

    5. I wrote a blog post summarising this thread and a few other ideas. I would be interested to hear your opinion. http://www.rationaljava.com/2015/12/how-long-does-it-take-jvm-to-effect.html

  2. Interesting post, Daniel. I think one factor is that it takes a while for the compiler to actually compile the code. So, during this time, objects are always allocated on the heap. This may explain why fewer objects are created if the program is slowed down...

  3. Hi Minborg,

    I just followed your article and ran an experiment. Here is my Java version:

    java version "1.8.0_161"
    Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

    And my test result:

    luog@luog-X510UQR:~$ jps | grep Main2
    Picked up JAVA_TOOL_OPTIONS: -Djava.security.egd=file:/dev/./urandom -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel -Djava2d.font.loadFontConf=true
    15394 Main2
    luog@luog-X510UQR:~$ jmap -histo 15394 | head
    Picked up JAVA_TOOL_OPTIONS: -Djava.security.egd=file:/dev/./urandom -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel -Djava2d.font.loadFontConf=true

     num     #instances         #bytes  class name
    ----------------------------------------------
       1:            93       42952200  [I
       2:          1215          99256  [C
       3:           480          54784  java.lang.Class
       4:          1203          28872  java.lang.String
       5:           522          25848  [Ljava.lang.Object;
       6:             9          25008  [B
       7:           388           9312  java.util.LinkedList$Node
    luog@luog-X510UQR:~$ jmap -histo 15394 | head
    Picked up JAVA_TOOL_OPTIONS: -Djava.security.egd=file:/dev/./urandom -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel -Djava2d.font.loadFontConf=true

     num     #instances         #bytes  class name
    ----------------------------------------------
       1:            93       26951952  [I
       2:       1000001       16000016  java.util.Optional
       3:          1216          99288  [C
       4:           480          54784  java.lang.Class
       5:          1204          28896  java.lang.String
       6:           522          25848  [Ljava.lang.Object;
       7:             9          25008  [B
    luog@luog-X510UQR:~$ jmap -histo 15394 | head
    Picked up JAVA_TOOL_OPTIONS: -Djava.security.egd=file:/dev/./urandom -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel -Djava2d.font.loadFontConf=true

     num     #instances         #bytes  class name
    ----------------------------------------------
       1:            94       32426544  [I
       2:       2000001       32000016  java.util.Optional
       3:          1217          99320  [C
       4:           480          54784  java.lang.Class
       5:          1205          28920  java.lang.String
       6:           522          25848  [Ljava.lang.Object;
       7:             9          25008  [B

    Obviously, Optional instances are still being created after the second pause. Any idea?

    Replies
    1. Hi Gelin and thanks for your feedback. There are some conditions that must be met in order for EA to work. As can be seen in the comments above, it takes some time for the C2 compiler to check the method(s) and do the actual code compilation. So, run your code at least 10,000 times, wait for a while (say 1 s) and then check. Make sure you see the C2 compile output in your logs. Let me know if it works.
