Thursday, January 14, 2016

Be Lazy with Java 8

Background

One of the most distinguishing features of us programmers is that we are inherently lazy. Not in the bad sense that we do not want to work, but in a better sense: we do not want to do the same thing twice, and we do not want to do it at all if we do not have to. In fact, not writing code is often the better alternative when you can reuse something else instead.

The same thing is true for our applications. Often, we want them to be lazy so that they only do what is absolutely necessary and nothing more.

I have used the Lazy class presented here in the open-source project Speedment that makes database applications really short and concise.

Read more on how we can make our applications lazy in this post.

Implementing Lazy Initialization

In this post, the goal is to show a Lazy class that can be used for objects with a relatively long life expectancy and where there might be any number of calls (from zero to millions) to a particular method. We must also ensure that the class is thread-safe. Lastly, we want maximum performance when different threads call the class many times.

Here is the proposed class:

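import java.util.function.Supplier;
import static java.util.Objects.requireNonNull;

public final class Lazy<T> {

    // Holds the computed value; null means "not computed yet".
    private volatile T value;

    public T getOrCompute(Supplier<T> supplier) {
        final T result = value;  // Read volatile just once...
        return result == null ? maybeCompute(supplier) : result;
    }

    private synchronized T maybeCompute(Supplier<T> supplier) {
        // Re-check under the lock: another thread may have set the value meanwhile.
        if (value == null) {
            value = requireNonNull(supplier.get());
        }
        return value;
    }

}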


The Lazy class can be used in many applications. Immutable classes are especially good candidates for lazy initialization. For example, Java's built-in String class employs lazy initialization in its hashCode() method. Here is one example of how we can use the Lazy class:

public class Point {

    private final int x, y;
    private final Lazy<String> lazyToString;

    public Point(int x, int y) {
        this.x = x; 
        this.y = y;
        lazyToString = new Lazy<>();
    }

    @Override
    public String toString() {
        return lazyToString.getOrCompute( () -> "(" + x + ", " + y + ")");
    }

}

Looking back at the Lazy class, we see that it contains only a single “holder” field for its value. The field must be declared volatile to guarantee our requirements. There is also a public method getOrCompute() that allows us to retrieve the value. This method takes a Supplier that will be used if and only if the value has not been set previously. The Supplier must produce a non-null value. Note the use of a local variable result, which reduces the number of volatile reads from two to one when the Lazy instance has already been initialized.

Before I explain the features of this particular implementation, we need to revisit the Java Memory Model and particularly variable visibility across threads. If you want, you can skip the next chapter and just accept that Lazy works as it is supposed to. However, I do encourage you to read on.
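To make the "if and only if" part concrete, here is a minimal sketch that counts how often the Supplier actually runs. The Lazy class is repeated (package-private) only so that the sketch compiles on its own; LazyOnceDemo and supplierCalls() are hypothetical names introduced for this illustration.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Copy of the Lazy class from this post, repeated so the sketch is self-contained.
final class Lazy<T> {

    private volatile T value;

    T getOrCompute(Supplier<T> supplier) {
        final T result = value;
        return result == null ? maybeCompute(supplier) : result;
    }

    private synchronized T maybeCompute(Supplier<T> supplier) {
        if (value == null) {
            value = Objects.requireNonNull(supplier.get());
        }
        return value;
    }
}

// Hypothetical demo class, not part of the original post.
final class LazyOnceDemo {

    // Calls getOrCompute() the given number of times on one Lazy instance
    // and returns how many times the Supplier was invoked.
    static int supplierCalls(int lookups) {
        final AtomicInteger calls = new AtomicInteger();
        final Lazy<String> lazy = new Lazy<>();
        for (int i = 0; i < lookups; i++) {
            lazy.getOrCompute(() -> {
                calls.incrementAndGet();
                return "expensive";
            });
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(supplierCalls(0));    // prints 0: no lookup, no computation
        System.out.println(supplierCalls(1000)); // prints 1: computed on the first lookup only
    }
}
```

With zero lookups the Supplier never runs, and with any positive number of lookups it runs exactly once.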

The Java Memory Model and Visibility

One of the key issues with the Java Memory Model is the concept of visibility. If Thread 1 updates a variable someValue = 2, then when would the other threads (e.g. Thread 2) see this update? It turns out that Thread 1’s update will not necessarily be seen immediately by other threads. In fact, there is no guarantee as to how quickly a change in this variable will be seen by other threads at all. It could be 100 ns, 1 ms, 1 s or even 10 years in theory. There are performance reasons for isolating the Java memory view between threads. Because each thread can have its own memory view, the level of parallelism will be much higher than if threads were supposed to share and guarantee the same memory view.

Some of the benefits with relaxed visibility are that it allows:

  • The compiler to reorder instructions in order to execute more efficiently
  • The compiler to cache variables in CPU registers
  • The CPU to defer flushing of writes to main memory
  • Old entries in reading processors’ caches to be used

The Java keywords final, synchronized and volatile allow us to change the visibility of objects across threads. The Java Memory Model is quite a big topic and perhaps I will write a more elaborate post on the issue later on. However, in the case of synchronization, a thread that enters a synchronization block must invalidate its local memory (such as CPU registers or cache entries that involve variables inside the synchronization block) so that reads will be made directly from main memory. In the same way, a thread that exits a synchronization block must flush all its local memory. Because only one thread can be in a synchronization block guarded by the same lock at any given time, the effect is that all changes to variables are effectively visible to all threads that enter the synchronization block. Note that threads that do not enter the synchronization block do not have any visibility guarantee.
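As a small illustration of the synchronization guarantee, here is a sketch with a plain (non-volatile) counter that is only ever touched inside synchronized methods. SynchronizedCounter and runTwoThreads() are hypothetical names introduced for this example.

```java
// Hypothetical class, not part of the original post.
final class SynchronizedCounter {

    private int count; // Plain field; every access goes through "this" as the lock.

    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Lets two threads increment the same counter and returns the final count.
    static int runTwoThreads(int incrementsPerThread) {
        final SynchronizedCounter counter = new SynchronizedCounter();
        final Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.increment();
            }
        };
        final Thread t1 = new Thread(work);
        final Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000)); // prints 200000
    }
}
```

Because both threads enter synchronized methods on the same object, each one sees the other's latest update and no increment is lost.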

Also, if a field is declared volatile, reads and writes are always made via main memory and in order. Thus, updates to the field are seen by other threads, at the cost of performance.
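The volatile guarantee can be sketched with a stop flag that one thread raises while another thread spins on it. VolatileFlagDemo and workerStops() are hypothetical names used only for this illustration, and the 50 ms pause is an arbitrary choice.

```java
// Hypothetical demo class, not part of the original post.
final class VolatileFlagDemo {

    // volatile: a write by one thread becomes visible to later reads by other threads.
    private volatile boolean stop;

    // Starts a spinning worker, raises the flag, and reports whether the worker
    // terminated within the given timeout.
    static boolean workerStops(long timeoutMillis) {
        final VolatileFlagDemo demo = new VolatileFlagDemo();
        final Thread worker = new Thread(() -> {
            while (!demo.stop) {
                // Busy-wait until the flag update becomes visible.
            }
        });
        worker.setDaemon(true); // Do not keep the JVM alive if the worker keeps spinning.
        worker.start();
        try {
            Thread.sleep(50);            // Let the worker spin for a moment.
            demo.stop = true;            // Volatile write by the calling thread.
            worker.join(timeoutMillis);  // Wait for the worker to observe it.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(workerStops(5_000));
    }
}
```

If the field were not volatile, the compiler would be allowed to read the flag once and keep it in a register, in which case the worker might never stop.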

Properties of the Lazy Class

The field value is declared volatile, and in the previous chapter we just learned that this gives us guarantees for visibility and also, more importantly, for in-order execution: a thread that reads a non-null value also sees the fully constructed object it refers to. So, if Thread 1 calls the Supplier and sets the value, Thread 2 will see the update the next time it reads the field. Thread 2 might, however, read the field before Thread 1 has set it. If so, Thread 2 will enter the maybeCompute() method and, because it is synchronized, it will, if necessary, wait for Thread 1 to leave the method and then see that the value was already set. From now on, Thread 2 will have a correct view of the value and it will never enter the synchronization block again. This is good if Thread 2 is expected to call the Lazy class many times. If another Thread 3 is created much later, it will see the correct value from the beginning and we avoid synchronization altogether. So, for medium to long lived objects, this scheme is a win! Once the value is set, we pay only for a single volatile read, with no locking overhead.
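The scenario with several threads racing for the first call can be sketched as follows. Again, the Lazy class is repeated only to keep the sketch self-contained; LazyContentionDemo and supplierCallsUnderContention() are hypothetical names, and the latch is just one way of releasing all threads at roughly the same time.

```java
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Copy of the Lazy class from this post, repeated so the sketch is self-contained.
final class Lazy<T> {

    private volatile T value;

    T getOrCompute(Supplier<T> supplier) {
        final T result = value;
        return result == null ? maybeCompute(supplier) : result;
    }

    private synchronized T maybeCompute(Supplier<T> supplier) {
        if (value == null) {
            value = Objects.requireNonNull(supplier.get());
        }
        return value;
    }
}

// Hypothetical demo class, not part of the original post.
final class LazyContentionDemo {

    // Releases threadCount threads at once against one Lazy instance and
    // returns how many times the Supplier was invoked in total.
    static int supplierCallsUnderContention(int threadCount) {
        final Lazy<Object> lazy = new Lazy<>();
        final AtomicInteger calls = new AtomicInteger();
        final CountDownLatch startGate = new CountDownLatch(1);
        final Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    startGate.await(); // Wait until all threads have been started.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                lazy.getOrCompute(() -> {
                    calls.incrementAndGet();
                    return new Object();
                });
            });
            threads[i].start();
        }
        startGate.countDown(); // Let all threads race for the first computation.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(supplierCallsUnderContention(16)); // prints 1
    }
}
```

Threads that lose the race either block briefly in maybeCompute() and find the value already set, or they read the non-null value directly and never take the lock.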

When Is Lazy Appropriate to Use?

Lazy is a good choice if we want to defer calculation to a later time and we do not want to repeat the calculation. If we, on the other hand, know in advance that our toString() method is always going to be called many times, then we would not use the Lazy class. Instead, we could just calculate the toString() value once and for all eagerly in the constructor and store its value for later re-use.
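For comparison, the eager alternative could look like the sketch below. EagerPoint is a hypothetical variant of the Point class shown earlier, and the field name text is an assumption made for this example.

```java
// Hypothetical eager variant of the Point class, not part of the original post.
final class EagerPoint {

    private final int x, y;
    private final String text; // Computed once, in the constructor.

    EagerPoint(int x, int y) {
        this.x = x;
        this.y = y;
        this.text = "(" + x + ", " + y + ")";
    }

    @Override
    public String toString() {
        return text; // No Lazy, no volatile read, no locking.
    }

    public static void main(String[] args) {
        System.out.println(new EagerPoint(1, 2)); // prints (1, 2)
    }
}
```

The trade-off is that every EagerPoint pays for the string concatenation and the extra memory up front, even if toString() is never called.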

Conclusion

The Lazy class is a very simple, yet powerful means of deferred calculation and a nice tool for performance optimization. The Lazy class performs exceptionally well under the circumstances it was constructed for: for medium and long lived objects, there is no locking overhead once the value has been computed.

The Lazy class, as shown, is used in the open-source project Speedment in a number of applications including SQL rendering where, for example, the columns for a table remain the same during the JVM lifetime. Speedment is a tool that allows access to databases using standard Java 8 streams.

Be lazy and “steal” the Lazy class here so that your applications can be lazy too...