Thursday, August 15, 2019

Java: An Optional Implementation of Optional

The class java.util.Optional is implemented as a single immutable concrete class that internally handles two cases; one with an element and one without. Wouldn't it have been a better choice to let Optional be an interface and have two different implementations implement that interface instead? After all, that is what we generally are taught to do in an object-oriented language.

In this article, we will learn about some of the potential arguments for the current Optional implementation. We will also learn why Streams are implemented in a different way, enabling Streams to be obtained from files or even database tables.

The Real Optional Implementation

The real java.util.Optional::get is implemented as shown hereunder:
public T get() {
        if (value == null) {
            throw new NoSuchElementException("No value present");
        }
        return value;
    }
As can be seen, there are two code paths; one where the value is null (no element and an exception is thrown) and one when the value is something else (the value is returned).

An Optional Optional Implementation

Let’s pretend that we would go back in a time machine and were tasked to implement Optional once again. I think it is likely that many of us would come up with an initial solution much like the one below (I have named the hypothetical interface Option so we can distinguish it from the “real” one) with two distinct implementations (here EmptyOption and PresentOption):

public interface Option<T> {

    T get();

    boolean isPresent();

    public <U> Option<U> map(Function<? super T, ? extends U> mapper);

    static <T> Option<T> empty() { return (Option<T>) EmptyOption.EMPTY; }

    static <T> Option<T> of(T value) { return new PresentOption<>(value); }

    static <T> Option<T> ofNullable(T value) {
        return value == null ? empty() : of(value);
    }

}

final class EmptyOption<T> implements Option<T> {

    static final EmptyOption<?> EMPTY = new EmptyOption<>();

    private EmptyOption() {}

    @Override public T get() { throw new NoSuchElementException(); }

    @Override public boolean isPresent() { return false; }

    @Override
    public <U> Option<U> map(Function<? super T, ? extends U> mapper) {
        requireNonNull(mapper);
        return (Option<U>) EMPTY;
    }
}

final class PresentOption<T> implements Option<T> {

    private final T value;

    PresentOption(T value) { this.value = requireNonNull(value); }

    @Override public T get() { return value; }

    @Override
    public boolean isPresent() { return true; }

    @Override
    public <U> Option<U> map(Function<? super T, ? extends U> mapper) {
        requireNonNull(mapper);
        return Option.ofNullable(mapper.apply(value));
    }
}

Only a few methods are shown for brevity but the principle remains the same: distinct implementations for the case where an element is present and when it is not. This gives a much clearer code and also opens up the possibility for anyone to implement optionals.

Analysis

I am confident that this type of solution was evaluated by the JDK team at the time Optional was conceived and I think it was a well-informed decision not to opt for this solution. Optional was primarily intended to “wrap” return values to protect from NPEs and other drawbacks of returning raw null values. I also think the design goal was that there should be little to negligible performance impact using Optional.

In the following, I speculate in some of the arguments to elect the present Optional implementation over the one coined above.

Profile Pollution

The JIT compiler compiles the Java byte code on-demand to improve performance over interpreting the byte code.

In order to do this efficiently, the JIT compiler is able to gather statistics for every known method. Each method can have a MethodData object that contains metrics on how the method is used and such an object is created once the JVM thinks the method is “warm” enough (i.e. has been called sufficiently in some sense).

The process of creating and maintaining MethodData is called “profiling”.

“Profile Pollution” occurs when the method is used substantially different between calls, including, but not limited to, providing alternating non-null/null elements and calling different polymorph methods (e.g. a parameter is generic of type T and the called method invokes T::equals). A cornerstone feature of Java is its ability to invoke methods dynamically. Thus, when Option::get is invoked, either EmptyOption::get or PresentOption::get is ultimately invoked depending on which implementation is present at the time of invocation.

Once the method has been invoked some 10,000 times, the JIT compiler is using the MethodData to create an efficient compiled code snippet that executes in the best way given the statistics gathered so far.

So, if elements are present all the time (using PresentOption) and the code is compiled with that in mind, but then there is an EmptyOption suddenly appearing, the code must “back out” and take a much slower code path.

With Optional in just one final class, there can never be any other implementation of the Optional methods and thus, no profile pollution due to different implementations. The JIT can make a deterministic and reasonably fast compiled code determination.

But wait, wouldn’t it be possible for the JVM to check all classes at startup and determine that there were, in fact, just two implementing classes of the Option and then it could figure the whole thing out? Well, no. We are free to add classes at any time so there would be no way of safely enumerating all possible implementations of a particular interface. At least not until we have real sealed classes in Java.

API pollution

If people were free to write custom implementations of Optional, then these implementations would most likely be suffering from design flaws/deviations compared to the built-in Optional. Also, people would likely let their own types implement the interface Optional adding to the burden of the JIT compiler/profiler and will thus tempt people to use composite types (e.g. Foo implements Bar, Optional<Bazz>) which was not intended.

Also, Optional is now an integral part of Java and as such, it can be made to efficiently evolve with the JDK itself including, perhaps, inline classes and other new upcoming Java features.

Optional vs. Streams

As opposed to Optional, java.util.stream.Stream and the specialized versions, like IntStream, are indeed interfaces. Why is not Stream a concrete single final class just like Optional?

Well, Streams have a completely different set of requirements. Streams can be obtained from a Collection or an array but there are far more powerful ways of obtaining a Stream. It is possible to acquire a Stream from a file, a socket, a random generator and even from tables in a database. These features would not be possible to implement if Stream was sealed.

Speedment Stream is an example of a library that allows standard Java Streams to be obtained from virtually any database. Read more about Speedment Stream here.

Conclusion

Optional is sealed and there are good reasons why. The internal implementation of Optional is less clear but that is a price worth paying with the benefits of better performance and clearer user code.

Streams are non-sealed interfaces that can be implemented by anyone and can be used to obtain elements from various sources including files and database tables. Speedment Stream ORM can be used to get Streams from database tables.

Download Speedment Stream here.

2 comments:

  1. You should read this: https://shipilev.net/blog/2015/black-magic-method-dispatch/. Taking "another" path is not that dramatic and ultimately is something you very rarely care about. Also that 10.000 calls is a myth that just would not die! https://stackoverflow.com/questions/53885832/why-is-server-option-there-when-server-vm-is-the-default-option/53887071#53887071 disclaimer, that is my answer. Interesting library, will look for sure! Very eager to find out how you handle parallelism and databases that support native "streaming" over it's entries.

    ReplyDelete
  2. Thanks for the link that looks very interesting. I am aware that the "10k" varies depending on a number of reasons. This could be a topic in a separate article. Thanks for your comment and let me know what you think about the library.

    ReplyDelete

Note: Only a member of this blog may post a comment.