Minborg's Java Pot

Friday, October 4, 2019

Become a Master of Java Streams - Part 1: Creating Streams

Declarative code (e.g. functional composition with Streams) provides superior code metrics in many cases. Code your way through this hands-on-lab article series and mature into a better Java programmer by becoming a Master of Java Streams.

The whole idea with Streams is to represent a pipeline through which data will flow and the pipeline’s functions operate on the data. This way, functional-style operations on Streams of elements can be expressed. This article is the first out of five where you will learn firsthand how to become a Master of Streams. We start with basic stream examples and progress with more complex tasks until you know how to connect standard Java Streams to databases in the Cloud.

Once you have completed all five articles, you will be able to drastically reduce your codebase and know how to write pure Java code for the entire applications in a blink.

Here is a summary of the upcoming articles:

Part 1: Creating Streams
Part 2: Intermediate Operations
Part 3: Terminal Operations
Part 4: Database Streams
Part 5: Creating a Database Application Using Streams

Since we are firm believers in the concept of ”Learning by doing”, the series is complemented by a GitHub repository that contains Stream exercises split into 5 Units - each corresponding to the topic of an article. Instructions on how to use the source code are provided in the README-file.

What are Java Streams?

The Java Stream interface was first introduced in Java 8 and, together with lambdas, acts as a milestone in the development of Java since it contributes greatly to facilitating a declarative (functional) programming style. If you want to learn more about the advantages of declarative coding we refer you to this article.

A Java Stream can be visualized as a pipeline through which data will flow (see the image below). The pipeline’s functions will operate on the data by e.g. filtering, mapping and sorting the items. Lastly, a terminal operation can be performed to collect the items in a preferred data structure such as a List, an Array or a Map. An important thing to notice is that a Stream can only be consumed once.

A Stream Pipeline contains three main parts; the stream source, the intermediate operation(s) (zero to many) and a terminal operation.

Let’s have a look at an example to get a glimpse of what we will be teaching throughout this series. We encourage you to look at the code below and try to figure out what the print-statement will result in before reading the next paragraph.

List<String> list = Stream.of("Monkey", "Lion", "Giraffe","Lemur")
    .filter(s -&gt; s.startsWith("L"))
    .map(String::toUpperCase)
    .sorted()
    .collect(toList());
System.out.println(list);

Since the Stream API is descriptive and most often intuitive to use, you will probably have a pretty good understanding of the meaning of these operations regardless if you have encountered them before or not. We start off with a Stream of a List containing four Strings, each representing an African animal. The operations then filter out the elements that start with the letter “L”, converts the remaining elements to uppercase letters, sorts them in natural order (which in this case means alphabetical order) and lastly collects them into a List. Hence, resulting in the output [“LEMUR”, “LION”].

It is important to understand that Streams are “lazy” in the sense that elements are “requested” by the terminal operation (in this case the .collect() statement). If the terminal operation only needs one element (like, for example, the terminal operation .findFirst()), then at most one element is ever going to reach the terminal operation and the reminding elements (if any) will never be produced by the source. This also means that just creating a Stream is often a cheap operation whereas consuming it might be expensive depending on the stream pipeline and the number of potential elements in the stream.

In this case, the Stream Source was a List although many other types can act as a data source. We will spend the rest of this article describing some of the most useful source alternatives.

Stream Sources

Streams are mainly suited for handling collections of objects and can operate on elements of any type T. Although, there exist three special Stream implementations; IntStream, LongStream, and DoubleStream which are restricted to handle the corresponding primitive types.

An empty Stream of any of these types can be generated by calling Stream.empty() in the following manner:

Stream<T>     Stream.empty()
IntStream     IntStream.empty()
LongStream    LongStream.empty()
DoubleStream  DoubleStream.empty()

Empty Streams are indeed handy in some cases, but the majority of the time we are interested in filling our Stream with elements. This can be accomplished in a large number of ways. We will start by looking at the special case of an IntStream since it provides a variety of useful methods.

Useful IntStreams

A basic case is generating a Stream over a small number of items. This can be accomplished by listing the integers using IntStream.of(). The code below yields a simple stream of elements 1, 2 and 3.

IntStream oneTwoThree = IntStream.of(1, 2, 3);

Listing all elements manually can be tedious if the number of items grows large. In the case where we are interested in values in a certain range, the command .rangeClosed() is more effective. The operation is inclusive, meaning that the following code will produce a stream of all elements from 1 to 9.

IntStream positiveSingleDigits = IntStream.rangeClosed(1, 9);

An even more powerful command is .iterate() which enables greater flexibility in terms of what numbers to include. Below, we show an example of how it can be used to produce a Stream of all numbers that are powers of two.

IntStream powersOfTwo = IntStream.iterate(1, i -> i * 2);

There are also several perhaps more unexpected ways of producing a Stream. The method chars() can be used to Stream over the characters in a String, in this case, the elements “A”, “B” and “C”.

IntStream chars = "ABC".chars();

There is also a simple way to generate a Stream of random integers.

IntStream randomInts = new Random().ints();

Stream an Array

Streaming existing data collections is another option. We can stream the elements of an existing Array or choose to list items manually using Stream.of() as previously shown and repeated below.

String[] array = {"Monkey", "Lion", "Giraffe", "Lemur"};
Stream<String> stream2 = Stream.of(array);

Stream<String> stream = Stream.of("Monkey", "Lion", "Giraffe", "Lemur");

Stream from a Collection

It is also very simple to stream any Collection. The examples below demonstrate how a List or Set can be streamed with the simple command .stream().

List<String> list = Arrays.asList("Monkey", "Lion", "Giraffe", "Lemur");
Stream<String> streamFromList = list.stream();

Set<String> set = new HashSet<>(list);
Stream<String> streamFromSet = set.stream();

Stream from a Text File

Sometimes it can also be useful to stream the contents of a text-file. The following command will provide a Stream<String> that holds every line from the referenced file as a separate element.

Stream<String> lines = Files.lines(Paths.get("file.txt"));

Exercise

Now that we have familiarized you with some of the ways of creating a Stream, we encourage you to clone this GitHub repo and start practicing. The content of the article will be enough to solve the first Unit which is called Create. The Unit1Create interface contains JavaDocs which describes the intended implementation of the methods in Unit1MyCreate.

public interface Unit1Create {
 /**
  * Creates a new Stream of String objects that contains
  * the elements "A", "B" and "C" in order.
  *
  * @return a new Stream of String objects that contains
  *   the elements "A", "B" and "C" in order
  */
  Stream<String> newStreamOfAToC();

The provided tests (e.g. Unit1MyCreateTest) will act as an automatic grading tool, letting you know if you solution was correct or not.

If you have not done so yet, go ahead and solve the work items in the Unit1MyCreate class. “Gotta catch ‘em all”.

In the next article, we will continue to describe several intermediate operations that can be applied to these Streams and that will convert them into other Streams. See you soon!

Authors

Per Minborg
Julia Gustafsson

Monday, September 2, 2019

Java Performance: For-looping vs. Streaming

Is counting upwards or downwards in a for-loop the most efficient way of iterating? Sometimes the answer is neither. Read this post and understand the impact of different iteration varieties.

Iteration Performance

There are many views on how to iterate with high performance. The traditional way of iterating in Java has been a for-loop starting at zero and then counting up to some pre-defined number:

private static final int ITERATIONS = 10_000;

@Benchmark
public int forUp() {
    int sum = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        sum += i;
    }
    return sum;
}

Sometimes, we come across a for-loop that starts with a predetermined non-negative value and then it counts down instead. This is fairly common within the JDK itself, for example in the class String. Here is an example of solving the previous problem by counting down instead of up.

@Benchmark
public int forDown() {
    int sum = 0;
    for (int i = ITERATIONS; i-- > 0;) {
        sum += i;
    }
    return sum;
}

I think the rationale here is that checking how values relate to zero is potentially more efficient than testing how values relate to any other arbitrary value. In fact, all CPUs that I know about have machine code instructions that can check how a given value relates to zero. Another idea is that the count down idiom given above appears to only inspect the loop variable one time (it simultaneously checks the value and then decreases it) as opposed to the regular example at the top. I suspect that this has little or no influence on today’s efficient JIT compiler who will be able to optimize the first iteration just as good as the second. It might have an impact when the code runs in interpretation mode but this is not examined in this article.

Another way of doing the same thing using an IntStream looks like this:

@Benchmark
public int stream() {
    return IntStream.range(0, ITERATIONS)
        .sum();
}

If more performance is needed for large iterations, it is relatively easy to make the stream parallel by just adding a .parallel() operator to the stream. This is not examined in this article.

Performance under Graal VM

Running these tests under GraalVM (rc-11, with the new C2 compiler that ships with GraallVM) on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7) gives the following:

Benchmark              Mode  Cnt       Score       Error  Units
ForBenchmark.forDown  thrpt    5  311419.166 ±  4201.724  ops/s
ForBenchmark.forUp    thrpt    5  309598.916 ± 12998.579  ops/s
ForBenchmark.stream   thrpt    5  312360.089 ±  8291.792  ops/s

It might come as a surprise for some that the stream solution is the fastest one, albeit by a margin that is well within error margins.

In a previous article, I presented some code metric advantages with streams and Declarative programming compared to traditional Imperative code. I have not tested performance for cold code sections (i.e. before the JIT kicks in).

Clever Math

From math, we recall that the sum of consecutive numbers starting at zero is N*(N+1)/2 where N is the highest number in the series. Running this benchmark:

@Benchmark
public int math() {
    return ITERATIONS * (ITERATIONS + 1) / 2;
}

gives us a performance increase of over 1,000 times over the previous implementations:

Benchmark           Mode  Cnt          Score          Error  Units
ForBenchmark.math  thrpt    5  395561077.984 ± 11138012.141  ops/s

The more iterations, the more gain. Cleverness sometimes trumps brute force.

Ultra-fast Data Streams

With Speedment HyperStream, it is possible to get similar performance with data from databases. Read more here on HyperStream.

Conclusions

On some commonly used hardware/JVMs, it does not matter if we iterate upwards or downwards in our for-loops. More modern JVMs are able to optimize stream iterations so they have equivalent or even better performance than for-loops.

Stream code is generally more readable compared to for-loops in my opinion and so, I believe streams are likely to be the de facto iteration contrivance in some future.

Database content can be streamed with high performance using Speedment HyperStream.

Thursday, August 15, 2019

Java: An Optional Implementation of Optional

The class java.util.Optional is implemented as a single immutable concrete class that internally handles two cases; one with an element and one without. Wouldn't it have been a better choice to let Optional be an interface and have two different implementations implement that interface instead? After all, that is what we generally are taught to do in an object-oriented language.

In this article, we will learn about some of the potential arguments for the current Optional implementation. We will also learn why Streams are implemented in a different way, enabling Streams to be obtained from files or even database tables.

The Real Optional Implementation

The real java.util.Optional::get is implemented as shown hereunder:

public T get() {
        if (value == null) {
            throw new NoSuchElementException("No value present");
        }
        return value;
    }

As can be seen, there are two code paths; one where the value is null (no element and an exception is thrown) and one when the value is something else (the value is returned).

An Optional Optional Implementation

Let’s pretend that we would go back in a time machine and were tasked to implement Optional once again. I think it is likely that many of us would come up with an initial solution much like the one below (I have named the hypothetical interface Option so we can distinguish it from the “real” one) with two distinct implementations (here EmptyOption and PresentOption):

public interface Option<T> {

    T get();

    boolean isPresent();

    public <U> Option<U> map(Function<? super T, ? extends U> mapper);

    static <T> Option<T> empty() { return (Option<T>) EmptyOption.EMPTY; }

    static <T> Option<T> of(T value) { return new PresentOption<>(value); }

    static <T> Option<T> ofNullable(T value) {
        return value == null ? empty() : of(value);
    }

}

final class EmptyOption<T> implements Option<T> {

    static final EmptyOption<?> EMPTY = new EmptyOption<>();

    private EmptyOption() {}

    @Override public T get() { throw new NoSuchElementException(); }

    @Override public boolean isPresent() { return false; }

    @Override
    public <U> Option<U> map(Function<? super T, ? extends U> mapper) {
        requireNonNull(mapper);
        return (Option<U>) EMPTY;
    }
}

final class PresentOption<T> implements Option<T> {

    private final T value;

    PresentOption(T value) { this.value = requireNonNull(value); }

    @Override public T get() { return value; }

    @Override
    public boolean isPresent() { return true; }

    @Override
    public <U> Option<U> map(Function<? super T, ? extends U> mapper) {
        requireNonNull(mapper);
        return Option.ofNullable(mapper.apply(value));
    }
}

Only a few methods are shown for brevity but the principle remains the same: distinct implementations for the case where an element is present and when it is not. This gives a much clearer code and also opens up the possibility for anyone to implement optionals.

Analysis

I am confident that this type of solution was evaluated by the JDK team at the time Optional was conceived and I think it was a well-informed decision not to opt for this solution. Optional was primarily intended to “wrap” return values to protect from NPEs and other drawbacks of returning raw null values. I also think the design goal was that there should be little to negligible performance impact using Optional.

In the following, I speculate in some of the arguments to elect the present Optional implementation over the one coined above.

Profile Pollution

The JIT compiler compiles the Java byte code on-demand to improve performance over interpreting the byte code.

In order to do this efficiently, the JIT compiler is able to gather statistics for every known method. Each method can have a MethodData object that contains metrics on how the method is used and such an object is created once the JVM thinks the method is “warm” enough (i.e. has been called sufficiently in some sense).

The process of creating and maintaining MethodData is called “profiling”.

“Profile Pollution” occurs when the method is used substantially different between calls, including, but not limited to, providing alternating non-null/null elements and calling different polymorph methods (e.g. a parameter is generic of type T and the called method invokes T::equals). A cornerstone feature of Java is its ability to invoke methods dynamically. Thus, when Option::get is invoked, either EmptyOption::get or PresentOption::get is ultimately invoked depending on which implementation is present at the time of invocation.

Once the method has been invoked some 10,000 times, the JIT compiler is using the MethodData to create an efficient compiled code snippet that executes in the best way given the statistics gathered so far.

So, if elements are present all the time (using PresentOption) and the code is compiled with that in mind, but then there is an EmptyOption suddenly appearing, the code must “back out” and take a much slower code path.

With Optional in just one final class, there can never be any other implementation of the Optional methods and thus, no profile pollution due to different implementations. The JIT can make a deterministic and reasonably fast compiled code determination.

But wait, wouldn’t it be possible for the JVM to check all classes at startup and determine that there were, in fact, just two implementing classes of the Option and then it could figure the whole thing out? Well, no. We are free to add classes at any time so there would be no way of safely enumerating all possible implementations of a particular interface. At least not until we have real sealed classes in Java.

API pollution

If people were free to write custom implementations of Optional, then these implementations would most likely be suffering from design flaws/deviations compared to the built-in Optional. Also, people would likely let their own types implement the interface Optional adding to the burden of the JIT compiler/profiler and will thus tempt people to use composite types (e.g. Foo implements Bar, Optional<Bazz>) which was not intended.

Also, Optional is now an integral part of Java and as such, it can be made to efficiently evolve with the JDK itself including, perhaps, inline classes and other new upcoming Java features.

Optional vs. Streams

As opposed to Optional, java.util.stream.Stream and the specialized versions, like IntStream, are indeed interfaces. Why is not Stream a concrete single final class just like Optional?

Well, Streams have a completely different set of requirements. Streams can be obtained from a Collection or an array but there are far more powerful ways of obtaining a Stream. It is possible to acquire a Stream from a file, a socket, a random generator and even from tables in a database. These features would not be possible to implement if Stream was sealed.

Speedment Stream is an example of a library that allows standard Java Streams to be obtained from virtually any database. Read more about Speedment Stream here.

Conclusion

Optional is sealed and there are good reasons why. The internal implementation of Optional is less clear but that is a price worth paying with the benefits of better performance and clearer user code.

Streams are non-sealed interfaces that can be implemented by anyone and can be used to obtain elements from various sources including files and database tables. Speedment Stream ORM can be used to get Streams from database tables.

Download Speedment Stream here.

Monday, August 5, 2019

Java: Benefit from Inline Class Properties Starting from Java 8

In some years, Java will hopefully have an “inline class" feature which solves many challenges with the current state of Java. Read this article and learn how to use Java 8 and upwards today, and still benefit from some of the advantages of the coming inline object arrays such as; no indirect pointers, eliminated object header overhead and improved data locality.

In this article, we will learn how we can write a short class named InlineArray that supports many of the future inline class features. We will also take a look at Speedment HyperStream, an existing Java tool that is using similar means of operation.

Background

Since 1995, when it made perfect sense, an array of Objects in Java consists of an array which in turn holds a number of references to other objects which are ultimately spread out on the heap.

Here is how an array with two initial Point objects is laid out on the heap in Java today:

 Array
+======+
|Header|
+------+      Point 0
|ref 0 |---> +======+
+------+     |Header|       Point 1
|ref 1 |---- +------+ ---> +======+
+------+     |x     |      |Header|
|null  |     +------+      +------+
+------+     |y     |      |x     |
|null  |     +------+      +------+
+------+                   |y     |
|...   |                   +------+
+------+

However, over time, the execution pipeline of a typical CPUs has evolved tremendously with an incredible computation performance increase. On the other hand, the speed of light has remained constant and so, the latency of loading data from main memory has unfortunately remained within the same order of magnitude. The balance between computing and retrieving has skewed over in favor of computing.

Accessing main memory these days becomes a thing we want to avoid, much like we wanted to avoid loading data from spinning disks back in the days.

Evidently, the current Object array layout implies several drawbacks such as:

Double memory access (due to the indirect reference pointers in the array)
Reduced locality of data (because array objects are laid out on different places on the heap)
Increased memory footprint (because all the objects referenced in the array are Objects and therefore holds additional Class and synchronization information).

Inline Classes

Within the Java community, there is now a major effort going on to introduce “inline classes” (previously known as “value classes”). The current state of this effort (per July 2019) was presented by Brian Goetz in this video titled “Project Valhalla Update (2019 edition)”. No one knows when this feature will be available in an official Java release. My personal guess is sometime after 2021.

Here is how an array of inline Point objects would be laid out, once this feature becomes available:

 Array
+======+
|Header|
+------+
|x     |
+------+
|y     |
+------+
|x     |
+------+
|y     |
+------+
|...   |
+------+

As can be seen, this scheme consumes less memory (no Point headers), improves locality (data is sequentially laid out in memory) and data can be accessed directly without following indirect reference pointers. On the flip side, we lose the concept of object identity which will be discussed later in this article.

Emulating Some Inline Class Properties

In the following, we will implement an emulation of some of the properties of inline classes. Is should be noted that all examples below can be run on standard Java 8 and upwards already now.

Assume that we have an interface Point with X and Y getters as described here:

public interface Point { int x(); int y(); }

We could then trivially create an immutable implementation of the Point interface as shown hereunder:

public final class VanillaPoint implements Point {

    private final int x, y;

    public VanillaPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override public int x() { return x; }

    @Override public int y() { return y; }

    // toString(), equals() and hashCode() not shown for brevity

}

Further, assume that we are willing to give up the Object/identity properties of Point objects in arrays. This means, among other things, that we cannot synchronize or perform identity operations (such as == and System::identityHashCode)

The idea here is to create a memory region that we can work with directly at byte level and flatten out our objects there. This memory region could be encapsulated in a generic class called InlineArray<T> like this:

public final class InlineArray<T> {

    private final ByteBuffer memoryRegion;
    private final int elementSize;
    private final int length;
    private final BiConsumer<ByteBuffer, T> deconstructor;
    private final Function<ByteBuffer,T> constructor;
    private final BitSet presentFlags;

    public InlineArray(
        int elementSize,
        int length,
        BiConsumer<ByteBuffer, T> deconstructor,
        Function<ByteBuffer,T> constructor
    ) {
        this.elementSize = elementSize;
        this.length = length;
        this.deconstructor = requireNonNull(deconstructor);
        this.constructor = requireNonNull(constructor);
        this.memoryRegion = ByteBuffer.allocateDirect(elementSize * length);
        this.presentFlags = new BitSet(length);
    }

    public void put(int index, T value) {
        assertIndexBounds(index);
        if (value == null) {
            presentFlags.clear(index);
        } else {
            position(index);
            deconstructor.accept(memoryRegion, value);
            presentFlags.set(index);
        }
    }

    public T get(int index) {
        assertIndexBounds(index);
        if (!presentFlags.get(index)) {
            return null;
        }
        position(index);
        return constructor.apply(memoryRegion);
    }

    public int length() {
        return length;
    }

    private void assertIndexBounds(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("Index [0, " + length + "), was:" + index);
        }
    }

    private void position(int index) {
        memoryRegion.position(index * elementSize);
    }

}

Note that this class can handle any type of element (of type T) than can be deconstructed (serialized) to bytes provided that it has a maximum element size. The class is most efficient if all the elements have the same element size as Point does (i.e. always Integer.BYTES * 2 = 8 bytes). Further note that the class is not thread-safe, but that this can be added at the expense of introducing a memory barrier and, depending on solution, use separate views of the ByteBuffer.

Now, suppose we want to allocate an array of 10 000 Points. Armed with the new InlineArray class we can proceed like this:

public class Main {

    public static void main(String[] args) {

        InlineArray<Point> pointArray = new InlineArray<>(
            Integer.BYTES * 2, // The max element size
            10_000,
            (bb, p) -> {bb.putInt(p.x()); bb.putInt(p.y());},
            bb -> new VanillaPoint(bb.getInt(), bb.getInt())
        );

        Point p0 = new VanillaPoint(0, 0);
        Point p1 = new VanillaPoint(1, 1);

        pointArray.put(0, p0); // Store p0 at index 0
        pointArray.put(1, p1); // Store p1 at index 1

        System.out.println(pointArray.get(0)); // Should produce (0, 0)
        System.out.println(pointArray.get(1)); // Should produce (1, 1)
        System.out.println(pointArray.get(2)); // Should produce null

    }

}

As expected, the code will produce the following output when run:

VanillaPoint{x=0, y=0}
VanillaPoint{x=1, y=1}
null

Note how we provide an element deconstructor and element constructor to the InlineArray telling it how it should deconstruct and construct the Point objects to and from linear memory.

Emulation Properties

The emulation above will probably not get the same performance gains as real inline classes but the savings in terms of memory allocation and locality will be about the same. The emulation above is allocating memory off-heap so your garbage collection times will not be affected by element data put in the InlineArray. The elements in the ByteBuffer are laid out just like the proposed inline class array:

 Array
+======+
|Header|
+------+
|x     |
+------+
|y     |
+------+
|x     |
+------+
|y     |
+------+
|...   |
+------+

Because we use ByteBuffer objects that are indexed with an int, the backing memory region becomes limited to 2^31 bytes. This means, for example, we can only put 2^(31-3) = 2^28 ≈ 268 million Point elements in the array (because each point occupies 2^3 = 8 bytes) before we run out of address space. Real implementations can overcome this limitation by using several ByteBuffers, Unsafe or libraries like Chronicle Bytes.

Lazy Entities

Given the InlineArray class, it is fairly easy to provide elements from the InlineArray that are lazy, in the sense that they do not have to deserialize all fields eagerly when an element is retrieved from the array. This is how it can be done:

First, we create another implementation of the Point interface that takes its data from a backing ByteBuffer itself rather than from local fields:

public final class LazyPoint implements Point {

    private final ByteBuffer byteBuffer;
    private final int position;

    public LazyPoint(ByteBuffer byteBuffer) {
        this.byteBuffer = byteBuffer;
        this.position = byteBuffer.position();
    }

    @Override
    public int x() {
        return byteBuffer.getInt(position);
    }

    @Override
    public int y() {
        return byteBuffer.getInt(position + Integer.BYTES);
    }

    // toString(), equals() and hashCode() not shown for brevity

}

Then, we just replace the deserializer pased to the constructor of the InlineArray like this:

InlineArray pointArray = new InlineArray<>(
    Integer.BYTES * 2,
    10_000,
    (bb, p) -> {bb.putInt(p.x()); bb.putInt(p.y());},
    LazyPoint::new // Use this deserializer instead
);

If used in the same main method as above, this will produce the following output:

LazyPoint{x=0, y=0}
LazyPoint{x=1, y=1}
null

Cool. This is particularly useful for entities with tens or even hundreds of fields and where just a limited subset of the fields are ever accessed for the problem at hand.

A drawback with this approach is that if just a single LazyPoint reference is retained in our application, it prevents the entire backing ByteBuffer from being garbage collected. So, any lazy entities like these are best used as short-lived objects.

Using Large Collections of Data

What if we want to use very large collections of data (e.g. in the terabytes), perhaps from a database or from files, and store them efficiently in-JVM-memory and then be able to work with these collections to improve computational performance? Can we use this type of technology?

Speedment HyperStream is a product that leverages a similar technology to be able to provide database data as standard Java Streams and has been available for some time now. HyperStream lays out data similar to above and can hold terabytes of data in a single JVM with little or no Garbage Collection impact because data is stored off-heap. It can use in-place deserialization to obtain single fields directly from the backing memory region, thereby avoiding unnecessary full deserialization of entities. Its standard Java Streams are deterministic ultra-low latency that can construct and consume streams in under 100 ns in some cases.

Here is an example of how HyperStream (which implements a standard Java Stream) can be used in an application when paging between films. The Manager films variable is provided by Speedment automatically:

private Stream<Film> getPage(int page, Comparator<Film> comparator) {
    return films.stream()
        .sorted(comparator)
        .skip(page * PAGE_SIZE)
        .limit(PAGE_SIZE)
    }

Even though there might be trillions of films, the method will typically complete in less than a microsecond as the Stream is connected directly to RAM and are using in-memory indexes.

Read more about Speedment HyperStream performance here.

Evaluate the performance in your own database applications by downloading Speedment HyperStream here.

Resources

Project Valhalla https://openjdk.java.net/projects/valhalla/
Speedment HyperStream https://www.speedment.com/hyperstream/
Speedment Initializer https://www.speedment.com/initializer/

Thursday, August 1, 2019

Why Declarative Coding Makes You a Better Programmer

Declarative solutions with functional composition provide superior code metrics over legacy imperative code in many cases. Read this article and understand how to become a better programmer using declarative code with functional composition.

In this article, we will take a closer look at three problem examples and examine two different techniques (Imperative and Declarative) for solving each of these problems.

All source code in this article is open-source and available at https://github.com/minborg/imperative-vs-declarative. In the end, we will also see how the learnings of this article can be applied in the field of database applications. We will use Speedment Stream as an ORM tool, since it provides standard Java Streams that correspond to tables, views and joins from databases and supports declarative constructs.

There is literally an infinite number of example candidates that can be used for code metrics evaluation.

Problem Examples

In this article, I have selected three common problems we developers might face over the course of our job days:

SumArray

Iterating over an array and perform a calculation

GroupingBy

Aggregating values in parallel

Rest

Implementing a REST interface with pagination

Solution Techniques

As implied at the beginning of this article, we will be solving said problems using these two coding techniques:

Imperative

An Imperative Solution in which we use traditional code styles with for-loops and explicitly mutable states.

Declarative

A Declarative Solution where we compose various functions to form a higher-order composite function that solves the problem, typically using java.util.stream.Stream or variants thereof.

Code Metrics

The idea is then to use static code analysis applied to the different solutions using SonarQube (here SonarQube Community Edition, Version 7.7) ) so that we may derive useful and standardized code metrics for the problem/solution combinations. These metrics would then be compared.

In the article, we will be using the following code metrics:

LOC

“LOC” means “Lines-Of-Code” and is the number of non-empty lines in the code.

Statements

Is the total number of statements in the code. There could be zero to many statements on each code line.

Cyclomatic Complexity

Indicates the complexity of the code and is a quantitative measure of the number of linearly independent paths through a program's source code. For example, a single “if” clause presents two separate paths through the code. Read more here on Wikipedia.

Cognitive Complexity

SonarCube claims that “Cognitive Complexity breaks from the practice of using mathematical models to assess software maintainability. It starts from the precedents set by Cyclomatic Complexity, but uses human judgment to assess how structures should be counted and to decide what should be added to the model as a whole. As a result, it yields method complexity scores which strike programmers as fairer relative assessments of maintainability than have been available with previous models.” Read more here on SonarCube’s own page.

More often than not, it is desirable to conceive a solution where these metrics are small, rather than large.

For the record, it should be noted that any solution devised below is just one way of solving any given problem. Let me know if you know a better solution and feel free to submit a pull request via https://github.com/minborg/imperative-vs-declarative.

Iterating over an Array

We start off with an easy one. The object with this problem example is to compute the sum of the elements in an int array and return the result as a long. The following interface defines the problem:

public interface SumArray {

    long sum(int[] arr);
}

Imperative Solution

The following solution implements the SumArray problem using an imperative technique:

public class SumArrayImperative implements SumArray {

    @Override
    public long sum(int[] arr) {
        long sum = 0;
        for (int i : arr) {
            sum += i;
        }
        return sum;
    }

}

Declarative Solution

Here is a solution that implements SumArray using a declarative technique:

public class SumArrayDeclarative implements SumArray {

    @Override
    public long sum(int[] arr) {
        return IntStream.of(arr)
            .mapToLong(i -> i)
            .sum();
    }
}

Note that IntStream::sum only returns an int and therefore we have to apply the intermediate operation mapToLong().

Analysis

SonarQube provides the following analysis:

The code metrics for SumArray are shown in the following table (lower is generally better):

Technique	LOC	Statements	Cyclomatic Complexity	Cognitive Complexity
Imperative	12	5	2	1
Functional	11	2	2	0

This is how it looks in a graph (lower is generally better):

Aggregating Values in Parallel

The object with this problem example is to group Person objects into different buckets, where each bucket constitutes a unique combination of the birth year of a person and the country that a person is working in. For each group, the average salary shall be computed. The aggregation shall be computed in parallel using the common ForkJoin pool.

This is how the (immutable) Person class looks like:

public final class Person {

    private final String firstName;
    private final String lastName;
    private final int birthYear;
    private final String country;
    private final double salary;

    public Person(String firstName, 
                  String lastName, 
                  int birthYear, 
                  String country, 
                  double salary) {
        this.firstName = requireNonNull(firstName);
        this.lastName = requireNonNull(lastName);
        this.birthYear = birthYear;
        this.country = requireNonNull(country);
        this.salary = salary;
    }

    public String firstName() { return firstName; }
    public String lastName() { return lastName; }
    public int birthYear() { return birthYear; }
    public String country() { return country; }
    public double salary() { return salary; }

    // equals, hashCode and toString not shown for brevity
}

We have also defined another immutable class called YearCountry that shall be used as the grouping key:

public final class YearCountry {

    private final int birthYear;
    private final String country;

    public YearCountry(Person person) {
        this.birthYear = person.birthYear();
        this.country = person.country();
    }

    public int birthYear() { return birthYear; }
    public String country() { return country; }

    // equals, hashCode and toString not shown for brevity
}

Having defined these two classes, we can now define this problem example by means of this interface:

public interface GroupingBy {

    Map<YearCountry, Double> average(Collection<Person> persons);

}

Imperative Solution

It is non-trivial to implement an imperative solution to the GroupingBy example problem. Here is one solution that solves the problem:

public class GroupingByImperative implements GroupingBy {

    @Override
    public Map<YearCountry, Double> average(Collection<Person> persons) {
        final List<Person> personList = new ArrayList<>(persons);
        final int threads = ForkJoinPool.commonPool().getParallelism();
        final int step = personList.size() / threads;

        // Divide the work into smaller work items
        final List<List<Person>> subLists = new ArrayList<>();
        for (int i = 0; i < threads - 1; i++) {
            subLists.add(personList.subList(i * step, (i + 1) * step));
        }
        subLists.add(personList.subList((threads - 1) * step, personList.size()));


        final ConcurrentMap<YearCountry, AverageAccumulator> accumulators = new ConcurrentHashMap<>();
        // Submit the work items to the common ForkJoinPool
        final List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final List<Person> subList = subLists.get(i);
            futures.add(CompletableFuture.runAsync(() -> average(subList, accumulators)));
        }

        // Wait for completion
        for (int i = 0; i < threads; i++) {
            futures.get(i).join();
        }

        // Construct the result
        final Map<YearCountry, Double> result = new HashMap<>();
        accumulators.forEach((k, v) -> result.put(k, v.average()));

        return result;
    }

    private void average(List<Person> subList, ConcurrentMap<YearCountry, AverageAccumulator> accumulators) {
        for (Person person : subList) {
            final YearCountry bc = new YearCountry(person);
            accumulators.computeIfAbsent(bc, unused -> new AverageAccumulator())
                .add(person.salary());
        }
    }

    private final class AverageAccumulator {
        int count;
        double sum;

        synchronized void add(double term) {
            count++;
            sum += term;
        }

        double average() {
            return sum / count;
        }
    }

}

Declarative Solution

Here is a solution that implements GroupingBy using a declarative construct:

public class GroupingByDeclarative implements GroupingBy {

    @Override
    public Map<YearCountry, Double> average(Collection<Person> persons) {
        return persons.parallelStream()
            .collect(
                groupingBy(YearCountry::new, averagingDouble(Person::salary))
            );
    }
}

In the code above, I have used some static imports from the Collectors class (e.g. Collectors::groupingBy). This does not affect the code metrics.

Analysis

SonarQube provides the following analysis:

The code metrics for GroupingBy are shown in the following table (lower is better):

Technique	LOC	Statements	Cyclomatic Complexity	Cognitive Complexity
Imperative	52	27	11	4
Functional	17	1	1	0

The corresponding graph looks like this (lower is generally better):

Implementing a REST Interface

In this exemplary problem, we are to provide a pagination service for Person objects. Persons appearing on a page must satisfy some (arbitrary) conditions and are to be sorted in a certain given order. The page shall be returned as an unmodifiable List of Person objects.

Here is an interface that captures the problem:

public interface Rest {

/**
 * Returns an unmodifiable list from the given parameters.
 *
 * @param persons as the raw input list
 * @param predicate to select which elements to include
 * @param order in which to present persons
 * @param page to show. 0 is the first page
 * @return an unmodifiable list from the given parameters
 */
 List<Person> page(List<Person> persons, 
                   Predicate<Person> predicate,
                   Comparator<Person> order,
                   int page);
}

The size of a page is given in a separate utility class called RestUtil:

public final class RestUtil {
    private RestUtil() {}

    public static final int PAGE_SIZE = 50;
}

Imperative Solution

Here is an imperative implementation of the Rest interface:

public final class RestImperative implements Rest {

    @Override
    public List<Person> page(List<Person> persons, 
                             Predicate<Person> predicate, 
                             Comparator<Person> order, 
                             int page) {
        final List<Person> list = new ArrayList<>();
        for (Person person:persons) {
            if (predicate.test(person)) {
                list.add(person);
            }
        }
        list.sort(order);
        final int from = RestUtil.PAGE_SIZE * page;
        if (list.size() <= from) {
            return Collections.emptyList();
        }
        return unmodifiableList(list.subList(from, Math.min(list.size(), from + RestUtil.PAGE_SIZE)));
    }
}

Declarative Solution

The following class implements the Rest interface in a declarative way:

public final class RestDeclarative implements Rest {

    @Override
    public List<Person> page(List<Person> persons,
                             Predicate<Person> predicate, 
                             Comparator<Person> order,
                             int page) {
        return persons.stream()
            .filter(predicate)
            .sorted(order)
            .skip(RestUtil.PAGE_SIZE * (long) page)
            .limit(RestUtil.PAGE_SIZE)
            .collect(collectingAndThen(toList(), Collections::unmodifiableList));
    }
}

Analysis

SonarQube provides the following analysis:

The following table shows the code metrics for Rest (lower is generally better):

Technique	LOC	Statements	Cyclomatic Complexity	Cognitive Complexity
Imperative	27	10	4	4
Functional	21	1	1	0

Here, the same numbers are shown in a graph (again lower is generally better):

Java 11 Improvements

The examples above were written in Java 8. With Java 11, we could shorten our declarative code using LVTI (Local Variable Type Inference). This would make our code a bit shorter but would not affect code metrics.

@Override
public List<Person> page(List<Person> persons,
                         Predicate<Person> predicate, 
                         Comparator<Person> order, 
                         int page) {
    final var list = new ArrayList<Person>();
    ...

Compared to Java 8, Java 11 contains some new collectors. For example, the Collectors.toUnmodifiableList() which would make our declarative Rest solution a bit shorter:

public final class RestDeclarative implements Rest {

@Override
public List<Person> page(List<Person> persons,
                         Predicate<Person> predicate, 
                         Comparator<Person> order, 
                         int page) {
    return persons.stream()
        .filter(predicate)
        .sorted(order)
        .skip(RestUtil.PAGE_SIZE * (long) page)
        .limit(RestUtil.PAGE_SIZE)
        .collect(toUnmodifiableList());
}

Again, this will not impact the code metrics.

Summary

Averaging the code metrics for our three exemplary problems yields the following result (lower is generally better) :

Given the input requirements in this article, there is a remarkable improvement for all code metrics when we go from imperative to declarative constructs.

Use Declarative Constructs in Database Applications

In order to reap the benefits of declarative constructs in database applications, we have used Speedment Stream. Speedment Stream is a Stream-based Java ORM tool that can turn any database table/view/join into Java streams and thereby allows you to apply your declarative skills in database applications.

Your database applications code will get much better. In fact, a pagination REST solution with Speedment and Spring Boot against a database might be expressed like this:

public Stream<Person> page(Predicate<Person> predicate, 
                           Comparator<Person> order, 
                           int page) {
    return persons.stream()
        .filter(predicate)
        .sorted(order)
        .skip(RestUtil.PAGE_SIZE * (long) page)
        .limit(RestUtil.PAGE_SIZE);
}

Where the Manager<Person> persons is provided by Speedment and constitutes a handle to the database table “Person” and can be @AutoWired via Spring.

Conclusions

Choosing declarative over imperative solutions can reduce general code complexity massively and can provide many benefits including faster coding, better code quality, improved readability, less testing, reduced maintenance costs and more.

In order to benefit from declarative constructs within database applications, Speedment Stream is a tool that can provide standard Java Streams directly from the database.

Mastering declarative constructs and functional composition is a must for any contemporary Java developer these days.

Resources

Article Source Code: https://github.com/minborg/imperative-vs-declarative
SonarQube: https://www.sonarqube.org/
Speedment Stream: https://speedment.com/stream/
Speedment Initializer: https://www.speedment.com/initializer/

Wednesday, July 31, 2019

Java: ChronicleMap Part 3, Fast Microservices

Standard Java Maps needs to be initialized upon startup. Learn how to leverage ChronicleMaps that is initializable from a file and reduce microservice startup times significantly and how to share Maps between JVMs.

The built-in Map implementations, such as HashMap and ConcurrentHashMap are fast but they must be initialized with mappings before they can be used for looking up values. Also, they are limited in size by practical means such as heap and RAM size. Lastly, they are local to the JVM it runs in.

The initialization process can slow down critical startup for microservices, especially when reading mappings from a remote REST interface or a remote database. In this article, you will learn how you can start your microservice applications in seconds instead of minutes by using memory-mapped ChronicleMap instances and how Maps can be shared between JVMs in this third article in an article series about CronicleMap.

Read more about the fundamentals of CronicleMap in the first article.

Read more about file mapped CronicleMap objects in the second article.

Creating a Shared Map

As described in the second article in the series, we can easily create a file mapped Map like this:

private static Map<Long, Point> createFileMapped() {
    try {
        return ChronicleMap
            .of(Long.class, Point.class)
            .averageValueSize(8)
            .valueMarshaller(PointSerializer.getInstance())
            .entries(10_000_000)
            .createPersistedTo(new File("my-map"));

    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

Created Map objects can now be accessed by any JVM that has access to the “my-map” file. Updates to the maps will be shared among the participating JVMs via the shared file.

Initializing the Map

As also shown in the second article, we could create and initialize a Map like this:

final Map<Long, Point> m3 = LongStream.range(0, 10_000_000)
    .boxed()
        .collect(
            toMap(
                Function.identity(),
                FillMaps::pointFrom,
                (u, v) -> {
                    throw new IllegalStateException();
                },
                FillMaps::createFileMapped
            )
        );

When running on my laptop (MacBook Pro mid 2015, 16 GB, 2.2 GHz Intel Core i7), it takes about 10 seconds to create and fill the Map with 10 million entries.

If the Map contents were retrieved externally (as opposed to being created locally by the pointFrom() method), it would likely take much longer time to fill the Map. For example, if we get 50 Mbit/s REST throughput and each JSON Point representation consumes 25 bytes, then it would take some 60 seconds to fill the Map.

Starting a new JVM

Now that there is a pre-existing mapped file, we can start directly off this file as shown in this snippet:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(10_000_000)
    .createOrRecoverPersistedTo(new File("my-map"));

This will create a Map directly from the existing “my-map” file.

Running this on my laptop will yield a start time of 5 seconds. This could be compared to the 60 second REST example, yielding a 90% startup time reduction.

Running Several JVMs on the Same Node

We could elect to run several JVMs on the same physical server node. By doing so, we benefit from the OS’es ability to make mappings of the file available for each JVM by exposing shared memory. This constitutes an efficient and low latency means of communication between the JVMs. The fact that there is a common pool of mapped memory makes the memory management much more efficient compared to a situation where each and every JVM/OS would have to maintain its own separate mappings.

Summary

ChronicleMaps can be shared between participating JVM via shared files
Startup times can be reduced significantly using shared files
If JVMs are running on the same physical machine, performance and efficiency is further improved
Shared files via ChronicleMap provides a low latency means of communication between JVMs

Tuesday, July 30, 2019

Java: ChronicleMap Part 2, Super RAM Maps

The standard Java Maps, such as the ubiquitous HashMap, are ultimately limited by the available RAM. Read this article and learn how you can create Java Maps with virtually unlimited sizes even exceeding the target machine’s RAM size.

The built-in Map implementations, such as HashMap and ConcurrentHashMap work fine as long as they are relatively small. In all cases, they are limited by the available heap and therefore eventually the available RAM size. ChronicleMap can store its contents in files, thereby circumventing this limitation, opening up for terabyte-sized mappings as shown in this second article in an article series about CronicleMap.

Read more about the fundamentals of CronicleMap in my previous first article.

File Mapping

Mapping of a file is made by invoking the createPersistedTo() method on a ChronicleMap builder as shown in the method below:

private static Map<Long, Point> createFileMapped() {
   try {
        return ChronicleMap
            .of(Long.class, Point.class)
            .averageValueSize(8)
            .valueMarshaller(PointSerializer.getInstance())
            .entries(10_000_000)
            .createPersistedTo(new File("my-map"));

    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

This will create a Map that will layout its content in a memory-mapped file named “my-map” rather than in direct memory. The following example shows how we can create 10 million Point objects and store them all in a file mapped map:

final Map<Long, Point> m3 = LongStream.range(0, 10_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> {
                throw new IllegalStateException();
           },
           FillMaps::createFileMapped
       )
   );

The following command shows the newly created file:

Pers-MacBook-Pro:target pemi$ ls -lart my-map 
-rw-r--r--  1 pemi  staff  330305536 Jul 10 16:56 my-map

As can be seen, the file is about 33 MB and thus, each entry occupies 33 bytes on average.

Persistence

When the JVM terminates, the mapped file is still there, making it easy to pick up a previously created map including its content. This works much like a rudimentary superfast database. Here is how we can start off from an existing file:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(10_000_000)
    .createOrRecoverPersistedTo(new File("my-map"));

The Map will be available directly, including its previous content.

Java Map Exceeding RAM Limit

One interesting aspect of memory-mapped files is that they can exceed both the heap and RAM limits. The file mapping logic will make sure that the parts being currently used are loaded into RAM on demand. The mapping logic will also retain recent portions of accessed mapped memory in physical memory to improve performance. This occurs behind-the-scenes and need not be managed by the application itself.

My desktop computer is an older MacBook Pro with only 16GB of memory (Yes, I know that sucks). Nevertheless, I can allocate a Map with 1 billion entries potentially occupying 33 * 1,000,000,000 = 33 GB memory (We remember from above that each entry occupied 33 bytes on average). The code looks like this:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(1_000_000_000)
    .createPersistedTo(new File("huge-map"));

Even though I try to create a Java Map with 2x my RAM size, the code runs flawlessly and I get this file:

Pers-MacBook-Pro:target pemi$ ls -lart | grep huge-map 
-rw-r--r--   1 pemi  staff  34573651968 Jul 10 18:52 huge-map

Needless to say, you should make sure that the file you are mapping to is located on a file system with high random access performance. For example, a filesystem located on a local SSD.

Summary

ChronicleMap can be mapped to an external file
The mapped file is retained when the JVM exits
New applications can pick up an existing mapped file
ChronicleMap can hold more data than there is RAM
Mapped files are best placed on file systems with high random access performance