Minborg

Minborg
Minborg

Monday, September 2, 2019

Java Performance: For-looping vs. Streaming

Is counting upwards or downwards in a for-loop the most efficient way of iterating? Sometimes the answer is neither. Read this post and understand the impact of different iteration varieties.

Iteration Performance

There are many views on how to iterate with high performance. The traditional way of iterating in Java has been a for-loop starting at zero and then counting up to some pre-defined number:

private static final int ITERATIONS = 10_000;

@Benchmark
public int forUp() {
    int sum = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        sum += i;
    }
    return sum;
}
Sometimes, we come across a for-loop that starts with a predetermined non-negative value and then it counts down instead. This is fairly common within the JDK itself, for example in the class String. Here is an example of solving the previous problem by counting down instead of up.

@Benchmark
public int forDown() {
    int sum = 0;
    for (int i = ITERATIONS; i-- > 0;) {
        sum += i;
    }
    return sum;
}

I think the rationale here is that checking how values relate to zero is potentially more efficient than testing how values relate to any other arbitrary value. In fact, all CPUs that I know about have machine code instructions that can check how a given value relates to zero. Another idea is that the count down idiom given above appears to only inspect the loop variable one time (it simultaneously checks the value and then decreases it) as opposed to the regular example at the top. I suspect that this has little or no influence on today’s efficient JIT compiler who will be able to optimize the first iteration just as good as the second. It might have an impact when the code runs in interpretation mode but this is not examined in this article.

Another way of doing the same thing using an IntStream looks like this:

@Benchmark
public int stream() {
    return IntStream.range(0, ITERATIONS)
        .sum();
}

If more performance is needed for large iterations, it is relatively easy to make the stream parallel by just adding a .parallel() operator to the stream. This is not examined in this article.

Performance under Graal VM

Running these tests under GraalVM (rc-11, with the new C2 compiler that ships with GraallVM) on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7) gives the following:

Benchmark              Mode  Cnt       Score       Error  Units
ForBenchmark.forDown  thrpt    5  311419.166 ±  4201.724  ops/s
ForBenchmark.forUp    thrpt    5  309598.916 ± 12998.579  ops/s
ForBenchmark.stream   thrpt    5  312360.089 ±  8291.792  ops/s

It might come as a surprise for some that the stream solution is the fastest one, albeit by a margin that is well within error margins.

In a previous article, I presented some code metric advantages with streams and Declarative programming compared to traditional Imperative code. I have not tested performance for cold code sections (i.e. before the JIT kicks in).

Clever Math

From math, we recall that the sum of consecutive numbers starting at zero is N*(N+1)/2 where N is the highest number in the series. Running this benchmark:

@Benchmark
public int math() {
    return ITERATIONS * (ITERATIONS + 1) / 2;
}

gives us a performance increase of over 1,000 times over the previous implementations:

Benchmark           Mode  Cnt          Score          Error  Units
ForBenchmark.math  thrpt    5  395561077.984 ± 11138012.141  ops/s

The more iterations, the more gain. Cleverness sometimes trumps brute force.

Ultra-fast Data Streams

With Speedment HyperStream, it is possible to get similar performance with data from databases. Read more here on HyperStream.

Conclusions

On some commonly used hardware/JVMs, it does not matter if we iterate upwards or downwards in our for-loops. More modern JVMs are able to optimize stream iterations so they have equivalent or even better performance than for-loops.

Stream code is generally more readable compared to for-loops in my opinion and so, I believe streams are likely to be the de facto iteration contrivance in some future.

Database content can be streamed with high performance using Speedment HyperStream.

Thursday, August 15, 2019

Java: An Optional Implementation of Optional

The class java.util.Optional is implemented as a single immutable concrete class that internally handles two cases; one with an element and one without. Wouldn't it have been a better choice to let Optional be an interface and have two different implementations implement that interface instead? After all, that is what we generally are taught to do in an object-oriented language.

In this article, we will learn about some of the potential arguments for the current Optional implementation. We will also learn why Streams are implemented in a different way, enabling Streams to be obtained from files or even database tables.

The Real Optional Implementation

The real java.util.Optional::get is implemented as shown hereunder:
public T get() {
        if (value == null) {
            throw new NoSuchElementException("No value present");
        }
        return value;
    }
As can be seen, there are two code paths; one where the value is null (no element and an exception is thrown) and one when the value is something else (the value is returned).

An Optional Optional Implementation

Let’s pretend that we would go back in a time machine and were tasked to implement Optional once again. I think it is likely that many of us would come up with an initial solution much like the one below (I have named the hypothetical interface Option so we can distinguish it from the “real” one) with two distinct implementations (here EmptyOption and PresentOption):

public interface Option<T> {

    T get();

    boolean isPresent();

    public <U> Option<U> map(Function<? super T, ? extends U> mapper);

    static <T> Option<T> empty() { return (Option<T>) EmptyOption.EMPTY; }

    static <T> Option<T> of(T value) { return new PresentOption<>(value); }

    static <T> Option<T> ofNullable(T value) {
        return value == null ? empty() : of(value);
    }

}

final class EmptyOption<T> implements Option<T> {

    static final EmptyOption<?> EMPTY = new EmptyOption<>();

    private EmptyOption() {}

    @Override public T get() { throw new NoSuchElementException(); }

    @Override public boolean isPresent() { return false; }

    @Override
    public <U> Option<U> map(Function<? super T, ? extends U> mapper) {
        requireNonNull(mapper);
        return (Option<U>) EMPTY;
    }
}

final class PresentOption<T> implements Option<T> {

    private final T value;

    PresentOption(T value) { this.value = requireNonNull(value); }

    @Override public T get() { return value; }

    @Override
    public boolean isPresent() { return true; }

    @Override
    public <U> Option<U> map(Function<? super T, ? extends U> mapper) {
        requireNonNull(mapper);
        return Option.ofNullable(mapper.apply(value));
    }
}

Only a few methods are shown for brevity but the principle remains the same: distinct implementations for the case where an element is present and when it is not. This gives a much clearer code and also opens up the possibility for anyone to implement optionals.

Analysis

I am confident that this type of solution was evaluated by the JDK team at the time Optional was conceived and I think it was a well-informed decision not to opt for this solution. Optional was primarily intended to “wrap” return values to protect from NPEs and other drawbacks of returning raw null values. I also think the design goal was that there should be little to negligible performance impact using Optional.

In the following, I speculate in some of the arguments to elect the present Optional implementation over the one coined above.

Profile Pollution

The JIT compiler compiles the Java byte code on-demand to improve performance over interpreting the byte code.

In order to do this efficiently, the JIT compiler is able to gather statistics for every known method. Each method can have a MethodData object that contains metrics on how the method is used and such an object is created once the JVM thinks the method is “warm” enough (i.e. has been called sufficiently in some sense).

The process of creating and maintaining MethodData is called “profiling”.

“Profile Pollution” occurs when the method is used substantially different between calls, including, but not limited to, providing alternating non-null/null elements and calling different polymorph methods (e.g. a parameter is generic of type T and the called method invokes T::equals). A cornerstone feature of Java is its ability to invoke methods dynamically. Thus, when Option::get is invoked, either EmptyOption::get or PresentOption::get is ultimately invoked depending on which implementation is present at the time of invocation.

Once the method has been invoked some 10,000 times, the JIT compiler is using the MethodData to create an efficient compiled code snippet that executes in the best way given the statistics gathered so far.

So, if elements are present all the time (using PresentOption) and the code is compiled with that in mind, but then there is an EmptyOption suddenly appearing, the code must “back out” and take a much slower code path.

With Optional in just one final class, there can never be any other implementation of the Optional methods and thus, no profile pollution due to different implementations. The JIT can make a deterministic and reasonably fast compiled code determination.

But wait, wouldn’t it be possible for the JVM to check all classes at startup and determine that there were, in fact, just two implementing classes of the Option and then it could figure the whole thing out? Well, no. We are free to add classes at any time so there would be no way of safely enumerating all possible implementations of a particular interface. At least not until we have real sealed classes in Java.

API pollution

If people were free to write custom implementations of Optional, then these implementations would most likely be suffering from design flaws/deviations compared to the built-in Optional. Also, people would likely let their own types implement the interface Optional adding to the burden of the JIT compiler/profiler and will thus tempt people to use composite types (e.g. Foo implements Bar, Optional<Bazz>) which was not intended.

Also, Optional is now an integral part of Java and as such, it can be made to efficiently evolve with the JDK itself including, perhaps, inline classes and other new upcoming Java features.

Optional vs. Streams

As opposed to Optional, java.util.stream.Stream and the specialized versions, like IntStream, are indeed interfaces. Why is not Stream a concrete single final class just like Optional?

Well, Streams have a completely different set of requirements. Streams can be obtained from a Collection or an array but there are far more powerful ways of obtaining a Stream. It is possible to acquire a Stream from a file, a socket, a random generator and even from tables in a database. These features would not be possible to implement if Stream was sealed.

Speedment Stream is an example of a library that allows standard Java Streams to be obtained from virtually any database. Read more about Speedment Stream here.

Conclusion

Optional is sealed and there are good reasons why. The internal implementation of Optional is less clear but that is a price worth paying with the benefits of better performance and clearer user code.

Streams are non-sealed interfaces that can be implemented by anyone and can be used to obtain elements from various sources including files and database tables. Speedment Stream ORM can be used to get Streams from database tables.

Download Speedment Stream here.

Monday, August 5, 2019

Java: Benefit from Inline Class Properties Starting from Java 8

In some years, Java will hopefully have an “inline class" feature which solves many challenges with the current state of Java. Read this article and learn how to use Java 8 and upwards today, and still benefit from some of the advantages of the coming inline object arrays such as; no indirect pointers, eliminated object header overhead and improved data locality.

In this article, we will learn how we can write a short class named InlineArray that supports many of the future inline class features. We will also take a look at Speedment HyperStream, an existing Java tool that is using similar means of operation.

Background

Since 1995, when it made perfect sense, an array of Objects in Java consists of an array which in turn holds a number of references to other objects which are ultimately spread out on the heap.

Here is how an array with two initial Point objects is laid out on the heap in Java today:
 Array
+======+
|Header|
+------+      Point 0
|ref 0 |---> +======+
+------+     |Header|       Point 1
|ref 1 |---- +------+ ---> +======+
+------+     |x     |      |Header|
|null  |     +------+      +------+
+------+     |y     |      |x     |
|null  |     +------+      +------+
+------+                   |y     |
|...   |                   +------+
+------+

However, over time, the execution pipeline of a typical CPUs has evolved tremendously with an incredible computation performance increase. On the other hand, the speed of light has remained constant and so, the latency of loading data from main memory has unfortunately remained within the same order of magnitude. The balance between computing and retrieving has skewed over in favor of computing.

Accessing main memory these days becomes a thing we want to avoid, much like we wanted to avoid loading data from spinning disks back in the days.

Evidently, the current Object array layout implies several drawbacks such as:

  • Double memory access (due to the indirect reference pointers in the array)
  • Reduced locality of data (because array objects are laid out on different places on the heap)
  • Increased memory footprint (because all the objects referenced in the array are Objects and therefore holds additional Class and synchronization information).


Inline Classes

Within the Java community, there is now a major effort going on to introduce “inline classes” (previously known as “value classes”). The current state of this effort (per July 2019) was presented by Brian Goetz in this video titled “Project Valhalla Update (2019 edition)”. No one knows when this feature will be available in an official Java release. My personal guess is sometime after 2021.

Here is how an array of inline Point objects would be laid out, once this feature becomes available:

 Array
+======+
|Header|
+------+
|x     |
+------+
|y     |
+------+
|x     |
+------+
|y     |
+------+
|...   |
+------+

As can be seen, this scheme consumes less memory (no Point headers), improves locality (data is sequentially laid out in memory) and data can be accessed directly without following indirect reference pointers. On the flip side, we lose the concept of object identity which will be discussed later in this article.

Emulating Some Inline Class Properties

In the following, we will implement an emulation of some of the properties of inline classes. Is should be noted that all examples below can be run on standard Java 8 and upwards already now.

Assume that we have an interface Point with X and Y getters as described here:
public interface Point { int x(); int y(); }
We could then trivially create an immutable implementation of the Point interface as shown hereunder:

public final class VanillaPoint implements Point {

    private final int x, y;

    public VanillaPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override public int x() { return x; }

    @Override public int y() { return y; }

    // toString(), equals() and hashCode() not shown for brevity

}
Further, assume that we are willing to give up the Object/identity properties of Point objects in arrays. This means, among other things, that we cannot synchronize or perform identity operations (such as == and System::identityHashCode)

The idea here is to create a memory region that we can work with directly at byte level and flatten out our objects there. This memory region could be encapsulated in a generic class called InlineArray<T> like this:

public final class InlineArray<T> {

    private final ByteBuffer memoryRegion;
    private final int elementSize;
    private final int length;
    private final BiConsumer<ByteBuffer, T> deconstructor;
    private final Function<ByteBuffer,T> constructor;
    private final BitSet presentFlags;

    public InlineArray(
        int elementSize,
        int length,
        BiConsumer<ByteBuffer, T> deconstructor,
        Function<ByteBuffer,T> constructor
    ) {
        this.elementSize = elementSize;
        this.length = length;
        this.deconstructor = requireNonNull(deconstructor);
        this.constructor = requireNonNull(constructor);
        this.memoryRegion = ByteBuffer.allocateDirect(elementSize * length);
        this.presentFlags = new BitSet(length);
    }

    public void put(int index, T value) {
        assertIndexBounds(index);
        if (value == null) {
            presentFlags.clear(index);
        } else {
            position(index);
            deconstructor.accept(memoryRegion, value);
            presentFlags.set(index);
        }
    }

    public T get(int index) {
        assertIndexBounds(index);
        if (!presentFlags.get(index)) {
            return null;
        }
        position(index);
        return constructor.apply(memoryRegion);
    }

    public int length() {
        return length;
    }

    private void assertIndexBounds(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("Index [0, " + length + "), was:" + index);
        }
    }

    private void position(int index) {
        memoryRegion.position(index * elementSize);
    }

}

Note that this class can handle any type of element (of type T) than can be deconstructed (serialized) to bytes provided that it has a maximum element size. The class is most efficient if all the elements have the same element size as Point does (i.e. always Integer.BYTES * 2 = 8 bytes). Further note that the class is not thread-safe, but that this can be added at the expense of introducing a memory barrier and, depending on solution, use separate views of the ByteBuffer.

Now, suppose we want to allocate an array of 10 000 Points. Armed with the new InlineArray class we can proceed like this:

public class Main {

    public static void main(String[] args) {

        InlineArray<Point> pointArray = new InlineArray<>(
            Integer.BYTES * 2, // The max element size
            10_000,
            (bb, p) -> {bb.putInt(p.x()); bb.putInt(p.y());},
            bb -> new VanillaPoint(bb.getInt(), bb.getInt())
        );

        Point p0 = new VanillaPoint(0, 0);
        Point p1 = new VanillaPoint(1, 1);

        pointArray.put(0, p0); // Store p0 at index 0
        pointArray.put(1, p1); // Store p1 at index 1

        System.out.println(pointArray.get(0)); // Should produce (0, 0)
        System.out.println(pointArray.get(1)); // Should produce (1, 1)
        System.out.println(pointArray.get(2)); // Should produce null

    }

}
As expected, the code will produce the following output when run:

VanillaPoint{x=0, y=0}
VanillaPoint{x=1, y=1}
null

Note how we provide an element deconstructor and element constructor to the InlineArray telling it how it should deconstruct and construct the Point objects to and from linear memory.

Emulation Properties

The emulation above will probably not get the same performance gains as real inline classes but the savings in terms of memory allocation and locality will be about the same. The emulation above is allocating memory off-heap so your garbage collection times will not be affected by element data put in the InlineArray. The elements in the ByteBuffer are laid out just like the proposed inline class array:
 Array
+======+
|Header|
+------+
|x     |
+------+
|y     |
+------+
|x     |
+------+
|y     |
+------+
|...   |
+------+

Because we use ByteBuffer objects that are indexed with an int, the backing memory region becomes limited to 2^31 bytes. This means, for example, we can only put 2^(31-3) = 2^28 ≈ 268 million Point elements in the array (because each point occupies 2^3 = 8 bytes) before we run out of address space. Real implementations can overcome this limitation by using several ByteBuffers, Unsafe or libraries like Chronicle Bytes.

Lazy Entities

Given the InlineArray class, it is fairly easy to provide elements from the InlineArray that are lazy, in the sense that they do not have to deserialize all fields eagerly when an element is retrieved from the array. This is how it can be done:

First, we create another implementation of the Point interface that takes its data from a backing ByteBuffer itself rather than from local fields:

public final class LazyPoint implements Point {

    private final ByteBuffer byteBuffer;
    private final int position;

    public LazyPoint(ByteBuffer byteBuffer) {
        this.byteBuffer = byteBuffer;
        this.position = byteBuffer.position();
    }

    @Override
    public int x() {
        return byteBuffer.getInt(position);
    }

    @Override
    public int y() {
        return byteBuffer.getInt(position + Integer.BYTES);
    }

    // toString(), equals() and hashCode() not shown for brevity

}

Then, we just replace the deserializer pased to the constructor of the InlineArray like this:

InlineArray pointArray = new InlineArray<>(
    Integer.BYTES * 2,
    10_000,
    (bb, p) -> {bb.putInt(p.x()); bb.putInt(p.y());},
    LazyPoint::new // Use this deserializer instead
);

If used in the same main method as above, this will produce the following output:

LazyPoint{x=0, y=0}
LazyPoint{x=1, y=1}
null

Cool. This is particularly useful for entities with tens or even hundreds of fields and where just a limited subset of the fields are ever accessed for the problem at hand.

A drawback with this approach is that if just a single LazyPoint reference is retained in our application, it prevents the entire backing ByteBuffer from being garbage collected. So, any lazy entities like these are best used as short-lived objects.

Using Large Collections of Data

What if we want to use very large collections of data (e.g. in the terabytes), perhaps from a database or from files, and store them efficiently in-JVM-memory and then be able to work with these collections to improve computational performance? Can we use this type of technology?

Speedment HyperStream is a product that leverages a similar technology to be able to provide database data as standard Java Streams and has been available for some time now. HyperStream lays out data similar to above and can hold terabytes of data in a single JVM with little or no Garbage Collection impact because data is stored off-heap. It can use in-place deserialization to obtain single fields directly from the backing memory region, thereby avoiding unnecessary full deserialization of entities. Its standard Java Streams are deterministic ultra-low latency that can construct and consume streams in under 100 ns in some cases.

Here is an example of how HyperStream (which implements a standard Java Stream) can be used in an application when paging between films. The Manager films variable is provided by Speedment automatically:


private Stream<Film> getPage(int page, Comparator<Film> comparator) {
    return films.stream()
        .sorted(comparator)
        .skip(page * PAGE_SIZE)
        .limit(PAGE_SIZE)
    }
Even though there might be trillions of films, the method will typically complete in less than a microsecond as the Stream is connected directly to RAM and are using in-memory indexes.

Read more about Speedment HyperStream performance here.

Evaluate the performance in your own database applications by downloading Speedment HyperStream here.

Resources

Project Valhalla https://openjdk.java.net/projects/valhalla/
Speedment HyperStream https://www.speedment.com/hyperstream/
Speedment Initializer https://www.speedment.com/initializer/

Thursday, August 1, 2019

Why Declarative Coding Makes You a Better Programmer

Declarative solutions with functional composition provide superior code metrics over legacy imperative code in many cases. Read this article and understand how to become a better programmer using declarative code with functional composition.

In this article, we will take a closer look at three problem examples and examine two different techniques (Imperative and Declarative) for solving each of these problems.

All source code in this article is open-source and available at https://github.com/minborg/imperative-vs-declarative. In the end, we will also see how the learnings of this article can be applied in the field of database applications. We will use Speedment Stream as an ORM tool, since it provides standard Java Streams that correspond to tables, views and joins from databases and supports declarative constructs.

There is literally an infinite number of example candidates that can be used for code metrics evaluation.

Problem Examples

In this article, I have selected three common problems we developers might face over the course of our job days:

SumArray

Iterating over an array and perform a calculation

GroupingBy

Aggregating values in parallel

Rest

Implementing a REST interface with pagination

Solution Techniques

As implied at the beginning of this article, we will be solving said problems using these two coding techniques:

Imperative

An Imperative Solution in which we use traditional code styles with for-loops and explicitly mutable states.

Declarative

A Declarative Solution where we compose various functions to form a higher-order composite function that solves the problem, typically using java.util.stream.Stream or variants thereof.

Code Metrics

The idea is then to use static code analysis applied to the different solutions using SonarQube (here SonarQube Community Edition, Version 7.7) ) so that we may derive useful and standardized code metrics for the problem/solution combinations. These metrics would then be compared.

In the article, we will be using the following code metrics:

LOC

“LOC” means “Lines-Of-Code” and is the number of non-empty lines in the code.

Statements

Is the total number of statements in the code. There could be zero to many statements on each code line.

Cyclomatic Complexity

Indicates the complexity of the code and is a quantitative measure of the number of linearly independent paths through a program's source code. For example, a single “if” clause presents two separate paths through the code. Read more here on Wikipedia.

Cognitive Complexity

SonarCube claims that “Cognitive Complexity breaks from the practice of using mathematical models to assess software maintainability. It starts from the precedents set by Cyclomatic Complexity, but uses human judgment to assess how structures should be counted and to decide what should be added to the model as a whole. As a result, it yields method complexity scores which strike programmers as fairer relative assessments of maintainability than have been available with previous models.” Read more here on SonarCube’s own page.

More often than not, it is desirable to conceive a solution where these metrics are small, rather than large.

For the record, it should be noted that any solution devised below is just one way of solving any given problem. Let me know if you know a better solution and feel free to submit a pull request via https://github.com/minborg/imperative-vs-declarative.

Iterating over an Array

We start off with an easy one. The object with this problem example is to compute the sum of the elements in an int array and return the result as a long. The following interface defines the problem:

public interface SumArray {

    long sum(int[] arr);
}

Imperative Solution

The following solution implements the SumArray problem using an imperative technique:

public class SumArrayImperative implements SumArray {

    @Override
    public long sum(int[] arr) {
        long sum = 0;
        for (int i : arr) {
            sum += i;
        }
        return sum;
    }

}

Declarative Solution

Here is a solution that implements SumArray using a declarative technique:

public class SumArrayDeclarative implements SumArray {

    @Override
    public long sum(int[] arr) {
        return IntStream.of(arr)
            .mapToLong(i -> i)
            .sum();
    }
}
Note that IntStream::sum only returns an int and therefore we have to apply the intermediate operation mapToLong().

Analysis

SonarQube provides the following analysis:






The code metrics for SumArray are shown in the following table (lower is generally better):

Technique LOC Statements Cyclomatic Complexity Cognitive Complexity
Imperative 12 5 2 1
Functional 11 2 2 0

This is how it looks in a graph (lower is generally better):


Aggregating Values in Parallel

The object with this problem example is to group Person objects into different buckets, where each bucket constitutes a unique combination of the birth year of a person and the country that a person is working in. For each group, the average salary shall be computed. The aggregation shall be computed in parallel using the common ForkJoin pool.

This is how the (immutable) Person class looks like:

public final class Person {

    private final String firstName;
    private final String lastName;
    private final int birthYear;
    private final String country;
    private final double salary;

    public Person(String firstName, 
                  String lastName, 
                  int birthYear, 
                  String country, 
                  double salary) {
        this.firstName = requireNonNull(firstName);
        this.lastName = requireNonNull(lastName);
        this.birthYear = birthYear;
        this.country = requireNonNull(country);
        this.salary = salary;
    }

    public String firstName() { return firstName; }
    public String lastName() { return lastName; }
    public int birthYear() { return birthYear; }
    public String country() { return country; }
    public double salary() { return salary; }

    // equals, hashCode and toString not shown for brevity
}

We have also defined another immutable class called YearCountry that shall be used as the grouping key:

public final class YearCountry {

    private final int birthYear;
    private final String country;

    public YearCountry(Person person) {
        this.birthYear = person.birthYear();
        this.country = person.country();
    }

    public int birthYear() { return birthYear; }
    public String country() { return country; }

    // equals, hashCode and toString not shown for brevity
}
Having defined these two classes, we can now define this problem example by means of this interface:

public interface GroupingBy {

    Map<YearCountry, Double> average(Collection<Person> persons);

}

Imperative Solution

It is non-trivial to implement an imperative solution to the GroupingBy example problem. Here is one solution that solves the problem:

public class GroupingByImperative implements GroupingBy {

    @Override
    public Map<YearCountry, Double> average(Collection<Person> persons) {
        final List<Person> personList = new ArrayList<>(persons);
        final int threads = ForkJoinPool.commonPool().getParallelism();
        final int step = personList.size() / threads;

        // Divide the work into smaller work items
        final List<List<Person>> subLists = new ArrayList<>();
        for (int i = 0; i < threads - 1; i++) {
            subLists.add(personList.subList(i * step, (i + 1) * step));
        }
        subLists.add(personList.subList((threads - 1) * step, personList.size()));


        final ConcurrentMap<YearCountry, AverageAccumulator> accumulators = new ConcurrentHashMap<>();
        // Submit the work items to the common ForkJoinPool
        final List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final List<Person> subList = subLists.get(i);
            futures.add(CompletableFuture.runAsync(() -> average(subList, accumulators)));
        }

        // Wait for completion
        for (int i = 0; i < threads; i++) {
            futures.get(i).join();
        }

        // Construct the result
        final Map<YearCountry, Double> result = new HashMap<>();
        accumulators.forEach((k, v) -> result.put(k, v.average()));

        return result;
    }

    private void average(List<Person> subList, ConcurrentMap<YearCountry, AverageAccumulator> accumulators) {
        for (Person person : subList) {
            final YearCountry bc = new YearCountry(person);
            accumulators.computeIfAbsent(bc, unused -> new AverageAccumulator())
                .add(person.salary());
        }
    }

    private final class AverageAccumulator {
        int count;
        double sum;

        synchronized void add(double term) {
            count++;
            sum += term;
        }

        double average() {
            return sum / count;
        }
    }

}

Declarative Solution

Here is a solution that implements GroupingBy using a declarative construct:

public class GroupingByDeclarative implements GroupingBy {

    @Override
    public Map<YearCountry, Double> average(Collection<Person> persons) {
        return persons.parallelStream()
            .collect(
                groupingBy(YearCountry::new, averagingDouble(Person::salary))
            );
    }
}
In the code above, I have used some static imports from the Collectors class (e.g. Collectors::groupingBy). This does not affect the code metrics.

Analysis

SonarQube provides the following analysis:




The code metrics for GroupingBy are shown in the following table (lower is better):


Technique LOC Statements Cyclomatic Complexity Cognitive Complexity
Imperative 52 27 11 4
Functional 17 1 1 0

The corresponding graph looks like this (lower is generally better):


Implementing a REST Interface

In this exemplary problem, we are to provide a pagination service for Person objects. Persons appearing on a page must satisfy some (arbitrary) conditions and are to be sorted in a certain given order. The page shall be returned as an unmodifiable List of Person objects.

Here is an interface that captures the problem:

public interface Rest {

/**
 * Returns an unmodifiable list from the given parameters.
 *
 * @param persons as the raw input list
 * @param predicate to select which elements to include
 * @param order in which to present persons
 * @param page to show. 0 is the first page
 * @return an unmodifiable list from the given parameters
 */
 List<Person> page(List<Person> persons, 
                   Predicate<Person> predicate,
                   Comparator<Person> order,
                   int page);
}
The size of a page is given in a separate utility class called RestUtil:

public final class RestUtil {
    private RestUtil() {}

    public static final int PAGE_SIZE = 50;
}

Imperative Solution

Here is an imperative implementation of the Rest interface:

public final class RestImperative implements Rest {

    @Override
    public List<Person> page(List<Person> persons, 
                             Predicate<Person> predicate, 
                             Comparator<Person> order, 
                             int page) {
        final List<Person> list = new ArrayList<>();
        for (Person person:persons) {
            if (predicate.test(person)) {
                list.add(person);
            }
        }
        list.sort(order);
        final int from = RestUtil.PAGE_SIZE * page;
        if (list.size() <= from) {
            return Collections.emptyList();
        }
        return unmodifiableList(list.subList(from, Math.min(list.size(), from + RestUtil.PAGE_SIZE)));
    }
}

Declarative Solution

The following class implements the Rest interface in a declarative way:

public final class RestDeclarative implements Rest {

    @Override
    public List<Person> page(List<Person> persons,
                             Predicate<Person> predicate, 
                             Comparator<Person> order,
                             int page) {
        return persons.stream()
            .filter(predicate)
            .sorted(order)
            .skip(RestUtil.PAGE_SIZE * (long) page)
            .limit(RestUtil.PAGE_SIZE)
            .collect(collectingAndThen(toList(), Collections::unmodifiableList));
    }
}

Analysis

SonarQube provides the following analysis:




The following table shows the code metrics for Rest (lower is generally better):

Technique LOC Statements Cyclomatic Complexity Cognitive Complexity
Imperative 27 10 4 4
Functional 21 1 1 0

Here, the same numbers are shown in a graph (again lower is generally better):


Java 11 Improvements

The examples above were written in Java 8. With Java 11, we could shorten our declarative code using LVTI (Local Variable Type Inference). This would make our code a bit shorter but would not affect code metrics.

@Override
public List<Person> page(List<Person> persons,
                         Predicate<Person> predicate, 
                         Comparator<Person> order, 
                         int page) {
    final var list = new ArrayList<Person>();
    ...
Compared to Java 8, Java 11 contains some new collectors. For example, the Collectors.toUnmodifiableList() which would make our declarative Rest solution a bit shorter:

public final class RestDeclarative implements Rest {

@Override
public List<Person> page(List<Person> persons,
                         Predicate<Person> predicate, 
                         Comparator<Person> order, 
                         int page) {
    return persons.stream()
        .filter(predicate)
        .sorted(order)
        .skip(RestUtil.PAGE_SIZE * (long) page)
        .limit(RestUtil.PAGE_SIZE)
        .collect(toUnmodifiableList());
}
Again, this will not impact the code metrics.

Summary

Averaging the code metrics for our three exemplary problems yields the following result (lower is generally better) :



Given the input requirements in this article, there is a remarkable improvement for all code metrics when we go from imperative to declarative constructs.

Use Declarative Constructs in Database Applications

In order to reap the benefits of declarative constructs in database applications, we have used Speedment Stream. Speedment Stream is a Stream-based Java ORM tool that can turn any database table/view/join into Java streams and thereby allows you to apply your declarative skills in database applications.

Your database applications code will get much better. In fact, a pagination REST solution with Speedment and Spring Boot against a database might be expressed like this:

public Stream<Person> page(Predicate<Person> predicate, 
                           Comparator<Person> order, 
                           int page) {
    return persons.stream()
        .filter(predicate)
        .sorted(order)
        .skip(RestUtil.PAGE_SIZE * (long) page)
        .limit(RestUtil.PAGE_SIZE);
}

Where the Manager<Person> persons is provided by Speedment and constitutes a handle to the database table “Person” and can be @AutoWired via Spring.

Conclusions

Choosing declarative over imperative solutions can reduce general code complexity massively and can provide many benefits including faster coding, better code quality, improved readability, less testing, reduced maintenance costs and more.

In order to benefit from declarative constructs within database applications, Speedment Stream is a tool that can provide standard Java Streams directly from the database.

Mastering declarative constructs and functional composition is a must for any contemporary Java developer these days.

Resources

Article Source Code: https://github.com/minborg/imperative-vs-declarative
SonarQube: https://www.sonarqube.org/
Speedment Stream: https://speedment.com/stream/
Speedment Initializer: https://www.speedment.com/initializer/

Wednesday, July 31, 2019

Java: ChronicleMap Part 3, Fast Microservices

Standard Java Maps needs to be initialized upon startup. Learn how to leverage ChronicleMaps that is initializable from a file and reduce microservice startup times significantly and how to share Maps between JVMs.

The built-in Map implementations, such as HashMap and ConcurrentHashMap are fast but they must be initialized with mappings before they can be used for looking up values. Also, they are limited in size by practical means such as heap and RAM size. Lastly, they are local to the JVM it runs in.

The initialization process can slow down critical startup for microservices, especially when reading mappings from a remote REST interface or a remote database. In this article, you will learn how you can start your microservice applications in seconds instead of minutes by using memory-mapped ChronicleMap instances and how Maps can be shared between JVMs in this third article in an article series about CronicleMap.

Read more about the fundamentals of CronicleMap in the first article.

Read more about file mapped CronicleMap objects in the second article.

Creating a Shared Map

As described in the second article in the series, we can easily create a file mapped Map like this:
private static Map<Long, Point> createFileMapped() {
    try {
        return ChronicleMap
            .of(Long.class, Point.class)
            .averageValueSize(8)
            .valueMarshaller(PointSerializer.getInstance())
            .entries(10_000_000)
            .createPersistedTo(new File("my-map"));

    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}
Created Map objects can now be accessed by any JVM that has access to the “my-map” file. Updates to the maps will be shared among the participating JVMs via the shared file.

Initializing the Map

As also shown in the second article, we could create and initialize a Map like this:

final Map<Long, Point> m3 = LongStream.range(0, 10_000_000)
    .boxed()
        .collect(
            toMap(
                Function.identity(),
                FillMaps::pointFrom,
                (u, v) -> {
                    throw new IllegalStateException();
                },
                FillMaps::createFileMapped
            )
        );
When running on my laptop (MacBook Pro mid 2015, 16 GB, 2.2 GHz Intel Core i7), it takes about 10 seconds to create and fill the Map with 10 million entries.

If the Map contents were retrieved externally (as opposed to being created locally by the pointFrom() method), it would likely take much longer time to fill the Map. For example, if we get 50 Mbit/s REST throughput and each JSON Point representation consumes 25 bytes, then it would take some 60 seconds to fill the Map.

Starting a new JVM

Now that there is a pre-existing mapped file, we can start directly off this file as shown in this snippet:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(10_000_000)
    .createOrRecoverPersistedTo(new File("my-map"));
This will create a Map directly from the existing “my-map” file.

Running this on my laptop will yield a start time of 5 seconds. This could be compared to the 60 second REST example, yielding a 90% startup time reduction.

Running Several JVMs on the Same Node

We could elect to run several JVMs on the same physical server node. By doing so, we benefit from the OS’es ability to make mappings of the file available for each JVM by exposing shared memory. This constitutes an efficient and low latency means of communication between the JVMs. The fact that there is a common pool of mapped memory makes the memory management much more efficient compared to a situation where each and every JVM/OS would have to maintain its own separate mappings.

Summary

ChronicleMaps can be shared between participating JVM via shared files
Startup times can be reduced significantly using shared files
If JVMs are running on the same physical machine, performance and efficiency is further improved
Shared files via ChronicleMap provides a low latency means of communication between JVMs

Tuesday, July 30, 2019

Java: ChronicleMap Part 2, Super RAM Maps

The standard Java Maps, such as the ubiquitous HashMap, are ultimately limited by the available RAM. Read this article and learn how you can create Java Maps with virtually unlimited sizes even exceeding the target machine’s RAM size.

The built-in Map implementations, such as HashMap and ConcurrentHashMap work fine as long as they are relatively small. In all cases, they are limited by the available heap and therefore eventually the available RAM size. ChronicleMap can store its contents in files, thereby circumventing this limitation, opening up for terabyte-sized mappings as shown in this second article in an article series about CronicleMap.

Read more about the fundamentals of CronicleMap in my previous first article.

File Mapping

Mapping of a file is made by invoking the createPersistedTo() method on a ChronicleMap builder as shown in the method below:
private static Map<Long, Point> createFileMapped() {
   try {
        return ChronicleMap
            .of(Long.class, Point.class)
            .averageValueSize(8)
            .valueMarshaller(PointSerializer.getInstance())
            .entries(10_000_000)
            .createPersistedTo(new File("my-map"));

    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

This will create a Map that will layout its content in a memory-mapped file named “my-map” rather than in direct memory. The following example shows how we can create 10 million Point objects and store them all in a file mapped map:

final Map<Long, Point> m3 = LongStream.range(0, 10_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> {
                throw new IllegalStateException();
           },
           FillMaps::createFileMapped
       )
   );
The following command shows the newly created file:

Pers-MacBook-Pro:target pemi$ ls -lart my-map 
-rw-r--r--  1 pemi  staff  330305536 Jul 10 16:56 my-map
As can be seen, the file is about 33 MB and thus, each entry occupies 33 bytes on average.

Persistence

When the JVM terminates, the mapped file is still there, making it easy to pick up a previously created map including its content. This works much like a rudimentary superfast database. Here is how we can start off from an existing file:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(10_000_000)
    .createOrRecoverPersistedTo(new File("my-map"));

The Map will be available directly, including its previous content.

Java Map Exceeding RAM Limit

One interesting aspect of memory-mapped files is that they can exceed both the heap and RAM limits. The file mapping logic will make sure that the parts being currently used are loaded into RAM on demand. The mapping logic will also retain recent portions of accessed mapped memory in physical memory to improve performance. This occurs behind-the-scenes and need not be managed by the application itself.

My desktop computer is an older MacBook Pro with only 16GB of memory (Yes, I know that sucks). Nevertheless, I can allocate a Map with 1 billion entries potentially occupying 33 * 1,000,000,000 = 33 GB memory (We remember from above that each entry occupied 33 bytes on average). The code looks like this:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(1_000_000_000)
    .createPersistedTo(new File("huge-map"));

Even though I try to create a Java Map with 2x my RAM size, the code runs flawlessly and I get this file:

Pers-MacBook-Pro:target pemi$ ls -lart | grep huge-map 
-rw-r--r--   1 pemi  staff  34573651968 Jul 10 18:52 huge-map

Needless to say, you should make sure that the file you are mapping to is located on a file system with high random access performance. For example, a filesystem located on a local SSD.

Summary

ChronicleMap can be mapped to an external file
The mapped file is retained when the JVM exits
New applications can pick up an existing mapped file
ChronicleMap can hold more data than there is RAM
Mapped files are best placed on file systems with high random access performance

Friday, July 26, 2019

Java: ChronicleMap Part 1, Go Off-Heap

Filling up a HashMap with millions of objects will quickly lead to problems such as inefficient memory usage, low performance and garbage collection problems. Learn how to use off-heap CronicleMap that can contain billions of objects with little or no heap impact.

The built-in Map implementations, such as HashMap and ConcurrentHashMap are excellent tools when we want to work with small to medium-sized data sets. However, as the amount of data grows, these Map implementations are deteriorating and start to exhibit a number of unpleasant drawbacks as shown in this first article in an article series about open-sourceed  CronicleMap.

Heap Allocation

In the examples below, we will use Point objects. Point is a POJO with a public default constructor and getters and setters for X and Y properties (int). The following snippet adds a million Point objects to a HashMap:

final Map<Long, Point> m = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u,v) -> { throw new IllegalStateException(); },
             HashMap::new
        )
    );

    // Conveniency method that creates a Point from
    // a long by applying modulo prime number operations
    private static Point pointFrom(long seed) {
        final Point point = new Point();
        point.setX((int) seed % 4517);
        point.setY((int) seed % 5011);
        return point;
    }

We can easily see the number of objects allocated on the heap and how much heap memory these objects consume:

Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34366 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       1002429       32077728  java.util.HashMap$Node (java.base@10)
   2:       1000128       24003072  java.lang.Long (java.base@10)
   3:       1000000       24000000  com.speedment.chronicle.test.map.Point
   4:           454        8434256  [Ljava.util.HashMap$Node; (java.base@10)
   5:          3427         870104  [B (java.base@10)
   6:           185         746312  [I (java.base@10)
   7:           839         102696  java.lang.Class (java.base@10)
   8:          1164          89088  [Ljava.lang.Object; (java.base@10)
For each Map entry, a Long, a HashMap$Node and a Point object need to be created on the heap. There are also a number of arrays with HashMap$Node objects created. In total, these objects and arrays consume 88,515,056 bytes of heap memory. Thus, each entry consumes on average 88.5 bytes.

NB: The extra 2429 HashMap$Node objects come from other HashMap objects used internally by Java.

Off-Heap Allocation

Contrary to this, a CronicleMap uses very little heap memory as can be observed when running the following code:

final Map<Long, Point> m2 = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u,v) -> { throw new IllegalStateException(); },
            () -> ChronicleMap
                .of(Long.class, Point.class)
                .averageValueSize(8)
                .valueMarshaller(PointSerializer.getInstance())
                .entries(1_000_000)
                .create()
        )
    );
Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34413 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:          6537        1017768  [B (java.base@10)
   2:           448         563936  [I (java.base@10)
   3:          1899         227480  java.lang.Class (java.base@10)
   4:          6294         151056  java.lang.String (java.base@10)
   5:          2456         145992  [Ljava.lang.Object; (java.base@10)
   6:          3351         107232  java.util.concurrent.ConcurrentHashMap$Node (java.base@10)
   7:          2537          81184  java.util.HashMap$Node (java.base@10)
   8:           512          49360  [Ljava.util.HashMap$Node; (java.base@10)
As can be seen, there are no Java heap objects allocated for the CronicleMap entries and consequently no heap memory either.

Instead of allocating heap memory, CronicleMap allocates its memory off-heap. Provided that we start our JVM with the flag -XX:NativeMemoryTracking=summary, we can retrieve the amount off-heap memory being used by issuing the following command:

Pers-MacBook-Pro:chronicle-test pemi$ jcmd 34413 VM.native_memory | grep Internal
-                  Internal (reserved=30229KB, committed=30229KB)
Apparently, our one million objects were laid out in off-heap memory using a little more than 30 MB of off-heap RAM. This means that each entry in the CronicleMap used above needs on average 30 bytes.

This is much more memory effective than a HashMap that required 88.5 bytes. In fact, we saved 66% of RAM memory and almost 100% of heap memory. The latter is important because the Java Garbage Collector only sees objects that are on the heap.

Note that we have to decide upon creation how many entries the CronicleMap can hold at maximum. This is different compared to HashMap which can grow dynamically as we add new associations. We also have to provide a serializer (i.e. PointSerializer.getInstance()), which will be discussed in detail later in this article.

Garbage Collection

Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of objects that exist on the heap. So if we, for example, double the number of objects on the heap, we can expect the GC would take four times longer to complete.

If we, on the other hand, create 64 times more objects, we can expect to suffer an agonizing 1,024 fold increase in expected GC time. This effectively prevents us from ever being able to create really large HashMap objects.

With ChronicleMap we could just put new associations without any concern of garbage collection times.

Serializer

The mediator between heap and off-heap memory is often called a serializer. ChronicleMap comes with a number of pre-configured serializers for most built-in Java types such as Integer, Long, String and many more.

In the example above, we used a custom serializer that was used to convert a Point back and forth between heap and off-heap memory. The serializer class looks like this:

public final class PointSerializer implements
    SizedReader<Point>,
    SizedWriter<Point> {

    private static PointSerializer INSTANCE = new PointSerializer();

    public static PointSerializer getInstance() { return INSTANCE; }

    private PointSerializer() {}

    @Override
    public long size(@NotNull Point toWrite) {
        return Integer.BYTES * 2;
    }

    @Override
    public void write(Bytes out, long size, @NotNull Point point) {
        out.writeInt(point.getX());
        out.writeInt(point.getY());
    }

    @NotNull
    @Override
    public Point read(Bytes in, long size, @Nullable Point using) {
        if (using == null) {
            using = new Point();
        }
        using.setX(in.readInt());
        using.setY(in.readInt());
        return using;
    }

}
The serializer above is implemented as a stateless singleton and the actual serialization in the methods write() and read() are fairly straight forward. The only tricky part is that we need to have a null check in the read() method if the “using” variable does not reference an instantiated/reused object.

How to Install it?

When we want to use ChronicleMap in our project, we just add the following Maven dependency in our pom.xml file and we have access to the library.

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-map</artifactId>
    <version>3.17.3</version>
</dependency>
If you are using another build tool, for example, Gradle, you can see how to depend on ChronicleMap by clicking this link.

The Short Story

Here are some properties of ChronicleMap:

Stores data off-heap
Is almost always more memory efficient than a HashMap
Implements ConcurrentMap
Does not affect garbage collection times
Sometimes needs a serializer
Has a fixed max entry size
Can hold billions of associations
Is free and open-source