Minborg


Wednesday, December 19, 2018

Who’s Been Naughty, Who’s Been Nice? Santa Gives You Java 11 Advice!



Ever wondered how Santa can deliver holiday gifts to all kids around the world? There are 2 billion kids, each with an individual wish list, and he does it in 24 hours. That works out to about 43 microseconds per kid on average (86,400 seconds divided by 2 billion), and on top of that he needs to check whether every child has been naughty or nice.

You do not need to wonder anymore. I will reveal the secret. He is using Java 11 and a modern stream ORM with superfast execution.

Even though Santa’s backing database is old and slow, he can analyze data in microseconds by using standard Java streams and in-JVM-memory technology. Santa’s database contains two tables: Child, which holds every child in the world, and HolidayGift, which specifies all the items available for production in Santa’s workshop. A child can only have one wish; such are the harsh rules.

Viewing the Database as Streams

Speedment is a modern stream-based ORM which is able to view relational database tables as standard Java streams. As we all know, only nice children get gifts, so it is important to distinguish between those who have been naughty and those who have been nice. This is easily accomplished with the following code:
var niceChildren = children.stream()
        .filter(Child.NICE.isTrue())
        .sorted(Child.COUNTRY.comparator()) 
        .collect(Collectors.toList());

This stream will yield a long list containing only the kids that have been nice. To enable Santa to optimize his delivery route, the list is sorted by country of residence.

Joining Child and HolidayGift

This list seems incomplete though. How does Santa keep track of which gift goes to whom? This is where the HolidayGift table comes in handy. Since some children provided Santa with their wish lists, we can now join the two tables to create a complete list containing all the nice children and their corresponding gifts. It is important to include the children without any wish (they will get a random gift), so we use a left join.
var join = joinComponent
    .from(ChildManager.IDENTIFIER)
        .where(Child.NICE.isTrue())
    .leftJoinOn(HolidayGift.GIFT_ID).equal(Child.GIFT_ID)
    .build(Tuples::of);

Speedment uses a builder pattern to create a Join<T> object which can then be reused over and over again to create streams with elements of type T. In this case, it is used to join Child and HolidayGift. The join only includes children that are nice and matches rows that contain the same value in the gift_id fields.

This is how Santa delivers all the packages:
join.stream()
    .parallel() 
    .forEach(SleighUtil::deliver);
As can be seen, Santa can easily deliver all the packages with parallel sleighs, carried by reindeer.

This will render the stream as an efficient SQL query, but unfortunately it is not quick enough for Santa to make it in time.

Using In-JVM-Memory Acceleration

Now to the fun part. Santa is activating the in-JVM-memory acceleration component in Speedment, called DataStore. This is done in the following way:
var santasWorkshop = new ApplicationBuilder()
    .withPassword("north-pole")
    // Activate DataStore
    .withBundle(DataStoreBundle.class)
    .build();

// Load a snapshot of the database into off-heap memory
santasWorkshop.get(DataStoreComponent.class)
    .ifPresent(DataStoreComponent::load);

This startup configuration is the only adjustment needed to the application. All stream constructs above remain the same. When the application is started, a snapshot of the database is pulled into the JVM and stored off-heap. Because the data is stored off-heap, it does not influence garbage collection, and the amount of data is limited only by available RAM. Nothing prevents Santa from loading terabytes of data, since he is using a cloud service and can easily expand his RAM. Now the application will run orders of magnitude faster and Santa will be able to deliver all the packages in time.

Run Your Own Projects with In-JVM-Memory Acceleration

If you want to try for yourself how fast a database application can be, there is an Initializer that can be found here. Just tick your desired database type (Oracle, MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, DB2 or AS400) and a POM and an application template will be generated for you automatically.

If you need more help setting up your project, check out the Speedment GitHub page or explore the user guide.

Authors

Thank you, Julia Gustafsson and Carina Dreifeldt for co-writing this article.

Tuesday, December 18, 2018

Java: Aggregate Data Off-Heap


Explore how to create off-heap aggregations with a minimum of garbage collection impact and maximum memory utilization.

Creating large aggregations using Java Map, List and Object normally creates a lot of heap memory overhead. This also means that the garbage collector will have to clean up these objects once the aggregation goes out of scope.

Read this short article and discover how we can use Speedment Stream ORM to create off-heap aggregations that can utilize memory more efficiently and with little or no GC impact.

Person

Let’s say we have a large number of Person objects that take the following shape:

public class Person {
    private final int age;
    private final short height;
    private final short weight;        
    private final String gender;
    private final double salary;
    …
    // Getters and setters hidden for brevity
}

For the sake of argument, we also have access to a method called persons() that will create a new Stream with all these Person objects.
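As a rough sketch (hypothetical; the article simply assumes such a method exists), persons() could fabricate sample data, assuming Person has a constructor matching its fields:

// Hypothetical sketch of the assumed persons() method. It fabricates sample
// Person objects so the aggregation examples below have something to run on.
// Uses java.util.Random, java.util.stream.IntStream and java.util.stream.Stream.
private static Stream<Person> persons() {
    final Random rnd = new Random(42);
    return IntStream.range(0, 1_000_000)
        .mapToObj(i -> new Person(
            20 + rnd.nextInt(50),               // age
            (short) (150 + rnd.nextInt(50)),    // height
            (short) (50 + rnd.nextInt(60)),     // weight
            rnd.nextBoolean() ? "F" : "M",      // gender
            30_000 + rnd.nextDouble() * 70_000  // salary
        ));
}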

Salary per Age

We want to compute the average salary for each age bucket. To represent the result of the aggregation, we will be using a data class called AgeSalary which associates a certain age with an average salary.

public class AgeSalary {
     private int age;
     private double avgSalary;
     … 
    // Getters and setters hidden for brevity
}

Age grouping for salaries normally entails fewer than 100 buckets, so this example just shows the principle. The more buckets there are, the more sense it makes to aggregate off-heap.

Solution

Using Speedment Stream ORM, we can derive an off-heap aggregation solution with these three steps:

Create an Aggregator

var aggregator = Aggregator.builderOfType(Person.class, AgeSalary::new)
    .on(Person::age).key(AgeSalary::setAge)
    .on(Person::salary).average(AgeSalary::setAvgSalary)
    .build();

The aggregator can be reused over and over again.

Compute an Aggregation

var aggregation = persons().collect(aggregator.createCollector());

Using the aggregator, we create a standard Java stream Collector that has its internal state completely off-heap.

Use the Aggregation Result

aggregation.streamAndClose()
    .forEach(System.out::println);

Since the Aggregation holds data that is stored off-heap, it may benefit from explicit closing rather than just being cleaned up potentially much later. Closing the Aggregation can be done by calling the close() method, possibly by taking advantage of the AutoCloseable trait, or as in the example above by using streamAndClose() which returns a stream that will close the Aggregation after stream termination.
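As an illustration, explicit closing can also be written with try-with-resources. This is a minimal sketch assuming the aggregation type implements AutoCloseable and exposes a plain stream() method (that method name is an assumption, in contrast to streamAndClose() above):

// Sketch: explicit closing of the off-heap aggregation using try-with-resources.
// Assumes the aggregation implements AutoCloseable and has a stream() method.
try (var aggregation = persons().collect(aggregator.createCollector())) {
    aggregation.stream()
        .forEach(System.out::println);
} // the off-heap memory backing the aggregation is released here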

Everything in a One-Liner

The code above can be condensed to what is effectively a one-liner:

persons().collect(Aggregator.builderOfType(Person.class, AgeSalary::new)
    .on(Person::age).key(AgeSalary::setAge)
    .on(Person::salary).average(AgeSalary::setAvgSalary)
    .build()
    .createCollector()
).streamAndClose()
    .forEach(System.out::println);

There is also support for parallel aggregations. Just add the stream operation Stream::parallel and the aggregation is done using the ForkJoin pool.
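As a sketch, the parallel variant of the aggregation above might look like this:

// Parallel aggregation sketch: Stream::parallel makes the collection run on
// the common ForkJoin pool; aggregator and persons() are the same as above.
var aggregation = persons()
    .parallel()
    .collect(aggregator.createCollector());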

Resources

Download Speedment here

Read more about off-heap aggregations here

Tuesday, December 4, 2018

Java 11: JOIN Tables, Get Java Streams


Ever wondered how you could turn joined database tables into a Java Stream? Read this short article and find out how it is done using the Speedment Stream ORM. We will start with a Java 8 example and then look into the improvements with Java 11.

Java 8 and JOINs

Speedment allows dynamically JOINed database tables to be consumed as standard Java Streams. We begin by looking at a solution for Java 8 using the Sakila example database:

    Speedment app = ...;
    
    JoinComponent joinComponent = app.getOrThrow(JoinComponent.class);
     
    Join<Tuple2OfNullables<Language, Film>> join = joinComponent
        .from(LanguageManager.IDENTIFIER)
        .innerJoinOn(Film.LANGUAGE_ID).equal(Language.LANGUAGE_ID)
        .build();

        join.stream()
            .forEach(System.out::println);

This will produce the following output (reformatted and shortened for readability):

Tuple2OfNullablesImpl {
    LanguageImpl { languageId = 1, name = English, ... }, 
    FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, ... }
}
Tuple2OfNullablesImpl {
    LanguageImpl { languageId = 1, name = English, ... }, 
    FilmImpl { filmId = 2, title = ACE GOLDFINGER, ... }
}
Tuple2OfNullablesImpl {
    LanguageImpl { languageId = 1, name = English, ... },
    FilmImpl { filmId = 3, title = ADAPTATION HOLES, ... }
}
...

Java 11 and JOINs

In Java 11 there is local-variable type inference (the var keyword), which makes it even easier to write joins with Speedment. We do not have to explicitly state the type of the join variable:

    Speedment app = ...;
    
    JoinComponent joinComponent = app.getOrThrow(JoinComponent.class);
     
    var join = joinComponent
        .from(LanguageManager.IDENTIFIER)
        .innerJoinOn(Film.LANGUAGE_ID).equal(Language.LANGUAGE_ID)
        .build();

        join.stream()
            .forEach(System.out::println);

Code Breakdown

The from() method takes the first table we want to use (Language). The innerJoinOn() method takes a specific column of the second table we want to join. Then, the equal() method takes a column from the first table that we want to use as our join condition. So, in this example, we will get matched Language and Film entities where the column Film.LANGUAGE_ID equals Language.LANGUAGE_ID.

Finally, build() will construct our Join object that can, in turn, be used to create Java Streams. The Join object can be re-used over and over again.

JOIN Types and Conditions

We can use innerJoinOn(), leftJoinOn(), rightJoinOn() and crossJoin(), and tables can be joined using the conditions equal(), notEqual(), lessThan(), lessOrEqual(), greaterThan() and greaterOrEqual().
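For example, a left join over the same Sakila tables could look like the sketch below; it only swaps the join type used earlier, so languages without any matching film are still included:

// Sketch: same tables as above, but with a LEFT JOIN instead of an INNER JOIN.
var leftJoin = joinComponent
    .from(LanguageManager.IDENTIFIER)
    .leftJoinOn(Film.LANGUAGE_ID).equal(Language.LANGUAGE_ID)
    .build();

leftJoin.stream()
    .forEach(System.out::println);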

What's Next?

Download open-source Java 11 here.
Download Speedment here.
Read all about the JOIN functionality in the Speedment User's Guide.

Monday, November 12, 2018

Shortest Code and Lowest Latency


Who can write the shortest Java code with the lowest latency, and what tools are used?

At Oracle Code One two weeks ago, I promoted a code challenge during my talk. The contestants were given a specific problem, and the winner would be the one with the lowest latency multiplied by the number of code lines used (i.e., having low latency while at the same time using as few lines as possible is good). I also shared the contest on social media to get as many developers as possible involved.

The input data was to be taken from the table “film” in the open-source Sakila database.

More specifically, the objective was to develop a Java application that computes the sum, min, max, and average rental duration for five films out of the existing 1,000 films using a general solution. The five films should be those around the median film length, starting with the 498th and ending with the 502nd film (inclusive) in ascending order of film length.

Contestants were free to use any Java library and any solution available such as Hibernate, JDBC or other ORM tools.

A Solution Based on SQL/JDBC

One way of solving the problem would be to use SQL/JDBC directly. Here is an example of SQL code that solves the computational part of the problem:

SELECT 
  sum(rental_duration),
  min(rental_duration),
  max(rental_duration),
  avg(rental_duration)
FROM 
  (select rental_duration from sakila.film 
  order by length 
  LIMIT 5 OFFSET 498) as A

If I run it on my laptop, I get a latency of circa 790 microseconds on the server side (standard MySQL 5.7.16). To use this solution, we also need to add Java code to issue the SQL statement and to read the values back from JDBC into our Java variables. This means that the code will be even larger and will take longer to execute, as shown hereunder:

try (Connection con = DriverManager
                       .getConnection("jdbc:mysql://somehost/sakila?"
                            + "user=sakila-user&password=sakila-password")) {

        try (Statement statement = con.createStatement()) {

            ResultSet resultSet = statement
                .executeQuery(
                    "SELECT " +
                        "  sum(rental_duration)," +
                        "  min(rental_duration)," +
                        "  max(rental_duration)," +
                        "  avg(rental_duration)" +
                        "FROM " +
                        "  (select rental_duration from sakila.film " +
                        "  order by length " +
                        "limit 5 offset 498) as A");

            if (resultSet.next()) {
                int sum = resultSet.getInt(1);
                int min = resultSet.getInt(2);
                int max = resultSet.getInt(3);
                double avg = resultSet.getDouble(4);
                // Handle the result
            } else {
                // Handle error
            }
      }
}

To give this alternative a fair chance, I reused the connection between calls in the benchmark rather than re-creating it each time (recreation is shown above but was not used in benchmarks).

Result: ~1,000 us and ~25 code lines

The Winning Contribution

However, the SQL example above did not stand a chance against the winning contribution. The winner was Sergejus Sosunovas (@SergejusS) from Switzerland, who is currently developing an optimization and management system. He used Speedment in-JVM-memory acceleration and states: "It took less than an hour to start and build a solution." Here is the winning code:

IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
     .sorted(Film.LENGTH)
     .skip(498)
     .limit(5)
     .mapToInt(GeneratedFilm::getRentalDuration)
     .summaryStatistics();

This was much faster than SQL/JDBC and completed in as little as 6 microseconds.

Result: 6 us and 6 code lines

A Solution with Only Five Lines

One of the contestants, Corrado Lombard from Italy, deserves an honorable mention since he was able to solve the problem in only five lines. Unfortunately, there was a slight error in his original contribution, but when fixed, the solution looked like this:

IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
    .sorted(Film.LENGTH)
    .skip(498)
    .limit(5)
    .collect(summarizingInt(Film::getRentalDuration));

This fixed solution had about the same performance as the winning solution.

Optimized Speedment Solution

As a matter of fact, there is a way of improving latency even beyond the winning contribution. Applying an IntFunction that performs in-place deserialization (in situ) of the int value directly from RAM improves performance even more. In-place deserialization means that we do not have to deserialize the entire entity, but just extract the parts of it that are needed. This saves time, especially for entities with many columns. Here is how an optimized solution could look:

IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
     .sorted(Film.LENGTH)
     .skip(498)
     .limit(5)
     .mapToInt(Film.RENTAL_DURATION.asInt()) // Use in-place-deserialization
     .summaryStatistics();

This was even faster and completed in just 3 microseconds.

Result: 3 us and 6 code lines

GraalVM and Optimized Speedment Solution

GraalVM contains an improved C2 Compiler that is known to improve stream performance for many workloads. In particular, it seems like the benefits from inlining and escape-analysis are much better under Graal than under the normal OpenJDK.

I was curious to see how the optimized solution above would benefit from GraalVM. With no code changes, I ran it under GraalVM (1.0.0-rc9) and latency was down to just 1 microsecond! This means that we could perform 1,000,000 such queries per second per thread on a laptop, and presumably many more on a server-grade computer.

Result: 1 us and 6 code lines

Overview

When we compare the SQL/JDBC latency against the best Speedment solution, the speedup factor with Speedment is about 1,000 times. This is the same difference as between walking to work and taking the world's fastest manned jet plane (the SR-71 "Blackbird"). A huge improvement if you want to streamline your code.

To be able to plot the solutions in a diagram, I have removed the comparatively slow SQL/JDBC solution and only show how the different Speedment solutions and runtimes measure up:


Try it Out

The competition is over. However, feel free to challenge the solutions above or try them out for yourself. Full details of the competition and rules can be found here. If you can beat the fastest solution, let me know in the comments below.

Download Sakila database.

Download Speedment

Benchmark Notes

The benchmark results presented above were obtained when running on my MacBook Pro Mid 2015, 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3, Java 8, JMH (1.21) and Speedment (3.1.8)

Conclusion

It is possible to reduce latencies by orders of magnitude and reduce code size at the same time using in-JVM-memory technology and Java streams.

GraalVM can improve your stream performance significantly under many conditions.

Monday, October 8, 2018

Java: Gain Performance Using SingletonStream

Java streams with just one element sometimes create unnecessary overhead in your applications. Learn how to use SingletonStream objects to gain more than tenfold performance for some of these kinds of streams while, at the same time, simplifying your code.

Background

The Stream library in Java 8 is one of the most powerful additions to the Java language ever. Once you start to understand its versatility and the resulting code readability, your Java code style will change forever. Instead of bloating your code with all the nitty-gritty details of for, if and switch statements and numerous intermediate variables, you can use a Stream that just contains a description of what to do, and not really how it is done.

Some years ago, we had to make an API decision for a Java project: which return type should we select for two fast local in-memory data cache methods with:

  • a unique search key which returns either a value or no value
  • a non-unique search key which returns any number of values (zero to infinity). 


This was the initial idea:

Optional<T> searchUnique(K key); // For unique keys
Stream<T> search(K key);         // For non-unique keys

But, we would rather have the two methods look exactly the same and both return a Stream<T>. The API would then look much cleaner because a unique cache would then look exactly the same as a non-unique cache.

However, the unique search had to be very efficient and able to create millions of result objects each second without creating too much overhead.

The Solution

By implementing a SingletonStream that only takes a single element (and therefore can be highly optimized compared to a normal Stream with any number of elements), we were able to let both methods return a Stream while retaining performance. The method searchUnique(K key) would return an empty stream (Stream.empty()) if the key was not found, and it would return a SingletonStream with the value associated with the key if the key existed. We would get:

Stream<T> searchUnique(K key); // For unique keys
Stream<T> search(K key);       // For non-unique keys

Great! We can eat the cookie and still have it!
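To make the idea concrete, here is a minimal sketch of how the unique lookup could be implemented on top of a plain in-memory Map (the uniqueCache field and its name are hypothetical; the actual Speedment implementation differs):

// Hypothetical backing map; not Speedment's actual implementation.
// Uses java.util.Map, java.util.HashMap and java.util.stream.Stream.
private final Map<K, T> uniqueCache = new HashMap<>();

// Empty Stream if the key is absent, otherwise a SingletonStream holding
// the single value associated with the key.
public Stream<T> searchUnique(K key) {
    final T value = uniqueCache.get(key);
    return value == null
        ? Stream.empty()
        : SingletonStream.of(value);
}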

The Implementation

The SingletonStream is a part of the Speedment Stream ORM and can be viewed here on GitHub. Feel free to use Speedment and any of its components in your own projects using the Speedment initializer.

The SingletonStream is a good candidate for stack allocation using the JVM's Escape Analysis (read more on Escape Analysis in my previous posts here and here). The implementation comes in two shapes. If we set the STRICT value to true, we get a completely lazy Stream, but the drawback is that we lose the singleton property once we call some intermediate operations like filter(), map(), etc. If we, on the other hand, set the STRICT value to false, the SingletonStream performs many of the intermediate operations eagerly and is able to return a new SingletonStream, thereby retaining the singleton property. This gives better performance in many cases.
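As a small usage sketch (not the class internals), calling an intermediate operation on a singleton stream looks like any other stream code; with STRICT set to false, the result of map() can itself remain a SingletonStream:

// Usage sketch: with STRICT = false, map() may be applied eagerly and return
// a new SingletonStream, retaining the singleton property.
long count = SingletonStream.of("A")
    .map(String::toLowerCase)
    .count(); // 1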

The solution devised here for reference streams could also easily be modified to the primitive incarnations of singleton streams. So, it would be almost trivial to write a SingletonIntStream, a SingletonLongStream and a SingletonDoubleStream. Here is a SingletonLongStream.

It should be noted that the class could be developed further so that it supports lazy evaluation while remaining highly performant. This is future work.

Performance

There are many ways one could test the performance of the SingletonStream and compare it with a standard Stream implementation with one element.

Here is one way of doing it using JMH. The first pair of tests (count) just counts the number of elements in the stream, and the second pair (forEach) does something with one element of the stream.

@Benchmark
public long singletonStreamCount() {
    return SingletonStream.of("A").count();
}

@Benchmark
public long streamCount() {
    return Stream.of("A").count();
}

@Benchmark
public void singletonStreamForEach() {
    SingletonStream.of("A")
        .limit(1)
        .forEach(blackHole());
}

@Benchmark
public void streamForEach() {
   Stream.of("A")
        .limit(1)
        .forEach(blackHole());
}

private static <T> Consumer<T> blackHole() {
    return t -> {};
}


This will produce the following result when run on my MacBook Pro laptop:
...
Benchmark                               Mode  Cnt           Score   Error  Units
SingletonBench.singletonStreamCount    thrpt        333419753.335          ops/s
SingletonBench.singletonStreamForEach  thrpt       2312262034.214          ops/s
SingletonBench.streamCount             thrpt         27453782.595          ops/s
SingletonBench.streamForEach           thrpt         26156364.956          ops/s
...

That's a speedup factor of over 10 for the "count" operation. For the "forEach" operation, it looks like the JVM was able to optimize away the entire code path for the SingletonStream.

Test It

Download Speedment using the Speedment initializer.

The complete test class is available here.

Conclusions

The SingletonStream works more or less as an extended Optional and allows high performance while retaining the benefits of the Stream library.

You can select between its two versions by setting the STRICT value according to your preferred stringency/performance trade-off.

The SingletonStream could be further improved.

Thursday, October 4, 2018

Blow up Your JUnit5 Tests with Permutations


Writing JUnit tests can be a tedious and boring process. Learn how you can improve your test classes using permutations in combination with TestFactory methods and DynamicTest objects with a minimum of coding effort.

In this article, I will use the Java stream ORM Speedment because it includes a ready-made Permutation class, which saves me development time. Speedment otherwise allows database tables to be connected to standard Java streams. Speedment is an open-source tool and is also available in a free version for commercial databases.

Testing a Stream

Consider the following JUnit5 test:

@Test
void test() {

    List<String> actual = Stream.of("CCC", "A", "BB", "BB")
        .filter(string -> string.length() > 1)
        .sorted()
        .distinct()
        .collect(toList());

    List<String> expected = Arrays.asList("BB", "CCC");

    assertEquals(expected, actual);
}

As can be seen, this test creates a Stream with the elements “CCC”, “A”, “BB” and “BB” and then applies a filter that will remove the “A” element (because its length is not greater than 1). After that, the elements are sorted, so that we have the elements “BB”, “BB” and “CCC” in the stream. Then, a distinct operation is applied, removing all duplicates in the stream, leaving the elements “BB” and “CCC” before the final terminating operator is invoked, whereby these remaining elements are collected to a List.

After some consideration, it can be understood that the order in which the intermediate operations filter(), sorted() and distinct() are applied is irrelevant. Thus, regardless of the order of operator application, we expect the same result.
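For instance, applying the same operations in a different order yields exactly the same result for this input (a minimal illustration using java.util.stream.Collectors):

// Same operations, different order: distinct, then sorted, then filter.
// [CCC, A, BB, BB] -> distinct -> [CCC, A, BB] -> sorted -> [A, BB, CCC]
// -> filter(length > 1) -> [BB, CCC]
List<String> reordered = Stream.of("CCC", "A", "BB", "BB")
    .distinct()
    .sorted()
    .filter(string -> string.length() > 1)
    .collect(Collectors.toList()); // [BB, CCC]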

But how can we write a JUnit5 test that proves that the order is irrelevant for all permutations, without writing individual test cases for all six permutations manually?

Using a TestFactory

Instead of writing individual tests, we can use a TestFactory to produce any number of DynamicTest objects. Here is a short example demonstrating the concept:

@TestFactory
Stream<DynamicTest> testDynamicTestStream() {
    return Stream.of(
        DynamicTest.dynamicTest("A", () -> assertEquals("A", "A")),
        DynamicTest.dynamicTest("B", () -> assertEquals("B", "B"))
    );
}

This will produce two, arguably meaningless, tests named “A” and “B”. Note how we can conveniently return a Stream of DynamicTest objects without first having to collect them into a Collection such as a List.

Using Permutations

The Permutation class can be used to create all possible combinations of items of any type T. Here is a simple example with the type String:

Permutation.of("A", "B", "C")
            .map(
                is -> is.collect(toList())
            )
            .forEach(System.out::println);

Because Permutation creates a Stream of a Stream of type T, we have added an intermediary map operation where we collect the inner Stream to a List. The code above will produce the following output:

[A, B, C]
[A, C, B] 
[B, A, C] 
[B, C, A] 
[C, A, B] 
[C, B, A]

It is easy to see that these are all the ways one can combine “A”, “B” and “C” such that each element occurs exactly once.

Creating the Operators

In this article, I have opted to create Java objects for the intermediate operations instead of using lambdas because I want to override the toString() method and use it to identify the operations in the test names. Under other circumstances, it would have sufficed to use lambdas or method references directly (a short lambda sketch follows after the operator definitions below):

UnaryOperator<Stream<String>> FILTER_OP = new UnaryOperator<Stream<String>>() {
    @Override
    public Stream<String> apply(Stream<String> s) {
        return s.filter(string -> string.length() > 1);
    }

    @Override
    public String toString() {
        return "filter";
    }
 };


UnaryOperator<Stream<String>> DISTINCT_OP = new UnaryOperator<Stream<String>>() {
    @Override
    public Stream<String> apply(Stream<String> s) {
        return s.distinct();
    }

    @Override
    public String toString() {
        return "distinct";
    }
};

UnaryOperator<Stream<String>> SORTED_OP = new UnaryOperator<Stream<String>>() {
    @Override
    public Stream<String> apply(Stream<String> s) {
        return s.sorted();
    }

    @Override
    public String toString() {
        return "sorted";
    }
};
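For comparison, a lambda-based version of the filter operator could be as short as the sketch below, at the cost of losing the descriptive toString() used to name the tests:

// Sketch: the same filter operation expressed as a lambda. It works with
// Permutation just as well, but its toString() is no longer descriptive.
UnaryOperator<Stream<String>> filterOp =
    s -> s.filter(string -> string.length() > 1);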

Testing the Permutations

We can now easily test the workings of Permutations on our Operators:

void printAllPermutations() {

     Permutation.of(
        FILTER_OP,
        DISTINCT_OP,
        SORTED_OP
    )
    .map(
        is -> is.collect(toList())
    )
    .forEach(System.out::println);
}

This will produce the following output:

[filter, distinct, sorted]
[filter, sorted, distinct]
[distinct, filter, sorted]
[distinct, sorted, filter]
[sorted, filter, distinct]
[sorted, distinct, filter]

As can be seen, these are all the permutations of the intermediate operations we want to test.

Stitching it up

By combining the learnings above, we can create our TestFactory that will test all permutations of the intermediate operations applied to the initial stream:

@TestFactory
Stream<DynamicTest> testAllPermutations() {

    List<String> expected = Arrays.asList("BB", "CCC");

    return Permutation.of(
        FILTER_OP,
        DISTINCT_OP,
        SORTED_OP
    )
        .map(is -> is.collect(toList()))
        .map(l -> DynamicTest.dynamicTest(
            l.toString(),
            () -> {
                List<String> actual = l.stream()
                    .reduce(
                        Stream.of("CCC", "A", "BB", "BB"),
                        (s, oper) -> oper.apply(s),
                        (a, b) -> a
                    ).collect(toList());

                assertEquals(expected, actual);
            }
            )
        );
}

Note how we are using the Stream::reduce method to progressively apply the intermediate operations on the initial Stream.of("CCC", "A", "BB", "BB"). The combiner lambda (a, b) -> a is just a dummy, only to be used for combining parallel streams (which are not used here).

Blow up Warning

A final warning about the inherent mathematical complexity of permutations is warranted. The complexity of permutation is, by definition, O(n!), meaning, for example, that adding just one element to an existing eight-element permutation increases the number of permutations from 40,320 to 362,880.

This is a double-edged sword. We get a lot of tests almost for free but we have to pay the price of executing each of the tests on each build.

Code

The source code for the tests can be found here.

Speedment ORM can be downloaded here

Conclusions

The Permutation, DynamicTest and TestFactory classes are excellent building blocks for creating programmatic JUnit5 tests.

Take care not to use too many elements in your permutations.  “Blow up” can mean two different things ...

Tuesday, September 25, 2018

Java: GraalVM Database Stream Performance


GraalVM is the new kid on the JVM block. It is an open-source virtual machine that is able to run many programming languages, such as Java, Rust and JavaScript, at the same time. GraalVM also has a new internal code optimizer pipeline that can improve performance significantly compared to other JVMs under some conditions. Learn how to reap the benefits of GraalVM and execute your code faster with no code modification.

What is GraalVM?

Previous JVMs, such as the Oracle JVM and the OpenJDK JVM (both called “HotSpot”), have been around for a long time. They have evolved considerably over time and, over the course of the decades, we have seen performance rocket compared to the Java 1.0 JVM. Significant JVM improvements include just-in-time compilation (JIT), the C2 compiler, escape analysis, etc., all of which contributed to this positive development. But as with all technology, improvements start to plateau at some point.

GraalVM is a fresh start whereby a new internal architecture has been developed from the ground up. In particular, the JIT compiler, called Graal, has been reworked. Unsurprisingly, the JIT compiler itself is written in Java, just like all the other GraalVM components. As it turns out, Graal is sometimes able to optimize your code better than some existing JVMs. In particular, some Stream types appear to benefit from running under Graal.

Database Stream Performance

There are a number of ways to write Java streams. The most obvious way is to use one of the built-in Java methods Stream::of or Collection::stream. These methods, however, require that the elements in the Stream are present a priori in the shape of Java objects. This means that the compiler cannot optimize them away under most conditions.

I have therefore instead chosen to use the stream-based ORM tool Speedment. This tool works with a technology that pulls database content into an in-JVM-memory snapshot and creates Java streams directly from RAM. Thus, database tables are stored off-heap, thereby potentially avoiding the creation of Java objects. Because Graal has an improved performance optimization pipeline, it is likely that it can better optimize away temporary intermediate stream objects. In theory, Speedment and Graal would, therefore, be a perfect fit. I was therefore very eager to test how the already extreme performance of Speedment would be affected by running under GraalVM rather than under HotSpot.

The following Speedment database streams were used to test performance. Read more about these streams and how they work in one of my previous articles, which you can find here.

private static final Predicate RATING_EQUALS_PG_13 =
    Film.RATING.equal(GeneratedFilm.Rating.PG13);

private static final Comparator LENGTH_DESCENDING = 
    Film.LENGTH.reversed();

@Benchmark
public long filterAndCount() {
    return films.stream()
        .filter(RATING_EQUALS_PG_13)
        .count();
}

@Benchmark
public IntSummaryStatistics Complex() {
    return films.stream()
        .sorted(LENGTH_DESCENDING)
        .skip(745)
        .limit(5)
        .mapToInt(Film.RENTAL_DURATION.asInt())
        .summaryStatistics();
}

The following JMH output was obtained for runs under GraalVM and HotSpot respectively:

Graal:
Benchmark              Mode  Cnt         Score        Error  Units
Bench.Complex         thrpt    5   8453285.715 ± 383634.200  ops/s
Bench.filterAndCount  thrpt    5  29755350.558 ± 674240.743  ops/s

HotSpot:
Benchmark              Mode  Cnt         Score        Error  Units
Bench.Complex         thrpt    5   5334041.755 ± 176368.317  ops/s
Bench.filterAndCount  thrpt    5  20809826.960 ± 963757.357  ops/s

Being able to produce and consume over 30 million database streams per second with GraalVM/Speedment on a laptop with 4 CPU cores is quite astonishing. Imagine the performance on a server grade node with 24 or 32 CPU cores.

Here is how it looks in a chart (higher is better):



Ordinary Stream Performance

Initial tests show varying relative performance figures for built-in Java streams like Stream.of(“A”, “B”, “C”) or List::stream, with various operations applied, for the different JVMs. I expect these stream types to also gain performance across the board once GraalVM has matured. Perhaps I will cover this in a future article.

Setup

The following JMH setup was used for GraalVM and HotSpot:

# Detecting actual CPU count: 8 detected
# JMH version: 1.21
# VM version: JDK 1.8.0_172, GraalVM 1.0.0-rc6, 25.71-b01-internal-jvmci-0.48
# *** WARNING: JMH support for this VM is experimental. Be extra careful with the produced data.
# VM invoker: /Applications/graalvm-ce-1.0.0-rc6/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time


# Detecting actual CPU count: 8 detected
# JMH version: 1.21
# VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time

The tests above were performed on a MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3 with 4 CPU cores and 8 threads. As can be seen in the logs, we should be careful to draw conclusions using JMH figures for Graal as the JMH support is experimental at this time.

Give it a Spin

Use the Speedment initializer to create a Speedment project template here.

Download the latest version of GraalVM here.

The source code for the benchmarks is available here.

Feel free to reproduce the performance tests on another hardware platform and report the outcome in the comments below.

Conclusions

GraalVM seems to be a promising technology that can improve performance for certain Java stream types.

GraalVM in combination with Speedment’s in-JVM-memory acceleration can enable significant stream performance for data analytic applications.